
Model for arXiv text generation

  • Description: This AI benchmark evaluates how well a model generates text on the arXiv dataset. The metric is the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score, which measures the overlap between generated text and reference text; higher is better.
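
To make the metric concrete, here is a minimal sketch of an n-gram ROUGE computation in Python. The page does not say which ROUGE variant (ROUGE-1, ROUGE-2, ROUGE-L) or which tokenization the benchmark uses, so the whitespace tokenizer, the choice of n, and the function name `rouge_n` are assumptions made for illustration only; scores from this sketch are not expected to match the leaderboard values.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Illustrative ROUGE-N: n-gram overlap between a reference and a candidate.

    Assumption: lowercased whitespace tokenization, no stemming.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_n("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this example five of the six unigrams match, so precision, recall, and F1 all equal 5/6. A benchmark score would typically be an average of such per-sample values over the whole test set, though the exact aggregation used here is not stated.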


Model benchmarks

| Model name | Dataset | ROUGE | Team name | Dataset size | Date submitted | Notes |
| transformers_gpt2_medium_base | arxiv_gen | 0.2036 | ChemNLP | 490 | 01-14-2023 | CSV, JSON, Info |
| transformers_gpt2_medium_finetuned | arxiv_gen | 0.2413 | ChemNLP | 490 | 01-14-2023 | CSV, JSON, Info |
| ChatGPT_May23 | arxiv_gen | 0.3006 | ChatGPT | 490 | 06-05-2023 | CSV, JSON, Info |