Model for arXiv text generation

  • Description: This AI benchmark evaluates the quality of text generated by a model on the arXiv dataset. The metric is the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score.


Reference(s): https://github.com/usnistgov/chemnlp, https://doi.org/10.48550/arXiv.1905.00075, https://chat.openai.com/
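
To make the metric concrete, here is a minimal sketch of a ROUGE-style score in Python. It is an illustration only: this page does not state which ROUGE variant or tokenizer the benchmark uses, so the unigram (ROUGE-1) overlap, the lowercase whitespace tokenization, and the function name rouge_1 are all assumptions, and the sketch is not the benchmark's own scoring code.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Unigram-overlap (ROUGE-1 style) precision, recall, and F1.

    Assumption: lowercase whitespace tokenization. The benchmark's actual
    ROUGE variant and preprocessing are not given in the source.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    # Clipped overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
    print(scores)
```

In this toy example, five of the six tokens on each side overlap, so precision, recall, and F1 all equal 5/6. Scores on the benchmark itself depend on the variant and preprocessing actually used.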

Model benchmarks

Model name                         | Dataset   | ROUGE  | Team name | Dataset size | Date submitted | Notes
ChatGPT_May23                      | arxiv_gen | 0.3006 | ChatGPT   | 490          | 06-05-2023     | CSV, JSON, run.sh, Info
transformers_gpt2_medium_base      | arxiv_gen | 0.2036 | ChemNLP   | 490          | 01-14-2023     | CSV, JSON, run.sh, Info
transformers_gpt2_medium_finetuned | arxiv_gen | 0.2413 | ChemNLP   | 490          | 01-14-2023     | CSV, JSON, run.sh, Info