Model for arXiv text generation

  • Description: This AI benchmark evaluates the quality of text generated by models on the arXiv dataset. The metric is the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score.
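To make the metric concrete, here is a minimal sketch of a ROUGE-N computation in plain Python. It is an illustration only: the source does not say which ROUGE variant (ROUGE-1, ROUGE-2, ROUGE-L) or which tokenization the benchmark uses, so the function name `rouge_n`, the whitespace tokenizer, and the lowercasing step are all assumptions rather than the benchmark's actual scoring code.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, reference, n=1):
    """Hypothetical helper: ROUGE-N precision, recall, and F1.

    Assumes lowercased, whitespace-split tokens; the benchmark's real
    preprocessing is not stated in the source.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    generated = "the cat lay on the mat"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

In this toy pair, five of the six reference unigrams also occur in the generated text, so the unigram recall is 5/6. A benchmark score would typically be an average of such per-sample values over the test set, although the exact aggregation used here is not given in the source.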
[Figure: bar chart of ROUGE (text) scores for the benchmark AI-TextGen-text-arxiv_gen-test-rouge, comparing transformers_gpt2_medium_base, transformers_gpt2_medium_finetuned, and ChatGPT_May23; score axis from 0 to 0.3.]


Reference(s): https://github.com/usnistgov/chemnlp, https://chat.openai.com/, https://doi.org/10.48550/arXiv.1905.00075

Model benchmarks

| Model name | Dataset | ROUGE | Team name | Dataset size | Date submitted | Notes |
|---|---|---|---|---|---|---|
| transformers_gpt2_medium_base | arxiv_gen | 0.2036 | ChemNLP | 4900 | 01-14-2023 | CSV, JSON, run.sh, Info |
| ChatGPT_May23 | arxiv_gen | 0.3006 | ChatGPT | 4900 | 06-05-2023 | CSV, JSON, run.sh, Info |
| transformers_gpt2_medium_finetuned | arxiv_gen | 0.2413 | ChemNLP | 4900 | 01-14-2023 | CSV, JSON, run.sh, Info |