Model for arXiv text generation
- Description: This AI benchmark evaluates how well models generate text from the arXiv dataset. Performance is measured with the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score; higher scores indicate greater overlap with the reference text.
- Reference(s): https://github.com/usnistgov/chemnlp, https://doi.org/10.48550/arXiv.1905.00075, https://chat.openai.com/
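ROUGE is a family of overlap metrics rather than a single number, and this page does not state which variant (ROUGE-1, ROUGE-2, ROUGE-L), tokenizer, or aggregation produced the scores below. The following is a minimal sketch, assuming lowercase whitespace tokenization and per-example F-measures averaged over the dataset. The helpers `tokenize`, `rouge1_f`, `rougeL_f`, and `mean_score` are hypothetical illustrations, not part of the ChemNLP code, and their outputs should not be expected to reproduce the leaderboard values exactly.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split. Reference implementations
    # typically also strip punctuation and may apply stemming.
    return text.lower().split()


def f_measure(overlap: int, ref_len: int, cand_len: int) -> float:
    # Harmonic mean of precision (overlap / candidate length)
    # and recall (overlap / reference length).
    if overlap == 0 or ref_len == 0 or cand_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f(reference: str, candidate: str) -> float:
    # ROUGE-1: clipped unigram overlap (Counter & keeps the minimum count per token).
    ref, cand = tokenize(reference), tokenize(candidate)
    overlap = sum((Counter(ref) & Counter(cand)).values())
    return f_measure(overlap, len(ref), len(cand))


def lcs_length(a: List[str], b: List[str]) -> int:
    # Dynamic program for the longest common subsequence, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f(reference: str, candidate: str) -> float:
    # ROUGE-L (sentence level): F-measure built on the LCS length,
    # so only in-order matches count.
    ref, cand = tokenize(reference), tokenize(candidate)
    return f_measure(lcs_length(ref, cand), len(ref), len(cand))


def mean_score(
    references: List[str],
    candidates: List[str],
    metric: Callable[[str, str], float] = rouge1_f,
) -> float:
    # Assumption: the dataset-level score is the mean of per-example scores.
    if len(references) != len(candidates) or not references:
        raise ValueError("references and candidates must be non-empty and equal in length")
    return sum(metric(r, c) for r, c in zip(references, candidates)) / len(references)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    cand = "the cat lay on the mat"
    print(round(rouge1_f(ref, cand), 4))  # 0.8333
    print(round(rougeL_f(ref, cand), 4))  # 0.8333
```

The two variants diverge when word order changes. For example, "a b c d" against "d c b a" scores 1.0 under this ROUGE-1 sketch but only 0.25 under ROUGE-L, because the longest in-order match is a single token. For scoring that follows standard conventions, Google's `rouge-score` package (`rouge_scorer.RougeScorer`) implements these variants with its own tokenizer and optional Porter stemming.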
Model benchmarks
Model name | Dataset | ROUGE | Team name | Dataset size | Date submitted (MM-DD-YYYY) | Notes |
---|---|---|---|---|---|---|
ChatGPT_May23 | arxiv_gen | 0.3006 | ChatGPT | 490 | 06-05-2023 | CSV, JSON, run.sh, Info |
transformers_gpt2_medium_base | arxiv_gen | 0.2036 | ChemNLP | 490 | 01-14-2023 | CSV, JSON, run.sh, Info |
transformers_gpt2_medium_finetuned | arxiv_gen | 0.2413 | ChemNLP | 490 | 01-14-2023 | CSV, JSON, run.sh, Info |