Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation
TL;DR
Long-context LLMs can generate and evaluate book-scale QA data, with pairwise side-by-side ranking giving more consistent evaluation.
Venue
arXiv
BibTeX
@article{bohnet2024longspan,
title={Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation},
author={Bernd Bohnet and Kevin Swersky and Rosanne Liu and Pranjal Awasthi and Azade Nova and Javier Snaider and Hanie Sedghi and Aaron T Parisi and Michael Collins and Angeliki Lazaridou and Orhan Firat and Noah Fiedel},
year={2024},
eprint={2406.00179},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Date
May 2024
Links