Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

TL;DR

Long-context LLMs can generate and evaluate book-scale QA data, and ranking QA systems by pairwise side-by-side comparison of their answers gives more consistent evaluation than scoring each answer on its own.

Venue
arXiv
BibTeX
@article{bohnet2024longspan,
  title={Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation},
  author={Bernd Bohnet and Kevin Swersky and Rosanne Liu and Pranjal Awasthi and Azade Nova and Javier Snaider and Hanie Sedghi and Aaron T Parisi and Michael Collins and Angeliki Lazaridou and Orhan Firat and Noah Fiedel},
  year={2024},
  eprint={2406.00179},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}