Findings of EMNLP 2025
Ko-LongRAG: A Korean Long-Context RAG Benchmark Built with a Retrieval-Free Approach
Yongil Kim, Heuiyeen Yeen, Hyeongu Yun, Jinsik Lee
Abstract: We introduce Ko-LongRAG, the first Korean long-context RAG benchmark, addressing a significant gap in existing evaluation frameworks, which have primarily focused on English. Unlike conventional benchmarks that depend on external retrievers, Ko-LongRAG adopts a retrieval-free approach designed around Specialized Content Knowledge (SCK), enabling controlled, high-quality QA pair generation without the need for extensive retrieval infrastructure. Our evaluation found that OpenAI’s o1 model performed best among proprietary systems, while EXAONE 3.5 led among open-source options. We publicly release both the dataset and source code.