EMNLP 2023
Dec 2023
Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
Yerin Hwang*, Yongil Kim*, Hyunkyung Bae, Hwanhee Lee, Jeesoo Bang, Kyomin Jung
Abstract: To address data scarcity in Conversational Question Answering (ConvQA), prior work proposed dialog inpainting, a method that generates ConvQA datasets from documents. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, so the questions it generates often have low contextual relevance. To overcome this limitation, we propose Dialogizer, a novel framework that automatically generates ConvQA datasets with high contextual relevance from textual sources. The framework incorporates two training tasks: question-answer matching (QAM) and topic-aware dialog generation (TDG). Using our framework, we produce four ConvQA datasets from documents in multiple domains. Through automatic and human evaluation, we show that our framework generates higher-quality datasets than the baseline dialog inpainting model.