Citation: Shiwei Wu, Yu Liu, Yan Gao, et al., "Evaluating conversational question answering in multi-instructional documents with large language models," Chinese Journal of Electronics, vol. x, no. x, pp. 1–10, xxxx. DOI: 10.23919/cje.2025.00.277

Evaluating Conversational Question Answering in Multi-Instructional Documents with Large Language Models

Instructional documents serve as valuable resources for accomplishing diverse real-world tasks, yet their complexity poses significant challenges for conversational question answering (CQA), which remains underexplored. Most existing benchmarks center on factual queries over single-source narrative documents, making them inadequate for assessing a model's ability to comprehend complex real-world instructional documents and provide accurate step-by-step guidance in daily life. To address this limitation, we introduce InsCoQA, a new benchmark designed to evaluate large language models (LLMs) in CQA settings involving multi-source instructional documents. Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents, reflecting the intricate and multi-faceted nature of real-world instructional tasks. To support rigorous evaluation, we also present InsEval, an automated, LLM-based assessment framework that evaluates the factuality, completeness, and procedural correctness of model-generated answers.
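To illustrate the general idea of LLM-based answer assessment along the three dimensions named in the abstract (factuality, completeness, procedural correctness), the sketch below shows a minimal "LLM-as-judge" scorer. The prompt wording, the 1–5 scale, and the `llm` callable are illustrative assumptions for this sketch, not the actual InsEval protocol described in the paper.

```python
# Hypothetical sketch of an LLM-as-judge scorer in the spirit of InsEval.
# Dimension names follow the abstract; prompt text, scoring scale, and the
# `llm` callable are assumptions, not the paper's evaluation procedure.
import json
from typing import Callable, Dict

DIMENSIONS = ("factuality", "completeness", "procedural_correctness")

JUDGE_PROMPT = """You are grading an answer to a how-to question.
Reference instructions:
{reference}

Candidate answer:
{answer}

Rate the answer from 1 (poor) to 5 (excellent) on each dimension:
factuality, completeness, procedural_correctness.
Respond with a JSON object mapping each dimension to an integer score."""


def judge_answer(reference: str, answer: str,
                 llm: Callable[[str], str]) -> Dict[str, int]:
    """Score one model answer against reference instructions with an LLM judge."""
    raw = llm(JUDGE_PROMPT.format(reference=reference, answer=answer))
    scores = json.loads(raw)
    # Keep only the expected dimensions and clamp each score to the 1-5 range.
    return {d: max(1, min(5, int(scores[d]))) for d in DIMENSIONS}


if __name__ == "__main__":
    # Stub judge so the sketch runs without calling a real LLM API.
    fake_llm = lambda prompt: (
        '{"factuality": 5, "completeness": 4, "procedural_correctness": 4}'
    )
    print(judge_answer("1. Preheat oven. 2. Bake 20 min.",
                       "Preheat the oven, then bake for about 20 minutes.",
                       fake_llm))
```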
