Citation: Shiwei Wu, Yu Liu, Yan Gao, et al., "Evaluating conversational question answering in multi-instructional documents with large language models," Chinese Journal of Electronics, vol. x, no. x, pp. 1–10, xxxx. DOI: 10.23919/cje.2025.00.277

Evaluating Conversational Question Answering in Multi-Instructional Documents with Large Language Models

Instructional documents serve as valuable resources for accomplishing diverse real-world tasks, yet their complexity poses significant challenges for conversational question answering (CQA), which remains underexplored. Most existing benchmarks center on factual queries over single-source narrative documents, making them inadequate for assessing a model's ability to comprehend complex real-world instructional documents and provide accurate step-by-step guidance in daily life. To address this limitation, we introduce InsCoQA, a new benchmark designed to evaluate large language models (LLMs) in CQA settings involving multi-source instructional documents. Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents, reflecting the intricate and multi-faceted nature of real-world instructional tasks. To support rigorous evaluation, we also present InsEval, an automated, LLM-based assessment framework that evaluates the factuality, completeness, and procedural correctness of model-generated answers.
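To illustrate the general idea of LLM-based answer assessment along the three dimensions named in the abstract (factuality, completeness, procedural correctness), the sketch below shows a minimal "LLM-as-judge" scorer. The prompt wording, the 1–5 scale, and the `llm` callable are illustrative assumptions for this sketch, not the actual InsEval protocol described in the paper.

```python
# Hypothetical sketch of an LLM-as-judge scorer in the spirit of InsEval.
# Dimension names follow the abstract; prompt text, scoring scale, and the
# `llm` callable are assumptions, not the paper's evaluation procedure.
import json
from typing import Callable, Dict

DIMENSIONS = ("factuality", "completeness", "procedural_correctness")

JUDGE_PROMPT = """You are grading an answer to a how-to question.
Reference instructions:
{reference}

Candidate answer:
{answer}

Rate the answer from 1 (poor) to 5 (excellent) on each dimension:
factuality, completeness, procedural_correctness.
Respond with a JSON object mapping each dimension to an integer score."""


def judge_answer(reference: str, answer: str,
                 llm: Callable[[str], str]) -> Dict[str, int]:
    """Score one model answer against reference instructions with an LLM judge."""
    raw = llm(JUDGE_PROMPT.format(reference=reference, answer=answer))
    scores = json.loads(raw)
    # Keep only the expected dimensions and clamp each score to the 1-5 range.
    return {d: max(1, min(5, int(scores[d]))) for d in DIMENSIONS}


if __name__ == "__main__":
    # Stub judge so the sketch runs without calling a real LLM API.
    fake_llm = lambda prompt: (
        '{"factuality": 5, "completeness": 4, "procedural_correctness": 4}'
    )
    print(judge_answer("1. Preheat oven. 2. Bake 20 min.",
                       "Preheat the oven, then bake for about 20 minutes.",
                       fake_llm))
```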
