Factuality Detection in Generative AI

1 Shanghai Jiao Tong University, 2 Carnegie Mellon University, 3 City University of Hong Kong, 4 New York University, 5 Meta AI, 6 The Hong Kong University of Science and Technology, 7 Shanghai Artificial Intelligence Laboratory. * Corresponding Author

Fact-checking the information generated by GPT-4 about Elon Musk, xAI, and his proposed cage fight with Mark Zuckerberg.




FacTool is a tool-augmented framework for detecting factual errors in text generated by large language models (LLMs). It supports factual error detection across four tasks: knowledge-based question answering, code generation, math problem solving, and scientific literature review.

Figure 1
Figure 2


The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text.

In particular:
(1) A wider range of tasks now faces an increasing risk of factual errors when handled by generative models.
(2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts.
(3) Explicit evidence is scarce during the fact-checking process.

With these challenges in mind, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in text generated by large language models (e.g., ChatGPT).

FacTool currently supports factual error detection for four tasks: knowledge-based QA, code generation, math problem solving, and scientific literature review.
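To make the idea of a task-agnostic, tool-augmented checker more concrete, here is a minimal Python sketch. It is not FacTool's actual API, which this page does not describe. The names `Verdict`, `CHECKERS`, `register`, `check_math`, and `check` are all hypothetical, as is the assumption that a math claim arrives as a simple `expression = value` string. Only the math task is filled in, using Python itself as the verification tool. The other three tasks would need external tools (for example a search engine or a code execution sandbox) and are left unregistered.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    """Hypothetical result record: one verdict per individual claim."""
    claim: str
    factual: bool
    evidence: str


# Hypothetical registry mapping a task name to its claim checker.
CHECKERS: Dict[str, Callable[[str], Verdict]] = {}


def register(task: str):
    """Decorator that registers a checker function under a task name."""
    def wrap(fn: Callable[[str], Verdict]) -> Callable[[str], Verdict]:
        CHECKERS[task] = fn
        return fn
    return wrap


# Arithmetic operators the sketch is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def _eval(node: ast.AST) -> float:
    """Evaluate a restricted arithmetic expression tree (no names, no calls)."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


@register("math")
def check_math(claim: str) -> Verdict:
    """Check a claim of the assumed form '<expression> = <value>'."""
    lhs, rhs = (part.strip() for part in claim.split("="))
    left = _eval(ast.parse(lhs, mode="eval").body)
    right = _eval(ast.parse(rhs, mode="eval").body)
    return Verdict(claim, abs(left - right) < 1e-9, f"{lhs} evaluates to {left}")


def check(task: str, claims: List[str]) -> List[Verdict]:
    """Run the checker registered for `task` on each claim independently."""
    if task not in CHECKERS:
        raise KeyError(f"no checker registered for task {task!r}")
    return [CHECKERS[task](c) for c in claims]


if __name__ == "__main__":
    for verdict in check("math", ["3 * 4 = 12", "2 + 2 = 5"]):
        print(verdict)
```

Checking each claim on its own, rather than the whole response at once, is one way to address the granularity problem noted above: a long answer can be partly correct, and per-claim verdicts show which part failed.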

Code and ChatGPT plugin configs can be found here.

Knowledge-based QA

FacTool powered by GPT-4 fact-checking knowledge-based QA responses about Barbenheimer generated by GPT-4.

Code Generation

FacTool powered by GPT-4 fact-checking code generated by GPT-4.

Math Problem Solving

FacTool powered by GPT-4 fact-checking math solutions generated by GPT-4.

Scientific Literature Review

FacTool powered by GPT-4 fact-checking scientific literature review generated by GPT-4.


  title={FacTool: Factuality Detection in Generative AI--A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios},
  author={Chern, I-Chun and Chern, Steffi and Chen, Shiqi and Yuan, Weizhe and Feng, Kehua and Zhou, Chunting and He, Junxian and Neubig, Graham and Liu, Pengfei and others},
  journal={arXiv preprint arXiv:2307.13528},