© 2025 CoolTechZone - Latest tech news,
product reviews, and analyses.

AI’s knowledge of history leaves much to be desired, but there is a silver lining


Large Language Models (LLMs) are great at numerous tasks, yet their knowledge and comprehension of history fall short. Still, there’s hope for future historians.

Analyzing images, summarizing texts, writing book reports, coding, generating podcasts, making pub quizzes: artificial intelligence (AI) can handle all kinds of tasks. That versatility means LLMs can make our lives easier and more efficient.

However, you’d better not ask an LLM to write a history report. According to researchers, LLMs aren’t that accurate when it comes to history. As a matter of fact, they don’t score much higher than random guessing.

A team of researchers introduced the History Seshat Test for LLMs, which is based on a subset of the Seshat Global History Databank, containing 36,000 data points across 600 historical societies and over 2,700 scholarly references. The dataset covers every major world region from the Neolithic period to the Industrial Revolution.

Next, seven models from OpenAI’s GPT-4, Meta’s Llama, and Google’s Gemini families were tested for accuracy on historical questions.

“We find that, in a four-choice format, LLMs have a balanced accuracy ranging from 33.6% (Llama-3.1-8B) to 46% (GPT-4-Turbo), outperforming random guessing (25%) but falling short of expert comprehension,” the researchers state in their paper, which was presented last month at the NeurIPS conference.
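For readers wondering what “balanced accuracy” means here, the short Python sketch below may help. It uses the standard definition (the average of per-class recall), not the researchers’ own evaluation code, and the answer labels and toy data are hypothetical. It shows why 25% is the chance baseline in a four-choice format: a model that always picks the same option scores 25% on balanced accuracy, even if that option happens to be the most common correct answer.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so every answer class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: four answer options, with "A" over-represented.
y_true = ["A", "A", "A", "A", "B", "B", "C", "D"]
always_a = ["A"] * len(y_true)  # a "model" that always answers "A"

plain = sum(t == p for t, p in zip(y_true, always_a)) / len(y_true)
print(plain)                                # 0.5
print(balanced_accuracy(y_true, always_a))  # 0.25
```

Plain accuracy would credit the always-“A” strategy with 50% on this toy set, while balanced accuracy puts it at the 25% chance level.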

“Our benchmark shows that while LLMs possess some expert-level historical knowledge, there is considerable room for improvement,” the data scientists conclude.

How is this possible? According to Maria del Rio-Chanona, one of the paper’s co-authors and an associate professor of computer science at University College London, LLMs tend to extrapolate from the historical data they have seen most often and struggle to retrieve less prominent facts.

“If you get told A and B a hundred times, and C one time, and then get asked a question about C, you might just remember A and B and try to extrapolate from that,” del Rio-Chanona explains to TechCrunch.

The researchers are hopeful that LLMs can help historians in the future.

“By leveraging LLMs for preliminary data gathering and having experts review and refine the information, we can enhance both the efficiency and accuracy of historical data collection. Overall, while our results highlight areas where LLMs need improvement, they also underscore the potential for these models to aid in historical research,” the paper wraps up.

