Responsible human-guided AI in extracting and exposing different historical narratives
Balancing performance and safety is one of the greatest challenges for AI and digital humanities practitioners alike. Each will have their own particular stance on this balance, with substantial implications for the solutions they develop. In this article I share my own stance by describing some of the approaches I take when extracting narratives from school history learning materials as part of the World School History Project.
I should emphasise that the approaches I have chosen to adopt are specific both to this particular use case and to my personal inclinations (in particular my small appetite for risk and desire for confidence).
Responsible “AI”
I have always been wary of the term “Artificial Intelligence” (or even “intelligence”), because it is a term that lacks any consensus definition. I prefer instead to speak and write in terms of capabilities, some of which might be more impressive than others; some, for example, are associated with “higher level” human cognition. Developing an AI solution responsibly means developing a solution that reliably exhibits the capabilities required for its specific intended purpose.
- The performance part of this entails ensuring that the outputs/behaviours arising from the system are sufficient for their use case. Within the context of extracting information from text-based learning materials, this might equate to having the capabilities to summarise text, extract historical entities and events, identify topics, and evaluate sentiment.
- The safety part of this entails ensuring that the outputs/behaviours that we do not want to occur are not produced. Within the context of extracting information from text-based materials, this might equate to summaries not containing fabricated content and the system not mis-identifying entities, events, topics and sentiment.
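To make the safety side concrete, here is a minimal sketch of the kind of automatic check this implies: an extracted span (a summary sentence, an entity mention) is only accepted if it can be found verbatim in the source text. The function names and the normalisation step are purely illustrative, not drawn from any particular toolkit, and a literal substring check is of course only the simplest possible form of such a safeguard.

```python
# A minimal sketch of an automatic "is this output actually in the source?" check.
# Names and normalisation choices are illustrative assumptions, not project code.

import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences
    do not cause false rejections."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_supported_by_source(span: str, source_text: str) -> bool:
    """Return True only if the extracted span occurs verbatim (after
    normalisation) somewhere in the source text."""
    return normalise(span) in normalise(source_text)


source = "The Treaty of Versailles was signed in 1919, formally ending the First World War."
assert is_supported_by_source("Treaty of Versailles", source)
assert not is_supported_by_source("Treaty of Vienna", source)  # flagged as unsupported
```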
Decomposing “intelligence” into individual capabilities for transparency
While there has been much excitement and hype around Generative AI and large language models (LLMs), I have resisted applying these models when extracting information from texts to build a knowledge base (interested readers who want to learn more about why might want to listen to this podcast).
Instead, my approach has been to decompose the problems I want to solve into the individual capabilities I wish the system to manifest, and to adopt the most transparent solution possible for each of those capabilities.
For example, rather than using an LLM to extract information from history learning materials, I prefer to use more “traditional” natural language processing (NLP) methods such as extractive summarisation (where I can automatically check that the parts of text deemed to be important by the model are truly within the text and not fabricated) and named entity recognition (where I can automatically check that the events, people, and other historical entities identified were truly contained in the text).
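As an illustration of this “traditional” route (a sketch, not the project’s actual pipeline), the snippet below uses spaCy for named entity recognition and a simple word-frequency heuristic for extractive summarisation. Because both methods only ever select sentences or spans that already exist in the source, every output can be verified against the original text automatically.

```python
# A rough sketch of extractive summarisation plus named entity recognition,
# assuming spaCy and its en_core_web_sm model are installed. The summariser
# is a deliberately simple frequency heuristic, chosen for illustration.

from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity text, entity label) pairs found in the text."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def extractive_summary(text: str, n_sentences: int = 2) -> list[str]:
    """Select the n highest-scoring sentences, scored by the frequency
    of their content words across the whole text."""
    doc = nlp(text)
    freqs = Counter(
        tok.lemma_.lower() for tok in doc if tok.is_alpha and not tok.is_stop
    )
    scored = [
        (sum(freqs[tok.lemma_.lower()] for tok in sent if tok.is_alpha), sent.text.strip())
        for sent in doc.sents
    ]
    top = sorted(scored, reverse=True)[:n_sentences]
    # Return the selected sentences in their original order of appearance.
    return [s for _, s in sorted(top, key=lambda pair: text.find(pair[1]))]


text = (
    "The Treaty of Versailles was signed in 1919. "
    "It formally ended the First World War between Germany and the Allied Powers. "
    "Many historians argue that its terms contributed to later instability."
)

# Both kinds of output can be checked automatically against the source text.
for sentence in extractive_summary(text):
    assert sentence in text  # every summary sentence is literally in the source
for entity, _label in extract_entities(text):
    assert entity in text  # every recognised entity is literally in the source
```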
In other words, the models I choose to apply are the ones that I understand. This understanding might be achieved from first-principles reasoning or from having observed its behaviour through rigorous experimentation. Often the most transparent models are also the “stupidest”, in the sense that they address only a single task. But because I can rely on them to perform the way I expect them to with respect to a particular task, I feel far more confident in their outputs and am happier building things on top of them.
Experimenting rigorously when transparency is not possible
There are times when the capabilities we require demand the adoption of a model that is not transparent. Furthermore, even the simple “stupid” models I have mentioned above can have a black-box aspect to them. As a general rule, I treat models that I have not yet experimented with much as if they were new colleagues I had not previously worked with. I don’t presume to know to what extent they will misclassify, misidentify, hallucinate, or manifest the human biases of the material they have been trained on.
Just as I would hold back from trusting a new colleague with high-stakes tasks on day one of meeting them, the kinds of tasks (in terms of both domain knowledge and complexity) I would entrust to an AI model or system depend very much on what I have gleaned from rigorous experimentation on the tasks I need it to perform reliably.
For this reason, I test all models rigorously against a set of human ratings of outputs generated from the model. These ratings indicate only the extent to which the machine outputs agree with the resources they are derived from, not whether those outputs are factually accurate: if the resource stated a falsehood, we would want that falsehood to be reflected in the output. This also helps protect against human bias, since the question is not “is this true?” (which could easily attract biased responses) but rather “is the model’s output supported by the resources it has seen?”. To scale this, I adopt human-guided machine evaluation, tuning an automated solution against the human ratings (see also this article for more details).
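The sketch below shows one simplified way such human-guided tuning could look: an automated support score (here a naive token-overlap measure, standing in for whatever metric is actually used) is thresholded, and the threshold is chosen to maximise agreement with a sample of human ratings before being applied at scale. All names and data are hypothetical.

```python
# A simplified sketch of human-guided machine evaluation: tune an automated
# support score against a sample of human supported/unsupported ratings.
# The score, the data, and the tuning rule are illustrative assumptions.

def support_score(output: str, source: str) -> float:
    """Fraction of the output's tokens that also appear in the source."""
    out_tokens = set(output.lower().split())
    src_tokens = set(source.lower().split())
    return len(out_tokens & src_tokens) / max(len(out_tokens), 1)


def tune_threshold(scores: list[float], human_labels: list[bool]) -> float:
    """Pick the threshold whose supported/unsupported decisions best agree
    with the human ratings."""
    best_threshold, best_agreement = 0.0, -1.0
    for t in sorted(set(scores)):
        decisions = [s >= t for s in scores]
        agreement = sum(d == h for d, h in zip(decisions, human_labels)) / len(human_labels)
        if agreement > best_agreement:
            best_threshold, best_agreement = t, agreement
    return best_threshold


# Hypothetical human-rated sample: automated scores for a handful of machine
# outputs, and the raters' judgements of whether each was supported by its source.
sample_scores = [0.9, 0.85, 0.4, 0.2, 0.75, 0.3]
sample_labels = [True, True, False, False, True, False]

threshold = tune_threshold(sample_scores, sample_labels)
# At scale: any new output scoring below `threshold` is flagged for review.
```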
Broadening horizons while laying solid foundations
While I believe there to be much potential for AI to play a significant role in historical research, I hope that by sharing my more cautious approach, I have made it easier for readers to think about and discuss what is at stake when making their own decisions with respect to AI. For my part, I would like to ensure that the foundations we lay for AI applications in this domain are as solid as possible.