Revolutionizing Archival Processing: How Generative AI Can Contribute to Transcriptions

Most conversations about generative AI have focused on ChatGPT and how it can boost efficiency across a myriad of fields, including history and the humanities. Several innovative AI tools have also been developed for processing historical materials to improve accessibility for researchers. An often neglected but essential task for archivists and other information specialists is the tedious work of transcribing historical collections for public use.

Projects such as the National Archives Citizen Archivist program, which includes the Revolutionary War Pension Files Transcription Mission, and the Smithsonian Digital Volunteers: Transcription Center indicate a reliance on volunteers and researchers to perform the bulk of transcriptions for large collections. The inability to dedicate full-time staff to processing large historical collections is often due to a lack of resources in the form of labor, time, and funding. AI can help reduce that resource cost by processing collections quickly with emerging technologies.

Transkribus: AI Project in Transcribing Paper Materials

Transcribing handwritten historical documents is one of historians' most common and time-intensive tasks. For smaller archives or historical organizations, the process can be extremely time-consuming, spread resources too thin, and keep team members away from other projects for long periods. As a result, specific collections or documents may be "forgotten," waiting on a dusty shelf to be discovered by a researcher. Even larger organizations such as the United States National Archives occasionally depend on volunteers to transcribe materials over extended periods instead of allocating a full-time staff member to the project.

Figure 1. Transkribus.com

Transkribus, a web-based program developed by the University of Innsbruck as part of two larger collaborative European research groups, employs machine learning to transcribe both printed text and handwriting. It provides a free trial (with 500 pages of free transcription initially) and offers additional credits through a subscription service for further transcription needs. ChatGPT can also be used to correct the resulting transcriptions, particularly those made from low-resolution materials.

Figure 2. Transkribus sites page which includes public sites for viewing

Transkribus offers a range of benefits that significantly enhance the transcription of historical documents. One of its most notable features is the ability to train custom TextRecognition models. These AI-driven models are tailored to recognize specific handwriting styles or scripts by learning from a set of images and their corresponding transcriptions. This customization enables the software to handle unique or obscure scripts and languages from various periods or regions with greater accuracy. For users working with documents that do not fit standard models, this feature is invaluable as it allows for precise recognition of distinctive handwriting styles. In addition to custom models, Transkribus provides access to numerous public models created by the community, offering a broad range of general-use options. The software’s sophisticated machine learning algorithms contribute to its ability to transcribe text efficiently, potentially saving significant time compared to manual methods. The user-friendly interface facilitates the management of large volumes of text and supports detailed annotation and indexing, which are crucial for thorough historical analysis. Its versatility extends to multiple languages and scripts, making it adaptable for diverse projects. Furthermore, the ability to create and share transcription models fosters collaboration and knowledge sharing within the academic community.

Figure 3. Transkribus layout recognition options.

However, Transkribus is not without its challenges. The effectiveness of its machine learning models relies heavily on the quality and quantity of training data. For documents with particularly challenging or highly stylized handwriting, achieving accurate transcriptions may require extensive customization and fine-tuning, which can be both time-consuming and complex. New users might face a steep learning curve when training custom models, necessitating a deep understanding of machine learning principles and data preparation. The software can also be resource-intensive, demanding a robust computer system to function optimally. The cost of advanced features and additional processing capabilities, available through paid subscriptions or licenses, may be a barrier for some users. Despite its powerful capabilities, Transkribus is not infallible; manual verification and correction are often needed to ensure the historical accuracy of transcriptions, which can somewhat diminish the efficiency gains promised by the software.
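One common way to judge how much manual correction an automatic transcription still needs is the character error rate (CER): the number of single-character edits required to turn the machine output into a hand-checked reference, divided by the length of the reference. The sketch below is a minimal, generic illustration in Python and is not Transkribus's own code; the function names and the sample strings are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between the two texts, divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up sample: a hand-checked line and an automatic reading of it.
reference = "Dear Sir"
hypothesis = "Dear Sin"
print(character_error_rate(reference, hypothesis))  # 0.125
```

Checking a small, hand-verified sample this way before processing a whole collection gives a rough sense of how much correction work to budget for.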

Figure 4. Example of Transkribus text recognition interface.

In summary, while Transkribus offers impressive features like customizable TextRecognition models and a range of public resources, it also presents challenges related to model training, system requirements, and costs. Users must balance these factors to leverage the full potential of the software for their historical transcription projects.

Whisper: AI Project in Transcribing Oral Histories

Processing and transcribing accurate information from oral histories poses unique challenges for researchers. Recordings stored on different devices and in different audio formats may need distinct processing methods before they can be accessed at all. Oral histories have also traditionally required manual transcription, which is a time-consuming process.

Transcribers often find themselves listening to the entire recording multiple times to capture spoken words, nuances, and context accurately. This investment of time escalates with recording length and content complexity. Interpreting spoken language in oral histories introduces subjectivity, making it prone to errors or misinterpretations. While the subjectivity of each recording might not be crucial for the final product, it offers valuable insights into crafting metadata during the design of digital finding aids.

Different transcribers may interpret words or phrases differently, resulting in inconsistencies in the transcribed text. Maintaining accuracy and fidelity to the original recording demands meticulous attention to detail and potential collaboration among multiple transcribers or reviewers. Moreover, poor audio quality in oral history recordings, stemming from issues like background noise or low volume during recording, poses further challenges. This can complicate transcription, particularly in cases involving multiple speakers or accents unfamiliar to the transcriber.

Whisper, OpenAI's machine learning model for speech recognition and transcription, was released as open-source software in 2022. It can be run from the command line or called from Python, and it supports multilingual transcription and the translation of speech into English, which makes it a possible building block for voice assistants, chatbots, automated meeting notes, and transcription services. With careful verification, Whisper can significantly enhance the utility of digital recordings by transcribing them and incorporating the transcriptions into metadata and catalogues. Its automatic speech recognition (ASR) is often accurate even on speech with strong accents and diverse dialects, which can challenge other transcription systems, although accuracy varies with language and audio quality. Whisper is not a streaming model out of the box (it processes audio in 30-second windows), but it can be adapted for near-real-time use at live events, and its multilingual support can help with multilingual meetings or interviews. For more specialized needs, Whisper’s models can be fine-tuned to better suit specific audio; this requires technical skill but can significantly improve results.
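As a rough illustration of how a recording might be turned into catalogue-ready text, the sketch below uses the open-source `whisper` Python package (installed with `pip install openai-whisper`, and requiring ffmpeg). It relies on `whisper.load_model` and `model.transcribe`, whose result includes a list of segments with `start`, `end`, and `text` fields. The helper names, the "base" model size, and the file name are assumptions made for the example, not recommendations.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments) -> list[str]:
    """Turn Whisper-style segments into timestamped transcript lines."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_interview(audio_path: str, model_name: str = "base") -> list[str]:
    """Transcribe one recording and return timestamped lines.

    The import is done lazily so the formatting helpers above can be used
    (and tested) on machines where the whisper package is not installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_lines(result["segments"])


if __name__ == "__main__":
    # "interview.mp3" is a placeholder file name.
    for line in transcribe_interview("interview.mp3"):
        print(line)
```

Timestamped lines of this kind can be pasted into a finding aid or catalogue record, but they still need the careful verification described above before publication.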

Whisper also automatically generates subtitles and closed captions, improving accessibility for the deaf and hard-of-hearing and providing text accompaniment for video content. Its multilingual capabilities aid in language learning and real-time translation, while its integration into assistive technologies benefits individuals with speech impairments. Additionally, Whisper facilitates efficient content search by converting multimedia into text, supports voice-controlled applications for intuitive technology interaction, and automates customer support through real-time call transcriptions. For podcasters and journalists, it offers a quick and efficient method for content transcription and creation.

Overall, OpenAI Whisper represents a significant advancement in speech recognition technology, enhancing accessibility, optimizing workflows, and driving innovation. The "prompt" parameter in OpenAI's audio transcription API further refines this technology by improving the coherence and stylistic consistency of transcripts. Users can provide a prompt (either a genuine transcript from a previous segment or a fictitious example) to give the Whisper model additional context. This helps maintain a consistent writing style across segments and can guide the transcription, for example by specifying spellings or stylistic preferences. Prompts are limited to 224 tokens, meaning only the last portion of a lengthy prompt will be considered, but crafting them thoughtfully can noticeably improve transcription results.
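The idea of carrying context from one segment to the next can be sketched as follows. This is a minimal illustration under stated assumptions: the `transcribe` argument is a caller-supplied function (hypothetical here) that wraps whichever Whisper interface is in use, and the 150-word budget is only a rough stand-in for the 224-token limit, since exact token counts depend on the model's tokenizer.

```python
from typing import Callable, Iterable

# Assumption: a word budget used as a rough proxy for the 224-token limit.
MAX_PROMPT_WORDS = 150


def tail_prompt(previous_text: str, max_words: int = MAX_PROMPT_WORDS) -> str:
    """Keep only the last `max_words` words of the previous transcript."""
    if max_words <= 0:
        return ""
    words = previous_text.split()
    return " ".join(words[-max_words:])


def transcribe_in_sequence(
    chunks: Iterable[str],
    transcribe: Callable[[str, str], str],
    style_prompt: str = "",
) -> list[str]:
    """Transcribe audio chunks in order, prompting each with earlier text.

    `transcribe(chunk, prompt)` must return the transcript text for one
    chunk. The first chunk receives `style_prompt` (for example, a list of
    names with their preferred spellings); each later chunk receives the
    tail of the transcript that came before it.
    """
    transcripts = []
    prompt = style_prompt
    for chunk in chunks:
        text = transcribe(chunk, prompt)
        transcripts.append(text)
        prompt = tail_prompt(text)
    return transcripts
```

With the open-source package, `transcribe` could wrap `model.transcribe(path, initial_prompt=prompt)["text"]`; with the hosted API, it could wrap a call to `client.audio.transcriptions.create(...)` that passes `prompt=prompt`. Keeping the model call behind a plain function also makes it easy to test the chaining logic without any audio.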

Figure 5. Sequence-to-sequence Transformer model for Whisper.

Conclusion

New AI-based innovations in transcription are fundamentally reshaping our understanding of history by unlocking inadequately processed or unprocessed historical collections. These advancements enable the rapid and accurate transcription of vast amounts of historical content, including oral histories, interviews, and handwritten materials, that were previously challenging to access due to time constraints and resource limitations. By automating the transcription process, AI technology not only accelerates research but also opens up new avenues for exploring and interpreting historical narratives. As a result, historians, researchers, and the broader public can now delve deeper into diverse historical perspectives and uncover invaluable insights that were once hidden within these dormant collections.
