Wilson Library (North Carolina) staff are using AI tools to help transcribe historical documents that contain information about people enslaved in North Carolina. The result is better access for researchers and genealogists.
In the vast collections of the Wilson Special Collections Library are documents that tell stories of North Carolina. Among them are handwritten letters, ledgers and more that contain information about people enslaved in the state during the 18th and 19th centuries.
These records offer rare insight into the past for researchers and genealogists, if they can find and read them. Almost all of these items are handwritten, so they can’t be easily searched and are sometimes hard even to decipher. Transcribing the documents makes the content legible and accessible to a wider audience. Machine-readable transcriptions also power search engines and enable visually impaired people to use screen readers.
When done by humans, transcription can be incredibly time-intensive. But help may exist in the space where the archivist meets artificial intelligence.
In a 2024 pilot project, staff from Wilson Library turned to a transcription platform called From the Page (fromthepage.com) to test the effectiveness of AI in transcribing handwritten manuscript documents. Jackie Dean, head of archival processing at the Wilson Special Collections Library, says the team focused its pilot project on 1,500 pages from four high-use collections that researchers—including families working on their genealogies—have been interested in.
“We see so many people looking for information about their relatives in the plantation records held in Wilson Library,” says Dean. “So, for this project, we focused on sources about enslaved people with the thought that it would help genealogists and people researching their family history.”
From the Page began as a way for libraries and archives to crowdsource the painstaking work of transcription by uploading scanned documents for volunteers to work on. As a subscriber, the University Libraries had a chance to test out the platform’s experimental AI features.
From the Page uses an AI model to transcribe handwriting, overlaying original documents with transcribed words. It then runs the output through ChatGPT to see if the transcription makes sense and flows like language. Finally, a human checks the output for mistakes and accuracy. Transcriptions are preserved online, alongside the digitized images and metadata.
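For readers who think in code, the three-stage workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not From the Page's actual implementation: every name here (`PageTranscript`, `run_pipeline`, and the three stand-in stage functions) is hypothetical, and the sample text is invented for the example. A real system would call a handwriting-recognition model, a language model and a human review interface in place of the stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageTranscript:
    """Holds the text produced at each stage for one scanned page."""
    page_id: str
    raw_text: str = ""       # stage 1: handwriting recognition output
    checked_text: str = ""   # stage 2: after the language-model pass
    final_text: str = ""     # stage 3: after human review
    stages_done: List[str] = field(default_factory=list)


def run_pipeline(
    page_id: str,
    image: bytes,
    recognize: Callable[[bytes], str],
    language_check: Callable[[str], str],
    human_review: Callable[[bytes, str], str],
) -> PageTranscript:
    """Run the three stages in order and keep each intermediate result."""
    transcript = PageTranscript(page_id=page_id)

    # Stage 1: an AI model reads the handwriting from the page image.
    transcript.raw_text = recognize(image)
    transcript.stages_done.append("recognize")

    # Stage 2: a language model checks that the text reads like language.
    transcript.checked_text = language_check(transcript.raw_text)
    transcript.stages_done.append("language_check")

    # Stage 3: a person compares the text with the image and corrects it.
    transcript.final_text = human_review(image, transcript.checked_text)
    transcript.stages_done.append("human_review")

    return transcript


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_recognize(image: bytes) -> str:
    return "Recd of J Smith ten dollers"  # invented sample output


def fake_language_check(text: str) -> str:
    return text.replace("dollers", "dollars")


def fake_human_review(image: bytes, text: str) -> str:
    return text.replace("Recd", "Rec'd")


demo = run_pipeline(
    "page-001", b"", fake_recognize, fake_language_check, fake_human_review
)
print(demo.final_text)
```

Keeping the output of each stage, rather than overwriting it, is a design choice made for this sketch: it lets a reviewer see what the machine produced before any person corrected it.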
Because generative AI models are trained on existing language, their results can reflect human biases that have existed throughout society.
“From the Page is very aware of the biases and issues with AI transcription. If they found something where they didn’t want ChatGPT to guess at words, they’d put an emoji placeholder for a human reader to check.”
While the program has successfully decoded many documents, it struggles with certain cases. Pencil is hard for the software to read, as are faded and damaged texts. Slavery-era records contain a lot of tabular data in hand-drawn grids, which are harder for AI to recognize and interpret. Dean says the project has shown her team how AI can save time on some tasks, freeing archivists to focus on those that need more attention and expertise.
“We want to have our collections extremely accessible, and in doing that, there are some needles and some haystacks. If we could have the AI help us find the needles, train it to look for things like bills of sale that are intermingled with correspondence, or to find the names of enslaved people in the correspondence of the white plantation owners, it might help surface some of these stories.”