Ancestry® to Apply Handwriting Recognition Artificial Intelligence to Create a Searchable Index of the 1950 U.S. Census

28 Jan 2022 10:22 AM | Anonymous

The following is a press release from Ancestry.com:

This highly anticipated census collection will be searchable sooner than any previous census collection

Ancestry® today announced that, using new, proprietary artificial intelligence (AI) handwriting recognition technology, it will deliver a searchable index of the 1950 U.S. Census to customers faster than ever before.

The 1950 U.S. Census is set to be released to the public in early April. With handwriting recognition technology, indexing that previously took years will now take only weeks. Ancestry anticipates that the 1950 Census index will be complete and available on Ancestry.com this summer, with each state released as soon as its indexing is finished.

Corporate Genealogist Crista Cowan explains the value of census records in powering meaningful discoveries, saying, “The 1950 U.S. Census contains the names, ages, birthplaces, residences, and relationships of more than 150 million people. This glimpse into American households at a critical time in U.S. history will help people discover even more about the effects the Great Depression, World War II, and the beginning of the Baby Boom had on their families. Many of our customers will see their own names, or their parents' or grandparents' names, in this census for the first time, which will bring even more family stories to life.”

Cutting-Edge Technology to Power Discoveries

Ancestry developed machine learning algorithms to power our proprietary AI handwriting recognition technology: software that reads handwriting in historical documents and transcribes the data, so our community can search historical records quickly and easily. The technology uses an iterative blend of machine and human evaluation based on an Ancestry-developed confidence score framework.
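
Ancestry has not published how its confidence score framework works. As a rough illustration only, here is one common way such a machine-plus-human loop can be organized: each machine-read field carries a confidence score, high-confidence readings are accepted automatically, and the rest are queued for human reviewers. The `REVIEW_THRESHOLD` value, the `FieldReading` type, the `route` function, and the sample readings are all hypothetical and are not Ancestry's actual design.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration; Ancestry has not published its thresholds.
REVIEW_THRESHOLD = 0.90


@dataclass
class FieldReading:
    """One machine-transcribed census field with the model's confidence (0.0 to 1.0)."""
    field: str
    text: str
    confidence: float


def route(readings, threshold=REVIEW_THRESHOLD):
    """Split readings into auto-accepted ones and ones queued for human review."""
    accepted, needs_review = [], []
    for reading in readings:
        if reading.confidence >= threshold:
            accepted.append(reading)
        else:
            needs_review.append(reading)
    return accepted, needs_review


# Made-up sample readings, not real census data.
readings = [
    FieldReading("name", "Mary Smith", 0.97),
    FieldReading("birthplace", "Ohio", 0.62),
    FieldReading("age", "34", 0.91),
]
accepted, needs_review = route(readings)
print([r.field for r in accepted])      # ['name', 'age']
print([r.field for r in needs_review])  # ['birthplace']
```

In an iterative setup, reviewers' corrections to the queued fields could be fed back as new training examples, which is one plausible reading of the "iterative blend" described above.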

Because the 1950 U.S. Census is unique and its images were not available in advance, Ancestry took a novel approach to training the AI: it simulated sample document images representative of the variation expected in aged, inconsistent, or damaged historical documents. Employees recreated full-size census forms in a variety of handwriting styles, then intentionally damaged some of them by ripping, burning, and pouring liquid on them to simulate the wear and tear historical documents undergo over time. Ancestry then re-scanned these forms and added them to our sample set so that our algorithms would be prepared for the expected condition of these 70-year-old documents.

Calling All Family History Buffs

Ancestry is partnering with FamilySearch volunteers to review the data extracted by handwriting recognition and ensure a complete and accurate index. Anyone interested in volunteering should visit familysearch.org/1950Census to learn more.

Keep an eye out for additional details about the 1950 U.S. Census and the AI handwriting recognition technology at RootsTech 2022. To register, go to www.RootsTech.org.

Comments

  • 29 Jan 2022 3:19 AM | Anonymous
    By all accounts, FMP tried to index the 1921 UK Census using OCR. That little exercise lit up the internet with complaints about poor transcription and about problems finding people because of misspelled names, places, etc.

    I hope Ancestry does better, but I foresee it being a major topic of discussion in the middle of the year for much the same reasons. We shall see.
    • 29 Jan 2022 4:49 AM | Anonymous
      I wonder what the source of your information is. Findmypast did not use OCR. They used the same company in India that they use for all of their own transcription work. That company employs people who make it their life's work to specialise in different aspects of transcription. For the 1939 Register and the 1921 census they had 'one hand tied behind their back', because no one person was allowed to view a whole census page to compare characters, only snippets. Findmypast had only from October to December 2021 to knit all of those partial transcriptions together and index them. It is therefore a work in progress, with one million corrections released this week alone.

      What Ancestry are proposing - to OCR the US 1950 census - appears to be a completely different exercise.
    • 29 Jan 2022 8:27 AM | Anonymous
      Regarding AI and the Census, I really don't know whether to jump up and down with joy or sit in a corner and weep. As with other endeavors of the past, in 5 or 10 years we will look back in amazement at the progress. I'm looking forward to seeing my name in a census, muddled or not, and am glad I've lived long enough to have the opportunity.
  • 29 Jan 2022 7:35 AM | Anonymous
    Back in 2013, MOCAVO (since acquired by FindMyPast) tried using this kind of AI technology to transcribe our files. We ended up redoing it all, because there were more mistakes than correct entries. I hope this is better. CHBloss, Archivist Emeritus of Bethany Children's Home
  • 29 Jan 2022 9:49 AM | Anonymous
    A Dec 14, 2021 article by NARA indicates that government employees have already been doing the indexing using AI/ML and OCR.
    https://www.archives.gov/news/articles/1950-census-access
    So the Ancestry article is confusing to me. I am sure RootsTech will sort it out.
  • 29 Jan 2022 10:25 AM | Anonymous
    Here is a lengthy blog post by FamilySearch from Jan 27 that adds significant detail.
    https://www.familysearch.org/en/blog/indexing-1950-census

Eastman's Online Genealogy Newsletter