Artificial intelligence deciphers ancient records

The Library of the Monastery of St. Gall in Switzerland is home to about 160,000 volumes of literary and historical manuscripts dating back to the 8th century.

All were handwritten on parchment, in languages rarely spoken in modern times.

To preserve these historical treasures, millions of such texts are kept in libraries and monasteries around the world, and much of the collection has been shared with the public through digital images. However, experts say an enormous number of documents written in ancient languages have never been read.

The system automatically transcribes book pages.

Now, researchers at the University of Notre Dame are developing an artificial neural network that reads complex ancient handwriting the way a human reader perceives it. Walter Scheirer, Associate Professor in the Department of Computer Science and Engineering at Notre Dame, shared:

'We are dealing with historical documents that date back centuries and are in languages such as Latin that are rarely spoken today. What we set out to do was automate the transcription of the book page in a way that mimics the perception of an expert reader, while also providing fast, searchable text.'

In the newly published study, Scheirer describes how his team combines traditional machine learning methods with visual psychophysics, a method of measuring the connections between physical stimuli and mental phenomena.

Such measurements include, for example, the time it takes a professional reader to recognize a particular character, judge the quality of the handwriting, or identify the use of certain abbreviations.

Scheirer's team studied digitized Latin manuscripts written by scribes at St. Gall in the 9th century. Readers entered their manual transcriptions into a specially designed software interface.

The team then measured reaction times during transcription to learn which words, characters and passages were easy or difficult. That approach, Mr. Scheirer explains, produces a network more consistent with human behavior, reducing errors and yielding a more realistic, accurate reading of the text.
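The article does not show the team's interface, but the idea of logging how long a reader pauses before entering each word can be sketched in a few lines. This is a hypothetical illustration, not the Notre Dame software; the class and method names are invented for the example.

```python
# Hypothetical sketch: logging per-word reaction times during manual
# transcription. Longer pauses serve as a proxy for difficulty.
import time
from collections import defaultdict


class TranscriptionLogger:
    """Records how long a reader pauses before entering each word."""

    def __init__(self):
        self.times = defaultdict(list)  # word -> list of reaction times (s)
        self._last = None

    def start(self):
        """Mark the moment the reader begins looking at the page."""
        self._last = time.monotonic()

    def record(self, word):
        """Store the time elapsed since the previous entry for `word`."""
        now = time.monotonic()
        self.times[word].append(now - self._last)
        self._last = now

    def difficulty(self, word):
        """Mean reaction time for a word, or None if never recorded."""
        ts = self.times[word]
        return sum(ts) / len(ts) if ts else None
```

In a real study the timestamps would come from keystroke events in the transcription interface rather than explicit calls, but the resulting per-word timing data is the kind of behavioral signal the article describes.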

'It's a strategy not often used in machine learning. We label the data through these psychophysical measurements, which come directly from studies in cognitive psychology based on behavioral measurements. We then inform the network of the common difficulties in perceiving these characters, and corrections can be made based on those measurements,' explains Mr. Scheirer.
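One plausible way to "inform the network" of these measurements is to weight each training example's loss by how long readers took on it, so the model spends more effort on characters humans found hard. The functions below are a minimal sketch of that idea under this assumption; they are not the team's published method.

```python
# Hypothetical sketch: scaling a cross-entropy loss by normalized
# reaction times, so harder examples (longer times) count for more.
import math


def psychophysical_weights(reaction_times):
    """Normalize reaction times into per-sample weights with mean 1.0."""
    mean_t = sum(reaction_times) / len(reaction_times)
    return [t / mean_t for t in reaction_times]


def weighted_cross_entropy(probs, targets, weights):
    """Mean cross-entropy over predicted class probabilities,
    with each sample's term scaled by its psychophysical weight."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        total += -w * math.log(p[y])  # p[y]: probability of the true class
    return total / len(targets)
```

With weights derived this way, an example that took readers three times as long as average contributes three times as much to the loss, which is one simple realization of labeling data "through psychophysical measurements."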

However, according to Associate Professor Scheirer, the method still faces challenges. His team is working to improve the accuracy of the transcripts, especially for damaged or incomplete documents, and to account for illustrations or other elements of a page that can confuse the system.

The good news is that the team has successfully adapted the program to transcribe Ethiopian texts, extending it to a language with a completely different character set. This is seen as a first step toward a program that can both transcribe and translate information for users.