BYU and Oxford Brookes University
Collaborate on Transcription of Syriac Texts
Following the success of the BYU Dead Sea Scrolls Electronic Library (2nd ed., Brill, 2006), the Maxwell Institute's Center for the Preservation of Ancient Religious Texts (CPART) has initiated a project to produce an electronic library of ancient Syriac literature. Syriac is a dialect of Aramaic, the language of Jesus and his disciples. Syriac was the language spoken by ancient Christians throughout the Middle East, from Syria to India, and a large and important body of early Christian literature is preserved in it. Electronic libraries have been produced for Greek, Latin and other ancient literatures, but this will be the first project to do the same for Syriac.
Because the corpus of Syriac literature is many times larger than that of the Dead Sea Scrolls, new approaches and technologies to automate the transcription and grammatical tagging of the texts are being developed. In the past, transcription and tagging of texts have been done manually, requiring enormous investments of capital and time. The advent of optical character recognition (OCR) software has largely automated the transcription process for printed texts in many languages, but unfortunately, the technical challenges of doing OCR on connected-script languages such as Arabic and Syriac are considerable. Until now, no OCR software for Syriac has been developed.
That challenge has been met by Professor William Clocksin of Oxford Brookes University in Oxford, England. He has been working for a number of years on the unique problems that connected-script languages pose for OCR, and CPART has been following his research since an initial meeting with him in 2003. After meeting with Clocksin again this past June, CPART officials and Clocksin jointly decided that his Syriac and Arabic OCR software, called Qoruyo (Syriac for "reader"), was sufficiently developed to permit limited deployment and use. Clocksin generously agreed to train CPART staff in its use and to permit BYU to use the software in the production of its Syriac electronic library.
In October, Carl Griffin and Kristian Heal of CPART traveled to Oxford for training on Qoruyo. The technical challenges of Syriac OCR are such that Qoruyo requires manual character mapping and modeling for each distinct typeface. While this requires an initial time investment, a fully trained typeface model can achieve even higher accuracy for Syriac than typical commercial OCR software achieves for English. Although manuscripts and some irregular or complex printed texts will still need to be transcribed by hand, Qoruyo will greatly facilitate and economize the production of CPART's Syriac electronic library. As CPART begins electronic text production with this software, its staff will continue to collaborate with Professor Clocksin on the development of Qoruyo.