Digitization & OCR

OCR accuracy varies; some engines struggle when music is present.

Sounding Spirit’s work revolves around digitizing, processing, and displaying songbooks. Our approach to digitization is driven by our emphasis on songbooks’ social function as carriers of race, place, religion, and culture.

The books included in our scholarly editions and digital library collections matter materially. Elements of the physical items themselves, including their dimensions, musical form, music notation, and evidence of use, are important to practitioners and researchers alike. Our digitization process results in high-quality images and textual information that record both content and layout. Readux, our platform for display and engagement, draws on this information, allowing Sounding Spirit publications to engage materials for textual evidence of cultural context and encounter.

Specs and Speculation

Sounding Spirit has defined high standards for digitization that rely on national and international guidelines and cover areas ranging from optical resolution to color fidelity. These standards allow us to best represent each songbook’s physical characteristics on the web and in new print editions. Our collaborative approach to collection building allows us to workshop our standards with institutions representing a wide range of digital affordances. These collaborations allow us to balance best practices with accommodations that make both our processes and products accessible to a variety of partners.

Overlays and Understories

Optical character recognition (OCR), the electronic recognition and encoding of the text in digital images, is a rapidly evolving field central to Sounding Spirit’s approach to digital collections and editions alike. Sounding Spirit’s Readux platform enables the selection and annotation of any element of a book’s page, including music and illustrative elements. Readux also allows users and editors to select and annotate text generated through OCR. Though most archives use OCR to generate plain text, many leading OCR engines can also generate positional information for each word. Readux uniquely harnesses this positional information, transparently overlaying the encoded text on the digital page image, making annotation of semantic text possible.
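To make the idea of positional text concrete, the sketch below reads word-level bounding boxes from a small hOCR fragment (one common format in which OCR engines report word positions) and converts them to percentages of the page size, the kind of values a transparent text overlay could use. This is an illustration only, not Readux's implementation: the sample markup, the `WordBoxParser` class, and the `to_overlay` helper are hypothetical, and real hOCR files carry more properties than shown here.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment: each word carries a pixel bounding box
# (left, top, right, bottom) and an engine confidence value.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 1000 1500'>
  <span class='ocrx_word' title='bbox 100 150 300 210; x_wconf 96'>Amazing</span>
  <span class='ocrx_word' title='bbox 320 150 480 210; x_wconf 91'>grace</span>
</div>
"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class WordBoxParser(HTMLParser):
    """Collect (text, bbox) pairs for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.page_size = None      # (width, height) in pixels
        self.words = []            # list of (text, (x0, y0, x1, y1))
        self._pending_bbox = None  # bbox of the word span being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        bbox = parse_bbox(attrs.get("title") or "")
        if attrs.get("class") == "ocr_page" and bbox:
            self.page_size = (bbox[2], bbox[3])
        elif attrs.get("class") == "ocrx_word" and bbox:
            self._pending_bbox = bbox

    def handle_data(self, data):
        text = data.strip()
        if self._pending_bbox and text:
            self.words.append((text, self._pending_bbox))
            self._pending_bbox = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._pending_bbox = None  # ignore empty word spans


def to_overlay(words, page_size):
    """Express each word box as percentages of the page dimensions."""
    width, height = page_size
    return [
        {
            "text": text,
            "left": 100 * x0 / width,
            "top": 100 * y0 / height,
            "width": 100 * (x1 - x0) / width,
            "height": 100 * (y1 - y0) / height,
        }
        for text, (x0, y0, x1, y1) in words
    ]


parser = WordBoxParser()
parser.feed(SAMPLE_HOCR)
for box in to_overlay(parser.words, parser.page_size):
    print(box)
```

Expressing positions as percentages rather than pixels is one possible design choice: it keeps the text aligned with the page image at any zoom level or display size.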

Sounding Spirit encourages institutions to adopt positional text OCR workflows to facilitate scholarship and public interaction that builds on pages’ structure and layout in addition to the words themselves. Sounding Spirit’s focus on songbooks whose juxtapositions of words and music can signal cultural affiliations requires OCR technology that can differentiate between textual and musical content. Our team is exploring the potential of software designed to identify text in environmental settings to differentiate text from music while achieving a high level of accuracy. The Sounding Spirit team is actively researching both OCR and optical music recognition (OMR) practices that best represent these texts and allow our software developers to build additional affordances into the Readux platform.

Federating the Future

The Readux platform uses the International Image Interoperability Framework (IIIF) to enable the aggregation and display of federated collections. While IIIF is not yet universally adopted, a growing list of institutions has implemented the framework to enhance access to their digital collections. Others share their digitized books with the Internet Archive, a publicly accessible digital archive with over 20 million books that is experimenting with IIIF implementation. Sounding Spirit encourages institutions to adopt digitization and hosting practices that harness IIIF to allow for public access to digitized books’ page images and the rich metadata associated with these collections, including robust positional text derived from OCR when available.
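As an illustration of how IIIF supports this kind of access, the sketch below builds request URLs following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server address, the page identifier, and the helper names are hypothetical, and the `max` size keyword assumes version 3 of the Image API (version 2 servers use `full`). It is not a description of how Readux or any partner institution issues requests.

```python
from urllib.parse import quote


def region_from_bbox(x0, y0, x1, y1):
    """Turn a pixel box (left, top, right, bottom) into an IIIF region
    string of the form x,y,w,h."""
    return f"{x0},{y0},{x1 - x0},{y1 - y0}"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API URL from its path segments.

    The identifier is percent-encoded so that characters such as "/"
    are not mistaken for path separators.
    """
    safe_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{safe_id}/{region}/{size}/"
            f"{rotation}/{quality}.{fmt}")


# Hypothetical server and page identifier, for illustration only.
BASE = "https://iiif.example.org/iiif/3/"
PAGE = "songbook/p12"

# The whole page at the largest size the server offers.
print(iiif_image_url(BASE, PAGE))

# Only the area covered by one word's OCR bounding box.
print(iiif_image_url(BASE, PAGE, region=region_from_bbox(100, 150, 300, 210)))
```

Because the region is part of the URL, any IIIF-compliant image server can return the area of a page that a piece of positional text covers, without the host preparing separate crops in advance.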