
Custom models with Transkribus (Session 2 of 2 in workshop series: Automatic multi-language text transcription for hand-written or typeset documents)


Transkribus is a suite of tools that facilitates the automatic transcription of documents. The software is maintained by a European cooperative (READ-COOP), with access provided through an RU subscription. Although originally designed as an environment for manual transcription, the software now supports automatic transcription of both hand-written and typeset text in a large number of languages, and can provide competent transcriptions of these languages at different stages of their evolution (e.g. Old English). In addition to processing non-modern languages, Transkribus can handle text on discolored or otherwise degraded pages, which makes it suitable for transcription projects featuring historical documents. A further key benefit is that users can easily train custom models that automatically identify complex layouts and fine-tune the text recognition process. This is helpful when transcribing difficult documents (e.g. historical diaries).

This workshop series consists of two sessions. The first session covers introductory material and introduces participants to common Transkribus workflows and document management. The second provides an in-depth overview of the model training system, using a diverse set of documents provided for the session. Participants are free to attend one or both sessions depending on their background and interest. The series is perhaps most useful to humanities researchers, but is likely to interest anyone working with text transcription.

Session 1: Introduction to Transkribus

In this session, participants will learn about the capabilities and functionality of Transkribus. This includes how to upload and manage documents, how to process documents to produce transcribed output, how to correct the output, and how to use the transcriptions for downstream analyses. Participants are encouraged to bring their own documents to experiment with Transkribus during the session. Some time will also be dedicated to describing the different deep-learning models available for various aspects of text and layout recognition.

Session 2: Training custom layout and text-recognition models with Transkribus

Automatic transcription in Transkribus often works extremely well out-of-the-box, producing fewer errors in many cases than expert human transcribers. There are limitations, however, which become clear when dealing with, for example, complex layouts, unconventional typesetting, poor hand-writing, and spelling errors. In this session, participants will learn how to train their own models to address these challenges. Model training can be somewhat complicated, but the workflow in Transkribus simplifies the process enough that non-programmers can make use of powerful tools otherwise unavailable to them. Participants will learn how to perform a first-pass transcription to produce output which can then be corrected and used as training data for a custom text recognition model. They will also learn how to assess model performance and understand which kinds of models should be used for specific stages of the text transcription workflow. Some background with Transkribus is necessary to follow the material in this session. Participants who are new to Transkribus or who would like a refresher are encouraged to first attend the introductory session.

Related LibGuide: Transkribus by Daniel Sharoh

Date: Tuesday, October 14, 2025
Time: 2:30pm - 4:00pm
Location: 1.05B Central Library Instruction Room
Campus: Central Library
Faculty: All faculties
Categories: DCC, Text Mining

Registration is required. There are 20 seats available.

Teacher(s)

Daniel Sharoh
