Processing Ancient Text Corpora

17–21 February 2020

Venue: Lorentz Center@Snellius


This workshop aims to promote scholarly exchange and to build a community of scholars with an interest in digital humanities and ancient texts. Research into ancient texts is developing rapidly, applying ever more methods from statistics and machine learning. Because the text disciplines are organized by language rather than by method, we think methodological exchange between them can be strengthened. The sharing of IT techniques is a natural playground for this, but only a starting point. Theoretically, we need to discuss where these methods lead us. Are big data methods also applicable to small data? What is particular about the fact that the texts of interest are historical? Practically, we want to discuss how we can best employ IT methods. Can we survey the IT landscape and make an informed selection of the regions that are most useful to us?


The programme will move from a primer on the various corpora to an in-depth discussion of research questions and the digital tools for addressing them. It will move from corpus specimens to data structures and text models, and from there to analytical techniques. The in-depth discussion will deal with narratives that run from research questions through data models and analytical techniques to results, or the absence of results. Sharing these narratives among people working in different linguistic, literary and historical disciplines will lead to reflection: where does it leave us? Do we see tools or toolkits that we can all use, mutatis mutandis? What are the limits of data-driven research; put otherwise, which part of the problem does it solve and which part not? How are digital techniques evolving, and how can we keep up with developments? How can an individual master the necessary skills, and how can a team acquire them?


    Monday 17 February


    09:00 – 09:30     Arrival, registration

    09:30 – 09:45    Welcome by the Lorentz Center

    09:45 – 10:15     Ice Breaker Game

    10:15 – 10:30     Introduction to the programme – Wido van Peursen

    10:30 – 11:20     Teaser / introductory paper: Graph models for text – Elli Bleeker and Ronald Dekker


    Part One: Corpora

    A. Corpora: Primer

    11:20 – 11:40    Sanskrit anonymous literature – Peter Bisschop

    11:40 – 12:00    Cuneiform (Uruk, Babylonian) – Cale Johnson   

    12:00 – 12:20    Old Testament / Hebrew Bible – Eep Talstra


    12:30 – 14:30    Lunch @Snellius restaurant


    14:15 – 14:30     Arrival of additional guests attending only the Monday afternoon

    14:30 – 14:50     Syriac Texts – Wido van Peursen

    14:50 – 15:10     New Testament and Patristic literature – Ernst Boogert

    15:10 – 15:30    Incantation Bowls – Margaretha Folmer

    15:30 – 16:00    Coffee break

    16:00 – 17:00     Informal discussions: corpora, research, various problems and challenges

    17:00 –                Wine and cheese party


    Tuesday 18 February

    B. Corpora: in-depth discussion


    09:30 – 11:00     In-depth discussions: breakout sessions on shared problems and challenges

    11:00 – 11:30     Break

    11:30 – 12:00     Feedback from breakout sessions


    12:00 – 13:45    Lunch @Snellius restaurant


    Part Two: Data structure and text model

    13:45 – 14:45     The ETCBC data model – Constantijn Sikkel

    14:45 – 15:30     Bible Online Learner – Nicolai Winther-Nielsen

    15:30 – 16:00    Coffee break

    16:00 – 16:45     Corpora, annotation, and Text-Fabric – Cody Kingham, Dirk Roorda

    16:45 – 17:45     Informal discussions: modeling, computing, various problems and challenges


    Wednesday 19 February

    09:00 – 10:30     In-depth discussions: breakout sessions on shared problems and challenges

    10:30 – 11:00     Coffee break

    11:00 – 11:30     Feedback from breakout sessions

    11:30 – 13:00    Hands-on session in small groups: ETCBC workflow / Bible OL / Text-Fabric


    13:00 – 14:30    Lunch @Snellius restaurant


    14:30 – 15:30    Feedback from breakout sessions and plenary discussion (chaired by Cody Kingham) on the agreements and differences between the various approaches, their advantages and disadvantages, and the question of how we should anticipate future developments.

    15:30 – 17:30     Open space: discussions, hands-on, study

    17:30 – 21:30     Boat trip (including dinner)


    Thursday 20 February

    Part Three: Corpus analysis


    09:00 – 10:00     Improvised Pitches: New ideas that have surfaced during the first three days – first come, first served

    10:00 – 10:20     Image processing, facsimiles – Cornelis van Lit

    10:20 – 10:40     Where do you mean? Improving BHSA with Vector Semantics – Cody Kingham

    10:40 – 11:00    Text-Fabric and coreference/participant analysis – Christiaan Erwich

    11:00 – 11:30     Break

    11:30 – 11:50     Machine Learning – Mathias Coeckelbergs

    11:50 – 12:10     Onomastics – Elizabeth Robar

    12:10 – 12:30     Computational Stylometry – Pierre Van Hecke

    12:30 – 12:50     Data and ontology modeling and the ReIReS project – Roxanne Wyns


    13:00 – 14:30     Lunch @Snellius restaurant


    14:30 – 15:30     In-depth discussions: breakout sessions on shared problems and challenges

    15:30 – 16:00     Break

    16:00 – 16:30     Feedback from breakout sessions

    16:30 –           Open space: discussions, hands-on, study


    Friday 21 February

    Part Four: Making knowledge accessible


    09:00 – 09:20     Making digital manuscript scholarship accessible – Cornelis van Lit

    09:20 – 09:40     SHEBANQ and TF in education – Oliver Glanz

    09:40 – 10:00     Teaching theological students to use Text-Fabric – Christian H. Jensen

    10:00 – 10:20     TFbuilder: a generic Python library to build TF datasets from TEI XML and CSV – Ernst Boogert

    10:20 – 10:50     Break

    10:50 – 11:30     Building a digital text archive for a large community – Gregory Crane, James Tauber

    11:30 – 12:00     General discussion


    12:00 – 13:30     Lunch @Snellius restaurant




    13:30 – 14:30    Keynote: Perspective from another discipline – Sandjai Bhulai

    We all know that ancient text corpora harbor an enormous amount of complexity. But there are systems more complex still. Does the study of social media, business planning and economics offer us paradigms and tools to tackle the history of ancient texts?

    Sandjai Bhulai is full professor of Business Analytics at Vrije Universiteit Amsterdam. He studied "Mathematics" and "Business Mathematics and Informatics", and obtained a PhD on Markov decision processes for the control of complex, high-dimensional systems.

    Sandjai's research is on the interface of mathematics, computer science, and operations management. His specialization is in decision making under uncertainty, optimization, data science, and business analytics.


    14:30 – 15:00     General discussion



    Cody Kingham, Cambridge University  

    Wido van Peursen, ETCBC, Faculty of Religion and Theology,  

    Dirk Roorda, Data Archiving and Networked Services  

    Nicolai Winther-Nielsen, FIUC-DK & Vrije Universiteit  


Niels Bohrweg 1 & 2

2333 CA Leiden

The Netherlands

+31 71 527 5400