Processing Ancient Text Corpora

17 – 21 February 2020

Venue: Snellius

This workshop aims to promote scholarly exchange and to build a community of scholars with an interest in digital humanities and ancient texts. Research into ancient texts is developing rapidly, with ever more methods from statistics and machine learning being applied. Because the text disciplines are organized by language rather than by method, we think the methodological exchange can be strengthened. The sharing of IT techniques is a natural playground for this, but only a starting point. On the theoretical side, we need to discuss where these methods take us. Are big data methods also applicable to small data? What is particular about the fact that the texts of interest are historical? On the practical side, we want to discuss how we can make the best use of IT methods. Can we assess the landscape of IT and make an informed selection of the regions that are most useful to us?

The programme will move from a deductive primer of various corpora to an in-depth discussion on research questions and the digital tools for addressing these questions. It will move from corpora specimens to data structures and text models, and from there to analytical techniques. The in-depth discussion will deal with narratives that run from research questions through data models and analytical techniques to results or the absence of results. Sharing these narratives among people working in different linguistic, literary and historical disciplines will lead to reflection: where does it leave us? Do we see tools or toolkits that we can all use, mutatis mutandis? What are the limits of data-driven research, or put otherwise, which part of the problem does it solve and which part does it not? How are digital techniques evolving, and how can we keep up with developments? How can an individual master the necessary skills, and how can a team acquire them?

Programme

Monday 17 February

Introduction

09:00 – 09:30 Arrival, registration

09:30 – 09:45 Welcome by the Lorentz Centre

09:45 – 10:15 Ice Breaker Game

10:15 – 10:30 Introduction to the programme – Wido van Peursen

10:30 – 11:20 Teaser / introductory paper: Graph models for text – Elli Bleeker and Ronald Dekker

Part One: Corpora

A. Corpora: Primer

11:20 – 11:40 Sanskrit anonymous literature – Peter Bisschop

11:40 – 12:00 Cuneiform (Uruk, Babylonian) – Cale Johnson

12:00 – 12:20 Old Testament / Hebrew Bible – Eep Talstra

12:30 – 14:30 Lunch @Snellius restaurant

14:15 – 14:30 Arrival of additional guests just for the Monday afternoon

14:30 – 14:50 Syriac Texts – Wido van Peursen

14:50 – 15:10 New Testament and Patristic literature – Ernst Boogert

15:10 – 15:30 Incantation Bowls – Margaretha Folmer

15:30 – 16:00 Coffee break

16:00 – 17:00 Informal discussions: corpora, research, various problems and challenges

17:00 – Wine and cheese party

Tuesday 18 February

B. Corpora: in-depth discussion

09:30 – 11:00 In-depth discussions: breakout-sessions on shared problems and challenges

11:00 – 11:30 Break

11:30 – 12:00 Feedback from break-out sessions

12:00 – 13:45 Lunch @Snellius restaurant

Part Two: Data structure and text model

13:45 – 14:45 The ETCBC data model – Constantijn Sikkel

14:45 – 15:30 Bible Online Learner – Nicolai Winther Nielsen

15:30 – 16:00 Coffee break

16:00 – 16:45 Corpora, annotation, and Text-Fabric – Cody Kingham, Dirk Roorda

16:45 – 17:45 Informal discussions: modeling, computing, various problems and challenges

Wednesday 19 February

09:00 – 10:30 In-depth discussions: breakout-sessions on shared problems and challenges

10:30 – 11:00 Coffee break

11:00 – 11:30 Feedback from break-out sessions

11:30 – 13:00 Hands-on session in small groups: ETCBC workflow / Bible OL / Text-Fabric

13:00 – 14:30 Lunch @Snellius restaurant

14:30 – 15:30 Feedback from break-out sessions and plenary discussion (chaired by Cody Kingham) on the agreements and differences between the various approaches, their advantages and disadvantages, and the question of how we should anticipate future developments.

15:30 – 17:30 Open space: discussions, hands-on, study

17:30 – 21:30 Boat trip (including dinner)

Thursday 20 February

Part Three: Corpus analysis

09:00 – 10:00 Improvised Pitches: New ideas that surface after the first three days (first come, first served)

10:00 – 10:20 Image processing, facsimiles – Cornelis van Lit

10:20 – 10:40 Where do you mean? Improving BHSA with Vector Semantics – Cody Kingham

10:40 – 11:00 Text-Fabric and coreference/participant analysis – Christiaan Erwich

11:00 – 11:30 Break

11:30 – 11:50 Machine Learning – Mathias Coeckelbergs

11:50 – 12:10 Onomastics – Elizabeth Robar

12:10 – 12:30 Computational Stylometry – Pierre Van Hecke

12:30 – 12:50 Data and ontology modeling and the ReIReS project – Roxanne Wyns

13:00 – 14:30 Lunch @Snellius restaurant

14:30 – 15:30 In-depth discussions: breakout-sessions on shared problems and challenges

15:30 – 16:00 Break

16:00 – 16:30 Feedback from break-out sessions

16:30 – Open space: discussions, hands-on, study

Friday 21 February

Part Four: Making knowledge accessible

09:00 – 09:20 Making digital manuscript scholarship accessible – Cornelis van Lit

09:20 – 09:40 SHEBANQ and TF in education – Oliver Glanz

09:40 – 10:00 Teaching Theological students to use Text-Fabric – Christian H. Jensen

10:00 – 10:20 TFbuilder: a generic Python library to build TF datasets from TEI XML and CSV – Ernst Boogert

10:20 – 10:50 Break

10:50 – 11:30 Building a digital text archive for a large community – Gregory Crane, James Tauber

11:30 – 12:00 General discussion

12:00 – 13:30 Lunch @Snellius restaurant

Conclusion

13:30 – 14:30 Keynote: Perspective from another discipline – Sandjai Bhulai

We all know that ancient text corpora harbor an enormous amount of complexity. But there are more complex systems than that. Does the study of social media, business planning and economy offer us paradigms and tools to tackle the history of ancient texts?

Sandjai Bhulai is full professor of Business Analytics at Vrije Universiteit Amsterdam. He studied "Mathematics" and "Business Mathematics and Informatics", and obtained a PhD on Markov decision processes for the control of complex, high-dimensional systems.

Sandjai's research is on the interface of mathematics, computer science, and operations management. His specialization is in decision making under uncertainty, optimization, data science, and business analytics.

14:30 – 15:00 General discussion