
All Cambridge Digital Humanities courses


Methods Workshop: TEI workshop new Mon 18 Jan 2021   10:00 Finished

The TEI (Text Encoding Initiative) is a standard for the transcription and description of text-bearing objects, and is very widely used in the digital humanities – from digital editions and manuscript catalogues to text mining and linguistic analysis. This course will take you through the basics of the TEI – what it is and what it can be used for – with a particular focus on uses in research, paths to publication (both web and print) and the use of TEI documents as a dataset for analysis. There will be a chance to create some TEI yourself as well as looking at existing projects and examples. The course will take place over two sessions a week apart – with an introductory taught session, then a chance to work on TEI records yourself, followed by a review and discussion session.

Network Analysis for Humanities Scholars new Mon 27 Jan 2020   12:30 Finished

This workshop is a very basic introduction to network analysis for humanities scholars. It will introduce the concepts of networks, nodes, edges, directed and weighted networks, and bi- and multi-partite networks. It will give an overview of the kinds of things that can be thought about through a network framework, as well as some things that can’t. It will also introduce key theories, including weak ties and small worlds. There will be an activity where participants will build their own test data set that they can then visualise. In the second half of the workshop we will cover some network metrics, including various centrality measures, the clustering coefficient, and community detection algorithms. It will include an activity introducing one basic web-based tool that allows you to run some of these algorithms, and will provide suggestions for routes forward with other tools and coding libraries that allow quantitative analysis.
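As a rough illustration of two of the metrics named above, the sketch below computes degree centrality and the local clustering coefficient for a tiny undirected network. It is not taken from the workshop materials: the names and ties are invented, and the function names are hypothetical. It uses only the Python standard library.

```python
from itertools import combinations

# Hypothetical toy data: undirected, unweighted ties between made-up people.
edges = [("Ann", "Ben"), ("Ann", "Cat"), ("Ben", "Cat"), ("Cat", "Dan")]


def build_adjacency(edge_list):
    """Map each node to the set of its neighbours (undirected network)."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def local_clustering(adjacency, node):
    """Share of a node's neighbour pairs that are themselves connected."""
    nbrs = adjacency[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(nbrs, 2))
    linked = sum(1 for a, b in pairs if b in adjacency[a])
    return linked / len(pairs)


adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
print(centrality)
print(local_clustering(adjacency, "Cat"))
```

In this toy network "Cat" is tied to everyone, so she has the highest degree centrality, yet only one of her three pairs of neighbours is connected, so her clustering coefficient is low.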

Attendees should bring their own laptops.

Ruth Ahnert is Professor of Literary History & Digital Humanities at Queen Mary University of London, and is currently leading two large AHRC-funded projects: Living with Machines, and Networking Archives. She is author of The Rise of Prison Literature in the Sixteenth Century (2013), and co-author of Tudor Networks of Power, and The Network Turn (both forthcoming).

We are running a focus group to try out Gale Digital Scholar Lab, an online platform of Digital Humanities tools for organising and analysing the historical texts in their archive. Gale representatives will demo the capabilities of the Lab and give you a practical opportunity to build your own corpus and do some analysis and visualisation (without writing a line of code). You will have a chance to feed back your opinions and research needs, and to discuss broader issues of how these sorts of tools might fit in with your Digital Humanities research, and the role of private sector providers in the provision of tools and resources to researchers.

Gale Digital Scholar Lab will be available to participants in advance of the focus group. A link will be sent to participants by email. Refreshments and light lunch will be provided. Please bring your own laptop.

Optical Character Recognition is a term used to describe techniques for converting images containing printed or handwritten text into a format that can be searched and analysed computationally. This workshop will introduce several such tools along with some practical techniques for using them, and will also highlight OCR and related services offered by the Digital Content Unit at the Cambridge University Library.

Podcasting: An Introduction new Fri 12 Oct 2018   11:00 Finished

An introduction to audio recording and editing aimed at students and staff interested in learning how podcasting can help disseminate research.

This CDH Basics session will see discussion on how to assess the impact of relevant legal frameworks, including data protection, intellectual property and media law, on your digital research project and consider what approach researchers should take to the terms of service of third-party digital platforms. We will explore the challenge of informed consent in a highly-networked world and look at a range of strategies for dealing with this problem.

Qualitative Research in Online Environments new Tue 21 Jan 2020   11:30 Finished

What happens to the practice of qualitative research when interactions between researcher and research subject are largely mediated? This session will explore a wide range of topics including the challenge of consent, researcher presence and ‘lurking’ in mediated settings, how to engage with digital gatekeepers, information security for researchers, and understanding the impact of digital platform architecture on qualitative research design.

Re:search new Tue 10 Nov 2020   10:00 Finished

This CDHBasics session looks at how searching and finding technologies structure scholarship. It also covers:

  • an introduction to search engines, both for web search and custom search functions within collections;
  • discussion of OCR errors and blind spots in digital search in historical collections;
  • problems of fragmentation of the source text, and the legacy of pre-digital formats such as microfilm.

Social Network Analysis (SNA) new Thu 18 Feb 2021   11:00 Finished

Application forms should be returned to CDH Learning (

Social Network Analysis (SNA) is an exciting and rapidly growing methodology. You will find researchers in almost every faculty at the University of Cambridge applying SNA methods within their research. However, SNA researchers can only go so far before they must learn a coding language. Many SNA tools (descriptive metrics, visualisation techniques, and mathematical models) require researchers to use R. This session is for researchers who are interested in SNA methods but lack experience in the R environment.

While network visualisation is just one component of SNA, data visualisation can be a great gateway into a new programming language. This session will introduce you to the R environment by leading you through the creation of static network diagrams. The session is directed at beginners and basic R users who want to explore SNA tools in R.

Social Network Analysis with Digital Data new Tue 4 Feb 2020   11:00 Finished

This course will provide a hands-on introduction to the field of Social Network Analysis, giving participants the opportunity to “learn by doing” the process of network data collection and analysis. After being introduced to the basic concepts, the participants will have the opportunity to explore all stages of a social network analysis project, including research design, essential measures, data collection and data analysis. The focus will be on the retrieval of electronic archival data (e.g. websites, digital archives and social media platforms) for non-programmers and on the production of network analysis with specialised software (e.g. Gephi). At the end, the participants will be equipped with the basic tools to perform meaningful visualisations and analyses of network data.

Sorting things out - why metadata matters new Tue 27 Oct 2020   10:00 Finished

This CDHBasics session focuses on the importance of metadata (‘data about data’), examining the crucial role played by classification systems and standards in shaping how scholars interact with historical and cultural records.

Sources to Data new Wed 3 Jun 2020   11:00 CANCELLED

We are currently reformatting our Learning programme for remote teaching; this will require some rescheduling, so bookings will reopen and new sessions will be created for online courses as soon as possible. In the interim we would encourage you to register your interest so as to be notified of the new schedule. Please be aware that we hope to run many of our courses online, but this depends on staff availability and resources, so we may have to postpone or cancel some sessions.

Archives typically hold records containing enormous quantities of data presented in a variety of scribal and print formats. Extracting this information has traditionally involved long hours of expensive manual data-entry work. Nowadays this work can be automated to a large degree and could soon open archives and allow for unprecedentedly large structured data sets for curators, researchers, and the public alike. This workshop will examine new methods for collecting historical data from manuscript and printed documents. We will look at archival photography, OCR, page structure recognition, and new handwritten text recognition systems. Cutting-edge Cambridge research in this field will be demonstrated.

Sources to Data (Workshop) Wed 5 Jun 2019   11:00 Finished

This workshop will examine database creation from historical documents. Extracting data from these can be hard work and involves quite unusual skill combinations. You may need to digitise and transcribe from primary sources, and then design and build a database from scratch with the information. Other sources you use could already be digitised but may be arranged or filed in an unsuitable way for your project and therefore need conversion. We will look at techniques for harvesting historical data from crumbling manuscripts, printed documents, books, or text-searchable images. Techniques include manual data-entry, scanning and OCR, and handwritten text recognition systems.

Letters have been for centuries the main form of communication between scientists. Correspondence collections are a unique window into the social networks of prominent historical figures. What can digital social sciences and humanities reveal about the correspondence networks of 19th century scientists? This two-session intensive workshop will give participants the opportunity to explore possible answers to this question.

With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which we propose to study through social network analysis (SNA). The workshop will be divided in two sessions during which participants will “learn by doing” how to apply SNA to personal correspondence datasets. Following a guided project framework, participants will work on the correspondence collections of John Herschel and Charles Darwin. After a contextual introduction to the datasets, the sessions will focus on the basic concepts of SNA, data transformation and preparation, data visualisation and data analysis, with particular emphasis on “ego network” measures.

The two demonstration datasets used during the workshop will be provided by the Epsilon project, a research consortium between Cambridge Digital Library, The Royal Institution and The Royal Society of London aimed at building a collaborative digital framework for 19th-century letters of science. The first dataset, the “Calendar of the Correspondence of Sir John Herschel Database at the Adler Planetarium”, is a collection of the personal correspondence of John Frederick William Herschel (1792-1871), a polymath celebrated for his contributions to the field of astronomy. Its curation began in the 1950s at the Royal Society, and it currently comprises 14,815 digitised letters encoded in Extensible Markup Language (.xml) format. The second dataset comes from the “Darwin Correspondence Project”, which has been locating, researching, editing and publishing Charles Darwin’s letters since 1974. In addition to a 30-volume print edition, the project has also made letters available in .xml format.

The workshop will provide a step-by-step guide to analysing correspondence networks from these collections, which will cover:

  • Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines
  • Preparation and transformation of .xml files for analysis with an open source data wrangler
  • Rendering of network visualisations using an open source SNA tool
  • Analysis of the Ego Networks of John Herschel and Charles Darwin (requires UCINET)
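The steps above can be sketched in a few lines of Python, offered here only as an illustration and not as the workshop's own method (which uses a data wrangler, an SNA tool and UCINET). The sketch assumes letters are described with the TEI `correspDesc` and `correspAction` elements; the `letters` wrapper element, the single-letter correspondent names and the function names are all hypothetical, and the fragment is deliberately simplified and not a schema-valid TEI document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def sample_letter(sender, recipient):
    """Build one simplified, made-up TEI-style letter description."""
    return (
        "<correspDesc>"
        f"<correspAction type='sent'><persName>{sender}</persName></correspAction>"
        f"<correspAction type='received'><persName>{recipient}</persName></correspAction>"
        "</correspDesc>"
    )


# Invented (sender, recipient) pairs; "letters" is a hypothetical wrapper element.
PAIRS = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"),
         ("B", "C"), ("D", "C"), ("A", "E")]
SAMPLE = (
    "<letters xmlns='http://www.tei-c.org/ns/1.0'>"
    + "".join(sample_letter(s, r) for s, r in PAIRS)
    + "</letters>"
)


def letters_to_edges(xml_text):
    """Count letters per (sender, recipient) pair: a weighted, directed edge list."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for desc in root.iterfind(".//tei:correspDesc", TEI_NS):
        sender = desc.findtext(
            "tei:correspAction[@type='sent']/tei:persName", namespaces=TEI_NS)
        recipient = desc.findtext(
            "tei:correspAction[@type='received']/tei:persName", namespaces=TEI_NS)
        if sender and recipient:
            edges[(sender.strip(), recipient.strip())] += 1
    return edges


def ego_network(edges, ego):
    """Return the ego's alters and the density of ties among those alters."""
    undirected = {frozenset(pair) for pair in edges if pair[0] != pair[1]}
    alters = {n for tie in undirected if ego in tie for n in tie if n != ego}
    alter_ties = {tie for tie in undirected if tie <= alters}
    possible = len(alters) * (len(alters) - 1) / 2
    density = len(alter_ties) / possible if possible else 0.0
    return alters, density


edges = letters_to_edges(SAMPLE)
alters, density = ego_network(edges, "A")
print(edges)
print(sorted(alters), density)
```

Here the ego network ignores letter direction and counts only whether two correspondents are linked at all; other definitions of ego network density are possible.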

About the speakers and course facilitators:

Anne Alexander is Director of Learning at Cambridge Digital Humanities

Hugo Leal is Methods Fellow at Cambridge Digital Humanities and Co-ordinator of the Cambridge Data School

Louisiane Ferlier is Digital Resources Manager at the Centre for the History of Science at the Royal Society. In her current role she facilitates research collaborations with the Royal Society collections, curates digital and physical exhibitions, as well as augmenting its portfolio of digital assets. A historian of ideas by training, her research investigates the material and intellectual circulation of ideas in the 17th and 18th centuries.

Elizabeth Smith is the Associate Editor for Digital Development at the Darwin Correspondence Project, where she contributed to the conversion of the Project’s work into TEI several years ago, and has since been collaborating with the technical director in enhancing the Darwin Project’s data. She is one of the co-ordinators of Epsilon, a TEI-based portal for nineteenth-century science letters.

No prior knowledge of programming is required; instructions on software to install will be sent out before the workshop. Some exercises and preparation for the second session will be set during the first, and participants should allow 2-3 hours for this. Please note that priority will be given to staff and students at the University of Cambridge for booking onto this workshop.

CDH Learning gratefully acknowledges the support of the Isaac Newton Trust and the Faculty of History for this workshop.

The Library as Data new Mon 15 Oct 2018   13:30 Finished

Discover the rich digital collections of Cambridge University Library and explore the methods and tools that researchers are using to analyse and visualise data.

The Library as Data: An overview new Wed 16 Oct 2019   11:00 Finished

Is the "digital library" more than a virtual rendering of the bookshelf or filing cabinet? Does the transformation of books into bytes and manuscripts into pixels change the way we create and share knowledge? This session introduces a conceptual toolkit for understanding the library collection in the digital age, and provides a guide to key methods for accessing, transforming and analysing the contents as data. Using the rich collections of Cambridge University Library as a starting point, we will explore:

  • Relations between digital and material texts and artefacts
  • Definitions of data and metadata
  • Methods for accessing data in bulk from digital collections
  • Understanding file formats and standards

The session will also provide an overview of the content in the rest of the term’s Library as Data programme, and introduce our annual call for applications to the Machine Reading the Archive Projects mentoring scheme.

The Library as Data: Digital Text Markup and TEI new Wed 23 Oct 2019   11:00 Finished

Text encoding, or the addition of semantic meaning to text, is a core activity in digital humanities, covering everything from linguistic analysis of novels to quantitative research on manuscript collections. In this session we will take a look at the fundamentals of text encoding – why we might want to do it, and why we need to think carefully about our approaches. We will also introduce the TEI (Text Encoding Initiative), the most commonly used standard for markup in the digital humanities, and look at some common research applications through examples.

Recent advances in machine learning are allowing computer vision and humanities researchers to develop new tools and methods for exploring digital image collections. Neural network models are now able to match, differentiate and classify images at scale in ways which would have been impossible a few years ago. This session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and demonstrates a range of different machine learning-based methods for exploring digital image collections. We will also discuss some of the ethical challenges of applying computer vision algorithms to cultural and historical image collections. Topics covered will include:

  • Unlocking image collections with the IIIF image data framework
  • Machine Learning: a very short introduction
  • Working with images at scale: ethical and methodological challenges
  • Applying computer vision methods to digital collections

Correspondence collections are a unique window into the social networks of prominent historical figures. With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which can be studied using social network analysis.

This session will introduce and demonstrate foundational concepts, methods and tools in social network analysis using datasets prepared from the Darwin Correspondence collection. Topics covered will include:

  • Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines
  • Preparation and transformation of .xml files for analysis with an open source data wrangler
  • Rendering of network visualisations using an open source SNA tool

No prior knowledge of programming is required; instructions on software to install will be sent out before the session.

The Transkribus Guided Project new Wed 29 Jul 2020   16:00 Finished

We introduce the Transkribus software system that can be taught to read handwriting from images of documents and rapidly convert it into useful digital formats. This guided course provides basic training by practical immersion in this software, which requires only basic IT skills. Transkribus was developed by READ under the Horizon 2020 funding framework and is now a co-operative. It had 20,000+ users in 2019, and is becoming a standard research tool for mass transcription of archival sources. Participants will transcribe anonymised data from pre-loaded scans of forms filled out for the French national census of 1999 in Transkribus's downloadable software interface. These manual transcriptions will help train a handwritten text recognition (HTR) model to automatically transcribe many more of these forms later. In fact, the model will eventually allow the creation of one of the largest data sets ever attempted from manuscript sources. This course is a collaboration between Transkribus and Cambridge Digital Humanities. It is funded by a Cambridge Humanities Research Grant.

Large-scale image data are increasingly being used to understand the built and natural environment and to observe behaviours within it. Data sources include satellite and airborne imagery, 360 street views, and fixed video or time-lapse traffic and CCTV cameras. While some of these sources are newer than others, what has been changing is the quality of the images, the geographical coverage, and the potential for assessing changes over time. At the same time, improvements in machine learning have made it possible to turn images into quantitative data at scale.

In this workshop we will explore the challenges that researchers face when using images at scale to understand environments and behaviours, building on work at Cambridge to estimate cycling levels, using satellite data to estimate motor vehicle volume, and planned data collection in Kenya using 360 cameras.

This CDH Basics session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and methods of access to image collections, including the collections of Cambridge University Digital Library. We will also discuss a range of methods using IIIF image data in humanities research.
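As a small illustration of how image collections are accessed through IIIF, the sketch below builds request URLs following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. It is not part of the session materials: the server address and image identifier are hypothetical placeholders, the function names are invented, and the `max` size keyword assumes a server supporting Image API version 2.1 or later.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL from its path segments."""
    # Identifiers are percent-encoded so that any "/" they contain
    # is not mistaken for a path separator.
    safe_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{safe_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document describing an image (dimensions, tiles, etc.)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
IMAGE_ID = "MS-EXAMPLE-001"

full_image = iiif_image_url(BASE, IMAGE_ID)
# A 500 x 500 pixel detail from the top-left corner, scaled to 250 pixels wide.
detail = iiif_image_url(BASE, IMAGE_ID, region="0,0,500,500", size="250,")

print(full_image)
print(detail)
print(iiif_info_url(BASE, IMAGE_ID))
```

Because every parameter lives in the URL path, the same pattern can be used to request whole pages, cropped details or thumbnails from any compliant repository without bespoke code for each one.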
