Theme: The Library as Data

5 matching courses


The Library as Data: Digital Text Markup and TEI new Wed 23 Oct 2019   11:00 Finished

Text encoding, or the addition of semantic meaning to text, is a core activity in digital humanities, covering everything from linguistic analysis of novels to quantitative research on manuscript collections. In this session we will take a look at the fundamentals of text encoding – why we might want to do it, and why we need to think carefully about our approaches. We will also introduce the TEI (Text Encoding Initiative), the most commonly used standard for markup in the digital humanities, and look at some common research applications through examples.

Correspondence collections are a unique window into the social networks of prominent historical figures. With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which can be studied using social network analysis.

This session will introduce and demonstrate foundational concepts, methods and tools in social network analysis using datasets prepared from the Darwin Correspondence collection. Topics covered will include:

  • Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines
  • Preparation and transformation of .xml files for analysis with an open source data wrangler
  • Rendering of network visualisations using an open source SNA tool

No prior knowledge of programming is required. Instructions on software to install will be sent out before the session.

The Library as Data: An overview new Wed 16 Oct 2019   11:00 Finished

Is the "digital library" more than a virtual rendering of the bookshelf or filing cabinet? Does the transformation of books into bytes and manuscripts into pixels change the way we create and share knowledge? This session introduces a conceptual toolkit for understanding the library collection in the digital age, and provides a guide to key methods for accessing, transforming and analysing the contents as data. Using the rich collections of Cambridge University Library as a starting point, we will explore:

  • Relations between digital and material texts and artefacts
  • Definitions of data and metadata
  • Methods for accessing data in bulk from digital collections
  • Understanding file formats and standards

The session will also provide an overview of the content in the rest of the term’s Library as Data programme, and introduce our annual call for applications to the Machine Reading the Archive Projects mentoring scheme.

This session focusses on providing photography skills for those undertaking archival research. Dr Oliver Dunn has experience spanning a decade filming documents for major academic research projects. He will go over practical approaches to finding and ordering materials in the archive, methods of handling and filming them, digital file storage, and transcription strategies. The focus is very much on low-tech approaches and small budgets. We’ll consider best uses of smartphones, digital cameras and tripods. The session is held in the IT training room at the University Library.

Recent advances in machine learning are allowing computer vision and humanities researchers to develop new tools and methods for exploring digital image collections. Neural network models are now able to match, differentiate and classify images at scale in ways which would have been impossible a few years ago. This session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and demonstrates a range of different machine learning-based methods for exploring digital image collections. We will also discuss some of the ethical challenges of applying computer vision algorithms to cultural and historical image collections. Topics covered will include:

  • Unlocking image collections with the IIIF image data framework
  • Machine Learning: a very short introduction
  • Working with images at scale: ethical and methodological challenges
  • Applying computer vision methods to digital collections