All Cambridge Digital Humanities courses

72 matching courses


Analysing and Visualising Social Media Data (Workshop) new Mon 11 Feb 2019   14:00 Finished

This session introduces a variety of analytical strategies, with a focus on Social Network Analysis, the most widely used and abused method for analysing and visualising digital and social media data. At the end of this session, you will be familiar with the basic concepts, techniques and measures of social network analysis.

Archival Photography: An Introduction new Wed 12 Jun 2019   11:00 Finished

This session focusses on providing photography skills for those undertaking archival research. Dr Oliver Dunn has experience spanning a decade filming documents for major academic research projects. He will go over practical approaches to finding and ordering materials in the archive, methods of handling and filming them, digital file storage, and transcription strategies. The focus is very much on low-tech approaches and small budgets. We’ll consider best uses of smartphones, digital cameras and tripods. The session is held at the Digital Content Unit at the University Library.

Automated writing in the age of Machine Learning new Mon 7 Dec 2020   11:30 Finished

Computer programs which predict the likely next words in sentences are a familiar part of everyday life for billions of people who encounter them in auto-complete tools for search engines and the predictive keyboards used by mobile phones and word processing software. These tools rely on “language models” developed by researchers in fields such as natural language processing (NLP) and information retrieval which assign probabilities to words in a sequence based on a specific set of “training data” (in this case a collection of texts where the frequencies of word pairings or three-word phrases have been calculated in advance).
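The idea of assigning probabilities from word-pair frequencies can be illustrated with a minimal sketch. This is not the method used in the workshop or by any particular tool; the function names (`train_bigram_model`, `predict_next`) and the one-sentence training text are made up for illustration, and real predictive keyboards use far larger corpora and more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count word pairs, then turn the counts into next-word probabilities."""
    words = text.lower().split()
    pair_counts = defaultdict(Counter)
    # Pair each word with the word that immediately follows it.
    for current_word, next_word in zip(words, words[1:]):
        pair_counts[current_word][next_word] += 1
    model = {}
    for current_word, followers in pair_counts.items():
        total = sum(followers.values())
        # Probability = how often this pair occurred / all pairs starting with current_word.
        model[current_word] = {w: c / total for w, c in followers.items()}
    return model


def predict_next(model, word):
    """Return the most probable next word, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical, tiny "training data" purely for demonstration.
training_data = "the cat sat on the mat and the cat slept"
model = train_bigram_model(training_data)
print(model["the"])
print(predict_next(model, "the"))
```

In this toy text, "the" is followed by "cat" twice and by "mat" once, so the sketch suggests "cat" after "the". A word that never appears in the training text gets no suggestion at all, which hints at why the choice of training data matters so much.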

Recent developments in machine learning have led to the creation of general language models trained on extremely large datasets which can now produce ‘synthetic’ texts, answer questions and summarise information without the need for lengthy or costly processes of training for each new task. The difficulty of distinguishing the outputs of these language models from texts written by humans has provoked widespread interest in the media. Researchers have experimented with prompting GPT-3, a language model developed by OpenAI, to write short stories, answer philosophical questions and apparently propose potential medical treatments, although GPT-3 did have some difficulty with the question “how many eyes does a horse have?”. Meanwhile, The Guardian ‘commissioned’ an op-ed from GPT-3.

This Methods Workshop will explore the generation of ‘synthetic’ texts through presentations, discussion and demonstrations of text generation techniques which participants will be encouraged to try out for themselves during the sessions. We will also report back from the Ghost Fictions Guided Project, organised by Cambridge Digital Humanities Learning Programme in October and November this year. The project looks at how ideas about the distinction between ‘fact’, ‘fiction’ and ‘nonfiction’ are shaping the reception of text generation methods and aims to stimulate deeper critical engagement with machine learning by humanities researchers.

Prior knowledge of programming, computer science or Machine Learning is not required. In order to try out the text generation techniques demonstrated during the course you will need access to Google Drive (accessible via Raven login for University of Cambridge users).

Beginner's Filmmaking Workshop new Mon 17 Feb 2020   10:00 Finished

Tutors: Sarah McEvoy / Kostas Chondros

Are you curious about making a short documentary film?

This beginner’s filmmaking workshop will help you to start thinking visually and communicate using sound and film. Over two days you will be introduced to different camera shot types, how to construct a basic story, use digital video cameras and sound recorders to shoot your own footage, and then edit a short sequence for export.

The workshop assumes no or very little prior knowledge of filmmaking and no prior preparation is required for the workshop. This is a hands-on practical workshop, working in small teams of two or three people. We expect a willingness to be open to ideas and work in a team to jointly create a short film clip.

The workshop will give you the foundational skills to incorporate film and sound in your own future projects, for example short clips for social media or publicity about research projects as a way to engage wider audiences.

During the workshop you will work with dedicated video equipment, but the techniques you will learn can be adapted to film making with smartphones, tablets and other readily available personal electronic devices.

COURSE PROGRAMME

Day 1 – Monday 17th February

  • 10.00 Welcome and introductions
  • 10.30 Aims of the session
  • 10.45 Introduction to shot types, camera movements, framing, telling a story, basic rules of camera use, rules of recording sound
  • 11.45 Splitting into groups – interactive demonstration of how to use the cameras
  • 13.00 Lunch
  • 14.00 Filming around Cambridge, practical exercise working in groups
  • 16.00 Return to room to look at footage from all groups
  • 17.00 Feedback session, summary of day 1 and introduction to day 2

Day 2 – Tuesday 18th February

We will be working on Apple Macs with Final Cut X; however, we do not expect any prior knowledge of either the computers or the software.

  • 10.00 Importing footage onto computers
  • 10.15 Basic editing, creating a 2-minute clip, summary of creating a sequence
  • 10.45 Adding clips to timeline, tools for manipulating clips, using second video track, transitions and filters, syncing audio
  • 13.00 Lunch
  • 14.00 Credits, titles, adjusting audio levels, adding music or narration, exporting footage, saving files
  • 16.00 Looking at each other’s edited clips
  • 16.45 Evaluation
  • 17.00 Finish

Handouts will be emailed after the workshop, and include:

  • Presentation – shot types, how to construct a sequence
  • Editing on Final Cut X
  • Camera functions, audio recording, info about equipment and editing software and model release forms

What you need to take with you

Headphones – preferably the kind you can plug in rather than Bluetooth headphones

Storage device – if you want to take the footage you shoot with you after the workshop, you will need a hard drive, USB stick or SD card that can hold at least 8GB. Video files are large. Please make sure that the device is formatted to FAT32 if you use it on a PC, as we will be using Macs. You can check this by right-clicking the device and checking its properties. If you prefer, you don’t need to save the footage that you film and can instead upload the exported film to Dropbox.

Upon booking this workshop a questionnaire will be issued to participants which must be completed in order to satisfy the booking.

The workshop is led by:

Sarah McEvoy holds a BA (Hons) in Fine Art and an MA in Visual Anthropology from Goldsmiths, University of London and has most recently completed an MA in Art and Design in Education at UCL Institute of Education. Sarah has worked with arts organisations and charities creating short documentaries and has most recently filmed and edited a film working with a socially engaged artist in the community of South East London. As an artist-educator, Sarah works with youth groups and adults with learning disabilities in the community and in museums and galleries.

Kostas Chondros holds an MA in Visual Anthropology from Goldsmiths College, University of London. He also holds an MA in Social Exclusion, Minorities & Gender from Panteion University and a BA in Social Anthropology & History from the University of the Aegean, Greece. Since joining the Personal Histories film production team in 2011, Kostas has filmed several events and taught camera & film production skills. Additionally, as a freelance filmmaker, Kostas documents improvised music performances and collaborates on film projects with other artists and performers. He is also a musician, poet and translator.

Find out how to use blogging in your research. The first of two sessions on research blogging will explore the benefits and limitations of blogging for public engagement.

The second of two sessions on research blogging will explore how social media can enable public engagement with your blog. You will learn how to set up a Twitter chat and explore other methods to get people talking about your research.

Bug Hunt 2020 [cancelled - Covid 19] new Tue 21 Apr 2020   13:00 CANCELLED

This programme is an opportunity to learn, through practical experience and shared investigation, how to apply digital methods for exploring and analysing a body of archival texts. The core of the programme will be five two-hour classroom-based sessions, supplemented by group and individual work between sessions on tasks related to the project design, delivery and documentation. In addition to attending all five face-to-face sessions, participants should set aside an additional 8-10 hours over the duration of the course for work on project-related tasks.

During the programme we’ll work together on a particular topic: how insects were represented in books created for children in the 19th century. This question will help us to think about how children’s encounters with the natural world might have been framed and shaped by their reading. We’ll work on digital collections of 19th century children’s books exploring how such collections are built and how they can be used for machine reading. We’ll develop specific research questions and you’ll learn how to explore them using different tools for textual stylistic analysis. At the end, we’ll present findings and consider the implications of what we’ve discovered.

Topics covered include:

  • The development of methods for machine reading the archive – ideas, motivations and ethics
  • Children’s books of the long 19th century – a beginner’s guide
  • Designing a small-scale investigation
  • Building a collection of digital texts
  • Transforming texts into searchable data
  • Analysing stylistic patterns in the data

Bulk Data Capture: an overview new Tue 23 Feb 2021   10:00 Finished

This CDH Basics session provides a brief introduction to different methods for capturing bulk data from online sources or via agreement with data collection holders, including Application Programming Interfaces (APIs). We will address issues of data provenance and exceptions to copyright for text and data mining, and discuss good practice in managing and working with data that others have created.

Chris Houghton (Head of Digital Scholarship for Gale) joins us to deliver this suite of CDH Labs sessions. Chris collaborates globally with scholars in the digital humanities community, ensuring the development of Gale Digital Scholar Lab continues to meet their needs.

Are you interested in looking at primary sources in new ways? Would you like to learn how to analyse large sets of historical and contemporary materials to provide a different perspective on your research?

In this session we will introduce Gale Digital Scholar Lab, a cloud-hosted text and data mining platform available to the University. The Lab combines the text from Gale’s archive collections available at Cambridge, including Times Digital Archive and Eighteenth Century Collections Online (ECCO), with powerful text mining tools that enable sophisticated, wide-ranging analysis.

You don’t need any previous experience in text and data mining, and you don’t have to have any interest in coding or algorithms – this session will explain how absolutely anyone can run these analyses and enhance their research accordingly.

CDH Labs: Digital Scholar Lab sessions: Tools in Depth new Thu 13 May 2021   15:00 [Places]

Chris Houghton (Head of Digital Scholarship for Gale) joins us to deliver this suite of CDH Labs sessions. Chris collaborates globally with scholars in the digital humanities community, ensuring the development of Gale Digital Scholar Lab continues to meet their needs.

This CDH Basics session explores why data which you have captured, rather than created yourself, is likely to need cleaning up before you can use it effectively. This short session will introduce you to the basic principles of creating structured datasets and walk through some case studies in data cleaning with OpenRefine, a powerful open-source tool for working with messy data.

Computer Vision: A critical introduction new Tue 25 May 2021   10:00 [Places]

Machine vision systems can potentially help humanities researchers see historical and cultural image collections differently, and could provide tools to answer new research questions. This CDH Basics session provides an introductory overview of basic tasks in machine vision, such as Image Classification, Object Detection and Image Captioning within a critical framework highlighting the challenges of algorithmic bias and the limits of automation as a method for humanistic enquiry.

Creating Databases from Historical Sources (Workshop) Mon 25 Feb 2019   11:00 Finished

This workshop will examine strategies for transforming a variety of sources into structured digital data, ranging from crumbling manuscripts to printed documents and books.

Leonardo Impett, Cambridge Digital Humanities

Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Friday 22 May 2020. Successful applicants will be notified by 26 May 2020.

This course will introduce graduate students, early-career researchers, and professionals in the humanities to the technologies of image recognition and machine vision, including recent developments in machine vision research in the past half-decade. The course will seek to combine a technical understanding of how machine vision systems work, with a detailed understanding of the possibilities they open to research and study in the humanities, and with a critical exploration of the social, political and ideological dimensions of machine vision.

Learning outcomes

By the end of the course, students should be able to:

  • Understand the basic tasks of machine vision, such as Image Classification, Object Detection, Image-to-Image Translation, Image Captioning, Image Segmentation etc.
  • Understand the fundamental technical operations of image processing and machine vision: the pixel encoding of images, Gaussian and convolutional filters,
  • Explore critical aspects of machine vision in a technically-informed way: e.g. the problems in algorithmic bias brought about by featureless convolutional networks
  • Develop and run their own simple machine vision and image processing pipelines, in a visual programming language compiling to Python
  • Understand the potential synergies and limitations of machine vision applications in humanities research and cultural heritage institutions

Data Presentation and Preservation new Tue 28 Jan 2020   11:30 Finished

The afterlife of your research data forms a vitally important part of your research project. Research funders and academic journal publishers are often strongly committed to the re-use of data and are reluctant to fund or publish research where datasets are not accessible for the purposes of peer review or further use. Yet the push for open data exists in tension with the expectations of data protection law which requires transparency from researchers about how long they will retain personal data. This session will explore good practice in data sharing and archiving as well as introducing sources of further information and advice within the University on this topic.

Data Wrangling (Workshop) new Mon 4 Feb 2019   14:00 Finished

Garbage in, garbage out! Your output is as good or as bad as your input. Data collected from online sources is often dirty and messy. Discover how to clean and organise your data. After transforming raw data into a structured dataset, you will be ready to perform data analysis.

Application forms (https://www.cdh.cam.ac.uk/file/cdhdelvingintomassivedaapplicationdocx) should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Tuesday 6 October 2020. Successful applicants will be notified by Thursday 8 October 2020.

Massive digital archives such as the Internet Archive offer researchers tantalising possibilities for the recovery of lost, forgotten and neglected literary texts. Yet the reality can be very frustrating due to limitations in the design of the archives and the tools available for exploring them. This programme supports researchers in understanding the issues they are likely to encounter and developing practical methods for delving into massive digital archives.

Digital Data Collection and Wrangling new Tue 14 Jan 2020   11:30 Finished

This session addresses the technical and ethical aspects of digital data collection and wrangling – two fundamental stages in the lifecycle of a digital research project. Participants will be introduced to online data sources and practices of internet-mediated data collection, including retrieving data from social media platforms. As data collected from online sources is often dirty and messy, we will also provide a short practical introduction to the process of transforming raw data into a clean and structured dataset using free and open-source software.

Digital Data Collection (Workshop) new Mon 28 Jan 2019   14:00 Finished

This session is a primer on digital data collection. The goal is to become familiar with online data sources and practices of internet-mediated data collection, including retrieving data from social media platforms.

The shelf-life of your dataset dictates the longevity of your findings. Sharing your data and assuring its integrity is a fundamental part of a digital research project. In this session we will discuss the principles of open data, channels for data dissemination and the fundamentals of data preservation.

Digital Mapping for Historians new Wed 26 Jun 2019   09:30 Finished

This intensive workshop will provide an overview of a range of applications of digital mapping in historical research projects and introduce GIS tools and software.

Digital Research Design and Data Ethics new Tue 24 Nov 2020   10:00 Finished

This CDH Basics session explores the lifecycle of a digital research project across the following stages:

  • design
  • data capture
  • transformation
  • analysis
  • presentation and preservation

It also introduces tactics for embedding ethical research principles and practices at each stage of the research process.

Digital Research Design, Methods and Ethics (Workshop) new Mon 21 Jan 2019   14:00 Finished

Find out how to shape a digital research project from scratch. This session will introduce the building blocks of online research design, from the several methodologies available to conduct the research to the ethical guidelines that should underpin our projects.

Doing Qualitative Research Online new Mon 1 Feb 2021   14:00 Finished

What happens to practices of qualitative research when interactions between researcher and research subject are largely mediated? From observations of users’ interactions on social media platforms, to interviews conducted through WhatsApp or Skype, digital communications offer both opportunities and challenges for qualitative research in a wide range of disciplines across the Social Sciences and Humanities. This methods workshop will explore a wide range of topics including:

  • Establishing trust and credibility
  • Engaging with digital gatekeepers
  • Navigating blurred boundaries between ‘private’ and ‘public’
  • Re-conceptualising ‘researchers’, the ‘research field’ and ‘research subjects’
  • Identity, anonymity and visibility - implications for research practice
  • Mitigation strategies: from data parsimony to creative obfuscation
  • Self-care for researchers in online research
  • Embedding ethical research practice across the project lifecycle

The workshop will take place over two sessions. The first, on 1 February, is an introductory seminar and discussion led by Dr Anne Alexander, after which participants will be asked to complete a short reflective piece of work assessing their own research design and identifying areas where they feel they need further help and advice. The second session, on 8 February, will be participant-led and will include small group and plenary discussions exploring strategies for dealing with challenges identified by participants.

Participants should set aside around 1 hour between the two sessions to complete and submit their self-assessment.

Participants are strongly encouraged to attend the CDH Basics session Privacy, information security and consent: a guide for researchers with Dr Anne Alexander on 26 January in advance of the Methods Workshop.

We are currently reformatting our Learning programme for remote teaching. This will require some rescheduling, so bookings will reopen and new sessions will be created for online courses as soon as possible. In the interim, we would encourage you to register your interest so as to be notified of the new schedule. We hope to run many of our courses online, but this depends on staff availability and resources, so we may have to postpone or cancel some sessions.

This workshop will develop your coding practice from testing ideas to creating an efficient workflow for your code, data and analysis. If you are using Jupyter Notebooks (but even if you’re not), this workshop will demonstrate how to manage your code better using good programming practices, and how to package your code into a program that is more reliable and is easier and quicker to run on large amounts of data.

Required preparation (instructions provided): Python 3 installed on laptop; a text editor or IDE installed on laptop; git installed on laptop and signed up for GitHub; a short internet-based exercise in working with the command line.

Dr Nathan Crilly and Chih-Chun Chen explore the challenges of communicating complex ideas to diverse audiences through a variety of digital media formats. Three case studies will be reported from an EPSRC-funded research project which sought to clarify and communicate the nature of complex system design and its relationship to emerging technologies. For example, the project studied the way in which technologists working in Synthetic Biology and Swarm Robotics conceptualise and address the complexity of the systems they are designing. Outputs from the project include:

  • A 35-page ‘primer’ on the subject of complexity (now with over 6000 downloads)
  • A three-minute animated movie discussing the subjectivity of complexity (now with 2500 views)
  • An interactive website (implemented by Dr Chen since she has programming skills) that generates annotated bibliographies for complexity resources tailored to a user’s interests (launched in March 2019)

Dr Crilly and Dr Chen will discuss the process of engaging with media partners, including working with science communication agencies, animators and film-makers, and reflect on what they learned from the process and what they would do differently in future.

Film-making for Beginners new Sat 1 Dec 2018   09:30 Finished

Learn to think visually and communicate using sound and film: participants will be introduced to the language of film, shot types, camera movements, framing, basic rules of camera use, how to tell a story, and editing in the Phoenix Training Suite.

Film-making for Beginners (Level 2) new Mon 24 Jun 2019   09:30 Finished

Learn to think visually and to communicate using sound and film. Participants will be introduced to the language of film, shot types, camera movements, framing, basic rules of camera use, how to tell a story, and editing. Some prior knowledge of filming is required. Please see the CDH website for more details (www.cdh.cam.ac.uk).

First steps in coding with Jupyter Notebooks new Tue 9 Feb 2021   10:00 Finished

This CDH Basics session is aimed at researchers who have never done any coding before. We will explore basic principles and approaches to writing and adapting code, using the popular programming language Python as a case study. Participants will also gain familiarity with using Jupyter Notebooks, an open-source web application which allows users to create and share documents containing live code alongside visualisations and narrative text.

From Blog to Book new Thu 10 Oct 2019   14:00 Finished

Blogging as a digital means of research communication seems so simple: with free, easy-to-use platforms we’re all just a few clicks away from setting one up. But having set a blog up, the difficult work begins. Who are you talking to? What are you trying to achieve? How will you generate your content? How will the people you want to talk to find it? How are you going to keep it going alongside your research and teaching commitments? Will it make any difference to anything? And will you ever be able to transform any of this work into a scholarly publication that ‘counts’?

This session will be an interactive conversation between Julie Blake, Cambridge Digital Humanities Methods Fellow, and Connie Ruzich, University Professor of English at Robert Morris University, Pittsburgh, USA. Connie’s Behind Their Lines blog started in 2014 during a Fulbright Scholarship at Exeter University to research First World War poetry in the context of the Centenary Commemorations. She became interested in the lost and neglected poetry of the First World War and began blogging about her ‘finds’. Five years later, she has had almost 400,000 visits to her blog, she maintains a lively dialogue with public and academic audiences, including via Twitter, and she is in the final stages of completing a monograph about this material with Bloomsbury Academic.

We’ll discuss the highs and lows of Connie’s research blogging experience, the surprises, the pitfalls and the lessons learned through hard-won experience. We’ll try to answer all the questions listed above, and participants will be invited to join in with their own questions.

Emma Reay is a third-year PhD researcher at the University of Cambridge and an associate lecturer at Anglia Ruskin University. Her current project explores depictions of children in videogames, and her research interests include representation studies, children's digital media, gaming and education, and playful activism.

Adam Dixon is a game designer and writer who makes both physical and digital games. He has worked on everything from big public games that involve running around cities to narrative video games about learning scientific skills. Much of his work has involved working with museums and research organisations such as the Wellcome Trust, Science Museum, Nottingham Trent University and the V&A. This has included designing games, using play for public research engagement and, most recently, teaching teenagers to create digital games for Wellcome Collection’s Play Well exhibition. Outside of that he works on and releases his own games, including roleplaying games, LARPs and interactive fiction.

Applications (https://www.cdh.cam.ac.uk/file/cdhgamedesign201920applicationdocx-0) should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Wednesday 10 June 2020. Successful applicants will be notified by 15 June 2020.

This online course will introduce participants to the practice of game design. It will explore the different ways that digital and analogue games are designed, particularly how you can design with intent to communicate a mood, theme or message. Participants will learn game design skills – such as boxing-in, design documents and prototyping – alongside opportunities to test them out by creating their own short games. Examples will focus on game design in research-related contexts, including using games as part of your research process and to communicate research outcomes to diverse audiences.

The sessions focus on game design, how to shape mechanics and play experiences, so no technical skills are needed. Participants will create their short games using both non-digital tools and simple, free software that will be taught in the sessions.

Topics covered:

  • Game design basics
  • A chance to play and consider thoughtful games
  • Boxing in
  • Planning games
  • Making games
  • Bitsy and Twine
  • Playtesting and iteration

Format

The course will be delivered online, with live teaching sessions taking place on Zoom.

  • Weds 17 June, 4pm BST: Introduction (45 minutes)
  • Weds 24 June, 4pm BST: Game play feedback (45 minutes)
  • Weds 1 July, 4pm BST: Game design seminar (45 minutes)
  • Weds 15 July, 4pm BST: Final session (60 minutes with break)

A CRASSH blog post written for the originally scheduled session may be of interest and can be found here: http://www.crassh.cam.ac.uk/blog/post/Play-as-Research-Practice

Game Design Workshop [cancelled - industrial action] new Mon 2 Dec 2019   09:30 CANCELLED

This two-day intensive workshop will introduce participants to the practice of game design. It will explore the different ways that digital and analogue games are designed, particularly how you can design with intent to communicate a mood, theme or message. Participants will learn game design skills – such as boxing-in, design documents and prototyping – alongside opportunities to test them out by creating their own short games.

The sessions focus on game design, how to shape mechanics and play experiences, so no technical skills are needed. Participants will create their short games using both non-digital tools and simple, free software that will be taught in the session.

The course participants will be selected via an application process. Once a provisional place is booked, an application form will be issued for completion and return by 1 November 2019. Once the applications have been reviewed, places will be confirmed directly in the week beginning 18 November 2019.

Generative Adversarial Networks Experimentation Lab new Tue 11 Dec 2018   11:30 Finished

This workshop will discuss prospective methods and approaches for critically engaging with the images of people created through Generative Adversarial Networks, using design experiments as provocations to expand debate about notions of ‘realism’ and ‘authenticity’ in an era where human and machine vision are ever more systematically intertwined.

Ghost fictions (Guided project) new Mon 26 Oct 2020   14:00 Finished

Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Tuesday 13 October 2020. Successful applicants will be notified by 15 October 2020.

This CDH Guided Project series, which also includes a Methods Workshop, will explore the generation of ‘synthetic’ texts using neural networks.

The release of OpenAI’s GPT-2 and GPT-3 language models in 2019 and 2020 has shown that predictive algorithms trained on very large general datasets can generate ‘synthetic’ texts, perform machine translation tasks, rudimentary reading comprehension, question answering and summarisation automatically without needing large amounts of task-specific training. These ‘ghostwritten’ texts have provoked wide attention in the media.

Researchers have experimented with prompting GPT-3 to write short stories, answer philosophical questions and apparently propose potential medical treatments, although GPT-3 had some difficulty with the question “how many eyes does a horse have?”. The Guardian ‘commissioned’ an op-ed from GPT-3.

Through interactive hands-on sessions and demonstrations we will explore synthetic text production and look at how ideas about the distinction between ‘fact’, ‘fiction’ and ‘non-fiction’ are shaping the reception of this emerging technology. Our aim is to stimulate deeper critical engagement with machine learning by humanities researchers and to encourage more public debate about the role of AI in culture and society.

We invite applications from early career researchers and others at the University of Cambridge to join a small project team for four online sessions during the Guided Project phase in October-November. Participants will need to commit to joining the live sessions and to set aside at least 3-4 hours' work on a small-scale individual project during the course. We are interested in assembling an interdisciplinary group of researchers drawing on insights from across humanities, social science and technology disciplines. Prior knowledge of programming, computer science or Machine Learning is not required.

Humanities Data: a basic introduction new Tue 13 Oct 2020   10:00 Finished

This CDHBasics session will explain what data is, and what ‘humanities data’ looks like (via a behind-the-scenes tour of the Digital Library). This session covers good practice around file formats, version control and the principles of data curation for individual researchers.

Interaction with Machine Learning new Mon 1 Feb 2021   10:00 Finished

Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Thursday 7 January 2021. We will review applications on a rolling basis and applicants will be notified at the latest by the end of Monday 11 January.

This CDH Guided Project aims to provide humanities, arts and social science researchers with an overview of current theory and practice in the design of human-computer interaction in the age of AI and equip the participants with analytical tools necessary for a critical investigation of contemporary design with AI/ML. Looking closely at interactions between humans and emerging AI systems, the workshop will also explore the potential for interaction between humanities scholars and computer scientists in the process of development and assessment of new solutions.

Lectures and practical research design sessions in Interaction with Machine Learning taught by Professor Alan Blackwell and Advait Sarkar (Microsoft Research) as part of an optional course for Part III and MPhil Computer Science students will form the anchoring element of the Project. These will allow researchers without a Computer Science background to explore how key challenges in AI design are being addressed within the field of interaction design, as well as identify areas in which humanities methodologies and approaches could be adopted to improve the production process, by making it more fair, critical, and socially-aware.

Participants will also take part in three workshops specifically tailored to humanities and social science researchers and will be supported in developing a mini research project investigating how humans interact with systems based on computational models. The projects may include:

  • probing an already existing dataset, system, or user interface from a critical perspective
  • developing an idea for new interaction design based on critical applications of ML/AI.

Please note: no prior practical experience or knowledge of programming is required to take part in the Project. However, some awareness of how AI systems work will be beneficial.

Minimum time commitment:

  • 8 weekly online lectures led by Professor Alan Blackwell (Computer Science and Technology) and Advait Sarkar (Microsoft Research). Weekly from 26 January, 2-4pm (with the last hour as an optional session for Guided Project participants).
  • 3 x 1.5 hour specialist workshops for humanities and social science participants led by Tomasz Hollanek and Anne Alexander (CDH)
  • 1.5 hour project showcase and final discussion

Participants are encouraged to set aside additional time to work on their projects between sessions. A Moodle email forum and drop-in ‘clinic’ style support sessions will be available during the Guided Project.

Lecture topics and dates

  • Current research themes in intelligent user interfaces (26 January, 2pm)
  • Program synthesis (2 February, 2pm)
  • Mixed initiative interaction (9 February, 2pm)
  • Interpretability / explainable AI (16 February, 2pm)
  • Labelling as a fundamental problem (23 February, 2pm)
  • Machine learning risks and bias (2 March, 2pm)
  • Visualisation and visual analytics (9 March, 2pm)
  • Research presentations by Computer Science Students (16 March, 2pm)

Workshop themes

  • AI critique, humanities methodologies and user interface design (1 February, 10-11.30am)
  • Recommender systems (1 March, 10-11.30am)
  • Machine vision (8 March, 10-11.30am)
  • Project presentations and discussion (15 March, 10-11.30am)

Objectives

By the end of the course participants should:

  • be familiar with the current state of the art in intelligent interactive systems
  • understand the human factors that are most critical in the design of such systems
  • be able to evaluate evidence for and against the utility of novel systems
  • be able to apply critical methodologies to current interaction design practices
  • understand the interplay between ML/AI research and humanities approaches

We are currently reformatting our Learning programme for remote teaching; this will require some rescheduling, so bookings will reopen and new sessions will be created for online courses as soon as possible. In the interim we would encourage you to register your interest so as to be notified of the new schedule. Please be aware that we hope to run many of our courses online, but that this depends on staff availability and resources, so we may have to postpone or cancel some sessions.

This session focusses on providing photography skills for those undertaking archival research. Dr Oliver Dunn has experience spanning more than 10 years digitising written and printed historical sources for major university research projects in the humanities and social sciences. The focus is very much on low-tech approaches and small budgets. We’ll consider best uses of smartphones, digital cameras and tripods.

Introduction to Text-Mining with Python 1 new Tue 30 Apr 2019   11:00 Finished

This session will introduce basic methods for reading and processing text files in Python. We will walk through an example that reads in a large text corpus, splits it into tokens (words) and sentences, removes unwanted words (stopwords), counts the words (frequency analysis), and visualises results. We will talk about the 5 steps of text mining and what resources to use when learning text mining for your research in your own time. No prior knowledge of Python is required, and no installations will be needed. We will use web services available in your browser to follow along.
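The steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the session's own material: the sample text, the very short stopword list and the function names are all assumptions made for the example, and the sentence splitter is only a rough heuristic.

```python
import re
from collections import Counter

# A tiny stand-in corpus; in practice this would be read from a file.
text = (
    "The archive holds the letters. The letters describe the voyage, "
    "and the voyage changed the science of the day."
)

# A deliberately short, hypothetical stopword list for illustration only.
STOPWORDS = {"the", "and", "of", "a", "an", "in", "to"}


def split_sentences(raw):
    """Split on sentence-ending punctuation followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]


def tokenise(raw):
    """Lowercase the text and keep runs of letters (and apostrophes) as tokens."""
    return re.findall(r"[a-z']+", raw.lower())


def word_frequencies(raw, stopwords=STOPWORDS):
    """Count the tokens that are not in the stopword list."""
    return Counter(t for t in tokenise(raw) if t not in stopwords)


sentences = split_sentences(text)
frequencies = word_frequencies(text)

# Print the most common remaining words, most frequent first.
for word, count in frequencies.most_common(3):
    print(word, count)
```

A real project would normally use a fuller stopword list and a plotting library for the visualisation step; both are left out here to keep the sketch self-contained.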

Introduction to Text-Mining with Python 2 new Tue 7 May 2019   11:00 Finished

This session will introduce topic modelling. Topic modelling looks for clusters of words that summarise the meaning of documents. We will talk about how to choose what sort of text mining you might want for your research. Some knowledge of Python is required, as gained from 'Introduction to Text-Mining with Python 1', or equivalent. No installations will be needed; we will use web services available in your browser to follow along with the examples.

This online session will introduce basic methods for reading and processing text files in Python with Jupyter Notebooks. We'll discuss why you might wish to do text-mining, and whether coding with Python is the right choice for you. We'll run through the 5 steps of text-mining, and start to walk through an example that reads in a text corpus, splits it into words and sentences (tokens), removes unwanted words (stopwords), counts the tokens (frequency analysis), and visualises results.

This initial session is one hour long and will be delivered remotely by video conferencing. During the session we will cover the essentials of working with the Jupyter Notebooks provided so that you can carry on working through the materials in your own time. The first session will be followed by a second, optional Q&A session for troubleshooting issues and recapping essentials.

Required preparation: A short internet-based exercise in working with variables and text in Python will be sent out one week prior to the session. You will also get instructions on how to find the materials we will be using and how to log onto the video conferencing platform. Please make sure you have some time to prepare properly so that we can concentrate on teaching during the remote session.

This public workshop will mark the end of the 2020 programme of Machine Reading the Archive, a digital methods development programme organised by Cambridge Digital Humanities with the support of the Researcher Development Fund.

It will showcase the digital archive projects created by our cohort of project participants as well as invited contributions from leading experts in the field.

Mapping the Past [remote delivery] new Fri 22 May 2020   11:00 Finished

This intensive workshop is split into two online chats and two 1-hour sessions. Participants will learn to collect geospatial data from historical sources and process it using geographical information systems, from Google Earth to QGIS.

The first online session introduces research techniques for collecting, arranging and mapping geospatial data from historical sources, and is taught by Dr Oliver Dunn. His session is split into two parts: Part A will introduce both online sessions by showing some of our own research that makes use of Google Earth, 3D Maps in Excel, and historical GIS. In Part B you will be asked to locate a set of Scotland’s historical lighthouses on historical maps online and map their location and other attributes in Google Earth and 3D Maps.

The second online session introduces students to mapping humanities data using QGIS, a free GIS (Geographical Information System) software platform. Course participants will need to download and install QGIS on their laptops before 5 June. On 1 June there will be further details concerning downloading QGIS, a chat forum where we can discuss why you might wish to use GIS and whether GIS is the right choice for you, and a release of course teaching materials. On 5 June you will be taken through the map creation process step-by-step. This session will be taught by Max Satchell.

Methods Fellow Workshop: Audible knowledge: soundscapes, podcasts and digital audio scholarship

Dr Peter McMurray (CDH Methods Fellow)

With the rise of web-based scholarship and affordable digital audio equipment, artists and researchers are increasingly turning to audio formats as a way to share their work with a larger audience and to cultivate new forms of knowledge rooted in listening. This workshop will offer an introduction to digital audio recording and editing (using Reaper, a digital audio workstation which can be downloaded/used for free on an extended trial basis). We will focus particularly on the editing choices for soundscape composition and podcasting, and participants will have the opportunity to produce a short audio piece over the course of the workshop.

Applications for this workshop have now closed.

As religious services and communities have shifted online so too have scholars of religion. But at what cost? These sessions raise some of the epistemological and ethical issues of doing fieldwork in a digital environment from an inclusive anthropological perspective with a close-up on a particular case study in each session.

The first session considers conducting virtual ethnography, what is gained and what is lost, with a focus on ethnography with Orthodox Jewish populations. The second session assesses digital surveys of religious communities and their attitudes, e.g. what the 'bean-counters' might miss (and strategies not to). Finally, in the third session we problematize the ethical tensions in online studies of community media, with a particular focus on French Muslim media, already heavily surveilled.

The sessions are intended to develop researcher knowledge and explore cross-cutting issues that concern a broad spectrum of humanities and social science-based scholarship, serving as:

  • a forum for the critical discussion of digital methods and epistemologies,
  • a place to learn more about specific case studies particularly in the UK and France, and
  • an assembly of early research minds in the throes of a related or relevant project themselves who wish to share and learn from one another

Applications for this workshop have now closed.

The corpus linguistic approach to language is based on collections of electronic texts. It uses software to search and quantify various linguistic phenomena that make up patterns, which it then compares within and across texts based on their frequency. Corpus stylistics applies tools and methods from corpus linguistics to stylistic research. Corpus stylistics mainly focuses on literary texts, whether individual texts or corpora. Corpora here are usually principled collections of texts, for example a collection of texts by one author, or texts from a specific period. It focuses both on more general patterns and meanings that are observable across corpora and on patterns and meanings in one individual text. In terms of the quantitative approaches that corpus stylistics employs, it is in many ways similar to work that is referred to as ‘distant reading’ and also ‘cultural analytics’. These approaches emphasise the gains that we get from looking at texts from a “distance”, i.e., in large quantities. For corpus stylistics, it is the relationship between the quantitative and the qualitative that is central. Therefore, research in corpus stylistics often deals with much smaller, “cleaner” data sets, so that the qualitative step in the analysis is more manageable.
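To make the idea of comparing patterns across texts by frequency concrete, here is a small sketch in standard-library Python. It normalises counts to a rate per 1,000 tokens so that texts of different lengths can be compared. The two sample texts, the choice of the word "said" and the function names are invented for illustration and are not taken from the workshop.

```python
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency(word, tokens, per=1000):
    """Occurrences of `word` per `per` tokens, so texts of different length are comparable."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * per / len(tokens)


# Two invented mini-"corpora" standing in for, say, two novels.
text_a = "She said nothing. He said little. They said it was late."
text_b = "The river ran on and the hills stood still in the evening light."

tokens_a = tokenise(text_a)
tokens_b = tokenise(text_b)

freq_a = relative_frequency("said", tokens_a)
freq_b = relative_frequency("said", tokens_b)
print(f"'said' per 1,000 tokens: A={freq_a:.1f}, B={freq_b:.1f}")
```

With texts this short the rates are not meaningful in themselves; the point is only the normalisation step, which is what makes the quantitative comparison fair before any qualitative reading begins.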

This workshop aims to introduce the basic corpus linguistic techniques and methods for working with literary and other texts. It aims:

  • To provide an introduction to corpus linguistics in relation to digital humanities approaches;
  • To develop critical understanding of how data representativeness used in quantitative research may influence results;
  • To critically examine the relationship between quantitative and qualitative textual analyses;
  • To provide a practical toolkit for computational textual analysis.

The aim of this course is to support students, researchers, and professionals interested in exploring the changing nature of the English vocabulary in historical texts at scale, and to reflect critically on the limitations of these computational analyses. We will focus on computational methods for representing word meaning and word meaning change from large-scale historical text corpora. The corpus used will consist of Darwin’s letters from the Darwin Project (https://www.darwinproject.ac.uk/) at Cambridge University Library. All code will be in online Python notebooks.
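One simple way to represent word meaning computationally is to count the words that appear near a target word, and then compare those context counts between two periods. The sketch below assumes this count-based approach purely for illustration; the course description does not say which methods it teaches, and research in this area often uses trained word embeddings instead. The sample sentences and function names are invented.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def context_vector(target, tokens, window=2):
    """Count the words that occur within `window` tokens of each use of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (1.0 means same direction)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sentences standing in for texts from two periods.
early = "the cell was cold and the prisoner sat in the cell alone"
late = "the cell divides and each cell contains a nucleus"

vec_early = context_vector("cell", tokenise(early))
vec_late = context_vector("cell", tokenise(late))
similarity = cosine_similarity(vec_early, vec_late)
print(f"Similarity of contexts for 'cell': {similarity:.2f}")
```

A low similarity between periods hints that a word is being used in different contexts, but with real corpora the result depends heavily on corpus size, window width and how function words are handled, which is one of the limitations such analyses need to acknowledge.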

If you are interested in attending this course, please fill in the application form.

Methods Workshop: Best Practices in Coding for Digital Humanities

Mary Chester-Kadwell (CDH Methods Fellow)

Please note this workshop has limited spaces and an application process in place. Application forms should be completed by Tuesday, 11 May 2021. Successful applicants will be notified by end-of-day Wednesday, 12 May 2021.

This course introduces best practices and techniques to help you better manage your code and data, and develop your project into a usable, sustainable, and reproducible workflow for research.

Developing your coding practice is an ongoing process throughout your career. This intermediate course is aimed at students and staff who use coding in research, or plan on starting such a project soon. We present an introduction to a range of best practices and techniques to help you better manage your code and data, and develop your project into a usable, sustainable, and reproducible workflow. All the examples and exercises will be in Python.

If you are interested in attending this course, please fill in the application form. Places will be prioritised for students and staff in the schools of Arts & Humanities, Humanities & Social Sciences, libraries and museums. If you study or work in a STEM department and use humanities or social sciences approaches you are also welcome to apply.

Text-mining is extracting information from unstructured text, such as books, newspapers, and manuscript transcriptions. This foundational course is aimed at students and staff who are new to text-mining, and presents a basic introduction to text-mining principles and methods, with coding examples and exercises in Python. To discuss the process, we will walk through a simple example of collecting, cleaning and analysing a text.
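As an illustration of the cleaning stage mentioned above, the following sketch applies a few common clean-up steps to a raw transcription using only Python's `re` module. The specific steps, the sample string and the function name `clean_text` are assumptions chosen for the example; which steps are appropriate depends on the source.

```python
import re


def clean_text(raw):
    """Apply a few common, lossy clean-up steps to a raw transcription."""
    # Rejoin words hyphenated across a line break: "manu-\nscript" -> "manuscript".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Blank out lines that contain only a page number.
    text = re.sub(r"^[ \t]*\d+[ \t]*$", "", text, flags=re.MULTILINE)
    # Collapse all runs of whitespace (including line breaks) into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw = "The manu-\nscript was   found\n\n12\n\nin the library."
cleaned = clean_text(raw)
print(cleaned)
```

Each step throws information away. The hyphen rule, for instance, would also join a genuine compound that happens to break at a line end, so it is worth keeping the raw file alongside the cleaned version.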

If you are interested in attending this course, please fill in, and return, the application form by Monday, 22 February 2021. Places will be prioritised for students and staff in the schools of Arts & Humanities, Humanities & Social Sciences, libraries and museums. If you study or work in a STEM department and use humanities or social sciences approaches you are also welcome to apply.

Methods Workshop: TEI workshop new Mon 18 Jan 2021   10:00 Finished

The TEI (Text Encoding Initiative https://tei-c.org/) is a standard for the transcription and description of text-bearing objects, and is very widely used in the digital humanities – from digital editions and manuscript catalogues to text mining and linguistic analysis. This course will take you through the basics of the TEI – what it is and what it can be used for – with a particular focus on uses in research, paths to publication (both web and print) and the use of TEI documents as a dataset for analysis. There will be a chance to create some TEI yourself as well as looking at existing projects and examples. The course will take place over two sessions a week apart – with an introductory taught session, then a chance to work on TEI records yourself, followed by a review and discussion session.
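To give a sense of what using TEI documents as a dataset can look like, here is a minimal sketch that reads a small TEI file with Python's built-in `xml.etree.ElementTree`. The sample document is invented for this example and is not drawn from any real collection; the one fixed detail is the TEI namespace, `http://www.tei-c.org/ns/1.0`, which has to be supplied when searching for elements.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; elements in a TEI document live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A minimal, invented TEI document for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for this sketch.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>I met <persName>Mary Somerville</persName> in
         <placeName>London</placeName> last week.</p>
      <p><persName>Charles Babbage</persName> sends his regards.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Read the title from the header.
title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)

# Collect every tagged personal name and place name in the document.
people = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
places = [el.text for el in root.iterfind(".//tei:placeName", TEI_NS)]

print(title)
print(people)
print(places)
```

Because the names are marked up explicitly, they can be extracted reliably without any guessing from the running text, which is much of the appeal of encoded documents for analysis.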

Network Analysis for Humanities Scholars new Mon 27 Jan 2020   12:30 Finished

This workshop is a very basic introduction to network analysis for humanities scholars. It will introduce the concepts of networks, nodes, edges, directed and weighted networks, and bi- and multi-partite networks. It will give an overview of the kinds of things that can be thought about through a network framework, as well as some things that can’t. It will also introduce key theories, including weak ties and small worlds. There will be an activity where participants will build their own test data set that they can then visualise. In the second half of the workshop we will cover some network metrics, including various centrality measures, the clustering coefficient, and community detection algorithms. It will include an activity introducing one basic web-based tool that allows you to run some of these algorithms, and will provide suggestions for routes forward with other tools and coding libraries that allow quantitative analysis.
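Two of the metrics mentioned above, degree centrality and the clustering coefficient, are simple enough to compute by hand. The sketch below does so in standard-library Python on a small invented undirected network; the names and edges are made up, and in practice a dedicated network library or tool would normally be used instead.

```python
from itertools import combinations

# An invented, undirected network of correspondents: each pair is one edge.
edges = [
    ("Ann", "Ben"), ("Ann", "Cat"), ("Ben", "Cat"),
    ("Cat", "Dan"), ("Dan", "Eve"),
]

# Build an adjacency mapping: node -> set of neighbouring nodes.
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def degree_centrality(node):
    """Share of the other nodes that this node is directly connected to."""
    return len(neighbours[node]) / (len(neighbours) - 1)


def clustering_coefficient(node):
    """Share of pairs of this node's neighbours that are themselves connected."""
    nbrs = neighbours[node]
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for u, v in pairs if v in neighbours[u])
    return linked / len(pairs)


for node in sorted(neighbours):
    print(node, round(degree_centrality(node), 2), round(clustering_coefficient(node), 2))
```

In this toy network Cat has the most direct connections, while Ann's two contacts also know each other, which is what a high clustering coefficient captures.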

Attendees should bring their own laptops.

Ruth Ahnert is Professor of Literary History & Digital Humanities at Queen Mary University of London, and is currently leading two large AHRC-funded projects: Living with Machines, and Networking Archives. She is author of The Rise of Prison Literature in the Sixteenth Century (2013), and co-author of Tudor Networks of Power, and The Network Turn (both forthcoming).

We are running a focus group to try out Gale Digital Scholar Lab, an online platform of Digital Humanities tools for organising and analysing the historical texts in their archive. Gale representatives will demo the capabilities of the Lab and give you a practical opportunity to build your own corpus and do some analysis and visualisation (without writing a line of code). You will have a chance to share your opinions and research needs, and discuss broader issues of how these sorts of tools might fit in with your Digital Humanities research, and the role of private sector providers in the provision of tools and resources to researchers.

Gale Digital Scholar Lab will be available to participants in advance of the focus group. A link will be sent to participants by email. Refreshments and light lunch will be provided. Please bring your own laptop.

Optical Character Recognition is a term used to describe techniques for converting images containing printed or handwritten text into a format that can be searched and analysed computationally. This workshop will introduce several such tools along with some practical techniques for using them, and will also highlight OCR and related services offered by the Digital Content Unit at the Cambridge University Library.

Podcasting: An Introduction new Fri 12 Oct 2018   11:00 Finished

An introduction to audio recording and editing aimed at students and staff interested in learning how podcasting can help disseminate research.

This CDH Basics session will discuss how to assess the impact of relevant legal frameworks, including data protection, intellectual property and media law, on your digital research project, and consider what approach researchers should take to the terms of service of third-party digital platforms. We will explore the challenge of informed consent in a highly-networked world and look at a range of strategies for dealing with this problem.

Qualitative Research in Online Environments new Tue 21 Jan 2020   11:30 Finished

What happens to the practice of qualitative research when interactions between researcher and research subject are largely mediated? This session will explore a wide range of topics including the challenge of consent, researcher presence and ‘lurking’ in mediated settings, how to engage with digital gatekeepers, information security for researchers, and understanding the impact of digital platform architecture on qualitative research design.

Re:search new Tue 10 Nov 2020   10:00 Finished

This CDHBasics session looks at how searching and finding technologies structure scholarship. It also covers:

  • an introduction to search engines, both for web search and custom search functions within collections;
  • discussion about OCR errors and blindspots in digital search in historical collections;
  • problems of fragmentation of the source text, and the legacy of pre-digital formats such as microfilm.

Social Network Analysis (SNA) new Thu 18 Feb 2021   11:00 Finished

Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk).

Social Network Analysis (SNA) is an exciting and rapidly growing methodology. You will find researchers in almost every faculty at the University of Cambridge applying SNA methods within their research. However, SNA researchers can only go so far before they must learn a coding language. Many SNA tools (descriptive metrics, visualisation techniques, and mathematical models) require researchers to use R. This session is for researchers who are interested in SNA methods but lack experience in the R environment.

While network visualisation is just one component of SNA, data visualisation can be a great gateway into a new programming language. This session will introduce you to the R environment by leading you through the creation of static network diagrams. The session is directed at beginners and basic R users that want to explore SNA tools in R.

Social Network Analysis with Digital Data new Tue 4 Feb 2020   11:00 Finished

This course will provide a hands-on introduction to the field of Social Network Analysis, giving participants the opportunity to “learn by doing” the process of network data collection and analysis. After being introduced to the basic concepts, the participants will have the opportunity to explore all stages of a social network analysis project, including research design, essential measures, data collection and data analysis. The focus will be on the retrieval of electronic archival data (e.g. websites, digital archives and social media platforms) for non-programmers and on the production of network analysis with specialised software (e.g. Gephi). At the end, the participants will be equipped with the basic tools to perform meaningful visualisations and analyses of network data.

Sorting things out - why metadata matters new Tue 27 Oct 2020   10:00 Finished

This CDHBasics session focuses on the importance of metadata (‘data about data’), examining the crucial role played by classification systems and standards in shaping how scholars interact with historical and cultural records.

Sources to Data new Wed 3 Jun 2020   11:00 CANCELLED

Archives typically hold records containing enormous quantities of data presented in a variety of scribal and print formats. Extracting this information has traditionally involved long hours of expensive manual data-entry work. Nowadays this work can be automated to a large degree and could soon open archives and allow for unprecedentedly large structured data sets for curators, researchers, and the public alike. This workshop will examine new methods for collecting historical data from manuscript and printed documents. We will look at archival photography, OCR, page structure recognition, and new handwritten text recognition systems. Cutting-edge Cambridge research in this field will be demonstrated.

Sources to Data (Workshop) Wed 5 Jun 2019   11:00 Finished

This workshop will examine database creation from historical documents. Extracting data from these can be hard work and involves quite unusual skill combinations. You may need to digitise and transcribe from primary sources, and then design and build a database from scratch with the information. Other sources you use could already be digitised but may be arranged or filed in an unsuitable way for your project and therefore need conversion. We will look at techniques used when employing crumbling manuscripts, printed documents, books, or text searchable images, to harvest historical data. Techniques include manual data-entry, scanning and OCR, and handwritten text recognition systems.

For centuries, letters were the main form of communication between scientists. Correspondence collections are a unique window into the social networks of prominent historical figures. What can digital social sciences and humanities reveal about the correspondence networks of 19th century scientists? This two-session intensive workshop will give participants the opportunity to explore possible answers to this question.

With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which we propose to study through social network analysis (SNA). The workshop will be divided into two sessions during which participants will “learn by doing” how to apply SNA to personal correspondence datasets. Following a guided project framework, participants will work on the correspondence collections of John Herschel and Charles Darwin. After a contextual introduction to the datasets, the sessions will focus on the basic concepts of SNA, data transformation and preparation, data visualisation and data analysis, with particular emphasis on “ego network” measures.

The two demonstration datasets used during the workshop will be provided by the Epsilon project, a research consortium between Cambridge Digital Library, The Royal Institution and The Royal Society of London aimed at building a collaborative digital framework for 19th century letters of science. The first dataset, the “Calendar of the Correspondence of Sir John Herschel Database at the Adler Planetarium”, is a collection of the personal correspondence of John Frederick William Herschel (1792-1871), a polymath celebrated for his contributions to the field of astronomy. Its curation began in the 1950s at the Royal Society, and it currently comprises 14,815 digitised letters encoded in extensible markup language (.xml) format. The second dataset comes from the “Darwin Correspondence Project”, which has been locating, researching, editing and publishing Charles Darwin’s letters since 1974. In addition to a 30-volume print edition, the project has also made letters available in .xml format.

The workshop will provide a step-by-step guide to analysing correspondence networks from these collections, which will cover:

  • Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines;
  • Preparation and transformation of .xml files for analysis with an open source data wrangler;
  • Rendering of network visualisations using an open source SNA tool;
  • Analysis of the Ego Networks of John Herschel and Charles Darwin (requires UCINET).
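The workshop itself uses dedicated tools for these steps, but the idea behind an “ego network” can be sketched in a few lines of Python. The example below assumes that sender and recipient pairs have already been extracted from the letter metadata. The pairs listed are invented for illustration and do not come from the Herschel or Darwin datasets, and the function names are hypothetical.

```python
from collections import Counter

# Invented (sender, recipient) pairs standing in for rows extracted from letter metadata.
letters = [
    ("Herschel", "Babbage"), ("Babbage", "Herschel"),
    ("Herschel", "Somerville"), ("Herschel", "Whewell"),
    ("Whewell", "Babbage"), ("Lyell", "Darwin"),
]


def ego_network(ego, letters):
    """Return the ego's alters and the ties among those alters (direction ignored)."""
    alters = set()
    for sender, recipient in letters:
        if sender == ego:
            alters.add(recipient)
        elif recipient == ego:
            alters.add(sender)
    alter_ties = {
        frozenset((s, r))
        for s, r in letters
        if s in alters and r in alters and s != r
    }
    return alters, alter_ties


def ego_density(alters, alter_ties):
    """Ties present among alters divided by the number of possible ties among them."""
    n = len(alters)
    possible = n * (n - 1) // 2
    return len(alter_ties) / possible if possible else 0.0


alters, alter_ties = ego_network("Herschel", letters)

# Count how many letters the ego exchanged with each alter, in either direction.
letters_per_alter = Counter(
    r if s == "Herschel" else s
    for s, r in letters
    if "Herschel" in (s, r)
)

print(sorted(alters))
print(round(ego_density(alters, alter_ties), 2))
print(letters_per_alter.most_common())
```

Ego network size (the number of alters) and density (how far the alters are connected to one another) are two of the basic measures in this kind of analysis; specialised software adds many more.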

About the speakers and course facilitators:

Anne Alexander is Director of Learning at Cambridge Digital Humanities

Hugo Leal is Methods Fellow at Cambridge Digital Humanities and Co-ordinator of the Cambridge Data School

Louisiane Ferlier is Digital Resources Manager at the Centre for the History of Science at the Royal Society. In her current role she facilitates research collaborations with the Royal Society collections, curates digital and physical exhibitions, as well as augmenting its portfolio of digital assets. A historian of ideas by training, her research investigates the material and intellectual circulation of ideas in the 17th and 18th centuries.

Elizabeth Smith is the Associate Editor for Digital Development at the Darwin Correspondence Project, where she contributed to the conversion of the Project’s work into TEI several years ago, and has since been collaborating with the technical director in enhancing the Darwin Project’s data. She is one of the co-ordinators of Epsilon, a TEI-based portal for nineteenth-century science letters.

No prior knowledge of programming is required; instructions on software to install will be sent out before the workshop. Some exercises and preparation for the second session will be set during the first, and participants should allow 2-3 hours for this. Please note, priority will be given to staff and students at the University of Cambridge for booking onto this workshop.

CDH Learning gratefully acknowledges the support of the Isaac Newton Trust and the Faculty of History for this workshop.

The Library as Data new Mon 15 Oct 2018   13:30 Finished

Discover the rich digital collections of Cambridge University Library and explore the methods and tools that researchers are using to analyse and visualise data.

The Library as Data: An overview new Wed 16 Oct 2019   11:00 Finished

Is the "digital library" more than a virtual rendering of the bookshelf or filing cabinet? Does the transformation of books into bytes and manuscripts into pixels change the way we create and share knowledge? This session introduces a conceptual toolkit for understanding the library collection in the digital age, and provides a guide to key methods for accessing, transforming and analysing the contents as data. Using the rich collections of Cambridge University Library as a starting point, we will explore:

  • Relations between digital and material texts and artefacts
  • Definitions of data and metadata
  • Methods for accessing data in bulk from digital collections
  • Understanding file formats and standards

The session will also provide an overview of the content in the rest of the term’s Library as Data programme, and introduce our annual call for applications to the Machine Reading the Archive Projects mentoring scheme.

The Library as Data: Digital Text Markup and TEI new Wed 23 Oct 2019   11:00 Finished

Text encoding, or the addition of semantic meaning to text, is a core activity in digital humanities, covering everything from linguistic analysis of novels to quantitative research on manuscript collections. In this session we will take a look at the fundamentals of text encoding – why we might want to do it, and why we need to think carefully about our approaches. We will also introduce the TEI (Text Encoding Initiative), the most commonly used standard for markup in the digital humanities, and look at some common research applications through examples.

Recent advances in machine learning are allowing computer vision and humanities researchers to develop new tools and methods for exploring digital image collections. Neural network models are now able to match, differentiate and classify images at scale in ways which would have been impossible a few years ago. This session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and demonstrates a range of different machine learning-based methods for exploring digital image collections. We will also discuss some of the ethical challenges of applying computer vision algorithms to cultural and historical image collections. Topics covered will include:

  • Unlocking image collections with the IIIF image data framework
  • Machine Learning: a very short introduction
  • Working with images at scale: ethical and methodological challenges
  • Applying computer vision methods to digital collections

Correspondence collections are a unique window into the social networks of prominent historical figures. With the digitisation and encoding of personal letters, researchers have at their disposal a wealth of relational data, which can be studied using social network analysis.

This session will introduce and demonstrate foundational concepts, methods and tools in social network analysis using datasets prepared from the Darwin Correspondence collection. Topics covered will include:

  • Explanation of the encoding procedures and rationale following the Text Encoding Initiative guidelines
  • Preparation and transformation of .xml files for analysis with an open source data wrangler
  • Rendering of network visualisations using an open source SNA tool
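
As a rough illustration of the preparation step listed above, the sketch below turns TEI-encoded letter metadata into a weighted sender-to-recipient edge list that a network tool could import. It assumes each letter records its correspondents in the TEI `correspDesc` / `correspAction` elements; the sample letter, the name "Correspondent A" and the helper function names are hypothetical, and the files used in the session may be structured differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample header; real project files may differ in structure.
SAMPLE_LETTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <correspDesc>
        <correspAction type="sent"><persName>Charles Darwin</persName></correspAction>
        <correspAction type="received"><persName>Correspondent A</persName></correspAction>
      </correspDesc>
    </profileDesc>
  </teiHeader>
</TEI>"""


def letter_to_edge(xml_text):
    """Return a (sender, recipient) pair, or None if either name is missing."""
    root = ET.fromstring(xml_text)
    names = {}
    for action in root.iterfind(".//tei:correspAction", TEI_NS):
        person = action.find("tei:persName", TEI_NS)
        if person is not None and person.text:
            names[action.get("type")] = person.text.strip()
    if "sent" in names and "received" in names:
        return names["sent"], names["received"]
    return None


def build_edge_list(letters):
    """Count letters per (sender, recipient) pair to obtain weighted edges."""
    edges = Counter()
    for xml_text in letters:
        edge = letter_to_edge(xml_text)
        if edge is not None:
            edges[edge] += 1
    return edges


# Two copies of the sample letter stand in for a folder of .xml files.
edges = build_edge_list([SAMPLE_LETTER, SAMPLE_LETTER])
for (sender, recipient), weight in edges.items():
    print(f"{sender},{recipient},{weight}")
```

Each printed row is a source, target and weight triple, a layout that many network analysis tools accept as an edge list.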

No prior knowledge of programming is required; instructions on software to install will be sent out before the session.

The Transkribus Guided Project new Wed 29 Jul 2020   16:00 Finished

We introduce the Transkribus software system, which can be taught to read handwriting from images of documents and rapidly convert it into useful digital formats. This guided course provides basic training by practical immersion in this software, which requires only basic IT skills. Transkribus was developed by READ under the Horizon 2020 funding framework and is now a co-operative. It had 20,000+ users in 2019, and is becoming a standard research tool for mass transcription of archival sources. Participants will transcribe anonymised data from pre-loaded scans of forms filled out for the French national census of 1999 in Transkribus's downloadable software interface. These manual transcriptions will help train a handwritten text recognition (HTR) model to automatically transcribe many more of these forms later. In fact, the model will eventually allow the creation of one of the largest data sets ever attempted from manuscript sources. This course is a collaboration between Transkribus and Cambridge Digital Humanities. It is funded by a Cambridge Humanities Research Grant.

Image big data are increasingly being used to understand the built and natural environment and to observe behaviours within it. Data sources include satellite and airborne imagery, 360 street views, and fixed video or time lapse traffic and CCTV cameras. While some of these sources are newer than others, what has been changing is the quality of the images, the geographical coverage, and the potential for assessing changes over time. At the same time, improvements in machine learning have made it possible to turn images into quantitative data at scale.

In this workshop we will explore the challenges that researchers face when using images at scale to understand environments and behaviours, building on work at Cambridge to estimate cycling levels, using satellite data to estimate motor vehicle volume, and planned data collection in Kenya using 360 cameras.

This CDH Basics session introduces the IIIF image data framework, which has been developed by a consortium of the world’s leading research libraries and image repositories, and methods of access to image collections, including the collections of Cambridge University Digital Library. We will also discuss a range of methods using IIIF image data in humanities research.
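
To give a flavour of how IIIF exposes images, the sketch below assembles a request URL following the IIIF Image API pattern of identifier, region, size, rotation, quality and format. The base URL, the identifier and the function name are hypothetical placeholders rather than real Cambridge University Digital Library endpoints, and the `max` size keyword assumes a server that supports it (older servers may expect `full` instead).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path components."""
    # Identifiers must be percent-encoded so that any slashes inside them
    # are not mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, used purely for illustration.
BASE = "https://example.org/iiif"

# The whole image at the largest size the server offers.
print(iiif_image_url(BASE, "MS-EXAMPLE-00001"))

# A 500x500 pixel crop from the top-left corner, scaled to 250 pixels wide.
print(iiif_image_url(BASE, "MS-EXAMPLE-00001", region="0,0,500,500", size="250,"))
```

Because the region and size are part of the URL, a script can request thousands of consistent crops or thumbnails without downloading full-resolution files first.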
