Cambridge Digital Humanities course timetable
December 2020
Mon 7 |
Computer programmes which predict the likely next words in sentences are a familiar part of everyday life for billions of people, who encounter them in auto-complete tools for search engines and in the predictive keyboards used by mobile phones and word processing software. These tools rely on “language models” developed by researchers in fields such as natural language processing (NLP) and information retrieval, which assign probabilities to words in a sequence based on a specific set of “training data” (in this case a collection of texts where the frequencies of word pairings or three-word phrases have been calculated in advance). Recent developments in machine learning have led to the creation of general language models trained on extremely large datasets, which can now produce ‘synthetic’ texts, answer questions and summarise information without the need for lengthy or costly processes of training for each new task. The difficulty of distinguishing the outputs of these language models from texts written by humans has provoked widespread interest in the media. Researchers have experimented with prompting GPT-3, a language model developed by OpenAI, to write short stories, answer philosophical questions and apparently propose potential medical treatments, although GPT-3 did have some difficulty with the question “how many eyes does a horse have?”. Meanwhile, The Guardian ‘commissioned’ an op-ed from GPT-3. This Methods Workshop will explore the generation of ‘synthetic’ texts through presentations, discussion and demonstrations of text generation techniques which participants will be encouraged to try out for themselves during the sessions. We will also report back from the Ghost Fictions Guided Project, organised by Cambridge Digital Humanities Learning Programme in October and November this year. 
The project looks at how ideas about the distinction between ‘fact’, ‘fiction’ and ‘nonfiction’ are shaping the reception of text generation methods and aims to stimulate deeper critical engagement with machine learning by humanities researchers. Prior knowledge of programming, computer science or Machine Learning is not required. In order to try out the text generation techniques demonstrated during the course you will need access to Google Drive (accessible via Raven login for University of Cambridge users). |
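The idea of assigning probabilities to next words from pre-calculated word pairings can be illustrated with a minimal sketch. This is not the workshop's own material: the function names and the tiny training text are invented for illustration, and real predictive keyboards use far larger datasets and more sophisticated models.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Convert the follow-on counts for `word` into probabilities."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word never appeared with a follower in the training data
    return {w: n / total for w, n in followers.items()}


# Hypothetical training data, far smaller than anything used in practice.
training_text = "the cat sat on the mat and the cat slept"
model = train_bigram_model(training_text)
probs = next_word_probabilities(model, "the")
print(probs)
```

In this toy text, "the" is followed twice by "cat" and once by "mat", so the model rates "cat" as the more likely next word.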
January 2021
Mon 18 |
Methods Workshop: TEI workshop
Finished
The TEI (Text Encoding Initiative https://tei-c.org/) is a standard for the transcription and description of text-bearing objects, and is very widely used in the digital humanities – from digital editions and manuscript catalogues to text mining and linguistic analysis. This course will take you through the basics of the TEI – what it is and what it can be used for – with a particular focus on uses in research, paths to publication (both web and print) and the use of TEI documents as a dataset for analysis. There will be a chance to create some TEI yourself as well as to look at existing projects and examples. The course will take place over two sessions a week apart – with an introductory taught session, then a chance to work on TEI records yourself, followed by a review and discussion session. |
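As a rough illustration of treating a TEI document as a dataset, the sketch below parses a very small TEI fragment with Python's standard library and pulls out the title and the tagged names. The sample document is invented for illustration and is not taken from the course; a real project would work with much richer encoding.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A made-up, minimal TEI document used only for this example.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName>Ada</persName>, I write from <placeName>Cambridge</placeName>.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(sample)

# Read the title from the header, then collect every tagged person and place.
title = root.find(".//tei:titleStmt/tei:title", TEI_NS).text
people = [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]
places = [el.text for el in root.iterfind(".//tei:placeName", TEI_NS)]

print(title, people, places)
```

Because the names are marked up explicitly, they can be extracted reliably rather than guessed from the running text.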
Mon 25 |
Methods Workshop: TEI workshop
Finished
The TEI (Text Encoding Initiative https://tei-c.org/) is a standard for the transcription and description of text-bearing objects, and is very widely used in the digital humanities – from digital editions and manuscript catalogues to text mining and linguistic analysis. This course will take you through the basics of the TEI – what it is and what it can be used for – with a particular focus on uses in research, paths to publication (both web and print) and the use of TEI documents as a dataset for analysis. There will be a chance to create some TEI yourself as well as to look at existing projects and examples. The course will take place over two sessions a week apart – with an introductory taught session, then a chance to work on TEI records yourself, followed by a review and discussion session. |
Tue 26 |
This CDH Basics session will discuss how to assess the impact of relevant legal frameworks, including data protection, intellectual property and media law, on your digital research project, and consider what approach researchers should take to the terms of service of third-party digital platforms. We will explore the challenge of informed consent in a highly networked world and look at a range of strategies for dealing with this problem. |
February 2021
Mon 1 |
Interaction with Machine Learning
Finished
Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Thursday 7 January 2021. We will review applications on a rolling basis and applicants will be notified at the latest by the end of Monday 11 January. This CDH Guided Project aims to provide humanities, arts and social science researchers with an overview of current theory and practice in the design of human-computer interaction in the age of AI and equip the participants with analytical tools necessary for a critical investigation of contemporary design with AI/ML. Looking closely at interactions between humans and emerging AI systems, the workshop will also explore the potential for interaction between humanities scholars and computer scientists in the process of development and assessment of new solutions. Lectures and practical research design sessions in Interaction with Machine Learning taught by Professor Alan Blackwell and Advait Sarkar (Microsoft Research) as part of an optional course for Part III and MPhil Computer Science students will form the anchoring element of the Project. These will allow researchers without a Computer Science background to explore how key challenges in AI design are being addressed within the field of interaction design, as well as identify areas in which humanities methodologies and approaches could be adopted to improve the production process, by making it more fair, critical, and socially-aware. Participants will also take part in three workshops specifically tailored to humanities and social science researchers and will be supported in developing a mini research project investigating how humans interact with systems based on computational models. The projects may include:
Please note: no prior practical experience or knowledge of programming is required to take part in the Project; however, some awareness of how AI systems work will be beneficial. Minimum time commitment:
Participants are encouraged to set aside additional time to work on their projects between sessions. A Moodle email forum and drop-in ‘clinic’ style support sessions will be available during the Guided Project. Lecture topics and dates
Workshop themes
Objectives By the end of the course participants should:
|
Doing Qualitative Research Online
Finished
What happens to practices of qualitative research when interactions between researcher and research subject are largely mediated? From observations of users’ interactions on social media platforms, to interviews conducted through WhatsApp or Skype, digital communications offer both opportunities and challenges for qualitative research in a wide range of disciplines across the Social Sciences and Humanities. This methods workshop will explore a wide range of topics including:
The workshop will take place over two sessions: an introductory seminar and discussion led by Dr Anne Alexander on 1 February, after which participants will be asked to complete a short reflective piece of work assessing their own research design and identifying areas where they feel they need further help and advice. The second session on 8 February will be participant-led, including small group and plenary discussions exploring strategies for dealing with challenges identified by participants. Participants should set aside around 1 hour between the two sessions to complete and submit their self-assessment. Participants are strongly encouraged to attend the CDH Basics session Privacy, information security and consent: a guide for researchers with Dr Anne Alexander on 26 January in advance of the Methods Workshop. |
|
Mon 8 |
Doing Qualitative Research Online
Finished
What happens to practices of qualitative research when interactions between researcher and research subject are largely mediated? From observations of users’ interactions on social media platforms, to interviews conducted through WhatsApp or Skype, digital communications offer both opportunities and challenges for qualitative research in a wide range of disciplines across the Social Sciences and Humanities. This methods workshop will explore a wide range of topics including:
The workshop will take place over two sessions: an introductory seminar and discussion led by Dr Anne Alexander on 1 February, after which participants will be asked to complete a short reflective piece of work assessing their own research design and identifying areas where they feel they need further help and advice. The second session on 8 February will be participant-led, including small group and plenary discussions exploring strategies for dealing with challenges identified by participants. Participants should set aside around 1 hour between the two sessions to complete and submit their self-assessment. Participants are strongly encouraged to attend the CDH Basics session Privacy, information security and consent: a guide for researchers with Dr Anne Alexander on 26 January in advance of the Methods Workshop. |
Tue 9 |
This CDH Basics session is aimed at researchers who have never done any coding before. We will explore basic principles and approaches to writing and adapting code, using the popular programming language Python as a case study. Participants will also gain familiarity with using Jupyter Notebooks, an open-source web application which allows users to create and share documents containing live code alongside visualisations and narrative text. |
Thu 18 |
Social Network Analysis (SNA)
Finished
Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk). Social Network Analysis (SNA) is an exciting and rapidly growing methodology. You will find researchers in almost every faculty at the University of Cambridge applying SNA methods within their research. However, SNA researchers can only go so far before they must learn a coding language. Many SNA tools (descriptive metrics, visualisation techniques and mathematical models) require researchers to use R. This session is for researchers who are interested in SNA methods but lack experience in the R environment. While network visualisation is just one component of SNA, data visualisation can be a great gateway into a new programming language. This session will introduce you to the R environment by leading you through the creation of static network diagrams. The session is directed at beginners and basic R users who want to explore SNA tools in R. |
Mon 22 |
Applications for this workshop have now closed. As religious services and communities have shifted online so too have scholars of religion. But at what cost? These sessions raise some of the epistemological and ethical issues of doing fieldwork in a digital environment from an inclusive anthropological perspective with a close-up on a particular case study in each session. The first session considers conducting virtual ethnography, what is gained and what is lost, with a focus on ethnography with Orthodox Jewish populations; the second session assesses digital surveys of religious communities and their attitudes e.g. what the 'bean-counters' might miss (and strategies not to) and finally in the third session we problematize the ethical tensions in online studies of community media with a particular focus on French Muslim media, already heavily surveilled. The sessions are intended to develop researcher knowledge and explore cross-cutting issues that concern a broad spectrum of humanities and social science-based scholarship serving as;
|
Tue 23 |
Bulk Data Capture: an overview
Finished
This CDH Basics session provides a brief introduction to different methods for capturing bulk data from online sources or via agreement with data collection holders, including Application Programming Interfaces (APIs). We will address issues of data provenance and exceptions to copyright for text and data mining, and discuss good practice in managing and working with data that others have created. |
Thu 25 |
Social Network Analysis (SNA)
Finished
Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk). Social Network Analysis (SNA) is an exciting and rapidly growing methodology. You will find researchers in almost every faculty at the University of Cambridge applying SNA methods within their research. However, SNA researchers can only go so far before they must learn a coding language. Many SNA tools (descriptive metrics, visualisation techniques and mathematical models) require researchers to use R. This session is for researchers who are interested in SNA methods but lack experience in the R environment. While network visualisation is just one component of SNA, data visualisation can be a great gateway into a new programming language. This session will introduce you to the R environment by leading you through the creation of static network diagrams. The session is directed at beginners and basic R users who want to explore SNA tools in R. |
Applications for this workshop have now closed. As religious services and communities have shifted online so too have scholars of religion. But at what cost? These sessions raise some of the epistemological and ethical issues of doing fieldwork in a digital environment from an inclusive anthropological perspective with a close-up on a particular case study in each session. The first session considers conducting virtual ethnography, what is gained and what is lost, with a focus on ethnography with Orthodox Jewish populations; the second session assesses digital surveys of religious communities and their attitudes e.g. what the 'bean-counters' might miss (and strategies not to) and finally in the third session we problematize the ethical tensions in online studies of community media with a particular focus on French Muslim media, already heavily surveilled. The sessions are intended to develop researcher knowledge and explore cross-cutting issues that concern a broad spectrum of humanities and social science-based scholarship serving as;
|
March 2021
Mon 1 |
Interaction with Machine Learning
Finished
Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Thursday 7 January 2021. We will review applications on a rolling basis and applicants will be notified at the latest by the end of Monday 11 January. This CDH Guided Project aims to provide humanities, arts and social science researchers with an overview of current theory and practice in the design of human-computer interaction in the age of AI and equip the participants with analytical tools necessary for a critical investigation of contemporary design with AI/ML. Looking closely at interactions between humans and emerging AI systems, the workshop will also explore the potential for interaction between humanities scholars and computer scientists in the process of development and assessment of new solutions. Lectures and practical research design sessions in Interaction with Machine Learning taught by Professor Alan Blackwell and Advait Sarkar (Microsoft Research) as part of an optional course for Part III and MPhil Computer Science students will form the anchoring element of the Project. These will allow researchers without a Computer Science background to explore how key challenges in AI design are being addressed within the field of interaction design, as well as identify areas in which humanities methodologies and approaches could be adopted to improve the production process, by making it more fair, critical, and socially-aware. Participants will also take part in three workshops specifically tailored to humanities and social science researchers and will be supported in developing a mini research project investigating how humans interact with systems based on computational models. The projects may include:
Please note: no prior practical experience or knowledge of programming is required to take part in the Project; however, some awareness of how AI systems work will be beneficial. Minimum time commitment:
Participants are encouraged to set aside additional time to work on their projects between sessions. A Moodle email forum and drop-in ‘clinic’ style support sessions will be available during the Guided Project. Lecture topics and dates
Workshop themes
Objectives By the end of the course participants should:
|
Applications for this workshop have now closed. The corpus linguistic approach to language is based on collections of electronic texts. It uses software to search and quantify various linguistic phenomena that make up patterns, which it then compares within and across texts based on their frequency. Corpus stylistics applies tools and methods from corpus linguistics to stylistic research. Corpus stylistics mainly focuses on literary texts, either individual texts or corpora. Corpora here are usually principled collections of texts, for example a collection of texts by one author, or texts from a specific period. It focuses both on more general patterns and meanings that are observable across corpora and on patterns and meanings in one individual text. In terms of the quantitative approaches that corpus stylistics employs, it is in many ways similar to work that is referred to as ‘distant reading’ and also ‘cultural analytics’. These approaches emphasise the gains that we get from looking at texts from a “distance”, i.e., in large quantities. For corpus stylistics, it is the relationship between quantitative and qualitative that is central. Therefore, research in corpus stylistics often deals with much smaller, “cleaner” data sets, so that the qualitative step in the analysis is more manageable. This workshop aims to introduce the basic corpus linguistic techniques and methods for working with literary and other texts. It aims:
|
|
Tue 2 |
This CDH Basics session explores how data which you have captured, rather than created yourself, is likely to need cleaning up before you can use it effectively. This short session will introduce you to the basic principles of creating structured datasets and walk through some case studies in data cleaning with OpenRefine, a powerful open source tool for working with messy data. |
Applications for this workshop have now closed. As religious services and communities have shifted online so too have scholars of religion. But at what cost? These sessions raise some of the epistemological and ethical issues of doing fieldwork in a digital environment from an inclusive anthropological perspective with a close-up on a particular case study in each session. The first session considers conducting virtual ethnography, what is gained and what is lost, with a focus on ethnography with Orthodox Jewish populations; the second session assesses digital surveys of religious communities and their attitudes e.g. what the 'bean-counters' might miss (and strategies not to) and finally in the third session we problematize the ethical tensions in online studies of community media with a particular focus on French Muslim media, already heavily surveilled. The sessions are intended to develop researcher knowledge and explore cross-cutting issues that concern a broad spectrum of humanities and social science-based scholarship serving as;
|
|
Thu 4 |
Applications for this workshop have now closed. The corpus linguistic approach to language is based on collections of electronic texts. It uses software to search and quantify various linguistic phenomena that make up patterns, which it then compares within and across texts based on their frequency. Corpus stylistics applies tools and methods from corpus linguistics to stylistic research. Corpus stylistics mainly focuses on literary texts, either individual texts or corpora. Corpora here are usually principled collections of texts, for example a collection of texts by one author, or texts from a specific period. It focuses both on more general patterns and meanings that are observable across corpora and on patterns and meanings in one individual text. In terms of the quantitative approaches that corpus stylistics employs, it is in many ways similar to work that is referred to as ‘distant reading’ and also ‘cultural analytics’. These approaches emphasise the gains that we get from looking at texts from a “distance”, i.e., in large quantities. For corpus stylistics, it is the relationship between quantitative and qualitative that is central. Therefore, research in corpus stylistics often deals with much smaller, “cleaner” data sets, so that the qualitative step in the analysis is more manageable. This workshop aims to introduce the basic corpus linguistic techniques and methods for working with literary and other texts. It aims:
|
Mon 8 |
Interaction with Machine Learning
Finished
Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Thursday 7 January 2021. We will review applications on a rolling basis and applicants will be notified at the latest by the end of Monday 11 January. This CDH Guided Project aims to provide humanities, arts and social science researchers with an overview of current theory and practice in the design of human-computer interaction in the age of AI and equip the participants with analytical tools necessary for a critical investigation of contemporary design with AI/ML. Looking closely at interactions between humans and emerging AI systems, the workshop will also explore the potential for interaction between humanities scholars and computer scientists in the process of development and assessment of new solutions. Lectures and practical research design sessions in Interaction with Machine Learning taught by Professor Alan Blackwell and Advait Sarkar (Microsoft Research) as part of an optional course for Part III and MPhil Computer Science students will form the anchoring element of the Project. These will allow researchers without a Computer Science background to explore how key challenges in AI design are being addressed within the field of interaction design, as well as identify areas in which humanities methodologies and approaches could be adopted to improve the production process, by making it more fair, critical, and socially-aware. Participants will also take part in three workshops specifically tailored to humanities and social science researchers and will be supported in developing a mini research project investigating how humans interact with systems based on computational models. The projects may include:
Please note: no prior practical experience or knowledge of programming is required to take part in the Project; however, some awareness of how AI systems work will be beneficial. Minimum time commitment:
Participants are encouraged to set aside additional time to work on their projects between sessions. A Moodle email forum and drop-in ‘clinic’ style support sessions will be available during the Guided Project. Lecture topics and dates
Workshop themes
Objectives By the end of the course participants should:
|
Applications for this workshop have now closed. The corpus linguistic approach to language is based on collections of electronic texts. It uses software to search and quantify various linguistic phenomena that make up patterns, which it then compares within and across texts based on their frequency. Corpus stylistics applies tools and methods from corpus linguistics to stylistic research. Corpus stylistics mainly focuses on literary texts, either individual texts or corpora. Corpora here are usually principled collections of texts, for example a collection of texts by one author, or texts from a specific period. It focuses both on more general patterns and meanings that are observable across corpora and on patterns and meanings in one individual text. In terms of the quantitative approaches that corpus stylistics employs, it is in many ways similar to work that is referred to as ‘distant reading’ and also ‘cultural analytics’. These approaches emphasise the gains that we get from looking at texts from a “distance”, i.e., in large quantities. For corpus stylistics, it is the relationship between quantitative and qualitative that is central. Therefore, research in corpus stylistics often deals with much smaller, “cleaner” data sets, so that the qualitative step in the analysis is more manageable. This workshop aims to introduce the basic corpus linguistic techniques and methods for working with literary and other texts. It aims:
|
|
Thu 11 |
Text-mining is extracting information from unstructured text, such as books, newspapers, and manuscript transcriptions. This foundational course is aimed at students and staff who are new to text-mining, and presents a basic introduction to text-mining principles and methods, with coding examples and exercises in Python. To discuss the process, we will walk through a simple example of collecting, cleaning and analysing a text. If you are interested in attending this course, please fill in, and return, the application form by Monday, 22 February 2021. Places will be prioritised for students and staff in the schools of Arts & Humanities, Humanities & Social Sciences, libraries and museums. If you study or work in a STEM department and use humanities or social sciences approaches you are also welcome to apply. |
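The collect, clean and analyse steps mentioned in this description can be sketched in a few lines of Python. This is only an illustrative outline, not the course's actual exercise: the sample sentence, the stopword list and the function names are all invented here.

```python
import re
from collections import Counter

# "Collecting": in practice this would be read from a file or a web source;
# here a short invented sentence stands in for the text.
raw_text = "The Whale! The whale, the WHALE; it surfaced again."

# A tiny, hypothetical stopword list of common words to ignore.
STOPWORDS = {"the", "it", "a", "an", "and", "of"}


def clean(text):
    """Lowercase the text and keep only runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def analyse(tokens, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs."""
    return Counter(t for t in tokens if t not in stopwords)


tokens = clean(raw_text)
frequencies = analyse(tokens)
print(frequencies.most_common(3))
```

Cleaning matters here because "Whale!", "whale," and "WHALE" only count as the same word once case and punctuation have been normalised.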
Applications for this workshop have now closed. The corpus linguistic approach to language is based on collections of electronic texts. It uses software to search and quantify various linguistic phenomena that make up patterns, which it then compares within and across texts based on their frequency. Corpus stylistics applies tools and methods from corpus linguistics to stylistic research. Corpus stylistics mainly focuses on literary texts, either individual texts or corpora. Corpora here are usually principled collections of texts, for example a collection of texts by one author, or texts from a specific period. It focuses both on more general patterns and meanings that are observable across corpora and on patterns and meanings in one individual text. In terms of the quantitative approaches that corpus stylistics employs, it is in many ways similar to work that is referred to as ‘distant reading’ and also ‘cultural analytics’. These approaches emphasise the gains that we get from looking at texts from a “distance”, i.e., in large quantities. For corpus stylistics, it is the relationship between quantitative and qualitative that is central. Therefore, research in corpus stylistics often deals with much smaller, “cleaner” data sets, so that the qualitative step in the analysis is more manageable. This workshop aims to introduce the basic corpus linguistic techniques and methods for working with literary and other texts. It aims:
|
|
Mon 15 |
Interaction with Machine Learning
Finished
Application forms should be returned to CDH Learning (learning@cdh.cam.ac.uk) by Thursday 7 January 2021. We will review applications on a rolling basis and applicants will be notified at the latest by the end of Monday 11 January. This CDH Guided Project aims to provide humanities, arts and social science researchers with an overview of current theory and practice in the design of human-computer interaction in the age of AI and equip the participants with analytical tools necessary for a critical investigation of contemporary design with AI/ML. Looking closely at interactions between humans and emerging AI systems, the workshop will also explore the potential for interaction between humanities scholars and computer scientists in the process of development and assessment of new solutions. Lectures and practical research design sessions in Interaction with Machine Learning taught by Professor Alan Blackwell and Advait Sarkar (Microsoft Research) as part of an optional course for Part III and MPhil Computer Science students will form the anchoring element of the Project. These will allow researchers without a Computer Science background to explore how key challenges in AI design are being addressed within the field of interaction design, as well as identify areas in which humanities methodologies and approaches could be adopted to improve the production process, by making it more fair, critical, and socially-aware. Participants will also take part in three workshops specifically tailored to humanities and social science researchers and will be supported in developing a mini research project investigating how humans interact with systems based on computational models. The projects may include:
Please note: no prior practical experience or knowledge of programming is required to take part in the Project; however, some awareness of how AI systems work will be beneficial. Minimum time commitment:
Participants are encouraged to set aside additional time to work on their projects between sessions. A Moodle email forum and drop-in ‘clinic’ style support sessions will be available during the Guided Project. Lecture topics and dates
Workshop themes
Objectives By the end of the course participants should:
|
Thu 18 |
Text-mining is extracting information from unstructured text, such as books, newspapers, and manuscript transcriptions. This foundational course is aimed at students and staff who are new to text-mining, and presents a basic introduction to text-mining principles and methods, with coding examples and exercises in Python. To discuss the process, we will walk through a simple example of collecting, cleaning and analysing a text. If you are interested in attending this course, please fill in, and return, the application form by Monday, 22 February 2021. Places will be prioritised for students and staff in the schools of Arts & Humanities, Humanities & Social Sciences, libraries and museums. If you study or work in a STEM department and use humanities or social sciences approaches you are also welcome to apply. |