
Course overview: Learning Python for text-mining and the analysis of natural language

The gROW learning environment is the official location to register for this course and explore the materials. Follow this link to go directly to the gROW page for this course. Users who are unable to access gROW are welcome to register using LibCal, but should email daniel.sharoh@ru.nl to gain access to materials and to indicate which sessions they would like to attend. Please register using gROW if possible.

In addition, users with access to gROW (those with a U/E-number) can find more information about workshops and courses offered by the Radboud Digital Competence Centre here.

This course will be in English. Location to be determined. In-person only, no virtual attendance. 

Course overview:
Text-mining refers to techniques that can involve the collection, processing and parsing of text derived from a range of sources (e.g., corpora, digital libraries, web forums). It is often performed with the goal of analyzing text to gain insights not readily available without the use of digital methods. It is useful to, or shares methods with, fields that seek to understand words and their context using computer-friendly representations (e.g., word embeddings, or word vectors), such as natural language processing, computational models of language, psycholinguistics, and digital humanities.

In this hands-on course, students and researchers will learn text-mining and natural language processing (NLP) techniques using the Python programming language for a range of use-cases. Six sessions are currently available. Participants may choose to attend whichever of these self-contained sessions they find useful, but it is recommended that participants who are new to this topic attend at least Session 3.

Readers who are interested in learning about additional tools and methods for text-mining and computer-based analysis of natural language are encouraged to consult the central library's text-mining guide, written by the text-mining support group at Information & Library Services (textminingsupport@ru.nl). Questions about these sessions can be sent via email to daniel.sharoh@ru.nl.

This course is offered twice per year, typically in April and in October.

Sessions 1/2: Python basics (9 and 10 March, 10:00-13:00)

Do you want to learn how to use Python? The Digital Competence Centre offers two sessions in Python basics to teach participants how to set up a programming environment, develop familiarity with the language, and learn a number of basic programming constructs essential to coding in an academic environment.

These introductory sessions are a "pre-panion" workshop for the other sessions. They provide training in setting up the Python language, interpreter and development environment on RU-issued computers, in essential programming constructs, and in Python syntax and quirks. After following these sessions, participants should understand data types, functions and methods, working with textual data and lists, if-statements and other conditionals, for-loops, working with files, importing libraries, writing custom functions, and more. Participants will also learn how to debug code, learn about Python by querying docstrings and other documentation, and develop a foundation for future learning. Participants who have previously set up Python environments and who have some experience programming in Python may already be familiar with this material. Materials for these sessions will be made available and can be reviewed by participants who are unsure of their own level of experience.
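To give a flavor of the constructs listed above, here is a minimal sketch that combines a custom function, a docstring, a for-loop, a string method and an if-statement. It is not taken from the course materials; the function name `count_long_words` and the sample data are hypothetical and chosen only for illustration.

```python
def count_long_words(lines, min_length=5):
    """Return how many words in `lines` have at least `min_length` characters."""
    count = 0
    for line in lines:                    # for-loop over a list of strings
        for word in line.split():         # split() is a string method
            if len(word) >= min_length:   # if-statement (a conditional)
                count += 1
    return count


# Hypothetical sample data, not from the workshop.
sample_lines = [
    "Python is a programming language",
    "It is widely used in research",
]

print(count_long_words(sample_lines))
print(count_long_words.__doc__)  # querying the docstring
```

The last line shows one way of querying a docstring; calling `help(count_long_words)` in an interactive session displays the same text.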

Session 3: Introduction to the analysis of natural language in Python (16 March, 10:00-13:00)

Do you want to learn how to use Python to analyze natural language? The Digital Competence Centre organizes an educational session on this topic.

This session provides an introduction to preprocessing, manipulating and analyzing linguistic and textual properties of natural language text in Python, with limited time dedicated to basic concepts in the Python language. Participants in this workshop will learn how to use the Natural Language Toolkit (NLTK) to segment or tokenize a given text in English. They will also learn techniques to analyze text based on this information. This could mean, for example, identifying the individual sentences and paragraphs in a text, labeling all words in a text with a part-of-speech tag, and analyzing word-frequency or co-occurrence statistics for all identified nouns. As a case-study, participants will perform a sentiment analysis (similar to the concept of valence) to quantify properties of the text that can be related to aspects of its emotional content. After this workshop, participants will be prepared to further develop their programming skills individually, and they will be able to write simple text processing scripts that can be integrated into a larger workflow.
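As a rough illustration of sentence segmentation, tokenization and word-frequency counting, the sketch below uses only the Python standard library. The session itself uses NLTK, whose tokenizers are considerably more robust; the regular expressions and the helper names `split_sentences` and `tokenize` here are simplifying assumptions and not part of NLTK or the course materials.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


# Hypothetical sample text, not from the workshop.
text = "The cat sat on the mat. The dog barked! Did the cat run?"

sentences = split_sentences(text)
tokens = [token for sentence in sentences for token in tokenize(sentence)]
frequencies = Counter(tokens)  # maps each token to its number of occurrences

print(sentences)
print(frequencies.most_common(3))
```

A splitter this simple will mis-handle abbreviations such as "e.g." or "Dr.", which is one reason to prefer a dedicated toolkit for real text.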

Session 4: Syntactic parsing of natural language in Python (17 March, 10:00-13:00)

Do you want to learn how to use Python to parse the syntactic structure of natural language? The Digital Competence Centre organizes an educational session on this topic.

Natural languages such as English and Dutch can be analyzed in terms of both their syntactic and semantic properties. More "semantic" techniques such as sentiment analysis often analyze words and their context to calculate the "sentiment" of a news article or piece of text, but considerable information can also be derived from the syntactic structure of sentences. In this workshop, participants will learn to parse sample text in Dutch or English that has been prepared for this workshop (e.g., to derive syntactic trees). Participants will also learn to visualize these parses using libraries in Python (and LaTeX). The ultimate goal of the workshop is to produce a simple analysis of the syntactic complexity of workshop-provided sentences, which can then be used for a number of downstream analyses, for example analyzing student essays, predicting response reaction time for experimental stimuli, or assessing readability. Similar analyses might also be performed to understand the distribution of structure types in a given text (e.g., active vs. passive voice, double object datives). This session is therefore useful for anyone who would like to analyze the syntactic properties of text at large or small scales.
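One simple proxy for syntactic complexity is the depth of a sentence's parse tree. The sketch below assumes the parse is already available and represents it as nested tuples; the trees are hand-written for illustration, and the representation and the function name `tree_depth` are hypothetical. This is not necessarily the complexity measure or the parser used in the workshop.

```python
def tree_depth(tree):
    """Return the depth of a parse tree given as nested tuples.

    A tree is (label, child, child, ...); a leaf is a plain string (a word)
    and contributes a depth of 0.
    """
    if isinstance(tree, str):
        return 0
    _label, *children = tree
    return 1 + max((tree_depth(child) for child in children), default=0)


# Hand-written example parses (assumptions, not parser output).
simple = ("S", ("NP", "dogs"), ("VP", "bark"))
embedded = (
    "S",
    ("NP", "dogs"),
    ("VP", "bark",
        ("SBAR", "when",
            ("S", ("NP", "cats"), ("VP", "run")))),
)

print(tree_depth(simple))
print(tree_depth(embedded))
```

The sentence with an embedded clause yields a deeper tree than the simple one, which is the kind of difference a complexity analysis can pick up.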

Sessions 5/6: Generating and analyzing sentence and word embeddings in Python (23 and 24 March, 10:00-13:00)

Do you want to learn how to use Python to generate and analyze vector representations of words and sentences? The Digital Competence Centre organizes educational sessions on this topic.

Word/sentence embeddings, or simply embeddings, are essential to natural language processing and useful to disciplines which benefit from analyzing numerical representations of words and larger chunks of text. Embeddings can capture aspects of the meaning of words and their context-dependent relationships to other words in a vocabulary. They are also critical for large language models and are regularly used in processes such as "semantic search." In this workshop, participants will learn to set up a pipeline to analyze the semantic similarity of sentences in a sample text. The workshop will cover embedding models and how to select, install and use them. Finally, participants will learn simple techniques to group, cluster and visualize sentences in a text based on these embeddings. Simple applications for this information might be determining the diversity of ideas in an article, how coherent a text is, or the prevalence of ideas over time in specific genres (e.g., sci-fi novels). With a small amount of tweaking, these embeddings might also be used to implement semantic search over a user's documents, which is commonly performed with LLM chatbots (e.g., "chat with your documents") but does not depend on them.
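Similarity between sentence vectors is commonly measured with cosine similarity. The sketch below shows the calculation using only the standard library. As a loud caveat, it uses word-count vectors as a stand-in for learned embeddings, so it captures word overlap and not meaning; an embedding model such as those covered in the workshop would supply dense vectors in their place. The helper names and sample sentences are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Stand-in 'embedding': a sparse vector of word counts (not a learned embedding)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity with an empty vector is defined as 0 here
    return dot / (norm_a * norm_b)


# Hypothetical sample sentences, not from the workshop.
s1 = bag_of_words("the cat sat on the mat")
s2 = bag_of_words("the cat lay on the rug")
s3 = bag_of_words("stock prices fell sharply")

print(cosine_similarity(s1, s2))  # sentences sharing several words
print(cosine_similarity(s1, s3))  # sentences sharing no words
```

The same `cosine_similarity` logic applies to dense embedding vectors; only the way the vectors are produced changes.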

Related LibGuide: Text mining by Nina Lanke

Date:
Monday, March 9, 2026
Time:
10:00am - 1:00pm
Faculty:
All faculties
Categories:
DCC, Digital Humanities, Text Mining

Registration is required. There are 46 seats available.

Teacher(s)

Daniel Sharoh
