
Generating and analyzing sentence and word embeddings in Python: Session 4/4 in Learning Python for text-mining and the analysis of natural language

Users with access to the gROW learning environment (those with a U/E-number) can find more information about workshops and courses offered by the Radboud Digital Competence Centre here. The gROW environment is the official location for these materials, and gROW is preferred to libcal for signing up and for exploring our materials. Follow this link to go directly to the gROW page for the current workshop series. Users who are unable to access gROW are welcome to register using libcal. An overview of the currently scheduled digital methods trainings can be found on libcal after filtering by the category "DCC" (linked here for convenience).

Series overview:

Text-mining refers to techniques for collecting, processing and parsing text derived from a range of sources (e.g., corpora, digital libraries, web forums). Text-mining is often performed with the goal of analyzing text to gain insights not readily available without the use of digital methods. It is useful to, or shares methods with, fields that seek to understand words and their context using computer-friendly representations (e.g., word embeddings or word vectors), such as natural language processing, computational models of language, psycholinguistics, and digital humanities. In this workshop series, students and researchers will learn text-mining and natural language processing (NLP) techniques using the Python programming language for a range of use cases. Four workshop sessions are currently available. Participants may choose to attend whichever of these self-contained sessions they find useful, but it is recommended that participants who are new to this topic attend at least Session 2. Readers who are interested in learning about additional tools and methods for text-mining and computer-based analysis of natural language are encouraged to consult the recently published text-mining guide, written by the text-mining support group at Information & Library Services (textminingsupport@ru.nl). Questions about these workshops or their contents can be sent via email to daniel.sharoh@ru.nl.

****************

Session 4: Generating and analyzing sentence and word embeddings in Python

Do you want to learn how to use Python to generate and analyze vector representations of words and sentences? The Digital Competence Centre is organizing a workshop session on this topic that is suitable for participants with a broad range of skill levels.

Word/sentence embeddings, or simply embeddings, are essential to natural language processing and useful to disciplines that benefit from analyzing numerical representations of words and larger chunks of text. Embeddings can capture aspects of the meaning of words and their context-dependent relationships to other words in a vocabulary. They are also critical for large language models and are regularly used in processes such as "semantic search." In this workshop, participants will learn to set up a pipeline to analyze the semantic similarity of sentences in a sample text. The workshop will cover embedding models and how to select, install and use them. Finally, participants will learn simple techniques to group, cluster and visualize sentences in a text based on these embeddings. Simple applications for this information might include determining the diversity of ideas in an article, the coherence of a text, or the prevalence of ideas over time in specific genres (e.g., sci-fi novels). With a small amount of tweaking, these embeddings might also be used to implement semantic search over a user's documents, which is commonly performed with LLM chatbots (e.g., "chat with your documents") but does not depend on them.
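
To give a rough idea of what such a pipeline involves, here is a minimal sketch in plain Python. It is not the workshop's own material. The function toy_embed is a hypothetical stand-in that turns each sentence into a vector of word counts, so that the example runs without installing anything. In practice a pretrained embedding model would take its place and return dense vectors, and the similarity function would need to be adapted accordingly. The sample sentences, the threshold of 0.4 and all function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Hypothetical stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[word] * v[word] for word in u if word in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_sentences(sentences, threshold=0.4, embed=toy_embed):
    """Greedy grouping: a sentence joins the first group whose first member
    is at least `threshold` similar to it; otherwise it starts a new group."""
    vectors = [embed(s) for s in sentences]
    groups = []  # each group is a list of sentence indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def search(query, sentences, embed=toy_embed):
    """Rank sentences by similarity to the query, most similar first."""
    query_vec = embed(query)
    return sorted(
        sentences,
        key=lambda s: cosine_similarity(query_vec, embed(s)),
        reverse=True,
    )


# Illustrative sample text (invented for this sketch).
sentences = [
    "The spaceship landed on the red planet.",
    "A spaceship left the red planet at dawn.",
    "She baked bread in the kitchen.",
    "Fresh bread cooled in the kitchen window.",
]

groups = group_sentences(sentences)
best_match = search("bread in a kitchen", sentences)[0]
print(groups)
print(best_match)
```

Word counts only detect shared vocabulary, so this sketch would miss sentences that express the same idea in different words. Capturing that kind of similarity is the reason to use a trained embedding model, while the grouping and search steps keep the same overall shape.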

Please follow the links below to sign up for the other sessions:

Link to Session 1

Link to Session 2

Link to Session 3

Related LibGuide: Text mining by Nina Lanke