
Extracting text from academic articles using Grobid (Workshop/code-along)

Users with access to the gROW learning environment (those with a U/E-number) can find more information about workshops and courses offered by the Radboud Digital Competence Centre here. The gROW environment is the official location for these materials, and gROW is preferred over libcal for sign-ups and for exploring our materials. Follow this link to go directly to the gROW page for the current workshop series. Users who are unable to access gROW are welcome to register using libcal. An overview of the currently scheduled digital methods trainings can be found on libcal by filtering on the category "DCC" (linked here for convenience).

Series overview:

Extracting text from academic articles using Grobid (workshop/code-along)

When performing a literature review, whether to better understand a topic or to write a systematic or scoping review, researchers are often faced with the task of synthesizing information from a daunting number of articles. In this workshop session, participants will learn a range of computational tools and methods for approaching this problem. Several of these involve the use of a software workflow developed at the RU Library to extract and analyze text from PDFs contained in a reference manager library (e.g. Zotero or EndNote). Article text can furthermore be extracted by section (e.g. Introduction, Methods), which allows the use of tools and methods conventionally applied only to article abstracts. For example, the extracted text might be prepared as input to ASReview, analyzed with natural language processing tools, or summarized.
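As a rough illustration of section-based extraction: Grobid returns each parsed article as TEI XML, in which body sections appear as div elements with a head (the section title) and p (paragraph) children. The sketch below pulls the paragraphs of one section out of such a file. It is not the RU Library workflow itself; the function name extract_section and the hand-written SAMPLE_TEI string are assumptions made for this example, and real Grobid output contains many more elements.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in Grobid's XML output.
TEI = "http://www.tei-c.org/ns/1.0"


def extract_section(tei_xml, heading):
    """Return the paragraphs of every body section whose title contains `heading`.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{{{TEI}}}body")
    if body is None:
        return ""
    paragraphs = []
    for div in body.iter(f"{{{TEI}}}div"):
        head = div.find(f"{{{TEI}}}head")
        if head is None:
            continue
        title = "".join(head.itertext())
        if heading.lower() not in title.lower():
            continue
        for p in div.findall(f"{{{TEI}}}p"):
            # Collapse whitespace left over from the XML layout.
            paragraphs.append(" ".join("".join(p.itertext()).split()))
    return "\n\n".join(paragraphs)


# Hand-written, heavily simplified stand-in for a Grobid TEI document.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head>
        <p>Reviews synthesize many articles.</p>
        <p>Manual reading does not scale.</p>
      </div>
      <div><head n="2">Methods</head>
        <p>We parsed each PDF.</p>
      </div>
    </body>
  </text>
</TEI>"""

if __name__ == "__main__":
    print(extract_section(SAMPLE_TEI, "Introduction"))
```

Matching on the section title is a simple heuristic: articles that label their opening section differently (for example "Background") would need a broader list of accepted headings.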

The use case for this session is extracting text from the introduction sections of these articles, which can then optionally be used to generate a summary or parsed further, depending on the needs of the participant (e.g. for use in ASReview). Participants will also learn how to create PDFs from the extracted and parsed texts and re-integrate them with reference manager software via a references file in the commonly used RIS format.
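For readers unfamiliar with RIS: it is a plain-text format in which each line starts with a two-letter tag, two spaces, a hyphen, and a space, and each record ends with an ER line. The sketch below writes one record that points to a generated PDF through the L1 (file attachment) tag. The function name to_ris, the sample values, and the choice of tags are assumptions for illustration, not the format produced by the workshop scripts, and reference managers differ in how they treat attachment paths on import.

```python
def to_ris(title, authors, year, pdf_path, ref_type="JOUR"):
    """Build a single RIS record as a string (hypothetical helper).

    Each line follows the RIS pattern: TAG, two spaces, hyphen, space, value.
    """
    lines = [f"TY  - {ref_type}"]               # record type opens the record
    lines += [f"AU  - {name}" for name in authors]  # one AU line per author
    lines.append(f"TI  - {title}")
    lines.append(f"PY  - {year}")
    lines.append(f"L1  - {pdf_path}")           # path to the attached PDF
    lines.append("ER  - ")                      # end-of-record marker
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only.
    record = to_ris(
        title="Example article (introduction only)",
        authors=["Doe, Jane", "Roe, Richard"],
        year=2024,
        pdf_path="extracted/example_intro.pdf",
    )
    print(record)
```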

All necessary software will be provided in a virtual disk image accessible on Zenodo. A companion guide to this procedure and a repository of related scripts will also be made available on the RU Library GitHub page.

Date:
Thursday, November 13, 2025
Time:
1:30pm - 3:30pm
Location:
1.05B Central Library Instruction Room
Campus:
Central Library
Faculty:
All faculties
Categories:
DCC, Literature search, Reference managers, Text Mining
Registration has closed.

Teacher(s)

Daniel Sharoh
