
Extracting text from academic articles using Grobid (Workshop/code-along)
When performing a literature review, whether to better understand a topic or to write a systematic or scoping review, researchers often face the task of synthesizing information from a daunting number of articles. In this workshop session, participants will learn a range of software tools and methods for approaching this problem. Several of these involve the use of a software workflow developed at the RU Library to extract and analyze text from PDFs contained in a reference manager library (e.g. Zotero or EndNote). Article text can furthermore be extracted by section (e.g. Introduction, Methods), which allows for the use of tools or methods conventionally applied only to article abstracts. For example, the extracted text might be prepared as input to ASReview, analyzed with natural language processing tools, or summarized.
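As a rough illustration of section-based extraction: Grobid returns a TEI XML document in which each body section is a `div` with a `head` element. The sketch below is not the RU Library workflow. It only assumes that layout, and it uses a hand-written TEI fragment (`SAMPLE_TEI`) and a hypothetical helper (`extract_section`) in place of real Grobid output.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents, including those produced by Grobid.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written TEI fragment standing in for real Grobid output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head>
        <p>Systematic reviews require screening many articles.</p>
        <p>Full-text sections can support that screening.</p>
      </div>
      <div><head n="2">Methods</head>
        <p>We processed PDFs from a reference manager library.</p>
      </div>
    </body>
  </text>
</TEI>"""


def extract_section(tei_xml: str, heading: str) -> str:
    """Return the paragraphs of the first body section whose heading matches.

    Matching is case-insensitive and exact; real articles often use
    variants such as "Background", which this sketch does not handle.
    """
    root = ET.fromstring(tei_xml)
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        if head is None or head.text is None:
            continue
        if head.text.strip().lower() == heading.lower():
            paragraphs = [
                "".join(p.itertext()).strip()
                for p in div.iterfind("tei:p", TEI_NS)
            ]
            return "\n".join(paragraphs)
    return ""  # no section with that heading


introduction = extract_section(SAMPLE_TEI, "Introduction")
print(introduction)
```

Section headings vary between journals, so a practical version would likely need a list of accepted heading variants rather than one exact string.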
The use case for this session is to extract text from the introduction sections of these articles, which can then optionally be summarized or parsed further depending on the needs of the participant (e.g. for use in ASReview). Participants will also learn how to create PDFs from the extracted and parsed text for re-integration with reference manager software via the commonly used RIS reference file format.
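To give a sense of what the RIS step involves: an RIS record is a sequence of tagged lines that starts with `TY` and ends with `ER`. The sketch below builds one such record that points to a generated PDF. The helper `to_ris`, the sample metadata, and the file path are all hypothetical. Using the `L1` tag for the attachment is an assumption about how the reference manager handles file links on import, not a description of the workshop's own scripts.

```python
def to_ris(title: str, authors: list[str], year: int, pdf_path: str) -> str:
    """Build a single RIS record as a string.

    Each line is a two-character tag, two spaces, a hyphen, a space,
    and then the value.
    """
    lines = ["TY  - JOUR"]                       # record type: journal article
    lines += [f"AU  - {name}" for name in authors]  # one AU line per author
    lines.append(f"TI  - {title}")
    lines.append(f"PY  - {year}")
    lines.append(f"L1  - {pdf_path}")            # assumed tag for the attached file
    lines.append("ER  - ")                       # end of record
    return "\n".join(lines) + "\n"


# Hypothetical metadata for one extracted introduction.
record = to_ris(
    title="Example article (introduction only)",
    authors=["Doe, Jane", "Roe, Richard"],
    year=2024,
    pdf_path="extracted/example_introduction.pdf",
)
print(record)
```

Whether the attachment is picked up on import depends on the reference manager and on how the path is written, so this should be checked with a single record before a whole library is processed.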
All necessary software will be provided in a virtual disk image accessible on Zenodo. A companion guide to this procedure, as well as a repository of related scripts, will also be made available on the RU Library GitHub page.
- Date: Thursday, November 13, 2025
- Time: 1:30pm - 3:30pm
- Location: 1.05B Central Library Instruction Room
- Campus: Central Library
- Faculty: All faculties
- Categories: DCC, Literature search, Reference managers, Text Mining