On March 25, Elena Knudsen conducted an illuminating workshop on text analysis using R, guiding participants through the fundamentals of processing and analyzing textual data. The session introduced essential R packages, what text mining is, and what tidy text is, and demonstrated how to structure, tokenize, and analyze literary texts effectively.

The workshop began with an overview of how Jane Austen’s novels are stored in the janeaustenr package and how they can be transformed into a structured dataset. Using dplyr, participants learned to organize the text by adding line numbers and identifying chapter breaks. The session then moved into text tokenization with tidytext, breaking down the novels into individual words to facilitate further analysis.

Throughout the session, attendees explored basic text mining techniques, including word frequency analysis, and discussed potential applications of these methods in research and data-driven projects. With a mix of online and in-person participants, the workshop encouraged interactive learning and practical experimentation with R. The session concluded with an open discussion, allowing attendees to share their insights and ask questions about text analysis workflows.