Exploring the use of LLMs in humanities and social sciences research

Petre Breazu, Senior Research Associate, Department of Theoretical and Applied Linguistics

18 October 2024

The rapid advancement of Large Language Models (LLMs) such as OpenAI's GPT-4 is revolutionising research across various disciplines, including the humanities and social sciences. Over the next five years, my research project at the Department of Theoretical and Applied Linguistics will explore the diverse applications of LLMs within these fields. My interdisciplinary project will focus on three key areas: critical analysis of AI-generated narratives in the media sector, enhancing qualitative research methodologies, and improving online hate speech detection. By combining AI technology with the expertise of humanities and social sciences researchers, we aim to uncover new insights, improve research efficiency, and address pressing social issues.

Exploring AI-generated narratives in the media sector

The rapid emergence of generative AI models in the media sector requires an examination of the narratives these models produce, particularly regarding sensitive topics such as politics, racism, immigration, public health, gender, and violence. While generative AI offers vast opportunities for content creation by professionals and amateurs alike, it also poses significant challenges. One major concern is the potential harm from amplifying biases or spreading misinformation.

LLMs like GPT-4, trained on large datasets, can produce diverse content but also carry the societal prejudices and biases inherent in the data they were trained on. Consequently, these models can reproduce stereotypes and amplify existing prejudices. Moreover, LLMs can generate content that appears factual and authoritative but is largely fictional, making them potential tools for disseminating misinformation, especially on sensitive or controversial topics – a challenge we also discussed in the LLM Study Group run by the Accelerate Programme, which I joined in Michaelmas Term last year.

This research aims to critically analyse these AI-generated narratives and compare them with traditional media articles to identify patterns of representation, inherent biases, and potential discrepancies. To achieve this, we use well-established methods from the humanities and social sciences to systematically analyse textual and multimodal data from both AI and traditional media sources. For example, we analyse the coverage of a major political event by collecting articles generated by an LLM and comparing them with those from established news outlets. By examining these datasets, we can assess whether the LLM tends to present the event with a particular slant, and how its tone and political framing compare with those of human-written articles. The literature we discussed in our study group helped us better contextualise our findings.
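As a concrete illustration of the collection step, a minimal sketch in Python might look like the following. The model name, prompt wording, and event list here are placeholders rather than our actual study design:

```python
# A minimal sketch of the data-collection step, assuming access to the
# OpenAI API. Model, prompts, and events are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EVENTS = ["a major political event"]  # hypothetical topic list

def generate_llm_coverage(event: str) -> str:
    """Ask the model to write a short news article about an event."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a news journalist."},
            {"role": "user",
             "content": f"Write a 300-word news article about {event}."},
        ],
    )
    return response.choices[0].message.content

# Pair each LLM article with human-written coverage of the same event
# (collected separately), so both can be coded with the same scheme.
corpus = [{"event": e, "llm_article": generate_llm_coverage(e)} for e in EVENTS]
print(json.dumps(corpus, indent=2)[:500])
```

Both sets of articles can then be coded with the same analytical scheme, which is what makes the comparison of slant and framing possible.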

Through this detailed examination, we aim to uncover the broader implications of using AI in media production and develop strategies to mitigate any risks, ensuring that AI’s role in content creation is both responsible and ethical.

Advancing qualitative analysis in the humanities and social sciences

The second area of my research focuses on the potential of LLMs to enhance qualitative research methodologies in the humanities and social sciences, especially for researchers who work with large amounts of data. Together with Professor Napoleon Katsos and colleagues in computational linguistics at the Department of Theoretical and Applied Linguistics, I run experiments in which we replicate qualitative analyses performed by human researchers and compare them with the same analyses performed by LLMs. The Accelerate Programme's course provided valuable insights into how LLMs can process large, diverse datasets efficiently, as well as innovative strategies for working with incomplete or unstructured data.

Drawing upon this, we aim to integrate LLMs to improve the scalability and efficiency of qualitative analysis, including thematic analysis and various forms of text, discourse, and multimodal analysis. For our first experiment with LLMs for thematic analysis, we employed a two-fold approach using OpenAI's GPT-4 via the OpenAI API. Initially, GPT-4 was tasked with inductively categorising YouTube comments from an existing dataset. We directed it to follow Braun and Clarke's (2006) six steps of thematic analysis, which included reading the data, coding by highlighting key phrases, identifying overarching themes, and providing descriptions for each theme. The dataset was fed in small batches, without pre-defined categories, so that the model could analyse the comments independently and themes could emerge organically. We then compared GPT-4's identified themes with those found by a human qualitative researcher, with additional assessment by four experts familiar with the dataset.

In the second phase, we experimented with various prompts instructing GPT-4 to deductively assign each comment to one of the established categories. To ensure a generalisable comparison of GPT-4's categorisation quality, we conducted the deductive analysis twice: once with the categories created by GPT-4 and once with categories assembled by a qualitative researcher familiar with the field. This comprehensive approach allowed us to assess GPT-4's thematic classification capabilities, its alignment with human evaluators, and its overall efficacy in qualitative research tasks.
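For readers curious what this looks like in practice, the skeleton of the two-phase set-up might resemble the sketch below. The prompt wording, batch size, and model settings are illustrative placeholders, not the exact instructions we used:

```python
# A simplified sketch of the two-phase thematic analysis, using the
# OpenAI Python client. All prompt wording here is a hypothetical
# stand-in for the study's actual instructions.
from openai import OpenAI

client = OpenAI()

INDUCTIVE_PROMPT = (
    "Following Braun and Clarke's (2006) phases of thematic analysis, "
    "read the YouTube comments below, code key phrases, and propose "
    "overarching themes with a short description of each.\n\n{batch}"
)

DEDUCTIVE_PROMPT = (
    "Assign the comment below to exactly one of these categories: "
    "{categories}. Reply with the category name only.\n\nComment: {comment}"
)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep output stable for comparability
    )
    return response.choices[0].message.content.strip()

def inductive_phase(comments: list[str], batch_size: int = 20) -> list[str]:
    """Phase 1: feed comments in small batches, with no predefined categories."""
    reports = []
    for i in range(0, len(comments), batch_size):
        batch = "\n".join(comments[i : i + batch_size])
        reports.append(ask(INDUCTIVE_PROMPT.format(batch=batch)))
    return reports

def deductive_phase(comments: list[str], categories: list[str]) -> list[str]:
    """Phase 2: assign each comment to one of the established categories."""
    cats = ", ".join(categories)
    return [ask(DEDUCTIVE_PROMPT.format(categories=cats, comment=c))
            for c in comments]
```

Running the deductive phase twice, once with the model's own categories and once with the researcher's, only requires swapping the `categories` argument.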

Our experiments combine human expertise with AI capabilities to document both the advantages and the limitations of using LLMs in qualitative research. This research explores the potential of LLMs to process large datasets, streamline analysis, and reduce subjectivity, without having to develop a model from scratch or even fine-tune one.

Integrating LLMs like GPT-4 into qualitative research offers promising opportunities but also presents challenges. Our first study shows that while GPT-4 can generate useful initial categorisations, the depth and specificity provided by human researchers remain crucial for accurate and meaningful analysis. For example, GPT-4's neutral approach sometimes misclassifies comments, whereas human researchers can accurately identify specific themes thanks to their deep understanding of socio-political contexts. Additionally, the model's tendency to avoid explicitly labelling hate speech underscores the need for human interpretation when categorising sensitive data. Using theory-driven prompts with predefined frameworks helps align GPT-4's categorisations more closely with human analysis, and supplying the model with context-specific information and clear methodological steps enhances its accuracy.

Our research pushes the field forward by refining the synergy between human intelligence and LLM capabilities, focusing on integrating in-context learning and theory-driven prompts into LLM training and deployment. We aim to enhance the interpretative abilities of LLMs by incorporating feedback mechanisms that allow human researchers to interactively refine and guide the model's analysis. This collaborative approach draws on the strengths of both AI and human expertise to improve qualitative research outcomes.
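To make "theory-driven prompt" more concrete, the sketch below shows one way such a prompt could be assembled from a predefined framework plus an in-context example. The category definitions and the worked example are hypothetical stand-ins, not the framework used in our study:

```python
# A hedged illustration of building a theory-driven, few-shot prompt.
# The framework entries and the example comment are invented for
# demonstration purposes only.
FRAMEWORK = {
    "dehumanising language": "portrays a group as less than human",
    "victim-blaming": "holds the targeted group responsible for harm done to it",
    "neutral/other": "none of the above applies",
}

FEW_SHOT = [
    ("They swarm in like insects.", "dehumanising language"),  # illustrative
]

def build_prompt(comment: str) -> str:
    """Combine category definitions and worked examples into one prompt."""
    defs = "\n".join(f"- {name}: {desc}" for name, desc in FRAMEWORK.items())
    shots = "\n".join(f"Comment: {c}\nCategory: {label}" for c, label in FEW_SHOT)
    return (
        "Classify the comment using these definitions:\n"
        f"{defs}\n\nExamples:\n{shots}\n\n"
        f"Comment: {comment}\nCategory:"
    )
```

A prompt produced by `build_prompt` can then be sent to the model in the same way as the deductive prompts above.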

Enhancing hate speech detection online

We are beginning work on the third strand of this project, which aims to fine-tune existing LLMs to improve the detection of online hate speech, a problem that is widespread yet continues to evade existing content moderation tools. While current AI tools have some success in detecting harmful content, they often struggle to identify hate speech conveyed through indirect or non-literal language, such as irony, or hate speech embedded in multimodal content such as images and videos. This strand is still in its infancy, but our aim is to overcome these limitations by developing a cloud-based service and a web-based application for improved hate speech detection. I intend to keep engaging with the Accelerate Programme, as it's a great resource for researchers like me who do not have a background in computer science, offering valuable support and advice as well as opportunities to collaborate and network with experts.
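Although this strand is at an early stage, one plausible first step is converting labelled examples into the chat-style JSONL format accepted by OpenAI's fine-tuning endpoint. The label set and the example post below are purely illustrative:

```python
# A sketch of preparing fine-tuning data for a hate speech classifier.
# Labels, system prompt, and the example post are hypothetical.
import json

SYSTEM = "Label the post as 'hate', 'ambiguous' or 'none'."

labelled = [
    {"text": "Example post text...", "label": "none"},  # placeholder data
]

with open("hate_speech_finetune.jsonl", "w", encoding="utf-8") as f:
    for row in labelled:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": row["text"]},
                {"role": "assistant", "content": row["label"]},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting file could then be uploaded for fine-tuning; the harder research questions, such as capturing irony and multimodal hate, sit in how the training examples are selected and labelled rather than in this plumbing.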

Bringing everything together

These projects may seem different, but they are closely connected. My five-year research project at the Department of Theoretical and Applied Linguistics examines the human-AI synergy in humanities and social science research. The aim is to use the expertise in these fields to critically analyse and understand the implications, biases, and societal impacts of AI technologies. Simultaneously, the project explores the capabilities of LLMs to improve research methodologies and offer solutions to contemporary social problems, such as hate speech on social media.

Together, these projects aim to integrate LLMs into humanities and social sciences research to enhance analysis, promote ethical AI use, and address social issues. In the future, I hope this project leads to more responsible AI applications in media, research, and content moderation, ultimately contributing to a more informed and fair society. I hope that capitalising on combined human and AI expertise will allow us to develop effective methods to tackle contemporary social problems and advance research methodologies across disciplines.