Sebastian Hickman, PhD student, AI for the study of Environmental Risks CDT and Yusuf Hamied Department of Chemistry
27 February 2023
Let’s imagine that there is a new technology that purports to reduce emissions of harmful air pollutants from power plants. If we are going to invest in this technology, we need to quantify its positive impact, such as how much it reduces deaths from air pollution in nearby areas. This might seem simple: we can count the number of deaths before the technology was implemented and compare this to the number of deaths after it was rolled out. However, in many cases this will not give the true causal effect of the technology, as other factors may also affect deaths from air pollution. For example, what if, at the same time as the technology was implemented, governments placed restrictions on air pollution emissions from cars? How do we work out the causal effect of the power plant technology on deaths, given this confounder? These are the complicated questions that causal methods seek to answer.
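To make the problem concrete, here is a minimal simulated sketch of the scenario above. Everything in it is an assumption made for illustration: the effect sizes, the number of areas, and in particular the existence of comparison areas far from the power plant that experience the same car restrictions but not the new technology. Under those assumptions, a difference-in-differences comparison (one simple causal technique, not necessarily the one a real study would use) separates the two effects where a plain before/after comparison cannot.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical effect sizes (change in annual deaths per area), for illustration only.
TRUE_TECH_EFFECT = -5.0   # effect of the power plant technology
CAR_POLICY_EFFECT = -3.0  # effect of the concurrent car restrictions (the confounder)
N_AREAS = 2000            # number of simulated areas in each group


def simulate(treated):
    """Return (before, after) lists of annual deaths for N_AREAS simulated areas.

    Every area is affected by the car restrictions; only treated areas
    (those near the power plant) also receive the technology's effect.
    """
    before = [50.0 + random.gauss(0, 2) for _ in range(N_AREAS)]
    shift = CAR_POLICY_EFFECT + (TRUE_TECH_EFFECT if treated else 0.0)
    after = [50.0 + shift + random.gauss(0, 2) for _ in range(N_AREAS)]
    return before, after


near_before, near_after = simulate(treated=True)   # areas near the power plant
far_before, far_after = simulate(treated=False)    # assumed comparison areas

# Naive estimate: before/after change near the plant (mixes both effects).
naive_estimate = mean(near_after) - mean(near_before)

# Difference-in-differences: subtract the change seen in the comparison areas.
did_estimate = naive_estimate - (mean(far_after) - mean(far_before))

print(f"naive before/after estimate: {naive_estimate:.2f}")
print(f"difference-in-differences:   {did_estimate:.2f}")
```

In this simulation the naive estimate comes out at roughly the sum of the two effects, while the difference-in-differences estimate is close to the technology's effect alone. That only works because the simulated comparison areas would have followed the same trend as the treated areas without the technology, which is exactly the kind of assumption that needs careful justification with real data.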
Cause for optimism
Causal methods provide an exciting set of tools to determine and quantify the causal effects of interventions on systems of interest such as the environment, going beyond traditional correlative methods to analyse a specific situation or problem and better explain the patterns of relationships between variables. Where traditional statistical and machine learning methods learn from correlations in data, causal methods focus on identifying and quantifying the causal relationships in data.
Furthermore, causal methods may provide a possible solution to the reduced predictive performance of traditional statistical and machine learning methods on out-of-distribution data, and allow practitioners to determine the effect of interventions on variables from observational data.
This is of particular interest to environmental researchers, for whom experiments to quantify causal relationships are typically impossible or even unethical. Examples include experimenting with geoengineering, intentionally increasing emissions of greenhouse gases to see what happens, or intentionally deforesting an area to determine whether that affects biodiversity.
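The difference between a correlation and a causal effect in observational data can be shown with a small simulated sketch. The variables here are entirely hypothetical: a binary confounder `z` influences both whether a binary intervention `x` happens and the outcome `y`. The sketch assumes that `z` is the only confounder and that it is observed, which is a strong assumption in practice. Under that assumption, averaging the within-group differences over the distribution of `z` (a simple form of adjustment by stratification) recovers the effect that a naive comparison overstates.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical values, chosen for illustration only.
TRUE_EFFECT = 2.0        # causal effect of the intervention x on the outcome y
CONFOUNDER_EFFECT = 3.0  # effect of the confounder z on the outcome y

rows = []
for _ in range(20000):
    z = random.random() < 0.5                  # confounder
    x = random.random() < (0.8 if z else 0.2)  # intervention is more likely when z is true
    y = TRUE_EFFECT * x + CONFOUNDER_EFFECT * z + random.gauss(0, 1)
    rows.append((z, x, y))


def mean_outcome(data, x, z=None):
    """Mean of y for rows with the given x, optionally restricted to one value of z."""
    return mean(yi for (zi, xi, yi) in data if xi == x and (z is None or zi == z))


# Correlational comparison: ignores the confounder entirely.
naive = mean_outcome(rows, True) - mean_outcome(rows, False)

# Adjusted comparison: compare within each level of z, then weight by how common it is.
adjusted = 0.0
for z_value in (False, True):
    p_z = sum(1 for row in rows if row[0] == z_value) / len(rows)
    adjusted += p_z * (mean_outcome(rows, True, z_value) - mean_outcome(rows, False, z_value))

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

Here the naive difference is inflated because the intervention mostly occurs when the confounder is present, whereas the adjusted difference sits close to the true effect. With real environmental data the confounders are rarely this simple or fully measured, which is part of why applying these methods properly takes care.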
Workshopping progress
Causal methods have yet to be widely adopted in environmental science. While these methods have enormous potential in the field, there is a danger that, if applied improperly, they may lead to incorrect conclusions that could misinform real-world policy, damage morale in fighting global warming, or even play into the hands of climate change deniers keen to capitalise on any mistakes.
To help data scientists understand the current benefits and limitations of causal methods in typically complex environmental problems, and to identify which areas of environmental science would most benefit from the application of causal methods, we are setting up a network of researchers across Cambridge and the UK. In the longer term, we hope the network will lead this new field and act as a hub for future funding applications. The first part of our endeavour was a kick-off meeting in December funded by Accelerate Science and the Cambridge Centre for Data Driven Discovery. The workshop brought together causal researchers and environmental data scientists. Four keynote speakers from research organisations across the world discussed their work, the areas of environmental science in which causal methods have already been applied successfully, and the questions these methods could help answer in the future.
The experts also talked about best practices in applying causal methods and in developing software with existing code packages and tools such as DoWhy and Tigramite, offering attendees valuable technical information they could take away and use. In addition, there were breakout sessions and a social dinner to encourage networking.
Cause for excitement
To build upon this event, we plan on organising a further two-day meeting and collaborative sessions. We hope these will deliver new collaborations between causal and environmental researchers and identify areas of common interest as well as the tools required by environmental researchers to facilitate applied causal research with real-world impact.
We plan to set up smaller working groups where researchers can voice problems they are interested in addressing and where others can help out. Some promising suggestions for groups include robust out-of-sample prediction in air quality forecasting, the quantification of the effect of interventions on environmental risk using causal techniques, and the identification of climate tipping points and fast climate transitions. We are also interested in holding a more practical event with coding examples to give attendees hands-on experience with code packages and tools and exposure to best practice. Currently there are few formal workshops or courses on causal methods in environmental science, so developing follow-up events and building a community would be hugely beneficial. Attendees have already been spreading the word to other researchers, so we are hopeful that future workshops will prove popular.
Growing a community centred on causal methods would certainly be a cause for celebration.
Sebastian and Paul Griffiths were awarded funding through the Accelerate Science and Cambridge Centre for Data Driven Discovery funding scheme in 2022. You can read more about the awarded projects here.