Accelerate Lunchtime Seminar Series

Starts: 2025/12/15 at 12:00

Ends: 2025/12/15 at 13:00

Join us to find out more about research taking place in AI for Science across the Accelerate Science community.

Details of future talks are available on Talks@Cam.

Lunch is provided. Please register to attend via this form so that we can confirm catering arrangements.

Lessons from integrating Geospatial Foundation Models into environmental journalism methodologies

Dr Anne Alexander, Cambridge Digital Humanities, University of Cambridge

Foundation models built from geospatial data, such as satellite images, are becoming increasingly common in the ‘remote-sensing’ methods that journalists deploy alongside the traditional reporting methods of field visits, observation and interviews. This presentation discusses the lessons learned from a project undertaken by Cambridge Digital Humanities (CDH) in collaboration with the Pulitzer Center and Watershed Investigations to deploy EarthIndex. The tool combines code that produces cloud-free composite Sentinel-2 images and runs foundation model inference on 32×32-pixel patches worldwide with an interface that lets users label images and run fast approximate nearest neighbour searches against a vector database. We were able to test model predictions against investigation and observation at ground level, and to demonstrate the importance of triangulating information derived from AI tools with other sources and perspectives. Our team members’ field work also underlined the ethical challenges of relying on remote sensing data and the potential for such methods to harm vulnerable communities.

Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World

Dr Srijit Seal, Broad Institute, USA

Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This talk will emphasize the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We will focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.

These seminars are open to members of the University of Cambridge. For further details, please email accelerate-science@cst.cam.ac.uk.