Resources

Accelerate Programme Annual Report 2024

2024 has brought a new wave of excitement about the potential of AI for science. Thanks to the continuing support of Schmidt Sciences, the University of Cambridge has been positioned to respond to this excitement with a portfolio of research, training, and community-building activities convened by the Accelerate Programme for Scientific Discovery. During year four of this programme, we’ve delivered a step-change in our engagement with the Cambridge AI for science community. We are pleased to share our 2024 report that introduces the highlights from this work.

Accelerate's 2021 Annual Symposium

2021’s Annual Symposium convened researchers from across Cambridge to explore how AI is advancing their work, and what action is needed to support its wider deployment for scientific discovery. Check out videos and Symposium visuals here.

AI and Large Language Models

A collection of resources for researchers interested in using Large Language Models (LLMs) in their research.

Accelerate has released code for working with Large Language Models in research; you can find it in our large-language-models GitHub repository. This code covers a range of ways to use and tune LLMs, including calling APIs, fine-tuning models, and creating more complex solutions like Retrieval Augmented Generation (RAG). This code is freely available for researchers to build on in their work.
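To give a flavour of what Retrieval Augmented Generation involves, here is a minimal sketch of the retrieval step only. It is not taken from the Accelerate repository: the function names (`tokenize`, `cosine_similarity`, `retrieve`, `build_prompt`) are hypothetical, and it assumes simple word-count vectors in place of the learned embeddings a real system would normally use. The resulting prompt would then be passed to whichever LLM you are working with.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents whose word counts best match the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    # Sort by score only; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query, documents, k=1):
    """Place the retrieved text ahead of the question (toy prompt format)."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Calling `build_prompt("How do diffusion models generate images?", notes)` on a list of short text snippets `notes` returns a prompt whose context is the snippet sharing the most vocabulary with the question.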

AI Core Concepts

AI and data-driven techniques have the potential to revolutionise research. AI is being used across the entire scientific research lifecycle - speeding up literature review, enhancing data analysis, accelerating experimentation, generating research hypotheses, simulating physical equations, and directly modelling physical phenomena. Researchers across disciplines are looking to learn more about AI and understand how they can apply it in their own work.

This short set of online resources explains some of the core concepts behind AI. They’re not a full introduction to the theory and practice of AI, but a starting point for researchers who are interested in exploring applied AI in their work.

An Introduction to Diffusion Models in Generative AI

A collection of resources for researchers interested in using Diffusion Models in their research.

With the increase in AI-generated imagery using models such as DALL-E, Midjourney and Sora, and research applications such as AlphaFold, there has been a surge in workflows incorporating models like Stable Diffusion. These models have potential in research applications including drug discovery, weather forecasting, synthetic speech and medical imaging.

An Introduction to Docker

A collection of resources for researchers interested in using Docker.

Writing research software in Python presents numerous challenges to reproducibility. What version of Python is being used? What about the versions of PyTorch, scikit-learn or NumPy? Should we use Conda, venv or Poetry to manage dependencies and environments? How can we control randomness? Do I have the right version of the CUDA Toolkit? In principle, given the same data, algorithms and methodology, we should be able to reproduce the results of any given experiment to within an acceptable degree of error. In practice, these questions make reproducing machine learning experiments significantly harder. This workshop explores how Docker can help address almost all of them. Furthermore, combining Docker, Git and GitHub can be a powerful workflow, helping to minimise your tech stack and declutter your Python development experience.
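Two of the questions above, controlling randomness and knowing which versions are in use, can also be partly addressed from inside Python itself. The sketch below is illustrative only and is not part of the workshop materials: the function names are hypothetical, and it seeds only Python's built-in `random` module. Libraries such as NumPy and PyTorch keep their own random state and would need seeding separately (for example with `numpy.random.seed` and `torch.manual_seed`). Recording the environment this way complements a pinned Docker image rather than replacing it.

```python
import platform
import random
from importlib import metadata


def set_seed(seed):
    """Seed Python's built-in random number generator."""
    random.seed(seed)


def describe_environment(packages):
    """Record the interpreter version and installed versions of named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def sample_run(seed):
    """Stand-in for an experiment: draw three pseudo-random numbers."""
    set_seed(seed)
    return [random.random() for _ in range(3)]
```

Calling `sample_run` twice with the same seed returns the same numbers, and saving the output of `describe_environment(["numpy", "torch"])` alongside your results gives a simple record of what the experiment was run with.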

Data Pipelines for Science

Well-curated and managed data is central to the effective use of AI, in science and elsewhere. How can scientists build the data pipelines they need to accelerate their research with AI? Accelerate Science’s ‘Data Pipelines for Science’ School helps scientists overcome such data pipeline challenges by equipping them with the latest best-practice software techniques.

Designing Machine Learning For Real-World Challenges

This course - Machine Learning and the Physical World - is focused on how to build machine learning systems that interact directly with the real world. It explores how to create models with a principled treatment of uncertainty, allowing researchers to leverage prior knowledge and provide decisions that can be interrogated.

Doing Data Science In The Real World

This course - Advanced Data Science - looks at the real-world challenges of data science, separating them into three stages: access, assess and address. The stages help show that the data science pipeline is not just about machine learning methods, but also about ethical concerns and the challenges of data management, as well as model fitting.

Introducing Data Science for Research

The Accelerate Programme offers PhD students and postdocs from disciplines across the University of Cambridge the opportunity to participate in a ‘Data for Science’ training course. This structured Accelerate-Cambridge Spark course, Introducing Data Science for Research, equips scientists with modern, practical data analysis skills using Python in a virtual, instructor-led, accelerated masterclass.

Machine Learning Accelerator

2021’s Accelerate Science winter school brought together researchers at the interface of machine learning and the sciences to share insights and methods in machine learning that can support scientific discovery.

Machine Learning for Science Jupyter Notebooks

This gallery brings together Jupyter notebooks produced by participants in the Data Science for Science Residency. They contain information and code about the data science techniques that participants have used in research areas that range from genetics to astronomy.

NETTS - Networks of Transcript Semantics

The algorithms in this toolbox create a semantic speech graph from transcribed speech. Speech transcripts are short paragraphs of largely raw, uncleaned speech-like text.

Publishing and Packaging Python Code for Research

A collection of resources for researchers interested in publishing and packaging Python code for research.

Accelerate Science have developed a one-day course for researchers interested in building knowledge of the workflows and tools they can use to package and publish their code. Releasing software outputs from your research is an important step for open science: it enables other researchers to use your code and gives your work further impact. These materials explore the importance of sharing code in line with the FAIR principles for research software. The course provides a step-by-step guide to publishing your code using the Poetry packaging tool and walks through an example project.

Python Programming For Science

This self-learning module introduces the fundamentals of Python, some of the kinds of data it can handle, and how to store that data. Designed for researchers across disciplines, it supports learners to rapidly learn how to code in the context of working with real-world data. To access the module, register at the link.

Strategic Research Agenda for AI for Science

Accelerate is developing a strategic research agenda in AI for science, which we hope will help drive further progress across projects and institutions. This page gives an overview of our recent workshops convened with the AI for Science community and outputs from these discussions contributing towards an emerging strategic research agenda for the field.

Understanding Machine Learning and AI

This course - Machine Learning and Adaptive Intelligence - was originally delivered at the University of Sheffield (2011-2015), but has been updated with current material to introduce key concepts and methods in machine learning.