Accelerate's 2021 Annual Symposium
2021’s Annual Symposium convened researchers from across Cambridge to explore how AI is advancing their work, and what action is needed to support its wider deployment for scientific discovery. Check out videos and Symposium visuals here.
AI and Large Language Models
A collection of resources for researchers interested in using Large Language Models (LLMs) in their research.
Accelerate has released code for working with Large Language Models in research; you can find it in our large-language-models GitHub repository. This code covers a range of ways to use and tune LLMs, including calling APIs, finetuning models, and creating more complex solutions like Retrieval Augmented Generation (RAG). This code is freely available for researchers to build on in their work.
An Introduction to Diffusion Models in Generative AI
A collection of resources for researchers interested in using Diffusion Models in their research.
With the rise of AI-generated imagery from models such as DALL-E, Midjourney and Sora, and of research applications such as AlphaFold, there has been a surge in workflows incorporating models like Stable Diffusion. These models have potential in research applications including drug discovery, weather forecasting, synthetic speech and medical imaging.
An Introduction to Docker
A collection of resources for researchers interested in using Docker.
Writing research software in Python presents numerous challenges to reproducibility: which version of Python is being used? What about the versions of PyTorch, scikit-learn or NumPy? Should we use Conda, venv or Poetry to manage dependencies and environments? How can we control randomness? Do I have the right version of the CUDA Toolkit? In principle, given the same data, algorithms and methodology, we should be able to reproduce the results of any given experiment to within an acceptable degree of error. In practice, these questions make reproducing machine learning experiments significantly harder. This workshop explores how Docker can help address almost all of them. Furthermore, combining Docker, Git and GitHub can be a powerful workflow, helping to minimise your tech stack and declutter your Python development experience.
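Two of the questions above, which versions are installed and how randomness is controlled, can be partly illustrated in plain Python. The following is a minimal sketch rather than material from the workshop: the helper names `set_seed`, `describe_environment` and `run_experiment` are hypothetical, and only the standard library is used, so NumPy or PyTorch would need their own seeding calls in a real project.

```python
import json
import platform
import random
from importlib import metadata


def set_seed(seed):
    """Seed Python's built-in random number generator.

    Assumption: only the standard library is in use. Libraries such as
    NumPy or PyTorch keep their own generators and must be seeded separately.
    """
    random.seed(seed)


def describe_environment(packages):
    """Record the Python version, platform and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def run_experiment(seed, n=5):
    """A stand-in 'experiment': draw n pseudo-random numbers after seeding."""
    set_seed(seed)
    return [random.random() for _ in range(n)]


if __name__ == "__main__":
    # Store this record alongside the results so a run can be traced later.
    env = describe_environment(["numpy", "torch", "scikit-learn"])
    print(json.dumps(env, indent=2))
    print(run_experiment(seed=42))
```

Recording the environment next to each result does not make a run reproducible by itself, but it shows what would need to be pinned, for example in a Docker image.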
Data Pipelines for Science
Well-curated and managed data is central to the effective use of AI, in science and elsewhere. How can scientists build the data pipelines they need to accelerate their research with AI? Accelerate Science’s ‘Data Pipelines for Science’ School helps scientists overcome such data pipeline challenges by equipping them with the latest best-practice software techniques.
Designing Machine Learning For Real-World Challenges
This course - Machine Learning and the Physical World - is focused on how to build machine learning systems that interact directly with the real world. It explores how to create models with a principled treatment of uncertainty, allowing researchers to leverage prior knowledge and provide decisions that can be interrogated.
Doing Data Science In The Real World
This course - Advanced Data Science - looks at the real-world challenges of data science, separating them into three stages: access, assess and address. The stages help show that the data science pipeline is not just about machine learning methods, but also about ethical concerns, the challenges of data management, and model fitting.
Introducing Data Science for Research
The Accelerate Programme offers PhD students and postdocs from disciplines across Cambridge University the opportunity to participate in a ‘Data for Science’ training course. This structured Accelerate-Cambridge Spark course, Introducing Data Science for Research, equips scientists with modern, practical data analysis skills using Python in a virtual, instructor-led accelerated masterclass.
Machine Learning Accelerator
2021’s Accelerate Science winter school brought together researchers at the interface of machine learning and the sciences to share insights and methods in machine learning that can support scientific discovery.
Machine Learning for Science Jupyter Notebooks
This gallery brings together Jupyter notebooks produced by participants in the Data Science for Science Residency. They contain information and code about the data science techniques that participants have used in research areas that range from genetics to astronomy.
NETTS - Networks of Transcript Semantics
The algorithms in this toolbox create a semantic speech graph from transcribed speech. Speech transcripts are short paragraphs of largely raw, uncleaned speech-like text.
Publishing and Packaging Python Code for Research
A collection of resources for researchers interested in publishing and packaging Python code for research.
Accelerate Science have developed a one-day course for researchers interested in building knowledge of workflows and tools they can use to package and publish their code. Releasing software outputs from your research is an important step for open science: it enables other researchers to use your code and gives your work further impact. These materials explore the importance of sharing code in line with FAIR principles for research software. The course provides a step-by-step guide to publishing your code using the package Poetry and will walk through an example project.
Python Programming For Science
This self-learning module introduces the fundamentals of Python, some of the kinds of data it can handle, and how to store that data. Designed for researchers across disciplines, it helps learners quickly learn to code in the context of working with real-world data. To access the module, register at the link.
Strategic Research Agenda for AI for science
Accelerate is developing a strategic research agenda in AI for science, which we hope will help drive further progress across projects and institutions. This page gives an overview of our recent workshops convened with the AI for Science community, and of the outputs from these discussions that are contributing towards an emerging strategic research agenda for the field.
Understanding Machine Learning and AI
This course - Machine Learning and Adaptive Intelligence - was originally delivered at the University of Sheffield (2011-2015), but has been updated with current material to introduce key concepts and methods in machine learning.