Large language models (GPT-N from OpenAI, Claude from Anthropic, LLaMAs from Meta, the list goes on) have emerged as the de facto standard for language modeling. Transformer models trained on language show a capacity and generalizability that were unheard of in deep learning before 2017. The applications of LLMs are not limited to language, however: they are a general tool that can be applied to any data that can be represented as a sequence of tokens. This includes well-researched applications (images, audio, video), but that is far from the end of where transformers can be applied. This page collects work on applying transformers to various domains of scientific data. The list is by no means complete, but it contains some interesting efforts.

I am currently working to apply transformer architectures to three separate problems:

  • Genomic data: long context windows for identifying long-range dependencies in genomes
  • Cosmological data: using transformers to model the merger history of massive galaxies and clusters
  • A foundation model of astrophysical observation: using transformers to reconstruct astrophysical images from sparse data that is as close to the observing instrument as possible.

A primer on transformers for the uninitiated

Biology/Life Sciences

There has been a lot of work applying transformers to genomic and protein sequence problems. It makes sense: genomic data is readily represented as a sequence of letters, so the transition to LLM-style modeling is simple in theory. In practice, the rules of natural language do not translate directly to genomics, so the problem is quite different.
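To make the token-sequence framing concrete, here is a minimal sketch of k-mer tokenization for DNA. The vocabulary construction and the non-overlapping 3-mer choice (echoing the GenSLMs entry below) are my own assumptions for illustration, not code from any of the papers listed.

```python
from itertools import product

def kmer_tokenize(sequence: str, k: int = 3) -> list[int]:
    """Tokenize a DNA string into integer IDs over non-overlapping k-mers."""
    # Fixed vocabulary: every possible k-mer over the four bases.
    vocab = {"".join(kmer): i for i, kmer in enumerate(product("ACGT", repeat=k))}
    # Chop the sequence into k-length chunks, dropping any trailing partial k-mer.
    chunks = [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]
    return [vocab[chunk] for chunk in chunks]

# A short, made-up genomic fragment becomes a token sequence a transformer
# can consume like any other text.
print(kmer_tokenize("ATGGCGTACGTT"))  # -> [14, 38, 49, 47]
```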

  1. EVO: Long-context DNA transformer
  2. A Review: Transformer-based deep learning for predicting protein properties in the life sciences
  3. ESM-1/2 is the SOTA for protein applications; learned on amino-acid representations
  4. GenSLMs, trained on gene sequences as 3-mer representations and applied to COVID
  5. Single-nucleotide genomic model with >500K context window

Long sequence modeling in transformers

  1. Position interpolation with RoPE (see the sketch after this list)
  2. Feedback loops for infinite working memory
  3. YaRN: Yet another RoPE extensioN method
  4. Data engineering for 128K+ context windows
  5. Longer contexts via ridiculous amounts of fine-tuning, but minimal model changes
  6. Extending context windows with 100 samples
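As a concrete picture of the position-interpolation idea in the first entry, here is a minimal NumPy sketch of rotary position embeddings with a linear rescaling of positions. The function names and the 4K-to-16K numbers are illustrative assumptions, not code from the paper.

```python
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Per-pair rotation frequencies theta_i = base^(-2i/d)."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def apply_rope(x: np.ndarray, positions: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Rotate each (seq_len, head_dim) vector by its (possibly rescaled) position.

    scale > 1 implements linear position interpolation: positions beyond the
    pretraining window are compressed back into the range the model has seen.
    """
    theta = rope_frequencies(x.shape[-1])
    angles = (positions[:, None] / scale) * theta[None, :]  # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x[..., 0::2] * cos - x[..., 1::2] * sin
    out[..., 1::2] = x[..., 0::2] * sin + x[..., 1::2] * cos
    return out

# Example: query/key vectors for a model pretrained on 4K tokens, evaluated at 16K.
seq_len, head_dim = 16_384, 64
x = np.random.randn(seq_len, head_dim)
positions = np.arange(seq_len)
x_interp = apply_rope(x, positions, scale=16_384 / 4_096)  # scale = 4
```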

Fine-tuning: making the transformer work for you

  1. ORPO: preference optimization without a reference model
  2. Adversarial preference optimization
  3. Direct preference optimization (sketched below)
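Since direct preference optimization underlies several of the entries above, here is a minimal sketch of its loss on a batch of preference pairs, assuming per-sequence log-probabilities under the trained policy and a frozen reference model have already been computed. The NumPy framing and variable names are mine.

```python
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: push the policy to prefer chosen over rejected responses
    relative to a frozen reference model, with strength beta."""
    chosen_rewards = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_rewards = beta * (policy_rejected_logp - ref_rejected_logp)
    margins = chosen_rewards - rejected_rewards
    # -log(sigmoid(margin)), written as log1p(exp(-margin)) for stability
    return np.mean(np.log1p(np.exp(-margins)))

# Toy batch of two preference pairs (summed per-token log-probs per sequence).
loss = dpo_loss(np.array([-12.0, -9.5]), np.array([-14.0, -9.0]),
                np.array([-13.0, -10.0]), np.array([-13.5, -9.8]))
```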
  4. Limitations of instruction tuning for LLMs

Modifications to RLHF approaches

  1. Leveraging reward models for more robust modeling
  2. Thinking before speaking

Unlearning and controlled forgetting

  1. LLM Unlearning
  2. Who’s Harry Potter?

Chemistry/Molecular applications

  1. The molecular transformer

Physics and physical systems

  1. Transformers for modeling physical systems: a review
  2. Astronomical foundation models for Stars

Materials/Condensed Matter Physics

  1. Generative materials design with transformers
  2. Crystal Transformer
  3. Predicting polymer properties
  4. Predict multiscale physics fields and nonlinear material properties

Astrophysics/Cosmology

  1. Time-series transformer for Photometric Classification
  2. Representing light curves with transformer embeddings

Mathematics

  1. Foundation models for PDEs