Large language models (GPT-N from OpenAI, Claude from Anthropic, LLaMAs from Meta, the list goes on) have emerged as the de facto standard for language modeling. Transformer models trained on language show a capacity and generalizability that were unheard of in deep learning before 2017. The applications of LLMs are not limited to language, however: they are a general tool that can be applied to any data that can be represented as a sequence of tokens. This includes well-researched applications (images, audio, video), but that is far from the end of where transformers can be applied. This page collects work on applying transformers to various domains of scientific data. The list is by no means complete, but it contains some interesting efforts.

I am currently working to apply transformer architectures to three separate problems:

  • Genomic data: long context windows for identifying long-range dependencies in genomes
  • Cosmological data: using transformers to model the merger history of massive galaxies and clusters
  • A foundation model of astrophysical observation: using transformers to reconstruct astrophysical images from sparse data that is as close to the observing instrument as possible.

A primer on transformers for the uninitiated

Biology/Life Sciences

There has been a lot of work applying transformers to genomic and protein sequence problems. It makes sense: genomic data is readily represented as a sequence of letters, so the transition to LLM-style modeling is simple in theory. In practice, the rules of natural language do not translate directly to genomics, so the problem is quite different.
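To make the token-sequence framing concrete, here is a minimal sketch of k-mer tokenization for DNA. The vocabulary construction and the non-overlapping 3-mer choice (echoing the GenSLMs entry below) are my own assumptions for illustration, not code from any of the papers listed.

```python
from itertools import product

def kmer_tokenize(sequence: str, k: int = 3) -> list[int]:
    """Tokenize a DNA string into integer IDs over non-overlapping k-mers."""
    # Fixed vocabulary: every possible k-mer over the four bases.
    vocab = {"".join(kmer): i for i, kmer in enumerate(product("ACGT", repeat=k))}
    # Chop the sequence into k-length chunks, dropping any trailing partial k-mer.
    chunks = [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]
    return [vocab[chunk] for chunk in chunks]

# A short, made-up genomic fragment becomes a token sequence a transformer
# can consume like any other text.
print(kmer_tokenize("ATGGCGTACGTT"))  # -> [14, 38, 49, 47]
```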

  1. EVO: Long-context DNA transformer
  2. A Review: Transformer-based deep learning for predicting protein properties in the life sciences
  3. ESM-1/2 is the SOTA for protein applications; learned on amino-acid representations
  4. GenSLMs, trained on gene sequences as 3-mer representations and applied to COVID
  5. Single-nucleotide genomic model with >500K context window

Long sequence modeling in transformers

  1. Position interpolation with RoPE (see the sketch after this list)
  2. Feedback loops for infinite working memory
  3. YaRN: Yet another RoPE extensioN method
  4. Data engineering for 128K+ context windows
  5. Longer contexts via ridiculous amounts of fine-tuning, but minimal model changes
  6. Extending context windows with 100 samples
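As a concrete picture of the position-interpolation idea in the first entry, here is a minimal NumPy sketch of rotary position embeddings with a linear rescaling of positions. The function names and the 4K-to-16K numbers are illustrative assumptions, not code from the paper.

```python
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Per-pair rotation frequencies theta_i = base^(-2i/d)."""
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def apply_rope(x: np.ndarray, positions: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Rotate each (seq_len, head_dim) vector by its (possibly rescaled) position.

    scale > 1 implements linear position interpolation: positions beyond the
    pretraining window are compressed back into the range the model has seen.
    """
    theta = rope_frequencies(x.shape[-1])
    angles = (positions[:, None] / scale) * theta[None, :]  # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x[..., 0::2] * cos - x[..., 1::2] * sin
    out[..., 1::2] = x[..., 0::2] * sin + x[..., 1::2] * cos
    return out

# Example: query/key vectors for a model pretrained on 4K tokens, evaluated at 16K.
seq_len, head_dim = 16_384, 64
x = np.random.randn(seq_len, head_dim)
positions = np.arange(seq_len)
x_interp = apply_rope(x, positions, scale=16_384 / 4_096)  # scale = 4
```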

Fine-tuning: making the transformer work for you

  1. ORPO: preference optimization without a reference model
  2. Adversarial preference optimization
  3. Direct preference optimization (sketched below)
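Since direct preference optimization underlies several of the entries above, here is a minimal sketch of its loss on a batch of preference pairs, assuming per-sequence log-probabilities under the trained policy and a frozen reference model have already been computed. The NumPy framing and variable names are mine.

```python
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: push the policy to prefer chosen over rejected responses
    relative to a frozen reference model, with strength beta."""
    chosen_rewards = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_rewards = beta * (policy_rejected_logp - ref_rejected_logp)
    margins = chosen_rewards - rejected_rewards
    # -log(sigmoid(margin)), written as log1p(exp(-margin)) for stability
    return np.mean(np.log1p(np.exp(-margins)))

# Toy batch of two preference pairs (summed per-token log-probs per sequence).
loss = dpo_loss(np.array([-12.0, -9.5]), np.array([-14.0, -9.0]),
                np.array([-13.0, -10.0]), np.array([-13.5, -9.8]))
```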
  4. Limitations of instruction tuning for LLMs

Modifications to RLHF approaches

  1. Leveraging reward models for more robust modeling
  2. Thinking before speaking

Unlearning and controlled forgetting

  1. LLM Unlearning
  2. Who’s Harry Potter?

Chemistry/Molecular applications

  1. The molecular transformer

Physics and physical systems

  1. Transformers for modeling physical systems: a review
  2. Astronomical foundation models for Stars

Materials/Condensed Matter Physics

  1. Generative materials design with transformers
  2. Crystal Transformer
  3. Predicting polymer properties
  4. Predict multiscale physics fields and nonlinear material properties

Astrophysics/Cosmology

  1. Time-series transformer for Photometric Classification
  2. Representing light curves with transformer embeddings

Mathematics

  1. Foundation models for PDEs