Projects of the Sasse Lab

For more information on projects and open positions, please email us at office-sasse@zmbh.uni-heidelberg.de.

Multi-modal Deep Neural Networks to understand the cis-regulatory code at single-cell resolution

The cis-regulatory code governs when, where, and how much of each gene product is made in a cell, enabling a single genome to give rise to many different cell types. Variants that disrupt this code can lead to genetic diseases. Because the number of possible cis-regulatory sequence variants is vast, measuring all of them experimentally is practically impossible. Deep sequence-to-function models, however, can learn the relationship between genomic sequences and their regulatory function from massive collections of genome-wide datasets. In doing so, they acquire knowledge of the regulatory sequence grammar of our cells, which can be used to predict the effects of regulatory sequences they have never seen. Current state-of-the-art models still have limitations, particularly in capturing complex sequence grammar that arises from multi-layered regulatory processes or from the interplay of distal sequence elements. Improving these models' foundational understanding of gene regulation requires higher cell type resolution, the integration of multiple data modalities, and the use of cross-species data. This poses significant engineering challenges, and new model architectures are needed to manage and learn from such large and diverse datasets effectively. This project will develop multi-modal sequence-to-function models and new explainable AI methods in PyTorch or similar deep learning frameworks to determine the cis-regulatory code of multicellular species.

Generative genomic sequence-to-function models to design cell type specific regulatory nucleotide sequences

Genomic sequence-to-function models can learn the cis-regulatory language from large sets of genome-wide measurements. By combining these deep sequence-to-function models with generative approaches, we can exploit their learned representations to generate new synthetic sequences with specific regulatory functions. However, many challenges remain, including model uncertainty outside the training distribution and limited diversity and variation among the designed sequences. A major obstacle for algorithm development is the lack of experimental feedback: after a model has been trained on an initial dataset, computational laboratories often have no access to subsequent experimental rounds that evaluate the designed sequences. At the Center for Synthetic Genomics (SynGen), we collaborate with a wide range of center-associated labs, spanning experimental systems from Drosophila and zebrafish to human cell lines, to enable high-throughput manipulation and measurement of regulatory elements. This allows designed regulatory landscapes to be evaluated rapidly and at scale. A core component of SynGen is the Synthetic DNA Accelerator Lab, which provides automated, cost-effective assembly of large DNA sequence elements. This infrastructure is essential for closing the design-build-test loop, enabling rapid experimental validation and iterative refinement of in silico sequence design methods.

Cell type aware sequence-to-function models to predict effects of regulatory factor perturbations

Deep sequence-to-function models can learn relationships between genomic sequences and regulatory signals across the cell types on which they are trained, effectively capturing a specific regulatory “dialect” for each cell type or tissue. However, current models often struggle to generalize to unseen cell types or conditions and typically lack explicit mechanistic interpretability. This project aims to develop advanced sequence-to-function (S2F) models that move beyond purely predictive frameworks toward mechanistic, biophysically informed representations of gene regulation. In addition to genomic sequence, these models will take information about cell state, such as regulatory factor expression, as input, enabling them to predict gene regulatory effects in previously unseen cells and conditions. By integrating widely available dataset collections with genome-wide measurements taken after transcription factor perturbations, we aim to improve the models' ability to learn how regulatory factors interact to control gene expression. The central goal is to build models that not only predict regulatory outcomes but also capture mechanistic principles of gene regulation that can be interpreted as biological knowledge. In particular, we seek to uncover how regulatory factors interact across different regulatory layers, such as DNA sequence, chromatin state, and transcription factor activity, to coordinate gene expression. In parallel, we will develop novel interpretation methods to extract mechanistic insights from these models, providing a framework to systematically describe regulatory interactions and to improve predictions of variant effects across diverse cell types and conditions.

Search PubMed for publications by Alexander Sasse