Projects in the Sasse lab
For more information on projects and open positions, please email us at office-sasse@zmbh.uni-heidelberg.de
Multi-modal Deep Neural Networks to understand the cis-regulatory code at single-cell resolution
The cis-regulatory code governs when, where, and how much of each gene product is made in a cell, enabling a single genome to give rise to many cell types. Variants that disrupt this code can lead to genetic diseases. Experimentally measuring all cis-regulatory sequence variations is nearly impossible. However, deep sequence-to-function models can learn the relationship between genomic sequences and their regulatory function from massive collections of genome-wide datasets. In the process, they gain knowledge about the regulatory sequence grammar of our cells, which can be used to predict the effects of unseen regulatory sequences. Current state-of-the-art models still have limitations, particularly in capturing the complex sequence grammar that arises from multi-layered regulatory processes or from the interplay of distal sequence elements. Improving these models' foundational understanding of gene regulation involves enhancing cell type resolution, integrating various data modalities, and using cross-species data. This poses significant engineering challenges, and new model architectures are needed to effectively manage and learn from large and diverse datasets. This project will develop multi-modal sequence-to-function models and new explainable AI methods in PyTorch or similar deep learning environments to determine the cis-regulatory code of multi-cellular species.
Generative genomic sequence-to-function models to design cell-type-specific regulatory nucleotide sequences
Genomic sequence-to-function models can learn the cis-regulatory language from a large set of genome-wide measurements. Combining these deep sequence-to-function models with generative processes enables us to exploit the models' knowledge to generate new synthetic sequences with specific regulatory functions. However, many challenges remain in this process, such as model uncertainty outside of the training distribution and limited diversity and variation in the designed sequences. A major issue for algorithm development is the lack of experimental feedback: after training on an original data set, subsequent experimental rounds to assess the designed sequences are often inaccessible to computational labs. Yeast is a versatile model organism that allows for high-throughput manipulation and measurement of regulatory elements, and therefore for fast evaluation of designed sequences. This project will use high-throughput methods in collaboration with the Knop lab and the Synthetic DNA accelerator lab to test and refine in silico sequence design methods.
Cell type agnostic sequence-to-function models to predict effects from transcription factor perturbations
Deep sequence-to-function (S2F) models can learn the relationship between genomic sequences and gene function across cell types, but current models struggle to generalise to unseen cell types or conditions. This project aims to develop advanced S2F models that use information about the cell state in addition to the genomic sequence to predict effects on gene regulation in unseen cells and conditions. It will combine widely available collections of data sets with data from genome-wide measurements following transcription factor perturbations, so that the models learn how transcription factors influence gene-regulatory mechanisms in previously unmeasured conditions. The goal is to enhance these next-generation models' mechanistic understanding of gene-regulatory processes and thereby improve their predictions of variant effects across cell types.
Search PubMed for publications by Alexander Sasse