  • Project No: NCKIR4
  • Intake: 2022 KIR Non Clinical

Supervisor: Luke Jostins-Dean 

Co-Supervisor: Yang Luo


A key goal in the study of human disease genetics is to understand the impact of genetic variation on the phenotype of individual cells. Single-cell sequencing techniques, such as single-cell RNA-seq (scRNA-seq), allow us to characterize gene expression and regulation at very high resolution. When applied to inflammatory bowel disease (IBD), for instance, these techniques have identified new cell types that are dysregulated in disease, and they have the potential to reveal the cell types in which genetic risk variants are active. New studies are now integrating data across hundreds of genotyped individuals to study directly how genetic variants affect gene expression in fine-grained cell types, by discovering single-cell expression quantitative trait loci (eQTLs).
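For readers coming from statistics or machine learning: an eQTL test is commonly framed as a regression of a gene's expression on the allele dosage (0, 1 or 2 copies) of a nearby variant across genotyped individuals. The sketch below is a minimal illustration on simulated data, assuming an additive model with Gaussian noise; `eqtl_effect` is a hypothetical helper, not part of any specific pipeline or of the method this project will develop.

```python
import numpy as np

def eqtl_effect(dosage, expression):
    """OLS slope of expression on allele dosage (0, 1 or 2 copies of the alternative allele)."""
    dosage = np.asarray(dosage, dtype=float)
    expression = np.asarray(expression, dtype=float)
    # Design matrix: an intercept column plus the additive genotype dosage
    X = np.column_stack([np.ones_like(dosage), dosage])
    coef, *_ = np.linalg.lstsq(X, expression, rcond=None)
    return coef[1]  # change in mean expression per copy of the allele

rng = np.random.default_rng(0)
n_individuals = 200
dosage = rng.binomial(2, 0.3, size=n_individuals)  # simulated genotypes, allele frequency 0.3
true_beta = 0.5                                     # simulated effect size (assumption)
expression = 1.0 + true_beta * dosage + rng.normal(0.0, 1.0, size=n_individuals)
print("estimated effect:", round(eqtl_effect(dosage, expression), 3))
```

The fitted slope is a single number per variant-gene pair; the project's aim is to let that effect vary from cell to cell instead.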

However, in practice single-cell sequencing data are sparse, with only a handful of observations per gene in each cell, which introduces noise and censoring. Standard analysis approaches pool data across cells with similar expression profiles, which overcomes these issues but only at the expense of single-cell resolution. In this project, the student will develop new computational approaches that share data across cell types without collapsing them into clusters. This will allow us to map the impact of genetic variants at a truly single-cell level and to answer the direct question: “what is the impact of genetic variant X on expression of gene Y in specific cell Z?”
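To make the pooling trade-off concrete, here is a minimal sketch of pseudobulk aggregation for one gene. It assumes simulated Poisson counts with a low mean and hypothetical donor and cluster labels; `pseudobulk` is an illustrative helper, not a specific library function. Sparse per-cell counts (mostly zeros) are summed within each (individual, cluster) pair, giving more stable totals but no per-cell values.

```python
from collections import defaultdict

import numpy as np

def pseudobulk(counts, individual, cluster):
    """Sum one gene's counts over all cells that share an (individual, cluster) pair."""
    totals = defaultdict(int)
    for count, ind, clu in zip(counts, individual, cluster):
        totals[(int(ind), int(clu))] += int(count)
    return dict(totals)

rng = np.random.default_rng(1)
n_cells = 3000
counts = rng.poisson(0.2, size=n_cells)          # low mean: most cells observe zero reads
individual = rng.integers(0, 10, size=n_cells)   # hypothetical donor labels
cluster = rng.integers(0, 3, size=n_cells)       # hypothetical cluster labels

print("fraction of zero counts per cell:", np.mean(counts == 0))
pooled = pseudobulk(counts, individual, cluster)
print(n_cells, "cells pooled into", len(pooled), "pseudobulk values")
```

Any variation between cells inside a cluster, including a variant effect confined to a subset of them, is averaged away in the pooled totals.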

To answer this question, the student will develop a nonparametric statistical framework that allows true single-cell parameter estimation from single-cell sequencing data. High-dimensional Gaussian process models have been used successfully to model single-cell data. By placing a prior on the similarity of cell states as a function of their distance in an underlying latent space, the Gaussian process approach allows information to be shared across cells while still modelling each cell as a unique data point. The student will extend this approach to produce a generative model of single-cell gene expression across multiple individuals, with genotype-driven, individual-driven and cell-intrinsic sources of variation. This model will be used to estimate key parameters of biological interest, such as the predicted impact of a specific genetic variant on expression of a gene in a specific cell. The student will then apply this method to a range of datasets, including circulating immune cells and intestinal biopsy data from both health and disease, in order to fine-map the impact of disease-associated genetic variants to individual cells. Further bioinformatic analysis of the identified cells will be used to characterise the molecular pathways that these variants affect, to develop hypotheses for follow-up experiments with collaborators, and to suggest potential cell types and molecules as drug targets.
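The idea of a Gaussian process prior over cells can be illustrated with a deliberately simplified sketch, which is not the model the project will build. It assumes: a one-dimensional latent coordinate `z` per cell that is given rather than inferred; centred, Gaussian expression values rather than sparse counts; a genotype effect `beta(z)` with a squared-exponential GP prior; a per-individual random effect `u`; residual noise standing in for cell-intrinsic variation; and fixed hyperparameters (`lengthscale`, `variance`, `tau2`, `sigma2`). The function names are hypothetical. Under these assumptions the posterior mean of each cell's genotype effect follows from standard Gaussian conditioning.

```python
import numpy as np

def rbf_kernel(z1, z2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance: cells close in latent space get similar effects."""
    sqdist = (z1[:, None] - z2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def posterior_cell_effects(z, g, ind, y, lengthscale=1.0, variance=1.0,
                           tau2=0.5, sigma2=1.0):
    """Posterior mean of the per-cell genotype effect beta(z_c) under the assumed model
        y_c = beta(z_c) * g_c + u_ind(c) + eps_c,
        beta ~ GP(0, k),  u ~ N(0, tau2),  eps ~ N(0, sigma2),
    with every hyperparameter treated as known."""
    K = rbf_kernel(z, z, lengthscale, variance)    # prior covariance of beta across cells
    G = np.diag(np.asarray(g, dtype=float))        # genotype dosage of each cell's donor
    # Cells from the same individual share u, so they covary by tau2
    same_ind = (ind[:, None] == ind[None, :]).astype(float)
    cov_y = G @ K @ G + tau2 * same_ind + sigma2 * np.eye(len(y))
    cov_beta_y = K @ G                             # Cov(beta, y)
    # Gaussian conditioning: E[beta | y] = Cov(beta, y) Cov(y)^{-1} y
    return cov_beta_y @ np.linalg.solve(cov_y, y)

# Simulated data (all settings are illustrative assumptions)
rng = np.random.default_rng(2)
n_ind, cells_per_ind = 20, 30
ind = np.repeat(np.arange(n_ind), cells_per_ind)       # cell -> individual
z = rng.uniform(0.0, 3.0, size=ind.size)                # latent coordinate per cell
g = rng.binomial(2, 0.4, size=n_ind)[ind]               # donor genotype copied to each cell
beta_true = np.exp(-(z - 2.0) ** 2)                     # effect peaks near z = 2
u = rng.normal(0.0, np.sqrt(0.5), size=n_ind)[ind]      # individual-driven variation
y = beta_true * g + u + rng.normal(0.0, 1.0, size=ind.size)

beta_hat = posterior_cell_effects(z, g, ind, y)
print("corr(true, estimated):", np.corrcoef(beta_true, beta_hat)[0, 1])
```

Because the kernel couples nearby cells, cells from donors with zero copies of the allele still receive an estimate, borrowed from carriers at similar latent positions. A fuller treatment might instead infer the latent coordinates, for example with a Bayesian GP latent variable model, and use a count likelihood.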


Keywords: Genetics, statistics, single-cell, machine learning, eQTLs


This project is well suited to a student with a background in statistical genetics, or a background in statistical modelling or machine learning who is interested in developing applied knowledge in the biological sciences. Training will be provided in R programming, if required, and in the principles and theory of statistical genetics and genome analysis. There will be opportunities to collaborate with both computational and experimental scientists, and to receive training in cutting-edge analysis techniques for high-throughput genetic and genomic datasets.


  1. Gutierrez-Arcelus et al. (2020). Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat Genet 52(3): 247–253.
  2. Oelen et al. (2021). Single-cell RNA-sequencing reveals widespread personalized, context-specific gene expression regulation in immune cells. bioRxiv.
  3. Jostins et al. (2012). Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491(7422): 119–124.
  4. Verma and Engelhardt (2019). A robust nonlinear low-dimensional manifold for single cell RNA-seq data. bioRxiv.
  5. Titsias and Lawrence (2010). Bayesian Gaussian process latent variable model. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9: 844–851.


Inflammation biology, computational biology


Luke Jostins-Dean