This document compares two approaches to analyse single-cell data: First, we use Seurat, and then we perform all steps using only basic R, in order to be able to study and modify each step. I have presented it at the Single Cell Center Heidelberg meeting in Mar 2019.

As example data, I use the study on the development of the mouse brain by Carter et al.: “A single-cell transcriptional atlas of the developing murine cerebellum”, Current Biology 28:2910 (2018).

Specifically, I will use their sample E13a, i.e., which comprises cells from the cerebellum of a mouse embryo at day 13.5. The “a” means that it is the first of their two replicates.

I first perform a standard analysis with Seurat.

`library( Seurat )`

Kevin has downloaded the bam files from the European Nucleotide Archive (accession PRJEB23051) and run them again through CellRanger. In the next line, I read in the output from CellRanger. If you want to rerun this script yourself, you can find the three CellRanger output files for sample E13a here.

`seu <- Read10X("E13_A/")`

Now I create a Seurat object, keeping only the genes that are expressed in at least 3 cells, and only those cells expressing at least 1000 genes.

`seu <- CreateSeuratObject( seu, min.cells = 3, min.genes=1000 )`

I normalize the data. This performs, by default, the transformation \(y_{ij} = \log\left( 1 + 10^4 \frac{k_{ij}}{\sum_{i'}k_{i'k}}\right)\) (where \(k_{ij}\) is the UMI count for gene \(i\) in cell \(j\)).

`seu <- NormalizeData( seu )`

A short detour here on the topic of normalization:

The median UMI count in our cells is

```
median_umi_count_per_cell <- median( colSums(seu@raw.data) )
median_umi_count_per_cell
```

`## [1] 6199`

Here is a plot showing how the transformation changes UMI counts for such a “median” cell

```
plot( 0:100, log( 1 + 1e4 * (0:100) / median_umi_count_per_cell ),
type = "l", xlab="UMI count", ylab="expression measure", ylim = c(0, 6.5) )
meet_at_50 <- function(y)
y / y[51] * log( 1 + 1e4 * 50 / median_umi_count_per_cell )
lines( 0:100, meet_at_50( log( 1 + 1e3 * (0:100) / median_umi_count_per_cell ) ), type = "l", col="darkgreen" )
lines( 0:100, meet_at_50( log( 1 + 1e5 * (0:100) / median_umi_count_per_cell ) ), type = "l", col="darkblue" )
lines( 0:100, meet_at_50( log10( 1 + (0:100) ) ), type = "l", col="green" )
lines( 0:100, meet_at_50( sqrt( (0:100) / median_umi_count_per_cell ) ), type = "l", col="orange" )
```

The black line above is Seurat’s default normalization. For future reference, I have also included the transformation if one changes Seurat’s scaling factor from the default \(10^4\) (black) to \(10^3\) (dark green) or \(10^5\) (dark blue). Light green is the often used simple \(y_{ij} = \log_{10}(1+k_{ij})\) and orange is the square-root transform \(y_{ij} = \sqrt{ {k_{ij}}/{\sum_{i'}k_{i'k}}}\). I have rescaled these extra lines so that they all meet at 50.

Continuing with the standard Seuirat work-flow: Find the variable genes

`seu <- FindVariableGenes( seu )`

Scale and center the data and regress out UMI totals

`seu <- ScaleData( seu, vars.to.regress = "nUMI" )`

`## Regressing out: nUMI`

```
##
## Time Elapsed: 29.526376247406 secs
```

`## Scaling data matrix`

Run a PCA to reduce the data

`seu <- RunPCA( seu, do.print = FALSE )`

Perform Louvain clustering

`seu <- FindClusters( seu, print.output = FALSE, save.SNN = TRUE )`

Get an embedding. I use UMAP. By default, `RunUMAP`

only uses the first 5 PCS, but this gives quite unsatisfactory results. Hence, I use a few more, otherwise it get’s too blurry.

`seu <- RunUMAP( seu, dims.use=1:20 )`

Plot the embedding, and highlight the Louvain clusters:

`DimPlot( seu, "umap" )`