Seurat Weekly NO.2 || 我该如何取子集？

Original 周运来生信菜鸟团 2022-08-10

收录于合集 #Seurat Weekly 18个

天子呼来不上船，
自称臣是菜鸟团。

在这里，和国际同行一起学习单细胞数据分析。

本期的封面故事是：

Randomly downsample seurat object .

我们知道单细胞数据挖掘的基本思路是

缩小基因集和
缩小细胞子集

足见对Seurat对象取子集多重要。

不过，在这之前我们先看几个数据处理的Tips. 读入数据，并人工生成两个分组：

library(Seurat)
pbmc <- readRDS(file = "F:\\Rstudio\\SingleR\\data\\pbmc5k.rds")
pbmc
An object of class Seurat 
18792 features across 4666 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

# pretend that cells were originally assigned to one of two replicates (we assign randomly here)
# if your cells do belong to multiple replicates, and you want to add this info to the Seurat
# object create a data frame with this information (similar to replicate.info below)
set.seed(42)
pbmc$replicate <- sample(c("rep1", "rep2"), size = ncol(pbmc), replace = TRUE)


                   orig.ident nCount_RNA nFeature_RNA percent.mito percent.HB RNA_snn_res.0.6 seurat_clusters
AAACCCAAGCGTATGG-1     pbmc5k      13509         3498    10.659560          0               1               1
AAACCCAGTCCTACAA-1     pbmc5k      12637         3382     5.610509          0               1               1
AAACGCTAGGGCATGT-1     pbmc5k       5743         1798    10.691276          0               7               7
AAACGCTGTAGGTACG-1     pbmc5k      13107         2888     7.866026          0               9               9
AAACGCTGTGTCCGGT-1     pbmc5k      15510         3807     7.446809          0               3               3
AAACGCTGTGTGATGG-1     pbmc5k       6131         2348     9.982058          0               5               5
                   replicate
AAACCCAAGCGTATGG-1      rep1
AAACCCAGTCCTACAA-1      rep1
AAACGCTAGGGCATGT-1      rep1
AAACGCTGTAGGTACG-1      rep1
AAACGCTGTGTCCGGT-1      rep2
AAACGCTGTGTGATGG-1      rep2

分群和样本间标签转换

分群标签：

# Plot UMAP, coloring cells by cell type (currently stored in object@ident)
DimPlot(pbmc, reduction = "umap")

样本标签：

# How do I create a UMAP plot where cells are colored by replicate?  First, store the current
# identities in a new column of meta.data called CellType
pbmc$CellType <- Idents(pbmc)
# Next, switch the identity class of all cells to reflect replicate ID
Idents(pbmc) <- "replicate"
DimPlot(pbmc, reduction = "umap")

这里注意Idents的应用，可以直接指定meta.data某一列作为pbmc@active.ident.演示完了，我们再把标签换回来。

# alternately : DimPlot(pbmc, reduction = 'umap', group.by = 'replicate') you can pass the
# shape.by to label points by both replicate and cell type

# Switch back to cell type labels
Idents(pbmc) <- "CellType"

其实我们不转换ident大部分的绘图也可以用group.by来指定分群方式

DimPlot(pbmc, reduction = "umap",group.by = "replicate")

统计分群信息

# How many cells are in each cluster
table(Idents(pbmc))

  0    1    2    3    4    5    6    7    8    9   10   11   12 
1183  752  662  325  314  311  305  260  258  212   32   26   26

# How many cells are in each replicate?
table(pbmc$replicate)

rep1 rep2 
2356 2310

prop.table(table(Idents(pbmc)))

          0           1           2           3           4           5           6           7           8 
0.253536219 0.161165881 0.141877411 0.069652808 0.067295328 0.066652379 0.065366481 0.055722246 0.055293613 
          9          10          11          12 
0.045435062 0.006858123 0.005572225 0.005572225

# How does cluster membership vary by replicate?
table(Idents(pbmc), pbmc$replicate)

     rep1 rep2
  0   599  584
  1   405  347
  2   315  347
  3   170  155
  4   159  155
  5   140  171
  6   138  167
  7   140  120
  8   132  126
  9   110  102
  10   18   14
  11   15   11
  12   15   11

prop.table(table(Idents(pbmc), pbmc$replicate), margin = 2)

            rep1        rep2
  0  0.254244482 0.252813853
  1  0.171901528 0.150216450
  2  0.133701188 0.150216450
  3  0.072156197 0.067099567
  4  0.067487267 0.067099567
  5  0.059422750 0.074025974
  6  0.058573854 0.072294372
  7  0.059422750 0.051948052
  8  0.056027165 0.054545455
  9  0.046689304 0.044155844
  10 0.007640068 0.006060606
  11 0.006366723 0.004761905
  12 0.006366723 0.004761905

常见的频率统计图：

as.data.frame(prop.table(table(Idents(pbmc), pbmc@meta.data[,"replicate"]), margin = 2))-> pdf -> td
library(tidyverse)
allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB","#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00","#808000","#FF00FF","#FA8072","#7B68EE","#9400D3","#800080","#A0522D","#D2B48C","#D2691E","#87CEEB","#40E0D0","#5F9EA0","#FF1493","#0000CD","#008B8B","#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4","#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347","#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")
allcolour -> colour1
plt<- ggplot(td,aes(x=td[,2],y=td[,3],fill=td[,1]))+
  geom_bar(position = 'stack',stat="identity")+
  labs(x="replicate",y="Cells Ratio")+
  theme(panel.background=element_rect(fill='transparent', color='black'),
        legend.key=element_rect(fill='transparent', color='transparent'),axis.text = element_text(color="black"))+
  scale_y_continuous(expand=c(0.001,0.001))+
  scale_fill_manual(values=colour1)+
  guides(fill = guide_legend(keywidth = 1, keyheight = 1,ncol=1,title = 'Cluster'))

plt

封面故事

取子集

Randomly downsample seurat object

讨论池：

github: https://github.com/satijalab/seurat/issues/3108

pbmc[, sample(colnames(pbmc), size =30, replace=F)]
An object of class Seurat 
18792 features across 30 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

当我们进入Seurat的进一步分析的时候，就会遇到这种情况：

取符合某种规律的细胞出来分析
取符合某种规律的基因出来分析
取符合某种规律的Seurat对象
循环地取符合某种规律的对象
随机取细胞子集
取Seurat对象中数据存储某一部分
Seurat 在计算的过程中默认值的过滤，特别是基因,如：

Genes filtering prior to Normalization #3104

讨论池：

https://github.com/satijalab/seurat/issues/3104

missing features in SCT scale data of integrated object #3056

讨论池：

https://github.com/satijalab/seurat/issues/3056

有时候在做了用了相应的函数后发现基因数少了很多，其实是函数默认值卡掉了，这个是可以自己设置的。

取子的动机很简单，就是为了提出来重点分析：

统计某类特征
与其他分析结合

# What are the cell names of all 12 cells?
WhichCells(pbmc, idents = "12")
 [1] "AAGATAGTCTTTACAC-1" "AATTCCTAGGATCACG-1" "ACCGTTCTCTTAGCAG-1" "ACCTACCGTGGACCAA-1" "AGGACTTGTAGCGAGT-1" "AGGTAGGGTACTCGAT-1" "CATTCTAAGATCGGTG-1"
 [8] "CGGGTGTGTGTTACTG-1" "CTAAGTGTCCTCAGAA-1" "CTCAGGGTCAACCTTT-1" "CTCTCGACAGGTCCGT-1" "GAAATGAAGCGCACAA-1" "GAGGGTATCCTAGCTC-1" "GGATCTATCGGCTATA-1"
[15] "GGGTCTGAGCGACATG-1" "GGGTTATCATGGAGAC-1" "GTAGAGGTCAGGACAG-1" "GTAGGTTCAGGGTTGA-1" "GTTGCGGCACTTGAGT-1" "TAACACGCACTGCACG-1" "TAACTTCTCAGGTGTT-1"
[22] "TCGGGTGGTCTGTTAG-1" "TCTTTGACAAACCATC-1" "TTACAGGTCGCCTCTA-1" "TTCCAATTCCACGAAT-1" "TTCCTTCTCAACACGT-1"

特定群的count矩阵：

# How can I extract expression matrix for all ident  =  12  cells (perhaps, to load into another package)
subraw.data <- as.matrix(GetAssayData(pbmc, slot = "counts")[, WhichCells(pbmc, ident = "12")])
subraw.data[1:4,1:4]

              AAGATAGTCTTTACAC-1 AATTCCTAGGATCACG-1 ACCGTTCTCTTAGCAG-1 ACCTACCGTGGACCAA-1
RP11-34P13.7                   0                  0                  0                  0
FO538757.2                     0                  0                  0                  1
AP006222.2                     0                  1                  1                  0
RP4-669L17.10                  0                  0                  0                  0

Seurat 对象，特定分组：


 subset(pbmc, subset = replicate == "rep2")
An object of class Seurat 
18792 features across 2310 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

特定亚群：

# Can I create a Seurat object of just the 12 cells and 11 cells?
subset(pbmc, idents = c("12", "11"))
An object of class Seurat 
18792 features across 52 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

表达量条件：


 # Can I create a Seurat object based on expression of a feature or value in object metadata?
 subset(pbmc, subset = MS4A1 > 1)
An object of class Seurat 
18792 features across 567 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

排除法：

# Can I create a Seurat object of all cells except the 12 cells and 11  cells?

subset(pbmc, idents = c("12", "11"), invert = TRUE)
An object of class Seurat 
18792 features across 4614 samples within 1 assay 
Active assay: RNA (18792 features)
 3 dimensional reductions calculated: pca, tsne, umap

计算平均表达量

# How can I calculate the average expression of all cells within a cluster?
cluster.averages <- AverageExpression(pbmc)
head(cluster.averages[["RNA"]][, 1:5])

                       0            1           2           3           4
RP11-34P13.7  0.006605121 0.0127983446 0.010980787 0.008408694 0.006315117
FO538757.2    0.318666146 0.5629066563 0.323559434 0.596101470 0.378227054
AP006222.2    0.076024402 0.0882946264 0.088864328 0.080678612 0.082825181
RP4-669L17.10 0.005443071 0.0081874454 0.001254630 0.005508605 0.004664880
RP5-857K21.4  0.002683901 0.0008380836 0.001704552 0.000000000 0.004602852
RP11-206L10.9 0.099380365 0.0914327250 0.099045164 0.110635758 0.109613675

返回一个平均表达的Seurat对象：

# Return this information as a Seurat object (enables downstream plotting and analysis) First,
# replace spaces with underscores '_' so ggplot2 doesn't fail
#orig.levels <- levels(pbmc)
#Idents(pbmc) <- gsub(pattern = " ", replacement = "_", x = Idents(pbmc))
#orig.levels <- gsub(pattern = " ", replacement = "_", x = orig.levels)
#levels(pbmc) <- orig.levels
cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE)
cluster.averages

An object of class Seurat 
18792 features across 13 samples within 1 assay 
Active assay: RNA (18792 features)

 head(cluster.averages@meta.data)
  orig.ident nCount_RNA nFeature_RNA
0    Average      10000        16897
1    Average      10000        15744
2    Average      10000        16344
3    Average      10000        15908
4    Average      10000        14570
5    Average      10000        14523

啊，为什么nCount_RNA都是10000？计算平均的数据用的solt一定是data了，验证一下：?AverageExpression果然。

# How can I calculate expression averages separately for each replicate?

cluster.averages <- AverageExpression(pbmc, return.seurat = TRUE, add.ident = "replicate")

一把短刀，怎么就让他连捅18人？！

向杨大市长道歉

向不容妄议的杨市长道歉

以色列搞大了：伊朗说要直接出兵参战

黄晓菁，这位杭州泰隆银行女员工自爆视频火了，带给我们那些思考？

Seurat Weekly NO.2 || 我该如何取子集？

分群和样本间标签转换

统计分群信息

封面故事

您可能也对以下帖子感兴趣

一把短刀，怎么就让他连捅18人？！

向杨大市长道歉

向不容妄议的杨市长道歉

以色列搞大了：伊朗说要直接出兵参战

黄晓菁，这位杭州泰隆银行女员工自爆视频火了，带给我们那些思考？

生成图片，分享到微信朋友圈

Seurat Weekly NO.2 || 我该如何取子集？

分群和样本间标签转换

统计分群信息

封面故事

您可能也对以下帖子感兴趣