扶沟修车招工群

发布日期：2025-12-17 15:01 点击次数：178

图片扶沟修车招工群

绪论

Hello小伙伴们各人好，我是生信技巧树的小学徒”我才不吃蛋黄“。继胃癌单细胞复现系列已毕之后，经过了顷刻的休整，我又归来了。之前每一期的推文，我齐有厚爱准备。推文发出之后，也仔细阅读了每一条留言。相等感谢各人的荧惑。

接下来的一段时刻里，我将再次开启一个新的学徒共享系列，给各人系统整理肺腺癌单细胞测序的代码。

著述的具体内容，二由淳厚已在生信菜鸟团公众号发布，请各人移步阅读：单细胞测序+空间转录组刻画从癌前病变到浸润性肺腺癌的动态演变。

本系列包括但不限于以下内容：数据下载与读取；质控和去批次；降维聚类；分群堤防；互异分析；富集分析；亚群细分；inferCNV;拟时序分析；细胞通信；SCENIC转录因子分析等。

为了加深对代码的领悟，除了阅读推文，不错加入单细胞月更群，也不错互助不雅看生信技巧树B站系列教悔视频。

图片

要是你是的单细胞生人，保举报名插足生信马拉松课程（生信初学&数据挖掘线上直播课10月班），让你快速掌执生信数据挖掘技巧。

本系列推文展望12篇掌握，接待各人不息追更，关爱公众号，点赞+批驳+储藏+在看，您的荧惑将是我更新的能源，同期接待各人品评指正。

数据处置历程

最初吊销系统环境变量，加载R包：

rm(list=ls())options(stringsAsFactors = F) getwd()setwd("../GSE189357-LUAD-scRNA-ST")source('scRNA_scripts/lib.R')library(Seurat)library(ggplot2)library(clustree)library(cowplot)library(data.table)library(dplyr)

当我们在配置数据框的时候，R讲话将会默许把字符型（character）当成因子（factor）。stringsAsFactors = F意味着，“在读入数据时，遭受字符串之后，不将其治愈为factors，仍然保留为字符串神色”。

在R讲话中，stringsAsFactors是一个全局选项，它扬弃着当读取数据时，字符串类型的列是否自动治愈为因子(factors)类型。因子类型在R中是一种特殊的数据类型，用于暗示分类数据。

在R的早期版块中，默许情况下，读取数据时字符串会被治愈为因子。可是，因子类型有其特定的属性和举止，这在某些情况下可能不是用户所盼望的。举例，因子类型会将数据视为分类变量，而且会存储数据的惟一值当作里面整数编码，这可能会加多内存使用量，而且在进行某些类型的数据分析时可能会引起混浊。

在单细胞数据分析中，数据频繁包含多数的基因抒发信息，这些信息是以字符串阵势存在的。在进行数据分析之前，配置stringsAsFactors = F的观念频繁是为了：

幸免无须要的内存滥用：将字符串治愈为因子类型会加多内存的使用，因为因子需要存储一个里面的整数编码和对应的惟一值列表。在处置大限制的单细胞数据集时，这可能会成为一个问题。

保持数据的一致性：在数据分析过程中，保持数据类型的一致性是很报复的。要是某些操作盼望数据是字符串类型，而数据被自动治愈为因子，这可能会导致异常或者不一致的完毕。

擢升代码的可读性和可叹气性：明确地配置stringsAsFactors = F不错让代码的读者知谈数据不会被自动治愈为因子，这有助于领悟代码的意图和举止。

幸免因子类型带来的潜在问题：因子类型在某些统计分析中可能会引起问题，比如在进行机器学习或者数据挖掘时，因子类型的数据可能需要特别的处置智力被正确使用。

因此，在读取单细胞数据之前配置stringsAsFactors = F是一种考究的编程推论，它有助于确保数据以正确的阵势被处置，而且不错减少后续分析中可能出现的问题。

step1:导入数据

数据存储在“GSE189357_RAW”文献夹中：

dir='GSE189357_RAW/' fs=list.files('GSE189357_RAW/','^GSM')fslibrary(tidyverse)samples=str_split(fs,'_',simplify = T)[,1]

处置数据，将原始文献永别整理为barcodes.tsv.gz，features.tsv.gz和matrix.mtx.gz到各自的文献夹。批量将文献名改为 Read10X() 函数概况识别的名字：

if(F){lapply(unique(samples),function(x){  # x = unique(samples)[1]  y=fs[grepl(x,fs)]  folder=paste0("GSE189357_RAW/", paste(str_split(y[1],'_',simplify = T)[,1:2], collapse = "_"))  dir.create(folder,recursive = T)  #为每个样本创建子文献夹  file.rename(paste0("GSE189357_RAW/",y[1]),file.path(folder,"barcodes.tsv.gz"))  #重定名文献，并转移到相应的子文献夹里  file.rename(paste0("GSE189357_RAW/",y[2]),file.path(folder,"features.tsv.gz"))  file.rename(paste0("GSE189357_RAW/",y[3]),file.path(folder,"matrix.mtx.gz"))})}

自行创建函数function，内置Read10X()函数和CreateSeuratObject()函数，批量读取文献原始文献创建Seurat对象：

dir='GSE189357_RAW/'samples=list.files( dir )samples sceList = lapply(samples,function(pro){   # pro=samples[1]   print(pro)    tmp = Read10X(file.path(dir,pro ))   if(length(tmp)==2){ ct = tmp[[1]]   }else{ct = tmp}  sce =CreateSeuratObject(counts =  ct ,  project =  pro  ,  min.cells = 5,  min.features = 300 )  return(sce)})

这里我们获取的sceList实验上包含了9个样本的Seurat对象，接着，泰国按摩群我们需要使用Seurat包的merge函数，将九个Seurat吞并成一个Seurat对象：

do.call(rbind,lapply(sceList, dim))sce.all=merge(x=sceList[[1]],  y=sceList[ -1 ],  add.cell.ids = samples  ) names(sce.all@assays$RNA@layers)#> [1] "counts.GSM5699777_TD1" "counts.GSM5699778_TD2" "counts.GSM5699779_TD3"#> [4] "counts.GSM5699780_TD4" "counts.GSM5699781_TD5" "counts.GSM5699782_TD6"#> [7] "counts.GSM5699783_TD7" "counts.GSM5699784_TD8" "counts.GSM5699785_TD9"

此时，天然我们如故完成了Seurat对象的创建，可是不错看到，九个样本仍然有9个layers。要是不进一步处置，后续在索取counts时数据不完好，分析会一直出错。因此我们需要使用JoinLayers函数对layers进行吞并。

sce.all[["RNA"]]$counts # Alternate accessor function with the same resultLayerData(sce.all, assay = "RNA", layer = "counts")

望望吞并前后的sce变化：

sce.allsce.all <- JoinLayers(sce.all)sce.alldim(sce.all[["RNA"]]$counts )

在吞并Seurat和layers后，终于获取了一个完好的Seurat对象”sce.all“。我们不错检讨”sce.all“里面的一些信息，以此来查验数据是否完好。

as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])head(sce.all@meta.data, 10)table(sce.all$orig.ident) length(sce.all$orig.ident)# fivenum(sce.all$nFeature_RNA)# table(sce.all$nFeature_RNA>800) # sce.all=sce.all[,sce.all$nFeature_RNA>800]# sce.all

在得手构建Seurat对象”sce.all“后，我们还需要给样本添加meta.data分组信息，以便后续作念不同分组之间的对比以及索取亚组后进行进一步分析。最初，我们检讨现存的meta.data信息有哪些：

library(stringr)phe = sce.all@meta.datatable(phe$orig.ident)phe$patient = str_split(phe$orig.ident,'[_]',simplify = T)[,2]table(phe$patient)

按照患者起原，肺腺癌（LUAD）从原位腺癌（AIS）发扬到微浸润性腺癌（MIA）和随后的浸润性腺癌（IAC）

phe$sample = phe$patienttable(phe$sample)phe$sample = gsub("TD5|TD7|TD8", "AIS", phe$sample) phe$sample = gsub("TD3|TD4|TD6", "MIA", phe$sample) phe$sample = gsub("TD1|TD2|TD9", "IAC", phe$sample) table(phe$sample)sce.all@meta.data = phe

step2: QC质控

质控（quality control, QC）的观念是为了去除质地较差细胞，低质地的细胞会变成独有的亚群，使分群完毕变得复杂；在主因素分析过程中，前几个主要因素将拿获质地互异而不是生物学互异，从而缩小降维后果。因此，低质地的细胞可能会导致下流分析出现误导性完毕。为了幸免上述情况的发生，我们需要不才游分析启动前排撤回这些低质地细胞。QC主要的策动有nCount_RNA(每个细胞的UMI数量)和nFeature_RNA（每个细胞中检测到的基因数量）以及"percent_mito"（暗示细胞中线粒体基因的比例）这三个策动。此外，还不错纳入”percent_ribo“（核糖体基因比例）和”percent_hb“（红血细胞基因比例）两个策动。细胞的UMI分子数和基因数反应细胞的质地，数量太低可能是细胞碎屑，太高则可能是两个或多个细胞结团的情况；线粒体基因高抒发的细胞，可能是处于凋一火情景或者裂解情景的细胞；核糖体基因高抒发的细胞，细胞内出现RNA降解时；血红卵白基因高抒发的细胞频繁为红细胞，而红细胞自身是不含有细胞核的，且佩戴的基因少，其佩戴的基因与疾病以及生物发育等过程莫得太大的关连，是以不错奏凯剔撤回。在这里，我们要愚弄PercentageFeatureSet函数永别经营每个细胞的"percent_mito"，”percent_ribo“和”percent_hb“，具体可见scripts代码

sp='human'dir.create("./1-QC")setwd("./1-QC")# 要是过滤的太狠，就需要去修改这个过滤代码source('../scRNA_scripts/qc.R')sce.all.filt = basic_qc(sce.all)print(dim(sce.all))print(dim(sce.all.filt))##细胞减少了极少setwd('../')getwd()fivenum(sce.all.filt$percent_ribo)table(sce.all.filt$nFeature_RNA> 5)

qc函数如下：

basic_qc <- function(input_sce){  #经营线粒体基因比例  mito_genes=rownames(input_sce)[grep("^MT-", rownames(input_sce),ignore.case = T)]   print(mito_genes) #可能是13个线粒体基因  #input_sce=PercentageFeatureSet(input_sce, "^MT-", col.name = "percent_mito")  input_sce=PercentageFeatureSet(input_sce, features = mito_genes, col.name = "percent_mito")  fivenum(input_sce@meta.data$percent_mito)    #经营核糖体基因比例  ribo_genes=rownames(input_sce)[grep("^Rp[sl]", rownames(input_sce),ignore.case = T)]  print(ribo_genes)  input_sce=PercentageFeatureSet(input_sce,  features = ribo_genes, col.name = "percent_ribo")  fivenum(input_sce@meta.data$percent_ribo)    #经营红血细胞基因比例  Hb_genes=rownames(input_sce)[grep("^Hb[^(p)]", rownames(input_sce),ignore.case = T)]  print(Hb_genes)  input_sce=PercentageFeatureSet(input_sce,  features = Hb_genes,col.name = "percent_hb")  fivenum(input_sce@meta.data$percent_hb)    #可视化细胞的上述比例情况  feats <- c("nFeature_RNA", "nCount_RNA", "percent_mito", "percent_ribo", "percent_hb")  feats <- c("nFeature_RNA", "nCount_RNA")  p1=VlnPlot(input_sce, group.by = "orig.ident", features = feats, pt.size = 0, ncol = 2) +  NoLegend()  p1   w=length(unique(input_sce$orig.ident))/3+5;w  ggsave(filename="Vlnplot1.pdf",plot=p1,width = w,height = 5)    feats <- c("percent_mito", "percent_ribo", "percent_hb")  p2=VlnPlot(input_sce, group.by = "orig.ident", features = feats, pt.size = 0, ncol = 3, same.y.lims=T) +  scale_y_continuous(breaks=seq(0, 100, 5)) + NoLegend()  p2   w=length(unique(input_sce$orig.ident))/2+5;w  ggsave(filename="Vlnplot2.pdf",plot=p2,width = w,height = 5)    p3=FeatureScatter(input_sce, "nCount_RNA", "nFeature_RNA", group.by = "orig.ident", pt.size = 0.5)  ggsave(filename="Scatterplot.pdf",plot=p3)    #阐明上述策动，过滤低质地细胞/基因  #过滤策动1:最少抒发基因数的细胞&最少抒发细胞数的基因  # 一般来说，在CreateSeuratObject的时候如故是进行了这个过滤操作  # 要是后期看到了我方的单细胞降维聚类分群完毕很诡异，就不错回过甚来看质地扬弃设施  # 先走默许历程即可  if(F){ selected_c <- WhichCells(input_sce, expression = nFeature_RNA > 500) selected_f <- rownames(input_sce)[Matrix::rowSums(input_sce@assays$RNA$counts > 0 ) > 3] input_sce.filt <- subset(input_sce, features = selected_f, cells = selected_c) dim(input_sce)  dim(input_sce.filt)   }    input_sce.filt =  input_sce    # par(mar = c(4, 8, 2, 1))  # 这里的C 这个矩阵，有极少大，不错洽商随抽样   C=subset(input_sce.filt,downsample=100)@assays$RNA$counts  dim(C)  C=Matrix::t(Matrix::t(C)/Matrix::colSums(C)) * 100  most_expressed <- order(apply(C, 1, median), decreasing = T)[50:1]    pdf("TOP50_most_expressed_gene.pdf",width=14)  boxplot(as.matrix(Matrix::t(C[most_expressed, ])), cex = 0.1, las = 1,  xlab = "% total count per cell",  col = (scales::hue_pal())(50)[50:1],  horizontal = TRUE)  dev.off()  rm(C)    #过滤策动2:线粒体/核糖体基因比例(阐明上头的violin图)  selected_mito <- WhichCells(input_sce.filt, expression = percent_mito < 25)  selected_ribo <- WhichCells(input_sce.filt, expression = percent_ribo > 3)  selected_hb <- WhichCells(input_sce.filt, expression = percent_hb < 1 )  length(selected_hb)  length(selected_ribo)  length(selected_mito)    input_sce.filt <- subset(input_sce.filt, cells = selected_mito)  #input_sce.filt <- subset(input_sce.filt, cells = selected_ribo)  input_sce.filt <- subset(input_sce.filt, cells = selected_hb)  dim(input_sce.filt)    table(input_sce.filt$orig.ident)     #可视化过滤后的情况  feats <- c("nFeature_RNA", "nCount_RNA")  p1_filtered=VlnPlot(input_sce.filt, group.by = "orig.ident", features = feats, pt.size = 0, ncol = 2) +  NoLegend()  w=length(unique(input_sce.filt$orig.ident))/3+5;w   ggsave(filename="Vlnplot1_filtered.pdf",plot=p1_filtered,width = w,height = 5)    feats <- c("percent_mito", "percent_ribo", "percent_hb")  p2_filtered=VlnPlot(input_sce.filt, group.by = "orig.ident", features = feats, pt.size = 0, ncol = 3) +  NoLegend()  w=length(unique(input_sce.filt$orig.ident))/2+5;w   ggsave(filename="Vlnplot2_filtered.pdf",plot=p2_filtered,width = w,height = 5)   return(input_sce.filt)   }

可视化

图片

Vlnplot1

图片

Vlnplot2

图片

Scatterplot

图片

TOP50_most_expressed_gene

图片

Vlnplot1_filtered

图片

Vlnplot2_filteredstep3: harmony整合多个单细胞样品

细胞筛选之后，我们需要进行harmony整合。第一期我们在创建总的Seurat对象时，使用了merge函数对多个Seurat进行了肤浅的吞并。merge仅仅按照行和列进行了吞并，并未对数据进行其他处置。

在拿到下流单细胞矩阵前，样本经验了多个实验设施的处置，时刻、处置东谈主员、试剂以及本事平台等变量齐会成为混杂因素。以上因素羼杂到一谈，就会导致数据产生批次效应（batch effect）。为了尽可能幸免混杂因素，我们不错严格把控测序的本事历程，同期也需要不才游分析中进行过后盾助（算法去批次）。当今单细胞测序常用的去批次算法有Harmony，CCA，RPCA,FastMNN,scVI等。在这里，我们使用Harmony进行演示。

set.seed(10086)table(sce.all.filt$orig.ident)if(T){  dir.create("2-harmony")  getwd()  setwd("2-harmony")  source('../scRNA_scripts/harmony.R')  # 默许 ScaleData 莫得添加"nCount_RNA", "nFeature_RNA"  # 默许的  sce.all.int = run_harmony(sce.all.filt)  setwd('../')}

harmony函数如下：

run_harmony <- function(input_sce){    print(dim(input_sce))  input_sce <- NormalizeData(input_sce,   normalization.method = "LogNormalize",  scale.factor = 1e4)   input_sce <- FindVariableFeatures(input_sce)  input_sce <- ScaleData(input_sce)  input_sce <- RunPCA(input_sce, features = VariableFeatures(object = input_sce))  seuratObj <- RunHarmony(input_sce, "orig.ident")  names(seuratObj@reductions)  seuratObj <- RunUMAP(seuratObj,  dims = 1:15,   reduction = "harmony")    # p = DimPlot(seuratObj,reduction = "umap",label=T )   # ggsave(filename='umap-by-orig.ident-after-harmony',plot = p)    input_sce=seuratObj  input_sce <- FindNeighbors(input_sce, reduction = "harmony",  dims = 1:15)   input_sce.all=input_sce    #配置不同的分辨率，不雅察分群后果(选拔哪一个？)  for (res in c(0.01, 0.05, 0.1, 0.2, 0.3, 0.5,0.8,1)) { input_sce.all=FindClusters(input_sce.all, #graph.name = "CCA_snn",  resolution = res, algorithm = 1)  }  colnames(input_sce.all@meta.data)  apply(input_sce.all@meta.data[,grep("RNA_snn",colnames(input_sce.all@meta.data))],2,table)    p1_dim=plot_grid(ncol = 3, DimPlot(input_sce.all, reduction = "umap", group.by = "RNA_snn_res.0.01") +    ggtitle("louvain_0.01"), DimPlot(input_sce.all, reduction = "umap", group.by = "RNA_snn_res.0.1") +    ggtitle("louvain_0.1"), DimPlot(input_sce.all, reduction = "umap", group.by = "RNA_snn_res.0.2") +    ggtitle("louvain_0.2"))  ggsave(plot=p1_dim, filename="Dimplot_diff_resolution_low.pdf",width = 14)    p1_dim=plot_grid(ncol = 3, DimPlot(input_sce.all, reduction = "umap", group.by = "RNA_snn_res.0.8") +    ggtitle("louvain_0.8"), DimPlot(input_sce.all, reduction = "umap", group.by = "RNA_snn_res.1") +    ggtitle("louvain_1"), DimPlot(input_sce.all, reduction = "umap", group.by = "RNA_snn_res.0.3") +    ggtitle("louvain_0.3"))  ggsave(plot=p1_dim, filename="Dimplot_diff_resolution_high.pdf",width = 18)    p2_tree=clustree(input_sce.all@meta.data, prefix = "RNA_snn_res.")  ggsave(plot=p2_tree, filename="Tree_diff_resolution.pdf")  table(input_sce.all@active.ident)   saveRDS(input_sce.all, "sce.all_int.rds")  return(input_sce.all)  }

可视化

图片

Tree_diff_resolution

图片

Dimplot_diff_resolution_low

图片

Dimplot_diff_resolution_high结语

本期我们整理了肺腺癌单细胞数据集GSE189357个的10X神色的单细胞测序数据，在批量读取了10X文献后，进行了吞并并得手构建Seurat对象，在此基础上将患者的临床信息添加到meta.data分组信息中。在此基础上走Seurat V5次第历程，对Seurat对象进行QC质控、并愚弄harmony整合去批次、并按次第历程进行降维聚类分群。下一期，我们将进行单细胞测序数据分析最报复（个东谈主以为）的一步：细胞堤防。下一期我们不见不散！

图片

本站仅提供存储作事，统统内容均由用户发布，如发现存害或侵权内容，请点击举报。

扶沟修车招工群

扶沟修车招工群

热点资讯

推荐资讯