其他
跟着 Cell 学 Ribo-seq 分析 一
随缘
1引言
文章标题:
Profiling Ssb-Nascent Chain Interactions Reveals Principles of Hsp70-Assisted Folding
拟复现图片:
metagene plot:
single gene plot:
2原始数据下载
下载方式根据自己选择, 单端测序的 fastq 数据:
GSE93830
3质控
# 1.qc raw-data
mkdir -p 2.QC-data/raw-qc 2.QC-data/clean-qc
fastqc -t 12 -q -o 2.QC-data/raw-qc/ 1.raw-data/*.gz
multiqc 2.QC-data/raw-qc/* -o 2.QC-data/raw-qc/
去接头前
去接头后
4去接头
# 2.trim adaptors
mkdir 3.trim-data
for i in SRR5188601 SRR5188602 SRR5188603 SRR5188604 SRR5188613 SRR5188614 SRR5188615 SRR5188616
do
cutadapt -j 10 -f fastq \
-a CTGTAGGCACCATCAATTCGTATGCCGTCTT \
-O 6 -m 20 -M 45 --discard-untrimmed \
-o 3.trim-data/${i}.trimmed.fq.gz \
1.raw-data/${i}.fastq.gz
done
# qc
fastqc -t 12 -q -o 2.QC-data/clean-qc/ 3.trim-data/*.gz
multiqc 2.QC-data/clean-qc/* -o 2.QC-data/clean-qc/
5建立索引
去 NCBI 及 ensembl 网站下载 rRNA
,tRNA
,gtf
及基因组
文件,然后建索引:
# 3.build index for rRNA/tRNA and genome
mkdir rRNA-index tRNA-index genome-index bw1-genome-index
# rRNA
bowtie2-build Saccharomyces-cerevisiae-rRNA.fasta rRNA-index/Saccharomyces-cerevisiae-rRNA
# tRNA
bowtie2-build Saccharomyces-cerevisiae-tRNA.fasta tRNA-index/Saccharomyces-cerevisiae-tRNA
# genome
bowtie-build Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa bw1-genome-index/Saccharomyces_cerevisiae-genome
6去除 rRNA/tRNA 序列污染
# 4.rm rtRNA contamination from fastq files
mkdir 4.rmrtRNA-data
# rmrRNA
for i in SRR5188601 SRR5188602 SRR5188603 SRR5188604 SRR5188613 SRR5188614 SRR5188615 SRR5188616
do
bowtie2 -p 20 -x ./index/rRNA-index/Saccharomyces-cerevisiae-rRNA \
--un-gz 4.rmrRNA-data/${i}.rmrRNA.fq.gz \
-U 3.trim-data/${i}.trimmed.fq.gz \
-S 4.rmrRNA-data/null
done
# rmtRNA
for i in SRR5188601 SRR5188602 SRR5188603 SRR5188604 SRR5188613 SRR5188614 SRR5188615 SRR5188616
do
bowtie2 -p 20 -x ./index/tRNA-index/Saccharomyces-cerevisiae-tRNA
--un-gz 4.rmrtRNA-data/${i}.rmtRNA.fq.gz \
-U 4.rmrtRNA-data/${i}.rmrRNA.fq.gz \
-S 4.rmrtRNA-data/null
done
7比对至基因组
因为作者提供的后续分析用的 python 脚本是基于 bowtie 比对的结果写的, 所以这里使用 bowtie 比对:
# 6.map to genome with bowtie
mkdir -p 5.map-data/bowtie-map-data
for i in SRR5188601 SRR5188602 SRR5188603 SRR5188604 SRR5188613 SRR5188614 SRR5188615 SRR5188616
do
bowtie -p 20 ./index/bw1-genome-index/Saccharomyces_cerevisiae-genome \
-m 1 -v 2 --best --strata \
<(zcat 4.rmrRNA-data/${i}.rmrRNA.fq.gz) \
5.map-data/bowtie-map-data/${i}.map
done
8结尾
此次的目录结构:
比对结果文件,根据样本信息重新命名一下:
9分析 reads 长度分布
"""
Supplementary Note 1: Read length distribution
Author: Annemarie Becker
inputFile:
.map file generated by Bowtie default output.
outputFile:
read length distribution
col0: length of footprint read
col1: how often is this length is counted
"""
def readLength(inputFile, outputFile):
Dict = {}
inFile = open(inputFile, 'r')
outFile = open(outputFile, 'w')
line = inFile.readline()
while line != '':
fields = line.split()
col5 = str(fields[5]) #note: if sequencing was performed without barcode reading, the column needs to be changed to 4
length = len(col5)
columns = len(fields) #check: how many columns?
if columns > 8:
col8 = str(fields[8])
if col8.startswith("0"): #check: is there a mismatch at 1st position?
length = length - 1 #subtract wrong base at 1st position
else:
pass
else:
pass
if length in Dict:
Dict[length] += 1
else:
Dict[length] = 1
line = inFile.readline()
List = list(Dict.items())
List.sort()
for length in range(20,46): #the range depends on the number of cycles that are performed in sequencing
if length not in Dict:
Dict[length] = 0
List = list(Dict.items())
List.sort()
for J in List:
outFile.write(str(J[0]) + '\t' + str(J[1]) + '\n')
inFile.close()
outFile.close()
作者提供的代码并不能直接运行,这里稍微修改了,并在 vscode 里分析的。
代码也很简单,需要注意的是作者考虑了 reads 的第一个碱基是否错配,如果错配,则这个 reads 的长度减一,最后输出 20-45 nt 的长度 reads 的数量。
批量分析:
创建文件夹:
import os
# make folder
os.mkdir('./1.readlength-data')
批量运行函数:
# sample name
sample = ['Ssb1-inter-rep1.map','Ssb1-inter-rep2.map','Ssb2-inter-rep1.map','Ssb2-inter-rep2.map',
'Ssb1-trans-rep1.map','Ssb1-trans-rep2.map','Ssb2-trans-rep1.map','Ssb2-trans-rep2.map']
outPlus = ['Ssb1-inter-rep1.len.txt','Ssb1-inter-rep2.len.txt','Ssb2-inter-rep1.len.txt','Ssb2-inter-rep2.len.txt',
'Ssb1-trans-rep1.len.txt','Ssb1-trans-rep2.len.txt','Ssb2-trans-rep1.len.txt','Ssb2-trans-rep2.len.txt']
# run
for i in range(0,8):
readLength(sample[i],''.join(['1.readlength-data/',outPlus[i]]))
绘图
可以将结果可视化一下:
library(tidyverse)
library(ggplot2)
file <- list.files('1.read-length-data/',pattern = '.txt')
# bacth read
map_df(file,function(x){
tmp = read.table(paste('1.read-length-data/',x,sep = ''))
colnames(tmp) <- c('readlength','numbers')
tmp$percent <- tmp$numbers/sum(tmp$numbers)
tmp$sample <- sapply(strsplit(x,split = '\\.'),'[',1)
# filter length 25-35 nt
tmp <- tmp %>% filter(readlength >= 25 & readlength <= 35)
return(tmp)
}) -> len
# plot 6x15
ggplot(len,aes(x = factor(readlength),y = percent)) +
geom_col() +
theme_bw(base_size = 14) +
xlab('') +
facet_wrap(~sample,scales = 'free',ncol = 4)
10结尾
未完待续...
欢迎加入生信交流群。加我微信我也拉你进 微信群聊 老俊俊生信交流群
哦,数据代码已上传至QQ群,欢迎加入下载。
群二维码:
老俊俊微信:
知识星球:
所以今天你学习了吗?
今天的分享就到这里了,敬请期待下一篇!
最后欢迎大家分享转发,您的点赞是对我的鼓励和肯定!
如果觉得对您帮助很大,赏杯快乐水喝喝吧!
往期回顾
◀南京的春
◀..