*Obtaining read data [#a9d72733]
Here we try out read mapping using experimental data that has already been published. If you have run your own experiment and have output data from a sequencer, you would of course use that data instead.
[[A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488244/]]  A worked example that uses Saccharomyces
[[GEO Accession viewer GSE37599:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37599]]
**When you want to search for read data [#i763f2bc]
-[[Searching and registering public NGS data (2017 NGS hands-on workshop, August 29):https://biosciencedbc.jp/gadget/human/20170829_3_nakazato_20170818.pdf]]
-[[DDBJ Sequence Read Archive Handbook:https://www.ddbj.nig.ac.jp/dra/submission.html#metadata]]  [[Where is the database?:https://yokazaki.hatenablog.com/entry/2015/05/27/215757]]
-[[Prof. Kadota, Agricultural Bioinformatics Special Lecture I, session 1 (databases, data retrieval, file formats, Quality Control):http://www.iu.a.u-tokyo.ac.jp/~kadota/20150616_kadota.pdf]]
**Example: obtaining read data as in Kadota, "Transcriptome Analysis", p. 71 [#j1940e2a]
In this example we access SRA (Sequence Read Archive), NCBI's public database of experimental data, using the SRA number, which is the accession number of a whole submission (SRA, submission accession) (p. 72).~
// [[Getting data from the SRA:https://edwards.sdsu.edu/research/getting-data-from-the-sra/]]
Suppose the data of interest has submission_accession=SRA000299 (this is given in a paper or a similar source).
Starting from this, we search NCBI's SRA database for SRA000299.
To do this by hand on the web, simply go directly to NCBI's SRA database.
https://www.ncbi.nlm.nih.gov/sra/?term=SRA000299
Below we consider doing the same work through a Web API (that is, from a Python program). NCBI's SRA database does not appear to publish a Web API of its own and instead directs users to Entrez. [[(Download SRA sequences from Entrez search results):https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/]]
Following the example given there, we build an Entrez query.
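For reference, the query that such a search sends can be written out as a plain E-utilities URL. The sketch below only builds the URL and does not contact NCBI. The helper name build_esearch_url and the address you@example.org are placeholders introduced here for illustration; the endpoint esearch.fcgi and the parameters db, term, retmax and email are the ones described in the E-utilities documentation.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities; esearch.fcgi is the search endpoint.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, email, retmax=20):
    """Return the ESearch URL for `term` in the Entrez database `db`.

    Hypothetical helper for illustration; it only formats a URL.
    """
    params = {"db": db, "term": term, "retmax": retmax, "email": email}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# you@example.org is a placeholder contact address.
url = build_esearch_url("sra", "SRA000299", "you@example.org")
print(url)
```

Opening the printed URL in a browser returns the same ID list that a BioPython search for this term would receive.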
Notes on the Entrez API
-[[NCBI APIs:https://www.ncbi.nlm.nih.gov/home/develop/api/]]
-[[Entrez Programming Utilities Help:https://www.ncbi.nlm.nih.gov/books/NBK25501/]]
-[[The E-utilities In-Depth: Parameters, Syntax and More:https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch]]
-[[Table 1 – Valid values of &retmode and &rettype for EFetch (null = empty string):https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly]]
[[Download SRA sequences from Entrez search results:https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/]]
[[SRA Toolkit download:https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/]] <--fastq-dump and sam-dump
[[Trying to fetch literature information from NCBI's API with Python:https://qiita.com/MTNakata/items/1538da3e97fe0a8b951a]] calls the Web API directly without using BioPython, but we do not use that approach for now.
Accessing Entrez SRA from Python
from Bio import Entrez

Entrez.email = "yamanouc@hyperresearch.com"  # contact address sent to NCBI
handle = Entrez.esearch(db="sra", term="SRA000299")
result = Entrez.read(handle)  # store the search result in result
handle.close()
print('SRA SRA000299, IDList', result['IdList'])
for uid in result['IdList'][:1]:  # take IdList from result; [:1] limits this to the first ID
    handle = Entrez.efetch(db="sra", id=uid, retmode="xml")  # fetch one record at a time
    print(handle.read())
    handle.close()
The result is available only as XML. To parse the XML and obtain the RUN accession IDs we want:
import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "yamanouc@hyperresearch.com"  # contact address sent to NCBI
handle = Entrez.esearch(db="sra", term="SRA000299")
result = Entrez.read(handle)
handle.close()
print('SRA SRA000299, IDList', result['IdList'])
for uid in result['IdList']:
    handle = Entrez.efetch(db="sra", id=uid, retmode="xml")
    root = ET.fromstring(handle.read())
    handle.close()
    # find every RUN element anywhere below the root
    items = root.findall('.//RUN')
    for i, u in enumerate(items):
        # print(i, 'tag', u.tag, 'attr', u.attrib)
        print(u.get('accession'))
The output is:
SRR002324
SRR002320
SRR002325
SRR002322
SRR002321
SRR002323
Using these run accession numbers, SRR002320 to SRR002325, we access the FASTQ data.
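The extraction step can be tried without network access. The sketch below runs the same kind of ElementTree lookup on a hand-written, heavily shortened stand-in for an EFetch response. The element nesting in SAMPLE_XML is an assumption made for illustration (a real record carries many more elements and attributes), and run_accessions is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened stand-in for an SRA EFetch response (assumed layout).
SAMPLE_XML = """
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <RUN_SET>
      <RUN accession="SRR002324"/>
      <RUN accession="SRR002320"/>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>
"""


def run_accessions(xml_text):
    """Return the sorted, de-duplicated accession of every RUN element."""
    root = ET.fromstring(xml_text.strip())
    return sorted({run.get("accession") for run in root.iter("RUN")})


runs = run_accessions(SAMPLE_XML)
print(runs)
```

Sorting the accessions makes the list independent of the order in which Entrez happens to return the records.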
From here on we download the SRA data (SRA data first, and in the end FASTQ data).~
First go from the [[SRA page:https://www.ncbi.nlm.nih.gov/sra]] ⇒ [[NCBI SRA Toolkit:https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software]]~
Download and unpack the NCBI SRA Toolkit from there and use it to access the data. With R, much of this seems to be handled automatically behind the scenes.
fasterq-dump SRR002320
The output was:
spots read : 39,266,713
reads read : 39,266,713
reads written : 39,266,713
A file named SRR002320.fastq of 8340468900 bytes was created. Its first lines are:
@SRR002320.1 080226_CMLIVERKIDNEY_0007:1:1:112:735 length=36
GTGGTGGGGTTGGTATTTGGTTTCTCGTTTTAATTA
+SRR002320.1 080226_CMLIVERKIDNEY_0007:1:1:112:735 length=36
IIIIIIII"IIIII)I$I1%HII"I#./(#/'$#*#
@SRR002320.2 080226_CMLIVERKIDNEY_0007:1:1:114:564 length=36
GGATACTCAGGCTGGCCCAATTTCTGGGCGTGGGAA
+SRR002320.2 080226_CMLIVERKIDNEY_0007:1:1:114:564 length=36
IIII:>&<I;I%I88II1&+I:IF>II,&D:I-'),
@SRR002320.3 080226_CMLIVERKIDNEY_0007:1:1:109:558 length=36
GTAGAATTAGAATTGTGAAGATGATAAGTGTAGAGG
+SRR002320.3 080226_CMLIVERKIDNEY_0007:1:1:109:558 length=36
IIIIIIIIIIIIIIIIIIIIIII<IIAIIII6I?I:
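Each read occupies four lines: a header starting with @, the base sequence, a line starting with +, and the quality string.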
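To make the record layout concrete, here is a minimal sketch of a FASTQ reader applied to the third record shown above. It assumes every record is exactly four lines and that qualities use the Phred+33 encoding (the offset later passed to Trimmomatic as -phred33); read_fastq is a hypothetical helper, not part of any library.

```python
from io import StringIO

# Third record of SRR002320.fastq as listed above.
FASTQ_TEXT = """@SRR002320.3 080226_CMLIVERKIDNEY_0007:1:1:109:558 length=36
GTAGAATTAGAATTGTGAAGATGATAAGTGTAGAGG
+SRR002320.3 080226_CMLIVERKIDNEY_0007:1:1:109:558 length=36
IIIIIIIIIIIIIIIIIIIIIII<IIAIIII6I?I:
"""


def read_fastq(handle, offset=33):
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is skipped
        qual = handle.readline().rstrip("\n")
        read_id = header[1:].split()[0]  # drop '@' and the description
        yield read_id, seq, [ord(ch) - offset for ch in qual]


records = list(read_fastq(StringIO(FASTQ_TEXT)))
for read_id, seq, scores in records:
    print(read_id, len(seq), min(scores), max(scores))
```

With this encoding the character I stands for quality 40, so a run of I characters marks the best-scored part of a read.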
For the reference sequence, search NCBI Genome for the Homo sapiens genome~
⇒ [[search:https://www.ncbi.nlm.nih.gov/search/all/?term=Homo+sapiens+genome]]~
⇒ GRCh38.p12 (December 2017) Download ⇒ saved as the file GRCh38.p12.tar (938 MB) (p. 90)
*Example from the NIG (National Institute of Genetics) workshop [#c85616f7]
[[Announcement of the "Platform for Advanced Genome Science" bioinformatics workshop:https://www.genome-sci.jp/whatsnew/event/news20180920.html]]~
⇒[[Workshop videos <FY2018 bioinformatics workshop (intermediate level)>:https://www.genome-sci.jp/lecture20181st]]~
⇒[[Materials (GitHub):https://github.com/genome-sci/python_bioinfo_2018]]
The subject of 1-1 is the budding yeast Saccharomyces cerevisiae cultured under two different conditions~
Intawat Nookaew et al~
"A comprehensive comparison of RNA-Seq-based transcriptome analysis
from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae"~
Nucleic Acids Research, 2012, Vol. 40, No. 20, September 2012
doi: 10.1093/nar/gks804~
[[full text:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488244/]]
The paper (p. 10095) lists the following ACCESSION NUMBERS for the reads:
GSE37599, SRS307298, SRR453566, SRR453567,
SRR453568, SRR453569, SRR453570, SRR453571 and
SRR453578.
Relying on these, the SRA data (SRA data first, and in the end FASTQ data) can be downloaded in the same way as in the example above.
First download and unpack the NCBI SRA Toolkit (SRA page ⇒ NCBI SRA Toolkit), then use it to access the data. The data for SRR453566 is downloaded with
fasterq-dump SRR453566
This single command is sufficient.
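The accession list quoted from the paper mixes several record types, and only the run accessions (SRR...) are what fasterq-dump is given in these notes. The sketch below sorts the accessions by prefix and prepares one command per run without executing anything. The helper names and the output directory "fastq" are assumptions made for illustration; --outdir is an option of fasterq-dump.

```python
import re

# Accession numbers as quoted from the paper above.
ACCESSIONS = ["GSE37599", "SRS307298", "SRR453566", "SRR453567",
              "SRR453568", "SRR453569", "SRR453570", "SRR453571",
              "SRR453578"]

# Meaning of the three-letter prefixes.
KINDS = {
    "GSE": "GEO series",
    "SRA": "SRA submission",
    "SRP": "SRA study",
    "SRX": "SRA experiment",
    "SRS": "SRA sample",
    "SRR": "SRA run",
}


def kind_of(accession):
    """Classify an accession by its three-letter prefix."""
    match = re.fullmatch(r"([A-Z]{3})\d+", accession)
    return KINDS.get(match.group(1), "unknown") if match else "unknown"


def fasterq_dump_commands(accessions, outdir="fastq"):
    """Build one fasterq-dump argument list per run accession."""
    return [["fasterq-dump", "--outdir", outdir, acc]
            for acc in accessions if kind_of(acc) == "SRA run"]


commands = fasterq_dump_commands(ACCESSIONS)
for cmd in commands:
    print(" ".join(cmd))
    # To actually download: subprocess.run(cmd, check=True)
```

Keeping each command as an argument list means it can be handed to subprocess.run unchanged once the toolkit is on the PATH.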
The paper also states that the S288c genome is used as the reference, so that data has to be obtained as well. The relevant passage is:
>Transcriptome analysis using reference genome-based reads mapping
The genome sequence of S. cerevisiae strain S288c and its annotations were retrieved from the SGD databases and used for all analysis.
Searching [[SGD:https://www.yeastgenome.org/]] (Saccharomyces Genome Database) for S288C leads to [[Strain: S288C:https://www.yeastgenome.org/strain/S000203483]]. From that page we use the entry GCF_000146045.2 (the GCF_ prefix denotes a RefSeq assembly).
Clicking GCF_000146045.2 jumps to the [[NCBI Assembly page for R64, Organism name: Saccharomyces cerevisiae S288C (baker's yeast) Strain: S288C:https://www.ncbi.nlm.nih.gov/assembly/GCF_000146045.2/]]. On that page, clicking [[Download the RefSeq assembly:ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64]] under "Access the data" on the right displays a file list. From that list, download [[GCF_000146045.2_R64_genomic.fna.gz:ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz]] as the reference sequence and [[GCF_000146045.2_R64_genomic.gff.gz:ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz]] as the GFF annotation file.
An alternative route is to [[search:https://www.ncbi.nlm.nih.gov/genome?term=saccharomyces+cerevisiae+s288c%5Borgn%5D&cmd=DetailsSearch]] for saccharomyces cerevisiae s288c[orgn] from NCBI's genome [[page:https://www.ncbi.nlm.nih.gov/genome]].~
In the result, clicking [[Download the genome in FASTA format:ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz]] yields the file GCF_000146045.2_R64_genomic.fna.gz.~
Likewise, clicking [[Download the annotation in GFF format:ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz]] yields the file GCF_000146045.2_R64_genomic.gff.gz.
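The download links above follow a regular pattern: the nine digits of the assembly accession are split into groups of three to form the directory path. The sketch below rebuilds the two file URLs from the accession and the assembly name. The function names are hypothetical, and the https host is an assumption (the links in these notes use ftp:// on the same host); the download helper is defined but not called.

```python
import urllib.request

# Assumed https mirror of the ftp:// host used in the links above.
BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def assembly_dir_url(accession, asm_name, base=BASE):
    """Directory URL for an assembly, e.g. GCF_000146045.2 with name R64."""
    prefix, rest = accession.split("_", 1)      # "GCF", "000146045.2"
    digits = rest.split(".")[0]                 # "000146045"
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join([base, prefix, *groups, f"{accession}_{asm_name}"])


def assembly_file_url(accession, asm_name, suffix):
    """URL of one file in the assembly directory, e.g. genomic.fna.gz."""
    directory = assembly_dir_url(accession, asm_name)
    return f"{directory}/{accession}_{asm_name}_{suffix}"


def download(url):
    """Fetch url into the current directory, keeping the remote file name."""
    filename = url.rsplit("/", 1)[1]
    urllib.request.urlretrieve(url, filename)
    return filename


fasta_url = assembly_file_url("GCF_000146045.2", "R64", "genomic.fna.gz")
gff_url = assembly_file_url("GCF_000146045.2", "R64", "genomic.gff.gz")
print(fasta_url)
print(gff_url)
# download(fasta_url); download(gff_url)  # uncomment to fetch the files
```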
* Miscellaneous [#v944cf06]
[[RNA-seq exercise:https://www.genome-sci.jp/seminar2018/7_doc_takahashi.pdf]] (2018-03) Takahashi (高橋弘喜)
**Accessing Ensembl Genomes [#l037d90e]
**UCSC ? [#d97a4303]
*Quality check with FastQC [#k7363eae]
To run it in batch mode:
fastqc --nogroup SRR453566_1.fastq
fastqc --nogroup SRR453566_2.fastq
This generates the quality graphs. For usage details, [[FASTQ quality control:https://bi.biopapyrus.jp/rnaseq/qc/fastqc.html]] describes the available parameters.
The original page is [[fastqc:https://www.bioinformatics.babraham.ac.uk/projects/fastqc/]]
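The per-base quality graph summarises the Phred scores observed at each read position. As a rough illustration of that idea only (this is not FastQC's implementation, and FastQC reports more than a mean), the sketch below averages Phred+33 scores per position over a few made-up quality strings. The function name and the sample strings are hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Average the Phred score at each position across all reads.

    Reads may differ in length; a position is averaged only over the
    reads that are long enough to reach it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):       # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Made-up quality strings: 'I' = 40, '5' = 20, '+' = 10 in Phred+33.
quals = ["IIII5", "III5", "II+"]
means = mean_quality_per_position(quals)
print(means)
```

A falling mean towards the end of the reads, as in this toy input, is the pattern that motivates the trimming step.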
*Trimming with Trimmomatic [#df89e2ab]
See [[here:http://www.usadellab.org/cms/?page=trimmomatic]] for Trimmomatic.~
Downloaded (Version 0.38 as of 2019-03-02)~
Installed in /usr/local/Trimmomatic
Usage:
java -jar trimmomatic-0.38.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz \
output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
This will perform the following:
-Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)
-Remove leading low quality or N bases (below quality 3) (LEADING:3)
-Remove trailing low quality or N bases (below quality 3) (TRAILING:3)
-Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
-Drop reads shorter than 36 bases (MINLEN:36)
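The quality-based steps listed above can be illustrated with a few lines of Python. This is a simplified sketch, not Trimmomatic's actual code: adapter clipping is left out, and the sliding window here simply cuts at the start of the first window whose mean falls below the threshold, which may differ in detail from what Trimmomatic does. All function names are hypothetical.

```python
def leading(quals, threshold):
    """Index of the first base whose quality is at least threshold."""
    start = 0
    while start < len(quals) and quals[start] < threshold:
        start += 1
    return start


def trailing(quals, threshold):
    """Length to keep after dropping low-quality bases from the end."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return end


def sliding_window(quals, window, required):
    """Length to keep: cut where a window's mean first drops below required."""
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < required:
            return i
    return len(quals)


def trim(seq, quals, lead=3, trail=3, window=4, required=15, minlen=36):
    """Apply LEADING, TRAILING, SLIDINGWINDOW and MINLEN in that order."""
    start = leading(quals, lead)
    seq, quals = seq[start:], quals[start:]
    end = trailing(quals, trail)
    seq, quals = seq[:end], quals[:end]
    keep = sliding_window(quals, window, required)
    seq, quals = seq[:keep], quals[:keep]
    return (seq, quals) if len(seq) >= minlen else None  # None = read dropped


example_quals = [2, 30, 30, 30, 30, 30, 5, 5, 5, 2]
kept = trim("ACGTACGTAC", example_quals, minlen=4)
dropped = trim("ACGTACGTAC", example_quals, minlen=36)
print(kept, dropped)
```

The same toy read survives with a short MINLEN and is dropped with MINLEN:36, which shows why MINLEN is placed last in the step list.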
According to the NIG slides:
java -jar -Xmx512m trimmomatic-0.38.jar \
PE \
-threads ${NSLOTS} \
-phred33 \
-trimlog log_SRR${NUM}.txt \
SRR${NUM}_1.fastq.gz \
SRR${NUM}_2.fastq.gz \
paired_SRR${NUM}_1.trim.fastq.gz \
unpaired_SRR${NUM}_1.trim.fastq.gz \
paired_SRR${NUM}_2.trim.fastq.gz \
unpaired_SRR${NUM}_2.trim.fastq.gz \
ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 \
LEADING:20 \
TRAILING:20 \
SLIDINGWINDOW:4:15 \
MINLEN:36
The slides say that, in general, adapter removal is better done before the other processing steps (because the other steps make adapter matching harder).
The detailed meaning of each parameter is described on the [[Trimmomatic page:http://www.usadellab.org/cms/?page=trimmomatic]] and in the [[manual:http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf]]. The example above is explained below (paired-end case only).
|-Xmx512m |An option for the java command, not a Trimmomatic parameter. -Xmx512m sets the maximum heap size to 512 MB; when omitted, the JVM chooses a default that depends on its version and the machine's memory|
|PE |Operating mode: SE (single end) or PE (paired end)|
|-threads 32 |Number of parallel threads used for processing|
|-phred33 |Encoding of the base (read) quality scores, -phred33 or -phred64; auto-detected when omitted (v0.32 and later)|
|-trimlog log_SRR453566.txt|Name of the file to which the trimming log is written|
|SRR453566_1.fastq|Paired-end input file, forward reads |
|SRR453566_2.fastq|Paired-end input file, reverse reads |
|paired_SRR453566_1.trim.fastq|Paired-end output file, paired forward reads |
|unpaired_SRR453566_1.trim.fastq|Paired-end output file, unpaired forward reads |
|paired_SRR453566_2.trim.fastq|Paired-end output file, paired reverse reads |
|unpaired_SRR453566_2.trim.fastq|Paired-end output file, unpaired reverse reads |
The remaining arguments specify the individual trimming steps.
|ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10|Step 1 removes Illumina adapters. TruSeq*** is the FASTA file that describes the adapters; 2 is the maximum number of seed mismatches; 30 is the palindrome clip threshold, i.e. how accurately the two adapter-ligated reads must match in palindrome alignment; 10 is the simple clip threshold, i.e. how accurately an adapter must match a read|
|LEADING:20|Removes low-quality bases from the start of the read; 20 is the minimum quality a base needs to be kept|
|TRAILING:20|Removes low-quality bases from the end of the read; 20 is the minimum quality a base needs to be kept|
|CROP:? (not used in the example)|Regardless of quality, keeps the given number of bases from the start and removes the rest|
|HEADCROP:? (not used in the example)|Regardless of quality, removes the given number of bases from the start and keeps the rest|
|SLIDINGWINDOW:4:15|Scans the read with a sliding window 4 bases wide and cuts the read where the average quality in the window drops below 15|
|MINLEN:36|(Usually done last) Of the reads that remain, keeps only those at least 36 bases long|
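When several runs have to be processed, the argument list described in the two tables can be assembled programmatically. The sketch below builds the list for one run without executing it. The function trimmomatic_pe_command and its defaults are assumptions made for illustration; the file-naming scheme and the jar path follow these notes. The JVM option is placed before -jar, which is the conventional order.

```python
def trimmomatic_pe_command(run,
                           jar="/usr/local/Trimmomatic/trimmomatic-0.38.jar",
                           threads=32, heap="512m",
                           adapters="TruSeq3-PE-2.fa",
                           leading=20, trailing=20,
                           window=(4, 15), minlen=36):
    """Return the java argument list for one paired-end Trimmomatic run."""
    return [
        "java", f"-Xmx{heap}", "-jar", jar, "PE",
        "-threads", str(threads),
        "-phred33",
        "-trimlog", f"log_{run}.txt",
        f"{run}_1.fastq", f"{run}_2.fastq",                 # inputs
        f"paired_{run}_1.trim.fastq", f"unpaired_{run}_1.trim.fastq",
        f"paired_{run}_2.trim.fastq", f"unpaired_{run}_2.trim.fastq",
        f"ILLUMINACLIP:{adapters}:2:30:10",                 # trimming steps
        f"LEADING:{leading}",
        f"TRAILING:{trailing}",
        f"SLIDINGWINDOW:{window[0]}:{window[1]}",
        f"MINLEN:{minlen}",
    ]


cmd = trimmomatic_pe_command("SRR453566")
print(" \\\n  ".join(cmd))
# To run it: subprocess.run(cmd, check=True)
```

Generating the file names from the run accession in one place avoids the kind of hand-typed mismatch between input and output names that is easy to introduce when editing the command for each run.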
The actual command was:
java -jar -Xmx512m /usr/local/Trimmomatic/trimmomatic-0.38.jar PE \
-threads 32 \
-phred33 \
-trimlog log_SRR453566.txt \
SRR453566_1.fastq \
SRR453566_2.fastq \
paired_SRR453566_1.trim.fastq \
unpaired_SRR453566_1.trim.fastq \
paired_SRR453566_2.trim.fastq \
unpaired_SRR453566_2.trim.fastq \
ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 \
LEADING:20 \
TRAILING:20 \
SLIDINGWINDOW:4:15 \
MINLEN:36
The output was:
TrimmomaticPE: Started with arguments:
-threads 32 -phred33 -trimlog log_SRR453566.txt SRR453566_1.fastq SRR453566_2.fastq
paired_SRR453566_1.trim.fastq unpaired_SRR453566_1.trim.fastq
paired_SRR453566_2.trim.fastq unpaired_SRR453566_2.trim.fastq ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10
LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and
'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only
sequences, 0 reverse only sequences
Input Read Pairs: 5725730 Both Surviving: 5115482 (89.34%) Forward Only Surviving:
514793 (8.99%) Reverse Only Surviving: 46123 (0.81%) Dropped: 49332 (0.86%)
TrimmomaticPE: Completed successfully
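The summary line in this log can be checked mechanically. The sketch below parses the counts with a regular expression written for the wording shown above (an assumption: other Trimmomatic versions may word the line differently), confirms that the four outcome counts add up to the number of input pairs, and recomputes the percentages. parse_summary is a hypothetical helper.

```python
import re

# Summary line from the log above, joined into one string.
SUMMARY = ("Input Read Pairs: 5725730 Both Surviving: 5115482 (89.34%) "
           "Forward Only Surviving: 514793 (8.99%) "
           "Reverse Only Surviving: 46123 (0.81%) Dropped: 49332 (0.86%)")

PATTERN = re.compile(
    r"Input Read Pairs: (\d+) Both Surviving: (\d+) \([\d.]+%\) "
    r"Forward Only Surviving: (\d+) \([\d.]+%\) "
    r"Reverse Only Surviving: (\d+) \([\d.]+%\) Dropped: (\d+)")


def parse_summary(line):
    """Return the read-pair counts from a TrimmomaticPE summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a TrimmomaticPE summary line")
    keys = ("input_pairs", "both", "forward_only", "reverse_only", "dropped")
    return dict(zip(keys, map(int, match.groups())))


stats = parse_summary(SUMMARY)
total = stats["input_pairs"]
outcomes = ("both", "forward_only", "reverse_only", "dropped")
percent = {key: round(100 * stats[key] / total, 2) for key in outcomes}
print(sum(stats[key] for key in outcomes) == total)
print(percent)
```

Here about 89% of the pairs survive intact, and only those reads end up in the two paired_ output files.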
Note that the adapter sequences specified for ILLUMINACLIP could be taken from [[adapters/TruSeq3-PE-2.fa in the Trimmomatic package on GitHub:https://github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-PE-2.fa]].