
*Removing ribosomal RNA (rRNA) [#ze584e24]

-2019-07-23 ribosomal RNA
--[[Handling rRNA in RNA-seq - 備忘録 a record of inner life:http://amphipod.hatenablog.com/entry/2016/12/01/190843]]
--[[Removing rRNA from RNA-seq data | Tips for NGS Data Analysis:http://catway.jp/bioinformatics/RNA-Seq/rrna.html]]
--[[RNA-Seq - choosing the application that fits your research (illumina):https://jp.illumina.com/content/dam/illumina-marketing/apac/japan/documents/pdf/2015_techsupport_session7.pdf]]
--[[Sequencing workshop - features of RNA-seq library preparation methods and how to choose one (RIKEN):http://www.clst.riken.jp/files/5914/3338/5219/4_How_to_prepare_RNAseq_library_and_sequence_data.pdf]]
--[[Removing rRNA contamination with SortMeRNA - macでインフォマティクス:http://kazumaxneo.hatenablog.com/entry/2018/01/21/125838]]
--[[Bonsai bioinformatics - sortmerna:https://bioinfo.lifl.fr/sortmerna/sortmerna.php]]
--[[SortMeRNA-user-manual-v2.1.pdf:https://bioinfo.lifl.fr/RNA/sortmerna/code/SortMeRNA-user-manual-v2.1.pdf]]

Targets: 16S, 23S, and 5S rRNA.


* Data from Kyushu University [#h3c95016]




■ Dr. Kishimoto, Toho University
 SE51 (single-end, 51 bp reads)
 E. coli
 
 Totals over the two runs so far
 file	#reads	#trimmed	#rrn_map	#rrn_unmap	%rrn_unmap	SE reads needed for 5M non-rRNA SE reads	shortfall vs. SE reads needed	excess over SE reads needed
 181012_10B_S7	17,054,354	17,048,531	15,372,799	1,675,732	9.83	50,886,281	33,831,927		
 181012_1p2-1_S11	14,637,538	14,632,750	11,399,254	3,233,496	22.09	22,634,229	7,996,691		
 181012_1p2-2_S13	17,575,234	17,568,939	13,697,563	3,871,376	22.03	22,698,950	5,123,716		
 181012_2p5-1_S12	11,690,525	11,687,060	8,966,637	2,720,423	23.27	21,486,594	9,796,069		
 181012_2p6-1_S14	16,662,621	16,657,233	12,576,242	4,080,991	24.49	20,414,920	3,752,299		
 181012_45a_plus_S2	18,990,853	18,985,343	14,055,624	4,929,719	25.96	19,261,598	270,745		
 181012_45b_plus_S3	15,055,824	15,052,797	11,717,495	3,335,302	22.15	22,570,406	7,514,582		
 181012_45c_plus_S4	21,339,929	21,336,596	16,477,436	4,859,160	22.77	21,958,455	618,526		
 181013_45A_minus_S18	16,532,044	16,524,826	6,155,460	10,369,366	62.72	7,971,579	***	8,560,465	
 181019_43B_S1	14,923,538	14,920,375	10,969,417	3,950,958	26.47	18,885,974	3,962,436		
 181019_45A_plus_S5	16,869,266	16,865,671	12,115,849	4,749,822	28.16	17,757,788	888,522		
 181019_45L_S6	12,808,719	12,805,744	8,250,496	4,555,248	35.56	14,059,299	1,250,580		
 181019_45a10D_plus_S9	16,210,966	16,205,968	13,420,227	2,785,741	17.18	29,096,327	12,885,361		
 181019_45aIII6c_plus_S8	15,696,596	15,690,006	12,927,019	2,762,987	17.60	28,405,121	12,708,525		
 181019_45d7B_plus_S10	13,268,830	13,264,315	11,154,294	2,110,021	15.90	31,442,412	18,173,582		
 181026_10D_minus_S21	20,353,513	20,344,812	10,544,708	9,800,104	48.15	10,384,335	***	9,969,178	
 181026_2-10B_minus_S20	20,487,318	20,476,914	9,177,453	11,299,461	55.15	9,065,617	***	11,421,701	
 181026_45a_minus_S15	17,516,312	17,510,720	9,034,910	8,475,810	48.39	10,333,120	***	7,183,192	
 181026_45b_minus_S16	15,943,050	15,937,770	7,184,469	8,753,301	54.90	9,106,879	***	6,836,171	
 181026_45c_minus_S17	14,383,563	14,376,676	5,640,314	8,736,362	60.74	8,232,010	***	6,151,553	
 181103_45alll6c_minus_S22	21,788,554	21,781,657	10,052,049	11,729,608	53.83	9,287,844	***	12,500,710	
 181103_45d7B_minus_S23	22,149,677	22,140,700	16,750,927	5,389,773	24.33	20,547,876	***	1,601,801	
 181103_Anc_S19	17,975,953	17,964,999	13,759,979	4,205,020	23.39	21,374,397	3,398,444		
 PwOw_minus_S25	18,853,776	18,849,002	8,300,103	10,548,899	55.95	8,936,371	***	9,917,405	
 PwOw_plus_S24	19,269,944	19,263,287	14,659,892	4,603,395	23.89	20,930,144	1,660,200		
							123,832,204		
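The last three columns of the table appear to follow from the read counts alone: the number of reads needed is the 5M target divided by the non-rRNA fraction (#rrn_unmap / #reads). The following is a minimal sketch of that arithmetic, not the script Kyushu University used. The function names are hypothetical, and the formula is inferred from the table values.

```python
TARGET_NON_RRNA = 5_000_000  # target number of non-rRNA single-end reads


def required_reads(total_reads, rrn_unmap, target=TARGET_NON_RRNA):
    """Total SE reads needed so that about `target` of them are non-rRNA.

    Assumes the non-rRNA fraction (rrn_unmap / total_reads) stays the same
    when the sample is sequenced more deeply.
    """
    non_rrna_fraction = rrn_unmap / total_reads
    return round(target / non_rrna_fraction)


def balance(total_reads, rrn_unmap, target=TARGET_NON_RRNA):
    """Positive value: excess reads. Negative value: shortfall."""
    return total_reads - required_reads(total_reads, rrn_unmap, target)


# Row 181012_10B_S7 from the table above
print(required_reads(17_054_354, 1_675_732))
print(balance(17_054_354, 1_675_732))
```

For 181012_10B_S7 this gives 50,886,281 reads needed and a shortfall of 33,831,927, matching the table.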


* Trying SortMeRNA (2019-07-23) [#a4350acb]
Downloaded the binary and symlinked it from /usr/local/bin/sortmerna.

Which databases? Checked the directory ~/src/sortmerna-2.1b/rRNA_databases:
-silva-arc-16s-id95.fasta (archaea)
-silva-arc-23s-id98.fasta
-silva-bac-16s-id90.fasta (bacteria)
-silva-bac-23s-id98.fasta
-silva-euk-18s-id95.fasta (eukaryota)
-silva-euk-28s-id98.fasta
-rfam-5s-database-id98.fasta
-rfam-5.8s-database-id98.fasta

So silva-bac-16s-id90.fasta, silva-bac-23s-id98.fasta, and rfam-5s-database-id98.fasta should presumably be enough.

Build the index with indexdb_rna.

 cd ~/src/sortmerna-2.1b
 indexdb_rna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:\
   ./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:\
   ./rRNA_databases/silva-arc-16s-id95.fasta,./index/silva-arc-16s-db:\
   ./rRNA_databases/silva-arc-23s-id98.fasta,./index/silva-arc-23s-db:\
   ./rRNA_databases/silva-euk-18s-id95.fasta,./index/silva-euk-18s-db:\
   ./rRNA_databases/silva-euk-28s-id98.fasta,./index/silva-euk-28s:\
   ./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:\
   ./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db

This takes a while, but apparently it only needs to be done once. The index files were created under the index directory.

Use the index to process a single file first. As a trial, run it on ~/src/KishimotoRNA2/181012_10B_S7_R1_001.trim.fastq.

 sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:\
  ./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:\
  ./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:\
  ./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db \
  --reads 181012_10B_S7_R1_001.trim.fastq --sam --num_alignments 1 --fastx --aligned 181012_10B_S7_R1_001.rRNA \
  --other 181012_10B_S7_R1_001.trim.non_rRNA --log -v
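The --ref value in the command above is a colon-separated list of FASTA,index pairs. To run the same command on the other samples, the argument list could be assembled programmatically. This is only a sketch: the helper names are hypothetical, the output naming simply mirrors the command above, and only flags that appear in that command are used.

```python
def build_ref_arg(pairs):
    """Join (fasta, index) pairs as 'fasta,index:fasta,index:...'."""
    return ":".join(f"{fasta},{index}" for fasta, index in pairs)


def sortmerna_args(sample, pairs):
    """Argument list mirroring the command above for one trimmed FASTQ file."""
    return [
        "sortmerna",
        "--ref", build_ref_arg(pairs),
        "--reads", f"{sample}.trim.fastq",
        "--sam", "--num_alignments", "1", "--fastx",
        "--aligned", f"{sample}.rRNA",
        "--other", f"{sample}.trim.non_rRNA",
        "--log", "-v",
    ]


# The four database/index pairs used in the command above
BACTERIAL_DBS = [
    ("./rRNA_databases/silva-bac-16s-id90.fasta", "./index/silva-bac-16s-db"),
    ("./rRNA_databases/silva-bac-23s-id98.fasta", "./index/silva-bac-23s-db"),
    ("./rRNA_databases/rfam-5s-database-id98.fasta", "./index/rfam-5s-db"),
    ("./rRNA_databases/rfam-5.8s-database-id98.fasta", "./index/rfam-5.8s-db"),
]

args = sortmerna_args("181012_10B_S7_R1_001", BACTERIAL_DBS)
print(" ".join(args))
```

The resulting list can be passed to subprocess.run(args, check=True) from the directory that holds the databases, index, and reads.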

Console output:
  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com 
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr
 
 
  Computing read file statistics ... done [23.20 sec]
  size of reads file: 2906733904 bytes
  partial section(s) to be executed: 3 of size 1073741824 bytes 
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 1
 
  Begin mmap reads section # 1:
  Time to mmap reads and set up pointers [1.37 sec]
 
  Begin analysis of: ./rRNA_databases/silva-bac-16s-id90.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.602725
    Gumbel K = 0.329559
    Minimal SW score based on E-value = 59
    Loading index part 1/1 ...  done [1.34 sec]
    Begin index search ...  done [719.52 sec]
    Freeing index ...  done [0.24 sec]
 
  Begin analysis of: ./rRNA_databases/silva-bac-23s-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.602436
    Gumbel K = 0.335011
    Minimal SW score based on E-value = 58
    Loading index part 1/1 ...  done [1.00 sec]
    Begin index search ...  done [1471.22 sec]
    Freeing index ...  done [0.16 sec]
 
  Begin analysis of: ./rRNA_databases/rfam-5s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.616694
    Gumbel K = 0.342032
    Minimal SW score based on E-value = 56
    Loading index part 1/1 ...  done [0.60 sec]
    Begin index search ...  done [143.74 sec]
    Freeing index ...  done [0.08 sec]
 
  Begin analysis of: ./rRNA_databases/rfam-5.8s-database-id98.fasta
    Seed length = 18
    Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
    Gumbel lambda = 0.617555
    Gumbel K = 0.343861
    Minimal SW score based on E-value = 54
    Loading index part 1/1 ...  done [0.22 sec]
    Begin index search ...  done [111.06 sec]
    Freeing index ...  done [0.03 sec]
    Total number of reads mapped (incl. all reads file sections searched): 5800992
    Writing aligned FASTA/FASTQ ...  done [18.38 sec]
    Writing not-aligned FASTA/FASTQ ...  done [1.82 sec]
 
  Begin mmap reads section # 2:
  Time to mmap reads and set up pointers [1.53 sec]
 
  Begin analysis of: ./rRNA_databases/silva-bac-16s-id90.fasta
    Loading index part 1/1 ...  done [1.44 sec]
    Begin index search ...  done [690.19 sec]
    Freeing index ...  done [0.27 sec]
 
  Begin analysis of: ./rRNA_databases/silva-bac-23s-id98.fasta
    Loading index part 1/1 ...  done [0.93 sec]
    Begin index search ...  done [1429.56 sec]
    Freeing index ...  done [0.14 sec]
 
  Begin analysis of: ./rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.58 sec]
    Begin index search ...  done [140.79 sec]
    Freeing index ...  done [0.07 sec]
 
  Begin analysis of: ./rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.21 sec]
    Begin index search ...  done [111.21 sec]
    Freeing index ...  done [0.02 sec]
    Total number of reads mapped (incl. all reads file sections searched): 11563780
    Writing aligned FASTA/FASTQ ...  done [19.83 sec]
    Writing not-aligned FASTA/FASTQ ...  done [2.13 sec]
 
  Begin mmap reads section # 3:
  Time to mmap reads and set up pointers [1.07 sec]
 
  Begin analysis of: ./rRNA_databases/silva-bac-16s-id90.fasta
    Loading index part 1/1 ...  done [1.42 sec]
    Begin index search ...  done [523.39 sec]
    Freeing index ...  done [0.26 sec]
 
  Begin analysis of: ./rRNA_databases/silva-bac-23s-id98.fasta
    Loading index part 1/1 ...  done [0.93 sec]
    Begin index search ...  done [1039.06 sec]
    Freeing index ...  done [0.18 sec]
 
  Begin analysis of: ./rRNA_databases/rfam-5s-database-id98.fasta
    Loading index part 1/1 ...  done [0.60 sec]
    Begin index search ...  done [100.63 sec]
    Freeing index ...  done [0.07 sec]
 
  Begin analysis of: ./rRNA_databases/rfam-5.8s-database-id98.fasta
    Loading index part 1/1 ...  done [0.22 sec]
    Begin index search ...  done [80.05 sec]
    Freeing index ...  done [0.02 sec]
    Total number of reads mapped (incl. all reads file sections searched): 15633981
    Writing aligned FASTA/FASTQ ...  done [13.92 sec]
    Writing not-aligned FASTA/FASTQ ...  done [1.40 sec]

Log file 181012_10B_S7_R1_001.rRNA.log:
 Tue Jul 23 16:59:31 2019
 
 Command: sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db --reads 181012_10B_S7_R1_001.trim.fastq --sam --num_alignments 1 --fastx --aligned 181012_10B_S7_R1_001.rRNA --other 181012_10B_S7_R1_001.trim.non_rRNA.fastq fastq -v 
 Process pid = 70967
 Parameters summary:
    Index: ./index/silva-bac-16s-db
     Seed length = 18
     Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
     Gumbel lambda = 0.602725
     Gumbel K = 0.329559
     Minimal SW score based on E-value = 59
    Index: ./index/silva-bac-23s-db
     Seed length = 18
     Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
     Gumbel lambda = 0.602436
     Gumbel K = 0.335011
     Minimal SW score based on E-value = 58
    Index: ./index/rfam-5s-db
     Seed length = 18
     Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
     Gumbel lambda = 0.616694
     Gumbel K = 0.342032
     Minimal SW score based on E-value = 56
    Index: ./index/rfam-5.8s-db
     Seed length = 18
     Pass 1 = 18, Pass 2 = 9, Pass 3 = 3
     Gumbel lambda = 0.617555
     Gumbel K = 0.343861
     Minimal SW score based on E-value = 54
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous nucleotide = -3
    SQ tags are not output
    Number of threads = 1
    Reads file = 181012_10B_S7_R1_001.trim.fastq
 
 Results:
    Total reads = 17049704
    Total reads passing E-value threshold = 15633981 (91.70%)
    Total reads failing E-value threshold = 1415723 (8.30%)
    Minimum read length = 36
    Maximum read length = 51
    Mean read length = 50
 By database:
    ./rRNA_databases/silva-bac-16s-id90.fasta		20.37%
    ./rRNA_databases/silva-bac-23s-id98.fasta		71.31%
    ./rRNA_databases/rfam-5s-database-id98.fasta		0.02%
    ./rRNA_databases/rfam-5.8s-database-id98.fasta		0.00%
 
 Tue Jul 23 18:50:05 2019
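When many samples are processed, the Results section of each log could be collected with a small parser. The sketch below assumes the log format is exactly as shown above (SortMeRNA 2.1b) and may not work for other versions. The function name is hypothetical.

```python
import re

# Patterns follow the "Results:" lines of the log shown above
TOTAL_RE = re.compile(r"Total reads = (\d+)")
THRESHOLD_RE = re.compile(
    r"Total reads (passing|failing) E-value threshold = (\d+) \(([\d.]+)%\)"
)


def parse_sortmerna_log(text):
    """Return {'total': ..., 'passing': ..., 'failing': ...} as read counts."""
    stats = {}
    match = TOTAL_RE.search(text)
    if match:
        stats["total"] = int(match.group(1))
    for kind, count, _percent in THRESHOLD_RE.findall(text):
        stats[kind] = int(count)
    return stats


# Example: the Results lines of the log above
example = """
    Total reads = 17049704
    Total reads passing E-value threshold = 15633981 (91.70%)
    Total reads failing E-value threshold = 1415723 (8.30%)
"""
stats = parse_sortmerna_log(example)
print(stats)
```

Here "passing" is the number of reads classified as rRNA and "failing" is the number of non-rRNA reads.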

Before trimming:
 wc -l 181012_10B_S7_R1_001.fastq
 68217416 181012_10B_S7_R1_001.fastq  (divided by 4: 17,054,354 reads)
After trimming:
 wc -l 181012_10B_S7_R1_001.trim.fastq
 68198816 181012_10B_S7_R1_001.trim.fastq  (divided by 4: 17,049,704 reads)

Line counts of the generated FASTQ files, measured with wc -l (e.g. wc -l 181012_10B_S7_R1_001.rRNA.fastq):
 62,535,924 181012_10B_S7_R1_001.rRNA.fastq  (divided by 4: 15,633,981 reads)
  5,662,892 181012_10B_S7_R1_001.trim.non_rRNA.fastq  (divided by 4: 1,415,723 reads)
                                          sum: 17,049,704 reads
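The "divide the wc -l result by 4" step can also be done in Python. This is a minimal sketch that assumes plain four-line FASTQ records (no wrapped sequence lines). The function name is hypothetical.

```python
def count_fastq_reads(lines):
    """Count reads in an iterable of FASTQ lines (4 lines per record)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Usage with a real file:
#   with open("181012_10B_S7_R1_001.trim.fastq") as fh:
#       print(count_fastq_reads(fh))

# Consistency check with the counts above: rRNA + non-rRNA = trimmed reads
trimmed, rrna, non_rrna = 17_049_704, 15_633_981, 1_415_723
print(rrna + non_rrna == trimmed)
```

The final check confirms that no reads were lost when SortMeRNA split the trimmed file.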

Figures from Kyushu University, for comparison:
 file	#reads	#trimmed	#rrn_map	#rrn_unmap	%rrn_unmap
 181012_10B_S7	17,054,354	17,048,531	15,372,799	1,675,732	9.83


Take the head of the rRNA SAM file produced by the separation, 181012_10B_S7_R1_001.rRNA.sam, and have a look.
 head -500 181012_10B_S7_R1_001.rRNA.sam > 181012_10B_S7_R1_001.rRNA-500.sam

 @HD	VN:1.0	SO:unsorted										
 @PG	ID:sortmerna	VN:1.0	CL:sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./index/silva-bac-16s-db:./rRNA_databases/silva-bac-23s-id98.fasta,./index/silva-bac-23s-db:./rRNA_databases/rfam-5s-database-id98.fasta,./index/rfam-5s-db:./rRNA_databases/rfam-5.8s-database-id98.fasta,./index/rfam-5.8s-db --reads 181012_10B_S7_R1_001.trim.fastq --sam --num_alignments 1 --fastx --aligned 181012_10B_S7_R1_001.rRNA --other 181012_10B_S7_R1_001.trim.non_rRNA.fastq fastq -v 									
 C00122:247:HLWJLBCX2:1:1101:10535:2128	0	FU8Pinit	117	255	1S47M3S	*	0	0	TATTGCTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGG	DDDDDIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIHIIII	AS:i:79	NM:i:3
 C00122:247:HLWJLBCX2:1:1101:12824:2040	0	FU8Pinit	121	255	49M2S	*	0	0	CTAGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTAC	DDDDDIIIIGIIHIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:78	NM:i:4
 C00122:247:HLWJLBCX2:1:1101:10750:2503	0	FU8Pinit	123	255	47M4S	*	0	0	AGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGA	DDDDDHHIIHIIHIIIIIIIIIIIIIIIIIGIHIIHIIHIIIIGIIIIIID	AS:i:74	NM:i:4
 C00122:247:HLWJLBCX2:1:1101:8130:2894	0	Unc13453	1436	255	5S3M1D43M	*	0	0	CCTACGGTTACCTTGTTTCGACTTCACCCCAGTCATGAATCACAAAGTGGT	DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:82	NM:i:2
 C00122:247:HLWJLBCX2:1:1101:19062:2832	0	FU8Pinit	70	255	36M1D15M	*	0	0	GTGTACAAGGCCCGGGAACGTATTCACCGTGGCATTCTGACCCACGATTAC	DDDDDIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIHIIIIHIIII	AS:i:62	NM:i:8
 C00122:247:HLWJLBCX2:1:1101:6236:3120	0	FU8Pinit	134	255	4S36M11S	*	0	0	CCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGACGCACTT	DDDDDHIIIIIIGHIIIIIIHIIHIHIIIIIIIIIIHGHHIHIIIIIIIII	AS:i:62	NM:i:2
 C00122:247:HLWJLBCX2:1:1101:3119:3328	0	FU8Pinit	81	255	25M1D25M1S	*	0	0	CCGGGAACGTATTCACCGTGGCATTCTGATCCACGATTACTAGCGATTCCG	DDDDDIHIIIIHIIIIIIIIIIIIIIIHIIIIIIIHIIIIIIHIIIIIIIH	AS:i:75	NM:i:5
 C00122:247:HLWJLBCX2:1:1101:6916:3312	0	FU8Pinit	75	255	1S31M1D19M	*	0	0	ACAAGGCCCGGGAACGTATTCACCGTGGCATTCTGATCCACGATTACTAGC	DDDDDIIIIIGHHDHIHIIIIIIIIIIHIGIIIIIIIIIIIIIHIHIHIII	AS:i:70	NM:i:6
 C00122:247:HLWJLBCX2:1:1101:14490:3742	0	Unc13453	1436	255	6S3M1D42M	*	0	0	CCCTACGGTTACCTTGTTCCGACTTCACCCCAGTCATGAATCACAAAGTGG	DDDDDHHGHHHIIIHHIIGHIIIHHGIHIHIFHHHHHH?FHHEHHGHHIII	AS:i:80	NM:i:2
 C00122:247:HLWJLBCX2:1:1101:10644:3794	0	FU8Pinit	75	255	31M1D20M	*	0	0	CAAGGCCCGGGAACGTATTCACCGTGGCATTCTGATCCACGATTACTAGCG	DDDDDHIIIIIHIIIIIHHIIIIIIIIIHIIIIIIIIIIIIIIIHIIIHII	AS:i:72	NM:i:6
 C00122:247:HLWJLBCX2:1:1101:12145:3862	0	FU8Pinit	51	255	49M2S	*	0	0	CCGTGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTGG	DDDDDIIIIIIIIIHHIIIIIIIIIIIIHIIIIIIIIIIIIIIIHIIIIII	AS:i:73	NM:i:5
 C00122:247:HLWJLBCX2:1:1101:14381:3795	0	FU8Pinit	124	255	46M5S	*	0	0	GCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGAC	DDDDDIIIIGHIIIHEHHIGHIIIGIIIIIIIIHHHIIIIIIDIIIIIIII	AS:i:72	NM:i:4
 C00122:247:HLWJLBCX2:1:1101:20932:3891	0	Unc13453	1436	255	5S3M1D43M	*	0	0	CCTACGGTTACCTTGTTCCGACTTCACCCCAGTCATGAATCACAAAGTGGT	DDDDDHHIIIIIHHIHIIHIIIIIIIIIHHIHFHHHHIIHGIHIIIIIIII	AS:i:82	NM:i:2
 C00122:247:HLWJLBCX2:1:1101:20311:4173	0	FU8Pinit	59	255	1S47M1D3M	*	0	0	GTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTGGCATTCTG	<D@DBCHIDHI<CEHIIHEGHIIHIIIGHHIDHH?EHHIGHIFHII?FHGH	AS:i:75	NM:i:5
 C00122:247:HLWJLBCX2:1:1101:9967:4272	0	I6CCervi	5	255	1S43M7S	*	0	0	CCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATC	DDDDDIIHIHIIHHIIIIHIIIIIIIIIIIIIHIIIIIIIIHIIIIIIIII	AS:i:86	NM:i:0
 C00122:247:HLWJLBCX2:1:1101:5980:4705	0	FU8Pinit	52	255	48M3S	*	0	0	CATGGTGTGACGGGCGGTGTGTACAAGGCCCGGGAACGTATTCACCGTGGC	DDDDDIIIIHHIIIIIIIIIIIIIIIIHHIIIIIIIIIIIIIIIIIIIIII	AS:i:76	NM:i:4
 C00122:247:HLWJLBCX2:1:1101:7670:4720	0	FU8Pinit	123	255	47M4S	*	0	0	AGCGATTCCGACTTCATGGAGTCGAGTTGCAGACTCCAATCCGGACTACGA	DDDDDIIIIIHIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIII	AS:i:74	NM:i:4
 C00122:247:HLWJLBCX2:1:1101:14848:4727	0	Unc13453	1436	255	1S3M1D47M	*	0	0	CGGTTACCTTGTTACGACTTCACCCCAGTCATGAATCACAAAGTGGTAAGC	DDDDDIIIIIIIIIIIIIIIHIIIH<EH<GHHIGIIIIIIIIIIHIIIIII	AS:i:90	NM:i:2
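To see which reference sequences the rRNA reads hit most often, one could tally the third SAM column (RNAME). The sketch below uses only the standard library and assumes tab-separated SAM lines with header lines starting with @. The function name is hypothetical.

```python
from collections import Counter


def count_hits_by_reference(sam_lines):
    """Tally alignments per reference name (SAM column 3), skipping headers."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line (@HD, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        rname = fields[2]
        if rname != "*":  # "*" marks an unaligned read
            counts[rname] += 1
    return counts


# Usage with the excerpt created above:
#   with open("181012_10B_S7_R1_001.rRNA-500.sam") as fh:
#       for name, n in count_hits_by_reference(fh).most_common(5):
#           print(name, n)
```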


Map the non-rRNA FASTQ file and check how many reads map to more than one location.
 nohup hisat2 -p 16 --dta-cufflinks -x AP012030.fna -U 181012_10B_S7_R1_001.trim.non_rRNA.fastq -S 181012_10B_S7_R1_001.trim.non_rRNA.sam > 181012_10B_S7_R1_001.trim.non_rRNA.hisat2.log &

The log file shows:
 1415723 reads; of these:
   1415723 (100.00%) were unpaired; of these:
     30855 (2.18%) aligned 0 times
     1366651 (96.53%) aligned exactly 1 time
     18217 (1.29%) aligned >1 times
 97.82% overall alignment rate
So only 1.29% of the reads aligned >1 times, which looks acceptable. This step finished almost instantly.
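The multi-mapping rate can be pulled out of the HISAT2 summary with a short parser, which helps when comparing many samples. This sketch assumes the single-end (unpaired) summary format shown above. The function names are hypothetical.

```python
import re

TOTAL_RE = re.compile(r"^\s*(\d+) reads; of these:", re.M)
ALIGN_RE = re.compile(
    r"^\s*(\d+) \([\d.]+%\) aligned (0 times|exactly 1 time|>1 times)", re.M
)


def parse_hisat2_summary(text):
    """Return (total_reads, {category: read_count}) for an unpaired summary."""
    total = int(TOTAL_RE.search(text).group(1))
    counts = {label: int(n) for n, label in ALIGN_RE.findall(text)}
    return total, counts


def multimap_percent(text):
    """Percentage of reads that aligned more than once."""
    total, counts = parse_hisat2_summary(text)
    return 100.0 * counts[">1 times"] / total


# The summary from the log above
summary = """1415723 reads; of these:
  1415723 (100.00%) were unpaired; of these:
    30855 (2.18%) aligned 0 times
    1366651 (96.53%) aligned exactly 1 time
    18217 (1.29%) aligned >1 times
97.82% overall alignment rate
"""
print(f"{multimap_percent(summary):.2f}%")
```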
