[
トップ
] [
新規
|
一覧
|
単語検索
|
最終更新
|
ヘルプ
]
開始行:
[[Pythonバイオ]] [[Pythonバイオ/ツール]]~
&counter(); &lastmod();~
*AP012030 [#ede45d10]
**HISAT2-build [#sa5eecf1]
hisat2-build ~/1710JNHX-0008/rawdata/AP012030-new.fasta AP012030-new-build.fna
Settings:
Output files: "AP012030-new-build.fna.*.ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/home/yamanouc/1710JNHX-0008/rawdata/AP012030-new.fasta
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 866518 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 866518 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:01
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 577678 (target: 866517)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (866518) for bucket 1
Calculating Z arrays for bucket 1
Entering block accumulator loop for bucket 1:
bucket 1: 10%
bucket 1: 20%
bucket 1: 30%
bucket 1: 40%
bucket 1: 50%
bucket 1: 60%
bucket 1: 70%
bucket 1: 80%
bucket 1: 90%
bucket 1: 100%
Sorting block of length 632333 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 632334 for bucket 1
Getting block 2 of 8
Reserving size (866518) for bucket 2
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 2:
bucket 2: 10%
bucket 2: 20%
bucket 2: 30%
bucket 2: 40%
bucket 2: 50%
bucket 2: 60%
bucket 2: 70%
bucket 2: 80%
bucket 2: 90%
bucket 2: 100%
Sorting block of length 695911 for bucket 2
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 695912 for bucket 2
Getting block 3 of 8
Reserving size (866518) for bucket 3
Calculating Z arrays for bucket 3
Entering block accumulator loop for bucket 3:
bucket 3: 10%
bucket 3: 20%
bucket 3: 30%
bucket 3: 40%
bucket 3: 50%
bucket 3: 60%
bucket 3: 70%
bucket 3: 80%
bucket 3: 90%
bucket 3: 100%
Sorting block of length 741622 for bucket 3
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 741623 for bucket 3
Getting block 4 of 8
Reserving size (866518) for bucket 4
Calculating Z arrays for bucket 4
Entering block accumulator loop for bucket 4:
bucket 4: 10%
bucket 4: 20%
bucket 4: 30%
bucket 4: 40%
bucket 4: 50%
bucket 4: 60%
bucket 4: 70%
bucket 4: 80%
bucket 4: 90%
bucket 4: 100%
Sorting block of length 585117 for bucket 4
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 585118 for bucket 4
Getting block 5 of 8
Reserving size (866518) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
bucket 5: 70%
bucket 5: 80%
bucket 5: 90%
bucket 5: 100%
Sorting block of length 318349 for bucket 5
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 318350 for bucket 5
Getting block 6 of 8
Reserving size (866518) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 6: 10%
bucket 6: 20%
bucket 6: 30%
bucket 6: 40%
bucket 6: 50%
bucket 6: 60%
bucket 6: 70%
bucket 6: 80%
bucket 6: 90%
bucket 6: 100%
Sorting block of length 719612 for bucket 6
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 719613 for bucket 6
Getting block 7 of 8
Reserving size (866518) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 7: 10%
bucket 7: 20%
bucket 7: 30%
bucket 7: 40%
bucket 7: 50%
bucket 7: 60%
bucket 7: 70%
bucket 7: 80%
bucket 7: 90%
bucket 7: 100%
Sorting block of length 283374 for bucket 7
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 283375 for bucket 7
Getting block 8 of 8
Reserving size (866518) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 8: 10%
bucket 8: 20%
bucket 8: 30%
bucket 8: 40%
bucket 8: 50%
bucket 8: 60%
bucket 8: 70%
bucket 8: 80%
bucket 8: 90%
bucket 8: 100%
Sorting block of length 645105 for bucket 8
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 645106 for bucket 8
Exited GFM loop
fchr[A]: 0
fchr[C]: 1136926
fchr[G]: 2313079
fchr[T]: 3485960
fchr[$]: 4621430
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 5735025 bytes to primary GFM file: AP012030-new-build.fna.1.ht2
Wrote 1155364 bytes to secondary GFM file: AP012030-new-build.fna.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 2035637 bytes to primary GFM file: AP012030-new-build.fna.5.ht2
Wrote 1176518 bytes to secondary GFM file: AP012030-new-build.fna.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
len: 4621430
gbwtLen: 4621431
nodes: 4621431
sz: 1155358
gbwtSz: 1155358
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 288840
offsSz: 1155360
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 24070
numLines: 24070
gbwtTotLen: 1540480
gbwtTotSz: 1540480
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:00:03
**HISAT2 [#d718db4c]
hisat2 -p 32 -x AP012030-new-build.fna -1 ~/1710JNHX-0008/rawdata/Anc/Anc_R1.fastq.gz
-2 ~/1710JNHX-0008/rawdata/Anc/Anc_R2.fastq.gz -S Anc.sam
47567149 reads; of these:
47567149 (100.00%) were paired; of these:
2046602 (4.30%) aligned concordantly 0 times
44840697 (94.27%) aligned concordantly exactly 1 time
679850 (1.43%) aligned concordantly >1 times
----
2046602 pairs aligned concordantly 0 times; of these:
30402 (1.49%) aligned discordantly 1 time
----
2016200 pairs aligned 0 times concordantly or discordantly; of these:
4032400 mates make up the pairs; of these:
2317731 (57.48%) aligned 0 times
1669267 (41.40%) aligned exactly 1 time
45402 (1.13%) aligned >1 times
97.56% overall alignment rate
** [#e6bc0543]
GeneBankフォーマットをGFF3フォーマットに変換するプログラム gb2gff3.pyを作成する。
#Converting other formats to GFF3
from BCBio import GFF
from Bio import SeqIO
#in_file = "your_file.gb"
#out_file = "your_file.gff"
in_file = "/home/yamanouc/1710JNHX-0008/rawdata/AP012030.gb"
out_file = "AP012030.gff"
in_handle = open(in_file)
out_handle = open(out_file, "w")
GFF.write(SeqIO.parse(in_handle, "genbank"), out_handle)
in_handle.close()
out_handle.close()
これを実行する。
python gb2gff3.py
できたファイルAP012030.gffに対して add_gene_id を実行する。
python /usr/local/RNAseq/add_gene_id AP012030.gff AP012030_e.gff
データ側の準備は、samtoolsを使ってsamファイルAnc.samをbamファイルAnc.bamに変換し、かつ、positionでsortする。
samtools viewを使ってsamファイルをbamファイルに変換。@32は32スレッド使用。
samtools view -@ 32 -b Anc.sam -o Anc.bam
samtools sortを使ってbamファイルAnc.bam をソートしてAnc.sorted.bamに出力
samtools sort -@ 32 Anc.bam > Anc.sorted.bam
GFF変換で得られたAP012030_e.gffを使って、AncデータAnc.sorted.bamをfeatureCountする。
featureCounts -p -T 32 -t exon -g gene_id -a AP012030_e.gff -o Anc.counts.txt Anc.sorted.bam
終了行:
[[Pythonバイオ]] [[Pythonバイオ/ツール]]~
&counter(); &lastmod();~
*AP012030 [#ede45d10]
**HISAT2-build [#sa5eecf1]
hisat2-build ~/1710JNHX-0008/rawdata/AP012030-new.fasta AP012030-new-build.fna
Settings:
Output files: "AP012030-new-build.fna.*.ht2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/home/yamanouc/1710JNHX-0008/rawdata/AP012030-new.fasta
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to read SNPs and splice sites: 00:00:00
Using parameters --bmax 866518 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 866518 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:01
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 577678 (target: 866517)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering GFM loop
Getting block 1 of 8
Reserving size (866518) for bucket 1
Calculating Z arrays for bucket 1
Entering block accumulator loop for bucket 1:
bucket 1: 10%
bucket 1: 20%
bucket 1: 30%
bucket 1: 40%
bucket 1: 50%
bucket 1: 60%
bucket 1: 70%
bucket 1: 80%
bucket 1: 90%
bucket 1: 100%
Sorting block of length 632333 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 632334 for bucket 1
Getting block 2 of 8
Reserving size (866518) for bucket 2
Calculating Z arrays for bucket 2
Entering block accumulator loop for bucket 2:
bucket 2: 10%
bucket 2: 20%
bucket 2: 30%
bucket 2: 40%
bucket 2: 50%
bucket 2: 60%
bucket 2: 70%
bucket 2: 80%
bucket 2: 90%
bucket 2: 100%
Sorting block of length 695911 for bucket 2
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 695912 for bucket 2
Getting block 3 of 8
Reserving size (866518) for bucket 3
Calculating Z arrays for bucket 3
Entering block accumulator loop for bucket 3:
bucket 3: 10%
bucket 3: 20%
bucket 3: 30%
bucket 3: 40%
bucket 3: 50%
bucket 3: 60%
bucket 3: 70%
bucket 3: 80%
bucket 3: 90%
bucket 3: 100%
Sorting block of length 741622 for bucket 3
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 741623 for bucket 3
Getting block 4 of 8
Reserving size (866518) for bucket 4
Calculating Z arrays for bucket 4
Entering block accumulator loop for bucket 4:
bucket 4: 10%
bucket 4: 20%
bucket 4: 30%
bucket 4: 40%
bucket 4: 50%
bucket 4: 60%
bucket 4: 70%
bucket 4: 80%
bucket 4: 90%
bucket 4: 100%
Sorting block of length 585117 for bucket 4
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 585118 for bucket 4
Getting block 5 of 8
Reserving size (866518) for bucket 5
Calculating Z arrays for bucket 5
Entering block accumulator loop for bucket 5:
bucket 5: 10%
bucket 5: 20%
bucket 5: 30%
bucket 5: 40%
bucket 5: 50%
bucket 5: 60%
bucket 5: 70%
bucket 5: 80%
bucket 5: 90%
bucket 5: 100%
Sorting block of length 318349 for bucket 5
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 318350 for bucket 5
Getting block 6 of 8
Reserving size (866518) for bucket 6
Calculating Z arrays for bucket 6
Entering block accumulator loop for bucket 6:
bucket 6: 10%
bucket 6: 20%
bucket 6: 30%
bucket 6: 40%
bucket 6: 50%
bucket 6: 60%
bucket 6: 70%
bucket 6: 80%
bucket 6: 90%
bucket 6: 100%
Sorting block of length 719612 for bucket 6
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 719613 for bucket 6
Getting block 7 of 8
Reserving size (866518) for bucket 7
Calculating Z arrays for bucket 7
Entering block accumulator loop for bucket 7:
bucket 7: 10%
bucket 7: 20%
bucket 7: 30%
bucket 7: 40%
bucket 7: 50%
bucket 7: 60%
bucket 7: 70%
bucket 7: 80%
bucket 7: 90%
bucket 7: 100%
Sorting block of length 283374 for bucket 7
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 283375 for bucket 7
Getting block 8 of 8
Reserving size (866518) for bucket 8
Calculating Z arrays for bucket 8
Entering block accumulator loop for bucket 8:
bucket 8: 10%
bucket 8: 20%
bucket 8: 30%
bucket 8: 40%
bucket 8: 50%
bucket 8: 60%
bucket 8: 70%
bucket 8: 80%
bucket 8: 90%
bucket 8: 100%
Sorting block of length 645105 for bucket 8
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 645106 for bucket 8
Exited GFM loop
fchr[A]: 0
fchr[C]: 1136926
fchr[G]: 2313079
fchr[T]: 3485960
fchr[$]: 4621430
Exiting GFM::buildToDisk()
Returning from initFromVector
Wrote 5735025 bytes to primary GFM file: AP012030-new-build.fna.1.ht2
Wrote 1155364 bytes to secondary GFM file: AP012030-new-build.fna.2.ht2
Re-opening _in1 and _in2 as input streams
Returning from GFM constructor
Returning from initFromVector
Wrote 2035637 bytes to primary GFM file: AP012030-new-build.fna.5.ht2
Wrote 1176518 bytes to secondary GFM file: AP012030-new-build.fna.6.ht2
Re-opening _in5 and _in5 as input streams
Returning from HierEbwt constructor
Headers:
len: 4621430
gbwtLen: 4621431
nodes: 4621431
sz: 1155358
gbwtSz: 1155358
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 0
eftabSz: 0
ftabLen: 1048577
ftabSz: 4194308
offsLen: 288840
offsSz: 1155360
lineSz: 64
sideSz: 64
sideGbwtSz: 48
sideGbwtLen: 192
numSides: 24070
numLines: 24070
gbwtTotLen: 1540480
gbwtTotSz: 1540480
reverse: 0
linearFM: Yes
Total time for call to driver() for forward index: 00:00:03
**HISAT2 [#d718db4c]
hisat2 -p 32 -x AP012030-new-build.fna -1 ~/1710JNHX-0008/rawdata/Anc/Anc_R1.fastq.gz
-2 ~/1710JNHX-0008/rawdata/Anc/Anc_R2.fastq.gz -S Anc.sam
47567149 reads; of these:
47567149 (100.00%) were paired; of these:
2046602 (4.30%) aligned concordantly 0 times
44840697 (94.27%) aligned concordantly exactly 1 time
679850 (1.43%) aligned concordantly >1 times
----
2046602 pairs aligned concordantly 0 times; of these:
30402 (1.49%) aligned discordantly 1 time
----
2016200 pairs aligned 0 times concordantly or discordantly; of these:
4032400 mates make up the pairs; of these:
2317731 (57.48%) aligned 0 times
1669267 (41.40%) aligned exactly 1 time
45402 (1.13%) aligned >1 times
97.56% overall alignment rate
** [#e6bc0543]
GeneBankフォーマットをGFF3フォーマットに変換するプログラム gb2gff3.pyを作成する。
#Converting other formats to GFF3
from BCBio import GFF
from Bio import SeqIO
#in_file = "your_file.gb"
#out_file = "your_file.gff"
in_file = "/home/yamanouc/1710JNHX-0008/rawdata/AP012030.gb"
out_file = "AP012030.gff"
in_handle = open(in_file)
out_handle = open(out_file, "w")
GFF.write(SeqIO.parse(in_handle, "genbank"), out_handle)
in_handle.close()
out_handle.close()
これを実行する。
python gb2gff3.py
できたファイルAP012030.gffに対して add_gene_id を実行する。
python /usr/local/RNAseq/add_gene_id AP012030.gff AP012030_e.gff
データ側の準備は、samtoolsを使ってsamファイルAnc.samをbamファイルAnc.bamに変換し、かつ、positionでsortする。
samtools viewを使ってsamファイルをbamファイルに変換。@32は32スレッド使用。
samtools view -@ 32 -b Anc.sam -o Anc.bam
samtools sortを使ってbamファイルAnc.bam をソートしてAnc.sorted.bamに出力
samtools sort -@ 32 Anc.bam > Anc.sorted.bam
GFF変換で得られたAP012030_e.gffを使って、AncデータAnc.sorted.bamをfeatureCountする。
featureCounts -p -T 32 -t exon -g gene_id -a AP012030_e.gff -o Anc.counts.txt Anc.sorted.bam
ページ名: