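Bacteriophages (phages) are viruses that infect bacteria, and also genetic elements that can confer biological traits on their host bacteria. A phage must replicate inside a living bacterium and shows strict host specificity, which is determined by the molecular structure and complementarity of the phage's adsorption apparatus and the receptors on the surface of the recipient bacterium. Phage sequencing data usually contain some host bacterial contamination, so the host sequences need to be removed before assembly.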
I. bowtie2
The folder contents are as follows:
├── index
1. Build the index
bowtie2-build SG15.fasta index/SG15
2. Align the reads to the host genome
bowtie2 -x index/SG15 -1 SMP1_R1.clean.fq.gz -2 SMP1_R2.clean.fq.gz -S smp1.sam -p 30
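3. Discard the reads that aligned to the host
samtools view -b -o smp1.bam smp1.sam  # convert the SAM file from step 2 to BAM
samtools view -b -f 12 -F 256 smp1.bam > smp1.unmapped.bam  # -f 12: keep pairs in which both mates are unmapped; -F 256: drop secondary alignments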
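The `-f 12 -F 256` filter works on the bitwise SAM FLAG field, which can be hard to read at a glance. The following is a minimal sketch of the same test in plain shell arithmetic; the helper name `passes_filter` is hypothetical and only illustrates which FLAG values samtools would keep.

```shell
#!/bin/sh
# Sketch: does a SAM FLAG value survive `samtools view -f 12 -F 256`?
# -f 12  : bits 4 (read unmapped) and 8 (mate unmapped) must both be set
# -F 256 : bit 256 (secondary alignment) must not be set
passes_filter() {
    # $1 is a decimal SAM FLAG value
    if [ $(( $1 & 12 )) -eq 12 ] && [ $(( $1 & 256 )) -eq 0 ]; then
        echo keep
    else
        echo drop
    fi
}

passes_filter 77    # 1+4+8+64: paired, read and mate unmapped, first in pair
passes_filter 141   # 1+4+8+128: paired, read and mate unmapped, second in pair
passes_filter 73    # 1+8+64: read mapped to the host, mate unmapped
passes_filter 99    # 1+2+32+64: properly paired and mapped to the host
```

Only pairs in which neither mate hit the host genome are kept, so a pair with a single host-derived mate is discarded as a whole.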
4. Convert BAM to FASTQ
samtools sort -n smp1.unmapped.bam -O BAM -o smp1.unmapped.sort.bam  # sort by read name so that the two mates of a pair are adjacent
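The command above only name-sorts the BAM file; the command that writes the FASTQ files is not shown. One way to do it, assuming `samtools fastq` is used (another converter may have been intended), is sketched below. The function name `bam_to_fastq`, the `RUN` switch, and the output file names are hypothetical.

```shell
#!/bin/sh
# Sketch: write paired FASTQ files from a name-sorted BAM of unmapped pairs.
# Assumption: samtools is on PATH. Set RUN=echo to print the command
# for review instead of executing it.
bam_to_fastq() {
    # $1 = name-sorted BAM, $2 = output prefix (hypothetical naming scheme)
    # -1/-2 : files for first and second mates
    # -0/-s : reads with neither/both mate bits set, and singletons, are discarded
    # -n    : leave read names unchanged (no /1 or /2 suffix)
    ${RUN:-} samtools fastq \
        -1 "$2_R1.fq" -2 "$2_R2.fq" \
        -0 /dev/null -s /dev/null -n "$1"
}

# Usage (not executed here):
#   bam_to_fastq smp1.unmapped.sort.bam smp1.host_removed
```

The two resulting files would then serve as the host-depleted input for assembly.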
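5. Use bowtie2's --un-conc option directly
Steps 2 to 4 can be replaced by a single bowtie2 run. --un-conc writes the pairs that did not align concordantly to the host to two per-mate files (here uncon_bowtie/sample1.1.fq and uncon_bowtie/sample1.2.fq). This is less strict than the -f 12 filter in step 3, because pairs that align discordantly or with only one mate are also written out.
bowtie2 -p 30 -x index/SG15 -1 SMP1_R1.clean.fq.gz -2 SMP1_R2.clean.fq.gz -S sample1.sam --un-conc uncon_bowtie/sample1.fq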
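bowtie2 derives two per-mate file names from the single path given to `--un-conc`. The sketch below mirrors the rule described in the bowtie2 manual (a `%` in the path is replaced by 1 or 2; otherwise `.1` or `.2` is inserted before the final dot). The helper name `un_conc_names` is hypothetical, and the branch for a path without an extension is an assumption rather than documented behavior. The output directory is also assumed to exist before bowtie2 runs (for example via `mkdir -p uncon_bowtie`).

```shell
#!/bin/sh
# Sketch: predict the two file names bowtie2 writes for --un-conc <path>.
un_conc_names() {
    case "$1" in
        *%*)
            # A percent sign is replaced by the mate number.
            echo "$(printf '%s' "$1" | sed 's/%/1/g') $(printf '%s' "$1" | sed 's/%/2/g')"
            ;;
        *)
            case "${1##*/}" in
                *.*)
                    # .1 / .2 go before the final dot of the file name.
                    echo "${1%.*}.1.${1##*.} ${1%.*}.2.${1##*.}"
                    ;;
                *)
                    # Assumption: without an extension the suffix is appended.
                    echo "$1.1 $1.2"
                    ;;
            esac
            ;;
    esac
}

un_conc_names uncon_bowtie/sample1.fq
```

Knowing the names in advance makes it easier to pass the host-depleted files straight to the assembler in a script.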
II. kneaddata
kneaddata -t 20 --input SMP1_R1.clean.fq.gz --input SMP1_R2.clean.fq.gz -db ./index/SG15 --output kneaddata/ --bypass-trim --remove-intermediate-output
The official documentation describes the contents of each output file using the following example:
kneaddata --input seq1.fastq --input seq2.fastq -db bact_rrna_db -db human_rna_db --output seq_out
This will output files in the folder seq_out named:
Files for just the bact_rrna_db database:
seq_kneaddata_paired_bact_rrna_db_bowtie2_contam_1.fastq: Reads from the first mate in situation (1) above that were identified as belonging to the bact_rrna_db database.
seq_kneaddata_paired_bact_rrna_db_bowtie2_contam_2.fastq: Reads from the second mate in situation (1) above that were identified as belonging to the bact_rrna_db database.
seq_kneaddata_paired_bact_rrna_db_bowtie2_clean_1.fastq: Reads from the first mate in situation (1) above that were identified as NOT belonging to the bact_rrna_db database.
seq_kneaddata_paired_bact_rrna_db_bowtie2_clean_2.fastq: Reads from the second mate in situation (1) above that were identified as NOT belonging to the bact_rrna_db database.