"\u001b[1;32m<ipython-input-257-a450f159b46b>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 30\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 31\u001b[0m \u001b[1;31m# Write back fastq\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 32\u001b[1;33m \u001b[0mSeqIO\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mwrite\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mseqiter\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0msubstracted\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m\"fastq\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;32m/root/anaconda3/lib/python3.5/site-packages/Bio/SeqIO/__init__.py\u001b[0m in \u001b[0;36mwrite\u001b[1;34m(sequences, handle, format)\u001b[0m\n\u001b[0;32m 481\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mformat\u001b[0m \u001b[1;32min\u001b[0m \u001b[0m_FormatToWriter\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 482\u001b[0m \u001b[0mwriter_class\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_FormatToWriter\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mformat\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 483\u001b[1;33m \u001b[0mcount\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mwriter_class\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfp\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mwrite_file\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0msequences\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 484\u001b[0m \u001b[1;32melif\u001b[0m \u001b[0mformat\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mAlignIO\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_FormatToWriter\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 485\u001b[0m \u001b[1;31m# Try and turn all the records into a single alignment,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m/root/anaconda3/lib/python3.5/site-packages/Bio/SeqIO/Interfaces.py\u001b[0m in \u001b[0;36mwrite_records\u001b[1;34m(self, records)\u001b[0m\n\u001b[0;32m 194\u001b[0m \u001b[0mcount\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 195\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mrecord\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mrecords\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 196\u001b[1;33m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mwrite_record\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrecord\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 197\u001b[0m \u001b[0mcount\u001b[0m \u001b[1;33m+=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 198\u001b[0m \u001b[1;31m# Mark as true, even if there where no records\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m/root/anaconda3/lib/python3.5/site-packages/Bio/SeqIO/QualityIO.py\u001b[0m in \u001b[0;36mwrite_record\u001b[1;34m(self, record)\u001b[0m\n\u001b[0;32m 1429\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_record_written\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;32mTrue\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1430\u001b[0m \u001b[1;31m# TODO - Is an empty sequence allowed in FASTQ format?\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1431\u001b[1;33m \u001b[1;32mif\u001b[0m \u001b[0mrecord\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mseq\u001b[0m \u001b[1;32mis\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1432\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"No sequence for record %s\"\u001b[0m \u001b[1;33m%\u001b[0m \u001b[0mrecord\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mid\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1433\u001b[0m \u001b[0mseq_str\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mrecord\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mseq\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mAttributeError\u001b[0m: 'str' object has no attribute 'seq'"
# reads with at least one reported alignment: 555 (55.50%)
# reads that failed to align: 445 (44.50%)
Reported 555 alignments to 1 output stream(s)
Time searching: 00:00:00
Overall time: 00:00:00
%% Cell type:markdown id: tags:
## Remove non-codant RNA
Original script burns 416 lines (0-415). Doing so strip the first non-header entry. Is it right ? Here I strip 415 exactly.
This block creates a generator that filters records according to the corresponding sam file with '4' in the field used in the original script. This method takes the bet that the first non-header line in the sam file matches the first line of the fastq. This is ugly. Don't do this. Need refact with browsing the sam file correctly.
The '4' in the flag field of the SAM file means that the read has no reported alignment. In this case, every aligned read means a match with non-codant tRNA index. So we keep only "mismatches" as they represent all reads that don't match with non-codant, in other words the reads we are interested in.
> Sum of all applicable flags. Flags relevant to Bowtie are:
> * 1 The read is one of a pair
> * 2 The alignment is one end of a proper paired-end alignment
> * 4 The read has no reported alignments
> * 8 The read is one of a pair and has no reported alignments
> * 16 The alignment is to the reverse reference strand
> * 32 The other mate in the paired-end alignment is aligned to the reverse reference strand
> * 64 The read is the first (#1) mate in a pair
> * 128 The read is the second (#2) mate in a pair
> Thus, an unpaired read that aligns to the reverse reference strand will have flag 16.
> A paired-end read that aligns and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1).