MID is a tool that detects microinversions (MIs) by mapping initially unmapped short reads back onto the reference genome sequence. The input is an unmapped BAM file, and the output files contain the detailed alignments of each unmapped read with MIs (output_i) and a list of unique MIs (o_inv).
Download MID source code: MID.tar.gz (updated on 12/25/2015)
64-bit GNU/Linux
GCC 4.0 with Standard C++ Library
Python 2.7
Download MID source code (MID.tar.gz) from http://cqb.pku.edu.cn/ZhuLab/MID
Install bowtie from http://sourceforge.net/projects/bowtie-bio/files/bowtie or download here: Bowtie
Bowtie should be in the system's environment variable $PATH.
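The PATH requirement above can be checked before running MID. The following is a minimal sketch, not part of MID itself; the helper name find_on_path is hypothetical, and the executable name "bowtie" is assumed to be what the installed package provides.

```python
import os


def find_on_path(program, path=None):
    """Return the full path of `program` if an executable file with that
    name exists in one of the PATH directories, otherwise None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # "bowtie" is the assumed name of the executable.
    location = find_on_path("bowtie")
    if location is None:
        print("bowtie was not found on PATH")
    else:
        print("bowtie found at " + location)
```

The sketch uses only the os module so that it runs under Python 2.7, which MID requires, as well as under newer interpreters.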
Get UCSC hg19.fa and the pre-built bowtie index of UCSC hg19 from
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/
http://bowtie-bio.sourceforge.net/manual.shtml
Or download here: hg19.fa, Bowtie index
Extract the index by
$ tar -xzvf index.tar.gz
Download the Cython, pysam and Biopython modules for Python from
https://pypi.python.org/pypi/Cython/
http://code.google.com/p/pysam/downloads/list
http://biopython.org/DIST
Or download here: cython, pysam-0.7, Biopython
Install the modules by
$ tar -xzvf Cython-0.17.4.tar.gz
$ cd Cython-0.17.4
$ python setup.py install
$ cd ..
$ tar -xzvf pysam-0.7.tar.gz
$ cd pysam-0.7
$ python setup.py install
$ cd ..
$ tar -xzvf biopython-1.66.tar.gz
$ cd biopython-1.66
$ python setup.py install
$ cd ..
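After installation, it can be useful to confirm that the three modules are importable by the Python interpreter that will run MID. The sketch below is one possible check, not part of MID; the function name missing_modules is hypothetical, and the import names ("Cython", "pysam", "Bio") are the names these packages are normally imported under.

```python
import importlib

# Import names assumed for the three packages:
# Cython -> "Cython", pysam -> "pysam", Biopython -> "Bio".
REQUIRED_MODULES = ("Cython", "pysam", "Bio")


def missing_modules(names=REQUIRED_MODULES):
    """Return the list of module names in `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable")
```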
Extract MID.tar.gz by
$ tar -xzvf MID.tar.gz
Run the program from the command line:
$ python MID.py -a unmapped -r hg19.fa -i index -v erranchor -p parallel -s anchor -k kmer -m matchnum -e errkmer -g mergenum -c cutsize
[Option]
-a/--unmapped     unmapped BAM file of a 1000 Genomes Project sample (e.g., HG01880.unmapped.ILLUMINA.bwa.ACB.low_coverage.20120522.bam)
-r/--reference    reference sequence (e.g., hg19.fa)
-i/--index        bowtie index (e.g., hg19)
-v/--erranchor    error number in the anchors (default: 1)
-p/--parallel     number of alignment threads (default: 1)
-s/--anchor       length of anchors (default: 18)
-k/--kmer         length of kmers (default: 14)
-m/--matchnum     number of serial matches (default: 5)
-e/--errkmer      error number in each kmer (default: 2)
-g/--mergenum     deviation for merging two subsequences (default: 3)
-c/--cutsize      length of cutting size (default: 0)
If you want to compile the files yourself, remove the previous executable files and compile again after extracting MID.tar.gz (see above):
$ make clean
$ make
Remove all the files of MID by
$ make remove
For HG01880 from the 1000 Genomes Project, the command line would be:
$ python MID.py -a HG01880.unmapped.ILLUMINA.bwa.ACB.low_coverage.20120522.bam -r hg19.fa -i hg19 -v 1 -p 1 -s 18 -k 14 -m 5 -e 2 -g 3 -c 0
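When MID is run on many samples, the command line can be assembled by a small script. The sketch below is an illustration only: the function build_mid_command and the DEFAULTS table are hypothetical helpers, while the flags and default values are copied from the option list above.

```python
# Optional flags and their defaults, copied from the option list above.
DEFAULTS = [
    ("-v", 1),   # --erranchor: error number in the anchors
    ("-p", 1),   # --parallel: number of alignment threads
    ("-s", 18),  # --anchor: length of anchors
    ("-k", 14),  # --kmer: length of kmers
    ("-m", 5),   # --matchnum
    ("-e", 2),   # --errkmer: error number in each kmer
    ("-g", 3),   # --mergenum: deviation for merging two subsequences
    ("-c", 0),   # --cutsize: length of cutting size
]


def build_mid_command(unmapped, reference, index, overrides=None):
    """Return the MID command as an argument list.

    `overrides` maps a short flag (for example "-p") to a replacement
    value; flags that are not overridden keep their documented default.
    """
    overrides = overrides or {}
    known = set(flag for flag, _ in DEFAULTS)
    unknown = set(overrides) - known
    if unknown:
        raise ValueError("unknown flags: " + ", ".join(sorted(unknown)))
    command = ["python", "MID.py", "-a", unmapped, "-r", reference, "-i", index]
    for flag, default in DEFAULTS:
        command += [flag, str(overrides.get(flag, default))]
    return command


if __name__ == "__main__":
    args = build_mid_command(
        "HG01880.unmapped.ILLUMINA.bwa.ACB.low_coverage.20120522.bam",
        "hg19.fa",
        "hg19",
    )
    print(" ".join(args))
```

The returned list can be passed directly to subprocess.call, which avoids shell quoting issues with unusual file names.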
The input file (unmapped BAM file) and output files (output_i, o_inv) are available.
Extract the input file by
$ tar -xzvf input.tar.gz
Extract the output files by
$ tar -xzvf output.tar.gz
The format of each read in output file "output_i" is
The first line is the name of the short read, the second line (starting with “s”) is the reference sequence of the read, and the third and fourth lines are the alignments on the forward and reverse strands. The columns of each “s” line are:
Column 1: “s”, which marks an alignment line
Column 2: the chromosome and species of the reference sequence, or the name of the read
Column 3: the starting point of the aligned sequence
Column 4: the length of the aligned sequence
Column 5: the strand to which the sequence is aligned (“+” for the forward strand, “-” for the reverse strand)
Column 6: the size of the entire source sequence
Column 7: the aligned sequence
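The seven-column layout of the “s” lines can be read with a few lines of Python. The sketch below assumes the columns are separated by whitespace, which the description does not state explicitly; the names SLine and parse_s_line are hypothetical, and the sample line is made up for illustration rather than taken from real MID output.

```python
from collections import namedtuple

# One record per "s" line; field names follow the column description above.
SLine = namedtuple(
    "SLine",
    ["source", "start", "length", "strand", "source_size", "sequence"],
)


def parse_s_line(line):
    """Parse one whitespace-separated "s" line into an SLine record."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError("not a 7-column 's' line: %r" % line)
    if fields[4] not in ("+", "-"):
        raise ValueError("strand must be '+' or '-': %r" % fields[4])
    return SLine(
        source=fields[1],            # chromosome/species or read name
        start=int(fields[2]),        # starting point of the sequence
        length=int(fields[3]),       # length of the aligned sequence
        strand=fields[4],            # "+" forward, "-" reverse
        source_size=int(fields[5]),  # size of the entire source sequence
        sequence=fields[6],          # aligned sequence
    )


if __name__ == "__main__":
    # Hypothetical example line, for illustration only.
    record = parse_s_line("s hg19.chr1 1000 12 + 249250621 ACGTACGTACGT")
    print(record.source, record.start, record.strand, record.sequence)
```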