Bio::ViennaNGS
==============

Bio::ViennaNGS is a collection of Perl modules and utilities for
building efficient Next-Generation Sequencing (NGS) analysis
pipelines. It covers various aspects of NGS data analysis, including
(but not limited to) conversion of sequence annotation, evaluation of
mapped data, expression quantification and visualization.

Bio::ViennaNGS is shipped with the following (sub)modules:

- Bio::ViennaNGS::Fasta
    Routines for accessing genomic sequences, implemented through a
    Moose interface to Bio::DB::Fasta.

- Bio::ViennaNGS::AnnoC
    A Moose interface for storage and conversion of sequence
    annotation data.

- Bio::ViennaNGS::SpliceJunc
    A collection of routines for alternative splicing analysis.

- Bio::ViennaNGS::UCSC (will be provided in an upcoming release)
    Routines for visualization of genomics data with the UCSC genome
    browser.

In addition, Bio::ViennaNGS comes with a set of utility programs for
accomplishing routine tasks often required in NGS data processing.
These utilities serve as reference implementations of the routines
implemented in the (sub)modules:

bam_split.pl:
    Split (paired-end and single-end) BAM alignment files by strand
    and compute statistics. Optionally create BED output, as well as
    normalized bedGraph and bigWig files for coverage visualization in
    genome browsers (see dependencies on third-party tools below).

bam_to_bigWig.pl:
    Produce bigWig coverage profiles from (aligned) BAM files,
    explicitly considering strandedness. The most natural use case of
    this tool is to create strand-aware coverage profiles in bigWig
    format for genome browser visualization.

bam_uniq.pl:
    Extract unique and multi-mapping reads from BAM alignment files
    and create separate BAM files for unique (.uniq.) and multi
    (.mult.) mappers.

bed2bedGraph.pl:
    Convert BED files to (strand-specific) bedGraph files, allowing
    additional annotation and automatic generation of bedGraph files
    that can easily be converted to big-type files for UCSC
    visualization.

extend_bed.pl:
    Extend genomic features in BED files by a certain number of
    nucleotides, either on both sides or specifically at the 5' or 3'
    end.

gff2bed.pl:
    Convert RefSeq GFF3 annotation files to BED12 format. Individual
    BED12 files are created for each feature type
    (CDS/tRNA/rRNA/etc.). Tested with RefSeq bacterial GFF3
    annotation.

kmer_analysis.pl:
    Count k-mers of predefined length in FASTQ and FASTA files.

MEME_XML_motif_extractor.pl:
    Compute simple statistics from MEME XML output and report the
    motifs found, the number of sequences containing each motif, and
    ggplot graphs.

motiffinda.pl:
    Find motifs in annotated sequence features. The motif can be
    provided as a regular expression.

newUCSCdb.pl:
    Create a new genome database (i.e. add a novel organism) in a
    locally installed instance of the UCSC genome browser. Based on
    http://genomewiki.ucsc.edu/index.php/Building_a_new_genome_database

normalize_multicov.pl:
    Compute normalized expression data in TPM/RPKM from (raw) read
    counts in bedtools multicov format. TPM reference: Wagner et al.,
    Theory Biosci. 131(4), pp 281-85 (2012)

sj_visualizer.pl:
    Convert splice junctions from mapped RNA-seq data in segemehl BED6
    splice junction format to BED12 for easy visualization in genome
    browsers.

splice_site_summary.pl:
    Identify and characterize splice junctions from RNA-seq data by
    intersecting them with annotated splice junctions.

trim_fastq.pl:
    Trim the sequence and quality string fields in a FASTQ file to a
    user-defined length.

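The TPM/RPKM normalization mentioned for normalize_multicov.pl can be
illustrated with a minimal Perl sketch. This is not the code of the
utility itself: the feature names, lengths and counts are made-up
example data, the helper names rpkm() and tpm() are hypothetical, and
the total number of mapped reads is assumed here to be the sum of the
listed counts.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input: feature name => [ length in nt, raw read count ].
my %features = (
    geneA => [ 1000, 100 ],
    geneB => [ 2000, 100 ],
    geneC => [ 500,  100 ],
);

# RPKM: reads per kilobase of feature per million mapped reads.
sub rpkm {
    my ($count, $length, $total_reads) = @_;
    return $count * 1e9 / ($length * $total_reads);
}

# TPM: per-feature rate (count / length), rescaled so that all
# features together sum to one million.
sub tpm {
    my ($feat) = @_;
    my %rate = map { ($_ => $feat->{$_}[1] / $feat->{$_}[0]) } keys %$feat;
    my $sum = 0;
    $sum += $_ for values %rate;
    my %tpm = map { ($_ => $rate{$_} / $sum * 1e6) } keys %rate;
    return \%tpm;
}

# Assumption: total mapped reads = sum of the counts listed above.
my $total = 0;
$total += $_->[1] for values %features;

my $tpm = tpm(\%features);

for my $name (sort keys %features) {
    my ($len, $cnt) = @{ $features{$name} };
    printf "%s\tRPKM=%.2f\tTPM=%.2f\n",
        $name, rpkm($cnt, $len, $total), $tpm->{$name};
}
```

Because TPM values are rescaled to a fixed sum, they are comparable
between samples, whereas RPKM depends on the total read count of each
sample.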

INSTALLATION

To install this module type the following:

    perl Makefile.PL
    make
    make test
    make install

DEPENDENCIES

Bio::ViennaNGS and its submodules require a set of third-party
libraries and packages that are not part of the Perl core
distribution:

    Bio::Perl >= 1.00690001
    Bio::DB::Sam >= 1.39
    Bio::DB::Fasta
    Bio::Tools::GFF
    Moose
    Moose::Util::TypeConstraints
    Path::Class
    namespace::autoclean

The modules listed below are only required by some of the utilities.
You only need to install them if you intend to use the reference
utilities:

    PerlIO::gzip
    Math::Round
    XML::Simple
    Statistics::R

Computation of bigWig files from BAM is accomplished via two calls to
third-party tools. First, 'genomeCoverageBed' from the BEDtools suite
(http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html)
is used to create a bedGraph coverage file. In a second step, the
bedGraph is converted to bigWig by 'bedGraphToBigWig', a UCSC Genome
Browser utility (http://hgdownload.cse.ucsc.edu/admin/exe/). Please
ensure that all third-party utilities are available on your system
and accessible to the Perl interpreter.

SOURCE AVAILABILITY

The source is available on GitHub:
https://github.com/mtw/Bio-ViennaNGS

AUTHORS

    Michael T. Wolfinger
    Joerg Fallmann
    Florian Eggenhofer
    Fabian Amman

COPYRIGHT AND LICENCE

Copyright (C) 2014 Michael T. Wolfinger

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.12.4 or,
at your option, any later version of Perl 5 you may have available.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.