Bio::ViennaNGS 0.12
==============

Bio::ViennaNGS is a distribution of Perl modules and utilities for
building efficient Next-Generation Sequencing (NGS) analysis
pipelines. It covers various aspects of NGS data analysis, including
(but not limited to) conversion of sequence annotation, evaluation of
mapped data, expression quantification and visualization.

Bio::ViennaNGS is shipped with a complementary set of modules/classes:

- Bio::ViennaNGS::AnnoC
  A Moose interface for storage and conversion of sequence annotation
  data.

- Bio::ViennaNGS::Bam
  Routines for high-level manipulation of BAM files.

- Bio::ViennaNGS::BamStat
  A Moose-based class for collecting mapping statistics.

- Bio::ViennaNGS::BamStatSummary
  A Moose interface for processing Bio::ViennaNGS::BamStat objects on
  a set of BAM files.

- Bio::ViennaNGS::Bed 
  A Moose interface for manipulation of genomic interval data in BED
  format.

- Bio::ViennaNGS::Expression
  A Moose interface for computation of normalized gene / transcript
  expression based on read counts.

- Bio::ViennaNGS::ExtFeature
  A Moose wrapper for extended BED6 elements.

- Bio::ViennaNGS::Fasta
  Routines for accessing genomic sequences implemented through a Moose
  interface to Bio::DB::Fasta.

- Bio::ViennaNGS::Feature
  A Moose-based BED6 wrapper.  

- Bio::ViennaNGS::FeatureChain
  Yet another Moose class for chaining gene annotation features. 

- Bio::ViennaNGS::FeatureLine
  An abstract Moose class for combining several
  Bio::ViennaNGS::FeatureChain objects.

- Bio::ViennaNGS::MinimalFeature
  A Moose interface for handling elementary gene annotation. 

- Bio::ViennaNGS::SpliceJunc
  A collection of routines for alternative splicing analysis.  

- Bio::ViennaNGS::Tutorial
  A comprehensive tutorial of the Bio::ViennaNGS core routines with
  real-world NGS data.

- Bio::ViennaNGS::UCSC
  Routines for visualization of genomics data with the UCSC genome
  browser.

- Bio::ViennaNGS::Util
  A collection of wrapper routines for commonly used third-party NGS
  utilities, code for normalization of gene expression values based on
  read count data and a set of utility functions.


UTILITIES

In addition, Bio::ViennaNGS comes with a collection of utility
programs for accomplishing routine tasks often required in NGS data
processing. These utilities serve as reference implementation of the
routines implemented in the (sub)modules:

assembly_hub_constructor.pl:
The UCSC genome browser offers the possibility to visualize any
organism (including organisms that are not included in the standard
UCSC browser bundle) through hso called 'Assembly Hubs'. This script
constructs Assembly Hubs from genomic sequence and annotation data.

bam_quality_stat.pl:
Compute sophisticated quality/mapping statistics for a set of BAM
alignment files and produce publication-ready graphics.

bam_split.pl: 
Split (paired-end and single-end) BAM alignment files by strand and compute
statistics. Optionally create BED output, as well as normalized bedGraph
and bigWig files for coverage visualization in genome browsers (see
dependencies on third-patry tools below).

bam_to_bigWig.pl:
Produce bigWig coverage profiles from (aligned) BAM files, explicitly
considering strandedness. The most natural use case of this tool is to
create strand-aware coverage profiles in bigWig format for genome browser
visualization.

bam_uniq.pl:
Extract unique and multi mapping reads from BAM alignment files and create
a separate BAM file for both uniqe (.uniq.) and multi (.mult.) mappers.

bed2bedGraph.pl:
Convert BED files to (strand specific) bedGraph files, allowing
additional annotation and automatic generation of bedGraph files which can
easily be converted to big-type files for easy UCSC visualization.

extend_bed.pl:
Extend genomic features in BED files by a certain number of nucleotides,
either on both sides or specifically at the 5' or 3' end, respectively.

gff2bed.pl:
Convert RefSeq GFF3 annotation files to BED12 format. Individual BED12
files are created for each feature type (CDS/tRNA/rRNA/etc.). Tested with
RefSeq bacterial GFF3 annotation.  

kmer_analysis.pl:
Count k-mers of predefined length in FastQ and Fasta files

MEME_xml_motif_extractor.pl:
Compute simple statistics from MEME XML output and return a list of found
motifs with the number of sequences containing those motifs as well as nice
ggplot graphs.

newUCSCdb.pl:
Create a new genome database to a locally installed instance of the UCSC
genome browser in order to add a novel organism for visualization.

normalize_multicov.pl: 
Compute normalized expression data in TPM from (raw) read counts in
bedtools multicov format. TPM reference: Wagner et al, Theory
Biosci. 131(4), pp 281-85 (2012)

sj_visualizer.pl:
Convert splice junctions from mapped RNA-seq data in segemehl BED6 splice
junction format to BED12 for easy visualization in genome browsers.

splice_site_summary.pl:
Identify and characterize splice junctions from RNA-seq data by
intersecting them with annotated splice junctions.

track_hub_constructor.pl:
Analogous to assembly_hub_constructor.pl, construct a Track Hub for an
organism listed in the UCSC Genome Browser.

trim_fastq.pl:
Trim sequence and quality string fields in a Fastq file by user
defined length.

TUTORIAL

See Bio::ViennaNGS::Tutorial 

INSTALLATION

To install this module type the following:

   perl Makefile.PL
   make
   make test
   make install


DEPENDENCIES

Bio::ViennaNGS and its submodules require a set of third-party libraries
and packages that are not part of the Perl core distribution:

  Bio::Perl >= 1.00690001
  Bio::DB::Sam >= 1.37
  Bio::DB::Fasta
  Bio::Tools::GFF
  File::Share	
  File::Path
  Template
  Moose
  Moose::Util::TypeConstraints
  Path::Class
  namespace::autoclean
  Tie::Hash::Indexed
  MooseX::Clone
  MooseX::InstanceTracking

The modules listed below are only reuqired by some of the utilities. You
will only need to install them if you intend to use the reference
utilities:

  PerlIO::gzip
  Math::Round
  XML::Simple
  Statistics::R

Computation of BigWig files from BAM is accomplished via two calls to
third-party tools: 'genomeCoverageBed' from the BEDtools suite
(http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html) is
used to create a BedGraph coverage file at first place. In a second step,
the BedGraph is converted to BigWig by 'bedGraphToBigWig', a UCSC Genome
Browser utility (http://hgdownload.cse.ucsc.edu/admin/exe/). Please ensure
that all third-party utilities are available on your system and accessible
to the Perl interpreter.


SOURCE AVAILABILITY

This source is available on Github: 
              https://github.com/mtw/Bio-ViennaNGS


PAPERS

If the Bio::ViennaNGS suite is useful for your work or if you use any
component of ViennaNGS in a custom pipeline, please cite the following
publication:

"ViennaNGS - A toolbox for building efficient next-generation sequencing
analysis pipelines"
Michael T. Wolfinger, Joerg Fallmann, Florian Eggenhofer and Fabian Amman
bioRxiv doi: http://dx.doi.org/10.1101/013011


NOTES

The Bio::ViennaNGS suite is actively developed and tested on different
flavours of Linux and Mac OS X. We have taken care of writing
platform-independent code that should run out of the box on most UNIX-based
systems, however we do not have access to machines running Microsoft
Windows. As such, we have not tested and will not test Windows
compatibility.


BUGS

Please report bugs through the Github issue tracker at
https://github.com/mtw/Bio-ViennaNGS/issues


AUTHORS

Michael T. Wolfinger <michael@wolfinger.eu>
Joerg Fallmann <fall@tbi.univie.ac.at>
Florian Eggenhofer <florian.eggenhofer@tbi.univie.ac.at>
Fabian Amman <fabian@tbi.univie.ac.at>


COPYRIGHT AND LICENCE

Copyright (C) 2014-2015 Michael T. Wolfinger <michael@wolfinger.eu>

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.10.0 or, at your
option, any later version of Perl 5 you may have available.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.