NAME
    Annovar::Wrapper - A wrapper around the annovar annotation pipeline

VERSION
    Version 0.06

SYNOPSIS
        --vcfs file1.vcf,file2.vcf --annovardb_path /path/to/annovar/dbs

    This module is a wrapper around the popular annotation tool, annovar.
    The commands generated are taken straight from the documentation. In
    addition, there is an option to reannotate using vcf-annotate from
    vcftools.

    It takes as its input a list or directory of vcf files, bgzipped and
    tabixed or not, and uses annovar to create annotation files. These
    multianno table files can optionally be reannotated into the vcf file.

    This script does not actually execute any commands; it only writes them
    to STDOUT for the user to run as they wish.

    It comes with an executable script. This should be sufficient for most
    of your needs, but if you wish to override methods you can always do so
    in the usual Moose fashion.

        #!/usr/bin/env perl

        package Main;

        use Moose;
        extends 'Annovar::Wrapper';

        # Replace a method from the parent class
        sub method_to_override {
            my $self = shift;
            #dostuff
        };

        # Run extra code before an existing method
        before 'method' => sub {
            my $self = shift;
            #dostuff
        };

        # Extend an existing attribute declaration
        has '+variable' => (
            #things to add to variable declaration
        );

        #or

        # Replace the attribute declaration entirely
        has 'variable' => (
            #override variable declaration
        );

        # Instantiate the subclass, after the modifiers are installed,
        # so that the overrides above take effect
        Main->new_with_options->run;

        1;

    Please see the Moose::Manual::MethodModifiers for more information.

Prerequisites
    This module requires the annovar download. The easiest thing to do is
    to put the annovar scripts in your ENV{PATH}, but if you choose not to
    do this you can also pass in the location with

        --tableannovar_path /path/to/ --convert2annovar_path /path/to/

    It also requires a Perl module that comes with vcftools. Vcftools is
    publicly available for download.

        export PERL5LIB=$PERL5LIB:path_to_vcftools/perl

    If you wish to reannotate the vcf file you need to have bgzip and tabix
    installed, and have the executables in vcftools in your path.
        export PATH=$PATH:path_to_vcftools

Generate an Example
    To generate an example you can run the following commands

        tabix -h 2:39967768-40000000 > test.vcf
        bgzip test.vcf
        tabix test.vcf.gz
        vcf-subset -c HG00098,HG00100,HG00106,HG00112,HG00114 test.vcf.gz | bgzip -c > out.vcf.gz
        tabix out.vcf.gz
        rm test.vcf.gz
        rm test.vcf.gz.tbi
        --vcfs out.vcf.gz --annovar_dbs refGene --annovar_fun g --outdir annovar_out --annovardb_path /path/to/annovar/dbs >

    There is more detail on the example in the pod files.

Variables
  Annovar Options
   tableannovar_path
    You can put the location of the annovar scripts in your ENV{PATH}, and
    the default is fine. If annovar is not in your PATH, please supply the
    location.

   convert2annovar_path
    You can put the location of the annovar scripts in your ENV{PATH}, and
    the default is fine. If annovar is not in your PATH, please supply the
    location.

   annovardb_path
    Path to your annovar databases.

   buildver
    It's probably hg19 or hg18.

   convert2annovar_opts
    Assumes vcf version 4 and that you want to convert all samples.

    Not using --allsample on a multisample vcf is untested and will
    probably break the whole pipeline.

   annovar_dbs
    These are pretty much all the databases listed for hg19 that I tested
    as working.

        #Download databases with
        cd path_to_annovar_dir
        ./ --buildver hg19 -downdb -webfrom annovar esp6500si_aa hg19/

        #Option is an ArrayRef, and can be given as either
        --annovar_dbs cg46,cg69,nci60
        #or
        --annovar_dbs cg46 --annovar_dbs cg69 --annovar_dbs nci60

   annovar_fun
    Functions of the individual databases can be found in the annovar
    documentation. The function for your DB may already be listed;
    otherwise it is probably listed under Annotation: Gene-Based,
    Region-Based, or Filter-Based.

    Functions must be given in the corresponding order of your annovar_dbs.

        #Option is an ArrayRef, and can be given as either
        --annovar_fun f,f,g
        #or
        --annovar_fun f --annovar_fun f --annovar_fun g

   annovar_cols
    Some database annotations generate multiple columns.
    For reannotating the vcf we need to know what these columns are.

    Below are the columns generated for the databases given in annovar_dbs.
    To add more, give a hashref of arrays.

  Wrapper Options
   indir
    A path to your vcf files can be given, and using File::Find::Rule it
    will recursively search for vcf or vcf.gz files.

   vcfs
    VCF files can be given individually as well.

        #Option is an ArrayRef and can be given as either
        --vcfs 1.vcf,2.vcf,3.vcf
        #or
        --vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf

    Don't mix the methods.

   outdir
    Path to write out annotation files. It creates the structure

        outdir
            --annovar_interim
            --annovar_final
            --vcf-annotate_interim #If you choose to reannotate VCF file
            --vcf-annotate_final #If you choose to reannotate VCF file

    A lot of interim files are created by annovar, and the only one that
    really matters, unless you are debugging a new database, is the
    multianno file found in annovar_final.

    If not given, the output directory is assumed to be the current working
    directory.

   annotate_vcf
    Use vcf-annotate from VCF tools to annotate the VCF file.

    This does not overwrite the original VCF file, but instead creates a
    new one.

    To turn this off

        --annotate_vcf 0

SUBROUTINES/METHODS
  run
    Subroutine that starts everything off.

  print_opts
    Print out the command line options.

  check_files
    Check to make sure either an indir or vcfs are supplied.

  find_vcfs
    Use File::Find::Rule to find the vcfs.

  parse_commands
    Allow for giving an ArrayRef either in the usual fashion (repeating the
    option) or as a comma-separated list.

  write_annovar
    Write the commands that

    1. Convert the vcf file to annovar input
    2. Do the annotations
    3. Reannotate the vcf - if you want

  get_samples
    Using VCF tools, get the samples listed per vcf file.

    Supports files that are bgzipped or not.

    Sample names are stripped of all non-alphanumeric characters.

  convert_annovar
    Print out the convert2annovar commands.

  table_annovar
    Print out the commands that generate the annotation.
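
    As described under annovar_dbs and annovar_fun, each database is paired
    with the function given at the same position. The following is only a
    rough sketch of that pairing, not the module's actual code. The helper
    names count_items and print_table_annovar, the file name
    sample.annovar, and the choice of flags shown are assumptions made for
    illustration; -protocol and -operation are the table_annovar.pl options
    that take the database and function lists, but the command this module
    really emits may differ.

```shell
#!/bin/sh
# Hypothetical helper: count the items in a comma-separated string.
count_items() {
    # Turn commas into newlines, then count the resulting lines.
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# Hypothetical helper: print (do not run) a table_annovar.pl-style command.
#   $1 = annovar input file      $2 = annovar database directory
#   $3 = buildver                $4 = comma-separated databases
#   $5 = comma-separated functions, in the same order as the databases
print_table_annovar() {
    if [ "$(count_items "$4")" -ne "$(count_items "$5")" ]; then
        echo "annovar_dbs and annovar_fun must have the same length" >&2
        return 1
    fi
    # Strip a trailing .annovar from the input name to form the -out prefix.
    echo "table_annovar.pl $1 $2 -buildver $3 -out ${1%.annovar} -protocol $4 -operation $5"
}

# Example: two databases, a gene-based (g) and a filter-based (f) function.
print_table_annovar sample.annovar /path/to/annovar/dbs hg19 refGene,cg46 g,f
```

    Like the module itself, the sketch only writes the command to STDOUT;
    nothing is executed. Passing lists of different lengths, for example
    refGene,cg46 with a single g, makes the function return a non-zero
    status instead of printing a mismatched command.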

  vcf_annotate
    Generate the commands to annotate the vcf file using vcf-annotate.

  gen_descr
    Bgzip, tabix, all of vcftools, and sort must be in your PATH for these
    to work.

    There are two parts to this command.

    The first prepares the annotation file.

    1. The annotation file is backed up just in case
    2. The annotation file is sorted, because I had some problems with
       sorting
    3. The annotation file is bgzipped, as required by vcf-annotate
    4. The annotation file is tabix indexed using the special options
       -s 1 -b 2 -e 3

    The second writes out the vcf-annotate commands.

    Example with RefGene

        zcat ../../variants.vcf.gz | vcf-annotate -a sorted.annotation.gz \
            -d key=INFO,ID=SAMPLEID_Func_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Func_refGene' \
            -d key=INFO,ID=SAMPLEID_Gene_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Gene_refGene' \
            -d key=INFO,ID=SAMPLEID_ExonicFun_refGene,Number=0,Type=String,Description='SAMPLEID Annovar ExonicFun_refGene' \
            -d key=INFO,ID=SAMPLEID_AAChange_refGene,Number=0,Type=String,Description='SAMPLEID Annovar AAChange_refGene' \
            -c CHROM,FROM,TO,-,-,INFO/SAMPLEID_Func_refGene,INFO/SAMPLEID_Gene_refGene,INFO/SAMPLEID_ExonicFun_refGene,INFO/SAMPLEID_AAChange_refGene > SAMPLEID.annotated.vcf

  gen_cols
    Generate the -c portion of the vcf-annotate command.

  merge_vcfs
    There is one vcf-annotated file per sample, so merge those at the end
    to get a multisample file using vcf-merge.

  subset_vcfs
    vcf-merge used in this fashion will create a lot of redundant columns,
    because it assumes all sample names are unique.

    Straight from the vcftools documentation:

        vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz

AUTHOR
    Jillian Rowe

BUGS
    Please report any bugs or feature requests to "bug-annovar-wrapper at",
    or through the web interface.

ACKNOWLEDGEMENTS
    This module is a wrapper around the well developed annovar pipeline.
    The commands come straight from the documentation.
    This module was originally developed at and for Weill Cornell Medical
    College in Qatar within the ITS Advanced Computing Team, with LOTS of
    input and scientist debugging from Khalid Fahkro. With approval from
    WCMC-Q, this information was generalized and put on github, for which
    the authors would like to express their gratitude.