User Contributed Perl Documentation                          CompBio(3)

NAME
    CompBio - Core library for some basic methods useful in computational biology/bioinformatics.

SYNOPSIS
    use CompBio;
    my $cbc = CompBio->new;

DESCRIPTION
    The CompBio module set is _not_ intended to replace the bioperl project (http://www.bioperl.org/). I welcome suggestions for improving or adding to the methods available in these modules, and in particular I would love any help with items on the TO DO list. However, these modules are not intended to provide the depth that the bioperl suite can provide.

    Originally developed at the BioMolecular Engineering Research Center (BMERC, http://bmerc-www.bu.edu), these modules and utilities grew out of a set we have used and worked with for years. CompBio.pm is intended to take a number of small, commonly used methods and gather them into a single package. Many of the utils are just command line interfaces to these methods.

    To get the most out of this set I highly recommend installing CompBio::DB.pm and either importing our databases (ftp://mcclintock.bu.edu/BMERC/mysql/) or adapting it to your local needs.

    Suggestions for improving portability are welcome! The early versions of this module assumed installation on our local system. I have tried to correct this in the current version, but you may find this package requires a little twiddling to get working. I'll try to leave comments where I think it is most likely. Hopefully, use of a relational database and local setting changes in the globals will have taken care of it. If not, _please_ email me at seanq@darwin.bu.edu with the details.

    CompBio has a limited API. It expects its input to be in specific formats, as described in each method's docs. Its output is in whatever format makes the most sense to me for that method.
    By and large it does no error checking, so incorrect input could cause bizarre behavior and/or a noisy death. If you want a flexible interface with lots of error checking and deep levels of verbosity, use CompBio::Simple - that's its job. Thanks!

    Other modules available (or that will be available) in the CompBio set are:

    The DB module will only be immediately useful if you import the databases as used by us here at the BMERC (ftp://mcclintock.bu.edu/BMERC/mysql/), or develop your own on the same basic design scheme. Otherwise I hope you find it useful as a source of design ideas for rolling your own.

    The Profile module was designed to work with our PIMA-II software and the PIMA modules. The PIMA suite is available for license from Boston University for a nominal fee, and is free for academic use. For examples and more info see http://bmerc-www.bu.edu/PIMA/. Unless you have or are interested in that package, this module will have no functional value.

  Methods
    You may note that the majority of the methods here are for converting sequences from one format to another. Mainly this is for converting other formats to table format, which is used by most of these programs. This is not meant to be a comprehensive collection of format guessing and transformation methods. If you are looking for a converter for a format CompBio doesn't handle, I suggest you look into bioperl's SeqIO package or the READSEQ program (Java), found at http://iubio.bio.indiana.edu/soft/molbio/readseq/

    new
        Construct an object for invoking methods in CompBio.

    help
        Quits the current application and uses perldoc to display the POD.

    check_type
        Checks a given sequence or set of sequences for its type. It currently groks fasta (.fa), table (.tbl), raw genome (.raw), IntelliGenetics (.ig) and coding DNA sequence (.cdna) types.
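        A quick check of this kind can be sketched as a test on the first record alone. The sub name guess_format and its matching rules are hypothetical assumptions for illustration, not CompBio's actual implementation. Like check_type, it assumes the remaining records match the first one. It does not try to tell CDNA apart from RAW.

```perl
use strict;
use warnings;

# Hypothetical quick format check in the spirit of check_type. Like
# check_type, only the first record is examined and the rest are assumed
# to match. These rules are illustrative assumptions, not CompBio's own
# tests, and no attempt is made to tell CDNA apart from RAW.
sub guess_format {
    my ($records) = @_;    # array ref, one complete record per element
    my $first = $records->[0];
    return 'UNKNOWN' unless defined $first && length $first;

    return 'FA'  if $first =~ /^>/;                          # fasta header line
    return 'IG'  if $first =~ /^;/;                          # ig comment line
    return 'TBL' if $first =~ /^[^\t\n]+\t\S/;               # id<TAB>sequence
    return 'RAW' if $first =~ /^[ACGTURYKMSWBDHVN]+\s*$/i;   # bare nucleotides
    return 'UNKNOWN';
}

print guess_format([">seq1 demo\nACGT\n"]), "\n";   # prints FA
print guess_format(["seq1\tACGTACGT"]), "\n";       # prints TBL
```

        Because only the first record is inspected, a mixed set is reported by whatever format happens to come first. That is the same trade-off check_type makes for speed.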
        Each index of the referenced array should be an entire sequence record.

        *It is, however, not recommended that you load an entire raw genome into memory to do this test - see read in perlfunc.*

          `$type = check_type(\@seqs,%parameters);'

        Possible return types are CDNA, TBL, FA, IG, RAW and UNKNOWN.

        Be warned, this is intended only as a quick check. It only uses as many records as necessary from the reference provided, starting with the first. check_type assumes the rest of the records look the same and does not do any kind of deep QA on the set. If you are not sure, invoke check_type with a few random samples from your set, or use CompBio::Simple, which does that by default.

    tbl_to_fa
        Converts a sequence record in table format (tab delimited, usually with the .tbl file extension) to fasta format.

          `$aref_faseqs = $cbc->tbl_to_fa(\@seqdat,%params);'

        Each index in the @seqdat array must contain the entire record (locus\tsequence) for a single sequence. The return is an array reference, still one sequence per index.

    tbl_to_ig
        Converts a sequence in table (tab delimited) format to .ig format. Accepts sequences in a referenced array, one record per index.

          `$aref_igseqs = $cbc->tbl_to_ig(\@tbl_seqs,%params);'

    fa_to_tbl
        Accepts fasta format sequences in a referenced array, one complete sequence record per index. This method returns the sequence(s) in table (.tbl) format, contained in a referenced array.

          `$aref_tblseqs = $cbc->fa_to_tbl(\@fa_seqs);'

    ig_to_tbl
        Accepts ig format sequences in a referenced array, one complete sequence record per index. This method returns the sequence(s) in table (.tbl) format, contained in a referenced array.
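        The table/fasta conversions above amount to joining or splitting the sequence lines of each record. Below is a minimal sketch built on three assumptions: the table id is the first word of the fasta header, the rest of the header is dropped, and fasta output is wrapped at 60 residues. fa_to_tbl_sketch and tbl_to_fa_sketch are hypothetical names, not CompBio methods.

```perl
use strict;
use warnings;

# Hypothetical fasta <-> table conversion in the spirit of fa_to_tbl and
# tbl_to_fa. Assumptions: the table id is the first word of the fasta
# header, the rest of the header is dropped, fasta output wraps at 60.
sub fa_to_tbl_sketch {
    my ($fa_recs) = @_;    # array ref, one fasta record per element
    my @tbl;
    for my $rec (@$fa_recs) {
        my ($header, @seq_lines) = split /\n/, $rec;
        next unless defined $header && $header =~ /^>\s*(\S+)/;
        my $id  = $1;
        my $seq = join '', @seq_lines;
        $seq =~ s/\s+//g;                  # drop any stray whitespace
        push @tbl, "$id\t$seq";
    }
    return \@tbl;                          # one "id<TAB>sequence" per element
}

sub tbl_to_fa_sketch {
    my ($tbl_recs, $width) = @_;
    $width ||= 60;
    my @fa;
    for my $rec (@$tbl_recs) {
        my ($id, $seq) = split /\t/, $rec;
        next unless defined $seq;
        $seq =~ s/\s+//g;
        my @lines = $seq =~ /(.{1,$width})/g;    # wrap the sequence
        push @fa, join("\n", ">$id", @lines) . "\n";
    }
    return \@fa;                           # one fasta record per element
}

my $tbl = fa_to_tbl_sketch([">seq1 demo\nACGT\nACGT\n"]);
print "$_\n" for @$tbl;              # prints seq1, a tab, then ACGTACGT
print @{ tbl_to_fa_sketch($tbl) };   # prints >seq1 and ACGTACGT on two lines
```

        Keeping one record per array element on both sides mirrors the one-record-per-index convention these methods document.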
          `$aref_tblseqs = $cbc->ig_to_tbl(\@ig_seqs);'

    dna_to_protein
        Converts a DNA sequence, containing no whitespace and submitted as a scalar reference, to the amino acid residues coded by the standard 'universal' genetic code. The return is a reference to a scalar. The DNA sequence may contain standard special characters (e.g. R, S, B, N, etc.). Default behavior is to trim a final stop, if present, and to substitute an M for an I or L in the first position. This is usually correct when translating whole sequences from a coding DNA sequence.

        A hash containing optional parameters may be passed as the second argument. Allowed options are:

        C:        Set to a true value to indicate the DNA should be converted to its complement before translation.
        ALTCODE:  A reference to a hash containing alternate coding keys, where each value is the new aa to code for. Stop codons are represented by ".".
        SEQFIX:   Set to true to alter the first position, making V or L an M, and to remove a stop in the last position.

          `$aa_ref = dna_to_protein(\$dna_seq,%params);'

    complement
        Converts DNA to its complementary strand. The DNA sequence is submitted as a scalar by reference. There is no return, as the sequence is modified in place. To keep the original sequence, send a reference to a copy.

          `complement(\$dna);'

    six_frame
        Converts a submitted DNA sequence into all 6 frame translations to aa. It takes four arguments:
          - the file containing the raw DNA sequence (in .raw format, with no whitespace!);
          - the id to prefix the output;
          - the minimum length of amino acid sequences to be recorded;
          - the file to output.
        Note that the output file will be truncated. Output ids have the strand, frame, start and stop positions encoded.

          `$result = six_frame($raw_file,$id,$seq_len,$out_file);'

        Note: six_frame returns 0 on success, or an error message on failure.
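        dna_to_protein, complement and six_frame build on two shared steps: complementing a strand and reading codons through a codon table. Below is a hypothetical in-memory sketch; six_frames, translate and complement_sketch are illustrative names, not the CompBio API. Unlike six_frame, it does no file I/O and applies no minimum length. It also skips the first-position and final-stop fixes. Labeling frames +1..+3 on the given strand and -1..-3 on the reverse complement is an assumed convention.

```perl
use strict;
use warnings;

# Hypothetical six-frame translation with the standard genetic code.
# Stops are written as ".", matching the CompBio docs. No file I/O,
# no length filtering, and no first-position or final-stop fixes.
my %CODON;
{
    my @b  = qw(T C A G);    # first base varies slowest, third fastest
    my $aa = 'FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my $i  = 0;
    for my $x (@b) { for my $y (@b) { for my $z (@b) {
        $CODON{"$x$y$z"} = substr($aa, $i++, 1);
    } } }
}

# Complement in place through a scalar reference (IUPAC codes included).
sub complement_sketch {
    my ($dna_ref) = @_;
    $$dna_ref =~ tr/ACGTRYKMBVDHacgtrykmbvdh/TGCAYRMKVBHDtgcayrmkvbhd/;
    return;
}

# Translate one reading frame; codons not in the table become "X".
# Trailing bases that do not make a full codon are ignored.
sub translate {
    my ($dna) = @_;
    my $prot = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = uc substr($dna, $i, 3);
        my $aa    = $CODON{$codon};
        $prot .= defined $aa ? $aa : 'X';
    }
    return $prot;
}

# Frames +1..+3 read the given strand; -1..-3 read the reverse complement.
sub six_frames {
    my ($dna) = @_;
    my $rc = reverse $dna;
    complement_sketch(\$rc);
    my %frames;
    for my $f (0 .. 2) {
        my $fwd = $f < length($dna) ? substr($dna, $f) : '';
        my $rev = $f < length($rc)  ? substr($rc,  $f) : '';
        $frames{ '+' . ($f + 1) } = translate($fwd);
        $frames{ '-' . ($f + 1) } = translate($rev);
    }
    return \%frames;
}

my $frames = six_frames('ATGGCCTAA');
# prints +1 MA., +2 WP, +3 GL, -1 LGH, -2 .A, -3 RP (one per line)
print "$_ $frames->{$_}\n" for sort keys %$frames;
```

        A table-driven lookup keeps the translation short. In this sketch, something like the ALTCODE option could be approximated by overriding entries in %CODON before translating. That is an assumption about approach, not a description of how dna_to_protein handles it.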
    wu_blast
        A simple interface to the Washington University distribution of blast. It has only been tested with the 2.0a8MP release.

        Takes the arguments ($method, $database, $queryfile, "parameter list", "noise", $cpu_server):
          - $method can be any allowed blast method.
          - $database is the blastable fasta format database to be searched.
          - $queryfile is the file containing the fasta sequence you wish to use in the blast comparison.
          - "parameter list" is a quote-enclosed set of parameter arguments to hand to blast.
          - "noise" is the optional noise or debug level (the default is quiet). "quiet" tells blast to operate silently. Anything else prints some debugging messages and allows any errors from blast to be printed.
          - $cpu_server is the optional 'cpu server' where blast is to execute. It must follow a noise level argument.

        Only blastp and blastx have default parameters at the moment:

          blastp  B=10 e=1e-3 filter=seg+xnu
          blastx  B=25 e=1e-3 filter=seg+xnu

        If any parameters are supplied, the method will not use any default values!

          `$blast_out = wu_blast('blastp',$fa_file,$queryfile,"-B=0 -matrix=BLOSUM62","quiet");'

    AA_HASH
        Creates a hash table with amino acid lookups by codon. It includes all cases where even an alternate na code (such as M for A or C) would return an unambiguous aa. It is also consistent with the complement method in this package (i.e. lower case in some contexts), for ease of use with six_frame.

          `%aa_hash = aa_hash;'

EXPORT
    None by default.

HISTORY
    0.01  Original version; created by h2xs 1.20 with options -AXC -n CompBio

    0.44  Copied over most functions from the original BMERC::bio (ver 0.74). Improved the code, mostly by removing lingering locale assumptions, (hopefully) improved the interface, and added OOP usability.
    0.45  Modifications to Simple, primarily.

TO DO - in no particular order
    I like the basic design so far, except where large sequence sets are to be munged. There it becomes a serious memory hog. Some faster and more efficient way needs to be provided when dealing with files and large data sets. **Got it - needs work!**

    Get the full release of WashU Blast and make sure the blast stuff works with it as well.

    OK, rethink the blast method(s) completely. They are old and kludged and need to die. Add a method for handling NCBI blast. CompBio::Simple should only have a blast method that will DTRT and use the appropriate core methods.

    Add DNA* and GCG format handlers.

    Add a handler for the GenBank report type format:

          1 attgc gtgct
         11 gtgtg gacaa

    This format, though annoying, seems a certain candidate for being received in cut and paste operations, particularly through the web.

    Find a better way to handle CPUSERVER. There must be some way to let the user define (presumably during ./config?) how to submit jobs for more intensive computational tools such as Blast and PIMA. Things like rsh, submitting to a batch queue, etc. should all be definable at install somehow.

    Write a configure script to populate these globals as part of the install process.

    Modify tests to look for inclusion of Profile.pm or such, and skip testing where appropriate if the PIMA suite is not installed.

    Find out if there is a way using OOP to allow a user to only need to include this module and create new objects for the submodules, or even better just DTRT. Can this be done just through inheritance? Should it be triggered by requested export (like use CompBio qw(Simple DB))? Or through an AUTOLOAD type interface, returning the correct object ($cbs = CompBio->Simple("new") or some such)?
    I think that would be far more desirable than _having_ to use a bunch of modules all in the CompBio namespace.

    The fasta and ig to tbl methods should place extra data in optional fields in the new table specs.

    tbl_to_ig needs to check for extra fields in the new table format and place those fields in ig's optional comment lines.

    _error (all packages) needs to correctly report the line where the error occurred. Can this be done through caller, or do I need to pass it manually?

COPYRIGHT
    Copyright Sean Quinlan, Trustees of Boston University 2000-2001. All rights reserved.

    This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR
    Sean Quinlan, seanq@darwin.bu.edu

    Please email me with any changes you make or suggestions for changes/additions. The latest version is available under ftp://mcclintock.bu.edu/BMERC/perl/. Thank you!

SEE ALSO
    perl(1), CompBio::Simple, CompBio::DB.

2001-09-11                 Last change: perl v5.6.0