B.pl.html


Fri Jul 10 17:10:08 EDT 1998 Created by ./headbox2html.pl

B

Download B .pl
Author     : J. Park, Jason Johnson, Sarah Teichmann, Alex Bateman,
               Astrid Reinhardt, and anyone else who contributed.  jong@salt2.med.harvard.edu
Example    : require "B.pl"; However, I recommend that you take the subroutines
             out and use them directly, or modify them, in your own programs.
Function   : This is a comprehensive Perl subroutine library developed
               under the Bioperl project and others.
             URL: http://cyrah.med.harvard.edu/Bioperlsub/

             This serves as a repository for various Perl subroutines
              and algorithms developed in bioinformatics and genome projects.

             You can copy any of the subroutines in this file, modify them,
             and use them in your own code.
             PLEASE MODIFY AS FREELY AS YOU WANT! All of it is under the same
             copyright terms as Perl.

             All the subroutines have been tested in small files.
             If you want such a single example program
             to see how a subroutine really works, please contact me (A Biomatic).
             For example, a file called 'handle_arguments.pl' exists to
             test the subroutine 'handle_arguments'. Usually you can find them
             at http://cyrah.med.harvard.edu/B.pl.html

Keywords   : Biology, perl library, sequence handling lib
Options    : nothing (used as subroutine library)

Usage      : require "B.pl"; ##<-- This is very slow, so it is better to
             copy the subroutines you need into your program, or to make smaller
               library files classified by function (e.g. Bio_Seq.pl
               for sequence handling, Bio_Array.pl for various array
               subroutines), or to make your own module out of this. Do whatever
               you want.

Version    : 1.8    (April/27/1998)
Warning    : CopyLEFTed, for the enhancement of Biology, Biomatics, and Science.
             This is a development companion, nothing else.
             'Class' is my classification of the subroutines. If it is B, the
             subroutine can be useful for biological sequence data handling. If it
             is Utility, it can also be used for general-purpose file handling.
             File, Array, Hash, ... are my other classification items.

handle_arguments

Download handle_arguments .pl
Argument   : any type, any amount
Category   : general programming
Example    : handle_arguments(\@array, $string, \%hash, 8, 'any_string')
Function   : Sorts the input arguments passed to a subroutine and returns default
             arrays of references for the various types (file, dir, hash, array, ...).
             If you give (\@out, @file), it will put @out into @array as a ref,
             and the contents of @out will also be dereferenced and put into
             @raw_string (regardless of what is in it).

Keywords   : handling arguments, parsing arguments,
Returns    : Following GLOBAL variables

             $num_opt,    @num_opt     @file          @dir
             $char_opt,   @char_opt    %vars          @array,
             @hash        @string,     @raw_string    @range,

             $num_opt has 10,20
             @num_opt has (10, 20)
             @file has  xxxx.ext
             @dir has  dir  or /my/dir
             $char_opt has 'A,B'
             @char_opt has (A, B)
             @array has  (\@ar1, \@ar2)
             @hash has (\%hash1, \%hash2)
             @string has ('sdfasf', 'dfsf')
             @raw_string has (file.ext, dir_name, 'strings',,)
             @range has values like  10-20
             %vars deals with x=2, y=3 stuff.

Usage      : Just put the whole box delimited by the two '###..' lines below
             inside your subroutine. It will call the 'handle_arguments'
             subroutine and parse all the given input arguments.
             To claim the arguments, just use the variables in the box.
             For example, if you passed 2 file names for files existing
             in your PWD (or if the strings look like this: xxxx.ext),
             you can claim them as $file[0] and $file[1] in
             your subroutine.
Version    : 4.8
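
The classification idea described above can be sketched as follows. This is only a minimal illustration, not the library's handle_arguments: the name classify_arguments, the returned hash layout, and the exact patterns used to recognise each type are all assumptions made for this example, and it returns references instead of setting global variables.

```perl
use strict;
use warnings;

# Hypothetical helper (not the real handle_arguments): put each argument
# into a bucket according to its reference type or what the string looks like.
sub classify_arguments {
    my %got = map { $_ => [] } qw(array hash num_opt char_opt range file string);
    my %vars;
    for my $arg (@_) {
        if    (ref $arg eq 'ARRAY')         { push @{ $got{array} },    $arg }
        elsif (ref $arg eq 'HASH')          { push @{ $got{hash} },     $arg }
        elsif ($arg =~ /^(\w+)=(.+)$/)      { $vars{$1} = $2 }                 # x=2 style
        elsif ($arg =~ /^\d+-\d+$/)         { push @{ $got{range} },    $arg } # 10-20
        elsif ($arg =~ /^-?\d+(?:\.\d+)?$/) { push @{ $got{num_opt} },  $arg } # 8
        elsif ($arg =~ /^[A-Za-z]$/)        { push @{ $got{char_opt} }, $arg } # single letter
        elsif (-f $arg or $arg =~ /^[\w\/.-]+\.\w+$/) {
            push @{ $got{file} }, $arg;     # existing file, or looks like xxxx.ext
        }
        else                                { push @{ $got{string} },   $arg }
    }
    return (\%got, \%vars);
}

my ($got, $vars) = classify_arguments([1, 2], 8, 'x=2', 'xxxx.ext', 'any_string');
print "files: @{ $got->{file} }\n";
```

Returning references keeps the sketch free of globals; the library itself, as documented above, exposes the results as global arrays.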

sort_by_E_values

Download sort_by_E_values .pl
Function   : Sorts by the 2nd column (the E-value in an msp file); small values come first.
Keywords   : sort_by_2nd_column, sort_by_second_column, sort_by_e_values,
             sort_by_evalues,
Usage      : @out=@{&sort_by_E_values(\@input_line_array)};
Version    : 1.0
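
A minimal sketch of sorting lines by their second column, assuming whitespace-separated lines that each have at least two columns. The name sort_lines_by_second_column is made up for this example and is not the library's code.

```perl
use strict;
use warnings;

# Sketch only: sort whitespace-separated lines by the numeric value in
# their 2nd column (the E-value in an msp line), smallest first.
sub sort_lines_by_second_column {
    my ($lines) = @_;
    my @sorted = map  { $_->[1] }                        # recover the line
                 sort { $a->[0] <=> $b->[0] }            # numeric compare of E-values
                 map  { [ (split ' ', $_)[1], $_ ] }     # [E-value, line]
                 @$lines;
    return \@sorted;
}

my @msp = (
    '59     2.6        47    64     d2pia_3        10    30     d1erd___10-30',
    '161    1.1e-07    24    91     d2pia_3        16    85     d1frd___16-85',
);
print "$_\n" for @{ sort_lines_by_second_column(\@msp) };
```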

sort_hash_value_by_column

Download sort_hash_value_by_column .pl
Example    : Above will sort the file xxxx.msp by its 3rd column(numerically)
               small numbers will come to the top.
Function   : Sorts the values of a hash by the given column; small values come first.
             Unless a column number is given, it sorts by the first column.
             It returns an ARRAY of the keys of the input HASH!!!

             It can handle gzipped files. It calls gunzip to open and sort.

Keywords   : sort_by_2nd_column, sort_by_second_column, sort_by_e_values,
             sort_by_evalues, sort_hash_by_column, sort_value_by_column,
Options    : 
      s  for sorting stringwise
      d  for sorting by digit
      n  for sorting by digit(numerically)
   numerically  an alias of n

Usage      : @out=@{&sort_by_column(\%input_line_hash, )};
Version    : 1.1
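
As an illustration of returning hash keys ordered by one column of their values, here is a small sketch. The function name, the 1-based column argument, and the 's'/'n' mode argument are assumptions for this example; gzipped input is not handled.

```perl
use strict;
use warnings;

# Sketch only: return the keys of %$hash ordered by one whitespace-separated
# column of each value. $col is 1-based; $mode is 'n' (numeric) or 's' (string).
sub sort_keys_by_value_column {
    my ($hash, $col, $mode) = @_;
    $col  ||= 1;
    $mode ||= 'n';
    my %field;
    for my $key (keys %$hash) {
        my @cols = split ' ', $hash->{$key};
        $field{$key} = $cols[$col - 1];
    }
    my @keys = sort {
        ($mode eq 's' ? $field{$a} cmp $field{$b} : $field{$a} <=> $field{$b})
            or $a cmp $b                    # break ties by key name
    } keys %field;
    return \@keys;
}

my %lines = (a => 'x 3', b => 'y 1', c => 'z 2');
print join(' ', @{ sort_keys_by_value_column(\%lines, 2) }), "\n";
```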

sort_by_column

Download sort_by_column .pl
Example    : sort_by_column.pl 3 xxxx.msp
               Above will sort the file xxxx.msp by its 3rd column(numerically)
               small numbers will come to the top.
Function   : Sorts by the given column; small values come first. Unless a column
             number is given, it sorts by the first column.

             It can handle gzipped files. It calls gunzip to open and sort.

Keywords   : sort_by_2nd_column, sort_by_second_column, sort_by_e_values,
             sort_by_evalues,
Options    : 
      s  for sorting stringwise
      d  for sorting by digit
      n  for sorting by digit(numerically)
Usage      : @out=@{&sort_by_column(\@input_line_array, )};
Version    : 1.4

sort_by_cluster_size

Download sort_by_cluster_size .pl
Function   : it sorts by the 1st digit before '-'  as in 2-183_cluster, 2-140_cluster,
               etc.
Keywords   : sort_by_columns, sort_by_text_columns, sort_by_column_numerically
             sort_by_pattern
Usage      : @out=@{&sort_by_cluster_size(\@input_line_array)};
Version    : 1.2
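
A sketch of sorting names such as 2-183_cluster by the number before the '-'. The function name and the ascending order are assumptions; the entry above does not say which direction the library sorts in.

```perl
use strict;
use warnings;

# Sketch only: sort names by the leading digits before '-', ascending.
# Names without such a prefix are treated as size 0.
sub sort_by_leading_size {
    my ($names) = @_;
    my @sorted = map  { $_->[1] }
                 sort { $a->[0] <=> $b->[0] }
                 map  { [ /^(\d+)-/ ? $1 : 0, $_ ] }
                 @$names;
    return \@sorted;
}

print "$_\n" for @{ sort_by_leading_size([ '10-5_cluster', '2-183_cluster', '3-140_cluster' ]) };
```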

sort_by_column_bigger_first

Download sort_by_column_bigger_first .pl
Function   : it sorts by the 2nd column(E-value, in msp file), small comes top
             by the help of  ts 
Keywords   : sort_by_columns, sort_by_text_columns, sort_by_column_numerically

Usage      : @out=@{&sort_by_column_bigger_first(\@input_line_array, 1)};
Version    : 1.1

make_scrambled_seq_database

Download make_scrambled_seq_database .pl
Keywords   : scramble_seq_database, create_scrambled_seq_database
Usage      : &make_reverse_seq_database(\@input_database_fasta_file);
Version    : 1.1

make_2D_identity_matrix_array

Download make_2D_identity_matrix_array .pl
Function   : @matrix is like  $matrix[1][2]=1;
             This assigns number 1 to array element
             If one array is given, it makes self to self matrix.
             When 2 are given, make matrix for the 2
Keywords   : make_matrix
Options    : 
    $skip_gap_char = g  for skipping gap char (any special char)
Usage      : @matrix=@{&make_2D_identity_matrix(\@seq1, \@seq2)};
Version    : 1.2

make_2D_aa_residue_matrix_array

Download make_2D_aa_residue_matrix_array .pl
Function   : @matrix is like  $matrix[1][2]='A'; when aa residue is identical
             This assigns identical residue to array element
             If one array is given, it makes self to self matrix.
             When 2 are given, make matrix for the 2
Keywords   : make_matrix
Usage      : @matrix=@{&make_2D_aa_residue_matrix_array(\@seq)};
Version    : 1.1

make_2D_identity_matrix

Download make_2D_identity_matrix .pl
Function   : @matrix is like  $matrix[1][2]=1;
             This assigns number 1 to array element
Keywords   : make_matrix, make_identity_matrix
Options    : 
        s  for show axis
Usage      : @matrix=@{&make_2D_identity_matrix(\$seq, [\$seq2] )};
Version    : 1.2
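
The identity matrix idea can be sketched like this, assuming the sequences are passed as references to plain strings (as in the Usage line) and that non-identical cells are set to 0. The name identity_matrix is hypothetical, and the 's' (show axis) option is not covered.

```perl
use strict;
use warnings;

# Sketch only: $matrix[$i][$j] is 1 when residue $i of the first sequence
# equals residue $j of the second, else 0. With one sequence, compare it to itself.
sub identity_matrix {
    my ($seq1, $seq2) = @_;
    $seq2 = $seq1 unless defined $seq2;
    my @a = split //, $$seq1;
    my @b = split //, $$seq2;
    my @matrix;
    for my $i (0 .. $#a) {
        for my $j (0 .. $#b) {
            $matrix[$i][$j] = $a[$i] eq $b[$j] ? 1 : 0;
        }
    }
    return \@matrix;
}

my $seq = 'ACA';
print join(' ', @$_), "\n" for @{ identity_matrix(\$seq) };
```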

amino_acid_homology_matrix

Download amino_acid_homology_matrix .pl
Keywords   : are_they_homologous, amino_acid_homology_table, compare_amino_acid_homology
Usage      : $yes_no=${&amino_acid_homology_matrix('E', 'D')};
Version    : 1.0

write_reverse_seq_files

Download write_reverse_seq_files .pl
Author     : jong@salt2.med.harvard.edu
Keywords   : write_rev_seq_files, write_reverse_msf_files
Version    : 1.0

make_hmm_from_alignment

Download make_hmm_from_alignment .pl
Keywords   : HMM, hidden markov model, make_HMM_from_alignment,
             make_hmm_from_msf_file, create_hmm_from_alignment,
             create_hmm_from_msf_file,
Usage      : &make_reverse_seq_database(\@input_database_fasta_file);
: @out_hmm_file_names=@{&make_hmm_from_alignment(\@file, "$over_write")};
Version    : 1.2
: 1.1

make_sequence_match_table

Download make_sequence_match_table .pl
Example    : 

  OUTPUT looks like the following;
 d1dvh__=d1fcdc1     7.1e-08
 d1fcdc1=d1dvh__     7.1e-08
 d5cytr_=d351c__     5.3e-08
 d351c__=d5cytr_     5.3e-08
 d1cyi__=d2mtac_     9.1e-06
 d2mtac_=d1cyi__     9.1e-06
 d1cyi__=d5cytr_     0.00045
 d5cytr_=d1cyi__     0.00045

: 

  INPUT looks like this: (the iss file format), first column is key

   d1ten__(110)(0.00031)     d1fna__    d1fna___1-91(578)(6.9e-37)       d1ten__(110)(0.00031)
   d1cfb_2(255)(7.8e-16)     d1cfb_2    HSU55258_741-838(255)(5.6e-12)   d1cfb_2(255)(7.8e-16)

  OUTPUT looks like the following;
   d1dvh__=d1fcdc1    Correct: 7.1e-08
 d1fcdc1=d1dvh__    Correct: 7.1e-08
 d5cytr_=d351c__    Correct: 5.3e-08
 d351c__=d5cytr_    Correct: 5.3e-08
 d1cyi__=d2mtac_    Wrong:   9.1e-06

Function   : gets sequences which are wrongly matched from intermediate seq search
: makes a table of match with the values for E values.
Keywords   : make_sequence_match_Evalue_table, Evalue_table, make_Evalue_table
             make_iss_sequence_match_table
Options    : _  for debugging.
             #  for debugging.
             s  for skipping SELF to SELF match entries
             w  for output of Smith-Waterman scores rather than E values
             r  for reflexive output

Reference  : http://sonja.acad.cai.cam.ac.uk/perl_for_bio.html
Usage      : %seq=%{&get_false_positive_seq_matches(\%msp_1, \%msp2)};
: %sequence_match_table=%{&make_sequence_match_table(\%msp_1, \%msp2)};
Version    : 1.0
: 1.5
Warning    : The default is to show the best E value(lowest that is)


write_iss_file

Download write_iss_file .pl
Function   : writes the intermediate sequence search file.
Keywords   : write_interm_seq_search_file
Options    : v  for showing the output in STDOUT
Reference  : http://sonja.acad.cai.cam.ac.uk/perl_for_bio.html
Usage      : &write_iss_file(\%msp1, \%msp2);  ## for 2 msp_x file input
Version    : 1.2
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

get_overlapping_seq_match_size

Download get_overlapping_seq_match_size .pl
Keywords   : CF: get_overlapping_range, get_overlapping_seq_match
Usage      : $ovlapsize=${&get_overlapping_seq_match_size($st1, $en1, $st2, $en2)};
Version    : 1.1
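
The overlap between two matched regions can be computed as below. This is a sketch under the assumption that start and end positions are inclusive; overlap_size is a made-up name, and unlike the library routine it returns a plain number, not a reference.

```perl
use strict;
use warnings;

# Sketch only: number of positions shared by [st1, en1] and [st2, en2],
# treating both ends as inclusive; 0 when the ranges do not overlap.
sub overlap_size {
    my ($st1, $en1, $st2, $en2) = @_;
    my $start = $st1 > $st2 ? $st1 : $st2;   # later start
    my $end   = $en1 < $en2 ? $en1 : $en2;   # earlier end
    my $size  = $end - $start + 1;
    return $size > 0 ? $size : 0;
}

print overlap_size(1, 30, 25, 60), "\n";
```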

get_unix_shell_name

Download get_unix_shell_name .pl
Author     : jong@salt2.med.harvard.edu, On commercial use issue, Email me.
Version    : 1.0

get_stat_FASTA_search_result_in_msp_0_files

Download get_stat_FASTA_search_result_in_msp_0_files .pl
Category   : statistics, search, bio
Keywords   : get_stat00_result, get_stat_msp0_files, get_stat_single_search_result
Options    : 

  $E_value= by e=
  $verbose=v by v
  $show_options=o by o
  $step   =  by s=
  $score_thresh1=   by t1=
  $score_thresh2=   by t2=
  $E_mult_factor1 = by m1=
  $E_mult_factor2 = by m2=

Usage      : &get_stat_FASTA_search_result_in_msp_0_files(\@file);
Version    : 1.0

get_scop_correcting_pairs

Download get_scop_correcting_pairs .pl
Keywords   : get_pdb_correcting_pairs ,
Usage      : %correct=%{&get_scop_correcting_pairs()};
Version    : 1.3

get_isearch_result_stat

Download get_isearch_result_stat .pl
Author     : A Scientist
Example    : Following input (hash eg: %stat2, input with the first word as key)
              will become columnar output.

    d1ash__ d1bam__ d1mba__ d2lhb__
    d1baba_ d1flp__ d1hbg__ d1hlb__ d1mba__ d1mbd__ d2lhb__ d3aaha_ d3sdha_
    d1cpca_ d1cpcb_ d1gof_1 d2ts1_1

    Will become:
      ....
      d1ash__ d2lhb__ Homolog: G1   98 0.012
      d1baba_ d1flp__ Homolog: G1   82 0.072
      d1baba_ d1hbg__ Homolog: G1   79 0.13
      d1baba_ d2lhb__ Homolog: G1   228 8e-12
      d1baba_ d3aaha_ Nomolog: G1   74 2
      d1baba_ d3sdha_ Homolog: G1   92 0.012
      d1cola_ d1hbg__ Nomolog: G1   79 0.59
      d1cpca_ d1cpcb_ Homolog: G1   176 4.9e-08
      ....

Keywords   : get_stat_interm_search, get_intermediate_search_stat
Options    : _  for debugging.
             #  for debugging.
Package    : Bio
Reference  : http://sonja.acad.cai.cam.ac.uk/perl_for_bio.html
Returns    : [$av_correct, $num_enq_seq]
Usage      : &get_self_isearch_stat(\%stat2, \@pdbg_seqs, \$evalue);
Version    : 2.1

open_sequence_index_files

Download open_sequence_index_files .pl
Example    : %index=%{&open_sequence_index_files(\@INDEX_FILE, \@input_seq_names)};
Function   : returns seqname with its seek pos in fasta sequence db file.
Keywords   : remove_sequence_ranges, remove_sequence_name_ranges,
             remove_ranges_in_sequences, strip_sequence_name_ranges,
: open_seq_index_files, open_seq_idx_files, open_idx_files,
             get_sequence_index, get_seq_index, get_sequence_with_index
Options    : _  for debugging.
             #  for debugging.
Usage      : 
: open_sequence_index_files(, );
Version    : 1.0
: 1.2
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

do_intermediate_sequence_search

Download do_intermediate_sequence_search .pl
Example    : &do_intermediate_sequence_search(\%pdb_seq, $owl_db_fasta, $ARGV[0], $single_msp, $over_write,
                    "u=$upper_expect_limit", "l=$lower_expect_limit", "k=$k_tuple" );

Options    : 
             Query_seqs=  for enquiry sequences eg)  "Query_seqs=$ref_of_hash"
             DB=   for target DB  "DB=$DB_used"
             File= to get file base(root) name.  "File=$file[0]"
             m  for MSP format directly from FASTA or Ssearch result rather than through sso_to_msp, to save memory
             s  for the big single output (msp file output I mean)
             o  for overwrite existing xxxx.fa files for search
             c  for create SSO file (sequence search out file)
             R  for adding ranges to the enquiry sequences as well.
             k= for k-tuple value. default is 1 (ori. FASTA prog. default is 2)
             u= for $upper_expect_limit
             l= for $lower_expect_limit
             a= for choosing either fasta or ssearch algorithm

Returns    : the names of files created (xxxxx.msp, yyy.msp,,)
Usage      : &do_intermediate_sequence_search(\%pdb_seq, $owl_db_fasta, $ARGV[0], $single_msp, $over_write,
                    "u=$upper_expect_limit", "l=$lower_expect_limit", "k=$k_tuple" );

Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

do_sequence_search

Download do_sequence_search .pl
Example    : &do_sequence_search(\%pdb_seq, $owl_db_fasta, $ARGV[0], $single_msp, $over_write,
                    "u=$upper_expect_limit", "l=$lower_expect_limit", "k=$k_tuple" );

Function   : do FASTA, SSEARCH or BLASTPGP(psi-blast) search
Keywords   : sequence_search
Options    : 
             Query_seqs=  for enquiry sequences eg)  "Query_seqs=$ref_of_hash"
             DB=   for target DB  "DB=$DB_used"
             File= to get file base(root) name.  "File=$file[0]"
             m  for MSP format directly from FASTA or Ssearch result rather than through sso_to_msp, to save memory
             s  for the big single output (msp file output I mean)
             s= for the single big msp file name
             o  for overwrite existing xxxx.fa files for search
             c  for create SSO file (sequence search out file)
             d  for very simple run and saving the result in xxxx.gz format in sub dir starting with one char
             r  for reverse the query sequence
             R  for attaching ranges of sequences
             k= for k-tuple value. default is 1 (ori. FASTA prog. default is 2)
             u= for $upper_expect_limit
             l= for $lower_expect_limit
             a= for choosing either fasta or ssearch algorithm
             d= for defining the size of subdir made. 2 means it creates
                    eg, DE while 1 makes D
             d  for $make_gz_in_sub_dir_opt, putting resultant sso files in gz format and in single char subdir
             D  for $make_msp_in_sub_dir_opt, convert sso to msp and put in sub dir like /D/, /S/
             n  for new format to create new msp file format with sso_to_msp routine
          PVM=  for PVM run of FASTA (FASTA only)
             M  for machine readable format -m 10 option
             M= for machine readable format -m 10 option
             N  for 'NO' do not do any processing but, do the searches only.
       FILE_AGE for defining the age of file in days to be overwritten.
             L  for Lean output(removes xxxx.fa query seq file)

Returns    : the names of files created (xxxxx.msp, yyy.msp,,)
Usage      : &do_sequence_search("Query_seqs=\%pdb_seq", "DB=$sequence_db_fasta",
             "File=$ARGV[0]", $single_msp, $over_write,
            "u=$upper_expect_limit", "l=$lower_expect_limit",
          "k=$k_tuple", $No_processing );
Version    : 5.1

do_hmm_sequence_search

Download do_hmm_sequence_search .pl
Function   : does hmm sequence search using Sean Eddy's HMMER (hmmls, hmmfs)
Keywords   : do_seq_search_with_hmm, do_hmmt_sequence_search
Options    : 
    "method=ls"  for turning hmmls search option on (default)
    "method=fs"  for turning hmmfs search option on
    method= by method=
   o  for overwriting existing xxxxx.hmm files
   E=Enguiry_name    for specifying enquiry seq name rather than 'HMM', the default
   t=20  for score thresh at the level of hmmls. Default of hmmls is 0. example showed has 15
   $evalue_cutoff= by e=
$over_write = o by -o o
Usage      : &do_hmm_sequence_search(\@file, "method=$default_search_method",
        $over_write, "DB=$pdbd40_seq_fasta");

Version    : 1.6

divide_clusters

Download divide_clusters .pl
Example    : &divide_clusters(\@file, $verbose, $range, $merge, $sat_file,
                  $dindom, $indup, "T=$length_thresh", "e=$evalue", $over_write,
                   $optimize, "s=$score", "f=$factor");

Function   : This is the main function for divclus.pl.
               It divides a complex single-linkage cluster into smaller
               duplication-module-level subclusters.
Keywords   : divicl, divclus, div_clus, divide clusters
Options    : _  for debugging.
   f=   for determining the factor in filtering out non-homologous
                  regions, 7 = 70% now!!
   l=   for seqlet(duplication module) length threshold
   t=   for seqlet(duplication module) length threshold
                  (same as l opt, confusing, huh? )
   s=   for score threshold
   e=   for evalue threshold
   z           for activating remove_similar_sequences, rather than remove_dup....
   o           for overwriting
   v           for verbose printout (infor)
   D           for dynamic factor
   S  $short_region=  S by S -S  # taking shorter region overlap in removing similar reg
   L  $large_region=  L by L -L  # taking larger  region overlap in removing similar reg
   A  $average_region=A by A -A  # taking average region overlap in removing similar reg

Usage      : &divide_clusters(\@file);
Version    : 2.8

remove_similar_seqlets

Download remove_similar_seqlets .pl
Example    : @seqlets=@{&remove_similar_seqlets(\@mrg1, $mrg2, \@mrg3)};
               while @mrg1=qw(M_2-100 M_2-110 M_8-105 M_4-108 N_10-110 N_12-115);
                     $mrg2='Z_3-400 Z_2-420';
                     @mrg3=('X_2-300 X_3-300', 'X_2-300', 'X_5-300 X_2-301' );
Function   : Merges similar seqlets (taking the average of their starts and
             ends) to reduce their number. This can also handle
              names like XLBGLO2R_8-119_d1hlm__.

Keywords   : merge_sequence_names, merge_seq_names, merge_sequence_ranges
             merge_seq_ranges
Options    : _  for debugging.
             #  for debugging.
             f= for factor
             S  for shorter region matched is used
             A  for average region matched is used
             L  for larger region matched is used

Usage      : @seqlets=@{&remove_similar_seqlets(\@split)};
Version    : 2.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

show_subclusterings

Download show_subclusterings .pl
Example    : @temp_show_sub=&show_subclusterings(\@out, $file, $sat_file, $dindom, $indup);
Function   : This is the very final sub of divclus.pl
Keywords   : print_subclusterings, sum_subclusterings, write_subclustering
             show_clusterings, display_subclusterings
Options    : 
             f  for file output, eg: xxxxxxx.sat

Usage      : &show_subclusterings(\@out);
Version    : 2.6
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

exchange_query_with_match_in_msp

Download exchange_query_with_match_in_msp .pl
Keywords   : swap_query_with_match_in_msp, invert_query_with_match_in_msp,
             swap_query_seq_with_match_seq_in_msp,
Usage      : @exchanged_msp=@{&exchange_query_with_match_in_msp(\@file)};
Version    : 1.1

get_internal_dup_in_a_cluster

Download get_internal_dup_in_a_cluster .pl
Options    : _  for debugging.
             #  for debugging.
Version    : 1.1
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

get_domain_inside_domain

Download get_domain_inside_domain .pl
Keywords   : find_dindoms, domain_inside_domain, domain_in_domain
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

scale_for_horizontal_histogram

Download scale_for_horizontal_histogram .pl
Function   : used to make things like:

Options    : _  for debugging.
             #  for debugging.
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

get_added_matched_regions_in_msp

Download get_added_matched_regions_in_msp .pl
Function   : This reads MSP file regions matched for a target seq
             and adds things up to plot horizontally.
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

cluster_merged_seqlet_sets

Download cluster_merged_seqlet_sets .pl
Options    : _  for debugging.
             #  for debugging.
  $short_region=  S by S -S  # taking shorter region overlapped in removing similar regions
  $large_region=  L by L -L  # taking larger  region overlapped in removing similar regions
  $average_region=A by A -A # taking average region overlapped in removing similar regions

Usage      : @out=@{&cluster_merged_seqlet_sets(\@lines)};
Version    : 1.5
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

check_linkage_of_2_similar_seqlet_sets

Download check_linkage_of_2_similar_seqlet_sets .pl
Function   : connects two clusters of seqlets if they share
              identical or near identical seqlets
Keywords   : check_link, check_relation, check_relatedness
Options    : _  for debugging.
  $factor = by f=  # eg)  "f=$factor" in the higher level sub

Version    : 1.7
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

merge_arrays_by_common_elements

Download merge_arrays_by_common_elements .pl
Function   : merges arrays if there are common array elements.
             if @A has (1,2,3) and @B has (2, 4, 5), they share 2, so
             they are merged to be (1,2,3,4,5)
Keywords   : cluster_arrays_by_common_elements, merge_arrays_if_common_elements
             merge_array_if_common_elements, merge_arrays_when_common_elements_occur
             merge_arrays
Usage      : @out=@{&merge_arrays_by_common_elements(\@ref_of_arrays)}
Version    : 1.1
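
The merging rule in the Function entry can be sketched as a repeated pairwise merge. The name merge_by_common_elements and the choice to return merged groups sorted stringwise are assumptions of this example, not the library's implementation.

```perl
use strict;
use warnings;

# Sketch only: take array refs, and keep merging any two groups that share
# an element until no two groups overlap. Merged groups are de-duplicated
# and sorted stringwise.
sub merge_by_common_elements {
    my @groups = map { [@$_] } @_;          # work on copies
    my $merged = 1;
    while ($merged) {
        $merged = 0;
        OUTER: for my $i (0 .. $#groups - 1) {
            my %seen = map { $_ => 1 } @{ $groups[$i] };
            for my $j ($i + 1 .. $#groups) {
                next unless grep { $seen{$_} } @{ $groups[$j] };
                my %union = map { $_ => 1 } @{ $groups[$i] }, @{ $groups[$j] };
                $groups[$i] = [ sort keys %union ];
                splice @groups, $j, 1;      # drop the absorbed group
                $merged = 1;
                last OUTER;                 # restart the scan
            }
        }
    }
    return \@groups;
}

my $out = merge_by_common_elements([1, 2, 3], [2, 4, 5], [7, 8]);
print "@$_\n" for @$out;
```

Restarting the scan after every merge is simple, not fast; it also catches chains such as (1,2), (3,4), (2,3), which end up in one group.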

check_parf_files

Download check_parf_files .pl
Example    : 
    PARF file looks like this>
   d1nsca_   d3nn9__   Homolog -664.92 2.43.1.1.3  2.43.1.1.2
   d1dppa_   d2olba_   Homolog -617.41 3.68.1.1.6  3.68.1.1.1
   d2ach.1a1 d9api.1a1 Homolog -556.38 5.2.1.1.3   5.2.1.1.4
Function   : Checks whether the given file(s) are parf files and returns the number
              of parf files identified. If you check 2 files and both are parf, you
               will get a (\$num_of_parf_file) value of 2.
Usage      : $number_of_parf=${&check_parf_files(@input)};
Version    : 1.0

check_common_elements_in_array

Download check_common_elements_in_array .pl
Function   : Accepts 1 or 2 refs of arrays and checks whether there are any
             common (repeating) elements between the two (or inside the one).
             The result is a ref to either 1 or 0.
Keywords   : is_there_common_element, if_common_elements
Usage      : &check_common_elements_in_array($mother_array[$i], $mother_array[$i+1]);
Version    : 1.0
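
A small sketch of the check described above, under the assumption that elements are compared as strings. The name has_common_elements is hypothetical, and the sketch returns a plain 1 or 0, not a reference.

```perl
use strict;
use warnings;

# Sketch only: with two array refs, report whether they share any element;
# with one array ref, report whether any element repeats inside it.
sub has_common_elements {
    my ($first, $second) = @_;
    my %seen;
    my $hits;
    if (defined $second) {
        $seen{$_} = 1 for @$first;
        $hits = grep { $seen{$_} } @$second;     # count of shared elements
    }
    else {
        $hits = grep { $seen{$_}++ } @$first;    # count of repeats
    }
    return $hits ? 1 : 0;
}

print has_common_elements([1, 2, 3], [2, 4, 5]), "\n";
```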

link_ranges

Download link_ranges .pl
Example    : INPUT:

   @input=( '1-30 1-40 1-50',
            '2-49 4-40 2-99'....)

Function   : Merges ranges (10-20, 11-21, etc.) when there is any overlap
              between them.
             If you give a reversed range like '2000-20', it will
              complain, reverse the order, and do the job after the correction.

Keywords   : connect_ranges, link_overlapping_ranges, connect_overlapping_ranges
Options    : _  for debugging.
Usage      : @all_ranges = @{&link_ranges(@all_ranges)};
Version    : 1.1
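
Linking overlapping ranges can be sketched as a sort followed by a single pass. This example assumes a flat list of 'start-end' strings (one range per string, unlike the multi-range strings in the INPUT example), and the name link_overlapping_ranges is made up for illustration.

```perl
use strict;
use warnings;

# Sketch only: merge 'start-end' ranges that overlap. A reversed range such
# as '2000-20' triggers a warning and is swapped before processing.
sub link_overlapping_ranges {
    my @ranges;
    for my $r (@_) {
        my ($s, $e) = $r =~ /^(\d+)-(\d+)$/ or next;   # skip non-ranges
        if ($s > $e) {
            warn "reversed range $r, swapping\n";
            ($s, $e) = ($e, $s);
        }
        push @ranges, [$s, $e];
    }
    my @linked;
    for my $r (sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @ranges) {
        if (@linked and $r->[0] <= $linked[-1][1]) {
            # overlaps the previous range: extend its end if needed
            $linked[-1][1] = $r->[1] if $r->[1] > $linked[-1][1];
        }
        else {
            push @linked, [@$r];
        }
    }
    return [ map { join '-', @$_ } @linked ];
}

print "$_\n" for @{ link_overlapping_ranges('10-20', '11-21', '40-50') };
```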

merge_similar_ranges

Download merge_similar_ranges .pl
Example    : INPUT:

   @input=( '1-30 1-40 1-50',
            '2-49 4-40 2-99'....)

Function   : Merges ranges (10-20, 11-21, etc.) when there is any overlap
              between them (resulting in an average start and end at each level).
             If you give a reversed range like '2000-20', it will
              complain, reverse the order, and do the job after the correction.

Keywords   : merge_similar_regions, merge_ranges, merge_regions,
              merge_sequence_ranges, merge_overlap_ranges, connect_ranges
              connect_overlapping_ranges, connect_similar_ranges,
              remove_similar_ranges
Options    : f=   for setting factor (0.7 for 70% overlap minimum)

Usage      : @all_ranges = @{&merge_similar_seqlets(@all_ranges)};
Version    : 1.3
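
To illustrate the f= factor and the averaged start and end, here is a sketch for a single pair of ranges. It assumes the overlap is measured against the shorter of the two ranges, which the entry above does not state; merge_if_similar is a hypothetical name, and averaged positions are truncated to integers.

```perl
use strict;
use warnings;

# Sketch only: if two 'start-end' ranges overlap by at least $factor of the
# shorter range, return one range with averaged start and end; otherwise
# return nothing (undef in scalar context).
sub merge_if_similar {
    my ($r1, $r2, $factor) = @_;
    $factor = 0.7 unless defined $factor;          # 70% minimum overlap
    my ($s1, $e1) = split /-/, $r1;
    my ($s2, $e2) = split /-/, $r2;
    my $start   = $s1 > $s2 ? $s1 : $s2;
    my $end     = $e1 < $e2 ? $e1 : $e2;
    my $overlap = $end - $start + 1;
    my $len1    = $e1 - $s1 + 1;
    my $len2    = $e2 - $s2 + 1;
    my $shorter = $len1 < $len2 ? $len1 : $len2;
    return if $overlap < $factor * $shorter;
    return int(($s1 + $s2) / 2) . '-' . int(($e1 + $e2) / 2);
}

my $merged = merge_if_similar('10-20', '11-21');
print defined $merged ? "$merged\n" : "not similar\n";
```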

merge_similar_seqlets

Download merge_similar_seqlets .pl
Example    : INPUT:

   @input=( 'seq1_1-30 seq2_1-40 seq3_1-50',
            'seq1_2-49 seq3_4-40 seq4_2-99'....)

   @output=('seq1_1-30 seq2_1-45 seq3_2-45 seq4_2-99');

Function   : Merges seqlet sets which have identical
             sequences and share similar regions, by a connection factor of 30%.
             This means that any two seqlets from the same sequence whose
             regions overlap by more than 70% are merged.
             This only looks at the very first sequence in the seqlets line!!!
             (so, PARTIAL MERGE !!)
Keywords   : merge_similar_sequences, merge_sequence_names, merge_sequences,
              merge_sequence_ranges, merge_similar_sequences_with_ranges,
              merge_seqlets, merge_duplication_modules
Options    : 

   f=   for determining the factor in filtering out non-homologous
                  regions, 7 = 70% now!!
   l=   for seqlet(duplication module) length threshold
   z           for activating remove_similar_sequences, rather than remove_dup....
   S  $short_region=  S by S -S  # taking shorter region overlap in removing similar reg
   L  $large_region=  L by L -L  # taking larger  region overlap in removing similar reg
   A  $average_region=A by A -A  # taking average region overlap in removing similar reg

Usage      : @all_seqlets = @{&merge_similar_seqlets(@all_seqlets)};
Version    : 2.0

sort_by_digits_in_string

Download sort_by_digits_in_string .pl
Function   : sorts arrays of strings like

   MJ0228_314-573 MJ1197_348-601
   MJ0228_451-576 sll0078_502-594 sll1425_489-611
   MJ0228_479-572 sll0078_502-594

   According to the digits after seq names _314-, _451-, _479-
    in the above
   This only looks at the very first sequence in the string

Options    : _  for debugging.
             #  for debugging.
Version    : 1.4
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
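
A sketch of sorting such lines by the start position of the first sequence in each line. The name sort_by_first_range_start and the ascending order are assumptions for this example.

```perl
use strict;
use warnings;

# Sketch only: sort lines by the digits after the first sequence name,
# e.g. 314 in 'MJ0228_314-573 ...'. Lines without a match sort as 0.
sub sort_by_first_range_start {
    my @sorted = map  { $_->[1] }
                 sort { $a->[0] <=> $b->[0] }
                 map  { [ /^\S*?_(\d+)-/ ? $1 : 0, $_ ] }
                 @_;
    return \@sorted;
}

my @lines = (
    'MJ0228_479-572 sll0078_502-594',
    'MJ0228_314-573 MJ1197_348-601',
    'MJ0228_451-576 sll0078_502-594 sll1425_489-611',
);
print "$_\n" for @{ sort_by_first_range_start(@lines) };
```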

sort_words_in_string

Download sort_words_in_string .pl
Function   : sorts words in strings separated by ' ' or "\n"
Keywords   : sort_words_in_sequences, sort_sequences_in_string,
             sort_strings_in_string, sort_string_by_words, sort_elements_in_string
Options    : _  for debugging.
             #  for debugging.
Version    : 1.1
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
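
Sorting the words inside a string can be sketched in a few lines. The name sort_words and the choice to rejoin the words with single spaces are assumptions of this example.

```perl
use strict;
use warnings;

# Sketch only: split a string on whitespace (spaces or newlines), sort the
# words stringwise, and join them back with single spaces.
sub sort_words {
    my ($string) = @_;
    my @words = split ' ', $string;
    return join ' ', sort @words;
}

print sort_words("slr1950 sll1920\nsll0672"), "\n";
```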

convert_hmmls_to_msp_files

Download convert_hmmls_to_msp_files .pl
Keywords   : convert_hmmls_to_msp
Options    : 
   S=$single_out_file_name   for producing single msp file with all the hmmls contents
   E=Enguiry_name    for specifying enquiry seq name rather than 'HMM', the default
   $bit_score_threshold= by t=
Usage      : @out=@{&convert_hmmls_to_msp_files(\@file)};
Version    : 1.4

convert_mmp_to_mrg

Download convert_mmp_to_mrg .pl
Example    : 
  Example OUT as string

   slr1950 sll1920 sll0672 sll1076 sll1614 slr0797 slr0798 slr0822 slr1729
   slr1729 sll1076 sll0672 sll1614 sll1920 slr0797 slr0798 slr0822 slr1950

Options    : _  for debugging.
             #  for debugging.
Version    : 1.1
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

add_ranges_in_msp_line

Download add_ranges_in_msp_line .pl
Function   : this adds ranges to the seqnames of msp files
             mmp line is msp line with additional sequences at the end
Keywords   : convert_msp_to_mmp, convert_msp, convert_msp_2_mmp
             change_msp_to_mmp, add_range_in_msp, convert_msp_line_to_mmp_line
Options    : _  for debugging.
             #  for debugging.
Version    : 1.5

convert_msp_line_to_mmp_line

Download convert_msp_line_to_mmp_line .pl
Function   : this adds ranges to the seqnames of msp files
             mmp line is msp line with additional sequences at the end
Keywords   : convert_msp_to_mmp, convert_msp, convert_msp_2_mmp
             change_msp_to_mmp, add_range_in_msp
Options    : _  for debugging.
             #  for debugging.
Version    : 1.5

merge_sequence_alignments

Download merge_sequence_alignments .pl
Keywords   : combine_sequence_alignment, merge_sequence_alignment_pairs
             merge_seq_alignment, make_interm_alignment, make_3_way_alignment
             merge_alignment, combine_alignment
Options    : 
    l=  for sequence block length by print_seq_in_block subroutine
    t=  for specifying the length of seq names shown.
    t   for truncating the seq names in printing out.
    s   for sorting the final output lines (default anyway for print_seq_in_block)

Usage      : &merge_sequence_alignments(@seq);  where @seq is
              @seq=(\%hash1, \%hash2);  and %hash1 and %hash2 are, e.g.,
    %hash1=qw(seq1 ANN-NTMQQRRQQQRKRRRQQQSSSSTTST seq2 --NNN--QQ--QQQ--RRRR--SSSS--);
    %hash2=qw(seq2 NN-QQQQQ--RRRR----SS--SS---    seq3 -NNXQQQXQRTRRRXTTSTSSMMSSTTT);

Version    : 1.3

merge_sequence_in_msp_file

Download merge_sequence_in_msp_file .pl
Example    : INPUT: (MSP file) ===>
  59     2.6        47    64     d2pia_3        10    30     d1erd___10-30
  161    1.1e-07    24    91     d2pia_3        16    85     d1frd___16-85

  722    0          1     106    d1put__        1     106    d1put___1-106
  66     4.9        2     68     d1put__        43    106    d2lbp___43-106
  69     1.3        12    49     d1put__        81    120    d1cgo___81-120

  60     3.3        13    38     d1frd__        32    57     d1orda1_32-57
  65     1.7        21    58     d1frd__        40    69     d2mtac__40-69

   ==== OUTPUT ===>
    d1frd___1-98 d1frd___1-98_1-98 d1frd___16-85 d2pia_3_24-91_24-91
    d1frd___16-85_16-85 d2pia_3_24-91
    d1put___1-106 d1put___1-106_1-106
    d2pia_3_1-98 d2pia_3_1-98_1-98

Keywords   : mergr_seq_in_msp_file, merge_sequence_in_msp, merge_sequences_in_msp_file
Options    : 
   $dynamic_factor =  y by y -y   # adjusting factor value dynamically (more seq, higher factor)
   $short_region   =  S by S -S   # taking shorter region overlapped in removing similar regions
   $large_region   =  L by L -L   # taking larger  region overlapped in removing similar regions
   $average_region =  A by A -A   # taking average region overlapped in removing similar regions

Version    : 2.7
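
The MSP lines in the INPUT example above appear to carry eight whitespace-separated columns. A minimal sketch of reading one such line follows. The helper name parse_msp_line and the field labels (score, evalue, query and subject start/end/name) are assumptions inferred from the example, not names taken from B.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: split one MSP line into named fields.
# Column meanings are inferred from the example lines, not from B.pl itself.
sub parse_msp_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f == 8;          # skip blank or malformed lines
    my %hit;
    @hit{qw(score evalue q_start q_end query s_start s_end subject)} = @f;
    return \%hit;
}

my $hit = parse_msp_line('  161    1.1e-07    24    91     d2pia_3        16    85     d1frd___16-85');
printf "%s %d-%d matches %s %d-%d (E=%s)\n",
    @{$hit}{qw(query q_start q_end subject s_start s_end evalue)};
```

Blank separator lines between MSP chunks return nothing, so a caller can simply skip them.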

merge_sequence_in_msp_chunk

Download merge_sequence_in_msp_chunk .pl
Function   : merges sequences which are linked by common regions
             This filters the sequences by evalue and ssearch score
             This is the main algorithm of merging similar sequences.
Keywords   : connect_sequence_in_msp, link_sequence_in_msp_chunk
             connect_sequence_in_msp_chunk, link_sequence_in_msp
             merge_sequence, link_sequence, connect_sequence
Options    : _  for debugging.
             #  for debugging.
             m  for merge file output format (.mrg)
             t= for threshold of seqlet length eg)  "t=30"
             f= for overlap factor (usually between 2 and 7)
                 2 means that if the two regions do not overlap
                  by more than HALF of the smaller region,
                  they are not regarded as a common seqlet block
             s= for ssearch score minimum
             e= for ssearch e value maximum
             S  for S -S  # taking shorter region overlapped in removing similar regions
             L  for L -L  # taking larger  region overlapped in removing similar regions
             A  for A -A # taking average region overlapped in removing similar regions

Version    : 2.4
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
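
The f= rule above can be sketched as a small overlap test. The source only spells out the case f=2 (overlap of more than half the smaller region); extending it to "more than 1/f of the smaller region" for other factors is an assumption, and is_common_block is a hypothetical name, not the B.pl code.

```perl
use strict;
use warnings;

# Hypothetical helper, not the B.pl code: true when two ranges overlap by
# more than (length of the smaller range) / $factor residues.
sub is_common_block {
    my ($s1, $e1, $s2, $e2, $factor) = @_;
    my $start   = $s1 > $s2 ? $s1 : $s2;     # later of the two starts
    my $end     = $e1 < $e2 ? $e1 : $e2;     # earlier of the two ends
    my $overlap = $end - $start + 1;
    return 0 if $overlap <= 0;               # ranges do not overlap at all
    my $len1    = $e1 - $s1 + 1;
    my $len2    = $e2 - $s2 + 1;
    my $smaller = $len1 < $len2 ? $len1 : $len2;
    return $overlap > $smaller / $factor ? 1 : 0;
}

print is_common_block(24, 91, 16, 85, 2), "\n";   # 62 of 68 residues shared: prints 1
print is_common_block(1, 50, 45, 120, 2), "\n";   # 6 of 50 residues shared: prints 0
```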

get_overlapping_range

Download get_overlapping_range .pl
Keywords   : get_overlapping_range_in_msp, get_overlapping_range_in_msp_file,
             get_overlapping_seq_match_range, get_overlap_seq_match_range
Options    : _  for debugging.
             #  for debugging.
Usage      : @n1=@{&get_overlapping_range(\@ranges1, \@ranges2)};
Version    : 1.1
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
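
A minimal sketch of the idea behind the Usage line, assuming each range is a 'start-end' string (the source does not state the element format). overlapping_ranges is a hypothetical stand-in, not the library routine.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_overlapping_range: returns a ref to the list
# of 'start-end' strings shared by any range in @$r1 and any range in @$r2.
sub overlapping_ranges {
    my ($r1, $r2) = @_;
    my @out;
    for my $x (@$r1) {
        my ($xs, $xe) = $x =~ /^(\d+)-(\d+)$/ or next;
        for my $y (@$r2) {
            my ($ys, $ye) = $y =~ /^(\d+)-(\d+)$/ or next;
            my $start = $xs > $ys ? $xs : $ys;   # later start
            my $end   = $xe < $ye ? $xe : $ye;   # earlier end
            push @out, "$start-$end" if $start <= $end;
        }
    }
    return \@out;
}

my @n1 = @{ overlapping_ranges(['24-91', '100-150'], ['16-85', '140-200']) };
print "@n1\n";    # prints: 24-85 140-150
```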

find_source_perl_library

Download find_source_perl_library .pl
Function   : gets the default perl sub source library from ENV setenv
Usage      : $source_library=${&find_source_perl_library};
Version    : 1.1

find_central_seq_msp_chunk

Download find_central_seq_msp_chunk .pl
Options    : _  for debugging.
             #  for debugging.
Usage      : This finds the correct msp chunk with given seq name
             and big original or any msp chunk
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

find_central_sequence

Download find_central_sequence .pl
Function   : accepts msp file and finds the central sequence.
             central sequence is in the centre of all the member
             sequences in a group or cluster
Options    : _  for debugging.
             #  for debugging.
Version    : 1.1
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

write_dof_files

Download write_dof_files .pl
Function   : write Alex's domfam file. it prints out tilde lines
             if the seqlet matched are below threshold defined.
Options    : _  for debugging.
             #  for debugging.
             v  for verbose STDOUT
             n  for NO seq start and end number display
             t= for threshold (eg, t=40  for Blastp(or ssearch) score 40 threshold)

Usage      : &write_dof_files(\@msps);
             while @msps means msp file names
Version    : 1.2
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

make_filtered_list

Download make_filtered_list .pl
Function   : this is the core of check_genome_cluster.pl
             finds good linkage seqlets in msp files
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

follow_seqlet_link

Download follow_seqlet_link .pl
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

clu_to_sso_to_msp

Download clu_to_sso_to_msp .pl
Function   : reads in a big single linkage cluster file(or normal cluster file)
              and creates a big msp file which contains all the entries in the
              cluster file (usually with the extension of sclu or clu)
             This normally reads in xxxx.mso, xxxx.sso like files, but if the
              corresponding  xxx.msp file already exists, it concatenates them to
              make a bigger one.
Keywords   : clu_2_sso_2_msp, cluster_to_msp, cluster_to_sso_to_msp
              convert_clu_to_sso_to_msp
Options    : USE convert_clu_to_sso_to_msp instead; this one is obsolete now
Usage      : &clu_to_sso_to_msp(\$clu);
Version    : 1.7

convert_clu_to_sso_to_msp

Download convert_clu_to_sso_to_msp .pl
Function   : reads in a big single linkage cluster file(or normal cluster file)
              and creates a big msp file which contains all the entries in the
              cluster file (usually with the extension of sclu or clu)
             This normally reads in xxxx.mso, xxxx.sso like files, but if the
              corresponding  xxx.msp file already exists, it concatenates them to
              make a bigger one.
Keywords   : clu_2_sso_2_msp, cluster_to_msp, cluster_to_sso_to_msp
              clu_to_sso_to_msp
Usage      : &convert_clu_to_sso_to_msp(\$clu);
Version    : 1.8

sso_to_msp

Download sso_to_msp .pl
Example    : &sso_to_msp(@ARGV, 'OUT.msp', $single_out_opt);
Function   : This takes sso file(s) and produces an MSP file. It
             concatenates sso file contents when more than one
             sso file is given.
Keywords   : sso_file_to_msp_file, convert_sso_to_msp,
Options    : _  for debugging.
             #  for debugging.
             v  for showing the MSP result to screen
             s  for making single MSP file for each sso file
                    as well as big MSP file which has all sso
             u= for upper expectation value limit
             l= for lower expect val limit
             s= for single file name input eg. "s=xxxxx.msp"
             n  for new format (msp2 format)
             r  for adding range
             r2 for adding ranges in all sequence names

Returns    : the file names created (xxxx.msp, yyyy.msp,,,,)
Usage      : &sso_to_msp(@ARGV, $single_out_opt);
Version    : 2.6
Warning    : This capitalizes all the input file names when
              producing xxxxx.msp. xxxxx.sso -> XXXX.sso

convert_sso_to_msp

Download convert_sso_to_msp .pl
Example    : &convert_sso_to_msp(@ARGV, 'OUT.msp', $single_out_opt);
Function   : This takes sso file(s) and produces an MSP file. It
             concatenates sso file contents when more than one
             sso file is given.
Keywords   : sso_file_to_msp_file, convert_sso_to_msp,
Options    : _  for debugging.
             #  for debugging.
             v  for showing the MSP result to screen
             s  for making single MSP file for each sso file
                    as well as big MSP file which has all sso
             u= for upper expectation value limit
             l= for lower expect val limit
             s= for single file name input eg. "s=xxxxx.msp"
             n  for new format (msp2 format)
             r  for adding range
             r2 for adding ranges in all sequence names

Returns    : the file names created (xxxx.msp, yyyy.msp,,,,)
Usage      : &convert_sso_to_msp(@ARGV, $single_out_opt);
Version    : 2.6
Warning    : This capitalizes all the input file names when
              producing xxxxx.msp. xxxxx.sso -> XXXX.sso

bla_to_msf

Download bla_to_msf .pl
Function   : matches each query seq name and, if the E value is lower than
             the author's arbitrary threshold, puts the subject and target pair
             alignment into a hash.
             In later iterations, the stored alignment is replaced by the latest one
Keywords   : convert_bla_to_msf
Usage      : @msf_file_made=@{&bla_to_msf(\@bla_file)};
Version    : 1.1

convert_bla_to_msf

Download convert_bla_to_msf .pl
Function   : matches each query seq name and, if the E value is lower than
             the author's arbitrary threshold, puts the subject and target pair
             alignment into a hash.
             In later iterations, the stored alignment is replaced by the latest one
Keywords   : convert_bla_to_msf
Usage      : @msf_file_made=@{&convert_bla_to_msf(\@bla_file)};
Version    : 1.1

convert_bla_to_msp

Download convert_bla_to_msp .pl
Author     : Sarah Teichmann and Jong Park, jong@salt2.med.harvard.edu
Example    : %hash_out=%{&convert_bla_to_msp(\$file)};
Function   : reads in PSI blast output and produces MSP file format.
             Takes all the good hits below certain threshold in multiple iteration
             Reports the best evalue with a given sequence name
Keywords   : pbla_to_msp, blast_to_msp, bla_2_msp, blastp_to_msp_format,
             blast_to_msp_format, convert_bla_to_msp, convert_bla_to_msp_files
             bla_to_msp
Options    : 
   $pdbd_seq_only  d   for getting dxxxx_ like seq names only (pdb40d names, for example)
   $all_seq        a   for forcing all seq conversion
   $which_iteration= by i=    # choose which iteration result you want to take
   $which_iteration   as just a digit
   $report_only_the_best=b by b -b
   $take_only_the_last_iteration=l by l
   $accumulative_hits_eval_thresh= by e=
   $genome_seq_only=g
   $nrdb_seq_only=n
   $evalue_thresh= by E=
   $Accumulate_matches=A   by A -A

Usage      : %hash_out_final=%{&convert_bla_to_msp(\$file)};
Version    : 3.7

convert_bla_multaln_to_msf

Download convert_bla_multaln_to_msf .pl
Example    : @msf_file_made=@{&convert_bla_multaln_to_msf(\@bla_file,
                                              $verbose, "i=$iteration")};
Function   : matches each query seq name and, if the E value is lower than
             the author's arbitrary threshold, puts the subject and target pair
             alignment into a hash.
             In later iterations, the stored alignment is replaced by the latest one;
              when you use the m6 option for PSI blast,
             this adds '00x' extensions to the repeatedly occurring seq names

Keywords   : psi_blast_to_msf, psi_blast_multaln_to_msf
Options    : 
   i=$iteration
   v  for verbose
Usage      : @msf_file_made=@{&convert_bla_multaln_to_msf(\@bla_file, [i=2])};
Version    : 1.6

convert_bla_multaln_to_msf

Download convert_bla_multaln_to_msf .pl
Example    : @msf_file_made=@{&convert_bla_multaln_to_msf(\@bla_file, "i=$iteration")};
Function   : matches each query seq name and, if the E value is lower than
             the author's arbitrary threshold, puts the subject and target pair
             alignment into a hash.
             In later iterations, the stored alignment is replaced by the latest one;
              when you use the m6 option for PSI blast,
             this adds '00x' extensions to the repeatedly occurring seq names

Keywords   : psi_blast_to_msf, psi_blast_multaln_to_msf,
             bla_multaln_to_msf
Usage      : @msf_file_made=@{&convert_bla_multaln_to_msf(\@bla_file, [i=2])};
Version    : 1.4

get_sub_hash

Download get_sub_hash .pl
Function   : fetches hash keys and values by giving keys to
             a hash
Keywords   : subhash, sub_hash, get_hash_elements, fetch_sub_hash
             take_sub_hash, get_hash_by_keys, get_sub_hash_by_keys
Options    : _  for debugging.
             #  for debugging.
Usage      : %sub_hash=%{&get_sub_hash(\%FASTA, \@list)};
Version    : 1.1
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
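
The behaviour described above can be sketched with a plain map/grep over the wanted keys. sub_hash is a hypothetical equivalent written for illustration; skipping keys that are absent from the source hash is an assumption, since the source does not say how missing names are handled.

```perl
use strict;
use warnings;

# Hypothetical equivalent of get_sub_hash: copy only the listed keys,
# silently skipping names that are absent from the source hash (assumption).
sub sub_hash {
    my ($hash, $keys) = @_;
    my %sub = map { ($_ => $hash->{$_}) } grep { exists $hash->{$_} } @$keys;
    return \%sub;
}

my %FASTA = (seq1 => 'ACDE', seq2 => 'FGHI', seq3 => 'KLMN');   # toy data
my @list  = qw(seq1 seq3 seq9);
my %sub_hash = %{ sub_hash(\%FASTA, \@list) };
print join(' ', map { "$_=$sub_hash{$_}" } sort keys %sub_hash), "\n";   # seq1=ACDE seq3=KLMN
```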

get_smallest_file

Download get_smallest_file .pl
Function   : checks the size of files and returns the smallest
             one's name. If a file is not present in pwd or
             specified absolute path, it ignores it.
Keywords   : choose_smallest_file, smallest_file, find_smallest_file
             get_the_smallest_file, choose_the_smallest_file,
             fetch_smallest_file, take_smallest_file, get_smaller_file,
Options    : _  for debugging.
             #  for debugging.
             e  for extract the smallest from the input array
                       leaving it one element less, in this case
                       there will be two returning refs.
Usage      : $smallest_file_name=${&get_smallest_file(@ARGV)};
Version    : 1.3
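
A minimal sketch of the size comparison, using Perl's -f and -s file tests. smallest_file is a hypothetical name; returning a reference to undef when no file exists is an assumption, and the 'e' (extract) option is not covered.

```perl
use strict;
use warnings;

# Hypothetical sketch: return a ref to the name of the smallest existing file.
# Names that are not plain files are ignored, as the Function text describes.
sub smallest_file {
    my @found = grep { -f $_ } @_;
    my ($smallest) = sort { (-s $a) <=> (-s $b) } @found;
    return \$smallest;               # ref to undef when nothing was found
}

my $name = ${ smallest_file(@ARGV) };
print defined $name ? "$name\n" : "no files found\n";
```

The same sketch gives the largest file if the two operands of <=> are swapped.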

get_largest_file

Download get_largest_file .pl
Function   : checks the size of files and returns the largest
             one's name. If a file is not present in pwd or
             specified absolute path, it ignores it.
Keywords   : choose_largest_file, largest_file, find_largest_file
             get_the_largest_file, choose_the_largest_file, get_biggest_file
             fetch_largest_file, take_largest_file, get_bigger_file, get_larger_file
Options    : _  for debugging.
             #  for debugging.
             e  for extract the largest from the input array
                       leaving it one element less, in this case
                       there will be two returning refs.
Usage      : $largest_file_name=${&get_largest_file(@ARGV)};
Version    : 1.4

get_sequence_complexity

Download get_sequence_complexity .pl
Argument   : ref. of string.
Example    : ${&get_sequence_complexity(\$seq)};
             while $seq='TTTTTACDEFGHIKLMNPQRSTVWYAAAAACCCADFADFA'
Function   : calculates the sequence complexity of a single sequence.
             If the seq given is longer than 20, it divides it into
             frags of 20 aa and gets the average of them.
Keywords   : sequence_complexity, calc_sequence_complexity,
             calc_seq_complexity, get_seq_complexity,
Options    : _  for debugging.
             #  for debugging.
             'w=' for window size as the first arg
Returns    : Ref. for a scalar digit.
Usage      : print "\n", ${&get_sequence_complexity(\$seq)};
Version    : 1.3

make_swiss_index

Download make_swiss_index .pl
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

fetch_sequence_from_db

Download fetch_sequence_from_db .pl
Argument   : gets names of sequences
             eg) \@array, \%hash, \$seq, while @array=(seq1, seq2), $seq='seq1 seq1'
                                               %hash=(seq1, xxxx, seq2, yyyy);

Example    : %seq=%{&fetch_sequence_from_db(\@input, seq.fa, seq.fa.idx)};
              while @input=qw( 11S3_HELAN_11-31 A1AB_CANFA A1AT_PIG )
Function   : accept seq names (with or without ranges like _10-111 )
              and produces hash ref.
             As an option, you can write(xxxx.fa) the sequences in pwd
              with the file names with sequence names.
             The default database used is FASTA format OWL database.
              You can change this by S (for Swissprot either fasta
              or full format), P for PDB fasta format data.
             If you give the path name of DB, it will look for the
              DB given.

             This automatically checks sequence family number as
               in >d1bpi___7.6.1
               and attaches the number in final %sequence output

Keywords   : fetch_seq_from_db, fetch_sequence_from_database
Options    : _  or #  for debugging.
     w       for write fasta file
     s=      for putting source DB file name manually
     d=p100  for PDB100 fasta database from ENV
     d=p40   for PDB40  fasta database from ENV
     d=p     for PDB database (usually p100) from ENV
     d=s     for Swissprot database from ENV
     d=o     for OWL database from ENV
     i=      for index filename. If not specified, this looks for it in the same dir as fast
     t=      for msp_threshold
  msp_threshold=0.0005  # when MSP file is given as input for getting seq names

Returns    : ref of hash
Usage      : %sequence=%{&fetch_sequence_from_db($input_file, \@string)};
Version    : 3.5
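
The handling of names "with or without ranges like _10-111" can be sketched as below. This is an illustration only: split_name_and_range, the in-memory %db and its toy sequence are hypothetical, ranges are assumed to be 1-based and inclusive, and no real database or index file is read.

```perl
use strict;
use warnings;

# Hypothetical helper: 'NAME_10-111' -> ('NAME', 10, 111); 'NAME' -> ('NAME').
sub split_name_and_range {
    my ($id) = @_;
    return ($1, $2, $3) if $id =~ /^(.+)_(\d+)-(\d+)$/;
    return ($id);
}

my %db = (SEQ_A => 'ACDEFGHIKL');   # toy data standing in for a FASTA database
my %seq;
for my $id (qw(SEQ_A_3-6 SEQ_A SEQ_B_1-5)) {
    my ($name, $start, $end) = split_name_and_range($id);
    next unless exists $db{$name};              # unknown names are skipped
    $seq{$id} = defined $start
        ? substr($db{$name}, $start - 1, $end - $start + 1)
        : $db{$name};
}
print "$_ => $seq{$_}\n" for sort keys %seq;
# SEQ_A => ACDEFGHIKL
# SEQ_A_3-6 => DEFG
```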

fetch_seq

Download fetch_seq .pl
Argument   : swissprot seqname
Example    : &fetch_seq(@ARGV);
Function   : fetches swissprot entry or fasta format seq with
             given seq name(like  SAA_HORSE, SA*HORSE, SAA,..)
             you can give multi files(SAA*, SAU*) at the same
             time. This uses ENV setting of 'SWDIR'
Keywords   : fetch_swissprot_sequence, fetch_sequence,
             find_swiss_sequence, find_sequence
Options    : _  for debugging.
             #  for debugging.
             -f for fasta format file output
             -a is for ALL matched seq. (same as using glob=> *YEAST)
             -c is for Creating seq.idx file
             -h is for HELP!
             -g is for GDF file format output
             -l is for list of match entries(in 1 column)
             -s is for species option (input name must be species (YEAST, RAT, HUMAN..))
             n= is for Number of seq you want to get from swissprot
             s= is for Size limit. Min seq size in swiss, s=10  -> minimum 11 aa seq.
             S= is for Size limit. Max seq size in swiss, S=1000 -> get less than 1000

Usage      : &fetch_seq(@ARGV);
Version    : 1.6

fetch_swiss_seq

Download fetch_swiss_seq .pl
Argument   : swissprot seqname
Example    : &fetch_swiss_seq(@ARGV);
Function   : fetches swissprot entry or fasta format seq with
             given seq name(like  SAA_HORSE, SA*HORSE, SAA,..)
             you can give multi files(SAA*, SAU*) at the same
             time. This uses ENV setting of 'SWDIR'
Keywords   : fetch_swissprot_sequence, fetch_sequence,
             find_swiss_sequence, find_sequence, fetch
Options    : _  for debugging.
             #  for debugging.
             -f for fasta format file output
Version    : 1.0

get_sequence_number

Download get_sequence_number .pl
Function   : reads database and tells how many sequences are there
             fasta format db is only accepted for now.
Keywords   : count_number_of_sequence, get_number_of_sequence
             get_sequence_number_in_fasta
Options    : _  for debugging.
             #  for debugging.
Version    : 1.2

write_msp_files

Download write_msp_files .pl
Example    : &write_msp_files(@sso, 's', $out_file);
Function   : Writes input which is already in msp file format to
              files either the name is given or generated
              If more than one ref of hash is given, this will
              concatenate all the hashes to one big one to
              make one file.
             When NO output xxx.msp file name is given, it creates
              with the query sequence name.
Keywords   : write_msp,
Options    : _  for debugging.
             #  for debugging.
             s  for each single file output for each hash input
      filename  for putting output to the specified filename, should be xxx.msp

Returns    : if 's' option is set, it will make say,
               HI001.msp HI002.msp HI003.msp  rather than

               HI001HI002HI003.msp
  eg of one output(single file case)

   1027     0.0     1     154   HI0004     1     154   HI0004
   40       0.0     84    132   HI0004     63    108   HI0001
   31       0.0     79    84    HI0004     98    103   HI0003

Usage      : &write_msp_files(\%in1, \%in2, ['s'], [$filename],,)
Version    : 2.8
Warning    : When NO output xxx.msp file name is given, it creates
              with the query sequence name.

write_aln_files

Download write_aln_files .pl
Example    : &write_aln(\%hash, \$out_file_name);
  CLUSTAL W (1.74) multiple sequence alignment


  MMAF6040_1           -----MATDD--SIIVLDD----DDEDEA-AAQP-GPSNLPPN-PASTGPGPGLSQQATG
  AF015956_1           -----MATAN--SIIVLDD----DDEDEA-AAQP-GPSHPLPN-AASPGAG---------
  HSAB2381_80-900      KQRLLSVTSDEGSMNAFTGRGSPDTEIKINIKQESADVNVIGNKDVVTEEDLDVFKQAQE
                             .* :  *: .: .    * * :    *  .  :   *  . .  .

Function   : writes multiple seqs. in aln (CLUSTAL W) format (takes one or more than one seq.!!)
Options    : 
     $first_sequence_name= by f=  # to put a certain seq at the first in writing
Usage      : two arguments:  $seq_hash_reference  and $output_file_name
             takes a hash which has got names keys and sequences values.
             uses Perl5 pointers(references).
Version    : 1.1

write_msf

Download write_msf .pl
Example    : &write_msf(\%hash, \$out_file_name, ["o=$seq_order"]);
             eg) $seq_order='asdf seq2 seq3 seq5';
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    PileUp

       MSF: 1205  Type: P    Check:  9937   ..

     Name: PYC1_YEAST      oo  Len: 1205  Check:  7954  Weight:  1.00
     Name: PYC2_YEAST      oo  Len: 1205  Check:  5807  Weight:  1.00
     Name: PYC_MOUSE       oo  Len: 1205  Check:  6176  Weight:  1.00

    //

    PYC1_YEAST      MSQ.RKFAGL RDNFNLLGEK N......... .......... .KILVANRGE
    PYC2_YEAST      MSSSKKLAGL RDNFSLLGEK N......... .......... .KILVANRGE
    PYC_MOUSE       ...MLKFQTV RGGLRLLGVR RSSSAPVASP NVRRLEYKPI KKVMVANRGE

    PYC1_YEAST      IPIRIFRTAH ELSMQTVAIY SHEDRLSTHK QKADEAYVIG EVGQYTPVGA
    PYC2_YEAST      IPIRIFRSAH ELSMRTIAIY SHEDRLSMHR LKADEAYVIG EEGQYTPVGA
    PYC_MOUSE       IAIRVFRACT ELGIRTVAVY SEQDTGQMHR QKADEAYLIG R..GLAPVQA

    PYC1_YEAST      YLAIDEIISI AQKHQVDFIH PGYGFLSENS EFADKVVKAG ITWIGPPAEV
    PYC2_YEAST      YLAMDEIIEI AKKHKVDFIH PGYGFLSENS EFADKVVKAG ITWIGPPAEV
    PYC_MOUSE       YLHIPDIIKV AKENGVDAVH PGYGFLSERA DFAQACQDAG VRFIGPSPEV

Function   : writes multiple seqs. in msf format (takes one or more than one seq.!!)
Keywords   : write_msf_files, save_msf_files
Usage      : two arguments:  $seq_hash_reference  and $output_file_name
            takes a hash which has got names keys and sequences values.
            uses Perl5 pointers(references).
Version    : 2.2

get_seqblock

Download get_seqblock .pl
Example    : @blocks_in_hash=@{&get_seqblock(\%msf, 30)};
Keywords   : find_sequence_block, get_sequence_block,
             make_seq_block, make_seqblock, find_seqblock
Options    : _  for debugging.
             #  for debugging.
             m=  for margin length of the seqblock
             t=  for threshold
             l=  for min seqlet length

Version    : 1.3

add_columns

Download add_columns .pl
Keywords   : add_seq_columns, add_sequence_columns,
Options    : _  for debugging.
             #  for debugging.
Version    : 1.2
Warning    : if the attached name is too long (over 12 chars),
             it changes to 'Added_upX', where X is a number.

get_high_score_blocks

Download get_high_score_blocks .pl
Argument   : accepts one single ref. of hash
Example    : %block_start_end=%{&get_high_score_blocks(\%input_numb_block)};
             %out=%{&get_high_score_blocks(\%inp_numbs, 'v', 'b')};
Function   : gets hash of key and number string and filters out the
              number string region which is below certain threshold
              determined inside this sub and returns a selected high
              number regions
Keywords   : high_scoring_regions
             get_high_scoring_blocks, find_blocks, get_blocks
Options    : _  for debugging.
             #  for debugging.
             b  for best_block_opt, returns best block only
             v  for showing the final range hash output
             c  for connect close blocks
             c= for connect close blocks with specific closing gap size
             m=  for margin length of the seqblock
             t=  for threshold
             l=  for min seqlet length

Usage      : get_high_score_blocks()
Version    : 1.4
Warning    : This assumes that the inputs are multiply aligned seq
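
A rough sketch of the filtering idea described in Function, assuming the number string holds one digit per alignment position. The threshold is passed in here, whereas the source says it is determined inside the sub; high_score_ranges and its arguments are hypothetical, and the m=, c and b options are not modelled.

```perl
use strict;
use warnings;

# Hypothetical sketch: return 1-based 'start-end' ranges where every digit
# is >= $threshold and the run is at least $min_len positions long.
sub high_score_ranges {
    my ($digits, $threshold, $min_len) = @_;
    my @score = split //, $digits;
    my (@ranges, $start);
    for my $i (0 .. $#score + 1) {       # one step past the end closes an open run
        my $high = $i <= $#score && $score[$i] >= $threshold;
        if ($high) {
            $start = $i unless defined $start;
        }
        elsif (defined $start) {
            push @ranges, ($start + 1) . '-' . $i if $i - $start >= $min_len;
            undef $start;
        }
    }
    return \@ranges;
}

my %input = (seq1 => '0012345543210099');   # toy number string
for my $name (sort keys %input) {
    print "$name: @{ high_score_ranges($input{$name}, 3, 2) }\n";   # seq1: 5-10 15-16
}
```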

delbut

Download delbut .pl
Options    : _  for debugging.
             #  for debugging.
Usage      : delbut *.zip  (delete files except xxxx.zip)
Version    : 1.2

get_msp_range

Download get_msp_range .pl
Keywords   : get_msp_file_ranges
Options    : _  for debugging.
             #  for debugging.
Usage      : @range=@{&get_msp_range($seqlet)};
             @temp=&get_msp_range($seqlet);

Version    : 1.5

get_msp_enquiry_sequence

Download get_msp_enquiry_sequence .pl
Function   : gets the name of sequence used as enquiry(target)
Keywords   : get_msp_target_sequence, get_msp_enquiry_sequence_name
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

get_msp_matched_sequence

Download get_msp_matched_sequence .pl
Function   : gets the name of the matched sequence
Keywords   : get_msp_matched_sequence_name
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

get_linked_sequence

Download get_linked_sequence .pl
Example    : seq1 ------------------------------
                            |||||||||||
             seq2        --------------------------------
             OUT  000000000011111111111000000000000000000

Function   : opens msp file and links the sequences according
             to the matches.
Keywords   : link_sequence_from_msp_file, linked_sequenced_length
             get_clustered_sequence_length, get_annexed_sequence_length
             connect_sequences, merge_sequences, combine_sequences
Options    : _  for debugging.
             #  for debugging.
Returns    : A ref. of an array
Version    : 1.0

get_averaged_prediction

Download get_averaged_prediction .pl
Author     : jong@salt2.med.harvard.edu sat@mrc-lmb.cam.ac.uk
Function   : The content of the output %averaged is
               $averaged{$position}=[$residue1, $sec_str2, $dif_reliability];
Keywords   : get_average_predator_prediction, average_predator_prediction
             get_averaged_sec_prediction, get_average_prediction
Options    : 
   $reverse_order_of_one_hash=r by r
   $give_weight_with_good_match=w by w # this is to give preference to well
                                        matching sec. str. I add '0.1'
   $weight_factor= by w=
Usage      : %av_for_back_pred=%{&get_averaged_prediction(\%sec1, \%sec_rv)};
Version    : 1.2

get_average_sequence_size

Download get_average_sequence_size .pl
Keywords   : get_av_sequence_size, get_average_seq_size
             get_av_seq_size, average_seq_size, av_seq_size
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

get_linux_kernel_version

Download get_linux_kernel_version .pl
Keywords   : get_kernel_version, kernel_version,
Options    : _  for debugging.
             #  for debugging.
Version    : 1.1

load_mount_info

Download load_mount_info .pl
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

plot_vertically

Download plot_vertically .pl
Function   : This is a sub used for plot_domains.pl for
             genome_analysis
Options    : _  for debugging.
             #  for debugging.
Usage      : &plot_vertically(\@query);
Version    : 1.1

plot_histogram_horizontally

Download plot_histogram_horizontally .pl
Input      : $input= '00001111111113333333333444444444111111111111111';

Keywords   : plot_horizontally, plot_numbers_horizontally, plot,
             plot_numbers,
Options    : _  for debugging.
             #  for debugging.
Output     : 
   00001111111113333333333444444444111111111111111
   1-------------------------------------------47
  |
  |
  |                       *********
  |             *******************
  |             *******************
  |    *******************************************
  |-----------------------------------------------

Usage      : &plot_histogram_horizontally(\@query);
Version    : 1.2
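
The bar rows of the Output above can be re-created with a few lines of Perl, treating each digit of the input as a column height. This is a sketch under that assumption; plot_digits is a hypothetical name, and the echoed input line, the "1---47" ruler and the empty top rows of the original output are left out.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the plot shown above; not the B.pl source.
sub plot_digits {
    my ($digits) = @_;
    my @height = split //, $digits;
    my ($max) = sort { $b <=> $a } @height;
    $max = 0 unless defined $max;
    my @rows;
    for my $level (reverse 1 .. $max) {          # draw the tallest level first
        my $row = join '', map { $_ >= $level ? '*' : ' ' } @height;
        $row =~ s/\s+$//;                        # drop trailing blanks
        push @rows, "|$row";
    }
    push @rows, '|' . ('-' x length($digits));   # base line
    return join("\n", @rows) . "\n";
}

print plot_digits('00001111111113333333333444444444111111111111111');
```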

condense_number_string

Download condense_number_string .pl
Example    : @output=@{&condense_number_string(\@input, $factor)};
             with @input=qw(1 2 4 10 10 22 2 3 44 2 3); and $factor=3
Function   : condenses the numbers by making an average with
             given factor. If the factor is 2 on number seq
              1334284425 , result will be 23543
              133428442  ,                23541 <-- preserved end
             Factor 3 =>
              133428442  , (1+3+3)/3 = 2
                           (4+2+8)/3 = 4,,,
Keywords   : compact_number_string, compact_digits, condense
             condense_string
Options    : _  for debugging.
             #  for debugging.
Version    : 1.1
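
The averaging described in Function can be sketched as follows. Two details are assumptions chosen because they reproduce both factor-2 results quoted above: averages are truncated to integers, and a short trailing chunk is still divided by the full factor (so the lone final 2 becomes 1). condense_numbers is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: average consecutive chunks of $factor numbers.
sub condense_numbers {
    my ($numbers, $factor) = @_;
    my @in = @$numbers;                  # work on a copy
    my @out;
    while (my @chunk = splice @in, 0, $factor) {
        my $sum = 0;
        $sum += $_ for @chunk;
        push @out, int($sum / $factor);  # truncate, as in (1+3+3)/3 = 2
    }
    return \@out;
}

print join('', @{ condense_numbers([split //, '1334284425'], 2) }), "\n";   # 23543
print join('', @{ condense_numbers([split //, '133428442'],  2) }), "\n";   # 23541
```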

get_seq_fragments

Download get_seq_fragments .pl
Example    : 
  %test=('seq1', '1234AAAAAAAAAAAaaaaa', 'seq2', '1234BBBBBBB');
  @range = ('1-4', '5-8');

  %out = %{&get_seq_fragments(\%test, \@range)};
  %out => (seq1_5-8   AAAA
           seq2_5-8   BBBB
           seq1_1-4    1234
           seq2_1-4    1234 )

Function   : gets sequence(string) segments with defined
             ranges.
Keywords   : get_sequence_fragments,
Options    : _  for debugging.
             #  for debugging.
             l=  for min seqlet length
             r  for adding ranges in the seq names

Usage      : @seq_frag=&get_seq_fragments(\%msf, @RANGE);
Version    : 1.8
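
A minimal sketch of range-based fragment extraction with substr, assuming 1-based inclusive ranges, so that '5-8' yields four residues. seq_fragments is a hypothetical stand-in that always appends the range to the name; the l= and r options are not modelled.

```perl
use strict;
use warnings;

# Hypothetical sketch: cut every 'start-end' range out of every sequence.
sub seq_fragments {
    my ($seqs, $ranges) = @_;
    my %out;
    for my $name (keys %$seqs) {
        for my $range (@$ranges) {
            my ($start, $end) = $range =~ /^(\d+)-(\d+)$/ or next;
            next if $start > length($seqs->{$name});   # range starts past the end
            $out{"${name}_$range"} =
                substr($seqs->{$name}, $start - 1, $end - $start + 1);
        }
    }
    return \%out;
}

my %test  = ('seq1', '1234AAAAAAAAAAAaaaaa', 'seq2', '1234BBBBBBB');
my @range = ('1-4', '5-8');
my %out   = %{ seq_fragments(\%test, \@range) };
print "$_   $out{$_}\n" for sort keys %out;
```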

make_standalone_subroutines

Download make_standalone_subroutines .pl
Author     : jong@salt2.med.harvard.edu
Class      : Utility
Example    : &make_standalone_subroutines(@ARGV);
Function   : Creates each subroutine derived xxx.pl file from B.pl or any
             given library file. If there is a file for a sub already, it
             skips.
Usage      : &make_standalone_subroutines(@ARGV);
Version    : 1.1

is_html

Download is_html .pl
Example    : $html=&is_html(\@test);
Function   : Checks if it is an html file.
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0

get_column

Download get_column .pl
Argument   : Ref of Hash, Array or just filename, and wanted column numbers.
Example    : For getting only necessary columns
             Input: %Hash=(1, 'col1 col2 col3',
                           2, 'col1 col2 col3',
                           3, 'col1 col2 col3');
             Input format: &get_column(\%Hash, 3,2,1, 'k'); # k is opt
             Output format: STDOUT as

                1     col3 col2 col1
                2     col3 col2 col1
                3     col3 col2 col1

Function   : Prints any specified columns, can change their order, and
             can filter column values by a maximum or minimum value.
             Skips blank lines.
Keywords   : columns, column.pl, column, get_columns, take_columns,
Options    : #  for debugging.
             _  for debugging.
             k  for Key print when hash input is given.
             n  for no first line display (handy when you have a title line
                                           and want to remove it)
             ?max?=xxx for filtering column numbers by maximum of xxx
             ?min?=yyy for filtering column numbers by minimum of yyy
                      (eg, min4=100000 means 4th column minimum is 100000)
                      (eg, 1min4=10, 2min3=10, means get 4th column values
                           below 10 as the first output column. Get 3rd
                           column values below 10 as the second out column.)

$combine = 1 by -c c   # c is for combining columns in different files
$ignore  = 1 by -i i   # i is for ignoring leng diff in columns over 1 input

Returns    : Ref of
Usage      : &get_column(\@ar, 1,2 ,3);
             &get_column(\%ha, 1,2 ,3);
             &get_column(@ARGV);
             # where prompt is like: column.pl temp.txt 1 2 3 4
Version    : 1.5

set_debug_option

Download set_debug_option .pl
Example    : set_debug_option #    <-- at prompt.
Function   : If you put '#' or  '##' at the prompt of any program which uses
             this sub you will get verbose printouts for the program if the program
             has a lot of comments.
Options    : #   for 1st level of verbose printouts
             ##  for even more verbose printouts
$debug  becomes 1 by '#'  or '_'
$debug2 becomes 1 by '##'  or '__'

Returns    : $debug
Usage      : &set_debug_option;
Version    : 1.8

write_sdb_file

Download write_sdb_file .pl
Argument   : \%ref_of_seq
Example    : @out=@{&write_sdb_file(\%seq, 'v')};  ## for STDOUT as well
    ___________________________________________________________________________
    Title      : EST_YEAST.sdb
    Full Name  : Telomerase_yeast_699aa
    Nicknames  :
    EMBL       :
    PDB        :
    Swissprot  :

Function   : gets a hash ref. and writes the SDB file with 'sprintf'
Keywords   : write_sdb
Options    : v  for verbose representation. This will print boxes on STDOUT
             n  for no '#' leader.
             e  for Endline( '-----------------------------..' )
Usage      : @out=@{&write_sdb_file(\%seq)};
Version    : 1.0
Warning    : if version no. is null, it automatically puts '1.0'

push_if_not_already

Download push_if_not_already .pl
Argument   : two references. The first should be an array ref. The 2nd can be either
             scalar or array reference.
Function   : returns a ref. of an array holding a list of non-repetitive entries.
Keywords   : add_if_not_already, add_element_if_not_already, if_not_already
             add_element_if_not_already, push_element_if_not_already,
             if_no_already_push, put_element_if_not_already, add_new_element
             add_new_items_only, push_new_items_only, push_new_elements_only
             put_if_not_already,
Returns    : a ref. of an array.
Usage      : @out=@{&push_if_not_already(\@mother_array, \@adding_array )};
             @out=@{&push_if_not_already(\@mother_array, $adding_scalar)};
Version    : 1.3
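
A minimal sketch of the idea, assuming the mother array is left untouched and a new array ref is returned (the headbox does not say whether the original modifies it in place). The name sketch_push_if_not_already is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Appends only the items that @$mother does not hold yet.
# The second argument may be a plain scalar or an array reference.
sub sketch_push_if_not_already {
    my ($mother, $adding) = @_;
    my @new  = ref($adding) eq 'ARRAY' ? @$adding : ($adding);
    my %seen = map { $_ => 1 } @$mother;
    my @out  = @$mother;                 # copy, so the caller's array is unchanged
    for my $item (@new) {
        push @out, $item unless $seen{$item}++;
    }
    return \@out;
}

my @mother = ('a', 'b');
my @merged = @{ sketch_push_if_not_already(\@mother, ['b', 'c', 'c']) };
print "@merged\n";
```

A hash of seen items keeps the check fast for long lists; entries that were already repeated inside the mother array are kept as they are in this sketch.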

compare_sec_template_with_db

Download compare_sec_template_with_db .pl
Keywords   : sec structure mapping, map sec str, map_sec_structure
Version    : 1.0

get_peptide_occurance

Download get_peptide_occurance .pl
Argument   : eg=> (\%ref_hash, 4)
Example    : %stat=%{&get_peptide_occurance(\%pro_sequence, $size)};
              where %pro_sequence has one or more sequences like
              seq1 AAAAAAAAAAAA, seq2 BBBBBBBBBBBBBB, ...
              $size is a number: dipeptide=2, tripeptide=3, tetrapeptide=4...
Function   : gets the number of occurrences of each peptide (of the given
             size) in any number of given sequences.
Version    : 1.2
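
The counting can be sketched like this, assuming overlapping windows are counted and the counts of all sequences are pooled into one hash. Both are assumptions, and the name sketch_peptide_occurrence is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Counts every overlapping peptide of length $size over all sequences.
sub sketch_peptide_occurrence {
    my ($seqs, $size) = @_;
    my %count;
    for my $seq (values %$seqs) {
        # The range is empty when the sequence is shorter than $size.
        for my $i (0 .. length($seq) - $size) {
            $count{ substr($seq, $i, $size) }++;
        }
    }
    return \%count;
}

my %pro_sequence = (seq1 => 'AAAA', seq2 => 'ABAB');
my %stat = %{ sketch_peptide_occurrence(\%pro_sequence, 2) };
print "$_ $stat{$_}\n" for sort keys %stat;
```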

open_lottery_file

Download open_lottery_file .pl
Version    : 1.1

get_probable_half

Download get_probable_half .pl
Argument   : \@array
Function   : Produces a hash ref. that is supposed to hold the most probable
             value according to the given array. It divides the array into
             halves and keeps the more probable half until a single number is left.
Keywords   : get_frequent_halves,
Version    : 1.0

divide_array

Download divide_array .pl
Function   : divides any array by the given denominator.
             If you give an array of 100 elem with 5, you will
             get 5 arrays with 20 elem each.
Keywords   : split_array_into_pieces, split_array, chop_array,
             fragment_array,
Options    : s=  for dividing the array with sub array size
                 eg) to get 20 elem length sub arrays from
                     a big array
                     @ar_ref=@{&divide_array(\@array, 's=20')};
Usage      : &show_array(&divide_array(\@input, 6));
Version    : 1.4
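
Both calling styles (a piece count, or 's=N' for a piece size) can be sketched as below. The name sketch_divide_array is hypothetical, and the handling of lengths that do not divide evenly (the last piece is simply shorter) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# $arg is either a piece count (e.g. 5) or 's=N' for a piece size.
sub sketch_divide_array {
    my ($array, $arg) = @_;
    my $size;
    if ($arg =~ /^s=(\d+)$/) {
        $size = $1;                                       # explicit piece size
    } else {
        my $count = $arg > 0 ? $arg : 1;
        $size = int( (@$array + $count - 1) / $count );   # ceiling of length / count
    }
    $size = 1 if $size < 1;
    my @rest = @$array;                                   # copy; splice is destructive
    my @pieces;
    push @pieces, [ splice(@rest, 0, $size) ] while @rest;
    return \@pieces;
}

my @pieces = @{ sketch_divide_array([1 .. 100], 5) };
printf "%d pieces, first has %d elements\n", scalar @pieces, scalar @{ $pieces[0] };
```

With a count that does not divide the length evenly, this sketch can return fewer pieces than requested (9 elements with 4 gives three pieces of 3).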

split_fasta_files

Download split_fasta_files .pl
Keywords   : divide_fasta_files, split_fasta_db_files, divide_fasta_db_files
             make_single_fasta_files, write_single_fasta, write_single_fasta_files
Usage      : @names_of_single_files=@{&split_fasta_files(\@files)};
Version    : 1.0

split_sequence

Download split_sequence .pl
Example    : &show_array( &divide_string(\%input, 3) );
              where  %input is ('seq', '12345789ABCDEFHIJKLMN')
              The output will be 'seq_1_half', '1234578'
                                 'seq_2_half', '9ABCDEF'
                                 'seq_3_half', 'HIJKLMN'
Function   : divides any string by the given denominator.
Keywords   : divide_string, split_string, chop_string, divide_sequence
             split_sequence(look at separate split_sequence sub),
Options    : 
  $reverse_second_half=S by S -S
  $reverse_first_half =F by F -F
  $reverse_rest       =R by R -R  ## reversing all except the first
  $reverse_all        =A by A -A  # reverse all the fragments
Usage      : %out=%{&split_sequence(\%input, 2 )};
Version    : 1.3

divide_string

Download divide_string .pl
Example    : &show_array( &divide_string(\$input, 3) );
              where  $input is '12345789ABCDEFHIJKLMN'
              The output will be '1234578 9ABCDEF HIJKLMN'
Function   : divides any string by the given denominator.
Keywords   : divide_string, split_string, chop_string, divide_sequence
             split_sequence(look at separate split_sequence sub),
Usage      : &show_array(&divide_string(\$input, 6));
Version    : 1.4
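
A minimal sketch of the division, using the headbox example as input. The name sketch_divide_string is hypothetical, and rounding the piece size up when the length is not a multiple of the denominator is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Splits the referenced string into $parts pieces of (nearly) equal length.
sub sketch_divide_string {
    my ($string_ref, $parts) = @_;
    $parts = 1 if $parts < 1;
    my $size = int( (length($$string_ref) + $parts - 1) / $parts );   # ceiling
    $size = 1 if $size < 1;
    my @pieces = $$string_ref =~ /(.{1,$size})/gs;
    return \@pieces;
}

my $input = '12345789ABCDEFHIJKLMN';          # 21 characters, as in the Example
print join(' ', @{ sketch_divide_string(\$input, 3) }), "\n";
```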

write_html_headbox

Download write_html_headbox .pl
Function   : writes an HTML-format headbox explanation from the
              given hash of headbox content.
Keywords   : write_headbox_html, write headbox in html,
               write_headbox_files
Options    : 'd' for date inclusion at the top of the page
             f=  for default ftp dir name

Usage      : &write_html_headbox($outfilename, \%entries);
Version    : 1.7
Warning    : It takes off the last '/' when $URL has it


open_sdb_files

Download open_sdb_files .pl
Argument   : One or none. If you give an argument, it should be a ref. of an
              ARRAY, a filename, or a ref. of a filename.
             If no arg is given, it reads SELF, ie. the program itself.
Example    : Output is something like
             ('Title', 'read_head_box', 'Tips', 'Use to parse doc', ...)
Keywords   : read_sdb_files,read_sdb,
Options    : 'b' for remove blank lines. This will remove all the entries
             with no descriptions
Returns    : A hash ref.
Usage      : %entries = %{&open_sdb_files(\$file_to_read )};
Version    : 1.1

open_stride_dat_files

Download open_stride_dat_files .pl
Author     : jong@salt2.med.harvard.edu
Usage      : @out=@{&open_stride_dat_files(@ARGV)};
Version    : 1.2

get_pdb_file_start_number

Download get_pdb_file_start_number .pl
Keywords   : start_number_of_pdb, startnumber, start number of PDB,
             get_start_number_of_pdb_file,
Version    : 1.0

write_modeller_top_file

Download write_modeller_top_file .pl
Argument   : 1 hash ref which has model name and template name -> (\%hash)
             where %hash is (modelname, templatename)
Example    : 
     $modelname = 'gfct';
     $template = '1ovt';
     %hash=($modelname, $template);
     &write_modeller_top_file(\%hash);
Function   : Writes Modeller command file format.
Options    : v  for verbose. You will get STDOUT of the result as well as file
Returns    : a file of xxxx.top form.
Usage      : &write_modeller_top_file(\%hash, [v]);
Version    : 1.0

write_modeller_ali_file

Download write_modeller_ali_file .pl
Argument   : 2 ref. of hash for seq. and optional output.name and option(s).
             If second input hash (for template) has 3rd and 4th element which are
             numbers they are regarded as the starting and ending number of the
             template(i.e. pdb file seq)
Example    : 
             $out = 'test.ali';
             %model =    qw(model AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAccccccccccc);
             %template = qw(templ CCAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCC 3 42);
             &write_modeller_ali_file(\%model, \%template, \$out);
Function   : Writes Modeller alignment format.
Options    : You can put 2 numbers for the second set of key and element for
             the second hash input as the starting and ending points of
             template(i.e. pdb file seq). Otherwise the size of the seq. is
             calculated.
             By default, it reads PDB file defined by ENV setting of 'PDB' and
             gets the starting number of pdb. If starting number is defined
             explicitly at input hash, the given starting number is used instead
             of PDB's.
             v  for verbose. You will get STDOUT of the result as well as file
Returns    : a file of xxxx.ali form.
Usage      : &write_modeller_ali_file(\%model, \%template, [\$outfilename], [v]);
Version    : 1.0

make_template_from_sec_str

Download make_template_from_sec_str .pl
Function   : makes template of sec. str. like: 'H5 E4 E2' out of '__HHHHH__EEEE__EE__'
Usage      : %target   = %{&make_template_from_sec_str(\%seq)};
Version    : 1.1
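
The run-length idea behind the template can be sketched as below, taking a single string rather than a hash ref for brevity. The name sketch_make_template is hypothetical, and treating every non-letter as a gap is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Turns each run of identical letters into "<letter><run length>".
sub sketch_make_template {
    my ($sec_str) = @_;
    my @segments;
    while ($sec_str =~ /(([A-Za-z])\2*)/g) {
        push @segments, $2 . length($1);
    }
    return join ' ', @segments;
}

print sketch_make_template('__HHHHH__EEEE__EE__'), "\n";   # prints "H5 E4 E2"
```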

calculate_protein_volume

Download calculate_protein_volume .pl
Usage      : %volumes=%{&calculate_protein_volume(\%seq)}
Version    : 1.0

extract_words

Download extract_words .pl
Usage      : @words = @{&extract_words(\$string)};
Version    : 1.1

replace_subroutines

Download replace_subroutines .pl
Function   : replaces subroutines of given file(s) with supplied subs.
             It does not check versions.
Version    : 1.0

write_subroutines

Download write_subroutines .pl
Author     : jong@salt2.med.harvard.edu
Function   : Writes subroutine file xxxx.psub with given headbox including
              hash
Usage      : @out_file=@{&write_subroutines(\%head_box)};
Version    : 1.0

read_subroutines

Download read_subroutines .pl
Function   : returns ALL subroutines with the keys as subroutine names
             with version like ('show_array2.2' => 'subroutine in one string')
             It reports the subroutines not found in searched file(s)
Options    : 'nv' for no version attachment in the keys of returning hash of subroutines
             'r'  for getting remnant file content rather than the sub routines
             't'  for leaving the original file without the sub routines taken.
     $separate_hash_entry_opt=s by s
Usage      : @out_subs=@{&read_subroutines(\@file, $separate_hash_entry_opt)}; or
             %out_subs=%{&read_subroutines(\@file)};
Version    : 1.2

fetch_subroutines

Download fetch_subroutines .pl
Function   : returns subroutines with the keys as subroutine names with version
             like in the form( 'show_array2.2' => 'subroutine in one string')
             It reports the subroutines not found in searched file(s). This
             requires the names of sub you want while read_subroutines will
             read any subroutines with their headbox to a hash.
Options    : 'nv' for no version attachment in the keys of returning hash of subroutines
             'r'  for getting remnant file content rather than the sub routines
             't'  for leaving the original file without the sub routines taken.
             'h'  for headbox only output.
Version    : 2.5

update_subroutines

Download update_subroutines .pl
Example    : &update_subroutines($file, \%fetched_subs);
Function   : replaces subroutines of given file(s) with supplied subs.
             If the given subroutine versions are not higher than the
             ones in the program, no upgrade would happen.
             This can read version information from '# Version  : 1.0' line
              or sub xxxxx{  # Version : 1.0   line
Keywords   : upgrade_subroutines,
Usage      : &update_subroutines(\@file, \%fetched_subs);
Version    : 2.8

takeout_subroutines

Download takeout_subroutines .pl
Function   : returns subroutines with the keys as subroutine names with version
             like in the form( 'show_array2.2' => 'subroutine in one string')
             It reports the subroutines not found in searched file(s)
             fetch_subroutines  also has this feature.
Keywords   : take_out_subroutines, take_subroutines, cut_subroutines,
             cutout_subroutines, remove_subroutines
Options    : 'nv' for no version attachment in the keys of returning hash of subroutines
             'r'  for getting remnant file content rather than the sub routines
Version    : 1.5
Warning    : If there is no headbox and version no. It thinks the version
             is 1.0

get_subroutine_calls

Download get_subroutine_calls .pl
Function   : gets all the subroutine calls( like &show_hash ) in the given
             file name or array of lines which is the content of a file,
             text etc. If there is no input arg, it reads the running
             program as default input
Keywords   : get_sub_names,get_subroutine_names, get_sub_calls,
             get_subroutine_calls, find_sub_calls, find_subroutine_calls
Usage      : @sub_name_array= @{&get_subroutine_calls(\@AR)};
Version    : 2.2

set_special_options

Download set_special_options .pl
Argument   : Nothing in a program.
Example    : &set_special_options.pl  ##    <-- at prompt.
Function   : If you put special chars like '#' or  '##', '###..' at the
             prompt of any program which uses
             this sub you will get verbose printouts for the program if
             the program has a lot of comments.
Options    : #   for 1st level of debugging printouts
             ##  for even more debugging printouts
             +   for more outputs(more calculations are shown, like statistics)
             ++  even more outputs.
$DEBUG    becomes 1 by '#'
$DEBUG2   becomes 1 by '##'
$VERBOSE  becomes 1 by '+'
$VERBOSE2 becomes 1 by '++'

Returns    : $debug, $verbose
Usage      : &set_special_options;
Version    : 1.0
             generalized debug var is added for more verbose printouts.

set_debug

Download set_debug .pl
Example    : set_debug #    <-- at prompt.
Function   : If you put '#' or  '##' at the prompt of any program which uses
             this sub you will get verbose printouts for the program if the program
             has a lot of comments.
Options    : #   for 1st level of verbose printouts
             ##  for even more verbose printouts
$debug  becomes 1 by '#'  or '_'
$debug2 becomes 1 by '##'  or '__'

Returns    : $debug
Usage      : &set_debug;
Version    : 1.8
             generalized debug var is added for more verbose printouts.

open_self

Download open_self .pl
Keywords   : read self, read_self, open self, open itself
Returns    : one array
Usage      : @lines =  &open_self;
Version    : 1.0

tell_seq_length

Download tell_seq_length .pl
Function   : tells the sequence sizes of given sequences
Usage      : %hash_out = %{&tell_seq_length(\%hash_in)};
Version    : 1.0

do_window_scan

Download do_window_scan .pl
Function   : This is the core part of any window (of sequences)
             scanning function.
Keywords   : scan_sequence, scan_window
Usage      : @out_array = @{&do_window_scan(\@input_array, $win_size)};
             Often, bioters(Bio Computer Scientists) need to scan long sequences
             of DNA or Protein like(ABADFAFASDFASFASDFDFA or 109384717817947) to
             calculate something out of them.
             This routine provides such a scanning
             function in perl.
Version    : 1.3

scan_window_and_calc_something

Download scan_window_and_calc_something .pl
Function   : scans any given length window of sequence and computes something.
Options    : average for getting average of given window size.
             sum for getting sum of given window size.
Version    : 1
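
A minimal sketch of a window scan with the 'sum' and 'average' options, assuming the window slides one position at a time over an array of numbers. The name sketch_window_scan and its argument order are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Slides a window of $win_size over @$values, one position at a time,
# and returns the sum or the average of each window.
sub sketch_window_scan {
    my ($values, $win_size, $mode) = @_;
    my @out;
    # The range is empty when the window is longer than the input.
    for my $start (0 .. @$values - $win_size) {
        my $sum = 0;
        $sum += $_ for @{$values}[ $start .. $start + $win_size - 1 ];
        push @out, $mode eq 'average' ? $sum / $win_size : $sum;
    }
    return \@out;
}

my @digits = split //, '109384717817947';
print join(' ', @{ sketch_window_scan(\@digits, 3, 'sum') }), "\n";
```

Recomputing each window from scratch keeps the sketch short; for very long sequences a running sum would avoid the repeated work.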

scan_window_and_calc_average

Download scan_window_and_calc_average .pl
Usage      : %out_hash_final = %{scan_window_and_calc_average(\@hash, \$win_size)};
Version    : 1.0

read_blast_hits

Download read_blast_hits .pl
Example    : 
      - - - - -  Example of blastp file  - - - - - - - - - - - - - - - - - - - - - - - - -
      BLASTP 1.4.8 [19-Dec-94] [Build 16:06:14 Jul 26 1995]
      Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
      and David J. Lipman (1990).  Basic local alignment search tool.  J. Mol. Biol.
      215:403-10.
      Query=  1mbs
      (153 letters)
      Database:  /nfs/ind4/ccpe1/people/A Biomatic /jpo/align/all_in_fasta.fas
      406 sequences; 77,134 total letters.
      Searching..................................................done
      WARNING:  -hspmax 100 was exceeded with 13 of the database sequences, with as
      many as 173 HSPs being found at one time.
                                                                           Smallest
                                                                              Sum
                                                                    High  Probability
      Sequences producing High-scoring Segment Pairs:              Score  P(N)      N
      1mbs                                                          804  2.0e-109  1
      1pmb                                                          718  1.4e-97   1
      1ymb                                                          707  4.7e-96   1
      2xxx                                                           31  0.55      1
Function   : This reads the output of the blastp program (xxxx.bla or whatever file
             extension you attached) and produces the names of found sequences which
             are above (smaller in probability than) a certain threshold in the blast
             result. For example, it will produce a reference of an array (@hits, in
             the code) which contains (1mbs, 1pmb, 1ymb) from the example in this
             header box with the given (you give!) threshold of, say, 0.0001.
Keywords   : bla2fasta, take_blast_hits
Usage      : @array_of_names = @{&read_blast_hits(\$file_name, \$threshold)};
Version    : 1.1
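
The threshold filtering can be sketched as follows. This sketch works on an array of lines instead of a file name to stay self-contained, assumes hit lines have the plain "name score P(N) N" layout of the example above (no description text), and keeps hits whose P(N) is at or below the threshold. The name sketch_read_blast_hits is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Collects hit names from the "Sequences producing ..." table whose
# P(N) value is at or below $threshold.
sub sketch_read_blast_hits {
    my ($lines, $threshold) = @_;
    my (@hits, $in_table);
    for my $line (@$lines) {
        if ($line =~ /Sequences producing High-scoring Segment Pairs/) {
            $in_table = 1;
            next;
        }
        next unless $in_table;
        # name, integer score, P(N) (plain or exponent notation), N
        if ($line =~ /^\s*(\S+)\s+(\d+)\s+(\d[\d.]*(?:e[-+]?\d+)?)\s+\d+\s*$/i) {
            push @hits, $1 if $3 <= $threshold;
        }
    }
    return \@hits;
}

my @blast = (
    'Sequences producing High-scoring Segment Pairs:              Score  P(N)      N',
    '1mbs                                                          804  2.0e-109  1',
    '1pmb                                                          718  1.4e-97   1',
    '1ymb                                                          707  4.7e-96   1',
    '2xxx                                                           31  0.55      1',
);
print join(' ', @{ sketch_read_blast_hits(\@blast, 0.0001) }), "\n";
```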

put_gaps_every_x_position_in_string

Download put_gaps_every_x_position_in_string.pl
Argument   : 3 arg. One is the string, second is the interval number, third is
             the gap separator
Example    : "1234567890123456789012345678901234567890"  will be
             "1234567890 1234567890 1234567890 1234567890"
             with
                &put_gaps_every_x_position_in_string(\$test, 10, ' ')
Keywords   : put_space_in_sequence, put_gaps_in_sequence, put_gaps,
             put_space
Returns    : 
             every char.
Version    : 1.1
Warning    : it does not return a reference
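
A minimal sketch of the gap insertion, returning a plain string as the warning says. The name sketch_put_gaps is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Inserts $gap after every $interval characters and returns a plain string.
sub sketch_put_gaps {
    my ($string_ref, $interval, $gap) = @_;
    return $$string_ref if $interval < 1;     # nothing sensible to do
    return join $gap, $$string_ref =~ /(.{1,$interval})/gs;
}

my $test = '1234567890' x 4;
print sketch_put_gaps(\$test, 10, ' '), "\n";
```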

transform_values

Download transform_values .pl
Argument   : hash(es) and Matrix or table for conversion.
Example    : 
             IN =>  to transform E and H to 9 and 4

             1cdg_6taa      -------EEE-----------HH--HHHH------EE---------EEE-
             1cdg_2aaa      -------EEE-----------HH--HHHH------EE---------EEE-
             2aaa_6taa      -------EEEEE------EE-HHHHHHHH----EEEE-------EEEEE-

             OUT
             1cdg_6taa      -------999-----------44--4444------99---------999-
             1cdg_2aaa      -------999-----------44--4444------99---------999-
             2aaa_6taa      -------99999------99-44444444----9999-------99999-

Function   : transforms any value to another value with a given table, matrix..
             This is used to transform Amino Acids to their various propensities.
             If you feed a sequence 'ACDEDA', this transforms it to
             '124741' if the table given is 'A->1, C->2, D->4, E->7'
Returns    : hash(es)
             Sheraga_alpha_matrix
             Richardson_alpha_matrix  or any conversion table made in a hash.

Usage      : Used in predict_secondary_structure
Version    : 1.0
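
The table lookup can be sketched as below, assuming characters without a table entry (such as the '-' gaps in the example) are passed through unchanged. The name sketch_transform_values is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Replaces every character that has an entry in %$table; all other
# characters (for example '-' gaps) are kept as they are.
sub sketch_transform_values {
    my ($seqs, $table) = @_;
    my %out;
    for my $name (keys %$seqs) {
        ($out{$name} = $seqs->{$name})
            =~ s/(.)/exists $table->{$1} ? $table->{$1} : $1/ge;
    }
    return \%out;
}

my %table = (A => 1, C => 2, D => 4, E => 7);
my %res   = %{ sketch_transform_values({ pep => 'ACDEDA' }, \%table) };
print "$res{pep}\n";                          # prints "124741"
```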

Sheraga_alpha_matrix

Download Sheraga_alpha_matrix .pl
Function   : an alpha matrix propensity table.
Version    : 1.0

Richardson_alpha_matrix

Download Richardson_alpha_matrix .pl
Function   : an alpha matrix propensity table.
Version    : 1.0

get_segment_shift_rate

Download get_segment_shift_rate .pl
Argument   : Two references of hashes. One for error rate the other for sec.
             assignment.
Example    :  First block is for the first hash input
                             and Second is for the second hash input.

             1cdg_6taa      00000442222222222242222222222777700000007000000000
             1cdg_2aaa      00000442222222222242222222222777700000007000000000
             2aaa_6taa      00000000000000000000000000000000000000000000000000

             1cdg_6taa      -------EEE-----------EE--EEEE------EE---------EEE-
             1cdg_2aaa      -------EEE-----------EE--EEEE------EE---------EEE-
             2aaa_6taa      -------EEEEE------EE-EEEEEEEE----EEEE-------EEEEE-

             
             2aaa_6taa      -------00000---------00000000----0000-------00000-
             1cdg_6taa      -------442---------------2222-----------------000-
             1cdg_2aaa      -------222---------------2222-----------------000-

             
             2aaa_6taa      0%
             1cdg_6taa      67%
             1cdg_2aaa      67%

Function   : calculates the secondary structure segment shift rate.
Options    : 'p' or 'P' for percentage term(default)
             'r' or 'R' for ratio term (0.0 - 1.0), where 1 means all the
              segments were wrongly aligned.
             's' or 'S' for Shift rate (it actually calculates the position shift
              rate for the secondary structure segment).
             'h' or 'H' for position Shift rate (it actually calculates the position
              shift rate for helical segments). If this is the only option, it
              will show the default percentage term rate for helical segments.
              If used with 'r', it will give you ratio (0.0 - 1.0) for helical
              segment. If used with 's' option, it will give you position shift
              rate for only helical segments.
             'e' or 'E' for position Shift rate (it actually calculates the position
              shift rate for beta segments). If this is the only option, it will
              show the default percentage term rate for beta segments. If used
              with 'r', it will give you ratio (0.0 - 1.0) for beta. If used
              with 's' option, it will give you position shift rate for only
              beta segments.
Usage      : &get_segment_shift_rate(\%hash_for_errors, \%hash_for_sec_str);
Version    : 1.1

get_wrong_segment_rate

Download get_wrong_segment_rate .pl
Example    :  hash of 3 keys and values.
             2aaa_6taa      -------00000---------00000000----0000-------00000-
             1cdg_6taa      -------442---------------2222-----------------000-
             1cdg_2aaa      -------222---------------2222-----------------000-

             In the above there are two segments wrong in 3 segment blocks = 2/3
              hash of 3 percentage rates.

             2aaa_6taa      0 %
             1cdg_6taa      66.6666666666667 %
             1cdg_2aaa      66.6666666666667 %

Function   : Treats each segment as one single big error and
             calculates the number of wrong segments relative to all segments.
Usage      : print_seq_in_block( &get_wrong_segment_rate(\%superposed_hash) );
Version    : 1.0
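
The 2/3 calculation in the example can be sketched as follows, assuming a segment is a run of digits and counts as wrong when it holds any non-zero digit. The name sketch_wrong_segment_rate is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# A segment is a run of digits; it counts as wrong if any digit is non-zero.
sub sketch_wrong_segment_rate {
    my ($seqs) = @_;
    my %rate;
    for my $name (keys %$seqs) {
        my @segments = $seqs->{$name} =~ /(\d+)/g;
        next unless @segments;                       # no segments, no rate
        my $wrong = grep { /[1-9]/ } @segments;
        $rate{$name} = 100 * $wrong / @segments;     # percentage of wrong segments
    }
    return \%rate;
}

my %superposed = (a => '--442---2222---000-', b => '--000---0000---000-');
my %rates      = %{ sketch_wrong_segment_rate(\%superposed) };
printf "%s %.1f %%\n", $_, $rates{$_} for sort keys %rates;
```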

tidy_secondary_structure_segments

Download tidy_secondary_structure_segments .pl
Argument   : hashes and [options]. No options result in default of 'H3', 'E3'
Example    : print_seq_in_block(&tidy_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');
             

             1cdg_2aaa      -------EEE-----------EE--EEEE------EE---------EEE-
             1cdg_6taa      -------EEE-----------EE--EEEE------EE---------EEE-
             2aaa_6taa      -------EEEEE------EE-EEEEEEEE----EEEE-------EEEEE-

             

             1cdg_6taa      -------------------------EEEE---------------------
             1cdg_2aaa      -------------------------EEEE---------------------
             2aaa_6taa      -------EEEEE---------EEEEEEEE----EEEE-------EEEEE-

Function   : receives any secondary structure assignment hashes and
             tidies them up. That is, it removes very short secondary structure
             regions like ( --HH--, -E-, -EE- ) according to the minimum
             segment lengths (thresholds) you give.
Options    : something like 'H3' or 'E3' for minimum segment length set to 3 positions.
Returns    : array of references of hashes.
Usage      : print_seq_in_block(&tidy_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');

Version    : 1.0.0
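
A minimal sketch of the tidying step for one hash, assuming '-' is the gap character and options look like 'e4' or 'H3' (case-insensitive). The name sketch_tidy_segments is hypothetical, it returns a single hash ref rather than an array of refs, and it needs perl 5.10 or later for the '//' operator.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the B.pl implementation.
# Blanks out every run of one letter that is shorter than the minimum
# length set for that letter (defaults: H3, E3).
sub sketch_tidy_segments {
    my ($seqs, @opts) = @_;
    my %min = (H => 3, E => 3);
    for my $opt (@opts) {
        $min{ uc($1) } = $2 if $opt =~ /^([A-Za-z])(\d+)$/;
    }
    my %out;
    for my $name (keys %$seqs) {
        ($out{$name} = $seqs->{$name}) =~
            s{(([A-Za-z])\2*)}{ length($1) < ($min{ uc($2) } // 0) ? '-' x length($1) : $1 }ge;
    }
    return \%out;
}

my %sec  = (demo => '--EEE--EE-HHHH-E-');
my %tidy = %{ sketch_tidy_segments(\%sec, 'e4', 'h4') };
print "$tidy{demo}\n";
```

Letters without a minimum (anything other than H and E unless an option names them) are left alone in this sketch.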

define_secondary_structure_segments

Download define_secondary_structure_segments .pl
Argument   : hashes and [options]. No options result in default of 'H3', 'E3'
Example    : print_seq_in_block(&define_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');
             

             1cdg_2aaa      -------EEE-----------EE--EEEE------EE---------EEE-
             1cdg_6taa      -------EEE-----------EE--EEEE------EE---------EEE-
             2aaa_6taa      -------EEEEE------EE-EEEEEEEE----EEEE-------EEEEE-

             

             1cdg_6taa      -------------------------EEEE---------------------
             1cdg_2aaa      -------------------------EEEE---------------------
             2aaa_6taa      -------EEEEE---------EEEEEEEE----EEEE-------EEEEE-

Function   : receives any secondary structure assignment hashes and
             tidies them up. That is, it removes very short secondary structure
             regions like ( --HH--, -E-, -EE- ) according to the given minimum
             lengths of segments.
Options    : something like 'H3' or 'E3' for minimum segment length set to 3 positions.
Returns    : array of references of hashes.
Usage      : print_seq_in_block(&define_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');

Version    : 1.0

overlay_seq_by_certain_chars

Download overlay_seq_by_certain_chars .pl
Argument   : 2 ref for hash of identical keys and value length.
Example    : %out =%{&overlay_seq_by_certain_chars(\%hash1, \%hash2, 'E')};
             output> with 'E' option >>> "name1     --HHH--1232-"
Function   : (name1 000000112324)+(name1  ABC..AD..EFDK ) => (name1 000..00..12324)
             (name2 000000112324)+(name2  --HHH--EEEE-- ) => (name2 ---000--1123--)
             uses the second hash as a template for the first sequences. gap_char is
             '-' or '.' or any given char or symbol.
             To insert gaps rather than overlap, use insert_gaps_in_seq_hash
Keywords   : Overlap, superpose hash, overlay, superpose_seq_hash
Options    : E for replacing all 'E' occurrences in ---EEEE--HHHH----, etc.
             H for replacing all 'H'  "     " "
Returns    : one hash ref.
Usage      : %out =%{&overlay_seq_by_certain_chars(\%hash1, \%hash2, 'HE')};
Version    : 1.0
Warning    : If gap_chr ('H',,,) is not given, it replaces all the
             non-gap chars (normal alphabet), ie,
             it becomes 'superpose_seq_hash'

rev_lines_pdb

Download rev_lines_pdb .pl
Argument   : one pdb coordinate file reference
Example    : 
             The INPUT example >

             ATOM    191  CA  ALA   195      -2.566   8.099  42.827  1.00 12.42      1ENG 256
             ATOM    192  CA  ARG   196      -1.401  11.546  41.629  1.00  8.63      1ENG 257
             ATOM    193  CA  THR   197      -4.073  13.846  43.107  1.00  9.93      1ENG 258

             The OUTPUT example >             

             ATOM      1  CA  ALA     1      -2.566   8.099  42.827  1.00 12.42      1ENG 256
             ATOM      2  CA  ARG     2      -1.401  11.546  41.629  1.00  8.63      1ENG 257
             ATOM      3  CA  THR     3      -4.073  13.846  43.107  1.00  9.93      1ENG 258

                                           <2nd file, called  xxxx2.atm >
             ATOM      1  CA  THR     1      -4.073  13.846  43.107  1.00  9.93      1ENG 258
             ATOM      2  CA  ARG     2      -1.401  11.546  41.629  1.00  8.63      1ENG 257
             ATOM      3  CA  ALA     3      -2.566   8.099  42.827  1.00 12.42      1ENG 256

Function   : reorders the lines of any pdb files, but takes only C alpha positions.
Options    : None
Returns    : directly writes two output files  xxxx1.atm  xxxx2.atm
Usage      : &rev_lines_pdb(\$ARGV[0]);
Version    : 1.0
Warning    : A Biomatic

tally_2_hashes

Download tally_2_hashes .pl
Argument   : (\%hash1, \%hash2) or optionally (\%hash1, \%hash2, ['n', 'i', 'p', 'a'])
             'n' => normalizing, 'p' => percentage out, 'i' => make int out, 'a'=> averaged
Example    : you put two hash refs. (ass. array) as args (\%hash1, \%hash2)
             The hashes are like; hash1  (name1, 0000011111, name2, 0000122222 );
                                  hash2  (name3, 1324..1341, name4, 13424444.. );

             1) The resulting 1st hash output is (0, 20,   1, 13,     2, 12)
             which means that 0 added up to 24 in the second arg hash positions
                              1 added up to 15 in the second arg hash positions
                              2 added up to 18 in the second arg hash positions
             'p' option only works with 'n' or 'a'
             2) The resulting 2nd hash output is (0, 5,   1, 5)
             which means that 0 occurred 5 times in the first input hash
                              1 occurred 5 times in the first input hash
             'p' option only works with 'n' or 'a'
Function   : Makes hashes of tallied occurrences and summed up values for digits in
             positions.
             Calculates the occurrences or occurrence rates of CS rate positions.
             The hashes should have numbers.
Keywords   : tally two hashes of numbers.
Options    : [a n i p]
Returns    : ($ref1, $ref2), ie, two references of hash
             averaging option causes division of 20(added up value)
                                                by 9(occurrence) in the above
             for '0' of the first hash, so (0, 2.222,  1, 2.1666,  2, 2.4 )
             Average is the average of numbers
             average value in 0-9 scale (or 0-100 with 'p' option)
             So, if there are
                  seq1 00111110000,   The 'a' value of 0 and 1 as in the seq2
                  seq2 33000040000    is 0-> 6/6, 1-> 4/5, while the 'n'
                                        calc would be, 0-> 6 (60%), 1-> 4(40%)

Usage      : ($ref1, $ref2) = &tally_2_hashes(\%hash1, \%hash2, ['n', 'a', 'p', 'i']);
              %tally_addedup=%{$ref1};    '0' position had addedup value of 1000
              %tally_occurances=%{$ref2}; '0' position had occurred 100 times,
                                          '0' on average had 10 in its
                                              corresponding hash positions
Version    : 1.2

superpose_seq_hash

Download superpose_seq_hash .pl
Argument   : 2 refs. for hash of identical keys and value length and gap_chr.
Function   : (name1 000000112324)+(name1  ABC..AD..EFD ) => (name1 000..01..324)
             uses the second hash as a template for the first sequences. gap_char is
             '-' or '.'
             To insert gaps rather than overlap, use insert_gaps_in_seq_hash
Keywords   : overlay sequence, overlay alphabet, superpose sequence,
Returns    : one hash ref.
Usage      : %out =%{&superpose_seq_hash(\%hash1, \%hash2)};
Version    : 1.0
Warning    : Accepts only two HASHes and many possible gap_chr. Default gap is '-'
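
A minimal sketch of the overlay idea, assuming both strings have the same length. The name overlay_gaps_sketch and its optional third argument are hypothetical and are not the library's interface.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the library's superpose_seq_hash).
# Wherever the template (second hash) has a gap character, that gap replaces
# the character at the same position of the first hash's string.
sub overlay_gaps_sketch {
    my ($values, $template, $gap_chars) = @_;
    $gap_chars = '-.' unless defined $gap_chars;    # assumed gap alphabet
    my %out;
    for my $name (keys %$values) {
        next unless exists $template->{$name};
        my @v = split //, $values->{$name};
        my @t = split //, $template->{$name};
        my $result = '';
        for my $i (0 .. $#v) {
            my $is_gap = defined $t[$i] && index($gap_chars, $t[$i]) >= 0;
            $result .= $is_gap ? $t[$i] : $v[$i];
        }
        $out{$name} = $result;
    }
    return \%out;
}

my $out = overlay_gaps_sketch({ name1 => '000000112324' },
                              { name1 => 'ABC..AD..EFD' });
print "name1 $out->{name1}\n";
```

For the Function example above this gives 000..01..324; the output keeps the length of the input because nothing is inserted.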

overlay_seq_hash

Download overlay_seq_hash .pl
Argument   : 2 refs. for hash of identical keys and value length and gap_chr.
Function   : (name1 000000112324)+(name1  ABC..AD..EFD ) => (name1 000..01..324)
             Uses the second hash as a template for the first sequences. gap_chr is
             '-' or '.'
             To insert gaps rather than overlay them, use insert_gaps_in_seq_hash
Returns    : one hash ref.
Usage      : %out =%{&overlay_seq_hash(\%hash1, \%hash2)};
Version    : 1.0
Warning    : Accepts only two HASHes and many possible gap_chr. Default gap is '-'

insert_gaps_in_seq_hash

Download insert_gaps_in_seq_hash .pl
Argument   : 2 refs. for hashes of identical keys and value length.
Function   : superposes two hashes of the same sequence or of sequences of the same
             length, but unlike 'superpose_seq_hash', this inserts gaps and extends
             the sequences.
             (name1_sec  hHHHHHH EEEEEEE) +
             (name1_seq  .CDEABC..AD..EFD..EKST) => (name1_ext  .hHHHHH..H...EEE..EEEE)
             In the example, the undefined sec. str. position is replaced by a gap ('.').
             Uses the second hash as a template for the first sequences. gap_char is
             '-' or '.'
             One rule is that the SECOND hash contains the gaps!!
             There are two types of hash input. One is a simple seq hash (both args).
              The other is from secondary structure prediction. The hash has contents
              like: $averaged{$position}=[$residue1, $sec_str2, $dif_reliability];
Keywords   : superposing sequences with gaps, interpolate_sequences, interpolate_gaps
Returns    : one hash ref.
Usage      : %out_extended_seq =%{&insert_gaps_in_seq_hash(\%hash1, \%hash2)};
Version    : 1.3
Warning    : coded by A Biomatic

scan_win_get_average

Download scan_win_get_average .pl
Example    : input hash: ( seq1,  '13241234141234234',      (2 or more sequences accepted)
                           seq2,  '1341324123413241234')
             input winsize : 5;

             output hash; (seq1, 1234123413241234);
             output hash; (seq2, 1344234123412341);
                  The numbers are ratios(compos/seqid) with given window size.
Usage      : %out1 = %{&scan_win_get_average(\%input, \$window_size, \%input2, ...)};
             The order of the arguments doesn't matter.
Version    : 1.0

scan_win_and_get_sc_rate_pairs

Download scan_win_and_get_sc_rate_pairs .pl
Argument   : One ref. for hash, one ref. for a scalar.
Example    : input hash: ( seq1,  'ABCDEFG.HIK',      (2 or more sequences accepted)
                         seq2,  'DFD..ASDFAFS',
                         seq3,  'DDDDD..ASDFAFS' );
             input winsize : 5;

             output hash; (seq1seq2, 1,2,2,2,1,1,2,2); <-- joined by ',';
             output hash; (seq1seq3, 1,2,2,2,1,1,2,2); <-- joined by ',';
                  The numbers are ratios(compos/seqid) with given window size.
Function   : scans input sequences(arg1) in a given(arg2) window size and gets
             each composition and sequence identity rate(sc_rate) of the window.
             sc rate = Sequence Id(%)/ Composition Id(%)
Returns    : a reference of a hash.
Usage      : %out1 = %{&scan_win_and_get_sc_rate_pairs(\%input, \$window_size)};
Version    : 1.1
Warning    : when $seqid is zero  the rate becomes $compos_id/10   !!!

get_windows_sc_rate_array

Download get_windows_sc_rate_array .pl
Argument   : (\@input, \$window_size);  @input => ('ABCDEFG.HIK', 'DFD..ASDFAFS', 'ASDFASDFASAS');
             Input ar => ( 'ABCDEFG.HIK',
                           'DFD..ASDFAFS',
                           'ASDFASDFASAS' )  as the name of  @sequences.
Author     : A Biomatic
Function   : actual working part of scan_windows_and_get_compos_seqid_rate
Returns    : \@ratio_array, \$ratio_whole_seq
Usage      : @out_rate = @{&get_windows_sc_rate_array(\@seq, \$win_size)};
Version    : 1.0

scan_win_and_get_cs_rate_pairs

Download scan_win_and_get_cs_rate_pairs .pl
Argument   : One ref. for hash, one ref. for a scalar.
Example    : input hash: ( seq1,  'ABCDEFG.HIK',      (2 or more sequences accepted)
                               seq2,  'DFD..ASDFAFS',
                               seq3,  'DDDDD..ASDFAFS' );
                 input winsize : 5;

                 output hash; (seq1seq2, 1,2,2,2,1,1,2,2); <-- joined by ',';
                 output hash; (seq1seq3, 1,2,2,2,1,1,2,2); <-- joined by ',';
             The numbers are ratios(compos/seqid) with given window size.

Function   : scans input sequences(arg1) in a given(arg2) window size and gets
             each composition and sequence identity rate(cs_rate) of the window.
             CS rate = Composition Id / Sequence Id
Returns    : a reference of a hash.
             It is getting the entropy of the column and calculates something after.
Usage      : %out1 = %{&scan_win_and_get_cs_rate_pairs(\%input, \$window_size)};
Version    : 1.0
Warning    : when $seqid is zero  the rate becomes $compos_id/10   !!!
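
The window scan can be sketched for a single pair of equal-length aligned strings. Everything here is an assumption made for illustration: the name window_cs_rates_sketch is hypothetical, sequence identity is taken as the number of matching non-gap columns in the window, and composition identity as the number of residues the two windows share regardless of order. The library's own definitions may differ. Only the ratio (Composition Id / Sequence Id) and the zero-seqid fallback come from the text above.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sketch for ONE pair of equal-length aligned strings.
# Assumed definitions (not taken from the library source):
#   sequence identity    = matching non-gap columns in the window
#   composition identity = residues shared by the two windows, order ignored
sub window_cs_rates_sketch {
    my ($s1, $s2, $win) = @_;
    my @rates;
    for my $start (0 .. length($s1) - $win) {
        my $w1 = substr($s1, $start, $win);
        my $w2 = substr($s2, $start, $win);

        # count identical, non-gap columns
        my $seq_id = 0;
        for my $i (0 .. $win - 1) {
            my $c = substr($w1, $i, 1);
            $seq_id++ if $c eq substr($w2, $i, 1) && $c !~ /[.-]/;
        }

        # count residues common to both windows
        my (%c1, %c2);
        $c1{$_}++ for grep { !/[.-]/ } split //, $w1;
        $c2{$_}++ for grep { !/[.-]/ } split //, $w2;
        my $compos_id = 0;
        $compos_id += min($c1{$_}, $c2{$_} // 0) for keys %c1;

        # Per the Warning above: when seqid is zero the rate is compos_id/10.
        push @rates, $seq_id ? $compos_id / $seq_id : $compos_id / 10;
    }
    return \@rates;
}
```

A window with the same residues in a different order gets a high rate under these assumptions, because composition identity stays high while sequence identity drops.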

get_residue_error_rate

Download get_residue_error_rate .pl
Argument   : Takes a ref. for hash which have positions of residues of sequences.
Function   : This is the final step in error rate getting.
             gets a ref. of a hash and calculates the absolute position diffs.
Options    : 'L' for limiting the error rate to 9 to make one-digit output
             $LIMIT becomes 'L' by L, l, -l, -L
Returns    : one ref. for an array of differences of input arrays. array context.
             ---Example input (a hash with sequences); The values are differences after
                                comparison with structural and sequential alignments.
             %diffs =('seq1', '117742433441...000',   <-- input (can be separated by '' or ',').
                      'seq2', '12222...99999.8888',
                      'seq3', '66222...44444.8822',
                      'seq4', '12262...00666.772.');
             example output;
             seq3_seq4       '0,1,0,0,0,.,.,.,,.,0,,0,0,,0,0,,.,0,,0,0,.'
             seq1_seq2       '0,1,0,1,1,.,.,.,,.,2,,2,2,,2,2,,.,.,,2,2,1'
             seq1_seq3       '0,1,0,1,1,.,.,.,,.,1,,1,1,,0,.,,.,.,,1,1,1'
             seq1_seq4       '0,1,0,,1,1,.,.,.,,.,1,,1,1,0,.,.,,.,1,,2,2'
             seq2_seq3       '0,1,0,,0,0,,.,.,,.,0,,1,0,,0,0,,.,0,,0,0,0'
             seq2_seq4       '0,0,0,,1,0,,.,.,,.,0,,1,0,,0,0,,.,0,,0,0,.'
Usage      : %position_diffs =%{&get_residue_error_rate(\@seq_position1, \@seq_position2)};
Version    : 1.1
Warning    : split and join char is ',';

get_each_posi_diff_hash

Download get_each_posi_diff_hash .pl
Argument   : Takes a ref. for hash which have positions of residues of sequences.
Function   : This is the final step in error rate getting.
             gets a ref. of a hash and calculates the position diffs.
Options    : 'L' for limiting the error rate to 9 to make one-digit output
             $LIMIT becomes 'L' by L, l, -l, -L

Returns    : one ref. for an array of differences of input arrays. array context.
             ---Example input (a hash with sequences); The values are differences after
                                comparison with structural and sequential alignments.
             %diffs =('seq1', '117742433441...000',   <-- input (can be separated by '' or ',').
                      'seq2', '12222...99999.8888',
                      'seq3', '66222...44444.8822',
                      'seq4', '12262...00666.772.');
             example output;
             seq3_seq4       '0,1,0,0,0,.,.,.,,.,0,,0,0,,0,0,,.,0,,0,0,.'
             seq1_seq2       '0,1,0,1,1,.,.,.,,.,2,,2,2,,2,2,,.,.,,2,2,1'
             seq1_seq3       '0,1,0,1,1,.,.,.,,.,1,,1,1,,0,.,,.,.,,1,1,1'
             seq1_seq4       '0,1,0,,1,1,.,.,.,,.,1,,1,1,0,.,.,,.,1,,2,2'
             seq2_seq3       '0,1,0,,0,0,,.,.,,.,0,,1,0,,0,0,,.,0,,0,0,0'
             seq2_seq4       '0,0,0,,1,0,,.,.,,.,0,,1,0,,0,0,,.,0,,0,0,.'
Usage      : %position_diffs =%{&get_each_posi_diff_hash(\@seq_position1, \@seq_position2)};
Version    : 1.0
Warning    : split and join char is ',';

get_posi_rates_hash_out

Download get_posi_rates_hash_out .pl
Argument   : %{&get_posi_rates_hash_out(\%msfo_file, \%jpo_file)};
             Whatever the names, it takes one TRUE structural and one ALIGNED hash.
Function   : This is to get position specific error rate for line display rather than
             actual final error rate for the alignment.
             Output >>
             seq1_seq2  1110...222...2222
             seq2_seq3  1111....10...1111
             seq1_seq3  1111....0000.0000

Returns    : \%final_posi_diffs;
Usage      : %rate_hash = %{&get_posi_rates_hash_out(\%hash_msf, \%hash_jp)};
Version    : 1.0
Warning    : split and join char is ','; (space)

get_posi_rates_hash_out_compact

Download get_posi_rates_hash_out_compact .pl
Argument   : %{&get_posi_rates_hash_out_compact(\%msfo_file, \%jpo_file)};
             Whatever the names, it takes one TRUE structural and one ALIGNED hash.
Function   : This is to get position specific error rate for line display rather than
             actual final error rate for the alignment.
             Output >>  something like below but, without gaps, so final one is;
             seq1_seq2  1110...222...2222     seq1_seq2  11102222222
             seq2_seq3  1111....10...1111  -> seq2_seq3  1111101111
             seq1_seq3  1111....0000.0000     seq1_seq3  111100000000

Returns    : \%final_posi_diffs_compact;  Compare with  'get_posi_rates_hash_out_jp'
Usage      : %rate_hash = %{&get_posi_rates_hash_out_compact(\%hash_msf, \%hash_jp)};
Version    : 1.0
Warning    : split and join char is ','; (space)
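
The compaction step shown in the Output example (dropping gap columns from each rate line) can be sketched like this. The name compact_rates_sketch is hypothetical and the sketch covers only the gap removal, not the rate calculation itself.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove gap characters ('.' and '-') from every
# rate string, leaving only the digits.
sub compact_rates_sketch {
    my ($rates) = @_;                   # hash ref: pair name => rate string
    my %compact;
    for my $pair (keys %$rates) {
        ($compact{$pair} = $rates->{$pair}) =~ tr/.-//d;
    }
    return \%compact;
}

my $compact = compact_rates_sketch({ seq1_seq2 => '1110...222...2222' });
print "seq1_seq2  $compact->{seq1_seq2}\n";
```

For the first example line this yields 11102222222, as in the listing above.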

get_posi_rates_hash_out_jp

Download get_posi_rates_hash_out_jp .pl
Argument   : %{&get_posi_rates_hash_out_jp(\%msfo_file, \%jpo_file)};
             Whatever the names, it takes one TRUE structural and one ALIGNED hash.
Function   : This is to get position specific error rate for line display rather than
             actual final error rate for the alignment. get_posi_rates_hash_out_jp
             results in jp template sequence, while get_posi_rates_hash_out_msf does
             in msf template sequence.
             Output >>
             seq1_seq2  1110...222...2222   <--- the alignment template is JPO's
             seq2_seq3  1111....10...1111        (ie structural)
             seq1_seq3  1111....0000.0000

Returns    : \%final_posi_diffs;
Usage      : %rate_hash = %{&get_posi_rates_hash_out_jp(\%hash_msf, \%hash_jp)};
Version    : 1.0
Warning    : split and join char is ','; (space)

normalize_numbers

Download normalize_numbers .pl
Argument   : (\%hash1, \%hash2, \%hash3, ....)
Example    : Input hash>                   Output hash>
             ( '1-2', '12,.,1,2,3,4',     ( '1-2',   '9,.,0,1,2,3',
              '2-3', '12,.,1,5,3,4',       '2-3',   '9,.,0,4,2,3',
              '4-3', '12,3,1,2,3,4',       '3-1',   '9,3,.,.,2,3',
              '3-1', '12,4,.,.,3,4' );     '4-3',   '9,2,0,1,2,3' );
Function   : with given numbers in hashes, it makes a scale of 0-9 and puts
             all the elements on that scale. Also returns the average of the numbers.
Returns    : (\%norm_hash1, \%norm_hash2, \%norm_hash3,.... )

Usage      : %output=%{&normalize_numbers(\%hash1)};
             originally made to normalize the result of get_posi_rates_hash_out
             in   'scan_compos_and_seqid.pl'
Version    : 1.0

scan_windows_and_get_compos_seqid_rate

Download scan_windows_and_get_compos_seqid_rate .pl
Argument   : One ref. for hash, one ref. for a scalar.
Example    : input hash: ( seq1,  'ABCDEFG.HIK',    (2 or more sequences accepted)
                           seq2,  'DFD..ASDFAFS',
                           seq3,  'DDDDD..ASDFAFS' );
             input winsize : 5;

             output hash; (seq1seq2, 1,2,2,2,1,1,2,2); <-- joined by ',';
                  The numbers are ratios(compos/seqid) with given
                  window size.
Function   : scans input sequences(arg1) in a given(arg2) window size and gets
             each composition and sequence identity rate of the window.
Returns    : a reference of a hash.
Usage      : %out1 =%{&scan_windows_and_get_compos_seqid_rate(\%input, \$window_size)};
Warning    : when $seqid is zero  the rate becomes $compos_id/10   !!!

get_windows_cs_rate_array

Download get_windows_cs_rate_array .pl
Argument   : (\@input, \$window_size);  @input => ('ABCDEFG.HIK', 'DFD..ASDFAFS', 'ASDFASDFASAS');
             Input ar => ( 'ABCDEFG.HIK',
                           'DFD..ASDFAFS',
                           'ASDFASDFASAS' )  as the name of  @sequences.
Function   : actual working part of scan_windows_and_get_compos_seqid_rate
Returns    : \@ratio_array, \$ratio_whole_seq
Usage      : @out_rate = @{&get_windows_cs_rate_array(\@seq, \$win_size)};
Version    : 1.0

read_any_seq_files

Download read_any_seq_files .pl
Argument   : one or more refs. for scalars.
Example    : (*out1,  *out2) =&read_any_seq_files(\$input1, \$input2);
             (@out_ref_array)=@{&read_any_seq_files(\$input1, \$input2)};
             (%one_hash_out) =%{&read_any_seq_files(\$input1)};
Function   : Tries to find the given input regardless of whether it is a full pathname,
             with or without extension. If not in pwd, it searches the dirs exhaustively.
Keywords   : open_any_seq_files,
Returns    : 1 ref. for a HASH of sequence ONLY if there was one hash input
             1 array (not REF.) of references for multiple hashes.
Usage      : %out_seq=%{&read_any_seq_files(\$input_file_name)};
Version    : 1.1

seq_to_regexp

Download seq_to_regexp .pl
Function   : given an array and a start and end length,
              return an array of regular expressions, where each element of the original
              array has been expanded to a set of regular expressions that match the
              original exactly num times, for num between the start and end length

Returns    : a ref. of an array of regular expressions
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall

strip_rotated_seq

Download strip_rotated_seq .pl
Function   : remove all but one string of each set of rotations
             (reverse of rotated_seq )
Returns    : a ref. for
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall

rotate_seq

Download rotate_seq .pl
Function   : given a string, return all the rotations of that string
             e.g. given 'abcd', return ('abcd','bcda','cdab','dabc')
Returns    : a ref. of an array of the rotated strings
Usage      : @out_array=@{&rotate_seq($string)};
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall   ##### RevCom
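
A minimal sketch of generating all rotations, written independently of the Tisdall code. The name rotations_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: rotation k moves the first k characters to the end.
sub rotations_sketch {
    my ($string) = @_;
    my @rotations;
    for my $k (0 .. length($string) - 1) {
        push @rotations, substr($string, $k) . substr($string, 0, $k);
    }
    return \@rotations;
}

print join(',', @{ rotations_sketch('abcd') }), "\n";
```

For 'abcd' this produces abcd, bcda, cdab and dabc, in that order.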

convert_to_anti_sense

Download convert_to_anti_sense .pl
Returns    : a ref. for reverse complement
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall   ##### RevCom
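
A minimal sketch of a reverse complement for plain DNA. The name reverse_complement_sketch is hypothetical; it returns a string rather than a reference and does not handle ambiguity codes, so it is not a stand-in for the library routine.

```perl
use strict;
use warnings;

# Hypothetical sketch: reverse the string, then swap complementary bases.
# Only A, C, G and T (either case) are complemented; other characters pass through.
sub reverse_complement_sketch {
    my ($dna) = @_;
    my $rc = scalar reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

print reverse_complement_sketch('ATGC'), "\n";
```

Here 'ATGC' becomes 'GCAT'.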

convert_rna_to_protein

Download convert_rna_to_protein .pl
Argument   : a scalar for RNA sequence data
Function   : translate RNA seq to protein seq.
Keywords   : rna2protein, rna_2_protein, RNA2protein, translate_rna
             dna2protein, convert_RNA_to_protein, RNA_2_PROTEIN, RNA_2_protein
Returns    : a ref. of an array for protein translation
Version    : 1.1
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall
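
Translation with the standard genetic code can be sketched as below. This is written independently of the Tisdall code: the name translate_sketch is hypothetical, it reads frame 1 only, returns a plain string rather than an array reference, and writes 'X' for any codon containing a character other than T/U, C, A or G.

```perl
use strict;
use warnings;

# Standard genetic code, with codons ordered T, C, A, G at each position
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
my $AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %BASE_INDEX = (T => 0, C => 1, A => 2, G => 3);

# Hypothetical sketch: translate frame 1 of an RNA or DNA string.
sub translate_sketch {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ tr/U/T/;                    # treat RNA as DNA
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = substr($seq, $i, 3);
        if ($codon =~ /^[TCAG]{3}$/) {
            my ($x, $y, $z) = map { $BASE_INDEX{$_} } split //, $codon;
            $protein .= substr($AMINO, 16 * $x + 4 * $y + $z, 1);
        }
        else {
            $protein .= 'X';            # ambiguous or gapped codon
        }
    }
    return $protein;
}

print translate_sketch('AUGGCCUAA'), "\n";
```

The example prints MA* (Met, Ala, stop). A trailing partial codon is ignored.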

convert_dna_to_protein

Download convert_dna_to_protein .pl
Argument   : a scalar for DNA sequence data
Function   : translate DNA or RNA seq to protein seq.
Keywords   : dna2protein, dna_2_protein, DNA2protein, translate_dna
             dna2protein, convert_DNA_to_protein, translate_nucleic_acid
             rna2protein, rna_2_protein, RNA2protein, translate_rna
             dna2protein, convert_RNA_to_protein
Returns    : a ref. of an array for protein translation
Version    : 1.2
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall

write_staden_file

Download write_staden_file .pl
Returns    : a ref. of an array for  STADEN formatted sequence record
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall

write_primer_file

Download write_primer_file .pl
Returns    : a ref. of an array for PRIMER formatted sequence record
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall

write_gcg_genbank_file

Download write_gcg_genbank_file .pl
Returns    : a ref. of an array for GCG-Genbank formatted sequence record
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall

write_pir_file

Download write_pir_file .pl
Returns    : a ref. of an array for PIR formatted sequence record
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             from Tisdall

write_genbank_file

Download write_genbank_file .pl
Argument   : two scalars.
Function   : (This is DNA seq handling routine!)
Returns    : a ref. of an array for Genbank formatted sequence record
Usage      : @out =  @{&write_genbank_file($sequ, $header)};
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
             stolen from Tisdall

write_gcg_file

Download write_gcg_file .pl
Returns    : a ref. of an array for GCG formatted sequence record
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall

write_fasta_array

Download write_fasta_array .pl
Argument   : \%input
Example    : @out = (
             $out[0] =>     ">name",
             $out[1] =>     "ABCDEABCDEBCDEABCDEABCDEABCDEABCDEBCDEABCDE",
             $out[2] =>     "TTTTTTTTDEBCDEABCDEABCDEABCDEABCDEBCDEABCDE",
             $out[3] =>     "ABCDEABCDEBCDEABCDEABCDEABCDEABCDEBCDEABCDE",
                 );

Function   : takes a single sequence and produces a single output array in FASTA format
Returns    : ref. for an array of FASTA formatted sequence record

Usage      : @output = @{&put_fasta($sequence, $name)};
Version    : 1.0
Warning    : Copyright (C) 1993-1994 by James Tisdall
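
A minimal sketch of building a FASTA record as an array of lines. The name fasta_lines_sketch and the default width of 60 are assumptions for illustration, not the library's behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: a header line followed by the sequence cut into
# fixed-width lines.
sub fasta_lines_sketch {
    my ($name, $sequence, $width) = @_;
    $width ||= 60;                      # assumed default line width
    my @lines = (">$name");
    for (my $i = 0; $i < length($sequence); $i += $width) {
        push @lines, substr($sequence, $i, $width);
    }
    return \@lines;
}

print "$_\n" for @{ fasta_lines_sketch('name', 'ABCDE' x 30, 60) };
```

The first array element is the '>' header, as in the Example above, and the last sequence line may be shorter than the width.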

find_seq_files

Download find_seq_files .pl
Argument   : (\$input_file_name) while $input_file_name can be  'xxx.xxx', or '/xxx/xxx/xxx/xxy.yyy'
             or just directory name like 'aat' for  /nfs/ind4/ccpe1/people/A Biomatic /jpo/align/aat
             then, it tries to find a file with stored seq file extensions like msf, jp, pir etc
             to make aat.msf, aat.jp, aat.pir ... and searches for these files.
Example    : $found_file=${&find_seq_files(\$input_file_name)};
Function   : (similar to find.pl) used in 'read_any_seq_file.pl'
             Seeks the given file in pwd, the specified dir, the default path, etc.
             If still not found, it looks at all the subdirectories of the path and pwd
             and at the PATH environment dirs, then returns the full-path file name.
Keywords   : find_anyj_seq_files, find any seq files, find seq files
Returns    : return( \$final );
Usage      : $found_file = ${&find_seq_files(\$input_file_name)};
Version    : 1.0

search_files_in_subdir

Download search_files_in_subdir .pl
Argument   : gets a ref. of a scalar (dir name) and returns nothing (void).
Function   : opens the dir and processes all files in the dir if you wish,
             and then goes into any other subdirectories.
             If any file (dir) is a link, it skips that file.
Usage      : 
                     $inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
Version    : 1.0
Warning    : the final var $found_from_search_files_in_subdir mustn't be 'my'ed.

find_seq_file_old

Download find_seq_file_old .pl
Argument   : one ref. for SCALAR
Function   : seeks text file in pwd. If not found it looks at
             PATH environment dirs
Returns    : one ref. for SCALAR of a full path filename.
Usage      : $found_file=${&find_seq_file_old(\$input_file_name)};
Version    : 1.0
Warning    : << This is READABLE old version of  find_seq_file

open_sst_files_with_gap

Download open_sst_files_with_gap .pl
Argument   : a ref. for a scalar holding the "jp file name"
Example    : jp file  ==  seq1 ABDSF--DSFSDFS   <- true sequence
                              seq2 T--kdf-GAGGGASF     (aligned)

                 sst files ==> 'seq1.sst', 'seq2.sst' (in the same dir)

             original sst format:  seq1 hHHHHHttEEEE  <-- No gaps!
                                  seq2 hHHHHHHEEhh
             After this sub ==>
             (final out hash =   (  seq1 hHHHH--HttEEEE  <-- inserted
                                  seq2 h--HHH-HHHEEEhh  )     gaps

Function   : gets the name of a file(jp file) with its absolute dir path
             reads the sequence names in the jp file and looks up all
             the sst files in the same directory. Puts sst sequences
             in a hash with keys of sequence names.

Returns    : a ref. for a hash
Usage      : %out_sst_hash =%{&open_sst_files_with_gap(\$jp_file_dir_and_name)};
Version    : 1.0
Warning    : $jp_file_dir_and_name should be absolute dir and file name
             >> This gets JP file not SST file as input !!!!

put_gaps_in_hash

Download put_gaps_in_hash .pl
Argument   : 2 hash references.
Returns    : one hash reference.
Usage      : %out=%{&put_gaps_in_hash(\%hash_with_gap, \%hash_sans_gap)};

             %hash1=('1ctx',  '111111111111111',      <-- hash input without gaps
                     '2ctx',  '2222222222222222',
                     '3ctx',  '3333333333');

             %hash2=('1ctx',  'AAA--AAAAAAAAAAAA-',   <-- hash input with template gaps
                     '2ctx',  'BBBBBBBBBBBB-BBBB',
                     '3ctx',  'CCCCCC----CCCC');

             >> resulting out hash;

             %hash3=('1ctx',     '111--111111111111-',
                     '2ctx',     '222222222222-2222',
                     '3ctx',     '333333----3333');

Version    : 1.0
Warning    : The keys for hashes should be the same and the two sequences
             should be identical.
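
The gap insertion illustrated above can be sketched as follows, with the argument order taken from the Usage line (gapped hash first). The name put_gaps_sketch and the optional gap alphabet are hypothetical. The sketch copies each gap from the template and otherwise consumes the next character of the ungapped string.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the library's put_gaps_in_hash).
# Walk along the gapped template: copy a gap where the template has one,
# otherwise take the next unused character of the ungapped string.
sub put_gaps_sketch {
    my ($gapped, $ungapped, $gap_chars) = @_;
    $gap_chars = '-.' unless defined $gap_chars;    # assumed gap alphabet
    my %out;
    for my $name (keys %$gapped) {
        next unless exists $ungapped->{$name};
        my @residues = split //, $ungapped->{$name};
        my $result = '';
        for my $t (split //, $gapped->{$name}) {
            if (index($gap_chars, $t) >= 0) {
                $result .= $t;
            }
            elsif (@residues) {
                $result .= shift @residues;
            }
        }
        $out{$name} = $result;
    }
    return \%out;
}

my $filled = put_gaps_sketch({ x => 'AB--CD-E' }, { x => '12345' });
print "x $filled->{x}\n";
```

Unlike an overlay, the result is as long as the gapped template, so the ungapped string must supply exactly one character per non-gap template position.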

get_gap_positions

Download get_gap_positions .pl
Argument   : 1 ref. of array eg)=( ABCDE--EF--GH ) while '-' is for gap.
Example    : for a string '--iu--sdf-j--', it will output  -2 -1 2 3 7 9 10
Function   : gets gap positions of seq. and stores in an array
Keywords   : get_gap_positions_in_seq, get_seq_gap_positions get_gap_positions_in_array
Options    : p for all positive gaps numbering. No negatives for '---STRING--'

Returns    : 1 ref. of array eg)=(2,3,7,8,10,100,122);
Usage      : @gap_pos=@{&get_gap_positions(\@string1)}; <- ('A','C','D','E')
             @gap_pos=@{&get_gap_positions(\$string1)}; <- ( ACDE )
Version    : 1.4
Warning    : uses References.
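
One reading of the Example above is that positions are counted from the first residue, so leading gaps get negative numbers. The sketch below follows that reading. The name gap_positions_sketch is hypothetical, and the meaning given to the second argument (plain 0-based indices, standing in for the 'p' option) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: report gap positions relative to the first residue.
# With $all_positive set, plain 0-based string indices are returned instead
# (an assumed interpretation of the 'p' option).
sub gap_positions_sketch {
    my ($seq, $all_positive) = @_;
    my ($leading) = $seq =~ /^([-.]*)/;             # run of leading gaps
    my $offset = $all_positive ? 0 : length($leading);
    my @positions;
    for my $i (0 .. length($seq) - 1) {
        push @positions, $i - $offset if substr($seq, $i, 1) =~ /[-.]/;
    }
    return \@positions;
}

print join(' ', @{ gap_positions_sketch('--iu--sdf-j--') }), "\n";
```

For '--iu--sdf-j--' this prints -2 -1 2 3 7 9 10, the same list as in the Example field.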

make_pairs_from_hash

Download make_pairs_from_hash .pl
Argument   : one ref. of hash
Example    : @output=($ref1, $ref2, ....$refn)
             each $ref is the reference of a hash of a pair of sequence
             >>  %pair1 = %{$ref1}; %pair2 = %{$ref2}; %pair3 = %{$ref3};

             %pair1 is like;       %pair2 is like;       %pair3 is like;

             seq1  ABCDEFAD     seq1  ABCDEFAD        seq2  SDFSFSDF
             seq2  SDFSFSDF     seq3  SDFSFSDF        seq3  SDFSFSDF

Function   : returns all the possible pairs of a set of sequences in
             an array of references;

Returns    : one ref. of array for references for hashes.
Usage      : @output =@{&make_pairs_from_hash(\%input_sequence_hash)};
             Input example
             %input =  seq1  ABCDEFAD
              seq2  SDFSFSDF
              seq3  SDFSFSDF

Version    : 1.0
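
Enumerating all pairs can be sketched with two nested loops over the sorted names. The name pairs_sketch is hypothetical, and sorting the keys is an assumption made so that the output order is predictable.

```perl
use strict;
use warnings;

# Hypothetical sketch: return a ref to an array of hash refs, one per
# unordered pair of sequence names.
sub pairs_sketch {
    my ($seqs) = @_;
    my @names = sort keys %$seqs;
    my @pairs;
    for my $i (0 .. $#names - 1) {
        for my $j ($i + 1 .. $#names) {
            push @pairs, {
                $names[$i] => $seqs->{ $names[$i] },
                $names[$j] => $seqs->{ $names[$j] },
            };
        }
    }
    return \@pairs;
}

my $pairs = pairs_sketch({ seq1 => 'ABCDEFAD', seq2 => 'SDFSFSDF', seq3 => 'SDFSFSDF' });
print join(' ', sort keys %$_), "\n" for @$pairs;
```

Three sequences give three pairs (seq1 seq2, seq1 seq3, seq2 seq3); in general n sequences give n(n-1)/2.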

mail_it

Download mail_it .pl
Version    : 1.0

read_fssp_files

Download read_fssp_files .pl
Function   : reads an fssp file and puts the sequences in a hash
Usage      : %anyarray = %{&read_fssp_files(\$any_sequence_file_fssp_form)};
Version    : 1.0

get_posi_shift_rms_whole

Download get_posi_shift_rms_whole .pl
Argument   : takes 2 refs. of scalars for dir name (protein group name)
             and threshold for rms
Example    : (0.284994272623139   0.166781214203895)
             The first figure is the error rate without rms consideration.
             The second is the rate after applying the threshold.
Returns    : two refs. of scalar values (rates)
Usage      : just type   get_posi_shift_rms_whole.pl
Version    : 1.0

write_jp

Download write_jp .pl
Function   : gets a ref(s) for hash and prints the content in lines of 60 char
Returns    : Nothing, i.e. STDOUT
Usage      : &write_jp(\%input_hash1,\%input_hash2, \%input_hash3.... );
Version    : 1.0
Warning    : derived from  print_in_block

convert_num_to_0_or_1_hash

Download convert_num_to_0_or_1_hash .pl
Argument   : two references, one for a hash and one for a scalar threshold

Example    : A hash =>  name1  10012924729874924792742749748374297
                        name2  10012924729874924792710012924729874
             A threshold => 4
             !! if numbers are smaller than 4, they become 1 (or true).
             Outputhash  =>  name1  11111011011111011111011011110101111
                        name2  11111011010001011001011010010101100

             ($ref1, $ref2)=&convert_num_to_0_or_1_hash(\%hash, \%hash, \$threshold);
             above is the example when with more than 2 input hashes.
Function   : changes all the numbers into 0 or 1 according to threshold given.
             convert_num_0_or_1_hash converts threshold and bigger nums. to
             '0' while convert_num_0_or_1_hash_opposite converts to '1'.
Usage      : with a variable for threshold ->

             %out = %{&convert_num_to_0_or_1_hash(\%input_hash, \$threshold, \%input_hash2..)};

Version    : 1.0
Warning    : Threshold value is set to 0 as well as all values smaller than that.
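
The thresholding rule stated in the Function field (the threshold and bigger digits become '0', smaller digits become '1') can be sketched as below. The name threshold_to_binary_sketch is hypothetical; it handles one hash and leaves non-digit characters such as gaps unchanged, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: digits >= threshold become 0, smaller digits become 1.
# Non-digit characters (e.g. gaps) are left as they are.
sub threshold_to_binary_sketch {
    my ($seqs, $threshold) = @_;
    my %out;
    for my $name (keys %$seqs) {
        ($out{$name} = $seqs->{$name}) =~ s/(\d)/$1 >= $threshold ? 0 : 1/ge;
    }
    return \%out;
}

my $binary = threshold_to_binary_sketch({ name1 => '10492.7' }, 4);
print "name1 $binary->{name1}\n";
```

With a threshold of 4, the string '10492.7' becomes '11001.0'. Swapping the 0 and 1 in the substitution gives the opposite convention.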

convert_num_0_or_1_hash_opposite

Download convert_num_0_or_1_hash_opposite .pl
Argument   : two references, one for a hash and one for a scalar threshold

Example    : A hash =>  name1  10012924729874924792742749748374297
                        name2  10012924729874924792710012924729874
             A threshold => 4
             !! if numbers are smaller than 4, they become 1 (or true).
             Outputhash  =>  name1  11111011011111011111011011110101111
                        name2  11111011010001011001011010010101100

             ($ref1, $ref2)=&convert_num_to_0_or_1_hash(\%hash, \%hash, \$threshold);
             above is the example when with more than 2 input hashes.
Function   : changes all the numbers into 0 or 1 according to threshold given.
             convert_num_0_or_1_hash converts threshold and bigger nums. to
             '0' while convert_num_0_or_1_hash_opposite converts to '1'.
Usage      : with a variable for threshold ->

               %out = %{&convert_num_0_or_1_hash_opposite(\%input_hash, \$threshold)};

Version    : 1.0
Warning    : Threshold value is set to 0 as well as all values smaller than that.

convert_char_to_0_or_1_hash

Download convert_char_to_0_or_1_hash .pl
Argument   : one reference of HASH.

Example    : A hash =>  name1  ABCDSSFDSF..ASDFSD.....ADFASDF...AA
                        name2  ASDFSD.....ADFBCDSSFDSF..ASASDF...A

             Outputhash  => name1  00000000001100000011111000000011100
                            name2  00000011111000000000000110000001110

Function   : changes all the chars into 1, gaps are to 0
Keywords   : convert_char, translate_char, convert_char_to_digit,
             convert_char_to_number
Returns    : A ref. of a hash
Usage      : %out = %{&convert_char_to_0_or_1_hash(\%input_hash)};

Version    : 1.2

digitize_char

Download digitize_char .pl
Argument   : one reference of HASH.

Example    : A hash =>  name1  ABCDSSFDSF..ASDFSD.....ADFASDF...AA
                        name2  ASDFSD.....ADFBCDSSFDSF..ASASDF...A

             Outputhash  => name1  00000000001100000011111000000011100
                            name2  00000011111000000000000110000001110

Function   : changes all the chars into 1, gaps are to 0
Keywords   : convert_char, translate_char, convert_char_to_digit,
             convert_char_to_number, digitize_sequence, digitize_char
             digitize_hash
Returns    : A ref. of a hash
Usage      : %out = %{&digitize_char(\%input_hash)};

Version    : 1.1

get_posi_diff_and_rms_hash

Download get_posi_diff_and_rms_hash .pl
Argument   : Takes two ref. for hash
Function   : gets two ref. of hashes and calculates the position diffs.
Returns    : one ref. for an array of differences of input arrays. array context.
             ---Example input (a hash with numbers); The values are differences after comparison
                                            with structural and sequential alignments.
             %diffs =('seq1', '112342431111
                      'seq2', '12222...09011.1122',
                      'seq3', '13222...00011.1122',
                      'seq4', '12262...00011.112.');

             %rms_corrected_0_or_1 => seq1_seq2  0111011111011101011110100101101010011
                           seq1_seq3  01111.....111110111111111111100001011
             example output;
             seq3_seq4       01040...00000.000.
             seq1_seq2       01012...1810...122
             seq1_seq3       02012...1110...122
             seq1_seq4       01032...1110...12.
             seq2_seq3       01000...09000.0000
             seq2_seq4       00040...09000.000.

Usage      : %position_diffs =%{&get_posi_diff_and_rms_hash(\%diffs, \%rms_corrected)};
Version    : 1.0
Warning    : split and join char is ",";

get_posi_shift_rms_hash

Download get_posi_shift_rms_hash .pl
Argument   : takes 4 hash REFERENCES for (one seq. and one struc. alignment(2nd arg)
Returns    : two refs. for scalar values of shift rate of positions for proteins.
              first scalar is rate without correcting rms deviation
              second scalar is rate with    correcting rms deviation
             >> example of xx

             1cdg            APDTSVSNKQ NFSTDVIYQI FTDRFSDGNP ANNPTGAAFD GTC.TNLRLY
             2aaa            ......LSAA SWRTQSIYFL LTDRFGR... ....TDNSTT ATCNTGNEIY

             >> example of xx

             2aaa       ------lsaasWrtqSIYFLLTDRFGrtdns-------ttatCntgneiy
             1cdg       apdtsvsnkqnFSTDVIYQIFTDRFsdgnpannptgaafdgtCtn-lrly

             >> example of xx

             1cdg         APDTSVSNKQ NFSTDVIYQI FTDRFSDGNP ANNPTGAAFD GTCTN-LRLY
             2aaa         ------LSAA SWRTQSIYFL LTDRFGRTDN S-------TT ATCNTGNEIY
             1cdg_2aaa    ------7774 2221210000 0000000148 9-------99 41114-4000
             1cdg_6taa    ------8674 2232220000 0000011059 9-------99 52114-3000

Usage      : ($rate1_ref,$rate2_ref) = &get_posi_shift_rms_hash(\%msf_hash, \%jp_hash,
                                                                \%rms_file_hash, \$threshold);
Version    : 1.0

open_rms_files

Download open_rms_files .pl
Argument   : takes one ref. for a file.
Function   : open rms files and put sequences in a hash
             Example of rms (aa
             1cdg         APDTSVSNKQ NFSTDVIYQI FTDRFSDGNP ANNPTGAAFD GTCTN-LRLY
             2aaa         ------LSAA SWRTQSIYFL LTDRFGRTDN S-------TT ATCNTGNEIY
             6taa         ------ATPA DWRSQSIYFL LTDRFARTDG S-------TT ATCNTADQKY
             1cdg_2aaa    ------7774 2221210000 0000000148 9-------99 41114-4000
             1cdg_6taa    ------8674 2232220000 0000011059 9-------99 52114-3000
             2aaa_6taa    ------1000 0000000000 0000000010 0-------00 0000000000

             Example output hash;
             1nor        LECHNQQSSQPPTTKTCS-GETNCYKKWWSDH----RGTIIERGFFC--GCPKVK-PGVNLNCCRT-DRCNN-------
             1cdg        APDTSVSNKQNFSTDVASISGLVTSLP-QGSYNDVLGGLLNGNTLSVGSGGAASNFTLAAGGTAVWQYTAATATPTIGH
             1cdg_2aaa   ------777002112111-----343333---431127----5433234-72354541131211111176899999999

Returns    : a ref. of a hash
Usage      : %anyarray = %{&open_rms_files(\$any_sequence_file_msf_form)};
Version    : 1.0
Warning    : xxx.rms files are Tim Hubbard's 'msarms' program's output.
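
The block-to-hash step described above can be sketched as follows. This is a
minimal illustration, not the library's own code: the name read_rms_lines is
hypothetical, and it assumes every data line is "name, blanks, sequence
blocks", with later blocks for the same name appended to the earlier ones.

```perl
use strict;
use warnings;

# Hypothetical helper (not the library's code): read interleaved
# "name  block block ..." lines and join the blocks per name.
sub read_rms_lines {
    my ($lines_ref) = @_;
    my %seq;
    for my $line (@$lines_ref) {
        # name, whitespace, then the rest of the line
        next unless $line =~ /^\s*(\S+)\s+(\S.*)$/;
        my ($name, $chunk) = ($1, $2);
        $chunk =~ s/\s+//g;       # drop the blanks between 10-char blocks
        $seq{$name} .= $chunk;    # later blocks extend the same entry
    }
    return \%seq;
}

my @lines = (
    '1cdg         APDTSVSNKQ NFSTDVIYQI',
    '1cdg_2aaa    ------7774 2221210000',
    '',
    '1cdg         FTDRFSDGNP',
    '1cdg_2aaa    0000000148',
);
my %rms = %{ read_rms_lines(\@lines) };
print "$_ $rms{$_}\n" for sort keys %rms;
```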

open_rms_files2

Download open_rms_files2 .pl
Argument   : takes one ref. for a file.
Function   : same as open_rms_files, but returns two hashes.
             Example of rms (aa
             1cdg         APDTSVSNKQ NFSTDVIYQI FTDRFSDGNP ANNPTGAAFD GTCTN-LRLY
             2aaa         ------LSAA SWRTQSIYFL LTDRFGRTDN S-------TT ATCNTGNEIY
             1cdg_2aaa    ------7774 2221210000 0000000148 9-------99 41114-4000
             1cdg_6taa    ------8674 2232220000 0000011059 9-------99 52114-3000

             Example output 2 hashes;
             1nor        LECHNQQSSQPPTTKTCS-GETNCYKKWWSDH----RGTIIERGFFC--GCPKVK-PGVNLNCCRT-DRCNN-------
             1cdg        APDTSVSNKQNFSTDVASISGLVTSLP-QGSYNDVLGGLLNGNTLSVGSGGAASNFTLAAGGTAVWQYTAATATPTIGH

             1cdg_2aaa   ------777002112111-----343333---431127----5433234-72354541131211111176899999999
             1cdg_2taa   ------777002112111-----343333---431127----5433234-72354541131211111176899999999

Returns    : return(@out); while @out is (\%hash_rms, \%hash_jp)
Usage      : ($hash_for_jp, $hash_for_rms) = &open_rms_files2(\$any_sequence_file_msf_form);
Version    : 1.0
Warning    : xxx.rms files are Tim Hubbard's 'msarms' program's output.

steve_permute_array

Download steve_permute_array .pl
Argument   : up to 3 args. 1st is a ref. of an array. 2nd is the min.
             element no., 3rd is the max. element no. 2nd and 3rd are optional.
Returns    : a ref. of a hash.
Usage      : %final_out_hash=%{&steve_permute_array(\@list, \2, \4)};
                         Above is for pairs, 3 seqs, and 4 seqs.
Version    : 1.0
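
A rough sketch of the grouping described above (pairs, 3 seqs, and 4 seqs
from one list). The name subsets_by_size, the default sizes, and the layout
of the returned hash (names joined by '_' as keys) are assumptions made for
illustration; the real sub may key its hash differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for steve_permute_array: every subset of
# @$list_ref whose size lies between $min and $max, input order kept.
sub subsets_by_size {
    my ($list_ref, $min, $max) = @_;
    my @list = @$list_ref;
    $min = 2            unless defined $min;   # assumed default
    $max = scalar @list unless defined $max;   # assumed default
    my %out;
    # Each bit mask from 1 to 2**n - 1 selects one non-empty subset.
    for my $mask (1 .. 2**@list - 1) {
        my @pick = map { $list[$_] } grep { $mask & (1 << $_) } 0 .. $#list;
        next if @pick < $min or @pick > $max;
        $out{ join '_', @pick } = [@pick];
    }
    return \%out;
}

my %groups = %{ subsets_by_size([qw(seq1 seq2 seq3 seq4)], 2, 4) };
print "$_\n" for sort keys %groups;
```

The bit-mask loop is only practical for short lists, since it visits all
2**n subsets before filtering by size.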

opendir_and_go_in_and_do_something

Download opendir_and_go_in_and_do_something .pl
Argument   : gets a ref. of a scalar (dir name) and returns nothing(void).
Example    : as in my 'indexing.pl' for perl file indexer.
Function   : opens a dir and processes all the files in it if you wish,
             and then goes into each sub dir.
             If any file (or dir) is a link, it skips that file.
Keywords   : open_dir_and_go_in_and_do_something,
             go in there do something, get into subdir and do something.
             go_in_subdir_and_do_something, recursive execution
Usage      : &opendir_and_go_in_and_do_something(\$input_dir);
                     $inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
Version    : 1.1
Warning    : Seems to work fine. !! Change the name of this sub to a shorter one
                                  !! for your own purpose.
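
The recursive walk described above might look like the sketch below. The
name walk_dir and the callback argument are hypothetical; the library sub
takes only a ref. of the dir name. A temporary directory is used so that
the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sketch: visit every plain file under $dir, descending into
# sub-directories and skipping symbolic links, as the entry describes.
sub walk_dir {
    my ($dir, $callback) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $name (sort @entries) {
        my $path = "$dir/$name";
        next if -l $path;                  # linked file or dir: skip
        if    (-d $path) { walk_dir($path, $callback) }
        elsif (-f $path) { $callback->($path) }
    }
}

# Build a small tree: a.pl, a link to it, and sub/b.pl
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir failed: $!";
for my $file ("$top/a.pl", "$top/sub/b.pl") {
    open(my $fh, '>', $file) or die "Cannot write $file: $!";
    close $fh;
}
eval { symlink("$top/a.pl", "$top/link.pl") };   # no-op where unsupported

my @seen;
walk_dir($top, sub { push @seen, $_[0] });
print "$_\n" for @seen;
```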

open_subdir_and_go_in_and_do

Download open_subdir_and_go_in_and_do .pl
Argument   : gets a ref. of a scalar (dir name) and returns nothing(void).
Example    : as in my 'indexing.pl' for perl file indexer.
Function   : opens a dir and processes all the files in it if you wish,
             and then goes into each sub dir.
             If any file (or dir) is a link, it skips that file.
Usage      : &open_subdir_and_go_in_and_do(\$input_dir);
                     $inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
Version    : 1.0
Warning    : Seems to work fine. !! Change the name of this sub to a shorter one
                                  !! for your own purpose.

get_occurances_of_shift_type_hash

Download get_occurances_of_shift_type_hash .pl
Argument   : Two references of hashes.
Returns    : one reference  of hash. (eg, 0=>1000, 1=>888, 2=>83, ...
                                          0,1,2... are position shift types
                                          1000, 888, 83... are occurrences in
                                          the comparison between str. and seq.
                                          alignments.)
Usage      : for single protein group
Version    : 1.0

get_occurances_of_shift_type_hash_all

Download get_occurances_of_shift_type_hash_all .pl
Version    : 1.0

get_occurances_of_char

Download get_occurances_of_char .pl
Argument   : one ref. of hash (seq1 alsdfjlsj
                               seq2 asldfjsld
                               seq3 owiurouou);
Function   : gets the numbers of occurrences of each char (eg, of 1, 2, 3 ...
             position shifts).
             If a hash is given, it only looks at the values.
             If multiple strings, arrays, hashes or combinations of these
              are given, it adds them up into one single result.
Keywords   : composition of chars, composition table making,
             make_composition, make composition table
             occurances_of_char, get_char_occurances, occurances
             get_percentage_occurances_of_char, percentage_occurances_of_char
Options    : 'p' for percentage output of the char among others
             'n' for NO name option when HASH input is given
Returns    : one ref. of hash  (a =>5, b=>6, c=>4,,,,,)
Usage      : %occurances_shft_type=%{&get_occurances_of_char(\%final_posi_diffs)};
             %char_occur=%{&get_occurances_of_char(\@ref_array_of_chars)};
             %char_occur=%{&get_occurances_of_char(\$ref_string_of_chars)};
             %char_occur=%{&get_occurances_of_char($string_of_chars)};

Version    : 1.3
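
The counting step alone can be sketched like this. The name count_chars is
hypothetical, and the sketch covers only the hash-ref input without the
'p' and 'n' options.

```perl
use strict;
use warnings;

# Hypothetical sketch of the counting step only: tally every character
# found in the values of a name => sequence hash.
sub count_chars {
    my ($seq_ref) = @_;
    my %count;
    for my $seq (values %$seq_ref) {
        $count{$_}++ for split //, $seq;
    }
    return \%count;
}

my %seqs  = (seq1 => 'aabc', seq2 => 'abcc');
my %occur = %{ count_chars(\%seqs) };
print "$_ => $occur{$_}\n" for sort keys %occur;
```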

make_composition_table

Download make_composition_table .pl
Argument   : one ref. of hash (seq1 alsdfjlsj
                               seq2 asldfjsld
                               seq3 owiurouou);
Function   : gets the numbers of occurrences for 1, 2, 3 ... position shifts.
Keywords   : composition of chars, composition table making, make composition table
             make_composition_table, get_composition, get_amino_acid_composition
             protein_composition, make_aa_composition_tablem, aa_composition
Returns    : one ref. of hash  (a =>5, b=>6, c=>4,,,,,)
Usage      : %occurances=%{&make_composition_table(\%key_and_value_for_seq)};
Version    : 1.2

make_composition_ratio_table_simple

Download make_composition_ratio_table_simple .pl
Argument   : one ref. of hash (seq1 alsdfjlsj
                               seq2 asldfjsld
                               seq3 owiurouou);
Function   : gets ratio of the numbers of occurrences for any chars.
Keywords   : composition table, composition of chars, composition table making,
             make composition table, make_composition_table
Returns    : one ref. of hash  (a =>0.05, b=>0.06, c=>0.04,,,,,)
Usage      : %occurances=%{&make_composition_ratio_table_simple(\%final_posi_diffs)};
Version    : 1.0
Warning    : This pools all the sequences, so it does not distinguish the
              composition of each seq if you put in more than one seq.

make_composition_ratio_table

Download make_composition_ratio_table .pl
Argument   : one or more ref. of hash (seq1 alsdfjlsj
                                       seq2 asldfjsld
                                       seq3 owiurouou);
Function   : gets ratio of the numbers of occurrences for any chars.
Keywords   : composition table, composition of chars, composition table making,
             make composition table, make_composition_table
             aa_composition_ratio, composition_ratio, protein_composition,
             get_composition_ratio, get_aa_composition_ratio
Returns    : one ref. of hash  ('seq_name', { a =>0.05, b=>0.06, c=>0.04,,,,, } )
Usage      : %rate=%{&make_composition_ratio_table(\%hash1, \%hash2, ,,,)};
Version    : 1.3
Warning    : This produces one composition ratio table for each seq
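
A per-sequence ratio table of the shape shown under Returns could be built
roughly as below. The name composition_ratio is hypothetical, and the
sketch takes a single hash ref rather than several.

```perl
use strict;
use warnings;

# Hypothetical sketch: for each sequence, the fraction of its length
# taken up by each character (name => { char => ratio }).
sub composition_ratio {
    my ($seq_ref) = @_;
    my %table;
    for my $name (keys %$seq_ref) {
        my @chars = split //, $seq_ref->{$name};
        next unless @chars;                  # skip empty sequences
        my %count;
        $count{$_}++ for @chars;
        $table{$name} = { map { $_ => $count{$_} / @chars } keys %count };
    }
    return \%table;
}

my %rate = %{ composition_ratio({ seq1 => 'aabb', seq2 => 'aaab' }) };
for my $name (sort keys %rate) {
    print "$name $_ => $rate{$name}{$_}\n" for sort keys %{ $rate{$name} };
}
```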

get_position_shift_rate

Download get_position_shift_rate .pl
Argument   : %{&get_position_shift_rate(\%msfo_file, \%jpo_file)};
             Whatever the names, it takes one TRUE structural and one ALIGNED hash.
Example    : my(%error_rate)=%{&get_position_shift_rate(\%input, \%input2)};
Function   : This is to get position specific error rate for line display rather than
             actual final error rate for the alignment. Takes two file names of seq.
             Output >>
             seq1_seq2  1110...222...2222
             seq2_seq3  1111....10...1111
             seq1_seq3  1111....0000.0000

Options    : 'ss' for secondary structure regions (error rate calculated for
                 Helix and Beta regions only). There is a specialized sub called
              get_segment_shift_rate for sec. str. only handling.

    $ss_opt            becomes    ss by  ss, SS, -ss, -SS     #  for secondary structure only
    $H                 becomes   'H' by   -H or -h or H       # to retrieve only H segment
    $S                 becomes   'S' by   -S or  S            # to retrieve only S segment
    $E                 becomes   'E' by   -E or  E            # to retrieve only E segment
    $T                 becomes   'T' by   -T or -t or T or t  # to retrieve only T segment
    $I                 becomes   'I' by   -I or  I            # to retrieve only I segment
    $G                 becomes   'G' by   -G or -g or G or g  # to retrieve only G segment
    $B                 becomes   'B' by   -B or -b or B or b  # to retrieve only B segment
    $HELP              becomes    1  by   -help   # for showing help
    $simplify          becomes    1  by   -p or P or -P, p
    $simplify          becomes    1  by   -simplify or simplify, Simplify SIMPLIFY
    $comm_col          becomes   'C' by   -C or C or common
    $LIMIT             becomes    L  by   -L, L               # to limit the error rate to 9 .

Returns    : \%final_posi_diffs;
Usage      : %rate_hash = %{&get_position_shift_rate(\%hash_msf, \%hash_jp)};
Version    : 1.5
Warning    : split and join char is ','; (space)

get_posi_rates_hash_out

Download get_posi_rates_hash_out .pl
Argument   : %{&get_posi_rates_hash_out(\%msfo_file, \%jpo_file)};
             Whatever the names, it takes one TRUE structural and one ALIGNED hash.
Function   : This is to get position specific error rate for line display rather than
             actual final error rate for the alignment.
             Output >>
             seq1_seq2  1110...222...2222
             seq2_seq3  1111....10...1111
             seq1_seq3  1111....0000.0000

Returns    : \%final_posi_diffs;
Usage      : %rate_hash = %{&get_posi_rates_hash_out(\%hash_msf, \%hash_jp)};
Version    : 1.0
Warning    : split and join char is ','; (space)

get_posi_diff_hash

Download get_posi_diff_hash .pl
Argument   : Takes a ref. for a hash which has positions of residues of sequences.
Function   : gets a ref. of a hash and calculates the position diffs.
Returns    : one ref. for an array of differences of input arrays. array context.
             ---Example input (a hash with sequences); The values are differences after comparison
                                            of structural and sequential alignments.
             %diffs =('seq1', '112342431111',
             'seq2', '12222...09011.1122',
             'seq3', '13222...00011.1122',
             'seq4', '12262...00011.112.');
             example output;
             seq3_seq4       01040...00000.000.
             seq1_seq2       01012...1810...122
             seq1_seq3       02012...1110...122
             seq1_seq4       01032...1110...12.
             seq2_seq3       01000...09000.0000
             seq2_seq4       00040...09000.000.
Usage      : %position_diffs = %{&get_posi_diff_hash(\@seq_position1, \@seq_position2)};
Version    : 1.0
Warning    : split and join char is ',';    # used in 'get_posi_shift_hash'

get_posi_shift_hash

Download get_posi_shift_hash .pl
Argument   : takes two hash REFERENCES (one for seq. and one for struc. alignment (2nd arg))
Returns    : One scalar value of shift rate of position for proteins.
Usage      : $rate_final = ${&get_posi_shift_hash(\%hash_msf, \%hash_jp)};
Version    : 1.1
Warning    : split and join char is ','; (space)

print_seq_in_block_with_print

Download print_seq_in_block_with_print .pl
Function   : gets a ref(s) for hash and prints the content in lines of 60 char
Returns    : Nothing, STDOUT
Usage      : &print_seq_in_block_with_print (\%input_hash1,\%input_hash2, \%input_hash3.... );
Version    : 1.0
Warning    : derived from  print_in_block

fill_ending_space

Download fill_ending_space .pl
Argument   : (\%input1, \%input2, \%input3.....);
Function   : fills the ending gaps or space of sequences (shorter ones)
Returns    : (\%hash1,..... )
Usage      : (*out, *out2, *out3)=&fill_ending_space(\%input1, \%input2, \%input3);
             &print_seq_in_block(\%out,\%out2,\%out3); <-- if you want printout.
Version    : 1.0

print_seq_in_block_old

Download print_seq_in_block_old .pl
Argument   : one or more refs. for hash
               if there are more than one array input it makes such outputs

             Name1    THIS.IS.from.hash.one
             Name2    This

             Name1    THIS
             Name2    This.is.from.hash.two

Function   : gets a ref(s) for hash (single key and value)
             and prints the content in lines of 60 char
Returns    : Nothing, STDOUT
Usage      : &print_seq_in_block_old (\%input_hash1,\%input_hash2, \%input_hash3.... );
Version    : 1.0
Warning    : This is more or less for debugging. Use  print_seq_in_block

print_in_block

Download print_in_block .pl
Argument   : one or more refs. for array
               if there are more than one array input it makes such outputs
             Example out)
               THIS.IS.from.array.one
             This.is.from.array.two

              THIS.IS.from.array.one
               This.is.from.array.two

Function   : gets a ref(s) for array and prints the content in lines of 60 char
Returns    : Nothing, STDOUT
Usage      : &print_in_block (\@input_array,\@input_array2, \@input_array3.... );
Version    : 1.0
Warning    : This is more or less for debugging. Use  print_seq_in_block

get_posi_diff

Download get_posi_diff .pl
Argument   : Takes two ref. for arrays which have positions of residues.
Example    : @compacted_posi_dif =(1 ,2, 1, 1, '.' ,2,  1,  1, '.');
             @compacted_posi_dif2=(4 ,2, 1, 1, ,2,  1, '.' ,3,  1);
             output ==> ( 3 0 0 0 . 1 . 2 .)   (it ignores positions which have non digits.)
             output ==> (-3 0 0 0 . 1 .-2 .) when abs is not used.
Returns    : one ref. for an @array of differences of input arrays. array context.
Usage      : @position_diffs =&get_posi_diff(\@seq_position1,\@seq_position2);
Version    : 1.4

get_posi_diff_abs

Download get_posi_diff_abs .pl
Argument   : Takes two ref. for arrays which have positions of residues.
Example    : @compacted_posi_dif =(1 ,2, 1, 1, '.' ,2,  1,  1, '.');
             @compacted_posi_dif2=(4 ,2, 1, 1, ,2,  1, '.' ,3,  1);
             output ==> ( 3 0 0 0 . 1 . 2 .)   (it ignores positions which have non digits.)
             output ==> (-3 0 0 0 . 1 .-2 .) when abs is not used.
Returns    : one ref. for an @array of differences of input arrays. array context.
Usage      : @position_diffs =&get_posi_diff_abs(\@seq_position1,\@seq_position2);
Version    : 1.0
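
The element-wise comparison can be sketched as follows. The name
posi_diff_abs is hypothetical, and marking every non-digit position with
'.' is an assumption based on the example output above, not a copy of the
library code.

```perl
use strict;
use warnings;

# Hypothetical sketch: absolute difference at each position; a position
# where either input is not a plain digit string becomes '.'.
sub posi_diff_abs {
    my ($a_ref, $b_ref) = @_;
    my $last = $#$a_ref < $#$b_ref ? $#$a_ref : $#$b_ref;
    my @diff;
    for my $i (0 .. $last) {
        my ($x, $y) = ($a_ref->[$i], $b_ref->[$i]);
        if ($x =~ /^\d+$/ and $y =~ /^\d+$/) { push @diff, abs($x - $y) }
        else                                  { push @diff, '.' }
    }
    return \@diff;
}

my @d = @{ posi_diff_abs([1, 2, 1, '.', 3], [4, 2, '.', '.', 1]) };
print "@d\n";   # prints: 3 0 . . 2
```

Dropping the abs() call gives the signed variant described for
get_posi_diff.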

put_position_back_to_str_seq

Download put_position_back_to_str_seq .pl
Argument   : takes two refs for arrays (one for chars, the other for digits)
Example    : @string_from_struct=('X', 'T', 'A' ,'B' , '.' ,'F',  'G', '.' , 'O' ,'P', '.');
             @compacted_posi_dif=(1 ,2, 1, 1, ,2, 1, 1, 1);
Returns    : a ref. for an array
Usage      : @result =@{&put_position_back_to_str_seq(\@string_from_struct, \@compacted_posi_dif)};
Version    : 1.0

get_posi_shift_hash_rms

Download get_posi_shift_hash_rms .pl
Function   : calculates the error rate of seq after filtering according to
                rms deviation.
Usage      : $result=${&get_posi_shift_hash_rms(\%h1, \%h2, \%h3)};
Version    : 1.0
Warning    : Not complete yet.

open_fil_file

Download open_fil_file .pl
Function   : reads xxx.fil file which shows whether I have to discard
             regions of sequences due to too big RMS deviation.
Returns    : a ref. for a hash(associative array).
Usage      : %out = %{&open_fil_file(\$input_seq_file)};
Version    : 1.0
Warning    : !!! not yet complete !!!

send_mail

Download send_mail .pl
Example    : 
             send_mail ( $to, $subject, @lines );
             #-# i -- $to      = email address
             #-# i -- $subject = string to be put in the Subject: line
             #-# i -- @lines   = lines to be mailed - must not have \n
             -- DISCUSSION:

             Uses /usr/lib/sendmail to mail a bunch of lines to the email address
             specified. The @lines should not have terminating \n characters: they
             will be supplied.

             -- EXAMPLE:
             &P10::mail ( 'schip@lmsc.lockheed.com', 'Test 34', @mylines );
             -- END
Function   : mail a bunch of @lines to a user
Version    : 1.0

rand_word

Download rand_word .pl
Function   : This subroutine returns a ref. of a random alphabetic string
             whose length is specified by the argument.
Keywords   : randomize words, makes random words, scramble_word,
              shuffle_words,
Usage      : $word = ${&rand_word(7)};
             print "sub rand_word gives $word\n";
Version    : 1.0
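
A minimal sketch of such a generator, assuming lowercase a-z as the
alphabet (the library's actual alphabet is not stated here) and using the
hypothetical name random_word:

```perl
use strict;
use warnings;

# Hypothetical sketch: a ref. to a random lowercase word of $length chars.
sub random_word {
    my ($length) = @_;
    my @letters = ('a' .. 'z');
    my $word = join '', map { $letters[ int rand @letters ] } 1 .. $length;
    return \$word;
}

my $word = ${ random_word(7) };
print "random_word gives $word\n";
```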

opendir_and_go_rand_fasta_and_clustal

Download opendir_and_go_rand_fasta_and_clustal .pl
Example    : $inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
             &opendir_and_go($inputdir);
Function   : open dir and process all files if you wish, and then go in any sub
             dir of it. Using recursion. created by A Biomatic
             if any file is linked, it skips that file.
Usage      : &opendir_and_go_rand_fasta_and_clustal(\$input_dir); #$inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
Version    : 1.0
Warning    : Seems to work fine.

opendir_and_go_rand_fasta

Download opendir_and_go_rand_fasta .pl
Example    : $inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
             &opendir_and_go($inputdir);
Function   : open dir and process all files if you wish, and then go in any sub
             dir of it. Using recursion. created by A Biomatic
             if any file is linked, it skips that file.
Usage      : &opendir_and_go_rand_fasta(\$input_dir); #$inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
Version    : 1.0

reverse_sequences

Download reverse_sequences .pl
Argument   : hash, eg(1, 'skdfj', 2, 'kdfjkdj', 3, 'kdfjk');
             Input example:
             ..
             >HI0256
             FLSANVLPIAPIINGGRTAVDNITQSVSDKPFVKDIGTKIKEAIALSKYSTQPQYISTTN
             >HI0094
             DILRTFVKMETGLKFPKKFKLKANLALFMNRRNKRPDTIMTAVADAGQKISEAKLNTTAK
             ..

             Output example: (Reversed :-)
             ..
             >HI0256_rv   <<-- note the added extension
             ALDJFLKAJFJALSDJFLAJSLFJAKLSDFJLASJDFLAJSLDFJASJDFLJSDFJSDLJ
             >HI0094_rv
             LASJDFLKAJFJALSDJFLKSDJLFAJLKDJFLASJDFLKDFJKDJFKDJFKDJFKJDLJ
             ..

Function   : gets ref. of strings, reverses the elems.
Keywords   : reverse_sequence, reverse_sequence_hash, rev_sequence_hash
Returns    : one or more hash references.
Usage      : %out = %{&reverse_sequences(\%input_seq_hash, \%hash2,...)};
Version    : 1.4
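
The reversal with the added '_rv' extension can be sketched as below for a
single hash. The name reverse_seq_hash is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: reverse every sequence and tag its name with '_rv'.
sub reverse_seq_hash {
    my ($seq_ref) = @_;
    my %out;
    for my $name (keys %$seq_ref) {
        $out{"${name}_rv"} = scalar reverse $seq_ref->{$name};
    }
    return \%out;
}

my %rev = %{ reverse_seq_hash({ HI0256 => 'FLSANV', HI0094 => 'DILRTF' }) };
print ">$_\n$rev{$_}\n" for sort keys %rev;
```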

rev_sequence_mul_array

Download rev_sequence_mul_array .pl
Function   : gets a ref. of an array of strings, reverses the elems.
Returns    : one ref. of  mul_array, eg. ('jfkdj', 'kdfjsdj', 'jjjkk')
Usage      : @out = @{&rev_sequence_mul_array(\@input_mul_seq_array)};
Version    : 1.0
Warning    : This reverses sequences!

scramble_sequences

Download scramble_sequences .pl
Argument   : ref. of hash, eg(1, 'skdfj', 2, 'kdfjkdj', 3, 'kdfjk');

             Input example:
             ..
             >HI0256
             FLSANVLPIAPIINGGRTAVDNITQSVSDKPFVKDIGTKIKEAIALSKYSTQPQYISTTN
             >HI0094
             DILRTFVKMETGLKFPKKFKLKANLALFMNRRNKRPDTIMTAVADAGQKISEAKLNTTAK
             ..

             Output example: (scrambled :-)
             ..
             >HI0256_sc   <<-- note the added extension
             ALDJFLKAJFJALSDJFLAJSLFJAKLSDFJLASJDFLAJSLDFJASJDFLJSDFJSDLJ
             >HI0094_sc
             LASJDFLKAJFJALSDJFLKSDJLFAJLKDJFLASJDFLKDFJKDJFKDJFKDJFKJDLJ
             ..
Function   : gets ref. of strings, scrambles the elems.
Keywords   : make_scrambled_seq, make_scrambled_sequence, scramble_seq_hash,
              scramble_sequences, shuffle_sequences, shuffle_seq
Returns    : (\%hashout), or (\%hash1, \%hash2,,,,,)
Usage      : %out = %{&scramble_sequences(\%input_seq_hash)};
Version    : 1.5

scramble_array

Download scramble_array .pl
Function   : shuffles the elements of array
Keywords   : randomise_array, randomize_array, shuffle_array
Usage      : @in=@{&scramble_array(\@in)};
Version    : 1.4
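
One common way to shuffle an array is the Fisher-Yates method, sketched
below. Whether scramble_array itself uses this method is not stated in the
header, so treat the sketch (and the name shuffle_array) as an assumption.

```perl
use strict;
use warnings;

# Fisher-Yates shuffle on a copy of the input; returns a ref. to the copy.
sub shuffle_array {
    my ($in_ref) = @_;
    my @a = @$in_ref;
    for (my $i = $#a; $i > 0; $i--) {
        my $j = int rand($i + 1);        # pick from positions 0 .. $i
        @a[$i, $j] = @a[$j, $i];         # swap
    }
    return \@a;
}

my @in  = qw(a b c d e f);
my @out = @{ shuffle_array(\@in) };
print "@out\n";
```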

rand_sequence_mul_array

Download rand_sequence_mul_array .pl
Argument   : one ref. of mul_array, eg. ('lsjdfj', 'kdfjsdj', 'jjjkk')
Function   : gets a ref. of an array of strings, scrambles the elems.
Keywords   : scramble_sequence_mul_array, shuffle_sequence_mul_array
Returns    : one ref. of  mul_array, eg. ('jfkdj', 'kdfjsdj', 'jjjkk')
Usage      : @out = @{&rand_sequence_mul_array(\@input_mul_seq_array)};
Version    : 1.1
Warning    : This scrambles sequences!!

rand_sequence_one_string

Download rand_sequence_one_string .pl
Argument   : one ref. of string, eg ( 'ldkfjlsdjfsdjflj' )
Function   : gets a ref. of a string, scrambles the elems.
Returns    : one ref. of string,
Usage      : $out = ${&rand_sequence_one_string(\$input_seq_string)};
Version    : 1.0
Warning    : This scrambles sequences!!

rand_sequence_one_array

Download rand_sequence_one_array .pl
Argument   : one ref. of array, eg ('e', 'b', 'c', 'd')
Function   : gets a ref. of an array, scrambles the elems.
Returns    : one ref. of array,
Usage      : @out = @{&rand_sequence_one_array(\@input_seq_array)};
Version    : 1.0
Warning    : This scrambles sequences!!

make_random_sequence

Download make_random_sequence .pl
Argument   : 1 200 [-p] [@array_of_array_refs]
             1 = num of seq, 200=leng of seq, -p =option, @arr.. = option
             You can optionally give amino acid matrices
Example    : $out=${&make_random_sequence(@ARGV)};  While @ARGV can be '1 200 -p'
Function   : gets one or more numbers for seq length and makes random sequences
             It can handle proportional random sequences according to the
             amino acid occurrence matrix.
Keywords   : scramble_sequence, make_scrambled_sequence, shuffle_sequence
             random_sequence, make_random_sequence, generate_random_protein_seq
             create_random_sequene create_random_aa_sequence
Options    : 'p' for proportional random sequence option
             'f' for FASTA format output (returns one ref. of HASH)
Returns    : one or more scalar references according to the input numbers.
Usage      : $protein = ${&make_random_sequence(1, 400)};
Version    : 1.4

rand_DNA_seq_generate

Download rand_DNA_seq_generate .pl
Argument   : (343) or (\$length)
Function   : gets one or more numbers for seq length and makes random sequences
Returns    : one or more scalar references according to the input numbers.
Usage      : $DNA = ${&rand_DNA_seq_generate(400)};
Version    : 1.0
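
A minimal sketch of a uniform random DNA generator that accepts either a
number or a ref. of a number, as the Argument line suggests. The name
random_dna and the equal base frequencies are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: a ref. to a random DNA string of the given length.
sub random_dna {
    my ($length) = @_;
    $length = $$length if ref $length;      # accept 343 or \$length
    my @bases = qw(A C G T);
    my $seq = join '', map { $bases[ int rand @bases ] } 1 .. $length;
    return \$seq;
}

my $dna  = ${ random_dna(400) };
my $len  = 12;
my $dna2 = ${ random_dna(\$len) };
print length($dna), " ", length($dna2), "\n";   # prints: 400 12
```

Swapping T for U in @bases gives the RNA variant.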

rand_RNA_seq_generate

Download rand_RNA_seq_generate .pl
Argument   : (343) or (\$length)
Function   : gets one or more numbers for seq length and makes random sequences
Returns    : one or more scalar references according to the input numbers.
Usage      : $RNA = ${&rand_RNA_seq_generate(400)};
Version    : 1.0

replace_text

Download replace_text .pl
Argument   : reference of one array of file names in pwd
Function   : finds patterns of text and replaces them in multiple input files
Keywords   : replace_txt, change_text,
Returns    : nothing
Usage      : &replace_text(\@input_array_of_filenames);
Version    : 1.4
Warning    : This produces a temporary file and renames it...

get_av_seq_length

Download get_av_seq_length .pl
Argument   : one hash reference for sequences.
Function   : gets hash of sequences, measures their lengths, and returns the av.
Returns    : one ref. for scalar digit.
Usage      : $av_of_lengths = &get_av_seq_length(\%hash_ref);
Version    : 1.0
Warning    : uses a sub  &array_average(\@lengths);

get_sd_of_length_diff

Download get_sd_of_length_diff .pl
Argument   : gets one hash reference,
Returns    : one scalar digit
Usage      : $result = &get_sd_of_length_diff(\%input);
Version    : 1.0
Warning    : removes all non-char(.-, space....) in the input string

get_av_and_sd_seq_length

Download get_av_and_sd_seq_length .pl
Argument   : Two hash references for sequences.
Function   : gets ref of hash of sequences, measures lengths, and returns av. and sd.
Returns    : Two scalar digits.
Usage      : ($av, $sd) = &get_av_and_sd_seq_length(\%hash_ref);
Version    : 1.0
Warning    : uses a sub  &array_average(\@lengths);
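
The average and standard deviation of sequence lengths can be sketched as
below. The name length_av_and_sd is hypothetical. Two assumptions are made:
gap characters are stripped before measuring, and the population form of
the standard deviation (dividing by n) is used; the library may divide by
n - 1 instead.

```perl
use strict;
use warnings;

# Hypothetical sketch: average and population sd of ungapped lengths.
sub length_av_and_sd {
    my ($seq_ref) = @_;
    my @len = map { my $s = $_; $s =~ s/[^A-Za-z]//g; length $s }
              values %$seq_ref;
    return (0, 0) unless @len;
    my $av = 0;
    $av += $_ / @len for @len;
    my $var = 0;
    $var += ($_ - $av) ** 2 / @len for @len;
    return ($av, sqrt $var);
}

my %seqs = (s1 => 'AB--', s2 => 'AB.CD', s3 => 'ABCDEF');
my ($av, $sd) = length_av_and_sd(\%seqs);
printf "av=%.3f sd=%.3f\n", $av, $sd;
```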

get_seq_hash_sans_gaps

Download get_seq_hash_sans_gaps.pl
Usage      : ($ref_out1, $ref_out2)=&get_seq_hash_sans_gaps(\%hash, \%hash);
              %out=%{&get_seq_hash_sans_gaps(\%hash)};
Version    : 1.0

get_posi_sans_gaps

Download get_posi_sans_gaps .pl
Argument   : one scalar variable input of sequence string.
Returns    : the positions of residues after removing gaps(but keeps pos).
               used for analysis of shifted positions of seq. comparison.
Usage      : @seq_position1 = &get_posi_sans_gaps($string1);
Version    : 1.0
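
One reading of "removing gaps but keeping pos" is sketched below: each
residue gets its running number in the ungapped sequence, and each gap
column stays in place as '.'. This interpretation and the name
posi_sans_gaps are assumptions, guided by the '.' entries in the
get_posi_diff examples.

```perl
use strict;
use warnings;

# Hypothetical sketch: number the residues, keep gap columns as '.'.
sub posi_sans_gaps {
    my ($string) = @_;
    my $n = 0;
    return map { /[A-Za-z]/ ? ++$n : '.' } split //, $string;
}

my @pos = posi_sans_gaps('AB.C-D');
print "@pos\n";   # prints: 1 2 . 3 . 4
```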

get_posi_shift_rate

Download get_posi_shift_rate .pl
Argument   : takes two file names for seq. and struc. alignment.
             Assumes the files are in the pwd.
Returns    : one ref. for scalar value of shift rate of position for proteins.
Usage      : $rate_final = &get_posi_shift_rate("perl.msf", "perl.jp");
Version    : 1.0
Warning    : sub  hash_common was unstable.

read_hssp_no_inserts

Download read_hssp_no_inserts .pl
Function   : read hssp file and put sequences in a hash
Usage      : %anyarray = &read_hssp_no_inserts ($any_sequence_file_hssp_form);
Version    : 1.0
Warning    : It produces incomplete sequences when hssp seqs. have insertions.

open_pdbg_files

Download open_pdbg_files .pl
Example    : %out = %{&open_pdbg_files(@ARGV)};
             while @ARGV at prompt was: 'pdb_40.pdbg'
Function   : open pdb group files and put scopclass in a hash.
             PDB group file format is like this;

  >d1bia_1 1.4.3.1.1 (1-63) Biotin repressor, N-terminal domain [Escherichia coli]
  >d1baba_ 1.1.1.1.15 Hemoglobin, alpha-chain [human (Homo sapiens)]
  >d1cpcb_ 1.1.1.2.1 C-phycocyanin [cyanobacterium (Fremyella diplosiphon)]
  >d1fcdc2 1.3.1.3.1 (81-174) Flavocytochrome c sulfide dehydrogenase, FCSD, cytochrome subunit [Purple phototrophic bacterium (Cromatium vinosum)]

             This can also return the sizes of sequences rather than seqs.
Keywords   : open_pdbg_files, open_pdb_group_files
Options    : any digit for the minimum seq length
        b  for simple style reading (this reads in the name of pdbg file as it is)

Usage      : %seq=%{&open_pdbg_files($tim_seq_file, ['1fcdc1'], [s] )};
             if you put additional seq name as 1fcdc1 it will
             fetch that scopclass only in the database file.
             Any digit will be used as minimum seq size to be fetched.
Version    : 1.5
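
Reading the name and scopclass fields from lines in the format shown above
can be sketched as follows. The name read_pdbg_lines is hypothetical, and
the sketch ignores the minimum-length and single-name options.

```perl
use strict;
use warnings;

# Hypothetical sketch: map each entry name to its scopclass string.
sub read_pdbg_lines {
    my ($lines_ref) = @_;
    my %class;
    for my $line (@$lines_ref) {
        # ">name  class  free-text description"
        next unless $line =~ /^\s*>(\S+)\s+([\d.]+)/;
        $class{$1} = $2;
    }
    return \%class;
}

my @lines = (
    '>d1bia_1 1.4.3.1.1 (1-63) Biotin repressor, N-terminal domain [Escherichia coli]',
    '>d1baba_ 1.1.1.1.15 Hemoglobin, alpha-chain [human (Homo sapiens)]',
);
my %scop = %{ read_pdbg_lines(\@lines) };
print "$_ $scop{$_}\n" for sort keys %scop;
```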

open_cel_files

Download open_cel_files .pl
Example    : 
  Example INPUT file:
  #  Tab delimited data file
   X       Y     MeanIn  STDEV  Pixel num

   0       0       200.0   15      16      1.24343
   1       0       200.0   15      16      1.24343
   2       0       200.0   15      16      1.24343
   3       0       200.0   15      16      1.24343

Returns    : 3D array =( [X][Y][0], [X][Y][1], [X][Y][2], [X][Y][3])
Version    : 1.1

open_stride_dat_files

Download open_stride_dat_files .pl
Author     : jong@salt2.med.harvard.edu
Usage      : @out=@{&open_stride_dat_files(@ARGV)};
Version    : 1.1

open_fasta_files

Download open_fasta_files .pl
Example    : %out = %{&open_fasta_files(@ARGV)};
             %out2=%{&open_fasta_files('seq.fa', \%index)};
             %out3=%{&open_fasta_files('seq.fa', \%range)};
             %seq=%{&open_fasta_files($PDB40_FASTA, \@seq_to_fetch)};

             while @ARGV at prompt was: 'GMJ.pep MJ0084'

Function   : open fasta files and put sequences in a hash
              If a hash (or hashes) with sequence names and seek positions
              from the index file is given, it seeks to those positions in
              the input FASTA file to fetch the sequences. This is useful
              for big FASTA DBs.
             If the seq name has a range like  XXXXXX_1-30, it will only
              return residues 1-30 of the XXXXXX sequence.

             FASTA sequence file format is like this;

             > 1st-seq
             ABCDEFGHIJKLMOPABCDEFGHIJKLMOPABCDEFGHIJKLMOPABCDEFG
             > 2nd.sequ
             ABCDEFGHIJKLMOYYUIUUIUIYIKLMOPABCDEFGHIJKLMOPABCDEFG
             >owl|P04439|1A03_HUMAN HLA CLASS I HISTOCOMPATIBILITY ANTIGEN, A-3 ALPHA CHAIN PRECURSOR....
             MARGDQAVMAPRTLLLLLSGALALTQTWAGSHSMRYFFTSVSRPGRGEPRFIAVGYVDDT

             This can also return the sizes of sequences rather than seqs.

             This ignores any duplicate entry names coming later.

Keywords   : open_fasta, open_fa_files, open_FASTA_files,
Options    : Seq name to fetch the specified seq only.
             as open_fasta_files.pl MY_SEQ_NAME Swissprot.fasta
            -d  for giving back desc as well as the name. so it
                gives  'HI0002 This is the description part'
                as the key
             If you put hash which is like('seq_name', ['20-30', '30-44',..])
              it will produce hash which has got:
              ( seq_name_20-30 'asdfasdfasdfasdfasd',
                seq_name_30-44 'kljkljkjkjljkjljkll',
                ....           .... )
            -s for returning sequence size only
   $reverse_seq=r by r ## to reverse seq.
Usage      : %fasta_seq=%{&open_fasta_files($fasta_file, ['MJ0084'])};
             if you put additional seq name as MJ0084 it will
             fetch that sequence only in the database file.

             %out=%{&open_fasta_files(@ARGV, \%index)};
               while  %index has (seq indexpos seq2 indexpos2,,,)
               In this case, the fasta file should have xxxx.fa format

Version    : 4.1
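
The basic reading step (name => sequence, later duplicate names ignored)
can be sketched as below. The name read_fasta is hypothetical, and the
sketch covers none of the options (-d, -s, r, index or range hashes). It
reads from a filehandle so that an in-memory string can stand in for a
file.

```perl
use strict;
use warnings;

# Hypothetical sketch: read FASTA records from a filehandle into a hash.
sub read_fasta {
    my ($fh) = @_;
    my (%seq, %seen, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(\S*)/) {
            my $id = $1;
            # an empty or repeated entry name is ignored
            $name = ($id eq '' or $seen{$id}++) ? undef : $id;
        }
        elsif (defined $name) {
            $line =~ s/\s+//g;
            $seq{$name} .= $line;
        }
    }
    return \%seq;
}

my $text = ">seq1 first entry\nABCD\nEFG\n> seq2\nHIJK\n>seq1 duplicate\nZZZZ\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my %fasta = %{ read_fasta($fh) };
close $fh;
print "$_ $fasta{$_}\n" for sort keys %fasta;
```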

msf_permute_hash_write

Download msf_permute_hash_write .pl
Function   : gets 2 references (one for %hash the other for group $name)
             uses &msf_permute_array_write(\%hash, \$group_name)
             the second arg is for output file name. can be anything.
Usage      : &msf_permute_hash_write(\%hash, $group_name); # void
Version    : 1.0

msf_permute_array_write

Download msf_permute_array_write .pl
Argument   : gets 2 references
Function   : 
             the second arg is for output file name. can be anything.
             used in &msf_permute_hash_write
Usage      : &msf_permute_array_write(\%hash, \$group_name); # void
Version    : 1.0

pir_permute_hash_write

Download pir_permute_hash_write .pl
Function   : gets a reference of hash which has names and sequences as keys and values.
             uses &pir_permute_array_write
             the second arg is for output file name. can be anything.
Usage      : &pir_permute_hash_write($hash_ref, $group_name); # void
Version    : 1.0

fasta_permute_hash_write

Download fasta_permute_hash_write .pl
Function   : gets a reference of a hash which has names and sequences as keys and values.
             uses &fasta_permute_array_write
             the second arg is for output file name. can be anything.
Usage      : &fasta_permute_hash_write($hash_ref, $group_name); # void
Version    : 1.0

fasta_permute_array_write

Download fasta_permute_array_write .pl
Function   : gets a reference of an array which has names and sequences as keys and values.
             the second arg is for output file name. can be anything.
             used in &fasta_permute_hash_write
Usage      : &fasta_permute_array_write($hash_ref, $group_name); # void
Version    : 1.0

ssp_permute_hash_write

Download ssp_permute_hash_write .pl
Function   : gets a reference of hash which has names and sequences as keys and values.
             uses &ssp_permute_array_write
             the second arg is for output file name. can be anything.
Usage      : &ssp_permute_hash_write($hash_ref, $group_name); # void
Version    : 1.0

pir_permute_array_write

Download pir_permute_array_write .pl
Function   : gets a reference of hash which has names and sequences as keys and values.
             the second arg is for output file name. can be anything.
             used in &pir_permute_hash_write
Usage      : &pir_permute_array_write($hash_ref, $group_name); # void
Version    : 1.0

ssp_permute_array_write

Download ssp_permute_array_write .pl
Function   : gets a reference of hash which has names and sequences as keys and values.
             the second arg is for output file name. can be anything.
             used in &ssp_permute_hash_write
             ssp file is for PHD secondary structure prediction service.
Usage      : &ssp_permute_array_write($hash_ref, $group_name); # void
Version    : 1.0

permute

Download permute .pl
Function   : gets permuted array elements except single char elements.
             fastest
Usage      : &permute(\@array);
Version    : 1.0
Warning    : from : Kenneth Albanowski  (CIS: 70705,126)
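Sketch     : A minimal recursive way to list every ordering of an array, given
             only as an illustration. It is not the routine credited above,
             the name permute_sketch is hypothetical, and the 'except single
             char elements' filtering is not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a ref. to a list of array refs,
# one per ordering of the input elements.
sub permute_sketch {
    my ($items) = @_;
    return [ [] ] unless @$items;
    my @result;
    for my $i (0 .. $#$items) {
        my @rest = @$items;
        my ($picked) = splice(@rest, $i, 1);    # take element $i out
        push @result, [ $picked, @$_ ] for @{ permute_sketch(\@rest) };
    }
    return \@result;
}

my @perms = @{ permute_sketch([qw(aa bb cc)]) };
print join(' ', @$_), "\n" for @perms;
```

             The number of orderings grows as n!, so this is only practical
             for short arrays.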

permute_binary

Download permute_binary .pl
Example    : &permute_binary(@array);
Function   : outputs permuted array elements
Version    : 1.0
Warning    : from : silly@ugcs.caltech.edu

ssp_write

Download ssp_write .pl
Example    : &ssp_write($hash_pointer, $out_file_name);
Function   : writes multiple seqs. in ssp format (takes one or more than one seq.!!)
             ssp is PHD server format.
Usage      : two arguments:  $seq_hash_reference  and $output_file_name
             takes a hash which has got names keys and sequences values.
             uses Perl5 pointers(references).
Version    : 1.0

pir_write

Download pir_write .pl
Example    : &pir_write($hash_pointer, $out_file_name);
Function   : writes multiple seqs. in PIR format (takes one or more than one seq.!!)
Usage      : two arguments:  $seq_hash_reference  and $output_file_name
             takes a hash which has got names keys and sequences values.
             uses Perl5 pointers(references).
Version    : 1.0

make_singlet_list_from_pdb_entries

Download make_singlet_list_from_pdb_entries .pl
Example    : &make_singlet_list_from_pdb_entries(\@files);
 Input>
   >d2sn3__ 7.3.6.1.1 scorpion toxin [Centruroides sculpturatus ewing, variant 3]
   KEGYLVKKSDGCKYGCLKLGENEGCDTECKAKNQGGSYGYCYAFACWCEGLPESTPTYPL

 OUTPUT>
   >d2cmd_1 3.18.1.5.2 (1-145) Malate dehydrogenase [Escherichia coli]
   >d2naca2 3.18.1.4.1 (148-335) Formate dehydrogenase [Pseudomonas sp. 101]

Function   : gets the classification of scop in pdb40d.fa like file and
             produces pdb40d.pdbs file.

             1.1.1.1.4  means: Class.Fold.Superfamily.Family.Protein

             Compare with make_groups_from_pdb_entries
Keywords   : make_singlet_list_from_pdb40d, make_singlet_list_from_scop, make_superfamilies
             write_pdbs_files, make_pdbs_files, make_pdb_group_files,
             write_pdbs, make_singlet_list_from_pdb_entries
Options    : _  for debugging.
             #  for debugging.
Usage      : &write_pdbs_files(\@files);
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

write_pdbg_files

Download write_pdbg_files .pl
Example    : 
 Input>
   >d2sn3__ 7.3.6.1.1 scorpion toxin [Centruroides sculpturatus ewing, variant 3]
   KEGYLVKKSDGCKYGCLKLGENEGCDTECKAKNQGGSYGYCYAFACWCEGLPESTPTYPL

 OUTPUT>
   >d2cmd_1 3.18.1.5.2 (1-145) Malate dehydrogenase [Escherichia coli]
   >d2naca2 3.18.1.4.1 (148-335) Formate dehydrogenase [Pseudomonas sp. 101]

Function   : gets the classification of scop in pdb40d.fa like file and
             produces pdb40d.pdbg file.

             1.1.1.1.4  means: Class.Fold.Superfamily.Family.Protein

Keywords   : make_groups_from_pdb40d, make_groups_from_scop, make_superfamilies
             write_pdbg_files, make_pdbg_files, make_pdb_group_files,
             write_pdbg, make_groups_from_pdb_entries
Usage      : &write_pdbg_files(\@files);
Version    : 1.1

write_msp3_files

Download write_msp3_files .pl
Example    : &write_msp3_files(\@files);  # while @files has G*.pdbg
Function   : opens two files. Gx.msp_1 and Gx.msp_2 to create Gx.msp3 file
              you can set the msp3 file extension by e= option,
              for example, e=interm will make  G1.interm instead of G1.msp3

Keywords   : make_msp3_files, create_msp3_files
Options    : 
    $upper_expect_limit2= by u2=  # u2 is for msp_2 files (eg, 0.0006)
    $upper_expect_limit1= by u1=  # u1 is for msp_1 files (eg, 0.081 )
    $lower_expect_limit1= by l1=
    $lower_expect_limit2= by l2=
    R for NOT adding ranges in seq names.
    e= for  extension name
    n  for  no sort by columns in output
    e  for  sorting columns by E values (first first and then second)
    E  for  sorting columns by E values but reverse order

Returns    : returns the names of msp3 files
Usage      : &write_msp3_files(\@files);
Version    : 1.8

write_parf_files

Download write_parf_files .pl
Function   : takes xxxx.msp files and writes xxxx.parf file
Keywords   : write_parf
Options    : 
  $pdbd_seq_only=d by d -d
  $sam_571_seq_only=571 by 571 -571
  $pdb95d_2092_seq=2092 by 2092 -2092
  $ISS_2nd_Eval_factor= by E= ## "E=$eval"
  $PDB40D_935_FASTA= 935 by 935
  $use_raw_score=r
  $use_eval_but_show_raw_score=e by e -e  ## eval order but only raw score is shown.
                                            This is to make a special graph
                                           requested by David Haussler
Usage      : &write_parf_files(@ARGV);
Version    : 2.4

write_mprf_files

Download write_mprf_files .pl
Keywords   : mean_position_rank_file, mean_rank_position_file
Usage      : &write_mprf_files(@files);
Version    : 1.1

write_evss_files

Download write_evss_files .pl
Author     : jong@salt2.med.harvard.edu
Function   : This produces EVSS file(Error VS Score) from PARF file
Keywords   : get_score_vs_error_from_parf_files.pl
Options    : 
  d=$query_number    for dividing the errors by all the query number
   $negate_score=n by n -n  # to make sign change for PSI scores
   $error_per_query=q by q -q # divide the error by query number
   $log_of_errors=l by l -l
   $log_of_evalue_or_score=e by e -e
   $get_log_base_10=t by t -t

Usage      : @files_produced=@{write_evss_files(\@files)};
Version    : 1.5

write_fasta

Download write_fasta .pl
Argument   : 
   $sort_seq_names=s by s  ## write the sequences sorted by name
   $write_rv_seq_as_well=R by R  # write reverse seq as well as forward seq
Example    : &write_fasta(\%in1, \$out_file_name, \%in2, \%in3,..., );
             << The order of the hash and scalar ref. doesn't matter. >>
Function   : writes multiple seqs. in fasta format (takes one or more seq.!!)
             This needs hash which have 'name' 'actual sequence as value'

             To print out each fasta seq into each single file, use write_fasta_seq_by_seq
             This can rename seq names

Keywords   : write_fasta_file, print_fasta_file, write fasta file, fasta_write
             show_fasta, write_sequence_fasta, write_fasta_files,
Options    : v for STD out.
             r for renaming the sequences so that Clustalw does not complain about its 10 char limit
               so the result would be:  0 ->ASDFASDF, 1->ASDFASFASF, 2->ADSFASDFA
       $write_pure_seq_only=o by o -o  ## writing only the seq (no gap chars or space)
Usage      : many arguments:  $seq_hash_reference  and $output_file_name
             takes a hash which has got names keys and sequences values.
Version    : 3.0
Warning    : The default output file name is 'default_out.fa' if you do not
             specify output file name.
             OUTput file should have xxxxx.fa or xxxx.any_ext NOT just 'xxxxx'
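Sketch     : A minimal illustration of writing a name/sequence hash in fasta
             format, not the library code. The names fasta_text_sketch and
             write_fasta_sketch, the 60 column line width and the sorting of
             names are assumptions; only the default file name
             'default_out.fa' comes from the entry above.

```perl
use strict;
use warnings;

# Hypothetical helper: formats a hash of name => sequence as fasta text,
# names sorted, sequence lines wrapped at $width characters.
sub fasta_text_sketch {
    my ($seqs, $width) = @_;
    $width ||= 60;
    my $out = '';
    for my $name (sort keys %$seqs) {
        $out .= ">$name\n";
        my $seq = $seqs->{$name};
        # 4-argument substr cuts the first $width chars off $seq and returns them
        $out .= substr($seq, 0, $width, '') . "\n" while length($seq);
    }
    return $out;
}

# Hypothetical writer: falls back to 'default_out.fa' when no name is given.
sub write_fasta_sketch {
    my ($seqs, $file) = @_;
    $file = 'default_out.fa' unless defined $file;
    open(my $fh, '>', $file) or die "Cannot write $file: $!";
    print {$fh} fasta_text_sketch($seqs);
    close($fh) or die "Cannot close $file: $!";
    return $file;
}

print fasta_text_sketch({ seq1 => 'ASDFASDF', seq2 => 'KLJKLJ' });
```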

write_fasta_seq_by_seq

Download write_fasta_seq_by_seq .pl
Example    : with >xxxx
                  ASDFASDFASDFASDFASDFASDFASDF
                  >yyyy
                  ASDFASDFASDFASDFASDFASDFSDAFSD

             You will get two files (xxxx.fa, yyyy.fa)
Function   : accepts one hash of multiple sequences and writes many files
             of single sequences by using the names as file names.
             If $extension is provided, it names the output accordingly
             (as in seq1_sc.fasta). If not, it just attaches
             'fa' to the file names.
             This needs, hash of 'name', 'actual sequence as value'
Keywords   : write_each_fasta, write_single_fasta, write_fasta_single
             single_fasta_write, write_fasta_files_seq_by_seq,
             write_single_fasta_files,
Options    : can specify extension name.
             e  for checking whether the fasta file already exists and skipping it if so
             r for renaming the sequences so that Clustalw does not complain about its 10 char limit
               so the result would be:  0 ->ASDFASDF, 1->ASDFASFASF, 2->ADSFASDFA
   $write_rv_seq_as_well=R by R  # write reverse seq as well as forward seq
   $extension= by E=
Returns    : nothing. default OUTPUT file name is '$key.fa' !!
Usage      : &write_fasta_seq_by_seq(\%hash, [$extension], [\$output_filename]);
Version    : 2.1

write_rdif_files

Download write_rdif_files .pl
Version    : 1.0

write_rdif2_files

Download write_rdif2_files .pl
Usage      : &write_rdif2_files(@files);
Version    : 1.2

write_ardf_files

Download write_ardf_files .pl
Usage      : &write_ardf_files(@files);
Version    : 1.2

write_nhco_files

Download write_nhco_files .pl
Author     : jpark@rascal.med.harvard.edu, NHCO stands for Nomolog, Homolog Column Output file
Function   : writes nhco files with each class(4 of them) file nhco as well.
Keywords   : write homology column file, write_nomol_homol_column_files,
             write_homol_col_files
Version    : 1.5

write_c3ss_files

Download write_c3ss_files .pl
Author     : jong@salt2.med.harvard.edu
Version    : 1.2

write_pred_files

Download write_pred_files .pl
Example    : &write_pred_files(\%gapped_av_for_back_pred, $final_output_pred_name,
                 $graphical_rep_of_str, "$put_reliability_line");

Keywords   : write_predator_short_out_file, write_pred_file, write_prd_file
Options    : 
   $put_reliability_line=r by r
   $omit_coil_region=c by c
   $protein_name= by n=
   $graphical_rep_of_str=g by g
   $show_on_screen_only=s by s
   $seq_block_size= by b=
Version    : 1.4

write_prdl_files

Download write_prdl_files .pl
Keywords   : write_predator_long_out_file
Options    : 
   $protein_name= by n=
Version    : 1.1

show_in_fasta

Download show_in_fasta .pl
Example    : &show_in_fasta(\%hash);
Function   : shows multiple seqs. in fasta format (takes one or more seq.!!)
Usage      : &show_in_fasta(\%in1, \%in2, \%in3, .... );
             takes a hash which has got names keys and sequences values.
             uses Perl5 pointers(references).
Version    : 1.0

One_To_Three_Letter

Download One_To_Three_Letter .pl
Function   : a hash of one letter to 3 letter amino acid code , returns a hash
Keywords   : 1_to_3
Usage      : %one_letter  = %{&One_To_Three_Letter};   # takes no arguments (void).
Version    : 1.0

ONE_TO_THREE_LETTER

Download ONE_TO_THREE_LETTER .pl
Function   : a hash of one letter to 3 letter amino acid code , returns a hash
Keywords   : 1_to_3, convert_1_to_3_letter
Usage      : %one_letter  = %{&ONE_TO_THREE_LETTER };   # takes no arguments (void).
Version    : 1.0

one_to_three_letter

Download one_to_three_letter .pl
Function   : a hash of one letter to 3 letter amino acid code , returns a hash
Usage      : %one_letter  = %{&one_to_three_letter};   # takes no arguments (void).
Version    : 1.0

three_to_one_letter

Download three_to_one_letter .pl
Function   : a hash of 3 letter to one letter amino acid code , returns a hash
Keywords   : 321, 3to1 3_to_1 THREE_TO_ONE_LETTER Three_To_One_Letter
             convert_3_to_1, convert_3_to_1_aa_name
Usage      : %three_letter  = &three_to_one_letter ;   # takes no arguments (void).
Version    : 1.1

convert_3_to_1_letter

Download convert_3_to_1_letter .pl
Function   : a hash of 3 letter to one letter amino acid code , returns a hash
Keywords   : 321, 3to1 3_to_1 THREE_TO_ONE_LETTER Three_To_One_Letter
             convert_3_to_1, convert_3_to_1_aa_name
Usage      : %three_letter  = &three_to_one_letter ;   # takes no arguments (void).
Version    : 1.1

convert_1_to_3_letter

Download convert_1_to_3_letter .pl
Function   : a hash of one letter to 3 letter amino acid code , returns a hash
Keywords   : 123, 1to3 1_to_3 one_TO_three_LETTER One_To_Three_Letter
             convert_1_to_3, convert_1_to_3_aa_name
Usage      : %one_letter  = &convert_1_to_3_letter ;   # takes no arguments (void).
Version    : 1.1
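Sketch     : What such a lookup table could look like, assuming only the 20
             standard amino acids (the library's own tables may hold more
             codes). The names one_to_three_sketch and three_to_one_sketch
             are hypothetical.

```perl
use strict;
use warnings;

# One-letter to three-letter codes for the 20 standard amino acids.
my %one_to_three = (
    A => 'ALA', R => 'ARG', N => 'ASN', D => 'ASP', C => 'CYS',
    Q => 'GLN', E => 'GLU', G => 'GLY', H => 'HIS', I => 'ILE',
    L => 'LEU', K => 'LYS', M => 'MET', F => 'PHE', P => 'PRO',
    S => 'SER', T => 'THR', W => 'TRP', Y => 'TYR', V => 'VAL',
);

# Swapping keys and values gives the three-letter to one-letter direction.
my %three_to_one = reverse %one_to_three;

# Both helpers take no arguments and return a hash ref.
sub one_to_three_sketch { return \%one_to_three }
sub three_to_one_sketch { return \%three_to_one }

my %aa = %{ one_to_three_sketch() };
print "M is $aa{M}\n";
```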

amino_acid_compos_id_percent

Download amino_acid_compos_id_percent .pl
Argument   : hash of at least 2 sequences.
Function   : gets amino acid composition identity of any given
             number of sequences(at least 2).
Keywords   : get_amino_acid_composition, get_protein_composition, composition
Usage      : $percent = &amino_acid_compos_id_percent (%any_hash_with_sequences);
             The way identity(composition) is derived is;

Version    : 1.1

seq_id_percent_array

Download seq_id_percent_array .pl
Function   : produces amino acid composition identity of any given number of sequences.
Keywords   : get_percent_composition_identity, seq_composition_identity,
             percent_sequence_composition_id

Usage      : $percent = &seq_id_percent_array(@any_array_sequences);
             The way identity(pairwise) is derived is;

Version    : 1.0
Warning    : This can handle 'common gaps' in the sequences

compos_id_percent_array

Download compos_id_percent_array .pl
Function   : produces amino acid composition identity of any given number of sequences.
Usage      : $percent = &compos_id_percent_array(@any_array_sequences);
             The way identity(composition) is derived is;
Version    : 1.0

compos_id_percent_hash

Download compos_id_percent_hash .pl
Function   : gets amino acid composition identity of any given number of sequences.
Keywords   : get_amino_acid_composition
Usage      : $percent = &compos_id_percent_hash(%any_hash_with_sequences);
             The way identity(composition) is derived is;

Version    : 1.0

common_compos_id_hash

Download common_compos_id_hash .pl
Argument   : two references of hash of sequences.
Example    : ('A', 200, 'C', 191, 'D', 99)
                  ('A', 290, 'C', 199, 'D', 100)
             uses only two sequences.
Function   : actual calculation of identity
Returns    : ref. of a scalar (in percent)  eg)  95
Usage      : %hash = &common_compos_hash(\%any_hash1, \%any_hash1);
Version    : 1.0

calc_compos_id_hash

Download calc_compos_id_hash .pl
Argument   : two references of hash of sequences.
Example    : ('A', 200, 'C', 191, 'D', 99)
                  ('A', 290, 'C', 199, 'D', 100)
             uses only two sequences.
Function   : actual calculation of identity
Returns    : ref. of a scalar (in percent)  eg)  95
Usage      : %hash = &calc_compos_hash(\%any_hash1, \%any_hash1);
Version    : 1.0

get_percentage

Download get_percentage .pl
Argument   : ref. for Scalar string or Array of chars or Hash  AND 'the target char'
Example    : if the string is  'seq  ABCDEEEEEFFEFE' given in a hash
             if you pass 'A' as one argument, it counts the occurrences of 'A'
             and gets the percentage of it.
Function   : calculates the percentage content of any single char over the whole
             length of strings in it.
Keywords   : get_percentage_of_char
Options    : None yet.
Returns    : Numerical Percentage
Usage      : %out= %{&get_percentage(\%result, '1')};
Version    : 1.0
Warning    : This converts array and string input as ref. into arbitrary hash and
             returns hash
             programmed by A Biomatic
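Sketch     : The calculation described above, reduced to the hash input case
             and shown only as an illustration. The name
             char_percentage_sketch is hypothetical; array and scalar inputs
             are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper: for each name => string pair, returns the percentage
# of positions occupied by $char (a single character), as a hash ref.
sub char_percentage_sketch {
    my ($seqs, $char) = @_;
    my %percent;
    for my $name (keys %$seqs) {
        my $seq   = $seqs->{$name};
        my $count = () = $seq =~ /\Q$char\E/g;    # number of matches
        $percent{$name} = length($seq) ? 100 * $count / length($seq) : 0;
    }
    return \%percent;
}

my %pct = %{ char_percentage_sketch({ seq => 'ABCDEEEEEFFEFE' }, 'E') };
printf "%s %.1f\n", $_, $pct{$_} for sort keys %pct;    # prints: seq 50.0
```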

pairwise_percent_id

Download pairwise_percent_id .pl
Function   : takes a ref. of a hash of names and sequences, returns
             percent identity.
Usage      : $identity = ${&pairwise_percent_id(%arrayinput)};

Version    : 1.0

get_seq_identity

Download get_seq_identity .pl
Argument   : hash(es) of sequences.
Function   : takes a ref. of a hash of names and sequences, returns
             percent identity. NOT composition identity.
Keywords   : get_sequence_identity
Usage      : $identity = ${&get_seq_identity(%arrayinput)};

Version    : 1.0

get_correct_percent_alignment_rate

Download get_correct_percent_alignment_rate .pl
Argument   : two sequence files which have identical sequence names.
Function   : accepts two files and prints out the sequence identities of the alignment.
Options    : h  # for help
             v  # for verbose printouts(prints actual sequences)
Returns    : reference of Scalar for percentage correct alignment(for already
             aligned sequences)
Usage      : &get_correct_percent_alignment_rate(\$file1, \$file2);
Warning    : Alpha version,  A Biomatic , made for Bissan

amino_acid_compos_id_percent_trend

Download amino_acid_compos_id_percent_trend .pl
Version    : 1.0

composition_table

Download composition_table .pl
Function   : returns a table of characters with their occurrence counts.
             can handle any char, this converts char to upper case.
Returns    : %hash1 = ('A',3, 'C',2, 'D',1, 'Q',2, 'S',1), %hash2,,,
Usage      : %output = %{&compos_table(@input_array1, @input_array2,,,,)};
             example input

Warning    : converts all SMALL letters to Capital letters before counting!!
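Sketch     : A minimal way to build such a count table, assuming the inputs
             are plain strings and that whitespace is not counted. The name
             composition_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: counts every non-space character over all input
# strings, after converting to upper case. Returns a hash ref.
sub composition_sketch {
    my @seqs = @_;
    my %count;
    for my $seq (@seqs) {
        $count{ uc $_ }++ for grep { /\S/ } split //, $seq;
    }
    return \%count;
}

my %table = %{ composition_sketch('aAcQ', 'ACDQS') };
print "$_ $table{$_}\n" for sort keys %table;
```

             With these two inputs the counts match the Returns example
             above: A 3, C 2, D 1, Q 2, S 1.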

common_compos_2_hash

Download common_compos_2_hash .pl
Argument   : two references of hash of sequences.
Example    : common gaps means only '.' (dots, not alphabets!!)
             AAA....BBCB
             AAAB..B.BCC  --> A.A.....BC. (as in an array)
             A.AAA...BCA
Returns    : a hash (string1, number1, string2, number2, string3, number3, ...)
Usage      : %hash = &common_compos_hash(\%any_hash1, \%any_hash1);

pair_percent_id_trend

Download pair_percent_id_trend .pl
Example    : common gaps means only '.' (dots, not alphabets!!)
             AAA....BBCB
             AAAB..B.BCC  --> A.A.....BC. (as in an array)
             A.AAA...BCA
             The resulting array XXXXX..XXXX is literally like so.
             This is to detect absurd gaps in the above.


Usage      : @array = &pair_percent_id_trend (%arrayinput);

smaller_one

Download smaller_one .pl
Example    : will return   5   with  &smaller_one(5, 50);
Function   : gets smaller value of the two inputs
Usage      : $smaller = &smaller_one($var, $var2);
Warning    : accepts only numeric values!!

count_num_of_char

Download count_num_of_char .pl
Function   : takes only ARRAY and counts the number of char. Each elem should be
             a single char.
Usage      : $num_char = &count_num_of_char(@input_array_of_single_char);

remov_com_column2

Download remov_com_column2 .pl
Argument   : accepts reference for a hash.
Example    : seq1  ABCDE------DDD         seq1  ABCDE--DDD
             seq2  ABCDEE-----DD-  ==>    seq2  ABCDEE-DD-
             seq3  ---DEE----DDE-         seq3  ---DEEDDE-
                         ^^^^
             from above the 4 columns of gap will be removed
             To remove absurd gaps in multiple sequence alignment
Returns    : a ref. of a hash.
Usage      : %new_string = %{&remov_com_column2(\%input_hash)};
Version    : 1.0

get_common_column

Download get_common_column .pl
Argument   : 2 or more ref for hash of identical keys and value length.
             One optional arg for replacing space char to the given one.
Author     : jong@salt2.med.harvard.edu
Class      : get_common_column, get_common_column_in_seq, get common column in sequence
             for secondary structure only representation.
Example    : %out =%{&get_common_column(\%hash1, \%hash2, '-')};
             output> with 'E' option >>> "name1     --HHH--1232-"
   Following input will give;
  %hash1 = ('s1', '--EHH-CHHEE----EHH--HHEE----EHH--HHEE----EHH-CHHEE--');
  %hash2 = ('s2', '--EEH-CHHEE----EEH-CHHEE----EEH-CHHEE----EEH-CHHEE--');
  %hash3 = ('s3', '-KEEH-CHHEE-XX-EEH-CHHEE----EEH-CHHEE----EEH-CHHEE--');
  %hash4 = ('s4', '-TESH-CHEEE-XX-EEH-CHHEE----EEH-CHHEE----EEH-CHHEE--');

    s1_s2_s3_s4    --E-H-CH-EE----E-H--HHEE----E-H--HHEE----E-H-CHHEE--

Function   : (name1         --EHH--HHEE-- )
             (name2         --HHH--EEEE-- ) ==> result is;

             (name1_name2   -- HH--  EE-- )
             to get the identical chars in hash strings of sequences.

Keywords   : Overlap, superpose hash, overlay identical chars, superpose_seq_hash
             get_common_column, get_com_column, get_common_sequence,
             get_common_seq_region, multiply_seq_hash, get_common_column_in_sequence
Returns    : one hash ref. of the combined key name (i.e., name1_name2). Combined by '_'
Usage      : %out =%{&get_common_column(\%hash1, \%hash2, '-')};
Version    : 1.6
Warning    : This accepts 2 or more hashes.
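Sketch     : The column comparison described above, written out as a small
             illustration. The name common_column_sketch is hypothetical and
             the strings are assumed to have equal length.

```perl
use strict;
use warnings;

# Hypothetical helper: takes 2 or more hash refs (name => string) and an
# optional trailing fill character (default: a space). A character is kept
# only where every string agrees at that position. Returns a ref. to a
# one-entry hash keyed by the names joined with '_'.
sub common_column_sketch {
    my @args = @_;
    my $fill = ref($args[-1]) ? ' ' : pop @args;
    my (@names, @seqs, %out);
    for my $h (@args) {
        for my $name (sort keys %$h) {
            push @names, $name;
            push @seqs,  $h->{$name};
        }
    }
    return \%out unless @seqs;
    my $common = '';
    for my $i (0 .. length($seqs[0]) - 1) {
        my $c = substr($seqs[0], $i, 1);
        # count the strings that disagree with the first one at column $i
        my $differ = grep { $i >= length($_) || substr($_, $i, 1) ne $c } @seqs;
        $common .= $differ ? $fill : $c;
    }
    $out{ join('_', @names) } = $common;
    return \%out;
}

my %both = %{ common_column_sketch({ name1 => '--EHH--HHEE--' },
                                   { name2 => '--HHH--EEEE--' }) };
print "$_ [$both{$_}]\n" for keys %both;
```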


overlay_seq_for_identical_chars

Download overlay_seq_for_identical_chars .pl
Argument   : 2 ref for hash of identical keys and value length. One optional arg for
             replacing space char to the given one.
Example    : %out =%{&overlay_seq_for_identical_chars(\%hash1, \%hash2, '-')};
             output> with 'E' option >>> "name1     --HHH--1232-"
Function   : (name1         --EHH--HHEE-- )
             (name2         --HHH--EEEE-- ) ==> result is;

             (name1_name2   -- HH--  EE-- )
             to get the identical chars in hash strings of sequences.

Keywords   : Overlap, superpose hash, overlay identical chars, superpose_seq_hash
Returns    : one hash ref. of the combined key name (i.e., name1_name2). Combined by '_'
Usage      : %out =%{&overlay_seq_for_identical_chars(\%hash1, \%hash2, '-')};
Version    : 1.0
Warning    : Works only for 2 sequence hashes.

remov_com_column

Download remov_com_column .pl
Argument   : accepts reference for hash(es) and array(s).
Function   : removes common gap column in seq.
Keywords   : remove_com_column, remove_common_column,
             remove_common_gap_column, remov_common_gap_column,
             remove com column
Returns    : a ref. of  hash(es) and array(s).

             name1   ABCDE....DDD       name1  ABCDE..DDD
             name2   ABCDEE..DD..  -->  name2  ABCDEEDD..
             name3   ...DEE..DDE.       name3  ...DEEDDE.

             (ABC....CD, ABCD...EE) --> (ABC.CD, ABCDEE)
             from above the two column of dot will be removed
             To remove absurd gaps in multiple sequence alignment. for nt6-hmm.pl
Usage      : %new_string = %{&remov_com_column(\%hashinput)};
             @out=@{&remov_com_column(\@array3)};
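Sketch     : How the common gap columns can be found and dropped for the hash
             input case, as an illustration only. The name
             remove_common_gaps_sketch is hypothetical, and all strings are
             assumed to have the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every column in which all sequences carry
# the gap character (default '.'). Returns a ref. to a new hash.
sub remove_common_gaps_sketch {
    my ($seqs, $gap) = @_;
    $gap = '.' unless defined $gap;
    my @names = keys %$seqs;
    my %out;
    return \%out unless @names;
    my $len = length $seqs->{ $names[0] };
    my @keep;
    for my $i (0 .. $len - 1) {
        # keep the column if at least one sequence has a non-gap char there
        my $non_gap = grep { substr($seqs->{$_}, $i, 1) ne $gap } @names;
        push @keep, $i if $non_gap;
    }
    for my $name (@names) {
        $out{$name} = join '', map { substr($seqs->{$name}, $_, 1) } @keep;
    }
    return \%out;
}

my %trimmed = %{ remove_common_gaps_sketch({
    name1 => 'ABCDE....DDD',
    name2 => 'ABCDEE..DD..',
    name3 => '...DEE..DDE.',
}) };
print "$_  $trimmed{$_}\n" for sort keys %trimmed;
```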

remov_common_gap

Download remov_common_gap .pl
Example    : XXX...XXX with AAA.....BBBB, The common positions of 3,4,5 deleted
             XXX...XXX will be removed in AAA.....BBBB --> AAA..BBBB
             XXX...XXX is an @array, while AAA.....BBBB is a value of the input hash
Function   : XXX...XXX, and an hash input. removes all the common gap(dots) in targets.
Usage      : %result = &remov_common_gap (*common_pos_arr, *target_hash_of_sequence);
Version    : 1.0

com_gap_pos_hash

Download com_gap_pos_hash .pl
Argument   : gets a ref. of a hash of sequences
Example    : common gaps means only '.' (dots, not alphabets!!)
             AAA....BBBB
             AABB....BBC  --> XXXXX..XXXX (as in an array)
             ..AAA...BCA
             This is to detect absurd gaps in the above.
Function   : returns X...XXXX, as an array. '.' means common elements.
Keywords   : common_gap_pos_hash
Usage      : @array = @{&com_gap_pos_hash(%arrayinput)};
Version    : 1.0

pairwise_iden_pos

Download pairwise_iden_pos .pl
Example    : common gaps means only '.' (dots, not alphabets!!)
             AAA....BBCB
             AAAB..B.BCC  --> A.A.....BC. (as in an array)
             A.AAA...BCA
             The resulting array XXXXX..XXXX is literally like so.
             This is to detect absurd gaps in the above.
Usage      : @array = &pairwise_iden_pos(%arrayinput);
Version    : 1.0

open_pdb_files

Download open_pdb_files .pl
Argument   : one ref. for an inputfile (absolute
             >>> PDB example >>>
             SEQRES   1 A  284  MET ASP ALA ILE LYS LYS LYS MET GLN MET LEU LYS LEU  2TMA  51
             SEQRES   2 A  284  ASP LYS GLU ASN ALA LEU ASP ARG ALA GLU GLN ALA GLU  2TMA  52

Function   : Convert a PDB structure file to FASTA format sequences.
Keywords   : read_pdb_files, read pdb files, open pdb files
Returns    : One ref. for a hash of sequences (DNA, RNA, or protein, in different chains).
             If two chains are identical, it drops one of them and returns
             a name without the chain note-->  2tma, not 2tmaA and 2tmaB
Usage      : %out = %{&open_pdb_files(\$VAR)};
Version    : 1.7
Warning    : (read the sequences only)
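Sketch     : How SEQRES lines like the ones above can be turned into one
             letter sequences, as an illustration only. The name
             seqres_to_seq_sketch is hypothetical; the result is keyed by
             chain ID alone, identical chains are not merged, and only the
             20 standard amino acids are translated.

```perl
use strict;
use warnings;

# Three-letter to one-letter codes for the 20 standard amino acids.
my %aa3_to_1 = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Hypothetical helper: collects SEQRES residues per chain and returns a
# ref. to a hash of chain ID => one-letter sequence. Unknown residue names
# become 'X'. Uses the fixed-width PDB layout: chain ID in column 12,
# residue names in columns 20-70.
sub seqres_to_seq_sketch {
    my @lines = @_;
    my %seq;
    for my $line (@lines) {
        next unless $line =~ /^SEQRES/ && length($line) > 19;
        my $chain = substr($line, 11, 1);
        for my $res (split ' ', substr($line, 19, 51)) {
            $seq{$chain} .= exists $aa3_to_1{$res} ? $aa3_to_1{$res} : 'X';
        }
    }
    return \%seq;
}

# Build a correctly spaced SEQRES line for the demonstration.
my $demo = sprintf("SEQRES %3d %1s %4d  %s", 1, 'A', 284, 'MET ASP ALA ILE LYS');
print seqres_to_seq_sketch($demo)->{A}, "\n";    # prints: MDAIK
```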

open_brk_files

Download open_brk_files .pl
Argument   : one ref. for an inputfile (absolute
             >>> PDB example >>>

             SEQRES   1 A  284  MET ASP ALA ILE LYS LYS LYS MET GLN MET LEU LYS LEU  2TMA  51
             SEQRES   2 A  284  ASP LYS GLU ASN ALA LEU ASP ARG ALA GLU GLN ALA GLU  2TMA  52
             SEQRES   3 A  284  ALA ASP LYS LYS ALA ALA GLU ASP ARG SER LYS GLN LEU  2TMA  53

Function   : Convert a PDB structure file to FASTA format sequences.
Returns    : One ref. for a hash of sequences (DNA, RNA, or protein, in different chains).
             If two chains are identical, it drops one of them and returns
             a name without the chain note-->  2tma, not 2tmaA and 2tmaB
Usage      : %out = %{&open_brk_files(\$VAR)};

open_msf_jp_files

Download open_msf_jp_files .pl
Function   : makes two hashes from  ...msf and ..jp files. %array1 is for msf
Usage      : &open_msf_jp_files($file1, $file2);
Warning    : !!! not very general, better not to use.
             msf file is meant to be seq
             jp file is meant to be structural alignment (correct seq

             msf format is

             cofi_human  ATFVKM
             ici2_horvu  RVRLFVDKLD NIA
             ici3_horvu  RVRLFVDRLD NIA

             jp format is;

             ycah_ecoli  RNVEIV----VID-GVRRFGNIA
             icis_vicfa  RVRLYVDESNKVV-RAAPIGNIA
             ier1_lyces  RVRLFVNLLDIVV-QTPKVGNIA

scoring

Download scoring .pl
Warning    : not general, !!!

sort_files_by_time

Download sort_files_by_time .pl
Function   : sorts files by creation time, oldest first.
Keywords   : sort_by_time, sort_files_chronically
Options    : _  for debugging.
             #  for debugging.
Usage      : @files = @{&sort_files_by_time(\@files)};
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
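Sketch     : One way to order files by age, shown as an illustration only. It
             assumes the modification time from stat() as a stand-in for
             'creation time', which may differ from what the library reads.
             The name sort_files_by_mtime_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a ref. to the file names ordered oldest
# first by modification time. Names that cannot be stat'ed are dropped.
sub sort_files_by_mtime_sketch {
    my ($files) = @_;
    my %mtime;
    for my $file (@$files) {
        my @st = stat $file;
        $mtime{$file} = $st[9] if @st;    # element 9 of stat() is mtime
    }
    my @sorted = sort { $mtime{$a} <=> $mtime{$b} or $a cmp $b } keys %mtime;
    return \@sorted;
}

my @by_age = @{ sort_files_by_mtime_sketch([ glob('*') ]) };
print "$_\n" for @by_age;
```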

sort_hash_by_value_and_make_array

Download sort_hash_by_value_and_make_array .pl
Function   : sorts any hash by its values and returns ref. of sorted hash values
             with keys attached. So, if the input key value were
             key1 value1, the result will be an element 'value1 key1' as
             a string
Keywords   : sort_hash_by_value, sort_hash, sort_by_values,
Options    : -n  for numerical sort(not working yet)
Usage      : @values_sorted =@{&sort_hash_by_value_and_make_array(\%assoc)};
Version    : 1.1
Warning    : Entries with the same value overwrite each other.
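Sketch     : The 'value key' output described above, written as a small
             illustration. The name sort_by_value_sketch is hypothetical, the
             comparison is by string (as the -n option is not working yet),
             and unlike the routine above this sketch keeps entries that
             share a value.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts a hash by value (string order, ties broken by
# key) and returns a ref. to an array of 'value key' strings.
sub sort_by_value_sketch {
    my ($h) = @_;
    my @sorted = map  { $h->{$_} . ' ' . $_ }
                 sort { $h->{$a} cmp $h->{$b} or $a cmp $b } keys %$h;
    return \@sorted;
}

my %assoc = (key1 => 'pear', key2 => 'apple', key3 => 'fig');
print "$_\n" for @{ sort_by_value_sketch(\%assoc) };
```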

sort_by_hash_values

Download sort_by_hash_values .pl
Function   : sorts any hash by its values and returns ref. of sorted hash values
Keywords   : sort_hash_by_value, sort_hash, sort_by_values, sort_by_value
Usage      : @values_sorted =@{sort_by_hash_values(\%assoc)};
Version    : 1.1
Warning    : Entries with the same value overwrite each other.

sort_by_keys

Download sort_by_keys .pl
Function   : sorts any hash by its keys and returns ref. of sorted hash values
Keywords   : sort_hash_by_keys, sort_hash, key_sort
Usage      : @values_sorted =@{sort_by_keys(\%assoc)};
Version    : 1.0

sort_hash_by_keys

Download sort_hash_by_keys .pl
Function   : sorts any hash by its keys and returns ref. of sorted hash values
Keywords   : sort_hash_by_keys, sort_hash, key_sort
Usage      : @values_sorted =@{sort_hash_by_keys(\%assoc)};
Version    : 1.0

sort_hash_by_value

Download sort_hash_by_value .pl
Function   : sorts any hash by its values and returns ref. of sorted hash values
Keywords   : sort_hash_by_value, sort_hash, value_sort,
Usage      : @values_sorted =@{sort_hash_by_value(\%assoc)};
Version    : 1.0

by_values

Download by_values .pl
Usage      : for $key(sort by_values(values %assoc)){print $assoc{$key},"\n";}
Version    : 1.0

sort_string_by_length

Download sort_string_by_length .pl
Function   : sorts strings in an array according to their lengths,
             longest first.
Keywords   : sort_array_by_length, sort_str_by_length, sort_array_string_by
             sort_string_by_leng, sort_by_length, sort_by_leng,
             sort_array_by_string_length, sort_array_elements_by_string_length
Options    : -r  reverse the order
Usage      : @output = @{&sort_string_by_length(@any_input_strings, [-r], @more)};
Version    : 1.2
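Sketch     : A minimal length sort, for illustration. The name
             sort_by_length_sketch and its calling convention (an array ref
             plus a true/false reverse flag instead of the -r option) are
             assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a ref. to the strings sorted longest first;
# strings of equal length are ordered alphabetically. A true second
# argument reverses the result.
sub sort_by_length_sketch {
    my ($strings, $reverse) = @_;
    my @sorted = sort { length($b) <=> length($a) or $a cmp $b } @$strings;
    @sorted = reverse @sorted if $reverse;
    return \@sorted;
}

print "$_\n" for @{ sort_by_length_sketch([qw(ab a abcd abc)]) };
```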

get_host_by_addr

Download get_host_by_addr .pl
Example    : ($name,$aliases,$addrtype,$length,@addrs)=&get_host_by_addr($var); while $var = "13.13.12.12";
Keywords   : get_host_by_address, get_hostname_by_address
Usage      : ($name,$aliases,$addrtype,$length,@addrs)=&get_host_by_addr('131.111.137.11'); or
Version    : 1.0

hostname

Download hostname .pl
Keywords   : get_hostname
Version    : 1.0

get_host_by_name

Download get_host_by_name .pl
Example    : ($name,$aliases,$addrtype,$length,@addrs)=&get_host_by_name($var);
             while $var = "ind4";
Usage      : ($name,$aliases,$addrtype,$length,@addrs)=&get_host_by_name('ind4'); or
Version    : 1.0
Warning    : ! not working yet.

word_wrap

Download word_wrap .pl
Returns    : The string with newlines replacing spaces in appropriate places.
Usage      : &word_wrap($line_to_format)
Version    : 1.0
Warning    : The following subroutine does word wrapping on a text string
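Sketch     : A simple greedy word wrap, for illustration only. The name
             word_wrap_sketch, the explicit $width argument and its default
             of 72 are assumptions; the entry above does not state the line
             width the library uses.

```perl
use strict;
use warnings;

# Hypothetical helper: splits the text on whitespace and refills lines so
# that no line exceeds $width, unless a single word is longer than that.
# Returns the text with newlines in place of the chosen spaces.
sub word_wrap_sketch {
    my ($text, $width) = @_;
    $width ||= 72;
    my @lines = ('');
    for my $word (split ' ', $text) {
        if (length($lines[-1]) && length($lines[-1]) + 1 + length($word) > $width) {
            push @lines, $word;              # start a new line
        } else {
            $lines[-1] .= (length($lines[-1]) ? ' ' : '') . $word;
        }
    }
    return join "\n", @lines;
}

print word_wrap_sketch('the quick brown fox jumps over the lazy dog', 20), "\n";
```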

show_array

Download show_array .pl
Example    : Output:      item1
             Output:      item2
             Output:      item3
Function   : for debugging purpose. Shows any array elem line by line.
Options    : -h  for horizontal display of elements
             c   for compact (do not put new line between array chunk)
             s   for putting new line between arrays
Usage      : &show_array(\@input_array);
Version    : 2.4
Warning    : can handle scalar ref, too.

array_most_occur

Download array_most_occur .pl
Argument   : \@array
Keywords   : median_array, get_median_array, get_array_median, array_median
Returns    : \$median
Usage      : $median = ${&array_most_occur(\@array)};
Version    : 1.0

array_least_occur

Download array_least_occur .pl
Argument   : \@array
Keywords   : median_array, get_median_array, get_array_median, array_median
Returns    : \$median
Usage      : $median = ${&array_least_occur(\@array)};
Version    : 1.0

show_hash

Download show_hash .pl
Example    : Output:      item1
             Output:      item2
             Output:      item3
Function   : for debugging purpose. Shows any array elem line by line.
             the line is 60 elements long (uses recursion)
Options    : -s or -S or s or S for spaced output. Eg)
             seq1       1 1 1 1 1 1 1 1 1 1 1 1

             instead of
             seq1       111111111111

             -h or -H or h or H for horizontal line of '---------...'

Usage      : &show_hash(\@input_array);
Version    : 1.7
Warning    : There is a global variable:  $show_hash_option
             It tries to detect any given string which is joined by ','

open_predator_files

Download open_predator_files .pl
Example    : 
  There are 2 types of output.  The short output:

  > MOZ_HUMAN_part
                .         .         .         .         .
  1    LDHKTLYYDVEPFLFYVLTQNDVKGCHLVGYFSKEKHCQQKYNVSCIMIL   50
       ___EEEEEE__HHHHHHH_______EEE____________EEEEEEEEE_

  The long output (-l option):
  NAME MOZ_HUMAN_part
  HEADER  |- Residue -|  Pred  Rel      NAli   Asn
  PRED    1    MET    M  c     0.000    0      ?
  PRED    2    ALA    A  c     0.000    0      ?

Function   : gets sec. str. prediction of predator and puts in hash
             If 's' option is given, it also gives sequence hash ref
             as the second output ref. This can handle the 2 types
             of output format of predator. So the output will
             differ according to the input.
Keywords   : open_prd_files, open_pred_files, predator, open_prdl_files
             open_pre_files, secondary structure prediction file
Options    : 's' for sequence output as well (\%sec_str, \%seq)
             'p' for percentage of the sec. str.
             'a' for accumulated percentage. This will
                  set 'p' automatically
             'n' for NO name when outputting percentage of chars with
                 HASH input to get_occurances_of_char sub.
      $reverse_residue_order=r by r
Version    : 1.8
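Reading the short output could be sketched as below. This covers only the short form, works on a text string rather than file names, and rests on the assumptions listed in the comment; read_predator_short_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch for the short output only. Assumes: a '>' line names the
# entry, a numbered line carries residues, and a line made only of H, E and '_'
# carries the prediction.
sub read_predator_short_sketch {
    my ($text) = @_;
    my (%sec_str, %seq, $name);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*>\s*(\S+)/) {
            $name = $1;
        }
        elsif (defined $name and $line =~ /^\s*\d+\s+([A-Za-z]+)\s+\d+\s*$/) {
            $seq{$name} .= $1;
        }
        elsif (defined $name and $line =~ /^\s*([HE_]+)\s*$/) {
            $sec_str{$name} .= $1;
        }
    }
    return (\%sec_str, \%seq);          # like the 's' option: both hash refs
}

my $example = <<'END';
  > MOZ_HUMAN_part
                .         .         .         .         .
  1    LDHKTLYYDVEPFLFYVLTQNDVKGCHLVGYFSKEKHCQQKYNVSCIMIL   50
       ___EEEEEE__HHHHHHH_______EEE____________EEEEEEEEE_
END
my ($sec, $seq) = read_predator_short_sketch($example);
print "$_ $sec->{$_}\n" for sort keys %$sec;
```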

open_phd_files

Download open_phd_files .pl
Argument   : one or more file names and options. Files should be PHD server's result.
Function   : open phd files and put sequences in hash(es). Run open_phd_files.pl to
             get some idea of how this works: type 'open_phd_files.pl xxx.phdo s' and
             it will produce 5 different hashes of secondary structure pred.
Options    : $secondary, $access, $PHD_sec, $Rel_sec, $prH_sec, $prE_sec, $prL_sec,
                  $SUB_sec, $P_3_acc, $PHD_acc, $Rel_acc, $SUB_acc
   $attach_class_info_in_seq_name=c by c ## this makes seq_name   seq_name_PHD_s
   $simple_seq_with_name_hash=s by s

Returns    : one or more hashes(ref.) secondary structure prediction of PHD server
             --- The PHD secondary server output which are read by open_phd_files -----
             1 =>       PHD sec |         HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH     HHHHHHH|
             2 =>       Rel sec |987544342178899999999987678999998478999999999995679771688999|
             3 =>       prH sec |001222323478899999999987778999998678999999999986110115788999
             4 =>       prE sec |000010000101000000000000010000000000000000000000000000010000
             5 =>       prL sec |987666565410000000000001110000001211000000000002789774100000
             6 =>       SUB sec |LLLL
             7 =>       P_3 acc |eeeeeeeeee bbeeebbbebbbbebeeee b bbebbebb eebeebe eee eebbeb|
             8 =>       PHD acc |988787787630066600060000606667515007007005760671847885760160
             9 =>       Rel acc |979685546222352421667053233245604127749164753790316552446141
             0 =>       SUB acc |eeeeeeeee
             types of PHD output, like 1 for 'PHD sec', 2 for 'Rel sec' etc.
Usage      : &open_phd_files(\$file_name, $options,,,,,);
Version    : 1.6
Warning    : All the spaces are converted to '_'

open_swissprot_seq_files

Download open_swissprot_seq_files .pl
Function   : open swiss files and puts ONLY the sequences in a hash(s)
Keywords   : open_swiss_seq_files, open_swiss_seq, read_swissprot_seq_files,
            read_swiss_seq, get_swissprot_seq, take_swissprot_seq,
Options    : 'v' for STDOUT printout as well.
Version    : 1.2
Warning    : ONLY the seq.

open_clu_files

Download open_clu_files .pl
Example    : Clu file eg)

  Cluster 7360103
    1  1 SLL1058         7-255       2   Origin: 3   736   Sub:3
    1  1 MJ0422          17-283      2   Origin: 3   736   Sub:3
    1  1 HI1308          3-245       2   Origin: 3   736   Sub:3

Keywords   : open_cluster_files,
Options    : _  for debugging.
             #  for debugging.
             b  for getting just names ($simple_clu_reading)
             r  for adding ranges in the names
             U  for making sequence names uppercase

Returns    : a ref of hash of $clus{"$clus_size\-$id"}.=$m."\n";
             Actual content:
             3-133 => 'HI00111 HI00222 MG1233 '
Usage      : %clus=%{&open_clu_files(\$input)};
Version    : 1.9
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
              This automatically converts lower to upper letters

open_msf_files

Download open_msf_files .pl
Argument   : (\$inputfile1, \$inputfile2, .... )
Function   : open msf files and put sequences in a hash(s)
Options    : 
   $no_gap_char_included=n by n  ## to remove gaps noted by '.'
   $reverse_seq=r by r
   $produce_seq_oder_info=o by o
Returns    : (*out, *out2)  or (@out_array_of_refs)
Usage      : (*out, *out2) = @{&open_msf_files(\$inputfile1, \$inputfile2)};
             : %hash_seq = %{&open_msf_files(\$inputfile1)};
             : (@out)        = @{&open_msf_files(\$inputfile1, \$inputfile2)};
             ---------- Example of MSF ---
             PileUp

             MSF:   85  Type: P    Check:  5063   ..

Version    : 1.7

open_hmmls_files

Download open_hmmls_files .pl
Function   : hmmls matches the full-length model to the target seq., while hmmfs
             matches fragments as well.
Options    : 
   t=$thresh  for bits score threshold
   e=$evalue_thresh  for E-value threshold
    r for adding ranges
    m for making MSP file format output
    E=Enquiry_name    for specifying enquiry seq name rather than 'HMM', the default
Usage      : %out=%{&open_hmmls_files(\@file)};
Version    : 1.5

open_hmmfs_files

Download open_hmmfs_files .pl
Options    : 
   "t=$thresh"  for bits score threshold
Usage      : %out=%{&open_hmmfs_files(\@file, "t=$thresh", $attch_ranges)};
Version    : 1.0

open_seq_files

Download open_seq_files .pl
Example    : %out = %{&open_seq_files(@ARGV)};
                    where @ARGV at the prompt was: 'pdb_40.seq'
             %seq=%{&open_seq_files(@ARGV, '1cgpa_140-197')};
                    to fetch 1cgpA, but in the range of 140-197 only
Function   : open seq files and put sequences in a hash
             seq sequence file format is like this;

 1l94   162 MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRTFRTGTWDAYK
 1lye   162 MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRTFRTGTWDAYK
 1lyj   162 MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRTFRTGTWDAYK
 1mngA  203 PYPFKLPDLGYPYEALEPHIDAKTMEIHHQKHHGAYVTNLNAALEKYPYLHGVLNWDVAEEFFKKA

             This can also return the sizes of sequences rather than seqs.
Keywords   : open_pdbs_files
Options    : any digit for the minimum seq length
Usage      : %seq=%{&open_seq_files($tim_seq_file, ['MJ0084'], [15] )};
             if you put additional seq name as MJ0084 it will
             fetch that sequence only in the database file.
             Any digit will be used as minimum seq size to be fetched.
Version    : 1.6
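The "name length sequence" layout lends itself to a line-by-line reader. The sketch below works on a text string, not a file; read_seq_lines_sketch and its argument order are assumptions, and the 'tiny' entry and the shortened 1l94 sequence are made-up test data.

```perl
use strict;
use warnings;

# Hypothetical sketch: each line is "name length sequence". $wanted and
# $min_len mirror the optional sequence-name and digit arguments. The minimum
# is checked against the residues actually present on the line.
sub read_seq_lines_sketch {
    my ($text, $wanted, $min_len) = @_;
    $min_len ||= 0;
    my %seq;
    for my $line (split /\n/, $text) {
        my ($name, $len, $residues) = $line =~ /^\s*(\S+)\s+(\d+)\s+(\S+)/
            or next;                                  # skip malformed lines
        next if defined $wanted and $name ne $wanted;
        next if length($residues) < $min_len;
        $seq{$name} = $residues;
    }
    return \%seq;
}

my $seq_hash = read_seq_lines_sketch(<<'END', undef, 10);
 1l94   162 MNIFEMLRIDEGLRLKIYKD
 tiny     4 MNIF
END
print join(',', sort keys %$seq_hash), "\n";
```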

open_sso_files

Download open_sso_files .pl
Example    : 
  717    0         0.343  16    373    EC1260_16-373              74    434    YBL6_YEAST_74-434
  348    9e-16     0.500  113   233    EC1260_113-233             27    146    YDBG_ECOLI_27-146
  472    2.9e-08   0.271  13    407    EC1260_13-407              148   567    YHJ9_YEAST_148-567
  459    1.9e-22   0.260  1     407    EC1260_1-407               65    477    YLQ6_CAEEL_65-477
  452    4.5e-14   0.275  1     407    EC1260_1-407               103   537    YSCPUT2_103-537
  1131   0         0.433  1     407    EC1260_1-407               112   519    ZMU43082_112-519

  Input SSO file example)-> below

   >>MG032 ATP-dependent nuclease (addA) {Bacillus subtilis  (666 aa)
    Z-score: 88.3 expect()  1.9
   Smith-Waterman score: 77;  27.143% identity in 70 aa overlap

           30        40        50        60        70        80
   MJ0497 RSAGSKGVDLIAGRKGEVLIFECKTSSKTKFYINKEDIEKLISFSEIFGGKPYLAIKFNG
                                        : .. ...  . .:.:::. :: : ..:
   MG032  HDKVRYAFEVKFNIALVLSINKSNVDFDFDFILKTDNFSDIENFNEIFNRKPALQFRFYT
        200       210       220       230       240       250

           90       100             110       120       130
   MJ0497 EMLFINPFLLSTNGK------NYVIDERIKAIAIDFYEVIGRGKQLKIDDLI
          .   ::   :: ::.      : ....... . ::. . :
   MG032  K---INVHKLSFNGSDSTYIANILLQDQFNLLEIDLNKSIYALDLENAKERFDKEFVQPL
        260          270       280       290       300       310

 Parseable form -m 10 option =========================================
   >>>MJ0497.fa, 133 aa vs GMG.fa library
   ; pg_name: Smith-Waterman (PGopt)
   ; pg_ver: 3.0 June, 1996
   ; pg_matrix: BL50
   ; pg_gap-pen: -12 -2
   >>MG032 ATP-dependent nuclease (addA) {Bacillus subtilis
   ; sw_score:  77
   ; sw_z-score: 88.3
   ; sw_expect    1.9
   ; sw_ident: 0.271
   ; sw_overlap: 70
   >MJ0497 ..
   ; sq_len: 133
   ; sq_type: p
   ; al_start: 58
   ; al_stop: 121
   ; al_display_start: 28

Function   : This reads the parseable( -m 10 option)
              and non-parseable form of ssearch program output
             If you give 5 files, it produces 5 hashes as a ref of array.
             This understands xxxx.gz files.
             This reads FASTA -m 10 output, too.
Keywords   : open_ssearch_output_files, ssearch_output, ssearch, FASTA,
Options    : _  for debugging.
             #  for debugging.
             u= for upper E value limit
             l= for lower E value limit
             r  for attaching ranges to out seq names (eg> HI0001_1-20 as a key)
             U  for making the matched seqname uppercase
             L  for making the matched seqname lowercase
             R  for attaching ranges to out seq names for both TARGET and MATCH
             n  for new format (msp2)
             a  for getting alignments of the pair

Usage      : @sso=@{&open_sso_files(@file, $add_range, $add_range2, "u=$upper_expect_limit",
                               "l=$lower_expect_limit", "m=$margin", $new_format)};
Version    : 4.5
Warning    : By default, the SW score comes first
             If expect value is not found, it becomes '0'
             By default, the offset of seq match with a seq name like seq_30-40
               will be 30 not 1.
             It ignores special chars like , : .prot in the name (eg, AADF_FASDF: will be AADF_FASDF)
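For the parseable (-m 10) form, the "; key: value" lines can be collected per hit. The sketch below assumes only the layout shown in the example, ignores all options, and uses the hypothetical name read_m10_hits_sketch.

```perl
use strict;
use warnings;

# Hypothetical sketch for the -m 10 form only: each ">>name" line starts a hit
# and each following "; key: value" line is stored under that hit. The colon
# is optional because the example has "; sw_expect    1.9".
sub read_m10_hits_sketch {
    my ($text) = @_;
    my (@hits, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*>>>/) {                     # query header, not a hit
            $current = undef;
        }
        elsif ($line =~ /^\s*>>(\S+)/) {
            $current = { name => $1 };
            push @hits, $current;
        }
        elsif ($line =~ /^\s*>/) {                    # per-sequence block: stop
            $current = undef;
        }
        elsif ($current and $line =~ /^\s*;\s*(\S+?):?\s+(\S+)/) {
            $current->{$1} = $2;
        }
    }
    return \@hits;
}

my $hits = read_m10_hits_sketch(<<'END');
   >>>MJ0497.fa, 133 aa vs GMG.fa library
   ; pg_name: Smith-Waterman (PGopt)
   >>MG032 ATP-dependent nuclease (addA) {Bacillus subtilis
   ; sw_score:  77
   ; sw_expect    1.9
   ; sw_ident: 0.271
   >MJ0497 ..
   ; sq_len: 133
END
print "$_->{name} $_->{sw_score} $_->{sw_expect}\n" for @$hits;
```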

open_msp_files

Download open_msp_files .pl
Example    : Example output(with 'n' opt):
   d1bi6h1         d1bi6h1_1-24     IBR1_ANACO_20-42  IBR2_ANACO_19-42
   e1bi6.1h1       IBR1_ANACO_38-52 e1bi6.1h1_1-18    IBR2_ANACO_38-52

Function   : opens Erik Sonnhammer's MSPcrunch file output (default).
             This looks up xxxxx.fa files in the pwd (with S opt) and sees
             if it can get the sequences as well.
             With 'n' option you can just get the matched sequence
              names with ranges.
Keywords   : exchange_msp_file_columns,
Options    : 
          s -s  for size return only
          S -S  for the sequences are fetched if equivalent xxxx.fa files are in pwd
          n -n  for matched seq NAMEs with ranges only (eg: HI0001_1-12,,), hash ref is out
          R     for NO range attachment in Name only return option (n)
          e=    for evalue threshold, if e=1, ignores all which are over 1
          t=    for score threshold, if t=100, ignores all which are less than 100
          l=    for match length threshold.
          x     for exchange query with matched seqs. eg)      12 0.09 1 30 QUERY  1 29 MATCH
                                                       becomes 12 0.09 1 30 MATCH  1 29 QUERY
                This returns the same lines as input only with exchanged query and match seqs

Usage      : %seq=%{&open_msp_files(@file, $names_only)};
Version    : 2.8
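The e=, t= and x options could be sketched like this. The nine-column layout is an assumption taken from the open_sso_files example, the named-argument interface is hypothetical, and the x swap follows the option's own example, which exchanges the two names and leaves the ranges in place.

```perl
use strict;
use warnings;

# Hypothetical sketch: assumes nine whitespace-separated columns per line
# (score, E-value, identity, query start/end/name, match start/end/name).
sub filter_msp_lines_sketch {
    my ($lines, %opt) = @_;            # e => max E-value, t => min score, x => swap
    my @kept;
    for my $line (@$lines) {
        my @col = split ' ', $line;
        next unless @col == 9;
        next if defined $opt{e} and $col[1] > $opt{e};
        next if defined $opt{t} and $col[0] < $opt{t};
        @col[5, 8] = @col[8, 5] if $opt{x};           # exchange the two names only
        push @kept, join ' ', @col;
    }
    return \@kept;
}

my @lines = (
    '717    0         0.343  16    373    EC1260_16-373    74    434    YBL6_YEAST_74-434',
    '472    2.9e-08   0.271  13    407    EC1260_13-407    148   567    YHJ9_YEAST_148-567',
);
my $kept = filter_msp_lines_sketch(\@lines, t => 500);
print "$_\n" for @$kept;
```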

open_dssp_files

Download open_dssp_files .pl
Argument   : file names like (6taa, 6taa.dssp). If you give just '6taa' without an extension, it
             looks for a '6taa.dssp' in both the PWD and the directory set by the $DSSP env. variable.
             ---------- Example of dssp ---
             **** SECONDARY STRUCTURE DEFINITION BY THE PROGRAM DSSP, VERSION JUL
             REFERENCE W
             HEADER    RIBOSOME-INACTIVATING PROTEIN           01-JUL-94   1MRG
             COMPND    ALPHA-MOMORCHARIN COMPLEXED WITH ADENINE
             SOURCE    BITTER GOURD (CUCURBITACEAE MOMORDICA CHARANTIA) SEEDS
             AUTHOR    Q
             246  1  0  0  0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN)                .
             112 95.0   ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2)                                                                         .
             171 69.5   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J)  , SAME NUMBER PER 100 RESIDUES                              .
             12   4.9   TOTAL NUMBER OF HYDROGEN BONDS IN     PARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
             36  14.6   TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
             1    0.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-5), SAME NUMBER PER 100 RESIDUES                              .
             1    0.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-4), SAME NUMBER PER 100 RESIDUES                              .
             74  30.1   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4), SAME NUMBER PER 100 RESIDUES                              .
             5    2.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+5), SAME NUMBER PER 100 RESIDUES                              .
             1    2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30     *** HISTOGRAMS OF ***           .
             0    0  0  0  1  1  0  2  0  0  1  0  0  1  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0    RESIDUES PER ALPHA HELIX         .
             1    0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    PARALLEL BRIDGES PER LADDER      .
             2    0  1  2  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    ANTIPARALLEL BRIDGES PER LADDER  .
             2    0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    LADDERS PER SHEET                .
             #   RESIDUE AA STRUCTURE BP1 BP2  ACC   N-H-->O  O-->H-N  N-H-->O  O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA
             1    1   D              0   0  132    0, 0.0   2,-0.3   0, 0.0  49,-0.2   0.000 360.0 360.0 360.0 153.4   44.0   96.9  -23.8
             2    2   V  E     -a   50   0A  10   47,-1.5  49,-2.8   2, 0.0   2,-0.3  -0.889 360.0-163.3-115.9 151.4   43.1  100.4  -22.5
             3    3   S  E     -a   51   0A  63   -2,-0.3   2,-0.3  47,-0.2  49,-0.2  -0.961  10.3-172.8-131.0 152.3   44.8  103.7  -23.4
             4    4   F  E     -a   52   0A   8   47,-2.2  49,-2.3  -2,-0.3   2,-0.4  -0.985   6.9-161.2-143.2 139.5   45.0  107.2  -22.0
             5    5   R  E     -a   53   0A 144   -2,-0.3   4,-0.2  47,-0.2  49,-0.2  -0.993   9.7-156.0-121.0 125.9   46.6  110.2  -23.6
             6    6   L  S    S+     0   0    1   47,-2.3   2,-0.5  -2,-0.4   3,-0.4   0.644  73.2  90.9 -73.3 -22.4   47.5  113.2  -21.4
             7    7   S  S    S+     0   0   81   47,-0.3   3,-0.1   1,-0.2  -2,-0.1  -0.695 106.0   5.2 -75.5 121.0   47.4  115.6  -24.4
             8    8   G  S    S+     0   0   72   -2,-0.5  -1,-0.2   1,-0.3   5,-0.1   0.269  97.6 147.8  90.2 -10.7   43.9  117.0  -24.7
             9    9   A        +     0   0   10   -3,-0.4  -1,-0.3  -4,-0.2  -3,-0.1  -0.256  16.8 166.8 -58.8 142.4   42.9  115.2  -21.5
             (\$inputfile1, \$inputfile2, .... )
Function   : open dssp files and put sequences in a hash(s)
              It can take options for specific secondary structure types. For example,
              if you put an option $H in the args of the sub with the value of 'H'
              open_dssp_files will only read secondary structure whenever it sees 'H'
              in xxx.dssp file ignoring any other sec. str. types.
              If you combine the options of 'H' and 'E', you can get only Helix and long
              beta strand sections defined as segments. This is handy to get sec. str. segments
              from any dssp files to compare with pdb files etc.
             With the 'simplify' option, all the 'T', 'G' and 'I' sec. str. are converted to
              'H' and 'E'.
Options    : H, S, E, T, I, G, B, P, C, -help
 $H        becomes  'H' by   -H or -h or H or h  # to retrieve 4-helix (alpha helical)
 $S        becomes  'S' by   -S or -s or S or s  # to retrieve bend
 $E        becomes  'E' by   -E or -e or E or e  # to retrieve Extended strand, participates in B-ladder
 $T        becomes  'T' by   -T or -t or T or t  # to retrieve H-bonded turn
 $I        becomes  'I' by   -I or -i or I or i  # to retrieve 5-helix (Pi helical) segment output
 $G        becomes  'G' by   -G or -g or G or g  # to retrieve 3-helix (3-10 helical)
 $B        becomes  'B' by   -B or -b or B or b  # to retrieve residue in isolated Beta-bridge
 $simplify becomes   1  by   -p or P or -P, p
 $comm_col becomes  'c' by   -c or c or C or -C or common
 $HELP     becomes   1  by   -help   # for showing help

Returns    : (*out, *out2)  or (@out_array_of_refs)
Usage      : (*out, *out2) = @{&open_dssp_files(\$inputfile1, \$inputfile2, \$H, \$S,,,,)};
             (@out)        = @{&open_dssp_files(\$inputfile1, \$inputfile2, \$H, \$S,,,,)};
Version    : 2.9
$debug feature has been added to make it produce error messages with '#' option.
Warning    : 6taa.dssp  and 6taa are regarded as the same.
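The effect of options such as 'H' and 'E' can be pictured on a string holding one DSSP code per residue. This sketch starts from such a string rather than from a .dssp file; sec_str_segments_sketch and the 1-based [type, first, last] output are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: given one DSSP code per residue, keep only the requested
# types and report each surviving run as [type, first residue, last residue]
# (1-based, an assumed convention). A blank stands for "no assignment".
sub sec_str_segments_sketch {
    my ($codes, @types) = @_;
    my %keep = map { uc($_) => 1 } @types;
    my @segments;
    while ($codes =~ /(([A-Z])\2*)/g) {
        next unless $keep{$2};
        my $end = pos($codes);                        # offset just past the run
        push @segments, [$2, $end - length($1) + 1, $end];
    }
    return \@segments;
}

my $segs = sec_str_segments_sketch(' EEEESSS HHHHHH', 'H', 'E');
print join(' ', @$_), "\n" for @$segs;
```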

open_dna_files

Download open_dna_files .pl
Argument   : (\$inputfile1, \$inputfile2, .... )
Function   : open dna files and put sequences in a hash(s)
Returns    : (@out_array_of_refs)
Usage      : ($out, $out2) = @{&open_dna_files(\$inputfile1, \$inputfile2)};
             : (@out)        = @{&open_dna_files(\$inputfile1, \$inputfile2)};
             ---------- Example of dna file --- dna files are genbank file format


             1 ggatcttgct gaatacatgg tggcacaatt gaaattagat ccgcgaattt
               tcatcaaaac
             61 agcgggatta tggtcaacaa atccgtaaaa atgaaaagcc tgtcttgcga
               caggcttttt
             121 tatttgaatg taatcctcac tggtaaacgt ttaacgccaa agacaaaggg
               actagggatc
             181 gcttcaagct tttcatcatg agcagctttt tcgatacaag ctgacattga
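Turning numbered sequence lines like these into one string is mostly a matter of stripping positions and blanks. A sketch, assuming only the sequence lines are passed in as text; join_dna_lines_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: drop the position numbers and blanks from GenBank-style
# sequence lines and join what is left into one lowercase string.
sub join_dna_lines_sketch {
    my ($text) = @_;
    my $dna = '';
    for my $line (split /\n/, $text) {
        $line =~ s/^\s*\d+//;          # leading base position, if any
        $line =~ s/\s+//g;             # blanks between 10-base groups
        next unless $line =~ /^[A-Za-z]+$/;
        $dna .= lc $line;
    }
    return $dna;
}

print join_dna_lines_sketch("  1 ggatcttgct gaatacatgg\n    tcatcaaaac\n"), "\n";
```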


open_tem_files

Download open_tem_files .pl
Argument   : (\$inputfile1, \$inputfile2, .... )
Function   : opens JPO's xxxx.tem file, stores in 5 hashes. (usually one tem file)
Options    : -n, n, or N for removing any gaps in the sequences.
             -s, s, or S for getting only the sequences.
Returns    : ($r1, $r2, $r3, $r4, $r5) <= these are references for hashes.
Usage      : ($r1, $r2, $r3, $r4, $r5)=&open_tem_files(\$infile1, \$inputfile2..);
             ---------- Example of xxxx
             >P1;1cdg
             sequence
             APDTSVSNKQNFSTDVIYQIFTDRFSDGNPANNPTGAAFDGTCTN-LRLYCGGDWQGIINKINDGYLTGMGVTAI
             >P1;1cdg
             secondary structure and phi angle
             CCCCCCCCCCCCCCCCEEECCHHHHCCCCHHHCCCPHHCCCCPCC-CCCCCPCCHHHHHHHHHCPHHHHHPCCEE
             >P1;1cdg
             solvent accessibility
             TTTTTTTTTTTFFFFFFFFFFFFFFTTTTTTTTTTTTTTTTTFTT-TTTTFFFFFTFFTTTFTTTFFTTFTFTFF
             >P1;1cdg
             DSSP
             CCCCCCCCCCCCCCCCEEECCHHHHCCCCGGGCCCGGGCCCCCCC-CCCCCCCCHHHHHHHHHCCHHHHHCCCEE
             >P1;1cdg
             percentage accessibility
             67523272360000000000000002213792129b722248085-14110000030015105660028040200
             2ltn           ----TETTSFLITKFSPDQQNLIFQGDGYTT-KEKLTLTK------AVKNTVGRALYSSP
             1loe           ----TETTSFSITKFGPDQQNLIFQGDGYTT-KERLTLTK------AVRNTVGRALYSSP

             2ltn           ----CEEEEEEECCCCCCCCCEEEEPCCEEP-PPCEEEEC------CCCPCEEEEEECCC
             1loe           ----CEEEEEEECCCCCCCCCEEEEPCCEEE-PPEEEEEC------CCCPCEEEEEECCC

             2ltn           ----TTTTTTTTTTFTTTTTTFTTTTTFTFT-TTTFTFFT------TTTTTTFFFFTTTT
             1loe           ----TTTTTTTTTTFTTTTTTFTTTTTFTFT-TTTFFFFT------TTTTTTFFFFTTTT

             2ltn           ----CEEEEEEECCCCCCCCCEEEEECCEEC-CCCEEEEC------CCCCCEEEEEECCC
             1loe           ----CEEEEEEECCCCCCCCCEEEEECCEEE-CCEEEEEC------CCCCCEEEEEECCC

             2ltn           ----543251b16504681c50422650502-75201006------35681200001453
             1loe           ----6532e1508a07981b50422750404-8a200006------36672200001453
Version    : 1.0
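The ">P1;name" blocks of a tem file can be grouped by their feature line. The sketch covers that layout only (not the aligned "2ltn ..." rows), reads from a text string, and returns one nested hash instead of five refs; read_tem_blocks_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch for the ">P1;name" layout only: the line after the header
# names the feature, and the lines up to the next header hold its string.
sub read_tem_blocks_sketch {
    my ($text, $no_gaps) = @_;
    my (%feature, $name, $kind);
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        if ($line =~ /^>P1;(\S+)/) {
            ($name, $kind) = ($1, undef);
        }
        elsif (defined $name and not defined $kind) {
            $kind = $line;                            # e.g. "sequence", "DSSP"
        }
        elsif (defined $name) {
            $line =~ tr/-//d if $no_gaps;             # like the 'n' option
            $feature{$kind}{$name} .= $line;
        }
    }
    return \%feature;                                 # feature => { name => string }
}

my $tem = read_tem_blocks_sketch(">P1;1cdg\nsequence\nAPDTSVSN-LRL\n>P1;1cdg\nDSSP\nCCCCCCCC-CCC\n");
print "$_: $tem->{$_}{'1cdg'}\n" for sort keys %$tem;
```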

open_hlx_files

Download open_hlx_files .pl
Function   : 
             Example of hlx file (For Bo Nielson)
             Residue Frame Score Probability
             1 M   a  1.00563E+00 2.05479E-03
             2 T   b  1.01814E+00 2.52053E-03
             3 R   c  1.01814E+00 2.52053E-03
Returns    : list of ref. for hash(es)

open_jp_files

Download open_jp_files .pl
Function   : reads jp files and stores results in a hash.
Returns    : a reference of a hash for names and  their sequences.
Usage      : %out_hash=%{&open_jp_files(\$file_name)};
Version    : 1.1
Warning    : All the spaces become '-' !!!

open_ali_files

Download open_ali_files .pl
Function   : open fasta files and put sequences in a hash
             FASTA sequence file format is like this;

             >P1;1abp
             structureX:1abp:   1 : : 306 : :L-arabinose-binding protein:Escherichia coli: 2.40:-1.00
             ENLKLGFLVKQPEEPWFQTEWKFADKAGKDLG-FEVIKIAV-PDGEKTLNAIDSLAASGAKGFVICTPDPKLGSA
             TEGQGFKAADIIGIGINGVDAVSELSKAQATGFYGSLLPSPDVHGYKSSEMLYNWVAK--------DVEPPKFTE
             VTDVVLITRDNFKEELEKKGLGGK*
             >P1;2gbp
             structureX:2gbp:   1 : : 309 : :D-galactose/D-glucose-bind:Escherichia coli: 1.90:14.60
             ADTRIGVTIYKYDDNFMSVVRKAIEQDAKAAPDVQLLMNDSQNDQSKQNDQIDVLLAKGVKALAINLVDPAAAGT
             LKAHNKS-SIP-VFGVDA--LPEALALVKSGALAGTVLNDANNQAKATFDLAKNLADGKGAADGTNWKIDNKVVR
             VP-YVGVDKDNLAEFSKK------*

Usage      : %anyhash = %{&open_ali_files(\$filename)};
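A reader for the ">P1;name" layout in the example might look like this. It is a sketch that takes the file content as a string and assumes every entry has exactly one description line and ends with '*'; read_ali_text_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: the line after ">P1;name" is a description, and the
# sequence lines run until the one that ends with '*'.
sub read_ali_text_sketch {
    my ($text) = @_;
    my (%seq, $name, $seen_desc);
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;
        if ($line =~ /^>P1;(\S+)/) {
            ($name, $seen_desc) = ($1, 0);
            $seq{$name} = '';
        }
        elsif (defined $name and not $seen_desc) {
            $seen_desc = 1;                           # skip "structureX:..." line
        }
        elsif (defined $name and $line ne '') {
            my $done = ($line =~ s/\*$//);            # '*' closes the entry
            $seq{$name} .= $line;
            $name = undef if $done;
        }
    }
    return \%seq;
}

my %anyhash = %{ read_ali_text_sketch(">P1;1abp\nstructureX:1abp\nENLKLG-F\nVTDV*\n") };
print "$_ $anyhash{$_}\n" for sort keys %anyhash;
```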

open_pir_files

Download open_pir_files .pl
Function   : open fasta files and put sequences in a hash
             FASTA sequence file format is like this;

             >P1;1abp
             structureX:1abp:   1 : : 306 : :L-arabinose-binding protein:Escherichia coli: 2.40:-1.00
             ENLKLGFLVKQPEEPWFQTEWKFADKAGKDLG-FEVIKIAV-PDGEKTLNAIDSLAASGAKGFVICTPDPKLGSA
             VTDVVLITRDNFKEELEKKGLGGK*
             >P1;2gbp
             structureX:2gbp:   1 : : 309 : :D-galactose/D-glucose-bind:Escherichia coli: 1.90:14.60
             LKAHNKS-SIP-VFGVDA--LPEALALVKSGALAGTVLNDANNQAKATFDLAKNLADGKGAADGTNWKIDNKVVR
             VP-YVGVDKDNLAEFSKK------*

Usage      : %anyhash = &open_pir_files($any_sequence_file_fasta_form);
Version    : 1.2

open_aln_files

Download open_aln_files .pl
Function   : reads CLUSTALW aln files and stores results in a hash.
Returns    : a reference of a hash for names and  their sequences.
Usage      : %out_hash=%{&open_aln_files(\$file_name)};
Version    : 1.1

open_seq_alignment_files

Download open_seq_alignment_files .pl
Argument   : (\$inputfile1, \$inputfile2, .... )
Function   : open various sequence alignment files and put sequences in a hash(s)
Returns    : (*out, *out2)  or (@out_array_of_refs)
Usage      : (*out, *out2) = @{&open_seq_alignment_files(\$inputfile1, \$inputfile2)};
           : %hash_seq = %{&open_seq_alignment_files(\$inputfile1)};
           : (@out)        = @{&open_seq_alignment_files(\$inputfile1, \$inputfile2)};
Version    : 1.0

open_sst_files

Download open_sst_files .pl
Argument   : a ref. for scalar of "jp file name"
Example    : jp file  ==  seq1 ABDSF--DSFSDFS   <- true sequence
                              seq2 lkdf-jlsjlsjf

                 sst files == seq1.sst, seq2.sst

                 output hash == seq1 hHHHHHHHttEEEEEEEE
                                seq2 hHHHHHHHHHEEEEEEhh

Function   : gets the name of a file(jp file) with its absolute dir path
             reads the sequence names in the jp file and looks up all
             the sst files in the same directory. Puts sst sequences
             in a hash with keys of sequence names.

Returns    : a ref. for a hash
Usage      : %out_sst_hash =%{&open_sst_files(\$jp_file_dir_and_name)};
Warning    : $jp_file_dir_and_name should be absolute dir and file name

read_sst_files

Download read_sst_files .pl
Argument   : a ref. for scalar of "jp file name"
Example    : jp file  ==  seq1 ABDSF--DSFSDFS   <- true sequence
                              seq2 lkdf-jlsjlsjf

                 sst files == seq1.sst, seq2.sst

                 output hash == seq1 hHHHHHHHttEEEEEEEE
                                seq2 hHHHHHHHHHEEEEEEhh

Function   : gets the name of a file(jp file) with its absolute dir path
             reads the sequence names in the jp file and looks up all
             the sst files in the same directory. Puts sst sequences
             in a hash with keys of sequence names.

Returns    : a ref. for a hash
Usage      : %out_sst_hash =%{&read_sst_files(\$jp_file_dir_and_name)};
Warning    : $jp_file_dir_and_name should be absolute dir and file name

open_slx_files

Download open_slx_files .pl
Argument   : takes one ref. for a file.
Example    : selex file (foo.slx) looks like this:

         #=SQ GLB_TUBTU  5.9393 - - 0..0::0 -
         #=SQ GGZLB      20.9706 - - 0..0::0 -
         #=RF        x.....x.xxxx.xxx.xxxxxx....xxxxxxxxxxxxxxx.xxxx
         HAHU        ......VLSPADKTNVKAAWGKVGA......HAGEYGAEALERMFLS
         HBA3_PANTR  ......VLSPADKTNVKAAWGKVGA......HAGZYGAEALERMFLS

Function   : open slx files and put sequences in a hash
Returns    : a ref. of a hash
Usage      : %anyarray = &open_slx_files(\$any_sequence_file_slx_form);
Version    : 1.0
Warning    : The slx FORMAT SHOULD BE AT LEAST 30 residues long
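Reading the selex example could be sketched as follows: lines starting with '#' are skipped and each remaining "name sequence" line is appended to that name. The sketch reads from a string, does not enforce the 30-residue rule, and read_slx_text_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: skip '#=SQ', '#=RF' and other '#' lines, then append
# the second column of each remaining line to the sequence named in the first.
sub read_slx_text_sketch {
    my ($text) = @_;
    my %seq;
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+//;
        next if $line eq '' or $line =~ /^#/;
        my ($name, $chunk) = split ' ', $line;
        next unless defined $chunk;
        $seq{$name} .= $chunk;
    }
    return \%seq;
}

my $slx = read_slx_text_sketch(<<'END');
         #=SQ GLB_TUBTU  5.9393 - - 0..0::0 -
         #=RF        x.....x.xxxx.xxx.xxxxxx....xxxxxxxxxxxxxxx.xxxx
         HAHU        ......VLSPADKTNVKAAWGKVGA......HAGEYGAEALERMFLS
         HBA3_PANTR  ......VLSPADKTNVKAAWGKVGA......HAGZYGAEALERMFLS
END
print join(',', sort keys %$slx), "\n";
```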

open_out_files

Download open_out_files .pl
Argument   : takes one ref. for a file.
             >>Out file looks like this===>

             3aat         mfe   aapadp----adlfraderpGk   gigvY--etgktpvltS
             1ama       sswwshvemgppdp  krdtns--kkMnLG---YrddngkpyvLnC-

Function   : open out files and put their sequences in a hash
Returns    : a ref. of a hash
             Output example in a hash(fills the space)

             3aat       --mfe---aapadp----adlfraderpGk---gigvY--etgktpvltS
             1ama       ---eamiaakkmdkeylpiaGladFtraSA----eAfksgryVTV

Usage      : %anyarray = &open_out_files(\$any_out_file);
Warning    : well tested. It skips lines starting with blank, lines with '-' in them.

package

Download package .pl
Function   : Roman.pm : Roman <-> Arabic conversion package
Version    : 1.0
Warning    : From: ozawa@prince.pe.u-tokyo.ac.jp (OZAWA Sakuro)

time_date

Download time_date .pl
Function   : returns current time & date as 05/15/95 23:22:41
Version    : 1.0
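The same format can be produced with strftime from the core POSIX module. A sketch; time_date_sketch and its optional epoch argument are assumptions.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical sketch: format a time (default: now) as MM/DD/YY HH:MM:SS
# in the local time zone.
sub time_date_sketch {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime('%m/%d/%y %H:%M:%S', localtime($epoch));
}

print time_date_sketch(), "\n";
```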

sep

Download sep .pl
Function   : separator. \n#________________________________\n
Keywords   : separating_line
Usage      : &sep;
Version    : 1.0

diff_dates

Download diff_dates .pl
Function   : gets number of days between two dates ( "05/15/94" )
Usage      : $output = &diff_dates("05/15/1994", "05/15/1995")
Version    : 1.0
Warning    : modified (originally from reb@serf.nsc.com (Edward Brown))
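A day difference can also be computed with the core Time::Local module instead of Julian day numbers. This is a swapped-in technique, not the library's method; diff_dates_sketch is a hypothetical name and it accepts four-digit years only.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical sketch: convert each MM/DD/YYYY date to a UTC midnight
# timestamp and divide the difference by the seconds in a day. UTC avoids
# daylight-saving shifts.
sub diff_dates_sketch {
    my @epoch = map {
        my ($mon, $day, $year) = m{^(\d+)/(\d+)/(\d{4})$}
            or die "expected MM/DD/YYYY, got '$_'\n";
        timegm(0, 0, 0, $day, $mon - 1, $year);
    } @_;
    return ($epoch[1] - $epoch[0]) / 86_400;
}

print diff_dates_sketch('05/15/1994', '05/15/1995'), "\n";
```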

fromJulian

Download fromJulian .pl
Example    : print &fromJulian(34469), "\n";
Function   : converts a Julian day number back to a date (for taking the days between two dates).
Version    : 1.0
Warning    : got from reb@serf.nsc.com (Edward Brown)
             require "julian
             $Value1 = &toJulian("05/15/1994");        # Assign $Value1 a Julian Day
             print "$Value1\n";
             $Value2 = &toJulian("05/20/1994");        # Assign Value2 a Julian Day
             print "$Value2\n";
             $Days = $Value2 - $Value1;              #Difference in Days
             print "$Days\n";
             print &fromJulian(34469), "\n";         # Give a Julian Day, give the date
             print &fromJulian(34474), "\n";
             What is the Date 25 Days from Today?  (You can get format from `date`)

             $Value = &toJulian("05/16/1995");
             $Value +=  25;
             print &fromJulian($Value), "\n";

toJulian

Download toJulian .pl
Example    : $Value1 = &toJulian("05/15/94"); print "$Value1\n";
Function   : converts a date to a Julian day number (for taking the days between two dates).
Version    : 1.0
Warning    : got from reb@serf.nsc.com (Edward Brown)

opendir_and_go

Download opendir_and_go .pl
Example    : as in my 'indexing.pl' for perl file indexer.
Function   : open dir and process all files if you wish, and then go in any sub
             dir of it. Using recursion. created by A Biomatic
             if any file is linked, it skips that file.
Usage      : &opendir_and_go($input_dir); #$inputdir='/nfs/ind4/ccpe1/people/A Biomatic /jpo/align';
Version    : 1.0
Warning    : Seems to work fine.
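The recursion could be sketched as below. The callback argument and the name walk_dir_sketch are assumptions; the library's own sub takes only the directory.

```perl
use strict;
use warnings;

# Hypothetical sketch: visit every plain file under $dir, descending into
# subdirectories by recursion and skipping anything that is a symbolic link.
sub walk_dir_sketch {
    my ($dir, $callback) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!\n";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (sort @entries) {
        my $path = "$dir/$entry";
        next if -l $path;                             # linked files are skipped
        if    (-d $path) { walk_dir_sketch($path, $callback) }
        elsif (-f $path) { $callback->($path) }
    }
    return;
}
```

A call such as walk_dir_sketch($input_dir, sub { print "$_[0]\n" }) would then list every plain file below $input_dir.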

occurances

Download occurances .pl
Function   : a comparison sub for sort, to sort things by higher number of occurrences.
Usage      : sort occurances (@any_array_with_repeating_element);
Version    : 1.0
Warning    : This is from 21 DAYS book, page 373.
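Sorting by number of occurrences can be sketched with a counting hash and an inline comparison block. Unlike the comparison sub above, this hypothetical by_occurrence_sketch returns each distinct element once, and the alphabetical tie-break is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: count every element, then sort the distinct elements so
# that the most frequent comes first (ties broken alphabetically).
sub by_occurrence_sketch {
    my @items = @_;
    my %count;
    $count{$_}++ for @items;
    return sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
}

my @ranked = by_occurrence_sketch(qw(E H H L H E));
print "@ranked\n";
```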

extract_ori_seq nt5

Download extract_ori_seq nt5 .pl
Function   : extracts the seqs. that come from the struc. alignment only, to be analysed.
             After mul. alignment with added seqs., you can extract the original str.
             seqs. by using this. The output always has ...msff  ext.
             *array_ali is the JPO's or true alignment hash.
Usage      : &extract_ori_seq($input_file, $output_file, $out_seq_no, *array2);
Version    : 1.0

get_pair_homol_array

Download get_pair_homol_array .pl
Function   : gets the number of identical residues between a pair of seqs.
Usage      : $hom_out_count = ${&get_pair_homol_array(\@any_array_of_2_elem)}; eg) @ar=(ABCDE..., CDEGA..)
Version    : 1.0
Warning    : reliable, but input seq. strings shouldn't contain spaces.
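Counting identical positions in a pair of strings can be sketched as below. The sketch also returns a percentage taken over the shorter string, which is an assumption about the denominator; pair_identity_sketch is a hypothetical name and it returns a plain list, not a scalar ref.

```perl
use strict;
use warnings;

# Hypothetical sketch: compare two strings position by position and count the
# matches. The percentage is taken over the shorter string (an assumption).
sub pair_identity_sketch {
    my ($pair_ref) = @_;
    my ($first, $second) = @$pair_ref;
    my $overlap = length($first) < length($second) ? length($first) : length($second);
    my $same = 0;
    for my $i (0 .. $overlap - 1) {
        $same++ if substr($first, $i, 1) eq substr($second, $i, 1);
    }
    my $percent = $overlap ? 100 * $same / $overlap : 0;
    return ($same, $percent);
}

my @ar = ('ABCDE', 'ABCEA');
my ($count, $percent) = pair_identity_sketch(\@ar);
print "$count identical, $percent%\n";
```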

get_percent_homol_arr

Download get_percent_homol_arr .pl
Function   : get pair wise seq. identity of any two strings, outputs a scalar (%)
Usage      : $homology_out = ${&get_pair_homol(\@any_array_of_2_elem)}; eg) @ar=(ABCDE..., CDEGA..)
Version    : 1.0
Warning    : reliable, but input seq. strings shouldn't contain spaces.

get_pair_homol_hash

Download get_pair_homol_hash .pl
Function   : get pair wise seq. identity as a scalar count
Usage      : $homology_out = &get_pair_homol(%any_hash); eg) %hash = (name1, ABCDE..., name2, CDEGA..)
Version    : 1.0
Warning    : reliable, but input seq. strings shouldn't contain spaces.

get_percent_homo_hash

Download get_percent_homo_hash .pl
Function   : get pair wise seq. identity(%) of any two strings put in as a hash
Usage      : $homology_out = &get_pair_homol_hash(%any_hash); eg) %hash = (name1, ABCDE..., name2, CDEGA..)
Version    : 1.0
Warning    : reliable, but input seq. strings shouldn't contain spaces.

file_size

Download file_size .pl
Function   : returns the size of any single testing file
Usage      : $outputfilesize = &file_size($input_file_name);
Version    : 1.0
Warning    : Q is for quality of this sub. This can't be wrong.

seq_comp_percent2

Download seq_comp_percent2 .pl
Function   : gets the sequence COMPOSITION identities (a to z) of strings. Takes an array
             of strings and returns an array of % numbers
Usage      : @outarray = &seq_comp_percent2(@any_input_string_array);
Version    : 1.0

get_full_file_name

Download get_full_file_name .pl
Function   : returns full directory path (= pwd ), eg.  /nfs/ind4/ccpe1/people/A Biomatic
Keywords   : get_long_path_name, get_complete_path_name
Usage      : $any_path = ${&get_full_dir_path($any_directory)}; or &dir_path('.') for pwd.
Version    : 1.0

dir_path

Download dir_path .pl
Function   : returns directory path (= pwd ), eg.  /nfs/ind4/ccpe1/people/A Biomatic
Usage      : $any_path = &dir_path($any_directory); or &dir_path('.') for pwd.
Version    : 1.0

full_pwd_path

Download full_pwd_path .pl
Function   : returns full directory path (= pwd ), eg.  /nfs/ind4/ccpe1/people/A Biomatic
Keywords   : get_long_path_name, get_complete_path_name
Usage      : $any_path = ${&full_dir_path($any_directory)}; or &dir_path('.') for pwd.
Version    : 1.0

get_full_pwd_path

Download get_full_pwd_path .pl
Function   : returns full directory path (= pwd ), eg.  /nfs/ind4/ccpe1/people/A Biomatic
Keywords   : get_long_path_name, get_complete_path_name
Usage      : $any_path = ${&get_full_dir_path($any_directory)}; or &dir_path('.') for pwd.
Version    : 1.0

get_whole_pwd_path

Download get_whole_pwd_path .pl
Function   : returns full directory path (= pwd ), eg.  /nfs/ind4/ccpe1/people/A Biomatic
Keywords   : get_long_path_name, get_complete_path_name
Usage      : $any_path = ${&get_whole_dir_path($any_directory)}; or &dir_path('.') for pwd.
Version    : 1.0

pwd_path

Download pwd_path .pl
Function   : returns directory path (= pwd ), eg.  /nfs/ind4/ccpe1/people/A Biomatic
Usage      : $any_path = ${&dir_path($any_directory)}; or &dir_path('.') for pwd.
Version    : 1.0

get_pwd_dir

Download get_pwd_dir .pl
Function   : returns present working dir base
Usage      : $dir = &get_pwd_dir($anydir); # to return say,  'perl' .
Version    : 1.0
Warning    : well tested.

dir_name

Download dir_name .pl
Function   : returns present working dir base
Usage      : $dir = &pwd_dir($anydir); # to return say,  'perl' .
Version    : 1.0
Warning    : well tested.

pwd_dir_name

Download pwd_dir_name .pl
Example    : returns 'jong' with the input of '/nfs/ind5/A Biomatic '
Function   : returns present working dir name
Usage      : $dir = &pwd_dir($any_absolute_path_dir);
Version    : 1.0
Warning    : well tested.

get_pwd_dir_name

Download get_pwd_dir_name .pl
Function   : returns present working dir name
Usage      : $dir = &get_pwd_dir($any_absolute_path_dir);
Version    : 1.0
Warning    : well tested.

get_full_path_dir_names

Download get_full_path_dir_names .pl
Example    : with 'jong' it gives '/nfs/ind5/jong', '/nfs/ind4/ccep1/people/A Biomatic '...
             when 'jong' is in /nfs/ind4/jong/Perl, it returns /nfs/ind4/A Biomatic
Function   : returns full path dir names with given short dir names.
Usage      : @full_path_dirs = @{&get_full_path_dir_names(@short_dir_name)};
Version    : 1.0
Warning    : when 'jong' is in /nfs/ind4/jong/Perl, it returns /nfs/ind4/A Biomatic

get_extension_names

Download get_extension_names .pl
Keywords   : get_file_extension, get_extension, get_file_ext, get_ext_names
             get_file_extensions
Options    : _  for debugging.
             #  for debugging.
Usage      : @ext=@{&get_file_extensions(\@file)}  or
             $ext=${&get_file_extensions(\$file)}
Version    : 1.2
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

get_file_extensions

Download get_file_extensions .pl
Keywords   : get_file_extension, get_extension, get_file_ext, get_ext_names
             get_extension_names
Options    : _  for debugging.
             #  for debugging.
Usage      : @ext=@{&get_file_extensions(\@file)}  or
             $ext=${&get_file_extensions(\$file)}
Version    : 1.2
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

get_base_names

Download get_base_names .pl
Argument   : handles both ref and non-ref.
Example    : $base => 'test'  with 'test.txt' or '/home/dir/of/mine/test.txt'
Function   : produces the file base name (eg, "evalign" out of "evalign.pl").
              When a file of the form xxxx.xx.gz is given, it removes the gz as well.

Keywords   : get_base_name, base_name, file_base_name, get_file_base_name
             get_basename, basename, get_root_name, base, root, get_file_root
Usage      : $base =${&get_base_names(\$file_name)};
                 or @bases = &get_base_names(\@files);  # <-- uses `pwd` for abs directory
Version    : 1.5
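
The base-name rule described above (drop the directory, drop a trailing .gz, then drop the extension) might look like the sketch below. `sketch_base_name` is a hypothetical name, and it takes a plain string rather than the reference the library sub accepts.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical sketch: strip the directory, an optional trailing
# .gz, and then the last extension.
sub sketch_base_name {
    my ($path) = @_;
    my $name = basename($path);
    $name =~ s/\.gz\z//;             # xxxx.xx.gz -> xxxx.xx
    $name =~ s/(?<=.)\.[^.]+\z//;    # xxxx.xx    -> xxxx (dot files are kept)
    return $name;
}

print sketch_base_name($_), "\n"
    for 'evalign.pl', '/home/dir/of/mine/test.txt', 'seq.fasta.gz';
```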

read_file_names_only

Download read_file_names_only .pl
Example    : @all_files=@{&read_file_names_only(\$abs_path_dir_name, ..)};
             @all_files=@{&read_file_names_only(\$dir1, '.pl', '.txt')};
             @all_files=@{&read_file_names_only(\$dir1, '.', \$dir2, \$dir3, 'e=pl')};
             @all_files=@{&read_file_names_only(\$abs_path_dir_name, 'G1_*.txt')};
             @all_files=@{&read_file_names_only(\$abs_path_dir_name, \@target_file_names)};

Function   : reads file names and REMOVES the '.', '..' and dir entries,
             then puts them in an array. This checks that each entry is a real file.
             You can use 'txt' as well as '.txt' as the extension.
             You can give multiple file extensions (txt, doc, ....)
               and multiple dir paths (/usr/Perl, /usr/local/Perl....).
               It will fetch all the wanted files in all the directories specified.

             It can handle file glob eg)
             @all_files=@{&read_file_names_only(\$abs_path_dir_name, 'G1_*.txt')};
               for all txt files starting with 'G1_'

Keywords   : filename only, filename_only, read_files_only, read files
             get_file_names_only, get_files_only, read_files_only
Options    : "extension name". If you put , 'pl' as an option, it will show
             files only with '.pl' extension.
  '-p'      for path also included resulting in '/path/path/file.ext'
              rather than 'file.ext' in output @array
  '-s'      for sorting the results
  e='xxx'  for extension xxx
  '.pl'    for files extended by '.pl'
  'pl'     for files extended by 'pl', same as above
  D=       for dir name input

Usage      : @all_files=@{&read_file_names_only(, [extension])};
Version    : 2.8
Warning    : This does not report '.', '..'
             Only file names are reported. Compare with &read_any_dir
             extension size should be less than 15 char.
             It sorts the results!
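
A reduced sketch of the core behaviour: list only real files in one directory, optionally filtered by extension, sorted. `sketch_files_only` is a hypothetical name, and the '-p', '-s', 'e=', 'D=' and glob options of the library sub are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical sketch: plain files in $dir, optionally limited to the
# given extensions ('pl' and '.pl' are treated alike), sorted by name.
sub sketch_files_only {
    my ($dir, @exts) = @_;
    s/^\.// for @exts;                               # '.pl' -> 'pl'
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;   # drops '.', '..', subdirs
    closedir $dh;
    if (@exts) {
        my %want = map { $_ => 1 } @exts;
        @files = grep { /\.([^.]+)\z/ && $want{$1} } @files;
    }
    return sort @files;
}

print "$_\n" for sketch_files_only('.', '.pl', 'txt');
```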

read_file_extension_names_only

Download read_file_extension_names_only .pl
Function   : reads only extension names. It returns the extensions as keys
             and their numbers of occurrences as the values of those keys.
Keywords   : read_file_ext_only, read_file_ext_names_only, read_ext_names_only,
             read_ext_only
Usage      : %file_ext=%{&read_file_extension_names_only('.')};
Version    : 1.1
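
The extension-to-count hash could be built as in this sketch. `sketch_count_extensions` is a hypothetical name, and for simplicity it works on a list of file names instead of reading a directory as the library sub does.

```perl
use strict;
use warnings;

# Hypothetical sketch: tally the final extension of each file name.
sub sketch_count_extensions {
    my %count;
    for my $name (@_) {
        $count{$1}++ if $name =~ /\.([^.\/]+)\z/;   # names without '.' are skipped
    }
    return %count;
}

my %file_ext = sketch_count_extensions(qw(a.pl b.pl notes.txt README));
print "$_ => $file_ext{$_}\n" for sort keys %file_ext;
```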

read_dir_names_only

Download read_dir_names_only .pl
Argument   : takes one or more scalar references. ('.', \$path, $path, ... )
Example    : @files=@{&read_dir_names_only('n', "s=1", '.')};
Function   : reads any dir names and then puts them in an array. If no argument is given
             for the target directory, it opens PWD automatically.
             You can specify the length of dir names to choose.
Keywords   : read_dir_only, get_dir_names, get_dir_names_only, get_subdir_names,
Options    : n   for names only reading(not the full path) , default is full path
             s=  for the size of dirs name. If you want all the dir names
                   with a size of 1 char, s=1
Returns    : one ref. of array.
Usage      : @all_dirs_list = @{&read_dir_names_only(\$absolute_path_dir_name, ....)};
Version    : 3.4
Warning    : This does not report '.', '..'
             Only file names are reported. Compare with &read_any_dir

take_file_name

Download take_file_name .pl
Example    : will return file.name  from /dir/dir/file.name

Function   : takes file name portion from long dir/filename
Keywords   : get_file_name_only, extract_file_name, take_file_name_only
Options    : _  for debugging.
             #  for debugging.
Usage      : $base_portion =${&take_file_name(\'/dir/file.name')};
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.

get_file_dir_name

Download get_file_dir_name .pl
Example    : /dir/file.name
             =>  /dir/
Function   : returns the dir portion of long filename.
             If file does not have dir portion it returns './'
Keywords   : get_file_dir_name, take_file_dir_name, take_file_dir_names
Options    : _  for debugging.
             #  for debugging.
Version    : 1.0
Warning    : You MUST NOT delete '# options : ..' entry
              as it is read  by various subroutines.
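
The two entries above, take_file_name and get_file_dir_name, split the same path at its last '/'. A minimal sketch of that split follows; `sketch_split_path` is a hypothetical name and it takes a plain string instead of a reference.

```perl
use strict;
use warnings;

# Hypothetical sketch: return (dir portion, file portion) of a path.
# A path without any '/' gets './' as its dir portion.
sub sketch_split_path {
    my ($path) = @_;
    my ($dir, $file) = $path =~ m{\A(.*/)?([^/]*)\z}s;
    $dir = './' unless defined $dir;
    return ($dir, $file);
}

for my $path ('/dir/dir/file.name', 'file.name') {
    my ($dir, $file) = sketch_split_path($path);
    print "$path: dir=$dir file=$file\n";
}
```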

get_dir_names_only

Download get_dir_names_only .pl
Argument   : takes one or more scalar references. ('.', \$path, $path, ... )
Function   : reads any dir names and then puts them in an array.
Returns    : one ref. of array.
Usage      : @all_dirs_list = @{&get_dir_names_only(\$absolute_path_dir_name, ....)};
Version    : 3.0
Warning    : This does not report '.', '..'
             Only file names are reported. Compare with &read_any_dir

get_subdir_names

Download get_subdir_names .pl
Argument   : takes one or more scalar references. ('.', \$path, $path, ... )
Function   : Gets all subdir and subsubsub...dir names in absolute path names.
Returns    : one ref. of array. (NOT full path names), refer  'read_full_dir_names'
Usage      : @all_sub_dirs_list = @{&read_dir_names_only(\$absolute_path_dir_name, ....)};
Version    : 3.0
Warning    : This does not report '.', '..' ,  Also, this does not show full path
             Only file names are reported. Compare with &read_any_dir

read_full_dir_names

Download read_full_dir_names .pl
Argument   : takes one or more scalar references. ('.', \$abs_path, $path, ... )
Example    : input>> &read_full_dir_names('/nfs/ind4/ccpe1/people/A Biomatic /perl');
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/code
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/tk
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/ch1
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/sub2
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/sub3
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/xxxx.cong
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/whatever
Returns    : one ref. of array.
Usage      : @all_files_list = @{&read_full_dir_names(\$absolute_path_dir_name, ....)};
Version    : 1.0
Warning    : This does not report '.', '..'
             Only file names are reported. Compare with &read_any_dir

get_full_dir_names

Download get_full_dir_names .pl
Argument   : takes one or more scalar references. ('.', \$abs_path, $path, ... )
Example    : input>> &get_full_dir_names('/nfs/ind4/ccpe1/people/A Biomatic /perl');
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/code
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/tk
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/ch1
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/sub2
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/sub3
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/xxxx.cong
             /tmp_mnt/nfs/ind4/ccpe1/people/A Biomatic /perl/whatever
Returns    : one ref. of array.
Usage      : @all_files_list = @{&read_full_dir_names(\$absolute_path_dir_name, ....)};
Version    : 1.0
Warning    : This does not report '.', '..'
             Only file names are reported. Compare with &read_any_dir

read_any_dir_simple

Download read_any_dir_simple .pl
Argument   : takes one scalar reference.
Function   : read any dir and REMOVES the '.' and '..' entries. And then put in array.
Returns    : one ref. of array.
Usage      : @file_list = @{&read_any_dir(\$absolute_path_dir_name)};
Version    : 1.1

read_any_dir

Download read_any_dir .pl
Argument   : takes one scalar reference.
Function   : read any dir and REMOVES the '.' and '..' entries.
             And then put in array.
Returns    : one ref. of array. for the files in the given directory.
Usage      : @file_list = @{&read_any_dir(\$absolute_path_dir_name)};
Version    : 1.2

read_any_dir2

Download read_any_dir2 .pl
Argument   : takes one or more scalar references.
Function   : read any dir and REMOVES the '.' and '..' entries. And then put in array.
Returns    : one ref. of array.
Usage      : @file_list = @{&read_any_dir(\$absolute_path_dir_name, ....)};
Version    : 1.0
Warning    : This does not report '.', '..', '#xxxx', ',xxxx', etc. only legitimate
             file and dir names are reported.

max_str_value_hash

Download max_str_value_hash .pl
Function   : gets the largest 'string' length in values of any one hash
Usage      : $largest_str_length_of_values = &max_value_hash(%any_hash);
Version    : 1.0

get_max_hash_by_value

Download get_max_hash_by_value .pl
Function   : gets the largest 'string' length in values of any one hash
Keywords   : get_max_hash_value, get_largest_hash_value, get_max_hash_key_value
             get_max_hash_num_value, max_hash_value
Usage      : $largest_str_length_of_values = &max_value_hash(%any_hash);
Version    : 1.1

max_str_key_hash

Download max_str_key_hash .pl
Function   : gets the largest 'string' length in keys of any one hash
Keywords   : largest key length,
Usage      : $largest_str_length_of_values = &max_value_hash(%any_hash);
Version    : 1.0

min_string_value_hash

Download min_string_value_hash .pl
Function   : gets the smallest 'string' length in values of any one hash
Usage      : $small_str_length_of_values = &min_str_value_hash(%any_hash);
Version    : 1.0

min_str_key_hash

Download min_str_key_hash .pl
Function   : gets the smallest 'string' length in keys of any one hash
Usage      : $small_str_length_of_values = &min_str_value_hash(%any_hash);

min_string_key_hash

Download min_string_key_hash .pl
Function   : gets the smallest 'string' length in keys of any one hash
Usage      : $small_str_length_of_values = &min_str_value_hash(%any_hash);
Version    : 1.0

fasta_append

Download fasta_append .pl
Function   : appends one additional fasta format sequence.
Usage      : &fasta_append($name, $string, $output_file);
Version    : 1.0
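
Appending one record to a FASTA file can be sketched like this. `sketch_fasta_append` is a hypothetical name, and wrapping the sequence at 60 residues per line is an assumption, not something the entry above states.

```perl
use strict;
use warnings;

# Hypothetical sketch: append one record to a FASTA file,
# wrapping the sequence at 60 residues per line (assumed width).
sub sketch_fasta_append {
    my ($name, $seq, $file) = @_;
    open(my $out, '>>', $file) or die "Cannot append to $file: $!";
    print {$out} ">$name\n";
    print {$out} "$1\n" while $seq =~ /(.{1,60})/g;
    close($out) or die "Cannot close $file: $!";
    return;
}

# Example call (commented out so that running the sketch writes no file):
# sketch_fasta_append('seq1', 'ACDEFGHIKLMNPQRSTVWY', 'out.fasta');
```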

fasta_output

Download fasta_output .pl
Function   : prints fasta format output which is using $mul_factor.
             $seq is the whole sequence number(largest).
             $dir.$mul_factor.fasta can be any output name.
Usage      : &fasta_output($dir.$mul_factor.fasta,  $whole_seq, *array_ali, *array1);
Version    : 1.0

fasta_out_seq_no

Download fasta_out_seq_no .pl
Function   : prints fasta format output with specified seq no from whole seq. no.
             $seq is the whole sequence number(largest). $out_seq_no is the target.
Usage      : &fasta_out_seq_no($dir, $out_seq_no, $seq, *array2, *array1);
Version    : 1.0

ctime

Download ctime .pl
Example    : $Date = &ctime(time);
Function   : a simple Perl emulation for the well known ctime(3C) function.
Usage      : $Date = &ctime(time);
Version    : 1.0

get_time

Download get_time .pl
Example    : "Nov30 4:37 1995"
Function   : a simple Perl emulation for the well known ctime(3C) function.
Usage      : $Date = &get_time(time);
Version    : 1.0

get_date

Download get_date .pl
Example    : 30-Nov-1995
Function   : returns date: $date6d (6 digit format) and
             $datec (dd-mmm-yyyy format), Tim's version is 'getdate' in th_lib.pl
Keywords   : get_present_date,
Returns    : ref of an array for (1-May-1995 and 010595)
Usage      : @outformat = &get_date;  eg result >  (010595 1-May-1995)
Version    : 1.1
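
Both date formats can be produced from the fields that Perl's built-in localtime returns. In this sketch `sketch_date` is a hypothetical name, and accepting an optional localtime-style list is an addition made only so that the result is reproducible.

```perl
use strict;
use warnings;

my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical sketch: build the 6-digit (ddmmyy) and dd-mmm-yyyy
# formats from a localtime-style list; defaults to the current time.
sub sketch_date {
    my @t = @_ ? @_ : localtime;
    my ($mday, $mon, $year) = @t[3, 4, 5];
    my $six  = sprintf '%02d%02d%02d', $mday, $mon + 1, $year % 100;
    my $long = sprintf '%d-%s-%d', $mday, $MONTHS[$mon], $year + 1900;
    return ($six, $long);
}

my ($date6d, $datec) = sketch_date();
print "$date6d $datec\n";
```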

if_file_older_than_x_days

Download if_file_older_than_x_days .pl
Function   : checks the last modification date of the given file and compares it with
             the present time. Subtracts the two and returns the actual difference in days.
Keywords   : how_old_file, how_old, is_file_older_than_x_days, file_age,
             file_age_in_days, if_older_than_x_days,
Returns    : the actual days older, so NON-ZERO, otherwise, 0
Usage      : if( ${&if_file_older_than_x_days($ARGV[0], $days)} > 0){
Version    : 1.3

array_chk

Download array_chk .pl
Argument   : gets one ref. of array.
Example    : This is used only with subs which accepts array inputs.
Function   : checks whether an input array is empty or has only one element.
Keywords   : array_check
Returns    : nothing, prints out messages to STDOUT
Usage      : &array_chk(\@any_array_to_chk);
Version    : 1.0

hash_chk

Download hash_chk .pl
Function   : checks hash input of any subroutine.
Keywords   : hash_check
Usage      : &hash_chk(\%input_hash);
Version    : 1.0

hash_output_chk

Download hash_output_chk .pl
Function   : checks hash output of any subroutine.
Usage      : &hash_output_chk(\%outing_hash);
Version    : 1.0

n

Download n .pl
Function   : puts one single new line
Usage      : &n;

cls

Download cls .pl
Function   : clears screen
Usage      : &cls;
Version    : 1.0

seq_comp_percent1

Download seq_comp_percent1 .pl
Argument   : one ref. of an array
Function   : get string seq identities(a to z). gets array of strings and outs array of % numbers
Returns    : one ref. of an array
Usage      : @outarray = &seq_comp_percent1(@any_input_string_array);

get_id_among_2_1

Download get_id_among_2_1 .pl
Function   : gets the % id of any two sequences, returns in  100.0% format.
Usage      : $id = &get_id_among_2(*charcount1, *charcount2) <- hashes

get_id_among_2_2

Download get_id_among_2_2 .pl
Function   : gets the % id of any two sequences, returns in  100.0% format.
Usage      : $id = &get_id_among_2(*charcount1, *charcount2) <- hashes
Version    : 1.0

array_average

Download array_average .pl
Argument   : takes one array reference.
Function   : (the same as average_array)
Keywords   : get_array_average, av_array, average_array, get_average_array
             average_of_array, average_array
Returns    : single scalar digit.
Usage      : $output = &array_average(\@any_array);
Version    : 1.2
Warning    : If divided by 0, it will automatically replace it with 1

average_array

Download average_array .pl
Argument   : takes one array reference.
Function   : (the same as array_average)
Returns    : single scalar digit.
Usage      : $output = &average_array(\@any_array);
Version    : 1.0
Warning    : If divided by 0, it will automatically replace it with 1
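
The averaging rule shared by array_average and average_array, including the divide-by-zero guard mentioned in the Warning, can be sketched as below. `sketch_average` is a hypothetical name, and skipping undefined elements is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: mean of the defined elements of an array ref.
# An empty array uses a divisor of 1, so the result is 0.
sub sketch_average {
    my ($aref) = @_;
    my @nums = grep { defined } @$aref;
    my $n    = @nums || 1;      # replace a zero divisor with 1
    my $sum  = 0;
    $sum += $_ for @nums;
    return $sum / $n;
}

print sketch_average([1, 2, 3, 4]), "\n";
```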

average_of_array

Download average_of_array .pl
Argument   : takes one array reference.
Options    : -int to make the resultant numbers shown in integer
Returns    : single scalar digit.
Usage      : $output = &average_of_array(\@any_array);
Version    : 2.0
Warning    : If divided by 0, it will automatically replace it with 1
             '$item == 0' does not work !!! in the following

hash_average

Download hash_average .pl
Example    : %in=(1, "13242442", 2, "92479270", 3, "2472937439");
Returns    : %out =(1, 2.13242, 2, 5.2702, 3, 1.72937439); <-- something like
             numbers. So, undefined array element is not counted
             This is more correct.
Usage      : %out=%{&hash_average(\%in)};  or
             ($out1, $out2)=&hash_average(\%in,\%in2);
Version    : 1.0

get_hash_value_average

Download get_hash_value_average .pl
Keywords   : get_values_average, get_average_hash_value, get_average_value
Returns    : %out =(1, 2.13242, 2, 5.2702, 3, 1.72937439); <-- something like
             numbers. So, undefined array element is not counted
             This is more correct.
Usage      : %out=%{&get_hash_value_average(\%in)};  or
             ($out1, $out2)=&hash_average(\%in,\%in2);
Version    : 1.0

hash_stat_for_all

Download hash_stat_for_all .pl
Example    : %in =(1, "13242442", 2, "92479270", 3, "2472937439");
             %in2=(1, "28472", 2, "23423240", 3, "123412342423439");

             %in =(name1, "1,3,2,4,2,4,4,2", name2, "9,2,4,7,9,2,7,0");

Function   : gets the min, max, av, sum for the whole values of ALL the
             hashes put in. (grand statistics)
Returns    : normal array of ($min, $max, $sum, $av)
             Example  out:>                 |  min max sum  av
                            -----------------------------------
                            of the whole    |   0   9  110   6
Usage      : %out=%{&hash_average(\%in, \%in2,..)};
Version    : 1.0

min

Download min .pl
Function   : accepts ref of array, scalar and normal digits to
             find the min. Only gets numbers. If you put something
             like 'H333333', it gets digits '333333' only and returns it.
             this uses RECURSION.
Usage      : $min = &min (37, 24, 3,1,5, \@array, @array2, \$arr_ref);
Version    : 1.0

max

Download max .pl
Function   : accepts ref of array, scalar and normal digits to
             find the max. Only gets numbers. If you put something
             like 'H333333', it gets digits '333333' only and returns it.
             this uses RECURSION.
Usage      : $max = &max (37, 24, 3,1,5, \@array, @array2, \$arr_ref);
Version    : 1.0
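
The recursive handling of mixed arguments described for min and max can be sketched as follows. All three sub names are hypothetical, and taking the first run of digits from each value (so 'H333333' yields 333333) is an assumption about how "gets digits only" works.

```perl
use strict;
use warnings;

# Hypothetical sketch: flatten plain values, array refs and scalar
# refs recursively, keeping the first number found in each value.
sub sketch_numbers {
    my @out;
    for my $item (@_) {
        my $type = ref $item;
        if    ($type eq 'ARRAY')                    { push @out, sketch_numbers(@$item) }
        elsif ($type eq 'SCALAR' || $type eq 'REF') { push @out, sketch_numbers($$item) }
        elsif (defined $item && $item =~ /(\d+(?:\.\d+)?)/) { push @out, $1 }
    }
    return @out;
}

sub sketch_min {
    my ($min, @rest) = sketch_numbers(@_);
    for (@rest) { $min = $_ if $_ < $min }
    return $min;
}

sub sketch_max {
    my ($max, @rest) = sketch_numbers(@_);
    for (@rest) { $max = $_ if $_ > $max }
    return $max;
}

my @array = (9, 'H333333');
print sketch_min(37, 24, 3, 1, 5, \@array), " ",
      sketch_max(37, 24, 3, 1, 5, \@array), "\n";
```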

get_longest_str_size

Download get_longest_str_size .pl
Argument   : gets one reference of an array of strings.
Function   : get_longest_str_size in an array. eg. get ABCDE among (A, CAB, CDE, ABCDE)
             When hash is given it processes the values of it.
Keywords   : get_the_largest_string_size, get_largest_string_size,
             get_largest_str_size, largest_string_size, get_largest_string_size_hash
             get_long_str_size, get_longest_string_size, longest_string_size
Usage      : $long_str_size = ${&get_long_str_size (\@any_array_of_string)};
Version    : 1.2

get_shortest_str_size

Download get_shortest_str_size .pl
Argument   : gets one reference of an array of strings.
Function   : get_shortest_str_size in an array. eg. get A among (A, CAB, CDE, ABCDE)
Keywords   : get_short_str_size, get_short_string_size, shortest_string_size,
Usage      : $short_str_size = &get_short_str_size (\@any_array_of_string);
Version    : 1.0
Warning    : once debugged. 1st May/95

get_id_among_2

Download get_id_among_2 .pl
Argument   : gets two references of hashes of chars and their occurrences.
Example    : %hash1=('A', 30, 'B', 99, 'C', 15 .....)
Function   : gets the % id of any two sequences
Usage      : $id = &get_id_among_2(\%charcount1, \%charcount2) <- hashes
Version    : 1.0

extract_num_to_array

Download extract_num_to_array .pl
Function   : extract only numbers(including negatives) from a string and put into an array
Usage      : @my_outarray = &extract_num_to_array($any_input_string);
Version    : 1.0
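
Pulling numbers, including negatives, out of a string is a short regular-expression exercise. In this sketch `sketch_extract_numbers` is a hypothetical name, and accepting a decimal part is an assumption. Note that a '-' directly before a digit is always read as a sign, so '10-3' gives 10 and -3.

```perl
use strict;
use warnings;

# Hypothetical sketch: return every (optionally negative, optionally
# decimal) number found in the string, in order of appearance.
sub sketch_extract_numbers {
    my ($str) = @_;
    return $str =~ /(-?\d+(?:\.\d+)?)/g;
}

my @numbers = sketch_extract_numbers('a-3 b4.5 c10');
print "@numbers\n";
```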

weighted_average

Download weighted_average .pl
Version    : 1.0

weighted_av

Download weighted_av .pl
Version    : 1.0

sum_digits_in_string

Download sum_digits_in_string .pl
Author     : Randal
Keywords   : add_digits_in_string,
Version    : 1.0

sum_array

Download sum_array .pl
Argument   : ref. of an array of numbers.
Function   : sum of all the  elements of an array .
Keywords   : get_array_sum get_sum_array, get sum of array
Returns    : a ref. of a scalar.
Usage      : $out =  ${&sum_array(\@anyarray)};

sum_of_array

Download sum_of_array .pl
Argument   : ref. of an array of numbers.
Function   : sum of all the  elements of an array .
Options    : -int for integerised output.
Returns    : a ref. of a scalar.
Usage      : $out =  ${&sum_of_array(\@anyarray)};
Version    : 1.0

array_sum

Download array_sum .pl
Argument   : ref. of an array of numbers.
Function   : sum of all the  elements of an array .
Returns    : a ref. of a scalar.
Usage      : $out =  ${&sum_array(\@anyarray)};
Version    : 1.0

sum_hash_values_of_string

Download sum_hash_values_of_string .pl
Example    : %hashinput= ( name1, '12..3e',
                            name2, '...234');
             $result = 1+2+3+2+3+4 = 15 (from above example)
Function   : sum of all the  numbers in values of a hash
Keywords   : sum_hash_string_values, get_sum_hash_string_values, get_hash_value_sum
Usage      : $out = &sum_hash_values_of_string(\%anyhash);
Version    : 1.1
Warning    : It only gets digits in the input strings and sums them up.
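
The digit-summing rule in the Example above can be sketched in a few lines. `sketch_sum_digits` is a hypothetical name; it treats every digit as a separate one-digit number, as the 1+2+3+2+3+4 example implies.

```perl
use strict;
use warnings;

# Hypothetical sketch: add up every single digit found in the
# values of a hash; all other characters are ignored.
sub sketch_sum_digits {
    my ($href) = @_;
    my $sum = 0;
    for my $value (values %$href) {
        $sum += $_ for $value =~ /\d/g;
    }
    return $sum;
}

my %hashinput = (name1 => '12..3e', name2 => '...234');
print sketch_sum_digits(\%hashinput), "\n";
```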

sum_hash_values

Download sum_hash_values .pl
Example    : %hashinput= ( name1, '12..3e',
                            name2, '...234');
             $result = 1+2+3+2+3+4 = 15 (from above example)
Function   : sum of all the  numbers in values of a hash
Keywords   : sum_hash_number_values, get_sum_hash_values, get_hash_value_sum
Usage      : $out = &sum_hash_values(%anyhash);
Version    : 1.0
Warning    : It only gets digits in the input strings and sums them up.

key_ready

Download key_ready .pl
Function   : detects keyboard input without reading it
Returns    : 
             You should check out the Frequently Asked Questions list in
             comp.unix.* for things like this: the answer is
             essentially the same.
             It's very system dependent.  Here's one solution that
             works on BSD systems:
Version    : 1.0

round

Download round .pl
Function   : gives rounded numbers
Version    : 1.0

round_number

Download round_number .pl
Function   : gives rounded integer numbers. 9.5 will be 10, 9.4 will be 9
Version    : 1.0

round_numbers

Download round_numbers .pl
Function   : gives rounded integer numbers. 9.5 will be 10, 9.4 will be 9
Usage      : @output=@{&round_numbers(\@input_numbs)};
             or  $output=${&round_numbers(\$input_numbs)};
Version    : 1.0
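
Rounding half up, as in "9.5 will be 10, 9.4 will be 9", can be sketched with int. `sketch_round` is a hypothetical name, and rounding negative halves away from zero is an assumption. sprintf('%.0f') is avoided here because it does not always round a half upwards.

```perl
use strict;
use warnings;

# Hypothetical sketch: round to the nearest integer, halves away from zero.
sub sketch_round {
    my ($x) = @_;
    return $x >= 0 ? int($x + 0.5) : -int(-$x + 0.5);
}

print join(' ', map { sketch_round($_) } 9.5, 9.4, -9.5), "\n";
```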

trim_numbers

Download trim_numbers .pl
Example    : given num array( 1.33333, 3.555242424, 0.2342324, 4.9234723747)
             >>>            (1.33,  3.56,  0.23,  4.92 )

Function   : gives trimmed numbers (not rounded)
Usage      : @output=@{&trim_numbers(\@input_numbs, \$size_of_posi)};
Version    : 1.0
Warning    : If you put '1' with trimming value of 2 it will be '1.00'
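
A sketch of trimming without rounding follows. `sketch_trim` is a hypothetical name. The Example above shows 3.56 for 3.555242424, which is what rounding would give; this sketch follows the Function line instead and truncates, so it yields 3.55. Binary floating point can make truncation surprising for some inputs, so treat it as an illustration only.

```perl
use strict;
use warnings;

# Hypothetical sketch: cut each number to $places decimal places
# without rounding, padding with zeros ('1' becomes '1.00').
sub sketch_trim {
    my ($aref, $places) = @_;
    my $scale = 10 ** $places;
    return [ map { sprintf "%.${places}f", int($_ * $scale) / $scale } @$aref ];
}

my @trimmed = @{ sketch_trim([1.33333, 3.555242424, 0.2342324, 4.9234723747, 1], 2) };
print "@trimmed\n";
```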

min_elem_array

Download min_elem_array .pl
Argument   : numerical arrays
Function   : gets the smallest element of any array of numbers.
Returns    : one or more ref. for scalar numbers.
Usage      : ($out1, $out2)=@{&min_elem_array(\@array1, \@array2)};
             ($out1)       =${&min_elem_array(\@array1)          };
Version    : 1.0

max_elem_array

Download max_elem_array .pl
Argument   : numerical arrays
Function   : gets the largest element of any array of numbers.
Returns    : one or more ref. for scalar numbers.
Usage      : ($out1, $out2)=@{&max_elem_array(\@array1, \@array2)};
             ($out1)       =${&max_elem_array(\@array1)          };
Version    : 1.0

max_elem_string_array

Download max_elem_string_array .pl
Argument   : numerical arrays
Function   : gets the largest string length of element of any array of numbers.
Keywords   : largest string length of array
Returns    : one or more ref. for scalar numbers.
Usage      : ($out1, $out2)=@{&max_elem_array(\@array1, \@array2)};
             ($out1)       =${&max_elem_array(\@array1)          };
Version    : 1.0

min_elem_string_array

Download min_elem_string_array .pl
Argument   : numerical arrays
Function   : gets the smallest string length of element of any array of numbers.
Keywords   : shortest string length of array
Returns    : one or more ref. for scalar numbers.
Usage      : ($out1, $out2)=@{&max_elem_array(\@array1, \@array2)};
             ($out1)       =${&max_elem_array(\@array1)          };
Version    : 1.0

maximum

Download maximum .pl
Function   : another way of finding maximum
Keywords   : get_maximum, get_bigger, get_largest
Usage      : $biggest = &maximum(37, 24);
Version    : 1.0

minimum

Download minimum .pl
Function   : another way of finding minimum
Usage      : $smallest = &minimum(37, 24);
Version    : 1.0

get_largest_element

Download get_largest_element .pl
Function   : If strings are given, it gets the largest string elem(by leng)
             If numbers are given, it gets the largest number elem
             It automatically checks if string is given
Keywords   : get_largest_value, get_biggest_value,
             get_maximum_element, get_largest_number,
             get_largest_number_element, get_longest_element,
             get_longest_string
Options    : _  for debugging.
             #  for debugging.
             s  for string input (as the second input argument!)

Usage      : $max=${&get_largest_element(\@array_input)};
Version    : 1.3

sqrt_array

Download sqrt_array .pl
Function   : sqrt all elements of an array

square_array

Download square_array .pl
Function   : converts all the elements of an array to squared values.

sum_of_squared_array

Download sum_of_squared_array .pl
Function   : sum of all the squared elements of an array .
Usage      : $out = &sum_of_squared_array(@anyarray);
Version    : 1.0

x_mul_y_arrays

Download x_mul_y_arrays .pl
Function   : multiplies each item of two arrays .
Usage      : @out_array = &x_mul_y_arrays(*array1,*array2);
Version    : 1.0

sum_x_mul_y_arrays

Download sum_x_mul_y_arrays .pl
Function   : sums up multiplied items of two arrays .
             one to one multiplication(elem 1 of array 1 x elem 1 of array2)
Usage      : $out = &sum_x_mul_y_arrays(*array1,*array2);
Version    : 1.1

corelation_coefficient

Download corelation_coefficient .pl
Function   : gets the correlation coefficient of two equal length arrays
Keywords   : cc, get_cc, get_corelation_coefficient
Usage      : $cc = &corelation_coefficient(\@array_not_hash1, \@array_not_hash2);
Version    : 1.0
Warning    : uses references for ARRAY.
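
A minimal sketch of a correlation coefficient over two array references. It assumes the Pearson product-moment definition, which the entry above does not name explicitly, and `sketch_cc` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: Pearson correlation of two equal-length arrays.
# Returns 0 when either array has no variance.
sub sketch_cc {
    my ($x, $y) = @_;
    my $n = @$x;
    die "arrays must be non-empty and of equal length\n" unless $n && $n == @$y;
    my ($mean_x, $mean_y) = (0, 0);
    $mean_x += $_ for @$x;
    $mean_y += $_ for @$y;
    $mean_x /= $n;
    $mean_y /= $n;
    my ($sxy, $sxx, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my ($dx, $dy) = ($x->[$i] - $mean_x, $y->[$i] - $mean_y);
        $sxy += $dx * $dy;
        $sxx += $dx ** 2;
        $syy += $dy ** 2;
    }
    return 0 unless $sxx && $syy;
    return $sxy / sqrt($sxx * $syy);
}

printf "%.3f\n", sketch_cc([1, 2, 3], [2, 4, 6]);
```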

cc

Download cc .pl
Function   : synonym of corelation_coefficient
Usage      : $cc = &cc(\@array_not_hash1, \@array_not_hash2);
Version    : 1.0
Warning    : uses references for ARRAY

sd

Download sd .pl
Argument   : array references are accepted. outputs scalar single val.
Keywords   : standard deviation, get_standard_deviation,
             standard_deviation, get_SD, get_sd, stdev
Returns    : a ref. of a scalar
Usage      : $sd=${&sd(\@array_of_numbers)};
Version    : 1.2

se

Download se .pl
Argument   : ref. for an array.
Function   : gets standard error of any given array
Keywords   : standard error, get_standard_error, sterr
Usage      : $se=${&se(\@array_of_numbers)};
Version    : 1.1
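
Standard deviation and standard error, as covered by sd and se above, can be sketched together. Both sub names are hypothetical, they return plain numbers rather than references, and the n-1 (sample) divisor is an assumption since the entries do not say which form is used.

```perl
use strict;
use warnings;

# Hypothetical sketch: sample standard deviation (n-1 divisor, assumed).
sub sketch_sd {
    my ($aref) = @_;
    my $n = @$aref;
    return 0 if $n < 2;
    my $mean = 0;
    $mean += $_ for @$aref;
    $mean /= $n;
    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @$aref;
    return sqrt($ss / ($n - 1));
}

# Hypothetical sketch: standard error of the mean = sd / sqrt(n).
sub sketch_se {
    my ($aref) = @_;
    my $n = @$aref;
    return $n ? sketch_sd($aref) / sqrt($n) : 0;
}

printf "sd=%.4f se=%.4f\n", sketch_sd([1 .. 5]), sketch_se([1 .. 5]);
```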

remove_non_char

Download remove_non_char .pl
Function   : removes non chars on any input string. (scalar context)
Usage      : $outstring = &remove_non_char($input_string);
Version    : 1.0

numerically

Download numerically .pl
Function   : sorts elements by numerical size.
Usage      : sort numerically (@array);
Version    : 1.0
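
A named comparison routine for sort uses the package variables $a and $b. The sketch below shows the general pattern with hypothetical names (`by_number`, `by_abs_number`); a by-absolute-value variant is included for comparison.

```perl
use strict;
use warnings;

# Hypothetical comparison routines for use as "sort NAME LIST".
sub by_number     { $a <=> $b }             # ascending numeric order
sub by_abs_number { abs($a) <=> abs($b) }   # ascending by absolute value

my @values = (10, 9, 100, 1);
my @sorted = sort by_number @values;
print "@sorted\n";
```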

abs_numerically

Download abs_numerically .pl
Keywords   : numerically_abs, numerically_absolutely
Version    : 1.0

rev_abs_numerically

Download rev_abs_numerically .pl
Version    : 1.0

randomise_lines

Download randomise_lines .pl
Function   : outs line numbers with lines
Usage      : To randomize th_lib.pl just type &random_lines(300,500,"th_lib.pl");
             &random_lines(300, 50, "th_lib.pl"); <-- to get 300 lines
                                                      from 50 numbers
Version    : 1.0

pick_random_hash_pairs

Download pick_random_hash_pairs .pl
Example    : in signature rotation or FVWM rc file menu color rotation.
Function   : randomly pick any num of pairs of hash elements.
             outs line numbers with lines
             Default pick number is 1.
Keywords   : choose_random_hash_pairs
Returns    : ARRAY ref not HASH  ref
Usage      : @array = @{&pick_random_hash_pairs(\%hash1, \$xx)};
Version    : 1.3

pick_random_files

Download pick_random_files .pl
Example    : @array=@{&pick_random_files(\@files, \$num_of_pick)};
Function   : randomly pick any num of files given.
Keywords   : choose_random_files pick_files_randomly
Returns    : ARRAY ref not HASH  ref
Usage      : @array = @{&pick_random_files(\@files, \$num_of_pick)};
Version    : 1.0

hash_substract_by_keys

Download hash_substract_by_keys .pl
Example    : %hash1 = %hash1 - %hash2, ==> (4,4)=(2,2, 4,4) - (2,2)
Function   : removes overlapping entries in hashes.
Keywords   : substract_hash, substract_hash_by_value, hash_substract
Usage      : %hash1 = %{&hash_substract_by_keys(\%hash1, \%hash2)};
Version    : 1.1

substract_hash_by_keys

Download substract_hash_by_keys .pl
Example    : %hash1 = %hash1 - %hash2, ==> (4,4)=(2,2, 4,4) - (2,2)
Function   : removes overlapping entries in hashes.
Keywords   : substract_hash, substract_hash_by_keys
Usage      : %hash1 = %{&substract_hash(\%hash1, \%hash2)};
Version    : 1.1

substract_hash_by_values

Download substract_hash_by_values .pl
Example    : %hash1 = %hash1 - %hash2, ==> (4,4)=(2,2, 4,4) - (2,2)
Function   : removes overlapping value entries in hashes.
Keywords   : substract_hash, substract_hash_by_values
Usage      : %hash1 = %{&substract_hash_by_values(\%hash1, \%hash2)};
Version    : 1.1

substract_array

Download substract_array .pl
Example    : The following will produce (A K C):
             @array1= qw( A B K B B C);
             @array2= qw( B E D);
             @subs = @{&substract_array(\@array1, \@array2)};
Function   : removes from the first input array every occurrence of
             any element that appears in the second input array.
Keywords   : array_subtract, substract_array, ary1_minus_ary2
Usage      : @subs = @{&substract_array(\@array1, \@array2)};
Version    : 1.6
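
The behaviour in the Example can be sketched with a lookup hash and grep. This is an illustration under assumptions, not the library's implementation; substract_array_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical stand-in: two array refs in, one array ref out.
sub substract_array_sketch {
    my ($first, $second) = @_;
    my %drop;
    @drop{@$second} = ();                       # keys are the elements to remove
    return [ grep { !exists $drop{$_} } @$first ];
}

my @array1 = qw(A B K B B C);
my @array2 = qw(B E D);
my @subs   = @{ substract_array_sketch(\@array1, \@array2) };
print "@subs\n";    # A K C
```

The order of the surviving elements follows the first array, which matches the (A K C) result given in the Example.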

hash_catenate

Download hash_catenate .pl
Function   : removes overlapping entries in hashes.
Usage      : %output = %{&hash_catenate(\%hash1, \%hash2)};

merge_hash

Download merge_hash .pl
Function   : removes overlapping entries in hashes.
Keywords   : merge_hash_elements,add_hash, merge two hashes.
             merge hashes, merge_hashes.
Usage      : %output = %{&merge_hash(\%hash1, \%hash2)};
Version    : 1.2
Warning    : one bug caught.

superpose_hash

Download superpose_hash .pl
Function   : superposes hash keys and values onto another hash. %target
             is the superposing hash (the result takes the values of
             this target hash). For example, if you superpose
                (1, 123, 2, 343)
             to (1, 111, 2, 2222, 3, 3333), you will get
                (1, 123, 2, 343,  3, 3333) as the result.
             The template provides blank key entries.
Usage      : %output = %{&superpose_hash(\%template, \%target)};
Version    : 1.0
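
The worked example in Function can be reproduced by flattening both hashes into one list, so that the target's values win on shared keys. A sketch only: superpose_hash_sketch is a hypothetical name and the real routine may handle blank or missing keys differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in: (template ref, target ref) in, hash ref out.
sub superpose_hash_sketch {
    my ($template, $target) = @_;
    my %out = (%$template, %$target);   # later pairs overwrite earlier ones
    return \%out;
}

my %template = (1, 111, 2, 2222, 3, 3333);
my %target   = (1, 123, 2, 343);
my %output   = %{ superpose_hash_sketch(\%template, \%target) };
print "$_ => $output{$_}\n" for sort keys %output;
```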

hash_common2

Download hash_common2 .pl
Argument   : accepts only two references of hashes
Example    : %hashout= %hash1 - %hash2, ==> (4,4)=(2,2, 4,4) - (2,2)
Returns    : a ref of a hash.
Usage      : %output = &hash_common($ref1, $ref2);
Version    : 1.0
Warning    : NOT working

remove_dup_in_hash

Download remove_dup_in_hash .pl
Argument   : one or more hash ref.
Example    : If %input was
                (1,1, 2,1, 3,1);
              The values are the same, so the last key value (3 1) will
              be the result.
             If %input was
                (1,1, 2,1, 3,1, 4,2, 5,2)
              result=(3 1, 5 2)

Function   : removes the duplicate  values of any hashes
Keywords   : remove_dupplicate_values_in_hash, remove_duplicate_values,
             remov_hash_dup, remove_duplication_in_hash
Returns    : one or more hash ref.
Usage      : %out=%{&remove_dup_in_hash(\%input_hash)};
Version    : 1.0
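
The two examples above keep, for each distinct value, the last key that carries it. A sketch of that rule follows; remove_dup_in_hash_sketch is a hypothetical name, and reading "last" as "last in sorted key order" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in: hash ref in, hash ref out.
sub remove_dup_in_hash_sketch {
    my ($in) = @_;
    my %last_key_for;
    # Assumption: walk keys in sorted (string) order so the last key wins.
    $last_key_for{ $in->{$_} } = $_ for sort keys %$in;
    my %out;
    $out{ $last_key_for{$_} } = $_ for keys %last_key_for;
    return \%out;
}

my %input_hash = (1, 1, 2, 1, 3, 1, 4, 2, 5, 2);
my %out = %{ remove_dup_in_hash_sketch(\%input_hash) };
print "$_ $out{$_}\n" for sort keys %out;
```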

reverse_hash

Download reverse_hash .pl
Argument   : one or more hash ref.
Function   : exchanges the value and key of any hashes
Keywords   : invert_hash, inverse_hash
Returns    : one or more hash ref.
Usage      : %out=%{&reverse_hash(\%input_hash)};
Version    : 1.0
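
Exchanging keys and values can be sketched with Perl's built-in reverse applied to the flattened hash. reverse_hash_sketch is a hypothetical name; when two keys share a value, only one of them survives, and which one is not defined here.

```perl
use strict;
use warnings;

# Hypothetical stand-in: hash ref in, hash ref out.
sub reverse_hash_sketch {
    my ($in) = @_;
    my %out = reverse %$in;   # (key, value, ...) becomes (value, key, ...)
    return \%out;
}

my %input_hash = (A => 'alanine', G => 'glycine');
my %out = %{ reverse_hash_sketch(\%input_hash) };
print "$_ => $out{$_}\n" for sort keys %out;
```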
Warning    : Takes ALIGNED sequences.

hash_common

Download hash_common .pl
Returns    : the VALUES OF THE FIRST HASH which occur in later hashes
             are returned
Usage      : %hash1_value = %{&hash_common(\%hash1, \%hash2,...)};
Version    : 1.0

hash_common_by_keys

Download hash_common_by_keys .pl
Returns    : the VALUES OF THE FIRST HASH which occur in later hashes
             are returned
Usage      : %hash1_value = %{&hash_common_by_keys(\%hash1, \%hash2,...)};
Version    : 1.0

get_common_hash_keys

Download get_common_hash_keys .pl
Function   : gets the common hash keys of two hashes.
Usage      : @output_array_of_keys = @{&get_common_hash_keys(\%hash1, \%hash2)};
Version    : 1.0
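
A sketch of collecting the keys shared by two hashes, following the Usage line (two hash refs in, array ref out). get_common_hash_keys_sketch is a hypothetical name, and sorting the result is an added assumption to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical stand-in: two hash refs in, array ref of shared keys out.
sub get_common_hash_keys_sketch {
    my ($h1, $h2) = @_;
    my @common = grep { exists $h2->{$_} } keys %$h1;
    return [ sort @common ];
}

my %hash1 = (a => 1, b => 2, c => 3);
my %hash2 = (b => 9, c => 8, d => 7);
my @output_array_of_keys = @{ get_common_hash_keys_sketch(\%hash1, \%hash2) };
print "@output_array_of_keys\n";    # b c
```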

hash_no_common

Download hash_no_common .pl
Example    : %hashout= %hash1 - %hash2, ==> (4,4)=(2,2, 4,4) - (2,2)
Function   : removes overlapping entries in hashes.
Usage      : %output = &hash_catenate(*hash1, *hash2);
Version    : 1.0
Warning    : surely working. This grep version is faster than a for loop with defined.

beep

Download beep .pl
Usage      : &beep;
Version    : 1.0

capitalize_word

Download capitalize_word.pl
Keywords   : capitalise word,  capitalise_word
Version    : 1.0

capitalize_sentence

Download capitalize_sentence.pl
Version    : 1.0

shift_word_recursively

Download shift_word_recursively .pl
Argument   : SCALAR or ARRAY refs. and delimiter ('/', '.', '-'.....)
             delimiter can be multiple characters => '#$%/=.'
             default delimiter is space ' ';
Example    : @new_lines=shift_word_recursively(\@lines, '/-', 2); to shift two
             words off each line, using the two delimiters '/' and '-'.
             /jong1/perl-jong2/perl-jong3  will become   /perl-jong2/perl-jong3
             /bin/-kkk/-jjj/-jj will become  /-kkk/-jjj/-jj
             @out=@{&shift_word_recursively($testline, '/-', 2)};
             You can use perl regexp patterns for  $delimiter as it is directly
             used in a pattern matching in the sub. So, you can use '\W'
Function   : shifts lines word by word. This needs a delimiter like '/' or '.'
             and stores the resulting arrays. This is to get all the possible
             directories.
             For example, with /nfs/jong/perl/temp/here  input, you get
             (  /jong/perl/temp/here,   /perl/temp/here ,
             /temp/here, /here )  in an array.

Usage      : @new_lines=shift_word_recursively(\@lines, '/'); or
             @new_lines=shift_word_recursively(\@lines, '\W'); or
             @new_lines=shift_word_recursively(\@lines, 'a-zA-Z'); or
             @new_lines=shift_word_recursively(\@lines, '/', 2); <--- for multiple chop unit
             or $new_line = shift_word_recursively(\$line, '.'); <--- for scalar input.
Version    : 1.0
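
The directory example in Function can be sketched as a loop that strips one leading "delimiter + word" per pass and keeps every remainder. This is an illustration, not the library's code: shift_word_recursively_sketch is a hypothetical name, it takes a plain string instead of a ref, and it treats the delimiter as a set of literal characters rather than a regexp pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in: (string, delimiter chars) in, array ref out.
# Assumption: each character of $delim is a literal delimiter.
sub shift_word_recursively_sketch {
    my ($line, $delim) = @_;
    $delim = ' ' unless defined $delim;      # default delimiter is a space
    my $set  = quotemeta $delim;
    my $rest = $line;
    my @out;
    # Remove the leading delimiter(s) and one word while another word follows.
    while ($rest =~ s{^[${set}]+[^${set}]+(?=[${set}])}{}) {
        push @out, $rest;
    }
    return \@out;
}

my $dirs = shift_word_recursively_sketch('/nfs/jong/perl/temp/here', '/');
print "$_\n" for @$dirs;
```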

shift_word

Download shift_word .pl
Argument   : SCALAR or ARRAY refs. and delimiter ('/', '.', '-'.....)
             delimiter can be multiple characters => '#$%/=.'
             default delimiter is space ' ';
Example    : @new_lines=shift_word(\@lines, '/-', 2); to shift two words off
             each line, using the two delimiters '/' and '-'.
             /jong1/perl-jong2/perl-jong3  will become   /jong1/perl-jong2
             /bin/-kkk/-jjj/-jj will become  /jong1/perl-jong2 by
             @out=@{&shift_word($testline, '/-', 2)};
             You can use perl regexp patterns for  $delimiter as it is directly
             used in a pattern matching in the sub. So, you can use '\W'
Function   : shifts lines word by word. This needs a delimiter like '/' or '.'
Usage      : @new_lines=shift_word(\@lines, '/'); or
             @new_lines=shift_word(\@lines, '\W'); or
             @new_lines=shift_word(\@lines, 'a-zA-Z'); or
             @new_lines=shift_word(\@lines, '/', 2); <--- for multiple chop unit
             or $new_line = shift_word(\$line, '.'); <--- for scalar input.
Version    : 1.0

chop_word

Download chop_word .pl
Argument   : SCALAR or ARRAY refs. and delimiter ('/', '.', '-'.....)
             delimiter can be multiple characters => '#$%/=.'
             default delimiter is space ' ';
Example    : @new_lines=chop_word(\@lines, '/-', 2); to chop two words off
             each line, using the two delimiters '/' and '-'.
             /jong1/perl-jong2/perl-jong3  will become   /jong1/perl-jong2
             /bin/-kkk/-jjj/-jj will become  /jong1/perl-jong2 by
             @out=@{&chop_word($testline, '/-', 2)};
             You can use perl regexp patterns for  $delimiter as it is directly
             used in a pattern matching in the sub. So, you can use '\W'
Function   : chops lines word by word. This needs a delimiter like '/' or '.'
Keywords   : chop_word_recursively, remove_word, chop_word_one_by_one
Options    : -w, w, Word, etc,  for getting the chopped off word(s) rather
             than the original lines minus the word.
Usage      : @new_lines=chop_word(\@lines, '/'); or
             @new_lines=chop_word(\@lines, '\W'); or
             @new_lines=chop_word(\@lines, 'a-zA-Z'); or
             @new_lines=chop_word(\@lines, '/', 2); <--- for multiple chop unit
             or $new_line = chop_word(\$line, '.'); <--- for scalar input.
Version    : 2.0
Warning    : The returning value is not the chopped off word.

get_median

Download get_median .pl
Argument   : \@array
Keywords   : median_array, get_median_array, get_array_median, array_median
Returns    : \$median
Usage      : $median = ${&get_median(\@array)};
Version    : 1.0
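
A sketch of a median routine with the same interface (\@array in, \$median out). get_median_sketch is a hypothetical name, and averaging the two middle values for an even count is an assumption; the entry does not say how that case is handled.

```perl
use strict;
use warnings;

# Hypothetical stand-in: array ref in, scalar ref out.
sub get_median_sketch {
    my ($nums) = @_;
    my @sorted = sort { $a <=> $b } @$nums;
    return \undef unless @sorted;            # no median for an empty list
    my $mid = int(@sorted / 2);
    # Assumption: for an even count, take the mean of the two middle values.
    my $median = @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
    return \$median;
}

my @array  = (5, 1, 3);
my $median = ${ get_median_sketch(\@array) };
print "median = $median\n";
```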

array_median

Download array_median .pl
Argument   : \@array
Keywords   : median_array, get_median_array, get_array_median, array_median
Returns    : \$median
Usage      : $median = ${&array_median(\@array)};
Version    : 1.0

push_if_not_already

Download push_if_not_already .pl
Argument   : two references. The first should be an array ref. The 2nd can be either
             scalar or array reference.
Function   : returns a ref. of an array holding a list of non-repetitive entries.
Returns    : a ref. of an array.
Usage      : @out=@{&push_if_not_already(\@mother_array, \@adding_array )};
             @out=@{&push_if_not_already(\@mother_array, \$adding_scalar)};
Version    : 1.0

replace_lines

Download replace_lines .pl
Function   : replace_lines in any txt files
Usage      : &replace_lines(@files, 'removing_string', 'match_str' );
Version    : 1.0

insert_lines_anywhere

Download insert_lines_anywhere .pl
Function   : insert lines anywhere in any txt files. Without any
              position options(Before, After), it attaches the line
Keywords   : insert_text, insert_lines, insert_something,
             attach_lines_in_text, attach_lines, insert_text_lines
Options    : 
   $adding_line= by a=
   $pattern_match_line= by p=
   $option_before_or_after= by o=

Usage      : &insert_lines_anywhere(\@files, \$inst_str,'after', \@match_str);
Version    : 1.4
Warning    : Case Insensitive by default.

get_all_dirs_from_ENV

Download get_all_dirs_from_ENV .pl
Argument   : NONE
Example    : my(@default_env_dirs) = @{&get_all_dirs_from_ENV}; in handle_arguments
Function   : extracts all the directories from %ENV  setting.
Options    : None
Returns    : a ref. of an array of directories.
Usage      : my(@default_env_dirs) = @{&get_all_dirs_from_ENV};
Version    : 1.0
Warning    : produces repetitive paths (ie, can output identical path several times)

get_path_dirs_from_ENV

Download get_path_dirs_from_ENV .pl
Argument   : NONE
Example    : my(@default_env_dirs) = @{&get_path_dirs_from_ENV}; in handle_arguments
Function   : extracts path directories from %ENV  setting.
Options    : None
Returns    : a ref. of an array of directories.
Usage      : my(@default_env_dirs) = @{&get_path_dirs_from_ENV};
Version    : 1.0
Warning    : Replaces '.' with $pwd.
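
A sketch of reading the search path from %ENV and replacing '.' with the current directory, as the Warning describes. Assumptions: get_path_dirs_sketch is a hypothetical name, only $ENV{PATH} is consulted, and entries are separated by ':' as on UNIX.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical stand-in: no arguments, returns a ref. of an array of directories.
sub get_path_dirs_sketch {
    my $pwd  = getcwd();
    my $path = defined $ENV{PATH} ? $ENV{PATH} : '';
    # Assumption: UNIX-style ':' separator; '.' is replaced with the cwd.
    my @dirs = map { $_ eq '.' ? $pwd : $_ } split /:/, $path;
    return \@dirs;
}

my @default_env_dirs = @{ get_path_dirs_sketch() };
print "$_\n" for @default_env_dirs;
```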

handle_arguments_old

Download handle_arguments_old .pl
Argument   : one single ref. (\@input_args);
Function   : Sub argument handling for opening files with options. General
             form of 'handle_arguments_xxxx', while xxxx can be files, hashes, arrays,,,,
Options    : None yet, extendable by adding refs. of something.
Returns    : an array of refs for file names, hashes, arrays and  the option string
Usage      : my(@in)=&handle_arguments_old(\@input_args);   Do not dereference it.
Version    : 1.0

mv

Download mv .pl
Argument   : 2 references of file name or  2 file names.
Author     : Larry Wall, Jong
Example    : mv("mv.pl", *STDOUT);  # This will print mv.pl contents to your screen.
Function   : moves files fast, replacement of 'system("mv xxx xxxx"); '
Keywords   : move files fast. mv_file, mv_files, move_files, move_file
Usage      : &mv( \$srcFile, \$dstFile); or  &mv( $srcFile, $dstFile);
             or &mv(FILEHANDLE1, FILEHANDLE2),  or  &mv(FILEHANDLE1, $output)
Version    : 1.4
Warning    : 27 times slower than 'mv' at prompt.  using system is 32 times slower

cp

Download cp .pl
Argument   : 2 references of file name or  2 file names.
Author     : Larry Wall, Jong
Example    : cp("cp.pl", *STDOUT);  # This will print cp.pl contents to your screen.
Function   : copies files fast, replacement of 'system("cp xxx xxxx"); '
Keywords   : copy files fast. cp_file, cp_files, copy_files, copy_file
Usage      : &cp( \$srcFile, \$dstFile); or  &cp( $srcFile, $dstFile);
             or &cp(FILEHANDLE1, FILEHANDLE2),  or  &cp(FILEHANDLE1, $output)
Version    : 1.4
Warning    : 27 times slower than 'cp' at prompt.  using system is 32 times slower

wh

Download wh .pl
Function   : shows the path for a file you want
             similar to which in UNIX

condense_script

Download condense_script .pl
Argument   : one or more files.
Example    : condense_script.pl th_lib.pl th-test.pl xxx xxxx ....
Function   : makes compact size subroutines of developed perl codes
Options    : None
Returns    : xxxxxx.pl.out  but sub routines condensed.
Usage      : condense_script.pl  xxxxxx.pl
Version    : 1.0
Warning    : The only condition is that you need to have 'sub xxxxx' from the
             first column and the last '}' should be again at the first column
             This is due to the pattern matching for any sub routines.

initialize_code

Download initialize_code .pl
Argument   : None
Function   : initializes all developing codes by putting in Header section info
Returns    : None
Usage      : &initialize_code;
Version    : 1.0
Warning    : This writes over the program you run (itself). temp file is ini_code.temp

parse_arguments

Download parse_arguments .pl
Argument   : uses @ARGV
Example    : &parse_arguments(1);
             @files=@{&parse_arguments(1)};
Function   : Parse and assign any types of arguments on prompt in UNIX to
             the various variables inside of the running program.
             This is more visual than getopt and easier.
             Just change the option table_example below for your own variable
             settings. This program reads itself and parses the arguments
             according to the setting you made in this subroutine or
             option table anywhere in the program.
             It also imports the ENV variables to your program.

Keywords   : pass_arguments
Options    : '0'  to specify that there is no argument to sub, use
              &parse_arguments(0);
             parse_arguments itself does not have any specific option.
             '#' at prompt will make a var  $debug set to 1. This is to
              print out all the print lines to make debugging easier.

             'e=xxxx' for filtering input files by extension xxxx

Returns    : Filenames in a reference of array
             and input files in an array (file1, file2)=@{&parse_arguments};
Usage      : &parse_arguments; or  (file1, file2)=@{&parse_arguments};
Version    : 2.1
Warning    : HASH and ARRAY mustn't be like = (1, 2,3) or (1,2 ,3)

assign_options_to_variables

Download assign_options_to_variables .pl
Argument   : None.
Example    : When you want to set 'a' char to a variable called '$dummy' in
             the program, you put a head box commented line
             '#  $dummy    becomes  a  by  -a '
             Then, parse_arguments and this sub routine will read the head
             box and assign 'a' to $dummy IF you put an argument of '-a' at
             the prompt.
Function   : Assigns the values set in head box to the variables used in
             the programs according to the values given at prompt.
             This produces global values.
             When numbers are given at prompt, they go to @num_opt
              global variable. %vars global option will be made

Options    : '#' at prompt will make a var  $debug set to 1. This is to
              print out all the print lines to make debugging easier.
Returns    : Some globally used variables according to prompt options.
             @num_opt,

Usage      : &assign_options_to_variables(\$input_line);
Version    : 2.8
Warning    : This is a global vars generator!!!

read_head_box

Download read_head_box .pl
Argument   : One or None. If you give an argu. it should be a ref. of an ARRAY
              or a filename, or ref. of a filename.
             If no arg is given, it reads SELF, ie. the program itself.
Example    : Output is something like
             ('Title', 'read_head_box', 'Tips', 'Use to parse doc', ...)
Function   : Reads the introductory header box(the one you see on top of sub routines of
             Jong's programs.). Make a hash(associative array) to put entries
             and descriptions of the items. The hash values have newlines '\n'
             attached, so that later write_head_box just sorts Title to the top
             and prints without much calculation.
             This is similar to read_head_box, but
             This has one long straight string as value(no \n inside)
             There are two types of ending line one is Jong's #---------- ...
             the other is Astrid's  #*************** ...
Keywords   : open_head_box, open_headbox, read_headbox
Options    : 'b' for remove blank lines. This will remove all the entries
             with no descriptions
Returns    : A hash ref.
Usage      : %entries = %{&read_head_box([\$file_to_read, \@BOXED ] )};
Version    : 2.7

read_first_head_box

Download read_first_head_box .pl
Function   : Reads the header box(the one you see on top of sub routines of
             Jong's programs.)
             There are two types of ending line one is Jong's #---------- ...
             the other is Astrid's  #*************** ...
Usage      : %entries = %{&read_first_head_box(\$file_to_read )};
Version    : 2.0

read_head_boxes

Download read_head_boxes .pl
Argument   : one or more filenames
Example    : @hashes = @{&read_head_boxes(@ARGV)};
             $num_of_sub = @hashes;
             print "\n Number of subs was $num_of_sub\n";
Function   : Reads the introductory header box(the one you see on top of sub routines of
             Jong's programs.). Make a hash(associative array) to put entries
             and descriptions of the items.
Returns    : A hash ref.
Usage      : %entries = %{&read_head_box(\$file_to_read, ,,, )};
Version    : 1.2

read_head_box2

Download read_head_box2 .pl
Argument   : One or None. If you give an argu. it should be a ref. of an ARRAY
              or a filename, or ref. of a filename.
             If no arg is given, it reads SELF, ie. the program itself.
Example    : Output is something like
             ('Title', 'read_head_box', 'Tips', 'Use to parse doc', ...)
Function   : Reads the header box(the one you see on top of sub routines of
             Jong's programs.). This is similar to read_head_box, but
             This has one long straight string as value(no \n inside)
             There are two types of ending line one is Jong's #---------- ...
             the other is Astrid's  #*************** ...
Options    : 'b' for remove blank lines. This will remove all the entries
             with no descriptions
Returns    : A hash ref.
Usage      : %entries = %{&read_head_box(\$file_to_read )};
Version    : 1.5

read_all_head_boxes

Download read_all_head_boxes .pl
Function   : Reads the header boxes(the one you see on top of sub routines of
             Jong's programs.)
             There are two types of ending line one is Jong's #---------- ...
             the other is Astrid's  #*************** ...
Usage      : %entries = %{&read_all_head_box(\$file_to_read )};
Version    : 1.0

correct_head_box

Download correct_head_box .pl
Argument   : a filename
Example    : correct_head_box.pl Bio.pl
Function   : Puts the headbox into the right, updated format. The most
            up-to-date headbox format is this very headbox. So, to
            change any other headbox format, change this one first.
Usage      : just type correct_head_box.pl with a file name.
Version    : 1.1

read_correct_head_box

Download read_correct_head_box .pl
Function   : This reads correct_head_box only.
Keywords   : read_update_head_box, read update headbox
Options    : v  for verbose message printing.
Version    : 1.0

write_head_box

Download write_head_box .pl
Function   : gets a hash ref. and writes the head box for a subroutine
Keywords   : write_headbox
Options    : v  for verbose representation. This will print boxes on STDOUT
            n  for no '#' leader.
Version    : 2.2

read_option_table

Download read_option_table .pl
Function   : Reads the option table made by Jong in any perl script. The
             option table is a box with separators.
Version    : 1.0

show_default_help

Download show_default_help .pl
Example    : &show_default_help2; &show_default_help2(\$arg_num_limit);   &show_default_help2( '3' );
             1 scalar digit for the minimum number of arg (optional),
             or its ref. If this is defined, the program exits,
             reporting the minimum number of arguments.
Function   : Prints usage information and others when invoked. You need to have
             sections like this explanation box in your perl code. When invoked,
             show_default_help routine reads the running perl code (SELF READING) and
             displays what you have typed in this box.
             After one entry names like # Function :, the following lines without
             entry name (like this very line) are attached to the previous entry.
             In this example, to # Function : entry.
Keywords   : default_help
Returns    : formatted information
Usage      : &show_default_help2;  usually with 'parse_arguments' sub.
Version    : 3.4
Warning    : this uses format and references

default_help

Download default_help .pl
Example    : &show_default_help2; &show_default_help2(\$arg_num_limit);   &show_default_help2( '3' );
             1 scalar digit for the minimum number of arg (optional),
             or its ref. If this is defined, the program exits,
             reporting the minimum number of arguments.
Function   : Prints usage information and others when invoked. You need to have
             sections like this explanation box in your perl code. When invoked,
             show_default_help routine reads the running perl code (SELF READING) and
             displays what you have typed in this box.
             After one entry names like # Function :, the following lines without
             entry name (like this very line) are attached to the previous entry.
             In this example, to # Function : entry.
Keywords   : default_help
Returns    : formatted information
Usage      : &show_default_help2;  usually with 'parse_arguments' sub.
Version    : 3.4
Warning    : USE show_default_help, This is not action oriented

show_default_help_old

Download show_default_help_old .pl
Argument   : 1 scalar digit for the minimum number of arg (optional),
             or its ref. If this is defined, the program exits,
             reporting the minimum number of arguments.
Example    : &show_default_help; &show_default_help(\$arg_num_limit);   &show_default_help( '3' );
Function   : prints usage information and others when invoked. You need to have
             sections like this explanation box in your perl code. When invoked,
             show_default_help routine reads the running perl code (self reading) and
             displays what you have typed in this box.
             After one entry names like # Function :, the following lines without
             entry name (like this very line) are attached to the previous entry.
             In this example, to # Function : entry.
Returns    : formatted information
Usage      : &show_default_help;  usually with 'parse_arguments' sub.
Version    : 2.0
Warning    : this uses format and references

print_seq_in_block

Download print_seq_in_block .pl
Argument   : many refs  for hash (one for bottom, one for top, etc; the top hash is
               usually to denote certain calculations or results of the bottom one)
Example    : If there are 3 hashes output will be; (in the order of \%hash3, \%hash2, \%hash1)
             >> 1st Hash        >> 2nd Hash         >> 3rd Hash
             Name1  THIS-IS-    Name123  eHHHHHHH   Name123  12222223

             You will get;
                            Name1    THIS-IS-
                            Name123  eHHHHHHH
                            Name123  12222223

             Example of ( no option, DEFAULT )  # Example of ('i' or 'I' option,
                                                                INTERLACE )
             6taa           ----ATPADWRSQSIY    #   6taa       ------ATPADWRSQSIY
             2aaa           ------LSAASWRTQS    #   6taa       ------CCHHHHCCCCEE
             1cdg           APDTSVSNKQNFSTDV    #   6taa       ------563640130000

             6taa           ------CCHHHHCCCC    #   2aaa       ------LSAASWRTQSIY
             2aaa           ------CCHHHHCCCC    #   2aaa       ------CCHHHHCCCCEE
             1cdg           CCCCCCCCCCCCCCCC    #   2aaa       ------271760131000

             6taa           ------5636401300    #   1cdg       APDTSVSNKQNFSTDVIY
             2aaa           ------2717601310    #   1cdg       CCCCCCCCCCCCCCCCEE
             1cdg           6752327236000000    #   1cdg       675232723600000000

             Example of('s' or 'S' option,SORT) # Example of ('o' or 'O' option,
                                                        ORDERED by input hashes )
             1cdg           APDTSVSNKQNFSTDV    #   6taa       ------ATPADWRSQSIY
             2aaa           ------LSAASWRTQS    #   2aaa       ------LSAASWRTQSIY
             6taa           ------ATPADWRSQS    #   1cdg       APDTSVSNKQNFSTDVIY

             1cdg           CCCCCCCCCCCCCCCC    #   6taa       ------CCHHHHCCCCEE
             2aaa           ------CCHHHHCCCC    #   2aaa       ------CCHHHHCCCCEE
             6taa           ------CCHHHHCCCC    #   1cdg       CCCCCCCCCCCCCCCCEE

             1cdg           6752327236000000    #   6taa       ------563640130000
             2aaa           ------2717601310    #   2aaa       ------271760131000
             6taa           ------5636401300    #   1cdg       675232723600000000

Function   : gets many refs  for one scalar  or hashes and prints
               the contents in lines of \$block_leng(the only scalar ref. given) char.
Keywords   : print_sequence_in_block print_alignment_in_block
Options    : 'o' or 'O' => ordered hash print,
             'n' or'N' => no space between blocks.
             's' or 'S' => printout sorted by seq names.
             'i' or 'I' => interlaced print.(this requires identical names in hashes)
             'v' or 'V' => show sequence start number at each line
             'g' or 'G' => with gap chars between  aa residues
              l= for block length. Default is 60 char
              t= for specifying the length of seq names shown
              t  for truncating seq names shown to 12 chars.
              f= for file output  eg. f=XXXXXX.issa
              r=digit-digit  (eg. 10-70) to take only the defined region of sequences
            digit-digit  (eg. 10-70) to take only the defined region of sequences

            just digit  for block length

             All options can also be given as refs, like \$sort
             where $sort has 's' as its value. A naked number like 100 will be
             the block_length. 'i' or 'I' => interlaced print (this requires
             identical names in hashes).
Usage      : &print_seq_in_block (\$block_leng, 'i',\%h1, 'sort', \%h2, \%hash3,,,);
Version    : 1.6

print_seq_in_columns

Download print_seq_in_columns .pl
Argument   : many refs  for hash (one for bottom, one for top, etc; the top hash is
               usually to denote certain calculations or results of the bottom one)
Example    : With command 'print_seq_in_columns.pl c2 s2', you get:

      name1 11111111  name1 22222
      name2 11        name2 2222222
      name3 1111111   name3 22222
      name4 11111     name4 2222
      name5 11111     name5 222

      name1 3333      name1 4444
      name2 3333      name2 444
      name3 333       name3 4
      name4 333       name4 4444
      name5 3333      name5 4444444

Function   : gets many refs  for one scalar  or hashes and prints
               the contents in lines of \$block_leng(the only scalar ref. given) char.
Options    : c, i, s
Usage      : &print_seq_in_block (\$block_leng, 'i',\%h1, 'sort', \%h2, \%hash3,,,);
Version    : 1.1

convert_arr_and_str_2_hash

Download convert_arr_and_str_2_hash .pl
Argument   : one or more ref. of arrays
Example    : &print_seq_in_block(&convert_arr_and_str_2_hash(\@input,\@input2,\@input3 ));
             &convert_arr_and_str_2_hash(\$input1,\$input2, '2' );
             results in; (ordering starts from the given '2')
                          array_2       input1 arraystring
                          array_3       input2 arraystring

             one more exam
                          string_6       This is st                  and 3 strings)
                          string_10      This is st
                          array_2        111233434242
                          array_6        111233434242
                          array_10       111243424224
Function   : makes hash(es) out of array(s)
             if ordering digit(s) is put, it orders the keys according to it.
             if ordering digit is not increased by one, the difference is used
             as the increasing factor. No option results in
             array_1, array_2, array_3...

Returns    : one or more ref. of hashes.
Usage      : ($hash1, $hash2)=&convert_arr_and_str_2_hash(\$input, \$input2, '1', '2'.. );
             * This is the combination of convert_string_to_hash & convert_array_to_hash
Version    : 1.0

convert_string_to_hash

Download convert_string_to_hash .pl
Argument   : one or more ref. of arrays
Example    : &print_seq_in_block(&convert_string_to_hash(\$input,\$input2,\$input3 ));
             &convert_string_to_hash(\$input1,\$input2, '2' );
             results in; (ordering starts from the given '2')
                          string_2       input1 string
                          string_3       input2 string

Function   : makes hash(es) out of string(s)
             if ordering digit(s) is put, it orders the keys according to it.
             if ordering digit is not increased by one, the difference is used
             as the increasing factor. No option results in
             string_1, string_2, string_3...

Returns    : one or more ref. of hashes.
Usage      : ($hash1, $hash2)=&convert_string_to_hash(\$input, \$input2, '1', '2'.. );
Version    : 1.0

convert_array_to_hash

Download convert_array_to_hash .pl
Argument   : one or more ref. of arrays
Example    : &print_seq_in_block(&convert_array_to_hash(\@input,\@input2,\@input3 ));
             &convert_array_to_hash(\$input1,\$input2, '2' );
             results in; (ordering starts from the given '2')
                          array_2       input1 arraystring
                          array_3       input2 arraystring

Function   : makes hash(es) out of array(s)
             If ordering digit(s) are given, the keys are numbered from them.
             If the ordering digits do not increase by one, their difference
             is used as the increment. With no digits, the keys are
             array_1, array_2, array_3...

Returns    : one or more ref. of hashes.
Usage      : ($hash1, $hash2)=&convert_array_to_hash(\@input, \@input2, '1', '2'.. );
Version    : 1.0
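
The key numbering described for the convert_*_to_hash routines can be sketched as follows. This is a minimal illustration, not the B.pl source: the names numbered_keys and arrays_to_hash are hypothetical, and collecting all inputs into a single hash is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of B.pl): build keys "<prefix>_<n>" for
# $count inputs. With no digits the numbering is 1, 2, 3, ...; with one
# digit it starts there and steps by 1; with two or more digits the
# difference between the first two is used as the step.
sub numbered_keys {
    my ($prefix, $count, @digits) = @_;
    my $start = @digits ? $digits[0] : 1;
    my $step  = @digits > 1 ? $digits[1] - $digits[0] : 1;
    return map { $prefix . '_' . ($start + $_ * $step) } 0 .. $count - 1;
}

# Pair each input array (joined into one string) with its generated key.
sub arrays_to_hash {
    my ($arrays, @digits) = @_;
    my @keys = numbered_keys('array', scalar @$arrays, @digits);
    my %hash;
    @hash{@keys} = map { join '', @$_ } @$arrays;
    return \%hash;
}

my $h = arrays_to_hash([[1, 1, 2], [3, 4], [5]], 2, 6);
print "$_\t$h->{$_}\n" for sort keys %$h;
```

With the digits 2 and 6 the step is 4, which gives the keys array_2, array_6 and array_10.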

remove_dup_in_array

Download remove_dup_in_array .pl
Argument   : one or more refs for arrays or one array.
Example    : (1,1,1,1,3,3,3,3,4,4,4,3,3,4,4);  --> (1,3,4);
Function   : removes duplicate entries in an array. Use the 's' option
             to sort the result; otherwise the result keeps
             the original order.
Keywords   : merge array elements, remove_repeting_elements,
             remove_same_array_elements, remove_redundancy, remove_redundant_elements
             remove_duplication_in_array
Options    : 
   s  for sorting the array output
Returns    : one or more references.
Usage      : @out2 = @{&remove_dup_in_array(\@input1, \@input2,,,,)};
             @out1 = &remove_dup_in_array(\@input1 );
Version    : 1.6
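
A minimal sketch of order-preserving duplicate removal with an optional sort, assuming the 's' option is passed as a plain string. The name dedup is hypothetical and the code is not taken from B.pl.

```perl
use strict;
use warnings;

# Illustrative stand-in for remove_dup_in_array (not the B.pl source).
# Takes an array ref and an optional 's' flag; returns an array ref.
sub dedup {
    my ($aref, $opt) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @$aref;   # keeps first occurrences, in order
    # Plain string sort; the sort order used by B.pl is an assumption.
    @unique = sort @unique if defined $opt && $opt eq 's';
    return \@unique;
}

my @input = (1, 1, 1, 1, 3, 3, 3, 3, 4, 4, 4, 3, 3, 4, 4);
print join(',', @{ dedup(\@input) }), "\n";    # prints 1,3,4
```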

remove_text

Download remove_text .pl
Argument   : reference of one array of file names in pwd
Author     : jong
Function   : finds patterns of text and replaces them in multiple input files
Returns    : nothing
Usage      : &remove_text(\@input_array_of_filenames);
Version    : 1.3
Warning    : This produces a temporary file and renames it...

remove_elements_by_pattern

Download remove_elements_by_pattern .pl
Argument   : one or more refs for arrays. The first array is always the
             only target.
Example    : @TARGET=qw(1 % $ ^ # A B 4444 44 4 4 3 33 3 11 A 3 4 4 7 AB);
              @remove=qw(\W);  # removes all the non word stuff
              @remove2=qw(\d );
              @out=@{&remove_elements_by_pattern(\@TARGET, \@remove,\@remove2)};
Function   : removes elements by pattern in the array
Keywords   : remove_this_elements, remove_these_elements, remove_elements
             remove_elements_by_position, kill_array_elements, kill_elements
             take_away_elements, remove_array_elements
Returns    : one or more references.
Usage      : @out2 = @{&remove_elements_by_pattern(\@input1, \@input2,,,,)};
             @out1 = @{&remove_elements_by_pattern(\@input1 )};
Version    : 1.2
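
The pattern-based removal can be sketched like this. It is an illustration under the assumption that an element is dropped when it matches any of the given patterns anywhere in the string; remove_by_pattern is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch only: drop every element of the target that matches any pattern.
# The first argument is the target array ref, the rest are pattern array refs.
sub remove_by_pattern {
    my ($target, @pattern_refs) = @_;
    my @patterns = map { @$_ } @pattern_refs;
    my @kept = grep {
        my $elem = $_;
        !grep { $elem =~ /$_/ } @patterns;
    } @$target;
    return \@kept;
}

my @target  = ('1', '%', '^', 'A', 'B', '4444', '44', '3', 'A', '7', 'AB');
my @remove  = ('\W');   # non-word characters
my @remove2 = ('\d');   # digits
my @out = @{ remove_by_pattern(\@target, \@remove, \@remove2) };
print "@out\n";   # prints A B A AB
```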

remove_elements_by_name

Download remove_elements_by_name .pl
Argument   : one or more refs for arrays. The first array is always the
             only target. The removing elements can be scalar ref or
             just scalar.
Example    : two inputs: (1,2,3,4,4,4,5,5,6,7), (1,3,4)  --> (2,5,5,6,7)
Function   : removes elements by name in the array
Keywords   : remove_this_elements, remove_these_elements, remove_elements
             remove_elements_by_position, kill_array_elements, kill_elements
             take_away_elements
Returns    : one or more references.
Usage      : @out2 = @{&remove_elements_by_name(\@input1, \@input2)};
             @out1 = @{&remove_elements_by_name(\@input1, \$name )};
Version    : 1.1
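
A sketch of removal by value that accepts array refs, scalar refs, or plain scalars as the removal list, as the Argument field describes. The name remove_by_name is hypothetical and the code is not the B.pl implementation.

```perl
use strict;
use warnings;

# Sketch: remove from the target every element whose value appears in the
# removal list. Removal items may be array refs, scalar refs, or plain scalars.
sub remove_by_name {
    my ($target, @removals) = @_;
    my %drop;
    for my $item (@removals) {
        if    (ref($item) eq 'ARRAY')  { $drop{$_} = 1 for @$item }
        elsif (ref($item) eq 'SCALAR') { $drop{$$item} = 1 }
        else                           { $drop{$item} = 1 }
    }
    return [ grep { !$drop{$_} } @$target ];
}

my @out = @{ remove_by_name([1, 2, 3, 4, 4, 4, 5, 5, 6, 7], [1, 3, 4]) };
print "@out\n";   # prints 2 5 5 6 7
```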

remove_elements_by_position

Download remove_elements_by_position .pl
Argument   : one or more refs for arrays. The first array is always the
             only target.
Example    : two inputs: (1,2,3,4,5,6,7), (1,3,4)  --> (2,5,6,7)
Function   : removes elements by position in the array
Keywords   : remove_this_elements, remove_these_elements, remove_elements
             remove_elements_by_position, kill_array_elements, kill_elements
             take_away_elements
Returns    : one or more references.
Usage      : @out2 = @{&remove_elements_by_position(\@input1, \@input2,,,,)};
             @out1 = @{&remove_elements_by_position(\@input1 )};
Version    : 1.1
Warning    : Position 1 means $array[0]
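
The 1-based position convention from the warning can be sketched as follows. remove_by_position is a hypothetical name and the code only illustrates the idea.

```perl
use strict;
use warnings;

# Sketch: remove elements at the given 1-based positions.
sub remove_by_position {
    my ($target, $positions) = @_;
    my %drop;
    $drop{ $_ - 1 } = 1 for @$positions;   # position 1 is $array[0]
    return [ map { $target->[$_] } grep { !$drop{$_} } 0 .. $#$target ];
}

my @out = @{ remove_by_position([1, 2, 3, 4, 5, 6, 7], [1, 3, 4]) };
print "@out\n";   # prints 2 5 6 7
```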

merge_array

Download merge_array .pl
Argument   : one or more refs for arrays.
Example    : (1,1,1,1,3,3,3,3,4,4,4,3,3,4,4);  --> (1,3,4);
Function   : removes duplicate entries in an array. If you put
             more than one array as inputs, it will produce references of
             arrays merged singly. Each resulting array is independent.
             CF. merge_many_arrays
Keywords   : merge array elements, merge_array_elements,
Returns    : one or more references.
Usage      : @out2 = @{&merge_array(\@input1, \@input2,,,,)};
             @out1 = @{&merge_array(\@input1 )};
Version    : 1.1

make_one_array

Download make_one_array .pl
Argument   : Two or more refs for arrays.
Example    : (1,2,3,4,5),(6,7,8,9,10)-----> (1,2,3,4,5,6,7,8,9,10)
Function   : makes one array from several
Keywords   : make_one_array, make_one_from_several
Returns    : An array reference
Usage      : @array_one=@{&make_one_array(\@input_array_1, \@input_array_2)};
Version    : 1.0
Warning    : This does not remove duplicate entries.

get_multiple_array_entry

Download get_multiple_array_entry .pl
Argument   : one or more refs for arrays.
Example    : (1,1,1,1,3,3,3,3,4,4,4,3,3,4,4);  --> (1,3,4);
             if you put two arrays(1,1,1,3,3, 100) and (2,2, 4,4, 100), you will get
             references of arrays( 1,3) and (2,4) ignoring single array entries.
Function   : Gets any multiple array entry in a given array. If more than
             one array is given, each array will have a reference return.
Keywords   : multiple entry array, get_common_entry_array
Returns    : one or more references.
Usage      : @out2 = @{&get_multiple_array_entry(\@input1, \@input2,,,,)};
             @out1 = @{&get_multiple_array_entry(\@input1 )};
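
A sketch of picking out the entries that occur more than once, one result per input array. The name multiple_entries is hypothetical, and returning a list of references is an assumption.

```perl
use strict;
use warnings;

# Sketch: for each input array ref, return a ref to the values that occur
# more than once in that array, in order of first appearance.
sub multiple_entries {
    my @results;
    for my $aref (@_) {
        my (%count, %reported);
        $count{$_}++ for @$aref;
        push @results, [ grep { $count{$_} > 1 && !$reported{$_}++ } @$aref ];
    }
    return @results;
}

my ($r1, $r2) = multiple_entries([1, 1, 1, 3, 3, 100], [2, 2, 4, 4, 100]);
print "@$r1 | @$r2\n";   # prints 1 3 | 2 4
```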

get_common_array_entry

Download get_common_array_entry .pl
Argument   : one or more refs for arrays.
Example    : (1,1,1,2,3,3,3,4)                 --> (1,3);
             (1,2,3) (1,2,3,4,5)               --> (1,2,3);
             (1,2,3,4,5) (1,2,3,4,5) (3,4,5,6) --> (3,4,5);
Function   : Gets any common array entry in given arrays. If one single array
             is given, multiply occurring entries in the array will be returned.
Keywords   : multiple entry array, get_common_entry_array, multiply array,
             get_common_array_elements, get_common_array_element,
             get_dup_array_elements,
Returns    : one or more references.
Usage      : @out2 = @{&get_common_array_entry(\@input1, \@input2,,,,)};
             @out1 = @{&get_common_array_entry(\@input1 )};
Version    : 1.1
Warning    : accepts only references of arrays(others are ignored).
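
The two behaviours described above (intersection of several arrays, repeated values for a single array) can be sketched as follows. common_entries is a hypothetical name and the code is not the B.pl source.

```perl
use strict;
use warnings;

# Sketch: values shared by all given arrays, in the order of the first
# array. Arguments that are not array refs are ignored.
sub common_entries {
    my @arefs = grep { ref($_) eq 'ARRAY' } @_;
    return [] unless @arefs;
    my (%hits, %done);
    if (@arefs == 1) {                       # single array: repeated values
        $hits{$_}++ for @{ $arefs[0] };
        return [ grep { $hits{$_} > 1 && !$done{$_}++ } @{ $arefs[0] } ];
    }
    for my $aref (@arefs) {
        my %in_this;
        $hits{$_}++ for grep { !$in_this{$_}++ } @$aref;   # once per array
    }
    return [ grep { $hits{$_} == @arefs && !$done{$_}++ } @{ $arefs[0] } ];
}

my @out = @{ common_entries([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [3, 4, 5, 6]) };
print "@out\n";   # prints 3 4 5
```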

merge_many_arrays

Download merge_many_arrays .pl
Argument   : one or more refs for arrays. or just arrays.
Example    : (1,1,1,1,3,3), (1,3,3,4,4,4,3,3,4,4);  --> (1,3,4);
Function   : removes duplicate entries in multiple array inputs.
Keywords   : merge array elements from multiple arrays. merge_array_elements
Returns    : one reference.
Usage      : @out2 = @{&merge_many_arrays(\@input1, @inputX, \@input2,,,,)};
             @out1 = @{&merge_many_arrays(\@input1 )};
Version    : 1.0
Warning    : synonym of  remove_dup_in_array

remove_repetitives_in_array

Download remove_repetitives_in_array .pl
Argument   : one or more refs for arrays.
Example    : (1,1,1,1,3,3,3,3,4,4,4,3,3,4,4);  --> (1,3,4);
Function   : removes duplicate entries in an array. If you put
             more than one array as inputs, it will produce references of
             arrays merged singly. Each resulting array is independent.
             CF. merge_many_arrays
Keywords   : remove_dup_in_array, merge array elements, remove_duplicates,
Returns    : one or more references.
Usage      : @out2 = @{&remove_repetitives_in_array(\@input1, \@input2,,,,)};
             @out1 = @{&remove_repetitives_in_array(\@input1 )};
Version    : 1.0
Warning    : synonym of  remove_dup_in_array

filter_hash_by_num_value

Download filter_hash_by_num_value .pl
Function   : returns hash refs. after filtering with threshold value.
Usage      : ($ref1, $ref2, $ref3)=&filter_hash_by_num_value(\%h1, \$thres,...);
Version    : 1.0
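
A sketch of threshold filtering on hash values. The headbox does not say whether values above or below the threshold are kept, so the ">=" comparison here is an assumption, and filter_hash_by_threshold is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: keep entries whose numeric value is at least the threshold.
# The direction of the comparison in B.pl is not documented; ">=" is
# an assumption.
sub filter_hash_by_threshold {
    my ($href, $thres_ref) = @_;
    my %kept = map  { $_ => $href->{$_} }
               grep { $href->{$_} >= $$thres_ref } keys %$href;
    return \%kept;
}

my %score = (seqA => 120, seqB => 45, seqC => 300);
my $thres = 100;
my $out   = filter_hash_by_threshold(\%score, \$thres);
print join(' ', sort keys %$out), "\n";   # prints seqA seqC
```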

dir_search_single

Download dir_search_single .pl
Argument   : One Ref. for a scalar.
Function   : Given a full path or a bare directory name, returns
             the full path of the directory. If it is not found in pwd or at
             the given path, it searches PATH, HOME etc.
Returns    : one Ref. for a scalar.
Usage      : $output_best_possible_dir = ${&dir_search_single(\$input_name)};
Version    : 1.0

dir_search

Download dir_search .pl
Argument   : One Ref. for a scalar.
Function   : Given a full path or a bare directory name, returns
             the full path dir name(s). If it is not found in pwd or at
             the given path, it searches PATH, HOME etc.
Returns    : one Ref. for an array.
Usage      : @output_possible_dirs = @{&dir_search(\$input_name)};
Version    : 1.0

break_down_clu_file

Download break_down_clu_file .pl
Example    : INPUT looks like this:>

 Cluster 1330106       # Ori:9    Sub:6    From:133
   1  1 EC2987          1-332       1
   1  1 HI0530          1-338       1
   1  1 MJ1130          5-302       1
   1  1 SLR0807         3-333       1
   1  1 D09_ORF319      6-312       1
   1  1 MG046           6-307       1
 Cluster 1330203       # Ori:9    Sub:3    From:133
   1  1 SLL1063         16-209      1
   1  1 HI0388          4-236       1
   1  1 EC1764          2-224       1
 Cluster 1330302       # Ori:9    Sub:2    From:133
   1  1 EC2987          1-118       1
   1  1 HI0388          4-104       1

Function   : breaks down the single linkage cluster into smaller clusters
Keywords   : split_clu_files
Options    : 

  pattern= 'Cluster'

Usage      : &break_down_clu_file(\@file);
Version    : 1.0
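
Reading the cluster format shown in the example can be sketched as below. This only groups member lines under their 'Cluster' header; it does not reproduce the subdivision logic of break_down_clu_file. The name split_clusters is hypothetical, and treating the third column as the sequence name is an assumption based on the sample input.

```perl
use strict;
use warnings;

# Sketch: group the lines of a cluster file by their "Cluster <id>" header.
# Returns a hash ref mapping cluster id to an array ref of member names.
sub split_clusters {
    my ($lines, $pattern) = @_;
    $pattern = 'Cluster' unless defined $pattern;
    my (%members, $current);
    for my $line (@$lines) {
        if ($line =~ /^\s*\Q$pattern\E\s+(\S+)/) {
            $current = $1;
            $members{$current} = [];
        }
        elsif (defined $current && $line =~ /^\s*\d+\s+\d+\s+(\S+)\s+\d+-\d+/) {
            push @{ $members{$current} }, $1;   # third column, assumed to be the name
        }
    }
    return \%members;
}

my @clu = (
    ' Cluster 1330203       # Ori:9    Sub:3    From:133',
    '   1  1 SLL1063         16-209      1',
    '   1  1 HI0388          4-236       1',
    '   1  1 EC1764          2-224       1',
    ' Cluster 1330302       # Ori:9    Sub:2    From:133',
    '   1  1 EC2987          1-118       1',
    '   1  1 HI0388          4-104       1',
);
my $clusters = split_clusters(\@clu);
print "$_: @{ $clusters->{$_} }\n" for sort keys %$clusters;
```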

exchange_query_with_match_in_msp

Download exchange_query_with_match_in_msp .pl
Keywords   : exchange_msp_columns, open_and_exchange_query_with_match_in_msp,
             open_msp_files_with_exchange_of_columns
Options    : 
          R     for NO range attachment in Name only return option (n)
          e=    for evalue threshold, if e=1, ignores all which are over 1
          s=    for score threshold, if s=100, ignores all which are less than 100
Usage      : @exchanged_msp=@{&exchange_query_with_match_in_msp(\@file)};
Version    : 1.1

run_fasta_sequence_search

Download run_fasta_sequence_search .pl
Author     : Jong Park, jong@salt2.med.harvard.edu, for commercial use, ask me.
Keywords   : run_ssearch_sequence_search, do_fasta_sequence_search
Options    : 
             Query_seqs=  for enquiry sequences eg)  "Query_seqs=$ref_of_hash"
             DB=   for target DB  "DB=$DB_used"
             File= to get file base(root) name.  "File=$file[0]"
             i= to get file base(root) name. same as File=
             m  for MSP format directly from FASTA or Ssearch result rather than through sso_to_msp, to save mem
             s  for the big single output (msp file output I mean)
             s= for the single big msp file name
             O= for Out file name, same as s=
             o  for overwrite existing xxxx.fa files for search
             c  for create SSO file (sequence search out file)
             d  for very simple run and saving the result in xxxx.gz format in sub dir starting with one char
             r  for reverse the query sequence
             R  for attaching ranges of sequences
             k= for k-tuple value. default is 1 (ori. FASTA prog. default is 2)
             u= for $upper_expect_limit
             l= for $lower_expect_limit
             a= for choosing either fasta or ssearch algorithm
             d= for defining the size of subdir made. 2 means it creates
                    eg, DE while 1 makes D
             d  for $make_gz_in_sub_dir_opt, putting resultant sso files in gz format and in single char subdir
             D  for $make_msp_in_sub_dir_opt, convert sso to msp and put in sub dir like /D/, /S/
             n  for new format to create new msp file format with sso_to_msp routine
          PVM=  for PVM run of FASTA (FASTA only)
             M  for machine readable format -m 10 option
             M= for machine readable format -m 10 option
             N  for 'NO' do not do any processing but, do the searches only.
       FILE_AGE for defining the age of file in days to be overwritten.
Usage      : $gzipped_msp_file=${&run_fasta_sequence_search("a=$algorithm",
                        "O=$out_file_msp_name", "File=$temp_file_name", "e=$E_val",
                        "DB=$sequence_DB", "k=$k_tuple", "$machine_readable")};

Version    : 1.1

do_self_blastp_search

Download do_self_blastp_search .pl
Keywords   : run_blastp, run_blastp_seq_search, blastp_seq_search, blastp_search,
             do_blast_search
Options    : 
   r  for reversing enquiry sequences
   T= for Blastp T param
   S= for Blastp S param
   B= for Blastp B param
   V= for Blastp V param
   E= for Blastp E param

Usage      : &do_self_blastp_search(\@file);
Version    : 1.3

show_options

Download show_options .pl
Keywords   : display_options, show_help_options, show_argument_options,
             show_options_in_headbox, show_prompt_options
Usage      : &show_options;  usually with 'parse_arguments' sub.
Version    : 1.2

self_self_search

Download self_self_search .pl
Example    : &self_self_search(\@file, $over_write, $msp_directly_opt, $create_sso, $single_big_msp);
Function   : self_to_self input database search with reverse query as an option
Keywords   : do_self_self_search, self_self_sequence_search, self_self_seq_search,
             self_to_self_search, self_to_rev_self_search, self_to_reversed_self_search
             search_self, search_self_seq, search_self
Options    : 
             Query_seqs=  for enquiry sequences eg)  "Query_seqs=$ref_of_hash"
             DB=   for target DB  "DB=$DB_used"
             File= to get file base(root) name.  "File=$file[0]"
             m  for MSP format directly from FASTA or Ssearch result rather than through sso_to_msp, to save mem
             s  for the big single output (msp file output I mean)
             o  for overwrite existing xxxx.fa files for search
             c  for create SSO file (sequence search out file)
             r  for reverse the query sequence
             R  for attaching ranges of sequences
             b  for doing in batch. Reads all the seqs in memory at one time
             m10 for machine readable form
             k= for k-tuple value. default is 1 (ori. FASTA prog. default is 2)
             u= for $upper_expect_limit
             l= for $lower_expect_limit
             a= for choosing either fasta or ssearch algorithm
             d= for defining the size of subdir made. 2 means it creates
                    eg, DE while 1 makes D
             d  for $make_gz_in_sub_dir_opt, putting resultant sso files in gz format and in single char subdir
             D  for $make_msp_in_sub_dir_opt, convert sso to msp and put in sub dir like /D/, /S/
             n  for new format (msp2 format)
Usage      : &self_self_search(\@file, $over_write, $msp_directly_opt, $create_sso, $single_big_msp);
Version    : 2.2

search_self

Download search_self .pl
Example    : &search_self(\@file, $over_write, $msp_directly_opt,
                                 $create_sso, $single_big_msp);
Function   : self_to_self input database search with reverse query as an option
Keywords   : do_search_self, self_self_sequence_search, self_self_seq_search,
             self_to_self_search, self_to_rev_self_search, self_to_reversed_self_search
             search_self, search_self_seq
Options    : 
             Query_seqs=  for enquiry sequences eg)  "Query_seqs=$ref_of_hash"
             DB=   for target DB  "DB=$DB_used"
             File= to get file base(root) name.  "File=$file[0]"
             m  for MSP format directly from FASTA or Ssearch result
                       rather than through sso_to_msp, to save mem
             s  for the big single output (msp file output I mean)
             o  for overwrite existing xxxx.fa files for search
             c  for create SSO file (sequence search out file)
             r  for reverse the query sequence
             R  for attaching ranges of sequences
             b  for doing in batch. Reads all the seqs in memory at one time
             m10 for machine readable form
             k= for k-tuple value. default is 1 (ori. FASTA prog. default is 2)
             u= for $upper_expect_limit
             l= for $lower_expect_limit
             a= for choosing either fasta or ssearch algorithm
             d= for defining the size of subdir made. 2 means it creates
                    eg, DE while 1 makes D
             d  for $make_gz_in_sub_dir_opt, putting resultant sso files
                       in gz format and in single char subdir
             D  for $make_msp_in_sub_dir_opt, convert sso to msp and
                      put in sub dir like /D/, /S/
             n  for new format (msp2 format)
Usage      : &search_self(\@file, $over_write, $msp_directly_opt,
                                      $create_sso, $single_big_msp);
Version    : 2.5

make_cdf_file

Download make_cdf_file .pl
Function   : gets all the clu files and produces one xxxx.cdf file.
             A CDF file is a fasta database file containing all the clu domains.
Keywords   : make_cdf_file_with_clu, clu_to_cdf, clu_2_cdf
Usage      : @file=@{&parse_arguments(1)};
Version    : 1.1

make_seq_index_file

Download make_seq_index_file .pl
Function   : creates xxxx.fa.idx file and makes a link to pwd. If @file contains
              names with .idx extension already, it will not put another idx
              index to it.
Keywords   : make_fasta_seq_index_file, create_seq_index_file, make_idx_file,
             create_idx_file, create_seq_idx_file, make_index_file, create_index_file
             make_sequence_index_file, create_sequene_index_file
Usage      : @idx_files_made=@{&make_seq_index_file(\@file)};
Version    : 1.4

randomise_file_contents

Download randomise_file_contents .pl
Options    : 
   V for NON-verbose
Usage      : &randomise_file_contents(@ARGV);
Version    : 1.0

filter_by_string_length

Download filter_by_string_length .pl
Keywords   : find_palindromes, get_palindromes, find_palindrome, GetPalindrom
: filter_hash_by_string_length
Options    : 
  min=  for minimum palindrome size
   p    for putting the position of the start of the palindrome
: 
  cutoff_min= by cutoff_min=
  c=  by c=     # the same as cutoff_min

Usage      : search_palindromes(\%seq, [\%seq2]);
: %out=%{&filter_by_string_length(\%hash, [100], ["cutoff=100"])};
Version    : 1.2
: 1.0

filter_seq_DB_by_seq_length

Download filter_seq_DB_by_seq_length .pl
Keywords   : filter_seq_file_by_seq_length
Options    : 
  cutoff_min= by cutoff_min=
  c=  by c=     # the same as cutoff_min
Version    : 1.1

remove_dup_seq_entry

Download remove_dup_seq_entry .pl
Keywords   : remove_duplicated_sequence_entries, remove_dup_sequence_entry
             remove_dup_seq_entries, remove_dup_sequences
Usage      : &remove_dup_seq_entry(\@file);  # while @file has 'xxx.fa'
Version    : 1.0

sort_files_by_size

Download sort_files_by_size .pl
Function   : sorts files by size and returns a ref. to the sorted array
Keywords   : sort_file_by_size
Usage      : @sorted=@{&sort_files_by_size(\@files)};
Version    : 1.0
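
Sorting file names by size can be sketched with the -s file test. Ascending order and the treatment of missing files as size 0 are assumptions; sort_by_size is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: return a ref to the file names ordered by size in bytes.
# Ascending order is an assumption; missing or empty files count as size 0.
sub sort_by_size {
    my ($files) = @_;
    my %size = map { $_ => (-s $_ || 0) } @$files;
    return [ sort { $size{$a} <=> $size{$b} or $a cmp $b } @$files ];
}

# List the entries of the current directory, smallest first.
my @sorted = @{ sort_by_size([ glob('*') ]) };
print "$_\n" for @sorted;
```

Sizes are looked up once and cached in a hash so that the sort comparison does not stat each file repeatedly.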

make_fasta_files_from_msp_1_files

Download make_fasta_files_from_msp_1_files .pl
Example    : &make_fasta_files_from_msp_1_files(\@files, "E=0.081", "l=0");
             &make_fasta_files_from_msp_1_files(\@files,
                    "E=$E_thresh",
                    "Seq_Source_DB=$seq_source_db",
                    "l=$lower_expect_limit",
                    "i=$cut_off_increase_factor",
                     $over_write_file);
Function   : creates fasta files for each query seq in xxxx.msp_1 file
Options    : 
   $seq_source_db= by "Seq_Source_DB=xxxxx.fa"
   $E_thresh = by E=  #  E value cutoff
   u= for $upper_expect_limit
   l= for $lower_expect_limit
   $over_write_file=o by o -o
   $cut_off_increase_factor = by i=
   s for selfless fasta output (removes the original self seq among intermediates)

Usage      : &make_fasta_files_from_msp_1_files(\@files);

Version    : 1.2

remove_dup_match_in_msp_files

Download remove_dup_match_in_msp_files .pl
Function   : removes the exact duplicates in MSP files
Keywords   : remove_redundancy_in_msp_files, remove_redundant_matches_in_msp
             remove_redundant_matches, make_non_redundant_msp_files
Usage      : @out=@{&remove_dup_match_in_msp_files(\@file)};
Version    : 1.3

filter_intermediates_by_E_value

Download filter_intermediates_by_E_value .pl
Function   : filters intermediate sequences according to the E value
              thresholds and returns the lines in an array
Usage      : @filtered_msp3=@{&filter_intermediates_by_E_value(\@msp3,
                                           "E1=$E_value1", "E2=$E_value2")};
Version    : 1.0

make_intermediate_sequence_library

Download make_intermediate_sequence_library .pl
Function   : extracts intermediate sequences from OWL fasta database to
             make intermediate seq library
             This looks for /gn0/jong/DB/PDB/PDB95D_against_OWL/E/$msp_file_gz
                and         /gn0/jong/DB/PDB/PDB95D_against_OWL/D/$msp_file_gz
Keywords   : make_interm_library_for_each_group, make_interm_lib,
             make_intermediate_library, compile_interm_library, create_interm_library,
Options    : 
      'FASTA_DB' for sequence source fasta file  eg:  "FASTA_DB=$source_db_fasta"
      o  for overwrite option (overwrites 1.2.3.fa like file)
      MSP_DIR= for msp seq file result directory
      m=       for msp seq file result directory (same as MSP_DIR)
      e=       for E value thresh
   $pdbg_file= by p=
      E=       for E value thresh
      s=       for score thresh

Usage      : &make_intermediate_sequence_library(\@files, "FASTA_DB=$owl_db_fasta");
               while @files have either pdbs or pdbg file (PDB grouping file)
Version    : 1.7

read_machine_readable_sso_lines

Download read_machine_readable_sso_lines .pl
Keywords   : read_m10_sso_lines read_msso_lines
Options    : a c r r2 n
             u= for upper E value limit
             l= for lower E value limit
Usage      : @out_refs=@{&read_machine_readable_sso_lines(\@SSO, $get_alignment,
                           $create_sso, $upper_expect_limit,$new_format, $lower_expect_limit,
                           $attach_range_in_names, $attach_range_in_names2)};
Version    : 1.5

read_sso_lines

Download read_sso_lines .pl
Function   : Main subroutine for open_sso_files. This calls either the machine
              readable or the unreadable form parsing subroutine
Keywords   : read_sso_lines_in_array
Options    : a c r r2 n
             u= for upper E value limit
             l= for lower E value limit
Usage      : &read_sso_lines([@sso], $create_sso, $attach_range_in_names,
                 $attach_range_in_names2, $new_format, $get_alignment);
Version    : 1.4

read_machine_unreadable_sso_lines

Download read_machine_unreadable_sso_lines .pl
Example    : output will look=>
  ZFH1_DROME 60 d1ad3a_ 446 d1ad3a_ 0.9 71 34 3-37 253-287
  ZFH1_DROME 60 d1ahdp_ 68 d1ahdp_ 0.00018 100 56 2-58 3-59
  ZFH1_DROME 60 d1crka2 282 d1crka2 8.4 58 50 5-55 73-123
  ZFH1_DROME 60 d1dkza_ 215 d1dkza_ 4.9 59 40 1-41 112-152
  ZFH1_DROME 60 d1ecra_ 305 d1ecra_ 3.2 63 47 9-56 201-248

Keywords   : read_normal_sso_lines
Options    : a c r r2 n
             u= for upper E value limit
             l= for lower E value limit
Usage      : @out_refs=@{&read_machine_unreadable_sso_lines(\@SSO, $get_alignment,
                           $create_sso, $upper_expect_limit,$new_format, $lower_expect_limit,
                           $attach_range_in_names, $attach_range_in_names2)};
Version    : 1.3

make_seq_alignment_length_even

Download make_seq_alignment_length_even .pl
Example    : 
    seq3     ---lasdkfjklsdjfkldjklfj----
    seq4     dfasdfasdfadsfsadfsaas

  will result in
    seq3     ---lasdkfjklsdjfkldjklfj----
    seq4     dfasdfasdfadsfsadfsaas------

Function   : creates hashes with values of equal lengths.
Keywords   : make_alignment_length_even, equalise_seq_alignments
Usage      : @out=@{&make_seq_alignment_length_even(\%hash1, \%hash2)};
             @out=@{&make_seq_alignment_length_even(\%hash1)};
Version    : 1.0
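
Padding alignment strings to a common length can be sketched as follows, using '-' as the gap character as in the example. pad_alignment is a hypothetical name and it handles a single hash only.

```perl
use strict;
use warnings;

# Sketch: pad every sequence in a hash with '-' up to the longest length.
sub pad_alignment {
    my ($href) = @_;
    my $max = 0;
    for my $seq (values %$href) {
        $max = length($seq) if length($seq) > $max;
    }
    my %padded = map {
        $_ => $href->{$_} . '-' x ($max - length($href->{$_}))
    } keys %$href;
    return \%padded;
}

my %aln = (
    seq3 => '---lasdkfjklsdjfkldjklfj----',
    seq4 => 'dfasdfasdfadsfsadfsaas',
);
my $even = pad_alignment(\%aln);
print "$_     $even->{$_}\n" for sort keys %$even;
```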

tempname

Download tempname .pl
Function   : Returns a unique temporary filename.
             Reasonably robust but not completely immune to race conditions
             with other processes simultaneously requesting a tempname.
Usage      : $tmp=&tempname;
Version    : 1.0
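
Where the race condition mentioned above matters, the standard File::Temp module is an alternative to tempname. This is a different technique from the B.pl routine: it creates and opens the file in one step instead of only returning a name. The template 'bpl_XXXXXX' is an arbitrary choice for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates and opens the file atomically, so no other process
# can claim the same name between choosing it and opening it.
# UNLINK => 1 removes the file when the program exits.
my ($fh, $tmp) = tempfile('bpl_XXXXXX', TMPDIR => 1, UNLINK => 1);
print {$fh} "scratch data\n";
close $fh or die "cannot close $tmp: $!";
print "temporary file: $tmp\n";
```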

fasta_kt1_search

Download fasta_kt1_search .pl
Author     : Sarah A. Teichmann
Date       : 19th September 1997
Example    : &fasta_kt1_search ($qdb_main, $tdb_main, $fastaver_main);
Function   : to search one database against the other using fasta
                ktup=1 (default is simply "fasta"). The results are stored in sub dirs
                which are from the 2 first chars of the query sequence.
Keywords   : fasta_search, fasta_database_search
Usage      : &fasta_kt1_search($query_database, $target_database, $fasta_version_to_use);
Version    : 1.1

msp_single_link_hash

Download msp_single_link_hash .pl
Author     : Sarah A. Teichmann with thanks to Alex Bateman
Function   : To make a hash with all the genes in the msp files as the keys,
             which are linked at or below the E-value threshold,
             with the values denoting the cluster number
Keywords   : single_linkage, msp_single_linkage, msp_single_linkage_hash
Usage      : %hash=%{&msp_single_link_hash(\@msp_files, $E_value)};
Version    : 1.3
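
The idea of single linkage clustering at an E-value threshold can be sketched as follows. The input is simplified to [query, match, E-value] triples rather than real MSP lines, single_link is a hypothetical name, and giving unlinked genes their own cluster number is an assumption.

```perl
use strict;
use warnings;

# Sketch of single-linkage clustering. Input is a list of
# [query, match, E-value] triples (a simplified stand-in for MSP lines).
# Returns a hash ref mapping each gene to a cluster number.
sub single_link {
    my ($hits, $e_cut) = @_;
    my %parent;
    my $find = sub {                      # follow parent links to the cluster root
        my ($x) = @_;
        $parent{$x} = $x unless exists $parent{$x};
        $x = $parent{$x} while $parent{$x} ne $x;
        return $x;
    };
    for my $hit (@$hits) {
        my ($q, $m, $e) = @$hit;
        my ($rq, $rm) = ($find->($q), $find->($m));   # registers both genes
        $parent{$rq} = $rm if $e <= $e_cut;           # link at or below the threshold
    }
    my (%cluster, %number);
    my $next = 1;
    for my $gene (sort keys %parent) {
        my $root = $find->($gene);
        $number{$root} = $next++ unless exists $number{$root};
        $cluster{$gene} = $number{$root};
    }
    return \%cluster;
}

my @hits = (
    ['A', 'B', 0.001], ['B', 'C', 0.05],
    ['D', 'E', 0.0001], ['C', 'D', 5],
);
my $clu = single_link(\@hits, 0.1);
print "$_ => cluster $clu->{$_}\n" for sort keys %$clu;
```

In this example A, B and C end up in one cluster because each is linked to the next below the threshold, while the C to D hit (E value 5) is too weak to join the two groups.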

print_clusfile_from_hash

Download print_clusfile_from_hash .pl
Author     : Sarah A. Teichmann
Function   : To print out a file in cluster file format from an input hash containing the genes as keys and the cluster number as values.
Keywords   : print_single_linkage_cluster, print_cluster_file
Usage      : &print_clusfile_from_hash(\%hash)
Version    : 1.2

make_clustering_summary

Download make_clustering_summary .pl
Author     : Sarah A. Teichmann
Date       : 19th September 1997
Function   : to make a summary file of a sorted cluster file
Keywords   : summary, make_cluster_summary, subclustering summary
Usage      : &make_summ($sorted_cluster_file)
Version    : 1.5

create_sorted_cluster

Download create_sorted_cluster .pl
Author     : Sarah A. Teichmann, modified by Jong
Date       : 19th September 1997
Function   : to make a "sorted_cluster_file" from the .clu files in a directory
Keywords   : make_cluster_file, sort_clu_files
Usage      : &create_sorted_cluster
Version    : 1.7

interm_lib_search

Download interm_lib_search .pl
Example    : &interm_lib_search(\@file, $over_write, $msp_directly_opt, $create_sso, $single_big_msp);
Function   : self_to_self input database search with reverse query as an option
Keywords   : do_interm_lib_search, self_self_sequence_search, self_self_seq_search,
             self_to_self_search, self_to_rev_self_search, self_to_reversed_self_search,
Options    : 
             Query_seqs=  for enquiry sequences eg)  "Query_seqs=$ref_of_hash"
             DB=   for target DB  "DB=$DB_used"
             File= to get file base(root) name.  "File=$file[0]"
             m  for MSP format directly from FASTA or Ssearch result rather than through sso_to_msp, to save mem
             s  for the big single output (msp file output I mean)
             o  for overwrite existing xxxx.fa files for search
             c  for create SSO file (sequence search out file)
             r  for reverse the query sequence
             b  for doing in batch. Reads all the seqs in memory at one time
             m10 for machine readable form
             k= for k-tuple value. default is 1 (ori. FASTA prog. default is 2)
             u= for $upper_expect_limit
             l= for $lower_expect_limit
             a= for choosing either fasta or ssearch algorithm
             d  for $make_gz_in_sub_dir_opt, putting resultant sso files in gz format and in single char subdir
             D  for $make_msp_in_sub_dir_opt, convert sso to msp and put in sub dir like /D/, /S/
             n  for new format (msp2 format)
       FILE_AGE for defining the age of file in days to be overwritten.

Usage      : &interm_lib_search(\@file, $over_write, $msp_directly_opt, $create_sso, $single_big_msp);
             &interm_lib_search(\%seq,  $over_write, $msp_directly_opt, $create_sso, $single_big_msp);
Version    : 1.8

geanfammer

Download geanfammer .pl
Author     : Sarah A Teichmann, Jong Park, sat@mrc-lmb.cam.ac.uk,
                                      jong@salt2.med.harvard.edu
Example    : geanfammer.pl E_gnme.fa             # simplest form
            geanfammer.pl E_gnme.fa a=ssearch   # use SSEARCH
            geanfammer.pl E_gnme.fa o           # for overwriting
                                                   when you want a
                                                   fresh run over old
            geanfammer.pl E_gnme.fa c         # For keeping
                                                 SSO files
                                                 (fasta output)
            geanfammer.pl E_gnme.fa k=2       # changing default
                                                 k tuple for
                                                 FASTA to 2
            geanfammer.pl E_gnme.fa E=0.01     # set the E value
                                                 for initial single
                                                 linkage clustering
            geanfammer.pl E_gnme.fa e=0.01    # set the E value
                                                for domain level linkage
       -->  geanfammer.pl E_gnme.fa e=0.01 E=0.01 # set the 2 E values
                                                    separately (no need
                                                    to do this)

Function   : Creates a domain level clustering file from a given
              FASTA format sequence DB. It has been used for complete
              genome sequence analysis.

              ------------ USAGE INFORMATION -------------------
             The parameters you set determine whether the resulting
               protein families are meaningful.
             The most important ones are the E and e options (mostly,
               they can have the same value).
             Capital E sets the threshold for the single
               linkage clustering.
             This means any sequence hit BELOW the threshold
               (which is good) will be linked.
             For example, if Seq1 matched Seq2 with a FASTA search
              E value of 0.001 and you set the threshold to 0.1,
              then YOU ordered geanfammer to regard them as a family.

             The second, small e option is for dividing a complex
              and wrong cluster into more correct
              duplication modules. This is necessary as a
              lot of multidomain proteins can be clustered together
              WRONGLY by single linkage.
             At this stage, the e value is independent of the E value
              and you can set a higher or lower one. Or you can set
              it the same as E.

             Rough guide from our experience for E and e values:
              We know that with a 1000 sequence database, 0.01
              produces around 1% error in grouping sequences
              according to the E value.
              With 180,000 sequences, 0.081 gave us less than 1% error.
             The E value of FASTA and SSEARCH is DEPENDENT on DB size,
              so you need to experiment a little to find the best
              E value for your database or genome.
             The best approach is :
               1) Run geanfammer.pl on your target DB
                  with an E value of your choice.
               2) Check the sequence families clustered
                  in the final result file xxxx.gclu and decide
                  if the E value is too low or too high. Lower E values
                  make sure you do not make wrong clusters while
                  higher E values include more probable sequence
                  family members.
               3) Put all the xxxx.msp files in the subdirectory(s)
                  created by geanfammer and run divclus.pl (which
                  accompanies the package) with different
                  E values. Divclus does not run any search algorithm,
                  so it finishes fairly quickly.

Keywords   : genome_analysis_and_protein_family_maker,
             genome_ana_protein_fam_maker
Options    : 
             o  to overwrite existing xxxx.fa files for search
             c  to create an SSO file (sequence search out file)
             d  for a very simple run, saving the result in
                    xxxx.gz format in a sub dir starting with one char
             N
             s
             m
             v
             z
             D
             y  for dynamic factor
             L  for Lean output(removes all the intermediate
                                     outputs to save space)
             u  for making separate summary file (redundant now)

             DB=
             File=
             k= for k-tuple value. Default is 1 (the original FASTA
                                                   program default is 2)
             a= for choosing either fasta or ssearch algorithm
             E= for Evalue cutoff for single linkage clustering
                    $E_cut_main
             e= for Evalue cutoff for divide_clusters subroutine.
             u=
             l=
             d=

   !! Do not remove the following lines down to # Author line.
                This program parses them

  $Lean_output=L           by L -L
  $dynamic_factor=y        by y  Y -y -Y
  $over_write=o            by o -o
  $create_sso_file=c       by c -c
  $k_tuple=                by k=
  $upper_expect_limit=     by u=
  $lower_expect_limit=     by l=
  $algorithm=              by a=
  $No_processing=N         by N -N
  $single_msp=s            by s -s
  $sequence_db_fasta=      by DB=
  $query_file=             by File=
  $machine_readable=M      by M -M
  $make_subdir_out=D       by D
  $make_subdir_gzipped=d   by d -d
  $direct_MSP_conversion=m by m -m
  $verbose=v               by v -v
  $sub_dir_size=           by d=
  $Evalue_cut_single_link= by E=
  $Evalue_cut_divclus=     by e=
  $optimize=z              by z -z
  $make_separate_summary=u by u -u
  $length_thresh=       by T=

Usage      : &geanfammer(\@your_genome_or_db_to_analyse_file,
                          $verbose);

Version    : 2.5

ISS_server

Download ISS_server .pl
Function   : Runs ISS (intermediate sequence search) and makes an HTML file
             to return to the HTTPD server
Keywords   : intermdediate_sequence_search_server
Options    : z S s
Usage      : &ISS_server(\%seq, "e=$E_val", "k=$ktuple", "a=$algorithm", "t=$leng_thresh",
             "$which_score", $show_raw_result, $segged_ISSL );
Version    : 1.3

import_ENV_vars

Download import_ENV_vars .pl
Function   : Lets you use any environment variable directly in your
             program, so you can write $USER instead of $ENV{'USER'}
Keywords   : import_Env_vars, import_ENV_variables
Version    : 1.1
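A minimal sketch of the idea behind import_ENV_vars, not the library's own code. The sub name import_env_vars_sketch, the target package argument and the GEANFAMMER_DEMO variable are all hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Copy each environment variable into a package variable of the same name.
# import_env_vars_sketch is a hypothetical name, not part of B.pl.
sub import_env_vars_sketch {
    my ($package) = @_;
    $package = 'main' unless defined $package;
    no strict 'refs';                           # symbolic references are needed here
    for my $name (keys %ENV) {
        next unless $name =~ /^[A-Za-z]\w*$/;   # skip names that are not plain identifiers
        ${"${package}::$name"} = $ENV{$name};
    }
    return;
}

$ENV{GEANFAMMER_DEMO} = 'hello';                # made-up variable for the demonstration
import_env_vars_sketch('main');

our $GEANFAMMER_DEMO;                           # declare the name so 'use strict' accepts it
print "$GEANFAMMER_DEMO\n";                     # same value as $ENV{'GEANFAMMER_DEMO'}
```

Note that this overwrites any existing package variable with the same name, so it is safest in short scripts.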

read_any_dir_for_dir

Download read_any_dir_for_dir .pl
Argument   : takes one or more scalar references.
Function   : reads any dir, REMOVES the '.' and '..' entries, and
            then puts the rest in an array.
Keywords   : read_any_dir_for_dir_command
Returns    : one ref. of array.
Usage      : @file_list = @{&read_any_dir_for_dir(\$absolute_path_dir_name, ....)};
Version    : 1.1
Warning    : This does not report '.', '..', '#xxxx', ',xxxx', etc. Only legitimate
             file and dir names are reported.
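The filtering described in the Warning can be sketched as follows. This is a hypothetical stand-in (list_dir_entries is not a B.pl name) and it assumes that "not legitimate" means only '.', '..', and names starting with '#' or ','.

```perl
use strict;
use warnings;

# List a directory and drop '.', '..', and names starting with '#' or ','.
sub list_dir_entries {
    my ($dir_ref) = @_;                         # reference to a scalar holding the path
    opendir(my $dh, $$dir_ref) or die "Cannot open $$dir_ref: $!";
    my @entries = sort grep { !/^\.\.?$/ && !/^[#,]/ } readdir($dh);
    closedir($dh);
    return \@entries;                           # one array reference, as in the Usage line
}

my $dir = '.';
my @file_list = @{ list_dir_entries(\$dir) };
print "$_\n" for @file_list;
```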

produce_random_numbers

Download produce_random_numbers .pl
Example    : @rand_nums=@{&produce_random_numbers($how_many, $range)};
              where, for instance, $how_many is 10 and $range is 10
Keywords   : generate_random_numbers, make_random_numbers, create_random_numbers
             random_numbers, get_random_numbers
Usage      : @rand_nums=@{&produce_random_numbers($how_many, $range)};
Version    : 1.0
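A minimal sketch with the same calling shape as the Usage line. It assumes that $range means "integers from 0 to $range-1", which the entry above does not state; random_numbers_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Returns a reference to an array of $how_many random integers.
# Assumption: each number lies in 0 .. $range-1.
sub random_numbers_sketch {
    my ($how_many, $range) = @_;
    my @nums = map { int(rand($range)) } 1 .. $how_many;
    return \@nums;
}

my @rand_nums = @{ random_numbers_sketch(10, 10) };
print "@rand_nums\n";
```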

read_seq_matrix_files

Download read_seq_matrix_files .pl
Function   : Makes a similarity matrix hash (reflexive, so it has AT as well as TA).
             %matrix looks like this:  $matrix{X}{Y}= 4
Keywords   : get_2D_aa_matrix, read_seq_matrix
Options    : 
     $reflexive_combi=r by r -r
Usage      : %matrix=%{&read_seq_matrix_files(\@file)};
Version    : 1.2
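To show what a reflexive $matrix{X}{Y} hash looks like, here is a small sketch. The input layout (a header line of column letters followed by lower-triangle rows) and the name parse_matrix_lines are assumptions for this example; the file formats the real subroutine accepts are not described here, and the scores are made up.

```perl
use strict;
use warnings;

# Build a two-level hash from matrix text lines, storing both X,Y and Y,X.
sub parse_matrix_lines {
    my @lines = @_;
    my (@cols, %matrix);
    for my $line (@lines) {
        next if $line =~ /^\s*(#|$)/;             # skip comments and blank lines
        my @f = split ' ', $line;
        if (!@cols) { @cols = @f; next; }         # first data line is the column header
        my $row = shift @f;
        for my $i (0 .. $#f) {
            $matrix{$row}{ $cols[$i] } = $f[$i];
            $matrix{ $cols[$i] }{$row} = $f[$i];  # reflexive entry: TA as well as AT
        }
    }
    return \%matrix;
}

my %matrix = %{ parse_matrix_lines(
    '# toy nucleotide matrix (made-up scores)',
    '   A  T',
    'A  4',
    'T -1  4',
) };
print "$matrix{A}{T} $matrix{T}{A}\n";            # both lookups give the same score
```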

attach_classification_to_pdb_seq

Download attach_classification_to_pdb_seq .pl
Keywords   : attach_scop_classificaion_to_pdb_seq, add_scop_classification_to_pdb_seq
Usage      : &attach_classification_to_pdb_seq(\%hash_classification, \%correcting_pairs, \@files, $over_write);
Version    : 1.1

put_slash_before_special_chars

Download put_slash_before_special_chars .pl
Version    : 1.1

split_file_by_string

Download split_file_by_string .pl
Keywords   : split_file_by_string  divide_file_by_string
Version    : 1.1

check_input_file_extension

Download check_input_file_extension .pl
Example    : @file=@{&check_input_file_extension('msp', \@file)};
Usage      : @file=@{&check_input_file_extension('msp', \@file)};
             or @file=@{&check_input_file_extension('msp,nhco', \@file)};
                to allow multiple extensions
Version    : 1.1
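A minimal sketch of extension checking with the same argument order as the Usage line. It assumes the check simply keeps the files whose extension is in the comma-separated list; whether the real subroutine filters, warns or dies is not stated above. filter_by_extension is a hypothetical name.

```perl
use strict;
use warnings;

# Keep only the file names whose extension is in the allowed list.
sub filter_by_extension {
    my ($ext_list, $files_ref) = @_;
    my %allowed = map { $_ => 1 } split /,/, $ext_list;
    my @kept = grep { /\.([^.\/]+)$/ && $allowed{$1} } @$files_ref;
    return \@kept;
}

my @file = ('a.msp', 'b.nhco', 'c.fa');
@file = @{ filter_by_extension('msp,nhco', \@file) };
print "@file\n";
```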

geanfammer_main

Download geanfammer_main .pl
Function   : The main sub of geanfammer
Usage      : &geanfammer_main;
Version    : 1.7

encrypt_passwd

Download encrypt_passwd .pl
Author     : --Mark Henderson, modified by Jong
Example    : $crypted = ${&encrypt_passwd( $plaintext, $salt )};
Version    : 1.0

detect_file_format_type

Download detect_file_format_type .pl
Author     : jong
Example    : $file_type=${&detect_file_format_type($file[$i])};
Usage      : $file_type=${&detect_file_format_type($file[$i])};
Version    : 1.1

get_first_seq_in_alignment

Download get_first_seq_in_alignment .pl
Author     : jong@salt2.med.harvard.edu
Usage      : $seq_name=${&get_first_seq_in_alignment($file)};
Version    : 1.1

find_program_in_path

Download find_program_in_path .pl
Author     : Jong, jong@salt2.med.harvard.edu
Keywords   : which, whence
Version    : 1.2

do_psi_blast_search

Download do_psi_blast_search .pl
Options    : 
    $source_DB_file= by d= s=
    $input_seq_file= by i=
    $Eval_limit= by E=
    $iteration_limit= by j=
    $step_evalue= by h= e=
    $over_write=o by o
    $make_msp_in_sub_dir_opt=D by D
Usage      : &do_psi_blast_search(\@files, "d=$source_DB_file",
                     "i=$input_seq_file",
                     $over_write,
                     $make_msp_in_sub_dir_opt);
Version    : 1.3

merge_superfam_fasta_files_for_ISL

Download merge_superfam_fasta_files_for_ISL .pl
Author     : jong@salt2.med.harvard.edu
Keywords   : compile_superfam_fasta_files_for_ISL
Usage      : &merge_superfam_fasta_files_for_ISL;
Version    : 1.0

get_total_memory_size_in_linux

Download get_total_memory_size_in_linux .pl
Author     : jong@salt2.med.harvard.edu
Example    : The /proc/meminfo file looks like this:>>>>
           total:    used:    free:  shared: buffers:  cached:
   Mem:  395735040 233975808 161759232 65953792 111476736 41345024
   Swap:  7319552   147456  7172096
   MemTotal:    386460 kB
   MemFree:     157968 kB
   MemShared:    64408 kB
   Buffers:     108864 kB
   Cached:       40376 kB
   SwapTotal:     7148 kB
   SwapFree:      7004 kB

Keywords   : get_memory_size_in_linux, get_mem_size
Usage      : $mem=${&get_total_memory_size_in_linux};
Version    : 1.0
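A minimal sketch of pulling the MemTotal value out of text shaped like the /proc/meminfo example above. The name mem_total_kb_from_text and the choice to return kilobytes are assumptions; the unit returned by the real subroutine is not stated here.

```perl
use strict;
use warnings;

# Return the MemTotal figure (in kB) from /proc/meminfo style text,
# or undef when no such line is present.
sub mem_total_kb_from_text {
    my ($text) = @_;
    return $text =~ /^\s*MemTotal:\s+(\d+)\s+kB/m ? $1 : undef;
}

my $sample = <<'END';
   MemTotal:    386460 kB
   MemFree:     157968 kB
END
print mem_total_kb_from_text($sample), "\n";

# On a Linux machine the same sub can be fed the real file.
if (open(my $fh, '<', '/proc/meminfo')) {
    local $/;                                    # slurp the whole file
    my $kb = mem_total_kb_from_text(<$fh>);
    close $fh;
    print "This machine: $kb kB\n" if defined $kb;
}
```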

check_file_exists_in_path

Download check_file_exists_in_path .pl
Author     : jong@salt2.med.harvard.edu
Category   : File
Function   : checks if file exists in UNIX path
Usage      : $exist=&check_file_exists_in_path("hmmbuild");
Version    : 1.0
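A minimal sketch of looking up a program in the UNIX path, similar in spirit to 'which'. It is not the library's own code: program_in_path is a hypothetical name, and returning the full path (rather than a true/false value) is a choice made for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file named $prog found in
# the directories of $ENV{PATH}, or nothing when it is not found.
sub program_in_path {
    my ($prog) = @_;
    for my $dir (File::Spec->path) {
        my $full = File::Spec->catfile($dir, $prog);
        return $full if -f $full && -x _;        # plain file and executable
    }
    return;
}

my $found = program_in_path('perl');
print defined $found ? "found: $found\n" : "not found\n";
```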

compress_files_by_gzip

Download compress_files_by_gzip .pl
Author     : jong@salt2.med.harvard.edu
Usage      : &compress_files_by_gzip('file_name_to_be_compressed');
Version    : 1.3

check_if_sec_str_form_hash

Download check_if_sec_str_form_hash .pl
Author     : jong@salt2.med.harvard.edu
Usage      : $check_sec_str_form_hash=${&check_if_sec_str_form_hash(\%sec)};
Version    : 1.0

check_if_defined

Download check_if_defined .pl
Author     : jong@salt2.med.harvard.edu
Function   : checks if all the args are defined
Usage      : $defined=&check_if_defined($var, $file);
Version    : 1.0

check_if_files_exist

Download check_if_files_exist .pl
Author     : jong@salt2.med.harvard.edu
Function   : checks if all the files given as args exist
Keywords   : check_if_exists, check_if_file_exist
Usage      : $defined=&complain_if_not_defined($var, $file);
Version    : 1.0

die_if_file_not_present

Download die_if_file_not_present .pl
Author     : jong@salt2.med.harvard.edu
Function   : checks if all the args are present
Keywords   : die_unless_present, die_unless_file_present
Usage      : &die_if_file_not_present($var, $file);
Version    : 1.0

ask_for_ENV_vars

Download ask_for_ENV_vars .pl
Author     : jong@salt2.med.harvard.edu, On commercial use issue, Email me.
Function   : asks for an env var and writes it to the appropriate shell
             RC file (UNIX only)
Keywords   : write_ENV_vars, write_env_vars
Usage      : &ask_for_ENV_vars('BLAST_DIR');
Version    : 1.0

reset_shell_environment

Download reset_shell_environment .pl
Author     : jong@salt2.med.harvard.edu, On commercial use issue, Email me.
Version    : 1.0