Search/Context_Graph version 0.01
=================================

This module implements a search technique called 'contextual network
search', or 'spreading activation search'. The idea is to represent a
document collection as a set of document and term nodes in a bipartite
graph. How you generate the term list is up to you - our own approach
is to extract all nouns and noun phrases using a part-of-speech tagger
(see L).

Documents and terms are connected by weighted edges. The edge weights
are determined by whichever weighting algorithm you choose. The only
restriction is that weights must lie between 0 and 1.

We search the graph by energizing a query node with an arbitrary
starting energy E. We then distribute that energy among the neighbor
nodes, according to the following formula.

First, divide the energy by the number of neighbor nodes - call this
new value S. For example, if the starting energy is 10,000 and our
node has five neighbors, S = 2,000 units.

Next, determine whether S exceeds our arbitrary threshold. If S is
less than the threshold, we stop propagating. If S exceeds the
threshold, we assign energies to all the neighbor nodes, and recurse
down.

The energy assigned to each neighbor node depends on the weight of the
edge connecting it to the starting node. Since this weight is
guaranteed to fall between 0 and 1, the maximum energy a neighbor node
can receive is S.

INPUT FORMAT

The module can take either a hash of document titles and term lists,
or a term-document matrix (TDM) file.

The first format looks like this:

   {
      TITLE => { WORD => COUNT, WORD => COUNT, ... },
      ...
   }

The TDM input format is useful for very large collections. The TDM
file is a plain text file with the following format:

   Arbitrary text
   Arbitrary text
   TERMS DOCS
   Arbitrary text
   A B C B C B C ....
The first two lines

INSTALLATION

To install this module type the following:

   perl Makefile.PL
   make
   make test
   make install

COPYRIGHT AND LICENCE

Copyright (C) 2003 Maciej Ceglowski, John Cuadrado, NITLE

This software may be distributed under the same terms as Perl itself.
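
EXAMPLE

As a rough illustration of the energy-distribution rule described at
the top of this file, here is a minimal standalone sketch in plain
Perl. It does not use this module's API. The toy graph, the names
%graph, spread(), $THRESHOLD and $MAX_DEPTH, and the depth guard are
all assumptions made for the example. The sketch also assumes that
energy may flow back to the node it came from, and that propagation
continues when S is exactly equal to the threshold. The description
above does not settle either point.

```perl
use strict;
use warnings;

# Hypothetical toy graph: node => { neighbor => edge weight (0..1) }.
# Document nodes link only to term nodes and vice versa (bipartite).
my %graph = (
    doc1 => { cat  => 1.0, dog  => 0.5 },
    doc2 => { dog  => 1.0 },
    cat  => { doc1 => 1.0 },
    dog  => { doc1 => 0.5, doc2 => 1.0 },
);

my $THRESHOLD = 20;   # assumed cutoff; the text only calls it arbitrary
my $MAX_DEPTH = 50;   # assumed safety guard, not part of the description

# Spread $energy outward from $node, adding the energy each neighbor
# receives to the hash referenced by $received.
sub spread {
    my ($node, $energy, $received, $depth) = @_;
    return if $depth > $MAX_DEPTH;

    my $neighbors = $graph{$node} or return;
    my $count     = keys %$neighbors;
    return unless $count;

    my $share = $energy / $count;       # S in the text
    return if $share < $THRESHOLD;      # too little energy left: stop

    for my $n (sort keys %$neighbors) {
        my $e = $share * $neighbors->{$n};   # at most S, since weight <= 1
        $received->{$n} += $e;
        spread($n, $e, $received, $depth + 1);
    }
    return;
}

# Energize the term node 'dog' with a starting energy of 100.
my %received;
spread('dog', 100, \%received, 0);
printf "%-5s %.1f\n", $_, $received{$_} for sort keys %received;
```

With these toy weights, doc2 accumulates more energy than doc1, so it
would rank first for the query 'dog'. A search would then sort the
document nodes by accumulated energy.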