Text-Statistics-Latin version 0.01

OVERVIEW

Performs statistical analyses of text corpora.

SYNOPSIS

    use Text::ParseWords;
    use utf8;
    use Text::Statistics::Latin;
    &Text::Statistics::Latin::LATIN();

INSTALLATION

Copy Latin.pm to the appropriate Perl modules directory. Nothing more needs
to be done.

DESCRIPTION

Text::Statistics::Latin creates a seven-column CSV output file with one line
for each token of each text. The input is a corpus whose file names follow
the pattern '1 (1).txt', '1 (2).txt', ..., '1 (n).txt', that is:

    1 \(([1-9]|[1-9][0-9]+)\)\.txt

The columns store the following statistical information:

    (1) number of word forms in document d;
    (2) number of tokens in d;
    (3) ID number of d, i.e., n;
    (4) frequency of term t in d;
    (5) corpus frequency of t;
    (6) document frequency of t (number of documents where t occurs at
        least once);
    (7) t, the UTF-8 encoded Latin token string.

The main output file is named '1 (n + 5).txt' and is stored in the same
directory as the corpus itself, together with residual files for each input
file, with .txu and .txv extensions.

Example:

    #!/usr/bin/perl
    use strict;
    use Text::Statistics::Latin;
    &Text::Statistics::Latin::LATIN("5");
    # 4 files (5 - 1) are analysed.

SEE ALSO

http://search.cpan.org/~ambs/
http://search.cpan.org/~sid/

AUTHOR

Rodrigo Panchiniak Fernandes

COPYRIGHT AND LICENSE

Copyright (C) 2007 Rodrigo Panchiniak Fernandes

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.8.6 or, at your
option, any later version of Perl 5 you may have available.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
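
APPENDIX: COLUMN SEMANTICS (ILLUSTRATIVE SKETCH)

The seven columns listed under DESCRIPTION can be illustrated on a tiny in-memory corpus. This is only a sketch and not the module's actual implementation. The helper name column_rows is hypothetical, tokens are assumed to be whitespace-separated strings, and "word forms" is assumed to mean distinct tokens in a document. No files are read or written.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Statistics::Latin.
# Input: hash reference mapping document ID n => document text.
# Assumption: tokens are maximal runs of non-whitespace characters.
sub column_rows {
    my ($corpus) = @_;
    my (%tf, %cf, %df, %tokens);

    for my $id (keys %$corpus) {
        $tf{$id} = {};
        my @tokens = split ' ', $corpus->{$id};
        $tokens{$id} = scalar @tokens;        # column 2: tokens in d
        for my $t (@tokens) {
            $tf{$id}{$t}++;                   # column 4: frequency of t in d
            $cf{$t}++;                        # column 5: corpus frequency of t
        }
        $df{$_}++ for keys %{ $tf{$id} };     # column 6: document frequency of t
    }

    my @rows;
    for my $id (sort { $a <=> $b } keys %$corpus) {
        # column 1: word forms in d (assumed: distinct tokens)
        my $forms = scalar keys %{ $tf{$id} };
        for my $t (sort keys %{ $tf{$id} }) {
            push @rows, [ $forms, $tokens{$id}, $id,
                          $tf{$id}{$t}, $cf{$t}, $df{$t}, $t ];
        }
    }
    return @rows;
}

my %corpus = (
    1 => 'arma virumque cano',
    2 => 'cano arma arma',
);

# One comma-separated line per term per document.
print join(',', @$_), "\n" for column_rows(\%corpus);
```

For this two-document corpus the first printed line is 3,3,1,1,3,2,arma: document 1 has 3 word forms and 3 tokens, and 'arma' occurs once in it, 3 times in the corpus, and in 2 documents.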