Text::Soundex Version 3.01

NOTE: Users of Text::Soundex Version 2.x should consult the 'History'
section at the end of this document before installing this module. The
interface has been simplified, and existing code that takes advantage of
Version 2.x features may need to be altered to function properly.

This is a Perl 5 module implementing the Soundex algorithm described by
Knuth. The algorithm is often used to locate a person by name when the
exact spelling of the name is not known.

This version directly supersedes the version of Text::Soundex found in
the core of Perl 5.8.0 and earlier, and is a drop-in replacement for it.

The algorithm used by soundex() is NOT fully compatible with the
algorithm used to index names for the US Censuses. Use the soundex_nara()
subroutine to return codes for this purpose.

Basic Usage:

Soundex performs a one-way transformation of a name: it converts the
input character string into a short code representing the identifiable
sounds those characters might make. For example:

    use Text::Soundex;

    print soundex("Mark"), "\n";    # prints: M620
    print soundex("Marc"), "\n";    # prints: M620

    print soundex("Hansen"), "\n";  # prints: H525
    print soundex("Hanson"), "\n";  # prints: H525
    print soundex("Henson"), "\n";  # prints: H525

In many situations, code such as the following:

    if ($name1 eq $name2) {
        ...
    }

can be replaced with:

    if (soundex($name1) eq soundex($name2)) {
        ...
    }

Installation:

Once the archive has been unpacked, the following steps build, test and
install the module (run them in the directory that contains Makefile.PL):

    perl Makefile.PL
    make
    make test

If make test succeeds, the next step may need to be run as root (on a
Unix-like system) or with special privileges on other systems:
    make install

If you do not want to use the XS code (for whatever reason), do the
following instead of the above:

    perl Makefile.PL --no-xs
    make
    make test
    make install

If any of the tests report 'not ok' and you are running Perl 5.6.0 or
later, please contact Mark Mielke.

History:

Version 3.01:

A bug with non-UTF-8 strings that contain non-ASCII alphabetic
characters was fixed.

The soundex_unicode() and soundex_nara_unicode() wrapper routines were
added, and the documentation refers the user to the excellent
Text::Unidecode module for computing Soundex codes of Unicode strings.

The Perl versions of the routines have been further optimized, and now
correctly handle a border case involving non-alphabetic characters at
the beginning of the string.

Version 3.00:

Support for UTF-8 strings (Unicode strings) is now in place. Note that
this allows UTF-8 strings to be passed to the XS version of the soundex()
routine.

The Soundex algorithm treats characters outside the ASCII range
(0x00 - 0x7F) as if they were not alphabetic.

The interface has been simplified.

In order to explicitly use the non-XS implementation of soundex():

    use Text::Soundex ();

    $code = Text::Soundex::soundex_noxs($name);

In order to use the NARA soundex algorithm:

    use Text::Soundex 'soundex_nara';

    $code = soundex_nara($name);

Use of the ':NARA-Ruleset' import directive is now obsolete. To emulate
the old behaviour:

    use Text::Soundex ();
    *soundex = \&Text::Soundex::soundex_nara;

    $code = soundex($name);

Version 2.20:

This version includes support for the algorithm used to index the U.S.
Federal Censuses. There is a slight discrepancy, not commonly known or
recognized, in the definition of a Soundex code: it concerns similar
sounding letters that are separated by the characters H or W. This
variant is called the NARA ruleset, since the discrepancy was discovered
by NARA. (Calling it "the US Census ruleset" was too unwieldy...)
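The H/W discrepancy is easiest to see in code. The following is a minimal
pure-Perl sketch, not the module's implementation: sketch_soundex() is a
hypothetical name, and the sketch assumes that non-letters are simply
discarded and that a name without any letters yields undef. Under the
standard rules, H and W behave like vowels and separate same-coded
letters; under the NARA rule they are skipped, so same-coded letters on
either side of them collapse into a single digit.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Soundex: a sketch of the
# Soundex coding steps. When $nara is true, H and W (after the first
# letter) are removed before coding, so letters with the same code on
# either side of them collapse into one digit (the NARA rule).
sub sketch_soundex {
    my ($name, $nara) = @_;
    my $s = uc $name;
    $s =~ tr/A-Z//cd;              # assumption: discard anything but A-Z
    return undef unless length $s; # assumption: no letters, no code

    my $first = substr($s, 0, 1);  # the first letter is always kept
    if ($nara) {
        my $rest = substr($s, 1);
        $rest =~ tr/HW//d;         # NARA: H and W do not separate letters
        $s = $first . $rest;
    }

    # Map letters to digits; vowels, Y, H and W become 0.
    # The /s modifier squeezes runs of the same digit into one.
    (my $codes = $s) =~
        tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/s;

    # Drop the first letter's digit, then the 0 separators,
    # and pad or truncate to one letter plus three digits.
    $codes = substr($codes, 1);
    $codes =~ tr/0//d;
    return substr($first . $codes . '000', 0, 4);
}

# Compare the two rule sets on a few names.
for my $name (qw(Ashcraft Hansen Marc)) {
    printf "%-9s standard=%s nara=%s\n",
        $name, sketch_soundex($name), sketch_soundex($name, 1);
}
# Ashcraft  standard=A226 nara=A261
```

In "Ashcraft" the S and C (both coded 2) are separated by H, so the
sketch produces A226 with the standard rules and A261 with the NARA
rule; names without an intervening H or W, such as "Hansen" (H525),
get the same code either way.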
NARA can be found at:

    http://www.nara.gov/genealogy/

The algorithm requested by NARA can be found at:

    http://home.utah-inter.net/kinsearch/Soundex.html

Ways to use it in your code:

Transparently change existing code like this:
=============================================

    use Text::Soundex qw(:NARA-Ruleset);

    ... soundex(...) ...

Make the change visibly distinct like this:
===========================================

    use Text::Soundex qw(soundex_nara);

    ... soundex_nara(...) ...

Version 2.00:

This version is a full rewrite of the 1.0 engine by Mark Mielke. The
goal was speed, and this was achieved. There is an optional XS module,
completely transparent to the user, which offers a further speed
increase of more than 7.5 times.

Version 1.00:

This version is included in the Perl core distribution of (at least)
Perl 5.8.0 and earlier. It was written by Mike Stok. It can be
identified by the absence of a $VERSION at the beginning of the module
and by an RCS tag with a version of 1.x.

This version, before some Perl 5-style packaging was introduced, was
actually written for Perl 4.