NAME Text::Match::FastAlternatives - efficient search for many strings SYNOPSIS use Text::Match::FastAlternatives; my $expletives = Text::Match::FastAlternatives->new(@naughty); while (my $line = <>) { print "Do you email your mother with that keyboard?\n" if $expletives->match($line); } DESCRIPTION This module allows you to search for any of a list of substrings ("keys") in a larger string. It is particularly efficient when the set of keys is large. This efficiency comes at the cost of some flexibility: the keys may not contain any control characters or non-ASCII characters; and it cannot do case-insensitive matching. If you want case-insensitivity, you have to fold case yourself: my $expletives = Text::Match::FastAlternatives->new( map { lc } @naughty); while (my $line = <>) { print "Do you email your mother with that keyboard?\n" if $expletives->match(lc $line); } This module is designed as a drop-in replacement for Perl code of the following form: my $expletives_regex = join |, map { quotemeta } @naughty; $expletives_regex = qr/$expletives_regex/; while (my $line = <>) { print "Do you email your mother with that keyboard?\n" if $line =~ $expletives_regex; } Text::Match::FastAlternatives can easily perform this test a hundred times faster than the equivalent regex, if you have enough keys. The more keys it searches for, the faster it gets compared to the regex. Modules like Regexp::Trie can build an optimised version of such a regex, designed to take advantage of the niceties of perls regex engine. With a large number of keys, this module will substantially outperform even an optimised regex like that. In one real-world situa tion with 339 keys, Regexp::Trie produced a regex that ran 857% faster than the naive regex (according to Benchmark), but using Text::Match::FastAlternatives ran 18275% faster than the naive regex, or twenty times faster than Regexp::Tries optimised regex. METHODS Text::Match::FastAlternatives->new(@keys) Constructs a matcher that can efficiently search for all of the @keys in parallel. Throws an exception if any of the keys are undefined, or if any of them contain any control characters or non- ASCII characters. $matcher->match($target) Returns a boolean value indicating whether the $target string con tains any of the keys in $matcher. CAVEATS Subclassing Text::Match::FastAlternatives has a "DESTROY" method implemented in XS. If you write a subclass with its own destructor, you will need to invoke the base destructor, or you will leak memory. Perl 5.10 Perl 5.10 will contain many enhancements to the regex engine, including built-in optimisations for regexes with many branches that contain only literal strings. I suspect that, for the cases where Text::Match::FastAlternatives is currently very fast, it will also be faster than Perl 5.10s regex engine. But I may be wrong; if youre using Perl 5.9.4 or newer, youd be well advised to compare the avail able options on data sets youre likely to use in practice. IMPLEMENTATION Text::Match::FastAlternatives manages to be so fast by using a trie internally. The time to find a match at a given position in the string (or determine that there is no match) is independent of the number of keys being sought; worst-case match time is linear in the length of the longest key. Since a match must be attempted at each position in the target string, total worst-case search time is O(mn) where m is the length of the target string and n is the length of the longest key. SEE ALSO , Regexp::Trie, Regexp::Optimizer, Regexp::Assemble, perl594delta. AUTHOR Aaron Crane COPYRIGHT Copyright 2006 Aaron Crane. This library is free software; you can redistribute it and/or modify it under the terms of the Artistic License, or (at your option) under the terms of the GNU General Public License version 2.