grepmail - search mailboxes for a particular email

Grepmail searches a normal, gzip'd, bzip'd, or tzip'd mailbox for a given
regular expression, and returns those emails that match it. Piped input is
allowed, and date and size restrictions are supported.

New in version 4.80:
- Added prototype -E flag to support complex searches. (Thanks to Nelson
  Minar for the original suggestion in Sep 2000, and terry jones for
  seconding the idea.)
- Added -F flag to force processing of files which grepmail determines are
  not mailboxes. (feature suggested by terry jones)
- Documentation updated to reflect that -B no longer exists. (By terry jones)
- The test to determine if a file is a mailbox was improved to adhere better
  to RFC 822, while still providing some flexibility. (Initial suggestion and
  patch by terry jones)
- Improved date extraction to also look at the 'From ' line when both the
  Received and Date headers fail. (patch by terry jones)
- Fixed a long-standing bug in which filenames of compressed mailboxes which
  contained special shell characters would cause problems. (Thanks to Jost
  Krieger for giving me the kick in the pants to finally fix this.)
- Fixed a long-standing bug in which grepmail would incorrectly report the
  filename of compressed mailboxes in error messages. (Thanks to Jost Krieger
  for giving me the kick in the pants to finally fix this.)

As with last release, this release benefits greatly from the feedback and
insight of Terry Jones. His desire for complex pattern matches, along with
Nelson Minar's original request, is the reason for the new -E flag. This flag
allows you to perform complex searches involving logical operators. For
example,

  $email_header =~ /^From: .*\@coppit.org/ && $email =~ /grepmail/i

will find all emails which originate from coppit.org (you must escape the "@"
sign with a backslash), and which contain the keyword "grepmail" anywhere in
the message, in any capitalization.

NOTE: -E support is experimental right now.
I'm looking for feedback on the following:
- Do you like the feature?
- Do you like the Perl-based syntax? Is there an alternative which is easier?
- How should date and size constraints be integrated? Should they be
  "variables", a la:
  "$email =~ /grepmail/ && $date <= 'sep 20 1998' || $size > 50000"?
- Should -i, -h, and -b be supported in conjunction with -E? (Where
  "-h pattern" would mean augmenting the -E pattern with
  "$email_header =~ /pattern/ && ")
- -S ignores signatures. If/when this feature is implemented for -E, should
  it be "global" for all $email_body matches, or should it be possible to
  specify this for each $email_body match? For example, one can append an "i"
  modifier to an individual pattern match to make it case-insensitive. Should
  there be a standard way of dealing with such "global" pattern matching
  options on an individual pattern match basis?

NOTE: For emails without message ids, grepmail will use Digest::MD5 to
compute a hash based on the email header. If you don't have Digest::MD5,
grepmail will just use the header itself as the message id. The Digest::MD5
checksum takes a little while to compute, but saves a lot of space. Currently
there is no easy way to choose space over time. Let me know if this is a
problem.


MODULE DEPENDENCIES

- Date::Parse: required if you want to search based on date (-d)
- Date::Manip: required if you want to search using complex date
  specifications (-d)
- Digest::MD5: not required, but can help grepmail use less memory if you are
  checking for unique emails (-u) and your emails don't have a Message-Id
  header
- Inline >0.41: required if you want to use the mailbox parser written in C
  (approximately 5% faster than default Perl parser).
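
To see which of the modules listed above are already available, a quick
check along the following lines may help. This is only a sketch: it assumes
a "perl" executable on your PATH, the helper name "module_status" is made up
for illustration and is not part of grepmail, and it only tests whether each
module loads (it does not verify the Inline version requirement).

```shell
#!/bin/sh
# Print "ok" if the perl on PATH can load the named module, else "missing".
# module_status is a hypothetical helper, not something grepmail provides.
module_status() {
    if perl -e "require $1;" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "missing"
    fi
}

# The four modules named in MODULE DEPENDENCIES.
for mod in Date::Parse Date::Manip Digest::MD5 Inline; do
    printf '%-12s %s\n' "$mod" "$(module_status "$mod")"
done
```

Any module reported as "missing" only matters if you need the feature it
supports (for example, -d for the date modules).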

The modules can be found here:
  Date::Parse (in TimeDate): http://search.cpan.org/search?dist=TimeDate
  Date::Manip: http://search.cpan.org/search?dist=DateManip
  Digest::MD5: http://search.cpan.org/search?dist=Digest-MD5
  Inline: http://search.cpan.org/search?dist=Inline

Alternatively, installation can be done automatically using the CPAN module:

  perl -MCPAN -e 'install Date::Parse'
  perl -MCPAN -e 'install Date::Manip'
  perl -MCPAN -e 'install Digest::MD5'
  perl -MCPAN -e 'install Inline'


INSTALLATION

=> On Non-Windows systems:

  % perl Makefile.PL
  % make
  % make test
  % make install

By default, "perl Makefile.PL" does an interactive installation. You can
avoid the question about installing Mail::Folder::FastReader by specifying
either "FASTREADER=0" or "FASTREADER=1". You can avoid the question about the
installation location by specifying either "PREFIX=/installation/path" (for
installation into a custom location), "INSTALLDIRS=site" (for installation
into site-specific Perl directories), or "INSTALLDIRS=perl" (for installation
into standard Perl directories).

If make test fails, please see the INSTALLATION PROBLEMS section below.

=> On Windows systems:

- Just copy "grepmail" to a place in your path. You may want to rename it
  "grepmail.pl" if you've associated .pl files with perl.exe.


CONFIGURATION

You may want to set your MAIL environment variable so that grepmail will know
the default location to search for mailboxes.

If you are terribly concerned about performance, you may want to modify the
value of the variable READ_CHUNK_SIZE located in the code. This variable
controls how much text is read from the mailbox at a time. If the value is set
to 0, the entire file is read into memory. (There is no user-visible option
for setting this value.) You may also want to hack the code to not use
Digest::MD5, thereby trading space for time.

If you frequently use the same set of flags, you may wish to alias "grepmail"
to "grepmail -flags" within your command interpreter (shell).
See the documentation for your shell for details on how to do this.


INSTALLATION PROBLEMS

If "make test" fails, run

  make testfunc

and see which test(s) are failing. Please email, to the address below, the
test##.stderr and test##.stdout files for the test, which are located in
t/results. Also email the output of running the test with the -D flag. e.g.:

  blib/script/grepmail library -D -d "before July 9 1998" t/mailarc-1.txt \
    > test##.debug

If you see errors about your timezone, and you are in an uncommon timezone,
it may be the case that Date::Manip does not support your timezone yet. Try
this:

  perl -MDate::Manip -e 'print "TIMEZONE: ".&Date::Manip::Date_TimeZone."\n"'

If you get an error, contact the author of Date::Manip.

For other bugs, see the section REPORTING BUGS below.


DOCUMENTATION

Just "perldoc grepmail". After installation on Unix systems, you can also do
"man grepmail".


HOMEPAGE

Visit http://grepmail.sourceforge.net/ for the latest version, mailing lists,
discussion forums, CVS access, cool utilities, and more.


REPORTING BUGS

You can report bugs at http://sourceforge.net/bugs/?group_id=2207. Please
attach the output of running grepmail with the -D switch. If the bug is
related to processing of a particular mailbox, try to trim the mailbox to the
smallest set of emails that still exhibit the problem. Then use the
"anonymize_mailbox" program that comes with grepmail to remove any sensitive
information, and attach the mailbox to the bug report.


PRIMARY AUTHOR

Written by David Coppit (david@coppit.org, http://coppit.org/), with the
generous help of many kind people. See the file CHANGES for detailed
information.


LICENSE

This code is distributed under the GNU General Public License (GPL). See
http://www.opensource.org/gpl-license.html and http://www.opensource.org/.