Tue Mar 20 12:35:47 EDT 2007 The implementation of duplicate elimination (a la formail -D) has changed radically. Formerly, I read the Message-Id header just like formail does, and used it as an index into a DB_File. Unfortunately, there are many mail programs and systems (including unixoids) that create b0rk3d Message-Id headers, so I can't depend on it alone. Instead I string together From, Date and Message-Id if present and in that order. One would like to use other bits of the message to further enhance uniqueness and guard against collisions, for instance the Subject and Lines headers (if present). This fails because mailing lists modify messages in ways that change these bits. This release is slightly incompatible with past ones. Although the API remains the same, the semantics of header matches has subtly changed. They are now performed on the _original_ header of the incoming message, not on a sanitized (unfolded) copy of it. This is because it was very hard to come up with a clean way of allowing modifications to the original header (which was always the one to use for delivery) while keeping a separate copy for matching. Here's an example where the result is different. Let's say the email header contain the following lines: Content-Type: text/plain; charset="iso-8859-1" and assume it got slurped into $sort. With past versions, $sort->header_match('content-type', '; charset') returned true; now it returns false. This, however, still returns true $sort->header_match('content-type', ';\s+charset') and so does this $sort->header_match('content-type', ';\n charset') Also, header_match and all methods based on it now return a list of header _indices_ rather than the actual header text. Most of the time the returned list list will have exactly one element, but may have more for repeated headers like Received. The actual text of the matched header lines can be recovered with the new method get_header(). For instance, my @msg = ( "Received: from foo\n", "To: myself\n", "Received: from bar\n", "\n", "Boo!\n" ); my $sort = Mail::Sort->new (@msg); my @indices = $sort->header_match ('Received'); # now @indices is (0, 2) my @lines = $sort->get_header (@indices); # now @lines is ("Received: from foo\n", "Received: from bar\n") Mail::Sort is yet another module intended to enable the writing of mail filters in the style of procmail(1). It was written when I realized that the previous entries (Mail::Audit(3) and Mail::Procmail(3)), while trying to emphasize elegance and brevity, sacrificed a good deal of procmail's flexibility and power. First and foremost, both of these existing modules only allow for matching a single header at a time, throwing away procmail's TO_ feature (or any equivalent way of using a non-trivial regexp to match a header tag). This really hurts when you try to match mailing lists and spam list headers. Second, while I sympathize with the choice of procedural interface with global variables in Mail::Procmail (way too much of Perl code out there uses object-oriented syntax for no good reason whatsoever), in the present case it means that the message cannot be modified and then fed into another instance of the filter, again a limitation in comparison to procmail. Mail::Sort is meant to end the history of this particular area by matching procmail's features 1-1. For distribution terms, read the file COPYING.