WARNING: This code is in an evaluation phase until 1 August 1996. Depending on any comments/complaints received before this cutoff date, the interface may change in a non-backwards-compatible manner.
require MIME::Head;
You can create a MIME::Head object in a number of ways:
    # Create a new, empty header, and populate it manually:
    $head = MIME::Head->new;
    $head->set('content-type', 'text/plain; charset=US-ASCII');
    $head->set('content-length', $len);

    # Create a new header by parsing in the STDIN stream:
    $head = MIME::Head->read(\*STDIN);

    # Create a new header by parsing in a file:
    $head = MIME::Head->from_file("/tmp/test.hdr");

    # Create a new header by running a program:
    $head = MIME::Head->from_file("cat a.hdr b.hdr |");
To get rid of all internal newlines in all fields:
    # Get rid of all internal newlines:
    $head->unfold();
To test whether a given field exists:
    # Was a "Subject:" given?
    if ($head->exists('subject')) {
        # yes, it does!
    }
To get the contents of that field as a string:
    # Is this a reply?
    $reply = 1 if ($head->get('Subject') =~ /^Re: /);
To set the contents of a field to a given string:
    # Set the MIME type:
    $head->set('Content-type', 'text/html');
To extract parameters from certain structured fields, as a hash reference:
    # What's the MIME type?
    $params    = $head->params('content-type');
    $mime_type = $$params{_};
    $char_set  = $$params{'charset'};
    $file_name = $$params{'name'};
To get certain commonly-used MIME information:
    # The content type (e.g., "text/html"):
    $mime_type = $head->mime_type;

    # The content transfer encoding (e.g., "quoted-printable"):
    $mime_encoding = $head->mime_encoding;

    # The recommended filename (e.g., "choosy-moms-choose.gif"):
    $file_name = $head->recommended_filename;

    # The boundary text, for multipart messages:
    $boundary = $head->multipart_boundary;
open() so as to return a readable filehandle. The "file" will be opened, read, and then closed:
    # Create a new header by parsing in a file:
    my $head = MIME::Head->from_file("/tmp/test.hdr");
Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:
    # Create a new header by parsing in a file:
    my $head = MIME::Head->new->from_file("/tmp/test.hdr");
On success, the object will be returned; on failure, the undefined value.
This is really just a convenience front-end onto read().
    # Output to STDOUT:
    $head->print(\*STDOUT);
WARNING: this method does not output the blank line that terminates the header in a legal message (since you may not always want it).
Supply this routine with a reference to a filehandle glob; e.g., \*STDIN:
    # Create a new header by parsing in STDIN:
    my $head = MIME::Head->read(\*STDIN);
Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:
    # Create a new header by parsing in STDIN:
    my $head = MIME::Head->new->read(\*STDIN);
You should probably use the first form, though. On success, the object will be returned; on failure, the undefined value.
    # Add the trace information:
    $head->add('Received', 'from eryq.pr.mcs.net by gonzo.net with smtp');
The FIELD is automatically coerced to lowercase. Returns the TEXT.
Normally, the new occurrence will be appended to the existing occurrences. However, if the optional WHERE argument is the string "BEFORE", then the new occurrence will be prepended. NOTE: if you want to be explicit about appending, use the string "AFTER" for this argument.

WARNING: this method always adds new occurrences; it doesn't overwrite any existing occurrences... so if you just want to change the value of a field (creating it if necessary), then you probably don't want to use this method: consider using set() instead.
    # Force an explicit character set:
    if ($head->get('Content-type') !~ /\bcharset=/) {
        $head->add_text('Content-type', '; charset="us-ascii"');
    }
The FIELD is automatically coerced to lowercase.
WARNING: be careful if adding text that contains a newline! A newline in a field value must be followed by a single space or tab to be a valid continuation line!
I had considered building this routine so that it "fixed" bare newlines for you, but then I decided against it, since the behind-the-scenes trickery would probably create more problems through confusion. So, instead, you've just been warned... proceed with caution.
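If you do want bare newlines repaired before calling add_text(), you can do it explicitly in your own code. The following is a minimal sketch, assuming only the continuation rule stated above (a newline must be followed by a space or tab); fix_bare_newlines() is a hypothetical helper and not part of MIME::Head:

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of MIME::Head): insert a single space after
# any newline that is followed by a character other than a space or tab.
# A newline at the very end of the string is left alone.
sub fix_bare_newlines {
    my ($text) = @_;
    $text =~ s/\n(?=[^ \t])/\n /g;
    return $text;
}

# The second line becomes a valid continuation line:
my $value = fix_bare_newlines("multipart/mixed;\nboundary=\"xyz\"");
print $value, "\n";
```

Doing this in the caller keeps the behaviour visible, which is the concern that led to leaving it out of the method itself.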
    # Remove all the MIME information:
    $head->delete('MIME-Version');
    $head->delete('Content-type');
    $head->delete('Content-transfer-encoding');
    $head->delete('Content-disposition');
Currently returns 1 always.
    # Was a "Subject:" given?
    if ($head->exists('subject')) {
        # yes, it does!
    }
The FIELD is automatically coerced to lowercase. This method returns the undefined value if the field doesn't exist, and some true value if it does.
    foreach $field (sort $head->fields) {
        print "$field: ", $head->get($field), "\n";
    }
    # Is this a reply?
    $is_reply = 1 if ($head->get('Subject') =~ /^Re: /);
NOTE: this returns the first occurrence of the field, so as to be consistent with Mail::Internet::get(). However, if the optional OCCUR argument is defined, it specifies the index of the occurrence you want: zero for the first, and -1 for the last.
    # Print the first 'Received:' entry:
    print "Most recent: ", $head->get('received'), "\n";

    # Print the first 'Received:' entry, explicitly:
    print "Most recent: ", $head->get('received', 0), "\n";

    # Print the last 'Received:' entry:
    print "Least recent: ", $head->get('received', -1), "\n";
    # How did it get here?
    @history = $head->get_all('Received');
NOTE: I had originally experimented with having get() return all occurrences when invoked in an array context... but that causes a lot of accidents when you get careless and do stuff like this:
print "\u$field: ", $head->get($field), "\n";
It also made the intuitive behaviour unclear if the OCCUR argument was given in an array context. So I opted for an explicit approach to asking for all occurrences.
read() in to create this object:
print "PARSED FROM:\n", $head->original_text;
    # Set the MIME type:
    $head->set('content-type', 'text/html');

The FIELD is automatically coerced to lowercase. This method returns the text.
"Unfolding" is the act of removing all newlines.
$head->unfold;
Currently, returns 1 always.
    Content-type
    Content-transfer-encoding
    Content-disposition

Be aware that they do not just return the raw contents of those fields, and in some cases they will fill in sensible (I hope) default values. Use get() if you need to grab and process the raw field text.
    Content-Type: Message/Partial;
        number=2; total=3;
        id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
Here is how you'd extract them:
    $params = $head->params('content-type');
    if ($$params{_} eq 'message/partial') {
        $number = $$params{'number'};
        $total  = $$params{'total'};
        $id     = $$params{'id'};
    }
Like field names, parameter names are coerced to lowercase. The special '_' parameter means the default parameter for the field.
WARNING: the syntax is a little different for each field (content-type, content-disposition, etc.). I've attempted to come up with a nice, simple catch-all solution: it simply stops when it can't match anything else.
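To make the "stops when it can't match anything else" idea concrete, here is a minimal sketch of such a catch-all parser. It is an illustration only and not the module's actual code: parse_params() is a hypothetical name, and lowercasing the default '_' value is an assumption made to agree with the 'message/partial' comparison in the example above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the kind of parsing params() performs.
sub parse_params {
    my ($raw) = @_;
    my %params;

    # The leading token (up to the first ';') becomes the default parameter.
    # ASSUMPTION: it is lowercased, as the example comparison suggests.
    if ($raw =~ /\G\s*([^;\s]+)\s*/gc) {
        $params{_} = lc $1;
    }

    # Consume ';name=value' pairs (value bare or double-quoted), stopping
    # quietly at the first thing that does not match.
    while ($raw =~ /\G;\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))\s*/gc) {
        $params{lc $1} = defined($2) ? $2 : $3;
    }
    return \%params;
}

my $params = parse_params(
    'Message/Partial; number=2; total=3; id="oc=jpbe0M2Yt4s@thumper.bellcore.com"'
);
print "$_ => $params->{$_}\n" for sort keys %$params;
```

Anchoring each match with \G and the /c flag means malformed trailing text simply ends the loop, rather than raising an error.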
If no encoding could be found, the empty string is returned.
"text/plain", "image/gif", "x-weird-type", which is returned in all-lowercase.
A happy thing: the following code will work just as you would want, even if there's no subtype (as in "x-weird-type")... in such a case, the $subtype would simply be undefined, which Perl treats as the empty string in string context:
($type, $subtype) = split('/', $head->mime_type);
If the content-type information is missing, it defaults to "text/plain", as per RFC-1521:
Default RFC-822 messages are typed by this protocol as plain text in the US-ASCII character set, which can be explicitly specified as "Content-type: text/plain; charset=us-ascii". If no Content-Type is specified, this default is assumed.
If just the subtype is missing (a syntax error unless the type begins with "x-", but we'll tolerate it, since some brain-dead mailers actually do this), then it simply is not reported; e.g., "Content-type: TEXT" is returned simply as "text".
WARNING: prior to version 1.17, a missing subtype was reported as "x-subtype-unknown". I said at the time that this might be a really horrible idea, and that I might change it in the future. Well, it was, so I did.
If the content type is present but can't be parsed at all (yow!), the empty string is returned.
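The rules above (default to "text/plain", lowercase the result, tolerate a missing subtype, return the empty string when nothing parses) can be sketched as follows. This is a hedged illustration, not the module's implementation; sketch_mime_type() and its token pattern are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical illustration of the mime_type rules described above.
# Takes the raw Content-type text (or undef if the field is missing).
sub sketch_mime_type {
    my ($content_type) = @_;

    # Missing field: RFC-1521 default.
    return 'text/plain' unless defined($content_type);

    # A type token, optionally followed by '/subtype'.
    # ASSUMPTION: tokens are limited to word characters plus '.', '+', '-'.
    if ($content_type =~ m{^\s*([\w.+-]+)(?:/([\w.+-]+))?}) {
        return lc(defined($2) ? "$1/$2" : $1);
    }

    # Present but unparseable.
    return '';
}

print sketch_mime_type('Text/HTML; charset=us-ascii'), "\n";
print sketch_mime_type('TEXT'), "\n";
```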
Content-type: field; that is, the leading double-hyphen (--) is not prepended.
(Well, almost exactly... from RFC-1521:
(If a boundary appears to end with white space, the white space must be presumed to have been added by a gateway, and must be deleted.)
so we oblige and remove any trailing spaces.)
Returns undef (not the empty string) if the message is not multipart, if no boundary is specified, or if the boundary is illegal (e.g., if it is empty after all trailing whitespace has been removed).
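As a rough sketch of those rules, assuming the MIME type and raw boundary parameter have already been extracted, the cleanup might look like this. clean_boundary() is a hypothetical helper written for illustration, not MIME::Head's own code:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the boundary rules described above to an
# already-extracted MIME type and raw boundary parameter.
sub clean_boundary {
    my ($mime_type, $boundary) = @_;

    # Not a multipart message: no boundary to report.
    return undef unless defined($mime_type) && $mime_type =~ m{^multipart/}i;

    # No boundary parameter was given.
    return undef unless defined($boundary);

    # RFC-1521: trailing white space is presumed added by a gateway.
    $boundary =~ s/\s+\z//;

    # Empty after trimming: treat as illegal.
    return undef unless length($boundary);

    return $boundary;
}

my $boundary = clean_boundary('multipart/mixed', 'simple boundary  ');
print "--$boundary\n" if defined($boundary);
```

Note that the caller adds the leading double-hyphen when writing a delimiter line; the returned value never includes it.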
Returns undef if no filename could be suggested.
"From " will be either ignored, flagged as an error, or coerced into the special field "Mail-from:" (the default; this approach was inspired by Emacs's "Babyl" format). Though not valid for a MIME header, this will provide compatibility with some Unix mail messages. Just do this:
Though not valid for a MIME header, this will provide compatibility with
some Unix mail messages. Just do this:
MIME::Head->tweak_FROM_parsing($choice)
Where $choice is one of 'IGNORE', 'ERROR', or 'COERCE'.
There is also IMHO no requirement [for] MIME::Heads to look like [email] headers; so to speak, the MIME::Head [simply stores] the attributes of a complex object, e.g.:
new MIME::Head type => "text/plain", charset => ..., disposition => ..., ... ;
See the next question for an answer to this one.
I have often wished that the original RFC-822 designers had taken a different approach, and not given every other field its own special grammar: read RFC-822 to see what I mean. As I understand it, in Heaven, all mail message headers have a very simple syntax that encodes arbitrarily-nested objects; a consistent, generic representation for exchanging OO data structures.
But we live in an imperfect world, where there's nonsense like this to put up with:
    From: Yakko Warner <yakko@tower.wb.com>
    Subject: Hello, nurse!
    Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
        (Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
    Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
        id AA13596; Thu, 21 Dec 95 17:20:38 -0500
    Content-type: text/html; charset=US-ASCII;
        name="nurse.html"
I quote from Achim Bohnet, who gave feedback on v.1.9 (I think he's using the word "header" where I would use "field"; e.g., to refer to "Subject:", "Content-type:", etc.):
MIME::Head is too big. A better approach IMHO would be to have a general header class that knows about allowed characters, line length, and some (formatting) output routines. There should be other classes that handle special headers and that are aware of the semantics/syntax of [those] headers...
From, to, reply-to, message-id, in-reply-to, x-face ...
MIME::Head should only handle MIME specific headers.
As he describes, each kind of field really merits its own small class (e.g., Mail::Field::Subject, Mail::Field::MessageId, Mail::Field::XFace, etc.), each of which provides a from_field() method for parsing field data into a class object, and a to_field() method for generating that field from a class object.
I kind of like the elegance of this approach. We could then have a generic Mail::Head class, instances of which would consist simply of one or more instances of subclasses of a generic Mail::Field class. Unrecognized fields would be represented as instances of Mail::Field by default.
There would be a MIME::Field class, with subclasses like MIME::Field::ContentType that would allow us to get fields like this:
    $type    = $head->field('content-type')->type;
    $subtype = $head->field('content-type')->subtype;
    $charset = $head->field('content-type')->charset;
And set fields like this:
    $head->field('content-type')->type('text');
    $head->field('content-type')->subtype('html');
    $head->field('content-type')->charset('us-ascii');
And, with that same MIME::Head object, get at other fields, like:
    $subject     = $head->field('subject')->text;   # just the flat text
    $sender_name = $head->field('from')->name;      # e.g., Yakko Warner
    $sender_addr = $head->field('from')->addr;      # e.g., yakko@tower.wb.com
So why a special MIME::Head subclass of Mail::Head? Why, to enable us to add MIME-specific wrappers, like this:
    package MIME::Head;
    @ISA = qw(Mail::Head);

    sub recommended_filename {
        my $self = shift;
        my $try;

        # First, try to get it from the content-disposition:
        ($try = $self->field('content-disposition')->filename) and return $try;

        # Next, try to get it from the content-type:
        ($try = $self->field('content-type')->name) and return $try;

        # Give up:
        undef;
    }
Looking at a typical mail message header, it is sooooooo tempting to just store the fields as a hash of strings, one string per hash entry. Unfortunately, there's the little matter of the Received: field, which (unlike From:, To:, etc.) will often have multiple occurrences; e.g.:
    Received: from gsfc.nasa.gov by eryq.pr.mcs.net with smtp
        (Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C; Thu, 21 Dec 95 16:34 CST
    Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov (5.65/Ultrix3.0-C)
        id AA13596; Thu, 21 Dec 95 17:20:38 -0500
    Received: (from eryq@localhost) by rhine.gsfc.nasa.gov (8.6.12/8.6.12)
        id RAA28069; Thu, 21 Dec 1995 17:27:54 -0500
    Date: Thu, 21 Dec 1995 17:27:54 -0500
    From: Eryq <eryq@rhine.gsfc.nasa.gov>
    Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
    To: eryq@eryq.pr.mcs.net
    Subject: Stuff and things
The Received: field is used for tracing message routes, and although it's not generally used for anything other than human debugging, I didn't want to inconvenience anyone who actually wanted to get at that information.
I also didn't want to make this a special case; after all, who knows what other fields could have multiple occurrences in the future? So, clearly, multiple entries had to somehow be stored multiple times... and the different occurrences had to be retrievable.
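One way to picture such storage is a hash of arrays: each lowercased field name maps to a list of its occurrences, in order. The sketch below is an illustration only; %fields, add_field(), and get_field() are hypothetical names and say nothing about how MIME::Head is actually implemented internally.

```perl
use strict;
use warnings;

# Hypothetical storage layout: lowercased field name => [ occurrences ].
my %fields;

# Append a new occurrence, or prepend it if $where is 'BEFORE'.
sub add_field {
    my ($name, $text, $where) = @_;
    my $list = $fields{lc $name} ||= [];
    if (defined($where) && $where eq 'BEFORE') {
        unshift @$list, $text;
    } else {
        push @$list, $text;
    }
    return $text;
}

# Fetch one occurrence: index 0 (the first) by default, -1 for the last.
sub get_field {
    my ($name, $occur) = @_;
    my $list = $fields{lc $name};
    return undef unless $list;
    return $list->[defined($occur) ? $occur : 0];
}

add_field('Received', 'from rhine.gsfc.nasa.gov by gsfc.nasa.gov');
add_field('Received', 'from gsfc.nasa.gov by eryq.pr.mcs.net', 'BEFORE');
print "Most recent:  ", get_field('received'), "\n";
print "Least recent: ", get_field('received', -1), "\n";
```

With this layout no field is a special case: a field that occurs once is simply a one-element list.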
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The more-comprehensive filename extraction is courtesy of Lee E. Brotzman, Advanced Data Solutions.