require MIME::Head;
You can create a MIME::Head object in a number of ways:
    # Create a new, empty header, and populate it manually:
    $head = MIME::Head->new;
    $head->set('content-type', 'text/plain; charset=US-ASCII');
    $head->set('content-length', $len);

    # Create a new header by parsing in the STDIN stream:
    $head = MIME::Head->read(\*STDIN);

    # Create a new header by parsing in a file:
    $head = MIME::Head->from_file("/tmp/test.hdr");

    # Create a new header by running a program:
    $head = MIME::Head->from_file("cat a.hdr b.hdr |");
To get rid of all internal newlines in all fields:
    # Get rid of all internal newlines:
    $head->unfold();
To test whether a given field exists:
    # Was a "Subject:" given?
    if ($head->exists('subject')) {
        # yes, it was!
    }
To get the contents of that field as a string:
    # Is this a reply?
    $reply = 1 if ($head->get('Subject') =~ /^Re: /);
To set the contents of a field to a given string:
    # Set the MIME type:
    $head->set('Content-type', 'text/html');
To extract parameters from certain structured fields, as a hash reference:
    # What's the MIME type?
    $params    = $head->params('content-type');
    $mime_type = $$params{_};
    $char_set  = $$params{'charset'};
    $file_name = $$params{'name'};
To get certain commonly-used MIME information:
    $mime_type     = $head->mime_type;
    $mime_encoding = $head->mime_encoding;
    $file_name     = $head->recommended_filename;
    $boundary      = $head->multipart_boundary;
"From "
will be either ignored, flagged as an error, or coerced into the special field "Mail-from:" (the default; this approach was inspired by Emacs's "Babyl" format). Though not valid for a MIME header, this will provide compatibility with some Unix mail messages. Just do this:
    MIME::Head->tweak_FROM_parsing($choice)
where $choice is one of IGNORE, ERROR, or COERCE.
open() so as to return a readable filehandle. The "file" will be opened, read, and then closed:
    # Create a new header by parsing in a file:
    my $head = MIME::Head->from_file("/tmp/test.hdr");
Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:
    # Create a new header by parsing in a file:
    my $head = MIME::Head->new->from_file("/tmp/test.hdr");
On success, the object will be returned; on failure, the undefined value.
This is really just a convenience front-end onto read().
Supply this routine with a reference to a filehandle glob; e.g., \*STDIN:
    # Create a new header by parsing in STDIN:
    my $head = MIME::Head->read(\*STDIN);
Since this method can function as either a class constructor or an instance initializer, the above is exactly equivalent to:
    # Create a new header by parsing in STDIN:
    my $head = MIME::Head->new->read(\*STDIN);
That said, you should probably use the first form. On success, the object will be returned; on failure, the undefined value.
    # Output to STDOUT:
    $head->print(\*STDOUT);
WARNING: this method does not output the blank line that terminates the header in a legal message (since you may not always want it).
Anything that you can't do here, you'll have to do
    # Add the trace information:
    $head->add('Received', 'from eryq.pr.mcs.net by gonzo.net with smtp');
The FIELD is automatically coerced to lowercase. Returns the TEXT.
Normally, the new occurrence will be appended to the existing occurrences. However, if the optional WHERE argument is the string "BEFORE", then the new occurrence will be prepended.
NOTE: if you want to be explicit about appending, use the string "AFTER" for this argument.
WARNING: this method always adds new occurrences; it doesn't overwrite any existing occurrences. So if you just want to change the value of a field (creating it if necessary), you probably don't want to use this method: consider using set() instead.
    # Force an explicit character set:
    if ($head->get('Content-type') !~ /\bcharset=/) {
        $head->add_text('Content-type', '; charset="us-ascii"');
    }
The FIELD is automatically coerced to lowercase.
WARNING: be careful if adding text that contains a newline! A newline in a field value must be followed by a single space or tab to be a valid continuation line!
I had considered building this routine so that it "fixed" bare newlines for you, but then I decided against it, since the behind-the-scenes trickery would probably create more problems through confusion. So, instead, you've just been warned... proceed with caution.
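Since the routine does not repair bare newlines, a caller might want to check text before adding it. The following is a minimal pure-Perl sketch of such a check; the helper name has_valid_continuations is hypothetical and not part of MIME::Head, and it only verifies that the first character after each newline is a space or tab.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): returns 1 if every
# newline in $text is immediately followed by a space or tab, else 0.
sub has_valid_continuations {
    my ($text) = @_;
    return $text !~ /\n(?![ \t])/ ? 1 : 0;
}

my $good = "text/plain;\n charset=us-ascii";
my $bad  = "text/plain;\ncharset=us-ascii";

print "good is safe to add\n" if has_valid_continuations($good);
print "bad has a bare newline\n" unless has_valid_continuations($bad);
```

Note that a value ending in a newline also fails this check, since nothing follows the newline.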
    # Remove all the MIME information:
    $head->delete('MIME-Version');
    $head->delete('Content-type');
    $head->delete('Content-transfer-encoding');
    $head->delete('Content-disposition');
Currently returns 1 always.
    # Was a "Subject:" given?
    if ($head->exists('subject')) {
        # yes, it was!
    }
The FIELD is automatically coerced to lowercase. This method returns the undefined value if the field doesn't exist, and some true value if it does.
    foreach $field (sort $head->fields) {
        print "$field: ", $head->get($field), "\n";
    }
    # Is this a reply?
    $is_reply = 1 if ($head->get('Subject') =~ /^Re: /);
NOTE: this returns the first occurrence of the field, so as to be consistent with Mail::Internet::get(). However, if the optional OCCUR argument is defined, it specifies the index of the occurrence you want: zero for the first, and -1 for the last.
    # Print the first 'Received:' entry:
    print "Most recent: ", $head->get('received'), "\n";

    # Print the first 'Received:' entry, explicitly:
    print "Most recent: ", $head->get('received', 0), "\n";

    # Print the last 'Received:' entry:
    print "Least recent: ", $head->get('received', -1), "\n";
    # How did it get here?
    @history = $head->get_all('Received');
NOTE: I had originally experimented with having get() return all occurrences when invoked in an array context, but that causes a lot of accidents when you get careless and do stuff like this:
    print "\u$field: ", $head->get($field), "\n";
It also made the intuitive behaviour unclear if the OCCUR argument was given in an array context. So I opted for an explicit approach to asking for all occurrences.
read() in to create this object:
    print "PARSED FROM:\n", $head->original_text;
    # Set the MIME type:
    $head->set('content-type', 'text/html');
The FIELD is automatically coerced to lowercase. This method returns the text.
"Unfolding" is the act of removing all newlines.
    $head->unfold;
Currently, returns 1 always.
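To illustrate what unfolding means for a single field value, here is a minimal pure-Perl sketch. The helper name unfold_value is hypothetical and not part of MIME::Head; it assumes that only the newline itself is removed, so the whitespace that began the continuation line stays in place. The real unfold() may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): remove every newline
# from a field value, leaving the continuation whitespace untouched.
sub unfold_value {
    my ($value) = @_;
    $value =~ s/\r?\n//g;
    return $value;
}

my $folded   = "text/plain;\n charset=us-ascii";
my $unfolded = unfold_value($folded);
print "$unfolded\n";
```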
    Content-type
    Content-transfer-encoding
    Content-disposition
Be aware that they do not just return the raw contents of those fields, and in some cases they will fill in sensible (I hope) default values. Use get() if you need to grab and process the raw field text.
    Content-Type: Message/Partial;
        number=2; total=3;
        id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
Here is how you'd extract them:
    $params = $head->params('content-type');
    if ($$params{_} eq 'message/partial') {
        $number = $$params{'number'};
        $total  = $$params{'total'};
        $id     = $$params{'id'};
    }
Like field names, parameter names are coerced to lowercase. The special '_' parameter means the default parameter for the field.
WARNING: the syntax is a little different for each field (content-type, content-disposition, etc.). I've attempted to come up with a nice, simple catch-all solution: it simply stops when it can't match anything else.
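The catch-all approach described above can be sketched in pure Perl as follows. This is only an illustration, not the module's actual parser: the helper name parse_params is hypothetical, and lowercasing the default ('_') value is an assumption made so that a comparison against 'message/partial' succeeds.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): parse a structured
# field value into a hash reference keyed by lowercased parameter name.
sub parse_params {
    my ($raw) = @_;
    my %params;

    # The leading token, up to the first semicolon, is the default
    # parameter. Lowercasing its value is an assumption of this sketch.
    $raw =~ /\G\s*([^;\s]+)\s*/gc or return \%params;
    $params{_} = lc $1;

    # Consume ";name=value" pairs, simply stopping when nothing matches.
    while ($raw =~ /\G;\s*([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^;\s]*))\s*/gc) {
        my ($name, $quoted, $bare) = ($1, $2, $3);
        $params{lc $name} = defined $quoted ? $quoted : $bare;
    }
    return \%params;
}

my $parsed = parse_params(
    'Message/Partial; number=2; total=3; id="oc=jpbe0M2Yt4s@thumper.bellcore.com"'
);
print "$_ => $parsed->{$_}\n" for sort keys %$parsed;
```

Anything after the first unparseable parameter is silently ignored, which mirrors the "stops when it can't match anything else" behaviour described in the warning.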
If no encoding could be found, the empty string is returned.
"$type/$subtype"
in all-lowercase.
    ($type, $subtype) = split('/', $head->mime_type);
If both the type and the subtype are missing, the content-type defaults to "text/plain", as per RFC-1521:
    Default RFC-822 messages are typed by this protocol as plain text in the US-ASCII character set, which can be explicitly specified as "Content-type: text/plain; charset=us-ascii". If no Content-Type is specified, this default is assumed.
If just the subtype is missing (really a syntax error, but we'll tolerate it, since some mailers actually do this), then the subtype defaults to "x-subtype-unknown". This may change in the future, since I don't know if this was a really horrible idea: unfortunately, there is no standard default subtype, and even when a good default can be decided upon, I felt queasy about returning the erroneous "text" as either the legal "text/plain" or the still-illegal "text/".
If the content type is present but can't be parsed at all (yow!), the empty string is returned.
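The three cases above (nothing given, subtype missing, unparseable) can be summarised in a short pure-Perl sketch. The helper name effective_mime_type is hypothetical and not part of MIME::Head, and the regular expression is a simplification of whatever the module really uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): apply the defaulting
# rules described above to a raw content-type value (or undef).
sub effective_mime_type {
    my ($raw) = @_;

    # No content type at all: fall back to the RFC-1521 default.
    return 'text/plain' if !defined $raw || $raw !~ /\S/;

    # "type/subtype", where the subtype (or the whole "/subtype") may be absent.
    if ($raw =~ m{^\s*([^\s/;]+)(?:/([^\s/;]*))?}) {
        my ($type, $subtype) = ($1, $2);
        $subtype = 'x-subtype-unknown'
            if !defined $subtype || $subtype eq '';
        return lc "$type/$subtype";
    }

    # Present, but it can't be parsed at all.
    return '';
}

for my $raw (undef, 'Text/HTML; charset=us-ascii', 'text', ';;;') {
    my $shown = defined $raw ? "'$raw'" : 'undef';
    print "$shown => '", effective_mime_type($raw), "'\n";
}
```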
Content-type: field; that is, the leading double-hyphen (--) is not prepended.
(Well, almost exactly... from RFC-1521:
    (If a boundary appears to end with white space, the white space must be presumed to have been added by a gateway, and must be deleted.)
so we oblige and remove any trailing spaces.)
Returns undef (not the empty string) if the message is not multipart, if there is no specified boundary, or if the boundary is illegal (e.g., if it is empty after all trailing whitespace has been removed).
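The trailing-whitespace rule and the undef result can be sketched in a few lines of pure Perl. The helper name clean_boundary is hypothetical and not part of MIME::Head; it only covers the "missing" and "empty after trimming" cases, not the multipart check.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Head): strip trailing
# whitespace from a raw boundary value; return undef if the value
# is missing or empty after trimming.
sub clean_boundary {
    my ($boundary) = @_;
    return undef unless defined $boundary;
    $boundary =~ s/\s+\z//;
    return length $boundary ? $boundary : undef;
}

my $cleaned = clean_boundary("simple boundary   ");
print defined $cleaned ? "boundary: '$cleaned'\n" : "no usable boundary\n";
```

Because the result may be undef, callers should test it with defined() rather than for truth.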
Returns undef if no filename could be suggested.
Received: field, which (unlike From:, To:, etc.) will often have multiple occurrences; e.g.:
    Received: from gsfc.nasa.gov by eryq.pr.mcs.net
         with smtp (Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C;
         Thu, 21 Dec 95 16:34 CST
    Received: from rhine.gsfc.nasa.gov by gsfc.nasa.gov
         (5.65/Ultrix3.0-C) id AA13596;
         Thu, 21 Dec 95 17:20:38 -0500
    Received: (from eryq@localhost) by rhine.gsfc.nasa.gov
         (8.6.12/8.6.12) id RAA28069;
         Thu, 21 Dec 1995 17:27:54 -0500
    Date: Thu, 21 Dec 1995 17:27:54 -0500
    From: Eryq <eryq@rhine.gsfc.nasa.gov>
    Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
    To: eryq@eryq.pr.mcs.net
    Subject: Stuff and things
The Received: field is used for tracing message routes, and although it's not generally used for anything other than human debugging, I didn't want to inconvenience anyone who actually wanted to get at that information. I also didn't want to make this a special case; after all, who knows what other fields could have multiple occurrences in the future? So, clearly, multiple entries had to somehow be stored multiple times.
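One way to picture such storage is a hash that maps each lowercased field name to an array of occurrences. The sketch below is an assumption about the general shape, not a description of the module's internals; add_field and get_field are hypothetical names that loosely mimic the WHERE and OCCUR arguments documented for add() and get().

```perl
use strict;
use warnings;

# Hypothetical storage layout: lowercased field name => array of
# values, kept in the order the occurrences were added.
my %fields;

sub add_field {
    my ($name, $text, $where) = @_;
    my $list = $fields{lc $name} ||= [];
    if (defined $where && $where eq 'BEFORE') { unshift @$list, $text }
    else                                      { push    @$list, $text }
    return $text;
}

sub get_field {
    my ($name, $occur) = @_;
    my $list = $fields{lc $name} or return undef;
    return $list->[defined $occur ? $occur : 0];   # 0 = first, -1 = last
}

add_field('Received', 'from gsfc.nasa.gov by eryq.pr.mcs.net');
add_field('Received', 'from rhine.gsfc.nasa.gov by gsfc.nasa.gov');

print "first: ", get_field('received'), "\n";
print "last:  ", get_field('received', -1), "\n";
```

With this layout no field needs special treatment: a field that occurs once is just an array with one element.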
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
More-comprehensive filename extraction by Lee E. Brotzman, Advanced Data Solutions.