    use MIME::Parser;

    # Create a new parser object:
    my $parser = MIME::Parser->new;

    # Optional: set up parameters that will affect how it extracts
    # documents from the input stream:
    $parser->output_dir("$ENV{HOME}/mimemail");

    # Parse an input stream:
    my $entity = $parser->read(\*STDIN) or die "couldn't parse MIME stream";

    # Congratulations: you now have a (possibly multipart) MIME entity!
    $entity->dump_skeleton;          # for debugging
    multipart-body := preamble 1*encapsulation close-delimiter epilogue

    encapsulation := delimiter body-part CRLF

    delimiter := "--" boundary CRLF   ; taken from Content-Type field.
                                      ; There must be no space between "--"
                                      ; and boundary.

    close-delimiter := "--" boundary "--" CRLF   ; Again, no space by "--"

    preamble := discard-text   ; to be ignored upon receipt.

    epilogue := discard-text   ; to be ignored upon receipt.

    discard-text := *(*text CRLF)

    body-part := <"message" as defined in RFC 822, with all header fields
                  optional, and with the specified delimiter not occurring
                  anywhere in the message body, either on a line by itself
                  or as a substring anywhere.  Note that the semantics of a
                  part differ from the semantics of a message, as described
                  in the text.>
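The grammar can also be read in the other direction. The following Perl sketch uses a hypothetical helper, build_multipart_body, which is not part of MIME::Parser. It assembles a multipart body from a preamble, a list of already-formatted body parts, and an epilogue, with CRLF line ends as the grammar requires. It rejects a boundary whose delimiter appears anywhere inside a part, which is the body-part constraint stated above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Parser) that lays out a
# multipart-body per the grammar:
#   preamble, then for each part: "--boundary" CRLF, the part, CRLF
#   (one encapsulation), then "--boundary--" CRLF, then the epilogue.
sub build_multipart_body {
    my ($boundary, $preamble, $epilogue, @parts) = @_;
    my $crlf = "\r\n";
    die "need at least one body part\n" unless @parts;   # 1*encapsulation

    # A body-part must never contain its own delimiter, even as a substring.
    for my $part (@parts) {
        die "boundary \"$boundary\" occurs inside a body part\n"
            if index($part, "--$boundary") >= 0;
    }

    my $out = $preamble;
    for my $part (@parts) {
        $out .= "--$boundary$crlf" . $part . $crlf;      # encapsulation
    }
    $out .= "--${boundary}--$crlf";                      # close-delimiter
    return $out . $epilogue;
}
```

The check is deliberately strict: it looks for "--boundary" anywhere in a part, not just at the start of a line, mirroring the "either on a line by itself or as a substring anywhere" wording.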
From the grammar above we glean the following algorithm for parsing a MIME stream:
    PROCEDURE parse
       INPUT
          A FILEHANDLE for the stream.
          An optional end-of-stream OUTER_BOUND (for a nested multipart message).

       RETURNS
          The (possibly-multipart) ENTITY that was parsed.
          A STATE indicating how we left things: "EOF", "DELIM", "CLOSE",
          or "ERROR".

       BEGIN
          LET OUTER_DELIM = "--OUTER_BOUND".
          LET OUTER_CLOSE = "--OUTER_BOUND--".

          LET ENTITY = a new MIME entity object.
          LET STATE  = "OK".

          Parse the (possibly empty) header, up to and including the
          blank line that terminates it.  Store it in the ENTITY.

          IF the MIME type is "multipart":
             LET INNER_BOUND = get multipart "boundary" from header.
             LET INNER_DELIM = "--INNER_BOUND".
             LET INNER_CLOSE = "--INNER_BOUND--".

             Parse preamble:
                REPEAT:
                   Read (and discard) next line
                UNTIL (line is INNER_DELIM) OR we hit EOF (error).

             Parse parts:
                REPEAT:
                   LET (PART, STATE) = parse(FILEHANDLE, INNER_BOUND).
                   Add PART to ENTITY.
                UNTIL (STATE != "DELIM").

             Parse epilogue:
                REPEAT:
                   Read (and discard) next line
                UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF.
                LET STATE = "EOF", "DELIM", or "CLOSE" accordingly.

          ELSE (if the MIME type is not "multipart"):
             Open output destination (e.g., a file).

             DO:
                Read, decode, and output data from FILEHANDLE
             UNTIL (line is OUTER_DELIM or OUTER_CLOSE) OR we hit EOF.
             LET STATE = "EOF", "DELIM", or "CLOSE" accordingly.

          ENDIF

          RETURN (ENTITY, STATE).
       END
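A minimal in-memory sketch of this procedure in Perl follows. It is not MIME::Parser's implementation: parse_entity, read_until_outer, and line_state are hypothetical names; the input is an array of lines rather than a filehandle; headers are assumed to be unfolded "Name: value" lines; the boundary is pulled from Content-Type with a simple regex; and bodies are collected without decoding. It also keeps the preamble and epilogue instead of discarding them, and it does not strip the CRLF that, per the grammar, precedes each delimiter.

```perl
use strict;
use warnings;

# Hypothetical sketch of PROCEDURE parse (not MIME::Parser's code).
# $lines: array ref of input lines; $pos: ref to the current index;
# $outer: enclosing boundary (undef at top level).
# Returns (ENTITY, STATE), STATE being "EOF", "DELIM", "CLOSE" or "ERROR".
sub parse_entity {
    my ($lines, $pos, $outer) = @_;
    my %entity = (head => {}, parts => [], body => [],
                  preamble => [], epilogue => []);

    # Header: up to and including the terminating blank line.
    # Simplification: folded (continuation) header lines are not handled.
    while ($$pos < @$lines) {
        my $line = $lines->[$$pos++];
        last if $line =~ /^\r?\n?\z/;
        $entity{head}{lc $1} = $2 if $line =~ /^([^:\s]+):\s*(.*?)\r?\n?\z/;
    }

    my $ctype = $entity{head}{'content-type'} // '';
    if ($ctype =~ m{^multipart/}i) {
        my ($inner) = $ctype =~ /boundary="?([^";]+)"?/i;
        return (\%entity, "ERROR") unless defined $inner;

        # Preamble: kept (not discarded) up to the first inner delimiter.
        my $found = 0;
        while ($$pos < @$lines) {
            my $line = $lines->[$$pos++];
            if (line_state($line, $inner) eq "DELIM") { $found = 1; last }
            push @{ $entity{preamble} }, $line;
        }
        return (\%entity, "ERROR") unless $found;    # hit EOF first

        # Parts: recurse while each part ends at another inner delimiter.
        my $state = "DELIM";
        while ($state eq "DELIM") {
            my ($part, $st) = parse_entity($lines, $pos, $inner);
            push @{ $entity{parts} }, $part;
            $state = $st;
        }
        return (\%entity, $state) if $state eq "ERROR";

        # Epilogue: kept up to the outer delimiter/close line, or EOF.
        return (\%entity, read_until_outer($lines, $pos, $outer, $entity{epilogue}));
    }

    # Not multipart: collect (undecoded) body lines up to the outer boundary.
    return (\%entity, read_until_outer($lines, $pos, $outer, $entity{body}));
}

# Push lines into @$sink until an OUTER delimiter/close line or EOF.
sub read_until_outer {
    my ($lines, $pos, $outer, $sink) = @_;
    while ($$pos < @$lines) {
        my $line = $lines->[$$pos++];
        if (defined $outer) {
            my $st = line_state($line, $outer);
            return $st if $st;
        }
        push @$sink, $line;
    }
    return "EOF";
}

# "DELIM" for "--bound", "CLOSE" for "--bound--", "" otherwise (CRLF or LF).
sub line_state {
    my ($line, $bound) = @_;
    (my $text = $line) =~ s/\r?\n\z//;
    return "CLOSE" if $text eq "--${bound}--";
    return "DELIM" if $text eq "--${bound}";
    return "";
}
```

A caller starts with a position of 0 and no outer boundary, e.g. `my $pos = 0; my ($entity, $state) = parse_entity(\@lines, \$pos, undef);`.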
For reasons discussed in MIME::Entity, we can't just discard the "discard text": some mailers actually put data in the preamble.
    use MIME::Parser;

    my $parser = MIME::Parser->new;
    $parser->output_dir("/tmp");
    $parser->output_prefix("msg1");
    my $entity = $parser->read(\*STDIN);
WARNING: This needs a lot more work.
If you don't like the behavior of this function, you can override it in a subclass.
The default output directory is ".".
If OPTVALUE is not given, the current output directory is returned. If OPTVALUE is given, the output directory is set to the new value, and the previous value is returned.
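The get/set convention described above (return the current value when called with no argument; otherwise store the new value and return the old one) can be sketched as a plain Perl accessor. The package Demo::Settings is hypothetical and is not MIME::Parser's source; its starting value of "." is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical class illustrating the get/set convention; not MIME::Parser.
package Demo::Settings;

sub new {
    my ($class) = @_;
    return bless { output_dir => "." }, $class;   # assumed default
}

# With no argument: return the current value.
# With an argument: store it and return the previous value.
sub output_dir {
    my $self = shift;
    my $old  = $self->{output_dir};
    $self->{output_dir} = shift if @_;   # set only when OPTVALUE is given
    return $old;
}

package main;
```

Returning the previous value lets a caller temporarily change the setting and restore it afterwards: `my $saved = $obj->output_dir("/tmp"); ...; $obj->output_dir($saved);`.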
Get/set the output prefix for the parsing operation. This is a short string that all filenames for extracted and decoded body parts will begin with. The default is "msg".
If OPTVALUE is not given, the current output prefix is returned. If OPTVALUE is given, the output prefix is set to the new value, and the previous value is returned.
output_dir().
The stream should be given as a glob ref to a readable FILEHANDLE; e.g., \*STDIN.
Returns a MIME::Entity, which may be a single entity, or an arbitrarily-nested multipart entity. Returns undef on failure.
read(), intended for programs running under mail-handlers like deliver, which splits the incoming mail message into a header file and a body file.
Simply give this method the paths to the respective files. These must be pathnames: Perl "open-able" expressions won't work, since the pathnames are shell-quoted for safety.
WARNING: it is assumed that, once the files are cat'ed together, there will be a blank line separating the head part and the body part.
A better solution for this case would be to set up some form of state machine for input processing. This will be left for future versions.
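For the two-file case, the sketch below joins a header file and a body file in Perl instead of through the shell. join_head_body is a hypothetical helper, not a MIME::Parser method. It opens each path with three-argument open, so no shell quoting is involved, and instead of assuming a blank line already separates head and body, it appends one to the header text when it is missing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Parser): read a header file and a
# body file, as split by a mail handler, and join them into one message.
sub join_head_body {
    my ($head_path, $body_path) = @_;
    my %text;
    for my $which ([head => $head_path], [body => $body_path]) {
        my ($key, $path) = @$which;
        # 3-arg open treats $path as a plain filename: no shell involved.
        open(my $fh, '<', $path) or die "can't open $path: $!";
        binmode $fh;
        local $/;                          # slurp the whole file
        $text{$key} = <$fh> // '';
        close $fh;
    }
    my $head = $text{head};
    # Finish an unterminated last header line.
    $head .= "\n" if length $head && $head !~ /\n\z/;
    # Guarantee the blank line that separates head from body.
    $head .= "\n" unless $head =~ /(?:\A|\n)\r?\n\z/;
    return $head . $text{body};
}
```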
The revised implementation uses temporary files (à la tmpfile()) to hold the encoded portions of MIME documents. Such files are deleted automatically after decoding is done, and no more than one such file is opened at a time, so you should never need to worry about them.
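The temporary-file approach can be sketched with IO::File->new_tmpfile, which opens a read/write handle on a new temporary file (anonymous where the system allows it). The helper decode_via_tmpfile is hypothetical and is not the module's actual code, and base64 is used only as an example encoding: the encoded lines are spooled to the file, the handle is rewound, and the data is decoded back out.

```perl
use strict;
use warnings;
use IO::File;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: hold one part's still-encoded lines in a temp file,
# then rewind and decode from it. Closing the handle removes the file.
sub decode_via_tmpfile {
    my (@encoded_lines) = @_;
    my $tmp = IO::File->new_tmpfile or die "can't create temp file: $!";
    binmode $tmp;
    print {$tmp} @encoded_lines;             # spool the encoded data
    $tmp->seek(0, 0) or die "can't rewind temp file: $!";
    my $decoded = '';
    while (defined(my $line = <$tmp>)) {
        # Decode line by line; assumes each line holds whole 4-character
        # base64 groups, as MIME encoders normally emit.
        $decoded .= decode_base64($line);
    }
    $tmp->close;                              # temp file goes away here
    return $decoded;
}
```

Only one temporary handle is open at a time in this sketch, matching the "no more than one such file" behavior described above.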
"\r\n"). However, it is extremely likely that folks will want to parse MIME streams where each line ends in the local newline character "\n" instead.
An attempt has been made to allow the parser to handle both CRLF and newline-terminated input.
The "7bit" and "8bit" decoders will decode both a "\n" and a "\r\n" end-of-line sequence into a "\n".
The "binary" decoder (the default if no encoding is specified) still outputs data verbatim, so a MIME message with CRLFs and no explicit encoding will be output as a text file that, on many systems, will have an annoying ^M at the end of each line. This is as it should be, since the binary decoder passes the data through unchanged.
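The end-of-line rules above can be summarized in a small Perl function. decode_line is a hypothetical name and this is not the parser's decoder code. It treats a missing encoding as "binary", as the text describes, and returns lines for any other encoding unchanged, since real decoding (e.g. base64) is out of scope here.

```perl
use strict;
use warnings;

# Hypothetical helper (not MIME::Parser's decoders):
#   "7bit"/"8bit": turn a trailing "\r\n" or "\n" into "\n".
#   "binary" (also assumed when no encoding is given) and anything else:
#   return the line untouched, so a trailing "\r" survives.
sub decode_line {
    my ($encoding, $line) = @_;
    $encoding = lc($encoding // 'binary');
    if ($encoding eq '7bit' || $encoding eq '8bit') {
        (my $out = $line) =~ s/\r?\n\z/\n/;   # normalize CRLF or LF to LF
        return $out;
    }
    return $line;                            # verbatim
}
```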
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.