MIME::Words - deal with RFC-1522 encoded words
Before reading further, you should see MIME::Tools to make sure that you understand where this module fits into the grand scheme of things. Go on, do it now. I'll wait.
Ready? Ok...
use MIME::Words qw(:all);
### Decode the string into another string, forgetting the charsets:
$decoded = decode_mimewords(
'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
);
### Split string into array of decoded [DATA,CHARSET] pairs:
@decoded = decode_mimewords(
'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>',
);
### Encode a single unsafe word:
$encoded = encode_mimeword("\xABFran\xE7ois\xBB");
### Encode a string, trying to find the unsafe words inside it:
$encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB in town");
Fellow Americans, you probably won't know what the hell this module
is for. Europeans, Russians, et al., you probably do. :-)
For example, here's a valid MIME header you might get:
From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
=?US-ASCII?Q?.._cool!?=
The fields basically decode to (sorry, I can only approximate the
Latin characters with 7 bit sequences /o and 'e):
From: Keith Moore <moore@cs.utk.edu>
To: Keld J/orn Simonsen <keld@dkuug.dk>
CC: Andr'e Pirard <PIRARD@vm1.ulg.ac.be>
Subject: If you can read this you understand the example... cool!
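To make the encoded-word syntax above concrete, here is a minimal sketch of decoding a single =?CHARSET?ENCODING?TEXT?= word by hand. It uses only MIME::Base64 from the Perl core. The helper name decode_one_word is hypothetical and is not part of MIME::Words, and the sketch ignores the edge cases the real decoder handles.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper (NOT part of MIME::Words): decode one encoded word
# of the form =?CHARSET?ENCODING?TEXT?= and return (bytes, charset).
# Returns an empty list if the argument is not an encoded word.
sub decode_one_word {
    my ($word) = @_;
    my ($charset, $enc, $text) =
        $word =~ /^=\?([^?]+)\?([QqBb])\?([^?]*)\?=$/
        or return;

    # "B" words are plain base64.
    return (decode_base64($text), $charset) if uc($enc) eq 'B';

    # "Q" words: underscore stands for a space, =XX is a hex-encoded byte.
    $text =~ tr/_/ /;
    $text =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return ($text, $charset);
}

my ($name, $cs) = decode_one_word('=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=');
my ($subj)      = decode_one_word('=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=');

print "$cs: $name\n";
print "$subj\n";        # first fragment of the Subject line above
```

The result is a string of raw bytes in the named charset. Converting those bytes to another charset is a separate step that this sketch does not attempt.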
In an array context, splits the ENCODED string into a list of decoded
[DATA, CHARSET] pairs, and returns that list. Unencoded data are returned
in a 1-element array [DATA], giving an effective CHARSET of undef.
$enc = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>';
foreach (decode_mimewords($enc)) {
print "", ($_->[1] || 'US-ASCII'), ": ", $_->[0], "\n";
}
In a scalar context, joins the "data" elements of the above list together, and returns that. Warning: this is information-lossy, but if you know that all charsets in the ENCODED string are identical, it might be useful to you.
In the event of a syntax error, $@ will be set to a description of the error, but parsing will continue as best as possible (so as to get something back when decoding headers). $@ will be false if no error was detected.
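As a hedged illustration of the error reporting described above, the sketch below decodes one well-formed header and one with an unterminated encoded word, copying $@ after each call. The header strings are made up for the example, and the exact wording of the error message is not shown because it may vary between versions.

```perl
use strict;
use warnings;
use MIME::Words qw(decode_mimewords);

# Well-formed input: $@ should come back false.
my $good     = decode_mimewords('Subject: =?ISO-8859-1?Q?Andr=E9?= says hi');
my $good_err = $@;      # copy at once; later code may overwrite $@

# Input with an unterminated encoded word: decoding still returns a
# string, and $@ is expected to describe the problem.
my $bad     = decode_mimewords('Subject: =?ISO-8859-1?Q?Andr=E9');
my $bad_err = $@;

print "good: $good\n";
print "bad:  $bad\n";
warn "problem in first header: $good_err"  if $good_err;
warn "problem in second header: $bad_err" if $bad_err;
```

Copying $@ into a lexical straight after the call is a defensive habit, since any intervening eval can reset it.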
Any arguments past the ENCODED string are taken to define a hash of options:
Subject: Here is =?US-ASCII?Q?=46=4F=4F.doc?=
You can access it in either of two ways:
$subj = $head->get("subject"); ### gets: "Here is =?....?="
$subj = unmime $head->get("subject"); ### gets: "Here is FOO.doc"
### Encode "<<Franc,ois>>":
$encoded = encode_mimeword("\xABFran\xE7ois\xBB");
You may specify the ENCODING ("Q" or "B"), which defaults to "Q".
You may specify the CHARSET, which defaults to iso-8859-1.
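The optional arguments can be sketched as follows, assuming the positional order RAW, ENCODING, CHARSET implied by the text above. The variable names and the choice of ISO-8859-15 as an alternative charset are illustrative only. The module does not check that the bytes are valid in the charset you name.

```perl
use strict;
use warnings;
use MIME::Words qw(encode_mimeword decode_mimewords);

my $raw = "\xABFran\xE7ois\xBB";

my $as_q      = encode_mimeword($raw);                      # defaults: "Q", iso-8859-1
my $as_b      = encode_mimeword($raw, 'B');                 # base64 instead
my $as_latin9 = encode_mimeword($raw, 'Q', 'ISO-8859-15');  # explicit charset

print "$_\n" for $as_q, $as_b, $as_latin9;

# Decoding in scalar context should give back the original bytes.
print "round trip ok\n" if decode_mimewords($as_b) eq $raw;
```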
### Encode a string with some unsafe "words":
$encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB");
Returns the encoded string. Any arguments past the RAW string are taken to define a hash of options:
Encoding: the encoding to use, "q" or "b". The default is "q".
Warning: this is a quick-and-dirty solution, intended for character
sets which overlap ASCII. It does not comply with the RFC-1522
rules regarding the use of encoded words in message headers.
You may want to roll your own variant, using encode_mimeword(), for your application.
Thanks to Jan Kasprzak for reminding me about this problem.
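One possible shape for such a home-grown variant is sketched below. It is not the module's own code and does not claim full RFC-1522 compliance. The helper encode_phrase is hypothetical: it encodes a whole display phrase as a single "B" word when the phrase contains anything outside printable ASCII, and refuses to produce an encoded word longer than the 75 characters RFC 1522 allows.

```perl
use strict;
use warnings;
use MIME::Words qw(encode_mimeword decode_mimewords);

# Hypothetical helper (NOT part of MIME::Words): encode a display phrase
# as one encoded word if it contains any byte outside printable ASCII;
# otherwise return it unchanged.
sub encode_phrase {
    my ($phrase, $charset) = @_;
    return $phrase unless $phrase =~ /[^\x20-\x7E]/;

    my $word = encode_mimeword($phrase, 'B', $charset || 'ISO-8859-1');

    # RFC 1522 limits an encoded word to 75 characters in total.
    # This sketch gives up instead of splitting the phrase.
    die "encoded word too long\n" if length($word) > 75;
    return $word;
}

# Only the phrase is encoded; the address itself stays as plain ASCII.
my $to = 'To: ' . encode_phrase("Keld J\xF8rn Simonsen") . ' <keld@dkuug.dk>';
print "$to\n";
```

A fuller version would split long phrases into several encoded words separated by whitespace, and would treat structured fields differently from free-text fields such as Subject.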
Exports its principal functions by default, in keeping with MIME::Base64 and MIME::QuotedPrint.
Eryq (
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Thanks also to...
Kent Boortz For providing the idea, and the baseline
RFC-1522-decoding code!
KJJ at PrimeNet For requesting that this be split into
its own module.
Stephane Barizien For reporting a nasty bug.
$Revision: 5.403 $ $Date: 2000/11/04 19:54:48 $