HTML::TextToHTML Sample Conversion
This sample is based largely on the original sample.txt written by Seth Golub for txt2html.
I used the following options to convert this document:
-tf --mail --heading '^ *--[\w\s]+-- *$' --system_link_dict TextToHTML.dict --append_body sample.foot --infile sample.txt --outfile sample.html
This was done either at the command line with:
perl -MHTML::TextToHTML -e run_txt2html -- options
or from a (test) perl script with:
    use HTML::TextToHTML;
    my $conv = HTML::TextToHTML->new();
    $conv->txt2html([options]);
From bozo@clown.wustl.edu
Return-Path: <bozo@clown.wustl.edu>
Message-Id: <9405102200.AA04736@clown.wustl.edu>
Content-Length: 1070
From: bozo@clown.wustl.edu (Bozo the Clown)
To: rubykat@katspace.com (Kathryn Andersen)
Subject: Re: HTML::TextToHTML
Date: Sun, 12 May 2002 10:01:10 -0500
Bozo wrote:
BtC> Can you post an example text file with its html'ed output?
BtC> That would provide a much better first glance at what it does
BtC> without having to look through and see what the perl code does.
Good idea. I'll write something up.
The header lines were kept separate because they looked like mail headers and I have mailmode on. The same thing applies to Bozo's quoted text. Mailmode doesn't screw things up very often, but since most people convert non-mail text, it's off by default.
Paragraphs are handled ok. In fact, this one is here just to demonstrate that.
THIS LINE IS VERY IMPORTANT!
(Ok, it wasn't that important)
Since this is the first header noticed (all caps, underlined with an "="), it will be a level 1 header. It gets an anchor named "section-1".
This is the second type of header (not all caps, underlined with "="). It gets an anchor named "section-1.1".
This header was in the same style, so it was assigned the same header tag. Note the anchor names in the HTML. (You probably can't see them in your current document view.) Its anchor is named "section-1.2". Get the picture?
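Here is a minimal sketch of how the anchor numbering described above could work. It is not the module's actual code. It assumes a heading's style is its underline character plus whether the text is all caps, and that each new style gets the next level down. The names heading_style and make_anchor_namer and the heading texts are made up for illustration.

```perl
use strict;
use warnings;

# Classify a heading by its underline character and whether the text is
# all caps. This two-part key is an assumption for illustration.
sub heading_style {
    my ($text, $underline) = @_;
    my $caps = ($text =~ /[a-z]/) ? 'mixed' : 'caps';
    return substr($underline, 0, 1) . ":$caps";
}

# Return a closure that turns each heading into an anchor name.
# A style is assigned the next free level the first time it is seen.
sub make_anchor_namer {
    my (%level_of, @counters);
    return sub {
        my ($text, $underline) = @_;
        my $style = heading_style($text, $underline);
        unless (exists $level_of{$style}) {
            my $next_level = 1 + scalar(keys %level_of);
            $level_of{$style} = $next_level;
        }
        my $level = $level_of{$style};
        $#counters = $level - 1;     # forget counters of deeper levels
        $counters[$level - 1]++;     # count this heading at its own level
        return 'section-' . join('.', @counters);
    };
}

# Hypothetical headings: one all-caps, then two mixed-case, all "=" underlined.
my $namer = make_anchor_namer();
for my $heading (['FIRST HEADER', '============'],
                 ['Second Header', '============='],
                 ['Third Header', '============']) {
    print $namer->(@$heading), "\n";
}
```

With these three headings the names come out in the same pattern as the anchors mentioned above: a top-level section followed by two numbered subsections.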
You can define your own custom header patterns if you know what your documents look like.
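As a quick check of what such a pattern catches, here is a small sketch that tries the regular expression from the --heading option above against a few lines. The sample lines are invented for illustration, and the loop only tests the pattern; it says nothing about how the module applies it internally.

```perl
use strict;
use warnings;

# The pattern given to --heading in the options at the top of this document.
my $custom_heading = qr/^ *--[\w\s]+-- *$/;

# Hypothetical input lines, made up for this example.
my @sample_lines = (
    '  -- This is a custom header --',
    'Just an ordinary sentence.',
    '-- not closed',
);

for my $line (@sample_lines) {
    my $verdict = ($line =~ $custom_heading) ? 'heading' : 'text';
    printf "%-8s %s\n", $verdict, $line;
}
```

Only the first line is reported as a heading: the pattern wants words and spaces wrapped in a pair of double dashes, with nothing else on the line.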
It just needs to have enough whitespace in the line. Surrounding blank lines aren't necessary. If it sees enough whitespace in a line, it preformats it. How much is enough? Set it yourself at the command line if you want.
We're the knights of the round table
We dance whene'er we're able
We do routines and chorus scenes
With footwork impeccable.
We dine well here in Camelot
We eat ham and jam and spam a lot.
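The "enough whitespace" test that decides whether a line gets preformatted could look something like the sketch below. It assumes that "enough" means a run of consecutive whitespace characters of at least some minimum length. The function name looks_preformatted and the fallback threshold of 5 are assumptions for illustration, not values taken from this document.

```perl
use strict;
use warnings;

# Return 1 if the line contains a run of at least $min_run whitespace
# characters, else 0. The default of 5 is an assumed value.
sub looks_preformatted {
    my ($line, $min_run) = @_;
    $min_run = 5 unless defined $min_run;
    return ($line =~ /\s{$min_run,}/) ? 1 : 0;
}

# Hypothetical lines: an indented verse line and a plain sentence.
for my $line ('        We dine well here in Camelot',
              'Paragraphs are handled ok.') {
    printf "%d  %s\n", looks_preformatted($line), $line;
}
```

Passing a second argument stands in for setting the threshold yourself, for example looks_preformatted($line, 3).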
If I want to emphasize something, then I'd use stars to wrap around the words, even if there were more than one, that's what I'd do. But I could also underline words, so long as the darn thing was not a_variable_name, in which case I wouldn't want to lose the underscores in something which thought it was underlining. Though we might want to underline more than one word in a sentence. Especially if it is The Title Of A Book. For another kind of emphasis, let's go and put something in bold.
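To show why a_variable_name survives, here is a sketch of an underline rule that only fires when the underscores sit on the outside of the words. The function name underline_to_html, the choice of the <u> tag, and the exact rule are assumptions for illustration; the module may well do it differently.

```perl
use strict;
use warnings;

# Wrap _text_ in <u> tags, but only when neither underscore is glued to a
# word character on its outer side, so a_variable_name is left alone.
sub underline_to_html {
    my ($text) = @_;
    $text =~ s{
        (?<![A-Za-z0-9_]) _              # opening underscore, nothing wordy before it
        ( [^_\s] (?: [^_]* [^_\s] )? )   # the text, not starting or ending in a space
        _ (?![A-Za-z0-9_])               # closing underscore, nothing wordy after it
    }{<u>$1</u>}gx;
    return $text;
}

print underline_to_html('Read _The Title Of A Book_, not a_variable_name.'), "\n";
```

The lookbehind and lookahead do the guarding: inside a_variable_name every underscore has a letter on both sides, so nothing matches there.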
There are some things which this module doesn't handle yet which I would like to implement. Something which would be *really exciting* would be coping with italics and similar things *spread across multiple lines*.
I would also like to implement tables, like the following:
Written For: The First Sentinel Lyric Wheel
Disclaimer:  The characters and concepts of The Sentinel are owned by Pet Fly
             Productions. I'm just borrowing them for a while.
Rating:      PG (language)
Summary:     After the fire, guilt and nightmare. Hurt, angst, a little comfort.
Definition lists would also be cool.
Fanfic:
Fiction based on the universe of some movie or TV show.
Fanzine:
Amateur magazine produced by fans.
The footer is everything from the end of this sentence to the </BODY> tag.