The Text::EtText Format Converter
This converter translates EtText, a simple plain-text format, to HTML. Like
most simple text markup formats (POD, setext, etc.), EtText markup handles the
usual things: insertion of <P> tags, and header recognition and markup.
However, it also adds a powerful link markup system.
EtText markup is simple and effective; it's based loosely on the WikiWikiWeb TextFormattingRules.
Basic Text Markup
If you leave blank lines between paragraphs, <p> and
</p> tags will be inserted in the correct places.
EtText does quite a good job of this.
Words wrap and fill automatically, so there's no need to worry about wrapping
before 80 characters. (It's good form to do so anyway, in case other people
ever need to edit your text, though.)
A paragraph consisting of a single line of 10 or more consecutive - or _ signs
will be converted to an <hr> tag.
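As a rough illustration of the paragraph and rule handling described above, here is a minimal Python sketch. It is not the Text::EtText implementation; the function names is_rule and paragraphs_to_html are hypothetical, and it assumes a rule line is made entirely of dashes or entirely of underscores.

```python
import re


def is_rule(paragraph: str) -> bool:
    """True if the paragraph is one line of 10+ '-' signs or 10+ '_' signs.

    Assumption: the two characters are not mixed within one rule line.
    """
    return re.fullmatch(r"\s*(?:-{10,}|_{10,})\s*", paragraph) is not None


def paragraphs_to_html(text: str) -> str:
    """Split on blank lines, wrap each block in <p>, and turn rule lines into <hr>."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    for block in blocks:
        if is_rule(block):
            out.append("<hr>")
        else:
            # Collapse line breaks so the browser can wrap and fill the text.
            out.append("<p>" + " ".join(block.split()) + "</p>")
    return "\n".join(out)


if __name__ == "__main__":
    print(paragraphs_to_html("one\ntwo\n\n__________\n\nthree"))
```

The sketch ignores indentation, lists and protected blocks, which the real converter handles as described in the following sections.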
Sections of text between pairs of certain characters will be turned into
markup, as follows:
EtText   | Tag Used | Result
**text** | <strong> | text
__text__ | <em>     | text
##text## | <code>   | text
& signs that have whitespace on either side will be converted
to &amp; entities automatically.
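The paired-character markup and the ampersand rule above can be approximated with a few regular expressions. This is only a sketch under simple assumptions (no nesting, markers on a single line); the name inline_markup is hypothetical and the code is not taken from Text::EtText.

```python
import re

# One (pattern, tag) pair per row of the markup table.
INLINE_RULES = [
    (re.compile(r"\*\*(.+?)\*\*"), "strong"),
    (re.compile(r"__(.+?)__"), "em"),
    (re.compile(r"##(.+?)##"), "code"),
]

# An ampersand with whitespace on both sides.
LONE_AMPERSAND = re.compile(r"(?<=\s)&(?=\s)")


def inline_markup(text: str) -> str:
    """Escape free-standing ampersands, then convert paired markers to tags."""
    text = LONE_AMPERSAND.sub("&amp;", text)
    for pattern, tag in INLINE_RULES:
        text = pattern.sub(lambda m, t=tag: f"<{t}>{m.group(1)}</{t}>", text)
    return text


if __name__ == "__main__":
    print(inline_markup("**bold** __em__ ##code## R & D"))
```

Ampersands that touch other characters, as in AT&T or an existing entity, are left alone by this sketch.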
Text indented from the left margin will be converted into a <P>
paragraph wrapped in a <blockquote> -- unless it starts with a
* , - , + or o character
followed by whitespace, in which case it's interpreted as a list item; see
Lists below.
Another exception to the above rule is that text indented by only 1 space, or
on lines starting in the first column with two colon characters, will be
surrounded by <pre> tags.
If you find writing HTML tag-pairs manually annoying, EtText includes an idea
from Latte: balanced-tag generation. Wrap the text to be tagged with
the name of the tag followed immediately by a { character on the left, and a }
character on the right. In other words,
strong{text}
will be rendered as
<strong>text</strong>
This can be nested, so strong{text with i{italic} bits} will be rendered as
<strong>text with <i>italic</i> bits</strong>.
In addition, the balanced-tag support has a bonus feature, in that it supports
CSS classes; follow the name of the tag with a full stop and the class, and
it will use that class, like so:
i.green{foo}
will be rendered as
<i class="green">foo</i>
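To make the balanced-tag rules concrete, the following Python sketch expands tag{...} and tag.class{...} forms, including nested ones, by rewriting the innermost braces first and repeating until nothing changes. The name expand_balanced_tags is hypothetical, and this is an illustration of the rule rather than the converter's own code.

```python
import re

# tag name, optional ".class", then a brace pair whose body has no inner braces.
BALANCED = re.compile(r"\b([A-Za-z][A-Za-z0-9]*)(?:\.([\w-]+))?\{([^{}]*)\}")


def _render(match: re.Match) -> str:
    tag, css_class, body = match.groups()
    attr = f' class="{css_class}"' if css_class else ""
    return f"<{tag}{attr}>{body}</{tag}>"


def expand_balanced_tags(text: str) -> str:
    """Expand innermost tag{...} forms repeatedly so nested forms resolve outward."""
    while True:
        expanded = BALANCED.sub(_render, text)
        if expanded == text:
            return expanded
        text = expanded


if __name__ == "__main__":
    print(expand_balanced_tags("strong{text with i{italic} bits}"))
    print(expand_balanced_tags("i.green{foo}"))
```

Working from the inside out avoids the need for a full parser, at the cost of treating every word followed by a brace pair as a tag name.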
Lists
A paragraph indented from the left margin (by either spaces or tabs, or both),
and starting with a * , - , + or
o character followed by whitespace, will be converted into a list
item (<li> tag).
The same goes for indented paragraphs that start with the string
1. , followed by whitespace. However the default list tag in this
case will be an <ol>...</ol> list. Any positive integer
followed immediately by a full stop and a space will do the trick. (BTW: I
used to use # to do this, but I preferred the WikiIdea; it
looks better.)
(Compatibility note: previous versions of EtText required that the
<ul> or <ol> tags be written manually. This is no
longer the case.)
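The list-item rules above amount to a small classification step on each indented paragraph. A minimal Python sketch follows; the name classify is hypothetical and the code only illustrates the rule as stated, not how Text::EtText implements it.

```python
import re

# Indented paragraph starting with *, -, + or o, then whitespace.
BULLET = re.compile(r"^[ \t]+[*\-+o][ \t]+(.*)", re.S)
# Indented paragraph starting with a positive integer, a full stop, then whitespace.
NUMBERED = re.compile(r"^[ \t]+[1-9][0-9]*\.[ \t]+(.*)", re.S)


def classify(paragraph: str):
    """Return (list_tag, item_text), or (None, paragraph) if it is not a list item."""
    m = BULLET.match(paragraph)
    if m:
        return "ul", m.group(1)
    m = NUMBERED.match(paragraph)
    if m:
        return "ol", m.group(1)
    return None, paragraph


if __name__ == "__main__":
    for sample in ["  * first", "  12. twelfth", "* not indented", "  only text"]:
        print(repr(sample), "->", classify(sample))
```

A caller would group consecutive items with the same tag into one <ul> or <ol> element.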
Some text editors (such as vim) will reformat list items automatically,
assuming that you want the text to line up with the start of the text, instead
of the bullet-point character, on the previous line, like so:
  - this is a list item. We should make sure that
    blah blah etc. etc.
WebMake supports this.
Indented paragraphs that start with term: tab rest of paragraph will be converted
into definition lists (this is another StolenFromWikiIdea). They
look like this:
Foo
    Blah blah blah etc.
Sidebars and Side Images
If you wish to display an image, or small sidebar, beside a paragraph of text,
use the <etleft> and <etright>
tags. These are rendered as a one-row, two-column
<table> wrapping the paragraph and the sidebar, as
follows:
<etleft><img src=bubba.png></etleft>This is the main
paragraph body. Foo bar baz blah blah blah etc.
Is displayed as:
[image: bubba.png] | This is the main paragraph body. Foo bar baz blah blah blah etc.
<etright><img src=bubba.png></etright>This is the
main paragraph body. Foo bar baz blah blah blah etc.
Is displayed as:
This is the main paragraph body. Foo bar baz blah blah blah etc. | [image: bubba.png]
When HTML and EtText Collide
HTML tags can be used freely throughout an EtText document. However, in some
situations, you may wish to preserve whitespace, avoid paragraph tags being
added, etc.; to use your own HTML without meddling from EtText, wrap it in an
<!--etsafe-->...<!--/etsafe-->
tag pair; this will protect it.
Note that text blocks wrapped in <pre>,
<listing> and <xmp> tags are
automatically protected in this way; the <!--etsafe-->
tag pair is not required.
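One common way to implement this kind of protection is to swap the protected spans for placeholders, run the conversion, and then put the original text back. The Python sketch below shows that approach; it is an assumption about technique, not a description of the converter's internals, and convert_with_protection and its convert argument are hypothetical names.

```python
import re

# Spans that must pass through untouched: etsafe comment pairs, and
# <pre>, <listing> and <xmp> blocks.
PROTECTED = re.compile(
    r"<!--etsafe-->.*?<!--/etsafe-->"
    r"|<(pre|listing|xmp)\b[^>]*>.*?</\1\s*>",
    re.S | re.I,
)


def convert_with_protection(text: str, convert) -> str:
    """Run convert() on text while keeping protected spans byte-for-byte intact."""
    stash = []

    def hide(match: re.Match) -> str:
        stash.append(match.group(0))
        # NUL-delimited index; assumed never to occur in real input.
        return f"\x00{len(stash) - 1}\x00"

    converted = convert(PROTECTED.sub(hide, text))
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], converted)


if __name__ == "__main__":
    sample = "hello <pre>keep  me</pre> world <!--etsafe-->as is<!--/etsafe-->"
    # str.upper stands in for a real conversion step.
    print(convert_with_protection(sample, str.upper))
```

In this sketch the etsafe comments themselves are kept in the output; since they are HTML comments they do not affect rendering.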
EtText adds two entities, &etsqi; and &etsqo;. These represent
[ and ] respectively, and are used to protect a square-bracketed
piece of text from being interpreted as a link label (see EtText Links
below).
EtText Links
As well as the standard <a href=url>...</a> link
specification used in HTML, EtText will automatically add links for URLs
and email addresses that occur in the text. In addition, EtText supports its
own link format, as follows.
The basic concept is of a word or "quoted set of words" followed by a link
label in [square brackets], like this: "this is a link"
[label].
The href used in the link is then defined at another point in the document, as
an indented line like this:
[label]: http://url...
Text and markup can be enclosed in the quotes; everything quoted will become
part of the link text. Single words or HTML tags do not need to be quoted, so
<img src="http://jmason.org/license_plate.jpg" width="10"
height="10"> [homepage] will work correctly.
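The label mechanism described above can be sketched in two passes: collect the indented [label]: url definitions, then rewrite each word, tag or "quoted phrase" that is followed by a known [label]. The Python below is an illustrative approximation only; resolve_links is a hypothetical name, and leaving unknown labels untouched is an assumption.

```python
import re

# An indented definition line: [label]: url
DEFINITION = re.compile(r"^[ \t]+\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$", re.M)
# "quoted words", a single HTML tag, or a single word, followed by [label].
REFERENCE = re.compile(r'(?:"([^"]+)"|(<[^>]+>|[^\s"\[\]]+))\s*\[([^\]]+)\]')


def resolve_links(text: str) -> str:
    """Replace labelled references with <a href> links using the definitions in text."""
    links = {label: url for label, url in DEFINITION.findall(text)}
    body = DEFINITION.sub("", text)  # definitions do not appear in the output

    def render(match: re.Match) -> str:
        quoted, bare, label = match.groups()
        if label not in links:
            return match.group(0)  # assumption: unknown labels are left as-is
        link_text = quoted if quoted is not None else bare
        return f'<a href="{links[label]}">{link_text}</a>'

    return REFERENCE.sub(render, body)


if __name__ == "__main__":
    sample = (
        'Visit "this is a link" [label] or <img src="x.png"> [label].\n'
        "\n"
        "  [label]: http://jmason.org/\n"
    )
    print(resolve_links(sample))
```

Because definitions are gathered before any rewriting, a definition may appear before or after the references that use it.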
Glossary Links
EtText also supports a concept called glossary links: if you define a
link, the name of that link will automatically become a link wherever it
appears enclosed in quotes. For example:
[Justin Mason]: http://jmason.org/
will mean that any occurrence of the name "Justin Mason", in quotes, in
any EtText content chunk or file in the site, becomes a link to that
address. These links are stored in the WebMake cache file.
Quoted bits of text that do not map to an entry in the glossary are not
converted to links (unless they're followed by a square-bracketed link-label
reference).
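A glossary lookup of this kind can be sketched as a dictionary from link names to URLs, applied to every quoted phrase. The Python below is a hypothetical illustration: apply_glossary is an invented name, the glossary is passed in directly rather than read from the WebMake cache file, and dropping the quotes from the link text is an assumption.

```python
import re

# A quoted phrase that is NOT followed by a [label] reference.
QUOTED = re.compile(r'"([^"]+)"(?!\s*\[)')


def apply_glossary(text: str, glossary: dict) -> str:
    """Turn quoted phrases that match a glossary entry into links; leave the rest."""

    def render(match: re.Match) -> str:
        name = match.group(1)
        url = glossary.get(name)
        if url is None:
            return match.group(0)  # not in the glossary: keep the quoted text
        return f'<a href="{url}">{name}</a>'

    return QUOTED.sub(render, text)


if __name__ == "__main__":
    glossary = {"Justin Mason": "http://jmason.org/"}
    print(apply_glossary('Written by "Justin Mason" and "someone else".', glossary))
```

Quoted text followed by a square-bracketed label is skipped here so that the explicit label form can handle it.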
URLs, such as http://webmake.taint.org/ , and email addresses, such as
jm@nospam-jmason.org, are automatically converted into links to that same
address.
Blocking EtText Link Interpretation
To block interpretation as a link, replace square brackets with the HTML
entities &etsqi; and &etsqo;, which map to [ and ]
respectively; replace quote characters, ", with two apostrophes,
''. If that doesn't do the trick, wrap the entire section of text
with the <!--etsafe-->...<!--/etsafe--> tags.
Similar Systems
EtText-like plain-text-to-markup conversion systems have a long history. The
first time I came across the concept was with Setext, which was
included with Tony Sanders' Plexus web server, back in September 1993.
Yes, 1993. Setext has been around for a while!
WikiWikiWeb is a more recent, well-established system which uses
a similar markup style.
Userland's Frontier includes a text-to-markup conversion
system as well.
Some well-known sites that use their own converters to convert
plain-text to markup include http://www.blogger.com/, http://slashdot.org/
(for comments) and http://www.advogato.org/.
Jorn Barger maintains an impressive summary of etext formats at his Robot
Wisdom site. Skip down to section 3, Internet etext
standards, for the directly-relevant stuff.
Zope and ZWiki use a format called StructuredText, which again comes from
WikiLand. There's some interesting work going on there with the STXDocument
object, which is a web-manageable object that contains information marked up
in the structured text format.