All-In-One Documentation Contents
Contents for the 'Introduction' section
The Blurb
WebMake is a simple content management system, based around a templating
system for HTML documents, with lots of built-in smarts about what a
"typical" informational website needs in the way of functionality: metadata,
sitemapping, navigational aids, and (of course) embedded perl code. ;)
-
Creates portable sites: It requires no dynamic scripting capabilities
on the server; WebMake sites can be deployed to a plain old FTP site
without any problems.
-
No need to edit lots of files: A multi-level website can be generated
entirely from one WebMake file containing content, links to content files,
perl code (if needed), and output instructions.
-
Useful for team work: Since the file-to-page mapping is no longer
required, WebMake allows the separation of responsibilities between the
content editors, the HTML page designers, and the site architect. Only the
site architect needs to edit the WebMake file itself, or know perl or
WebMake code. Standard file access permissions can be used to restrict
editing by role.
-
Efficient: WebMake supports dependency checking, so a one-line change
to one source file will not regenerate your entire site -- unless it's
supposed to. Only the files that refer to that chunk of content, however
indirectly, will be modified.
-
Supports content conversion, on the fly: Text can be edited as
standard HTML, converted from plain text (see below), or converted from
any other format by adding a conversion method to the
WebMake::FormatConvert module.
-
Edit text as text, not as HTML: One of the built-in content conversion
modules is Text::EtText, which provides an easy-to-edit,
easy-to-read and intuitive way to write HTML, based on the plain-text
markup conventions we've been using for years.
-
Rearrange your site in 30 seconds: Since URLs can be referred to
symbolically, pages can be moved around and URLs changed by changing just
one line. All references to that URL will then change automatically. This
is vaguely Xanalogical.
-
Scriptable: Content items and output URLs can be generated, altered,
or read in dynamically using perl code. Perl code can even be used to
generate other perl code to generate content/output URLs/etc.,
recursively. New tags can be defined and interpreted in perl.
-
Extensible: New tags (for use in content items or in the WebMake file
itself) can be added from perl code, providing what amounts to a
dynamically-loaded plugin API.
-
Inclusion of text: Content can incorporate other content items, simply
by referring to them by name. This is a form of Xanadu-style transclusion.
WebMake as a CMS
WebMake is, arguably, a Content Management System, or CMS.
To be more specific, it's oriented entirely towards generating a relatively
static site, such as a weblog, a news site (without comments or
personalisation) or a typical informational site.
It does not have any dynamic, database-driven features suitable for "live"
sites that update frequently with dynamic data; nor does it have support for
"personalisation" features, where the site displays different data based on
what the user presents in their HTTP request. (Of course, using WebMake does
not preclude using PHP, mod_perl, Mason etc. to provide these.)
In addition, it does not yet have a graphical interface for defining and
editing content, although this is planned.
Here are the relevant details of what it can do.
WebMake's CMS Features
-
Separation between content and layout
Since, logically, content and layout are entirely separate tasks, they
should be easy to keep separate in the CMS.
WebMake uses content references to include content into pages, and
implement templating. This allows you to separate the content text from
the template layout HTML; the template designers just need to include
a content reference, such as ${body} , instead of the
text.
-
No requirement for text editors to know HTML
Only the layout staff should really need to know HTML, so the staff who
provide text content can do this without HTML knowledge.
WebMake provides Text::EtText, which provides an
easy-to-edit, easy-to-read and intuitive way to write HTML, based on the
plain-text markup conventions we've been using for years.
-
Generation of pages automatically, using metadata from content items
It should be possible to generate index pages, sitemaps, navigation links,
and other text automatically, based on properties and metadata of the
pieces of content loaded.
WebMake supports this by allowing any content item to carry arbitrary
metadata text. Perl code can then be used to dynamically
request a list of content items that have a particular set of metadata,
and any page can refer to another content item's title, description,
abstract etc. without itself needing to parse the content text.
-
Flexible URL support
It should be trivial to rearrange a site, if required, totally changing
the URLs used in the site's pages.
WebMake supports this by using symbolic URL references,
which can be modified by changing one line, causing references to that
URL throughout the site to change.
What WebMake Is Missing
-
Edit-In-Page Functionality
Most CMSes boast a nice, browser-based user interface for creating, naming,
uploading and filling out content items and media.
WebMake currently leaves you with your trusty text editor, to edit the
.wmk file and add the tags needed to define these. This is fine if you're
of the UNIX mindset -- but for general-purpose use, it will need a more
friendly interface.
-
Database Support
It would be nice if WebMake could load content from a database. It
currently cannot, although there's nothing in the architecture that would
preclude this; there just has not been a need yet.
Unfortunately, this may not be possible -- an IBM software patent details
a mechanism whereby a server can dynamically rebuild its
pages, based on changes to objects in a database. WebMake could run
afoul of this if database support is added (although there are a few
points where this could be avoided).
-
XML Support
This will definitely arrive -- as soon as a good XSLT engine becomes part
of Perl, or at least becomes easy to install from CPAN. It's on my list ;)
-
Workflow
There's currently no logic to support workflow. This would not
be difficult to add to the graphical interface discussed above.
WebMake Operation
When you run WebMake, it'll first search for a file ending with .wmk in
the current directory, then in the parent directory, and so on 'til it hits
the root directory.
Alternatively if you use the -R switch, it'll search relative to the
filename specified on the command-line; this is very handy if you're
calling WebMake from a macro in your editor or IDE, as it means you don't even
have to be running the editor in the same working directory as the files
you're working on.
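Here's a minimal sketch of that upward search, written in Python purely for illustration (WebMake itself is Perl). The function name find_wmk is made up for this example, and treating a file argument as "start in that file's directory" is only my reading of how -R behaves, not a description of the real code.

```python
from pathlib import Path
from typing import Optional


def find_wmk(start: Path) -> Optional[Path]:
    """Return the first *.wmk file found in start or any of its ancestors."""
    start = start.resolve()
    # Assumption: when given a file (as with -R), begin in its directory.
    if start.is_file():
        start = start.parent
    # Walk from the starting directory up to the root directory.
    for directory in [start, *start.parents]:
        matches = sorted(directory.glob("*.wmk"))
        if matches:
            return matches[0]
    return None
```

Calling find_wmk(Path.cwd()) mimics the default behaviour; passing the path of the file being edited mimics the -R case.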
Anyway, once it finds the WebMake file, it reads the file and parses it.
Contents and media statements will cause it to search directories or
other data sources for content (directly includable files and text blocks) or
media (static content that can be linked to or otherwise referred to with a
href). Include statements will cause it to directly include a block of
code from another file.
Finally, once the WebMake file has been parsed, the list of out
files will have been read. Each of these is roughly equivalent to
a target in traditional UNIX make(1) terminology.
If a target has been specified on the command line, that file will be made;
otherwise, all the out files named in the WebMake file will be made.
Dependencies And Other Optimisations
"Making" the target is not the end of it -- strictly speaking, the target
may or may not be updated. WebMake tracks the dependencies of each file, and
if these have not changed, the file will not be rebuilt.
That's the first optimisation. However it doesn't always work; if some of the
file's text is generated by, or depends on text that contains dynamic Perl
code, WebMake will always have to rebuild the file.
To avoid continually "churning" the file, regenerating it every time WebMake
is run, a comparison step takes place. Before the file is written to disk,
WebMake compares the file in memory with the file on disk; if there are no
changes, the on-disk file will not be modified in any way. This means tools
like rsync(1), rdist(1) or even make(1) itself will work fine with
a WebMake site.
All of these optimisations can be overridden by using the -F (freshen)
command-line switch; this will force output whether or not the files have
changed.
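The comparison step can be sketched like this. It is an illustration in Python under my own assumptions, not WebMake's actual Perl code: write_if_changed and its force parameter (standing in for -F) are hypothetical names.

```python
from pathlib import Path


def write_if_changed(path: Path, new_text: str, force: bool = False) -> bool:
    """Write new_text to path unless the file already holds exactly that text.

    Returns True when the file was written, False when it was left untouched.
    """
    if not force and path.exists() and path.read_text(encoding="utf-8") == new_text:
        # Identical output: leave the file, and its modification time, alone.
        return False
    path.write_text(new_text, encoding="utf-8")
    return True
```

Because an unchanged file keeps its old modification time, timestamp-based tools see nothing new to copy or rebuild.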
Ensuring A Seamless Transition
A very large (or very complicated) WebMake site can take a while to update.
To avoid broken links while updating the site, WebMake generates all output
into temporary files called filename.new; once all the output
has been generated, these are renamed into place. This minimises the
time during which there may be inconsistencies in the site.
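As a rough sketch of the idea (Python for illustration only; publish is a made-up name and the real implementation may differ in its details), all pages are staged first and only then swapped in:

```python
import os
from pathlib import Path
from typing import Dict


def publish(outputs: Dict[Path, str]) -> None:
    """Generate every page into <name>.new, then rename them all into place."""
    staged = []
    for target, text in outputs.items():
        tmp = target.with_name(target.name + ".new")
        tmp.write_text(text, encoding="utf-8")
        staged.append((tmp, target))
    # Only once every page has been generated are the files swapped in,
    # so the window of possible inconsistency is just the rename loop.
    for tmp, target in staged:
        os.replace(tmp, target)
```

If generation fails part-way through, the live files have not been touched yet; only stray .new files are left behind.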
Caching
Since WebMake uses dependencies to avoid rebuilding the entire site
every time, it needs to cache metadata and dependency information
somewhere.
Currently this data is stored in a file called filename/cache.db,
where filename is a sanitised version of the WebMake file's name, in the
.webmake subdirectory of your home directory.
Invoking WebMake
WebMake can be run using the command-line tool webmake, or by
using the Perl module HTML::WebMake::Main.
In addition, the EtText format can be used via its command-line tools,
or by using the Perl modules directly.
The command-line tools' POD documentation:
And the POD documentation for the Perl module:
How to Use WebMake
Chances are, you already have an HTML site you wish to migrate to WebMake.
This document introduces WebMake's way of doing things, and how to go
about a typical migration.
Place The .wmk File
First, pick a top-level directory for the site; that's where you'll place your
.wmk file. All the generated files should be beneath this directory. In
this example I'll call it index.wmk.
Make Templates
Next, identify the page templates used in the site. To keep it simple, let's
imagine you have only one look and feel on the pages, with the usual stuff in
it; high-level HTML document tags, such as <html>, <head>,
<title>, <body>, that kind of stuff. There may also be some
formatting, such as a <table> with a side column containing links, etc.,
or a top-of-page title. All of these are good candidates for moving into a
template. I typically call these templates something obvious like
page_template or sitename_template, where sitename is the name of
the site.
For this example, let's imagine you have the HTML high-level tags and a page
title as your typical template items.
So edit the index.wmk file, and add a template content item, by cutting
and pasting it from one of your pages. Instead of cutting and pasting the
real title, use a metadata reference:
$[this.title]. Also, replace the text of the page
with ${page_text}; the plan is that, by the time the template
is referenced, the page_text content item will have been set to the text you
wish to use.
<webmake>
<content name=page_template>
<html><head><title>$[this.title]</title></head>
<body bgcolor=#ffffff><h1>$[this.title]</h1>
<hr>
${page_text}
<hr>
</body></html>
</content>
Grab The Pages' Text
Next, run through the pages you wish to WebMake-ify, and either:
-
move them into a "raw" subdirectory, from where WebMake can read them
with a <contents> tag, or;
-
include them into the index.wmk file directly.
It's a matter of taste; I initially preferred the first approach, but nowadays
the second seems more convenient for editing, as it provides a very easy way
to break up long pages, and it makes search-and-replace easy. Anyway, it's up
to you. I'll illustrate using the second approach in this example.
Give each content item a name. I generally use the name of the HTML file, but
with a .txt extension instead of .html to mentally differentiate the
input from the output.
Strip the template elements (head tag, surrounding eye-candy tables, etc.)
from each page, leaving just the main text body behind. Keep the titles
around for later, though.
<content name="document1.txt">
....your html here...
</content>
<content name="document2.txt">
....your html here...
</content>
<content name="document3.txt">
....your html here...
</content>
Convert To EtText (OPTIONAL!)
Now, one of the best bits of WebMake (in my opinion) is EtText,
the built-in simple text markup language; to use this, run the command-line
tool ethtml2text on each of your HTML files to convert them
to EtText, then include that text, instead of the HTML, as the content items.
Don't forget to add format="text/et" to the content tag's attributes,
though:
<content name="document1.txt" format="text/et">
....your ettext here...
</content>
...
To keep things simple, I'll assume you haven't used EtText in the examples
from now on.
Add Titles
Next, you need to set the titles in the content items, so that they can be
used in higher-level templates, such as the page_template content item we
defined earlier.
To really get some power from WebMake, use metadata to do this.
What is Metadata?
A metadatum is like a normal content item, except it is exposed to other
pages in the index.wmk file. Normally, you cannot reliably read a dynamic
content item that was set from another page; if one content item sets a
variable like this:
<{set foo="Value!"}>
Any content items evaluated after that variable is set can access
${foo}, as long as they occur on the same output page.
However if they occur on another output page, they may not be able to access
${foo}.
To get around this, WebMake includes the <wmmeta> tag,
which allows you to attach data to a content item. This data will then be
accessible, both to other pages in the site (as
$[contentname.metaname]), and to other content
items within the same page (as $[this.metaname]).
Think of them as being like the size, modification time, owner etc. of a file;
or member variables in an object-oriented language.
Anyway, titles of pages are a perfect fit for metadata. So convert your
page titles into <wmmeta> tags like so:
<content name="document1.txt">
<wmmeta name="title">Your Title Here</wmmeta>
....your html here...
</content>
...
Sometimes, for example if you plan to generate index pages or a sitemap later
on, you may wish to add a one-line summary of the content item as a metadatum
called abstract. I'll leave it out of the examples, just to keep them
simple.
Metadata should always be referred to in $[square
brackets]. I'll explain why in the next section.
Naming The Output URLs
Finally, you've assembled all the content items; now to tell WebMake
where they should go. This is accomplished using the <out> tag.
Each output URL, in this example, requires the following content items:
As you can see, both title and page_text rely on which output URL is
being written, otherwise you'll wind up with lots of finished pages containing
the same text. ;)
There are several ways to deal with this.
-
Set a variable in the <out> text, using <{set}>, to the name
of the content item that should be used for the page_text.
-
Derive the correct value for page_text using the name of the
<out> section itself.
The easiest way is the latter. WebMake defines a built-in "magic" variable,
${WebMake.OutName}, which contains the name
of the output URL. (Note that output URLs have both a name and a
filename; you'll see why in the next section.)
To do this, define another content item:
<content name=out_helper>
<{set page_text="${${WebMake.OutName}.txt}" }>
${page_template}
</content>
As you can see, this takes the name of the output URL, appends .txt to it,
and sets a variable called page_text to contain the content item named
thereby.
BTW: you could simply skip defining this "helper" content item
altogether, and just go to the top of the file and change the template
to refer directly to ${${WebMake.OutName}.txt}
instead of ${page_text} . That's what I usually do.
But what about the title? Handily, since we defined the titles as metadata,
and referred to them as $[this.title] in page_template,
this is taken care of; once the ${page_text} reference is
expanded, $[this.title] will be set.
What's With the Square Brackets?
Remember I mentioned that metadata should always be referred to in
$[square brackets]? Here's why. Square bracket
references, or deferred references, are
evaluated only after normal, "squiggly bracket" content references.
The example page contains the following content references:
Since ${page_text} is a normal content reference, it will be
expanded first; and when it's expanded, the <wmmeta> tag setting
title will be encountered. This will cause this.title to be set.
Once all the normal content references are expanded, WebMake runs through
the deferred references, causing $[this.title] to
be expanded.
If page_template had used a normal content reference to refer to
${this.title}, WebMake would have tried to expand it before
${page_text}, since it appeared in the file earlier; at that point the
<wmmeta> tag would not yet have been seen, so the title would not have
been set.
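To make the ordering concrete, here's a toy two-pass expander. It's a sketch in Python under simplifying assumptions, not WebMake's real (Perl) parser: render is a hypothetical name, content items live in a plain dict (including WebMake.OutName, which is really a built-in), and only $[this.metaname] references are handled.

```python
import re
from typing import Dict

CURLY = re.compile(r"\$\{([^${}]+)\}")            # innermost ${name}
DEFERRED = re.compile(r"\$\[this\.([^\]]+)\]")    # $[this.metaname]
WMMETA = re.compile(r'<wmmeta name="([^"]+)">(.*?)</wmmeta>', re.S)


def render(text: str, contents: Dict[str, str]) -> str:
    meta: Dict[str, str] = {}

    def take_meta(match: "re.Match[str]") -> str:
        # Record the metadatum and remove the tag from the output text.
        meta[match.group(1)] = match.group(2)
        return ""

    # Pass 1: expand normal ${...} references, innermost first,
    # collecting any <wmmeta> tags the expansions bring in.
    while True:
        text = WMMETA.sub(take_meta, text)
        text, count = CURLY.subn(lambda m: contents[m.group(1)], text)
        if count == 0:
            break

    # Pass 2: only now expand the deferred $[this.*] references.
    return DEFERRED.sub(lambda m: meta[m.group(1)], text)
```

With a template of `<title>$[this.title]</title>${page_text}`, the title is filled in correctly even though the reference to it appears before the text that defines it.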
Anyway, I digress.
Writing The <out> Tags
Each output URL needs an <out> tag, with a name and a file. The
name provides a symbolic name which one can use to refer to the URL; the
file names the file that the output should be written to.
Typically the name should be similar to the page's main content item's name,
to keep things simple and allow the shortcut detailed in the previous section
to work.
Also, sites typically use a pretty similar filename to the name, for obvious
reasons. At least, they do, to start with; further down the line, you may
need to move one (or more) pages around in the URL or directory hierarchy;
since you've been referring to them by name, instead of by URL or by filename,
this means changing only one attribute in the <out> tag, instead of
trying to do a global search and replace throughout hundreds of HTML files.
Anyway, here's a sample <out> tag:
<out name="document1" file="document1.html"> ${out_helper} </out>
But what about multiple outputs? Two choices:
-
Simply list all the output HTML files, one after the other.
Works fine for small sites, and it's simple.
-
Use a <for> tag.
I don't think you need to see how 1. works, it's pretty obvious.
Let's see how 2. does it:
<for name="page" values="document1 document2 document3">
<out name="${page}" file="${page}.html"> ${out_helper} </out>
</for>
Simple.
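If it helps to see what the loop amounts to, here's a tiny Python sketch of the expansion (illustration only; expand_for is a made-up helper, and the real tag processes its body as WebMake markup rather than doing plain string replacement):

```python
from typing import List


def expand_for(name: str, values: str, body: str) -> List[str]:
    """Return one copy of body per space-separated value, with ${name} filled in."""
    return [body.replace("${" + name + "}", value) for value in values.split()]
```

Applied to the example above, this yields three <out> tags, for document1, document2 and document3, each still referring to ${out_helper}.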
Putting <out> Names To Work
So you've named the output URLs. However all your content items contain
static URLs in the HREFs! Let's fix that.
This really is up to you; it's a global search-and-replace. Let's say
you want to fix all links to "document1.html". Replace this:
<a href="document1.html">foo</a>
with a URL reference, like this:
<a href="$(document1)">foo</a>
Now, even if "document1.html" is renamed to "blah/whatever/doc1.cgi", you
won't have to do a search-and-replace again.
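The lookup behind this can be pictured as a simple name-to-file table. The following Python sketch is only an illustration under that assumption: resolve_url_refs and the out_files dict are hypothetical, and any adjustment of relative paths that a real implementation might do is ignored.

```python
import re
from typing import Dict

URL_REF = re.compile(r"\$\(([^()]+)\)")   # matches $(name)


def resolve_url_refs(text: str, out_files: Dict[str, str]) -> str:
    """Replace each $(name) with the file registered under that name."""
    return URL_REF.sub(lambda m: out_files[m.group(1)], text)
```

Changing the one entry for document1 in the table changes every link that refers to it.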
Getting Advanced - Adding Navigation and a Sitemap
This hasn't been written yet. Sorry! (TODO)
Tips On Using WebMake
Editor/IDE Support
The root directory of the WebMake distribution includes a Vim rc file
to support syntax-highlighting for WebMake. To use it, make a directory
called .vim in your home directory, copy it there, and add the following
lines to your .vimrc:
au BufNewFile,BufReadPost *.wmk so $HOME/.vim/webmake.vim
map ,wm :w!<CR>:! /usr/local/bin/webmake -R %<CR>
Change /usr/local/bin/webmake to whatever the real path to the webmake
command is.
Once you do this, the macro sequence ,wm will cause a rebuild of the site
which contains the file you're currently editing. In addition, opening a
file called something.wmk will automatically use WebMake syntax
highlighting (if you have syntax highlighting enabled in Vim).
The Button
WebMake now includes a WebMake button:
Feel free to include it on your pages; but please, if possible, add it with a
href to http://webmake.taint.org/, so people who are curious can find out more
about WebMake.
It's 88 pixels wide and 31 high, by the way. If you look in the "images"
directory of the distribution, there's also a 130x45 one and a 173x60 one.
To make things really easy, here's some cut-and-paste HTML
for the image:
<a href="http://webmake.taint.org/"><img
src="http://webmake.taint.org/BuiltWithWebMake.png"
width="88" height="31" border="0" /></a>
Contributors to WebMake
Here's a list of people who've contributed to WebMake:
-
Justin Mason <jm /at/ jmason.org>: original author and maintainer
-
Mark McLoughlin <mark /at/ skynet.ie>: added perlout directive,
fixes to HTML cleaner
-
Caolan McNamara <caolan /at/ csn.ul.ie>: EtText contributions; lists,
pre-formatted text, lots of suggestions
-
Jan Hudec <bulb /at/ ucw.cz>: navtree plugin, patches to remove
metadata from site mapping and control mapping of media items
-
Matthew Clarke <clamat /at/ van.maves.ca>: doco fix for datasource
documentation
-
rudif /at/ bluemail.ch: lots of help with supporting Windows
Thanks all! Patches and suggestions are welcomed -- send them in!
(By the way, patch contributors get listed at the top, 'cos patches save
me writing the code ;)
Contents for the 'Tags and Their Attributes' section
The <webmake> Tag
The <webmake> section is required in a WebMake file. Any text
before or after this section will be ignored.
In the current implementation, you can leave these tags
out, but it isn't advised; their requirement may be enforced later.
Example
<webmake>
[...WebMake file omitted...]
</webmake>
The <include> Tag
Arbitrary files can be included into the current WebMake file
using this tag. It has one attribute, file, which
names the file to include.
A set of libraries is distributed with WebMake and available to
include. See the Included Library Code section of the index
page for their documentation. However, these
should be loaded using the <use> tag instead of this
one.
Example
<include file="inc/footer.wmk" />
The <use> Tag
WebMake supports "plugin" libraries, which are generally other .wmk files or
Perl modules which can be loaded to extend WebMake's functionality.
For example, there are standard plugins to provide support for "download"
links, which allow links to files to include their size, ownership
information, etc.; there's also a plugin which allows HTML tables to be
defined using a comma-separated value list.
It has one attribute, plugin, which names the plugin to load.
Plugins can be loaded from the WebMake perl library directory, or from the
user's home directory. The search path for a plugin is as follows:
The set of standard plugins is listed in the Included Library Code
section of the index page.
Example
<use plugin="safe_tag" />
The <content> Tag
The <content> tag has one required attribute: its name, which is used to
substitute in that section's text, by inserting it in other sections or out
tags in a curly-bracket reference, like so:
${foo}
The following attributes are supported. These can also be set using the
<attrdefault> tag.
-
format
-
This allows the user to define what format the content
is in. This allows markup languages other than HTML to be used;
webmake will convert to HTML format, or other output formats, as
required using the HTML::WebMake::FormatConvert module. The default
value is "text/html".
-
asis
-
This will block any interpretation of content or URL
references in the content item, until after it has been converted into
HTML format. This is useful for POD documentation, which may be
embedded inside a file containing other text; without "asis", the
text would be scanned for content references before the POD converter
stripped out the extraneous bits. The default value is "false".
-
map
-
Whether the content item should be mapped in a site
map, or not. The default value is "true".
-
up
-
The name of the content item which is this content item's
parent, in the site map.
-
isroot
-
Whether or not this content item is the root of the
site map. The default value is "false".
If you wish to define a number of content sections at once, they can be
searched for and loaded en masse using the <contents> tag.
Every content item can have metadata associated with it. See the
metadata documentation for details.
Defining Content Items On-The-Fly
The <{set}> processing instruction can be used to define small
pieces of content on the fly, from within other content or <out>
sections.
In addition, Perl code can create content items using the set_content()
function.
Using Content From Perl Code
Perl code can obtain the text of content items using the get_content()
function, and can treat content items as whitespace-separated lists using
get_list().
In addition, each content item has a range of properties and associated
metadata; the get_content_object() method allows Perl code to retrieve
an object of type HTML::WebMake::Content representing the content
item.
Example
<content name="foo" format="text/html">
<em>This is a test.</em>
</content>
<content name="bar" format="text/et">
Still Testing
-------------
So is this!
</content>
The <contenttable> Tag
Quite often, it's handy to define small (one-line) content items quickly, in
bulk, directly inside the WMK file itself. The <contenttable> tag
provides a good way to do this.
Firstly, pick a delimiter character, such as | . Set the delimiter
attribute to this character.
Next, list a table of content names and their values, separated by the
delimiter character, one name-value pair per line.
Note: if you would prefer to load the content items from a separate
file, the <contents> tag is better suited.
Another note: this is not the way to define data about other content
items (in other words, metadata), such as titles, authorship, or brief
descriptions, as WebMake's built-in metadata support will not be available
in that case. Embedding the metadata into the content item using
<wmmeta> tags, or loading them in bulk using <metatable> tags,
should be used instead.
Example
<contenttable delimiter="|">
person1|Justin
person2|Catherine
person3|The cat
</contenttable>
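In effect, each line of the table becomes one content item. Here's a small Python sketch of that mapping (illustration only; parse_contenttable is a hypothetical name, and trimming whitespace and skipping lines without a delimiter are my assumptions rather than documented behaviour):

```python
from typing import Dict


def parse_contenttable(body: str, delimiter: str = "|") -> Dict[str, str]:
    """Turn 'name<delimiter>value' lines into a name -> value mapping."""
    items: Dict[str, str] = {}
    for line in body.splitlines():
        name, sep, value = line.strip().partition(delimiter)
        if not sep:
            continue  # blank line, or no delimiter present
        items[name.strip()] = value.strip()
    return items
```

For the example above this gives three items, so that ${person3} would expand to "The cat".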
The <contents> Tag
Content can be searched for using the <contents> tag, which allows you
to search a data source (directory, delimiter-separated-values file,
database etc.) for a pattern.
The attributes supported are listed on the data source page.
Apart from the fact that it loads many contents instead of one, it's otherwise
identical to the content tag; see that tag's documentation
for details on what attributes are supported.
Example
<contents src="file:raw/text" name=".../*.txt" format="text/et" />
<contents src="file:raw/html" name=".../*.html" format="text/html" />
The <media> Tag
WebMake allows you to refer to files and web pages symbolically, separating
the site layout from the URL structure, and avoiding later problems with
dangling links when a page's URL is changed. This is done using $(url_refs).
This works well for content items defined in WebMake, such as output files
defined using the <out> tag. However it is not handy
when dealing with images or other files that are not
generated using WebMake.
Therefore media files, such as images, and external, non-WebMake-controlled
files, can be searched for using the <media> tag. This tag allows you to
search a data source (directory, etc.) for a pattern.
The attributes supported are listed on the data source page.
Note that data sources which do not map to files in a filesystem, or other
methods accessible to a web browser browsing your site, do not make sense for
the <media> tag; so, for example, the svfile: protocol is not
supported, as a web browser cannot load an image from a CSV file.
As a result, currently only one data source protocol can be used with the
<media> tag, namely file:.
Example
<media src="file:images" name=".../*.gif" />
<media src="file:images" name=".../*.jpg" />
Data Sources for the <contents> and <media> Tags
Contents or URLs can be searched for using the <contents>
or <media> tags, which allow you to search a data source
(directory, delimiter-separated-values file, database etc.) for a pattern.
Currently two data source protocols are defined, file: and svfile: .
More will probably follow, especially if other people contribute them, hint
hint ;)
file: is the default protocol, if none is specified.
Attributes Supported By Datasource Tags
-
src
-
All datasources require this attribute, which
specifies a protocol and path, in a URL-style syntax:
protocol:path
-
name
-
This attribute is used to specify the pattern of data,
under this path, which will be converted into content items. The part
of the data's location which matches this name pattern will become the
name of the item. Typically, glob patterns, such as "*.txt" or
".../*.html" are used.
-
prefix
-
The items' names can be further modified by specifying
a prefix and/or suffix; these strings are prepended or
appended to the raw name to make the name the content is given.
-
suffix
-
See above.
-
namesubst
-
A Perl-formatted s// substitution, which is used
to convert source filenames to content names.
-
nametr
-
A Perl tr// translation, which is used to convert
source filenames to content names.
-
listname
-
The name of a content item. This content item will be
created, and will contain the names of all content items picked up by
the <contents> or <media> search.
In addition, the attributes supported by the content tag can
be specified as attributes to <contents>, including
format, up, map, etc.
The content blocks picked up from a <contents> search can
also contain meta-data, such as headlines, visibility dates, workflow approval
statuses, etc. by including metadata.
The file: Protocol
The file: protocol loads content from a directory; each file is made into one
content chunk. The src attribute indicates the source directory, the
name attribute indicates the glob pattern that will pick up the
content items in question. The filename of the file will be used as the
content chunk's name.
<contents src="stories" name="*.txt" />
Note that the files in question are not actually opened until their content
chunks are referenced using ${name} or get_content("name").
Normally only the top level of files inside the src directory are added to
the content set. However, if the name pattern starts with .../,
the directory will be searched recursively:
<contents src="stories" name=".../*.txt" />
The resulting content items' names will contain the full path from that
directory down; e.g. if the file stories/dir1/foo/bar.txt exists, the example
above would define a content item called
${dir1/foo/bar.txt}.
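The naming rule can be sketched as follows. This is Python for illustration only, with a made-up function name (content_names); I'm assuming that a recursive search also picks up matching files at the top level, which the text above does not spell out.

```python
from pathlib import Path
from typing import List


def content_names(src: Path, pattern: str) -> List[str]:
    """List the content names a file: search over src would define."""
    if pattern.startswith(".../"):
        # Leading ".../" means: search the directory recursively.
        files = src.rglob(pattern[len(".../"):])
    else:
        files = src.glob(pattern)
    # Names are paths relative to src, written with forward slashes.
    return sorted(f.relative_to(src).as_posix() for f in files if f.is_file())
```

With stories/dir1/foo/bar.txt on disk, the pattern ".../*.txt" produces the name dir1/foo/bar.txt, while "*.txt" does not see that file at all.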
The svfile: Protocol
The svfile: protocol loads content from a delimiter-separated-values file; the
src attribute is the name of the file, and the name attribute is the glob
pattern used to catch the relevant content items. The namefield
attribute specifies the field number (counting from 1) which the name
pattern is matched against, and the valuefield attribute specifies the number
of the field from which the content chunk is read. The delimiter
attribute specifies the delimiter used to separate values in the file.
<contents src="svfile:stories.csv" name="*"
namefield=1 valuefield=2 delimiter="," />
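Here's how those attributes interact, as a Python sketch for illustration. load_svfile is a hypothetical name, the parameters simply mirror the attributes above, and I'm assuming a plain split on the delimiter with no quoting rules, since the text doesn't say how quoting is handled.

```python
from fnmatch import fnmatchcase
from typing import Dict


def load_svfile(text: str, name: str = "*", namefield: int = 1,
                valuefield: int = 2, delimiter: str = ",") -> Dict[str, str]:
    """Pick content items out of delimiter-separated text."""
    items: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split(delimiter)
        if len(fields) < max(namefield, valuefield):
            continue  # line too short to hold both fields
        key = fields[namefield - 1]        # fields are counted from 1
        if fnmatchcase(key, name):         # glob match on the name field
            items[key] = fields[valuefield - 1]
    return items
```

A name pattern of "*" keeps every row; something like "story*" keeps only the rows whose name field starts with "story".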
Adding New Protocols
New data sources for <contents> and <media> tags are added by
writing an implementation of the DataSourceBase.pm module, in the
HTML::WebMake::DataSources package space (the
lib/HTML/WebMake/DataSources directory of the distribution).
Every data source needs a protocol, an alphanumeric lowercase identifier
to use at the start of the src attribute to indicate that a data source is
of that type.
Each implementation of this module should implement these methods:
-
new ($parent)
-
instantiate the object, as usual.
-
add ()
-
add all the items in that data source as content
chunks. (See below!)
-
get_location_url ($location)
-
get the location (in URL
format) of a content chunk loaded by add() .
-
get_location_contents ($location)
-
get the contents of the
location. The location, again, is the string provided by add() .
-
get_location_mod_time ($location)
-
get the current modification
date of a location for dependency checking. The location, again, is
in the format of the string provided by add() .
Notes:
-
If you want add() to read the content immediately, call
$self->{parent}->add_text ($name, $text, $self->{src},
$modtime) .
-
add() does not have to open and read content chunks straight away.
If it calls $self->{parent}->add_location ($name, $location,
$lastmod) , providing a location string which starts with the data
source's protocol identifier, the content will not be loaded until
it is needed, at which point get_location_contents() is called.
-
This location string should contain all the information needed to
access that content chunk later, even if add() has not been
called. Consider it as similar to a URL. This is required so that
get_location_mod_time() (see below) can work.
-
All implementations of add() should call $fixed =
$self->{parent}->fixname ($name); to modify the name of each
content chunk appropriately, followed by
$self->{parent}->add_file_to_list ($fixed); to add the content
chunk's name to the filelist content item.
-
Data sources that support the <media> tag need to implement
get_location_url , otherwise an error message will be output.
-
Data sources that support the <contents> tag, and defer
reading the content until it's required, need to implement
get_location_contents , which is used to provide content from a
location set using $self->{parent}->add_location() .
-
Data sources that support the <contents> tag need to implement
get_location_mod_time . This is used to support dependency
checking, and should return the modification time (in UNIX
time_t format) of that location. Note that since this is used
to compare the modification time of a content chunk from the
previous time webmake was run, and the current modification time,
this is called before the real data source is opened.
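The deferred-loading contract described in the notes can be modelled in a few lines of Python. This is a sketch of the protocol only, not a real WebMake data source (those are Perl modules): ParentStub, MemDataSource and the "mem" protocol are all hypothetical, and the stub's fixname() simply returns the name unchanged.

```python
class ParentStub:
    """Stand-in for the parent object handed to new(); hypothetical."""
    def __init__(self):
        self.locations = {}
        self.filelist = []

    def fixname(self, name):
        return name                      # the real fixname() may alter the name

    def add_location(self, name, location, lastmod):
        self.locations[name] = (location, lastmod)

    def add_file_to_list(self, name):
        self.filelist.append(name)


class MemDataSource:
    """Serves content from a dict of {key: (text, mtime)}."""
    PROTOCOL = "mem"

    def __init__(self, parent, store):
        self.parent = parent
        self.store = store

    def add(self):
        # Register every chunk, but defer reading its text.
        for key, (_text, mtime) in self.store.items():
            fixed = self.parent.fixname(key)
            self.parent.add_location(fixed, "%s:%s" % (self.PROTOCOL, key), mtime)
            self.parent.add_file_to_list(fixed)

    def _key(self, location):
        # The location string carries everything needed to find the chunk.
        return location.split(":", 1)[1]

    def get_location_contents(self, location):
        return self.store[self._key(location)][0]

    def get_location_mod_time(self, location):
        return self.store[self._key(location)][1]
```

Because the location string is self-describing, get_location_mod_time() works on a fresh object whose add() has never been called, which is what dependency checking relies on.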
The <for> Tag
The <for> tag provides a quick way to iterate through a list of items.
It requires two attributes, name and values; the content item named
name is set to each space-separated value in the values string, and
the text inside the block is processed.
Supported Attributes
-
name
-
The name of the variable which will be set to each
value in the values list in turn (if you know your comp-sci
lingo, the iterator).
-
values
-
A space-separated list of values which is iterated
through.
-
namesubst
-
A Perl s/// substitution; each value in the values
list will be processed by this, if set.
Variable references to ${name} are processed immediately, so
you can use this variable inside another variable reference, like this:
${all_${name}_text} .
Example
Here's an example, taken from my own home site:
<!-- Create output for files in top dir -->
<for name="out" values="index contact work nonwork home">
<out file="${out}.html" name="${out}">
${jmason_template}
</out>
</for>
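In effect, the loop above stamps out one copy of the block per value. A minimal Python model of that expansion follows; expand_for is a hypothetical name, and the namesubst attribute is not modelled.

```python
def expand_for(name, values, body):
    """Return one copy of body per space-separated value, with ${name} filled in."""
    ref = "${" + name + "}"
    return [body.replace(ref, value) for value in values.split()]

blocks = expand_for("out", "index contact",
                    '<out file="${out}.html" name="${out}">')
# blocks[0] is '<out file="index.html" name="index">'
```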
The <out> Tag
The <out> tag is used to generate output. Surprise!
It has one required attribute -- file, which defines the output file
generated by this section. In addition it has some optional attributes, as
follows:
-
name
-
which is used to substitute in that section's URL address, by
inserting it in other sections or out tags in a URL reference, like
so: $(out_foo) .
More optional attributes are as follows. These ones also pick up defaults
from the <attrdefault> tag.
-
format
-
which defines the format the output is expected in
(MIME-style). The default is text/html.
-
clean
-
specifies which features of the HTML cleaner
to use. The HTML cleaner is a powerful filter which can polish grotty,
messy HTML into fully-standards-compliant glory. The default value
is all.
-
ismainurl
-
Whether this output file should be used as a "main
URL" for any content items used within it, to support the url magic
metadatum. If you plan to have multiple output styles for
your content, be sure to set "ismainurl=false" on the pages which use
"alternative" styles. The default value is true.
Perl code can also access out URLs using the get_url() function.
The production of multiple out files that are more-or-less identical can be
automated using the <for> tag.
Output and Dependencies
Out files will not be generated if the resulting text has not changed from the
previous run, or if the content sections it depends on have not changed.
The latter functionality is accomplished by caching the modification dates of
each file from which content was read to generate the output file. If:
-
the output file exists,
-
none of the files are newer than they were last time the output
file was written,
-
none of them are newer than the output file itself, and
-
none of the content items contain dynamic content, such as Perl
code or sitemaps,
then it does not need to be rebuilt.
Note: the -r switch to webmake, or the risky_fast_rebuild
option to the HTML::WebMake::Main constructor, indicates that
WebMake can take some risks when rebuilding. If this is on, then
the fourth condition in the list above (no dynamic content) is ignored.
Example
<out name="index" file="index.html">
${header}
${index_text}
${footer}
</out>
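The rebuild conditions listed under Output and Dependencies can be written down as a small decision function. This Python sketch only models the documented rules; needs_rebuild and its argument layout are assumptions, not WebMake's internal API.

```python
def needs_rebuild(output_mtime, sources, has_dynamic_content,
                  risky_fast_rebuild=False):
    """Decide whether an output file must be regenerated.

    output_mtime: modification time of the output file, or None if missing.
    sources: list of (cached_mtime, current_mtime) pairs, one per source file.
    """
    if output_mtime is None:
        return True                      # the output file does not exist
    for cached_mtime, current_mtime in sources:
        if current_mtime > cached_mtime:
            return True                  # newer than last time it was used
        if current_mtime > output_mtime:
            return True                  # newer than the output file itself
    if has_dynamic_content and not risky_fast_rebuild:
        return True                      # Perl code or sitemaps may have changed
    return False
```

Passing risky_fast_rebuild=True corresponds to the -r switch: dynamic content alone no longer forces a rebuild.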
The <sitemap> Tag
The <sitemap> tag is used to generate a content item containing a map,
in a tree structure, of the current site.
It does this by traversing every content item you have defined, looking for
one tagged with an isroot=true attribute. This will become the root of the
site map tree.
While traversing, it also searches for content items with a metadatum called up. This is used to tie all the content together into a
tree structure.
Note: content items that do not have an up metadatum are considered
children of the root by default. If you do not want to map a piece of
content, declare it with the attribute map=false.
By default, the content items are arranged by their score and title metadata
at each level. The sort criteria can be overridden by setting the
sortorder attribute.
Note: if you wish to include external HTML pages into the sitemap, you
will need to load them as URL references using the <media> tag and use
the <metatable> tag to associate metadata with them.
t/data/sitemap_with_metatable.wmk in the WebMake test suite demonstrates
this. This needs more documentation (TODO).
The <sitemap> tag takes the following required attributes:
-
name
-
The name of the sitemap item, used to refer to it
later. Sitemaps are referred to, in other content items or in out
files, using the normal ${foo} style of content reference.
-
node
-
The name of the content item to evaluate for each
node with children in the tree. See Processing, below.
-
leaf
-
The name of the content item to evaluate for each leaf
node, ie. a node with no children, in the tree. See Processing,
below.
And the following optional attributes:
-
rootname
-
The root content item to start traversing at. The
default root is whichever content item has the isroot attribute
set to true.
-
all
-
Whether or not all content items should be mapped.
Normally dynamic content, such as metadata and perl-code-defined
content items, is not included. (default: false)
-
dynamic
-
The name of the content item to evaluate for
dynamic content items, required if the all attribute is set
to true.
-
grep
-
Perl code to evaluate at each step of the tree.
See the Grep section below.
-
sortorder
-
A sort string specifying what metadata
should be used to sort the items in the tree, for example "section
score title".
Note that the root attribute is deprecated; use rootname instead.
The sitemap can be declared either as an empty element, with /> at the
end, or with a pair of starting and ending tags and text between. If the
sitemap is declared using the latter style, any text between the tags will be
prepended to the generated site map. This is typically only useful if you wish
to set metadata on the map itself.
Processing
Here's the key to sitemap generation. Once the internal tree structure of the
site has been determined, WebMake will run through each node from the root
down, up to 20 levels deep, and for each node, evaluate one of the 3 content
items named in the <sitemap> tag's attributes:
-
node: For pages with pages beneath them;
-
leaf: For "leaf" pages with no pages beneath them;
-
dynamic: For dynamic content items, defined by perl code
or metadata.
By changing the template content items you name in the tag's attributes, you
have total control over the way the sitemap is rendered.
The following variables (ie. content items) are set for each node:
-
name
-
the content name
-
title
-
the content's Title metadatum, if set
-
score
-
the content's Score metadatum, if set
-
list
-
the text for all children of this node (node
items only)
-
is_node
-
whether the content is a node or a leaf (1 for
node, 0 for leaf)
In addition, the following URL reference is set:
-
url
-
the first URL listed in a WebMake <out> tag
to refer to the content item.
Confused? Don't worry, there's an example below.
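If it helps, the traversal can be pictured with a small Python model. It builds the tree from up metadata (items without one hang off the root), sorts each level by score and title, and fills in a node or leaf template. The function names are hypothetical, Python's {title} placeholders stand in for WebMake's ${title} references, dynamic items are left out, and treating anything below the depth limit as a leaf is an assumption.

```python
def build_children(items, root):
    """Map each content name to its sorted list of child names."""
    children = {name: [] for name in items}
    for name, meta in items.items():
        if name == root:
            continue
        parent = meta.get("up", root)        # no "up" means child of the root
        children.setdefault(parent, []).append(name)
    for kids in children.values():
        kids.sort(key=lambda n: (items[n].get("score", 50),
                                 items[n].get("title", "(Untitled)")))
    return children

def render_map(name, items, children, node_tmpl, leaf_tmpl,
               depth=0, max_depth=20):
    """Render one node, recursing into its children for the list variable."""
    title = items[name].get("title", "(Untitled)")
    kids = children.get(name, []) if depth < max_depth else []
    if kids:
        listing = "".join(
            render_map(kid, items, children, node_tmpl, leaf_tmpl,
                       depth + 1, max_depth)
            for kid in kids)
        return node_tmpl.format(name=name, title=title, list=listing)
    return leaf_tmpl.format(name=name, title=title)

items = {
    "home": {"title": "Home"},
    "about": {"title": "About", "up": "home"},
    "team": {"title": "Team", "up": "about"},
    "news": {"title": "News"},
}
tree = build_children(items, "home")
html = render_map("home", items, tree,
                  "<li>{title}<ul>{list}</ul></li>", "<li>{title}</li>")
```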
Grep
The grep attribute is used to filter which content items are included in
the site map.
The "grep" code is evaluated once for every node in the sitemap, with $_
set to the name of that node's content item; the code then decides whether
or not the node is displayed. If the perl code returns 0,
the node is skipped; if the perl code sets the variable $PRUNE to 1, all
nodes at this level and below are skipped.
Example
If you're still not sure how it works, take a look at examples/sitemap.wmk
in the distribution. Here are the important bits from that file.
Firstly, two content items are necessary -- a template for a sitemap node, and
a template for a leaf. Note the use of $(url),
${title}, etc., which are filled in by the sitemap code.
<content name=sitemapnode map=false>
<li>
<a href=$(url)>${title}</a>: $[${name}.abstract]<br>
<!-- don't forget to list the sub-items -->
<ul> ${list} </ul>
</li>
</content>
And the template for the leaf nodes. Note that the ${list}
reference is not needed here.
<content name=sitemapleaf map=false>
<li>
<a href=$(url)>${title}</a>: $[${name}.abstract]<br>
</li>
</content>
Finally, the sitemap itself is declared.
<sitemap name=mainsitemap node=sitemapnode leaf=sitemapleaf />
From then on, it's just a matter of including the sitemap content item in
an output file:
<out name=map file=sitemap_html/map.html>
${header}${mainsitemap}${footer}
</out>
And that's it.
This documentation includes a sitemap, by the way. It's used to generate
the navigation links.
The <navlinks> Tag
A common site structure strategy is to provide Back, Forward and
Up links between pages. This is especially frequent in papers or
manuals, and (as you can see above) is used in this documentation.
WebMake supports this using the <navlinks> tag.
To use this, first define a sitemap. This tells WebMake how to order the page
hierarchy, and which pages to include.
Next, define 3 content items, one for previous, one for next and one
for up links. These should contain references to ${url}
(note: not $(url)), which will be replaced with the URL for
the next, previous, or parent content item, whichever is applicable for the
direction in question.
Also, references to ${name} will be expanded to the name of the
content item in that direction, allowing you to retrieve metadata for that
content like so: $[${name}.title] .
You can also add content items to be used when there is no previous,
next or up content item; for example, the "top" page of a site has
no up content item. These are strictly optional though.
Then add a <navlinks> tag to the WebMake file as follows.
<navlinks name=mynavlinks map=sitemapname
up=upcontentname
next=nextcontentname
prev=prevcontentname
noup=noupcontentname
nonext=nonextcontentname
noprev=noprevcontentname>
content text
</navlinks>
The content text acts just like a normal content item, but references to
${nexttext}, ${prevtext} or ${uptext}
will be replaced with the appropriate content item; e.g. ${uptext}
will be replaced by either ${upcontentname} or
${noupcontentname} depending on whether this is the top page or
not.
You can then add references to $[mynavlinks] in
other content items, and the navigation links will be inserted.
Note: navlinks content items must be included as a deferred
reference!
Attribute Reference
These are the attributes accepted by the <navlinks> tag.
-
name
-
the name of the navigation-links content item.
Required.
-
map
-
the name of the sitemap used to determine page
ordering. Required.
-
up
-
the name of the content item used to draw Up
links. Required.
-
next
-
the name of the content item used to draw Next
links. Required.
-
prev
-
the name of the content item used to draw Prev
links. Required.
-
noup
-
the name of the content item used when there is
no Up link, ie. for the page at the top level of the
site. Optional -- the default is an empty string.
-
nonext
-
the name of the content item used when there is
no Next link, ie. the last page in the site.
Optional -- the default is an empty string.
-
noprev
-
the name of the content item used when there is
no Prev link, ie. for the first page in the site.
Optional -- the default is an empty string.
Example
This will generate an extremely simple set of <a href> links, no frills.
The sitemap it uses isn't detailed here; see the sitemap documentation for details on how to make a site map.
<content name=up><a href=${url}>Up</a></content>
<content name=next><a href=${url}>Next</a></content>
<content name=prev><a href=${url}>Prev</a></content>
<navlinks name=name map=sitemapname up=up next=next prev=prev>
${prevtext} | ${uptext} | ${nexttext}
</navlinks>
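To see where the three directions come from, here is a small Python model that derives them from a sitemap tree. It assumes, and this is only an assumption, that Prev and Next follow a depth-first walk of the sitemap, while Up is the parent in the tree; nav_links is a hypothetical name.

```python
def nav_links(tree, root):
    """Return {page: {"prev": ..., "next": ..., "up": ...}} for a sitemap tree.

    tree maps a page name to the ordered list of its children.
    A value of None means there is no page in that direction.
    """
    order = []
    parent = {root: None}

    def walk(page):
        order.append(page)
        for child in tree.get(page, []):
            parent[child] = page
            walk(child)

    walk(root)
    links = {}
    for i, page in enumerate(order):
        links[page] = {
            "prev": order[i - 1] if i > 0 else None,
            "next": order[i + 1] if i + 1 < len(order) else None,
            "up": parent[page],
        }
    return links

links = nav_links({"home": ["about", "news"], "about": ["team"]}, "home")
```

A None entry is where the noup, nonext or noprev content item would be used instead.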
The <breadcrumbs> Tag
Another common site navigation strategy is to provide what Jakob Nielsen has
called a "breadcrumb trail". The <breadcrumbs> tag supports this.
WTF Is A Breadcrumb Trail?
The "breadcrumb trail" is a piece of navigation text, displaying a list of
the parent pages, from the top-level page right down to the current page.
You've probably seen them before; take a look at this Yahoo
category for an example.
To illustrate, here's an example. Let's say you're browsing the Man Bites
Dog story in an issue of Dogbiting Monthly, which in turn is part of the
Bizarre Periodicals site. Here's a hypothetical breadcrumb trail for that
page:
Bizarre Periodicals : Dogbiting Monthly : Issue 24 : Man
Bites Dog
Typically those would be links, of course, so the user can jump right back to
the contents page for Issue 24 with one click.
If you have a site that contains pages that are more than 2 levels deep from
the front page, you should consider using this to aid navigation.
How To Use It With WebMake
To use a breadcrumb trail, first define a sitemap. This tells WebMake how to
order the page hierarchy, and which pages to include.
Next, define a content item to be used for each entry in the trail. This
should contain references to ${url} (note: not $(url)), which will be replaced with the URL for the page in
question; and ${name}, which will be expanded to the name of the
"main" content item on that page, allowing you to retrieve metadata for that
content like so: $[${name}.title] .
Note: the "main" content item is defined as the first content
item on the page which is not metadata, not perl-generated code, and
has the map attribute set to "true".
You can also define two more content items to be used at the top of the
breadcrumb trail, ie. the root page, and at the tail of it, ie. the
current page being viewed. These are optional though, and if not specified,
the generic content item detailed above will be used as a default.
Then add a <breadcrumbs> tag to the WebMake file as follows.
<breadcrumbs name=mycrumbs map=sitemapname
top=topcontentname
tail=tailcontentname
level=levelcontentname />
The top and tail attributes are optional, as explained above.
The level attribute, which names the "generic" breadcrumb content
item to use for intermediate levels, is mandatory.
You can then add references to $[mycrumbs] in
other content items, and the breadcrumb-trail text will be inserted. Note:
be sure to use a deferred reference, or the links may not appear!
Attribute Reference
These are the attributes accepted by the <breadcrumbs> tag.
-
name
-
the name of the breadcrumb-trail content item.
Required.
-
map
-
the name of the sitemap used to determine page
hierarchy. Required.
-
level
-
the name of the content item used to draw links at the
intermediate levels of the trail. Required.
-
top
-
the name of the content item used to draw the link to
the top-most, or root, page. Optional -- level will be used as a
fallback.
-
tail
-
the name of the content item used to draw the link to
the bottom-most, currently-viewed page. Optional -- level will be
used as a fallback.
Example
This will generate an extremely simple set of <a href> links, no frills.
The sitemap it uses isn't specified here; see the sitemap tag documentation for details on how to generate a site map.
<content name=btop map=false>
[ <a href=${url}>$[${name}.title]</a> /
</content>
<content name=blevel map=false>
<a href=${url}>$[${name}.title]</a> /
</content>
<content name=btail map=false>
<a href=${url}>$[${name}.title]</a> ]
</content>
<breadcrumbs map=sitemapname name=crumbs
top=btop tail=btail level=blevel />
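The way the top, level and tail items combine can be modelled in Python as follows. This is a sketch rather than WebMake's code: breadcrumb_trail is a hypothetical name, {title} placeholders stand in for $[${name}.title], and preferring top over tail when a page is both root and current page is an assumption.

```python
def breadcrumb_trail(page, parent, titles, level, top=None, tail=None):
    """Render the trail from the root page down to the given page.

    parent maps each page to the page above it; the root has no entry.
    top and tail fall back to the level template when not supplied.
    """
    trail = []
    node = page
    while node is not None:
        trail.append(node)
        node = parent.get(node)
    trail.reverse()                       # root first, current page last

    parts = []
    for i, name in enumerate(trail):
        if i == 0 and top is not None:
            template = top
        elif i == len(trail) - 1 and tail is not None:
            template = tail
        else:
            template = level
        parts.append(template.format(name=name, title=titles[name]))
    return "".join(parts)

parent = {"story": "issue24", "issue24": "monthly", "monthly": "site"}
titles = {"site": "Bizarre Periodicals", "monthly": "Dogbiting Monthly",
          "issue24": "Issue 24", "story": "Man Bites Dog"}
trail = breadcrumb_trail("story", parent, titles,
                         level="{title} : ", top="[ {title} : ", tail="{title} ]")
# trail == "[ Bizarre Periodicals : Dogbiting Monthly : Issue 24 : Man Bites Dog ]"
```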
The <cache> Tag
The <cache> tag takes one attribute, dir, which names the
directory where the WebMake site cache is kept.
WebMake will store data about the site in this directory in order
to speed up later rebuilds of the site.
The following special characters and escapes are supported:
-
~
-
the user's home directory on UNIX.
-
%u
-
the user's username.
-
%f
-
.wmk filename, non-alphanums replaced with _ .
-
%F
-
.wmk full path, non-alphanums replaced with _ .
-
%l
-
perl lib dir for plugins.
The default setting is ~/.webmake/%F .
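As a worked illustration of the escapes, this Python sketch expands a cache directory pattern. The function name and its arguments are hypothetical, %l is left out, and it is an assumption that %F is built from the .wmk file's absolute path.

```python
import posixpath
import re

def expand_cache_dir(pattern, wmk_path, username, home):
    """Expand ~, %u, %f and %F in a cache directory pattern."""
    def clean(text):
        # Replace every non-alphanumeric character with an underscore.
        return re.sub(r"[^A-Za-z0-9]", "_", text)

    escapes = {
        "%u": username,
        "%f": clean(posixpath.basename(wmk_path)),
        "%F": clean(wmk_path),
    }
    expanded = re.sub(r"%[ufF]", lambda m: escapes[m.group(0)], pattern)
    if expanded == "~" or expanded.startswith("~/"):
        expanded = home + expanded[1:]
    return expanded

cache_dir = expand_cache_dir("~/.webmake/%F", "/home/jm/site/main.wmk",
                             "jm", "/home/jm")
# cache_dir == "/home/jm/.webmake/_home_jm_site_main_wmk"
```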
Example
<cache dir="../webmake.cache" />
The <option> Tag
The <option> tag takes two attributes:
-
name
-
The name of the option;
-
value
-
The value to set it to.
Example
<option name="FileSearchPath" value="../files" />
Defining Tags
Like Roxen or Java Server Pages, WebMake allows you to define your own tags;
these cause a perl function to be called whenever they are encountered in
either content text, or inside the WebMake file itself.
Defining Content Tags
You do this by calling the define_tag() function from
within a <{perl}> section in the WebMake file. This
sets up a tag, given a reference to the handler function to call
when that tag is encountered, and the list of attributes that are required to
use that tag.
Any occurrences of this tag, with at least the set of attributes defined in
the define_tag() call, will cause the handler function to be called.
Handler functions are called as follows:
handler ($tagname, $attrs, $text, $perlcodeself);
Where $tagname is the name of the tag, $attrs is a reference
to a hash containing the attribute names and the values used in the tag, and
$text is the text between the start and end tags.
$perlcodeself is the PerlCode object, allowing you to write proper
object-oriented code that can be run in a threaded environment or from
mod_perl. This can be ignored if you like.
Note that there are two variations, one for conventional tag pairs with a
start and end tag, the other for stand-alone empty tags with no end tag. The
latter variation is called define_empty_tag() .
-
define_empty_tag()
-
define a standalone
content tag
-
define_tag()
-
define a content tag with a
start and end
Defining WebMake Tags
This is identical to using content tags, above, but the functions are as
follows:
-
define_empty_wmk_tag()
-
define a
standalone WebMake tag
-
define_wmk_tag()
-
define a WebMake tag
with a start and end
Example
Let's say you've got the following in your WebMake file.
<{perl
define_tag ("thumb", \&make_thumbnail, qw(img thumb));
}>
<content name="foo">
<thumb img="big.jpg" thumb="big_thumb.jpg">
Picture of a big thing
</thumb>
</content>
When the foo content item comes to be included in an output file, the tag
will be replaced with a call to a perl function, as follows:
make_thumbnail ("thumb",
{ img => 'big.jpg', thumb => 'big_thumb.jpg' },
'Picture of a big thing', $perlcodeself);
Note that if the tag omitted one of the 2 required attributes, img or
thumb, it would result in an error message.
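The dispatch just described can be modelled in Python like this. It is a toy, not WebMake's parser: TagRegistry is hypothetical, attributes must be double-quoted, nested tags are not handled, the handler receives three arguments rather than four (there is no PerlCode object), and a missing attribute raises an exception where WebMake prints an error message. What make_thumbnail returns is invented for the example.

```python
import re

class TagRegistry:
    """Toy model of define_tag(): maps tag names to handler functions."""

    def __init__(self):
        self.tags = {}

    def define_tag(self, name, handler, required_attrs):
        self.tags[name] = (handler, tuple(required_attrs))

    def process(self, text):
        for name, (handler, required) in self.tags.items():
            tag = re.escape(name)
            pattern = re.compile(r"<%s\b([^>]*)>(.*?)</%s>" % (tag, tag), re.S)

            def call_handler(match, name=name, handler=handler,
                             required=required):
                attrs = dict(re.findall(r'(\w+)="([^"]*)"', match.group(1)))
                missing = [a for a in required if a not in attrs]
                if missing:
                    raise ValueError("<%s> is missing: %s"
                                     % (name, ", ".join(missing)))
                return handler(name, attrs, match.group(2))

            text = pattern.sub(call_handler, text)
        return text

def make_thumbnail(tagname, attrs, text):
    return '<a href="%s"><img src="%s" alt="%s"></a>' % (
        attrs["img"], attrs["thumb"], text.strip())

registry = TagRegistry()
registry.define_tag("thumb", make_thumbnail, ["img", "thumb"])
html = registry.process(
    '<thumb img="big.jpg" thumb="big_thumb.jpg">\nPicture of a big thing\n</thumb>')
```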
For more serious examples of tag definition, the WebMake distribution comes
with several plugins, such as safe_tag.wmk, which define their own tags.
Contents for the 'Processing Logic' section
The Order of Processing
In order to fully control the WebMake file processing using Perl code, it's
important to know the order in which the tags and so on are parsed.
Parsing of the WebMake File
Initially, WebMake used a set order of tag parsing, but this proved to be
unwieldy and confusing. Now, it uses the order in which the tags are defined
in the .wmk file, so if you want tag A to be interpreted before tag B, put A
before B and the right thing will happen.
Perl code embedded inside the WebMake file, using <{perl}> processing directives, will be evaluated there
and then (unless the <{perl}> block is embedded in another block, such
as a content item or <out> file block).
This means that you can define content items by hand, search for other content
items using a <contents> tag, and then use a <{perl}> section to define a list of all content items
which satisfy a particular set of criteria.
This list can then be used in later <{perl}> blocks, content references, or <for> tags.
Processing the <out> Tags
Once the file is fully parsed, the <out> tags are
processed, one by one.
At this point, content references, <{set}> tags, and
<{perl}> processing directives will be interpreted,
if they are found within content chunks. Finally, deferred content references
and metadata references are expanded.
Eventually, no content references, <{set}> tags, <{perl}> processing directives, metadata references, or
URL references are left in the file text. At this point, the file is written
to disk under a temporary name, and the next output file is processed.
Once all output files are processed, the entire set of files which have
been modified are moved into place, replacing any previous versions.
The <{set}> Directive
Small pieces of content can be set from within other content chunks or
<out> sections using the <{set}> directive. The format is
<{set name="value"}>
This can be useful to set small chunks of text, by including a <{set}> directive in the content item that uses them.
For example, a common use of <{set}> is to define, ahead of
time, what text should be inserted into a template:
<{set template_body="${foo.txt}"}>
${bar_template}
Note: Order of Content Reference Processing
The processing of content references starts at each <out> URL in turn, and descends from the chunk of text
defined for that file, replacing each ${content_ref} and $(url_ref) one-by-one, in a depth-first manner.
Finally, the tree-traversal starts again from the chunk of <out> text,
searching for $[deferred_content refs].
Therefore if you wish to <{set}> a variable, let's say x, in a chunk
of content that will not be loaded before x is accessed, you should use
a $[deferred content ref] to
access it.
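The two passes, and why a deferred reference can see a value that is only set later, can be demonstrated with a simplified Python model. It handles only <{set}>, ${name} and $[name]; the render function is hypothetical, and URL references, default values and metadata are left out.

```python
import re

TOKEN = re.compile(r'<\{set (\w+)="([^"]*)"\}>|\$\{([\w.]+)\}')
DEFERRED = re.compile(r"\$\[([\w.]+)\]")

def render(text, content, max_depth=30):
    """Expand ${refs} and <{set}> depth-first, then expand $[deferred refs]."""
    content = dict(content)               # work on a copy

    def first_pass(chunk, depth=0):
        if depth > max_depth:
            raise RecursionError("content references nested too deeply")

        def replace(match):
            if match.group(1):            # a <{set name="value"}> directive
                content[match.group(1)] = match.group(2)
                return ""
            return first_pass(content[match.group(3)], depth + 1)

        return TOKEN.sub(replace, chunk)

    text = first_pass(text)
    # Second pass: every <{set}> has run by now, so deferred refs resolve.
    return DEFERRED.sub(lambda m: content[m.group(1)], text)

chunks = {"body": '<{set title="Hello"}>Body text'}
page = render("<h1>$[title]</h1>${body}", chunks)
# page == "<h1>Hello</h1>Body text"
```

Writing ${title} in the heading instead fails in this model, because title has not been set yet when the heading is expanded.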
How <{set}> Relates To Meta-data
The <{set}> directive was implemented before metadata was, and initially
provided a way to do similar things, such as substitute page titles, etc.
Now, however, it's probably better to use <wmmeta> tags to
handle data that is associated with a content-item. Using <wmmeta> tags
means your pages will be able to take advantage of new features, like index
and site-map generation.
The <{set}> directive is retained as a way of quickly setting content
items from within other content, in case this feature proves useful for other
purposes.
The <{perl}> Directives
Arbitrary perl code can be executed using this directive.
It works like perl's eval command; the return value from the perl block is
inserted into the file, so a perl code block like this:
<{perl
$_ = '';
for my $fruit (qw(apples oranges pears)) {
$_ .= " ".$fruit;
}
$_;
}>
will be replaced with the string " apples oranges pears". Note that the
$_ variable is declared as local when you enter the perl block, so
you don't have to do this yourself.
If you don't like the eval style, you can use a more PHP/JSP/ASP-like
construct using the perlout directive, which replaces the perl code text
with anything that the perl code prints on the default output filehandle, like
so:
<{perlout
for my $fruit (qw(apples oranges pears)) {
print " ", $fruit;
}
}>
Note that this is not STDOUT, it's a local filehandle called $outhandle .
It is selected as the default output handle, however.
<{perl}> sections found at the top level of the
WebMake file will be evaluated during the file-parsing pass, as they
are found.
<{perl}> sections embedded inside content chunks
or other tagged blocks will be evaluated only once they are referenced.
Perl code can access content variables and URLs using the library functions provided.
The library functions are available both as normal perl functions in the
default main package, or, if you want to write thread-safe or mod_perl-safe
perl code, as methods on the $self object. The $self
object is available as a local variable in the perl code block.
A good example of perl use inside a WebMake file can be found in the
news_site.wmk file in the examples directory.
Globs and Regexps
A number of WebMake parameters and perl APIs support pattern matching.
This is performed using glob patterns and regular expressions.
Glob Patterns
These are more-or-less traditional shell- or MS-DOS-like globs, as follows:
Pattern | Meaning
* | matches any number of characters except /
... | matches any number of characters, including /
? | matches one character
This is the default mode of matching. Example globs are:
*.html, .../*.txt.
Regular Expressions
These are perl-style regular expressions. They are differentiated
from glob patterns by prefixing them with RE:, for example:
RE:^.*\.html$ .
An introduction to regular expressions is beyond the scope of this
documentation. For more details, check your perl documentation, or search the
web.
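The glob rules can be pinned down by translating them into regular expressions. The Python sketch below is one way to do that, not WebMake's own implementation; compile_pattern is a hypothetical name, and letting ? match any single character (including /) is an assumption, since the text above does not say.

```python
import re

def compile_pattern(pattern):
    """Return a function that tests a name against a glob or an RE: pattern."""
    if pattern.startswith("RE:"):
        regex = re.compile(pattern[3:])
        return lambda name: regex.search(name) is not None

    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")            # any characters, including /
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")         # any characters except /
            i += 1
        elif pattern[i] == "?":
            parts.append(".")             # exactly one character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    regex = re.compile("".join(parts))
    return lambda name: regex.fullmatch(name) is not None

is_html = compile_pattern("*.html")
is_nested_text = compile_pattern(".../*.txt")
```

Here is_html("index.html") is true while is_html("dir/index.html") is not, because * stops at the slash.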
Sorting Lists of Content Items
Frequently, you will need to get a list of content items in sorted order.
WebMake itself does this for the sitemap tag, among others.
Sorting is typically performed using a content item's metadata; some metadata
that are especially useful are:
-
score
-
A number representing the "priority" of a content
item; specifically intended for use when sorting. Defaults to 50
if unset.
-
title
-
The title of a content item. Handy for alphabetic
lists. Defaults to (Untitled) if not set.
-
declared
-
The item's declaration order. This is a number
representing when the content item was first encountered in the
WebMake file; earlier content items have a lower declaration order.
You do not need to set this; WebMake will do so automatically.
-
mtime
-
The modification date, in UNIX time_t
seconds-since-the-epoch format, of the file the content item was
loaded from.
-
name
-
The name of the content item.
WebMake provides a built-in mechanism to allow easy sorting of content items,
called a sort spec or sort string.
This is typically used either with the Perl code library's
sort_content_objects() call, or using a
sortorder attribute as the sitemap tag does.
A sort string is a text string, containing a space-separated list of metadata
items. The first entry in the list is the main sorting criterion; the second
entry is then used to break ties if two entries match for the main
criterion, etc.
In addition, a metadata item can be prefixed with a ! , to reverse its
order.
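A sort string maps naturally onto a multi-key comparison. The following Python sketch models that behaviour; sort_items is a hypothetical name, content items are represented as plain dicts of metadata, and only the score and title defaults mentioned above are filled in (any other field used in the sort string is assumed to be present on every item).

```python
from functools import cmp_to_key

DEFAULTS = {"score": 50, "title": "(Untitled)"}

def sort_items(items, sort_string):
    """Sort metadata dicts by a sort string such as "score title" or "!score"."""
    keys = []
    for field in sort_string.split():
        reverse = field.startswith("!")
        keys.append((field.lstrip("!"), -1 if reverse else 1))

    def value(item, field):
        return item.get(field, DEFAULTS.get(field))

    def compare(a, b):
        for field, sign in keys:
            va, vb = value(a, field), value(b, field)
            if va != vb:
                return sign if va > vb else -sign
        return 0                          # equal on every criterion

    return sorted(items, key=cmp_to_key(compare))

stories = [
    {"name": "b", "title": "Beta"},
    {"name": "a", "title": "Alpha", "score": 10},
    {"name": "c", "title": "Alpha"},
]
by_score = [s["name"] for s in sort_items(stories, "score title")]
# by_score == ["a", "c", "b"]
```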
Example
score title: sort by score, and if two content items have the same
score, sort by title.
declared: sort by the order in which they were declared in the WebMake
file.
score title !mtime: sort by score and title, and if more than one content
item has the same score and title, sort them into oldest-first order.
Contents for the 'Variable References' section
${content_refs} - References to Content Chunks
Content chunks and variables can be referred to using this format. This is
evaluated before any other variable reference is.
${name}
Content chunks can refer to other chunks, URLs, or use deferred references,
up to 30 levels deep.
If you wish to refer to a content item or variable, but are not sure if it
exists, you can provide a default value by following the content name
with a question mark and the default value.
${name?defaultvalue}
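The default-value syntax can be modelled with a single regular expression. In this Python sketch the expand function is hypothetical, references are not expanded recursively, and leaving an undefined reference with no default untouched is an assumption about behaviour the text above does not describe.

```python
import re

REF = re.compile(r"\$\{([^}?]+)(?:\?([^}]*))?\}")

def expand(text, content):
    """Replace ${name} and ${name?default} using the content dict."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in content:
            return content[name]
        if default is not None:
            return default
        return match.group(0)             # undefined and no default: leave as-is
    return REF.sub(replace, text)

greeting = expand("Hello ${who?world}", {})
# greeting == "Hello world"
```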
$(url_refs) - References to URLs
URLs of defined <out> sections and <media> items can be inserted
into the current content using this reference format.
$(name)
Note that all URL references are written relatively; so a file created in the
foo/bar/baz subdirectory which contains a URL reference to
blah/argh.html will be rewritten to refer to ../../../blah/argh.html.
Again, if you're not sure a URL exists, a default value can be supplied,
using this format:
$(name?defaultvalue)
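The relative rewriting of URL references is ordinary relative-path arithmetic. This Python sketch reproduces the foo/bar/baz example with the standard posixpath module; relative_url is a hypothetical helper, and both arguments are assumed to be paths relative to the top of the generated site.

```python
import posixpath

def relative_url(target, output_file):
    """Return target as a URL relative to the directory of output_file."""
    # Anchor both paths at "/" so the result never depends on the
    # current working directory.
    start = "/" + posixpath.dirname(output_file)
    return posixpath.relpath("/" + target, start=start)

url = relative_url("blah/argh.html", "foo/bar/baz/index.html")
# url == "../../../blah/argh.html"
```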
$[deferred_content refs] - Deferred Content References
These are identical to ${content_refs},
but are evaluated only after all other references.
$[name]
This means that a content variable can be set at the end of an <out> section, but referred to at the start, for example. Handy for HTML page
titles.
In addition, this is the recommended way to access metadata set using the
wmmeta tag.
Again, a default value can be supplied, using this format:
$[name?defaultvalue]
Contents for the 'Meta Tags and Meta-Data' section
Metadata
What Is Metadata?
Everyone is familiar with data, but the term meta-data is not so familiar.
Here's a brief primer.
To illustrate, I'll use an example familiar to most readers. Most computer
operating systems nowadays have the concept of files in a filesystem. If you
consider the files as data, then details such as file size, modification
times, username of the owner etc. are metadata, ie. data about the files.
In WebMake, metadata is used to refer to properties of textual content items.
For example, a newspaper article may have a title, an abstract (ie. a
brief summary), etc.
This kind of data is very useful for building indices and catalogues, in the
same way that Windows Explorer or the UNIX ls(1) command uses filesystem
metadata to display file listings. As a result, a good way to think of it is
as "catalog data", as opposed to "narrative data", which is what a normal
content item is. (thanks to Vaibhav Arya, vaibhav /at/ mymcomm.com, for that
analogy.)
How to Define Metadata
WebMake can load metadata from a number of sources:
-
Inferred from the content text itself: WebMake supports
"magic" metadata, which contains some inferred data about the
content, such as its last modification date (which can be inferred
from the filesystem storage of the content file itself). In
addition, title metadata can be inferred from several sources, such
as the <title> tag in HTML, or =head1 tags in POD
text.
-
Tags embedded within the content text: This is handled
using the <wmmeta> tag.
-
Set as defaults before the content items are defined: the
<metadefault> WebMake tag.
-
Defined in bulk and added to the content items: the
<metatable> tag.
Referring to Metadata
Metadata is referred to using the deferred content ref format:
$[content.metaname]
Where content is the name of the content item, and metaname is the
name of the metadatum. So, for example, $[blurb.txt.title]
would return the title metadatum from the content item blurb.txt.
Meta tag names are case-insensitive, for compatibility with HTML meta tags.
Any content chunk can access metadata from other content chunks within the
same <out> tag, using this as the content name, i.e.
$[this.title] . This is handy, for example, in setting the
page title in the main content chunk, and accessing it from the header chunk.
If more than one content item sets the same item of metadata inside the
<out> tag, the first one will take precedence.
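The lookup rules above (case-insensitive metadata names, the special this name, and first-one-wins precedence) can be modelled in Python as follows. The lookup_meta function and its data layout are hypothetical: metadata is a dict of dicts with lower-cased metadata names, and page_items lists the content items used in the current <out> section, in order.

```python
def lookup_meta(ref, metadata, page_items):
    """Resolve a reference of the form content.metaname, or None if unset."""
    # Content names may themselves contain dots (blurb.txt), so split
    # on the last dot only.
    content, _, metaname = ref.rpartition(".")
    metaname = metaname.lower()           # metadata names are case-insensitive
    if content == "this":
        for item in page_items:           # the first item that sets it wins
            if metaname in metadata.get(item, {}):
                return metadata[item][metaname]
        return None
    return metadata.get(content, {}).get(metaname)

metadata = {
    "blurb.txt": {"title": "The Blurb"},
    "header": {},
    "main": {"title": "Main Page"},
}
page_items = ["header", "main", "blurb.txt"]
title = lookup_meta("this.title", metadata, page_items)
# title == "Main Page"
```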
The example files "news_site.wmk" and "news_site_with_sections.wmk"
demonstrate how meta tags can be used to generate a SlashDot or Wired
News-style news site. The index pages in those sites are generated
dynamically, using the metadata to decide which pages to link to, their
ordering, and the titles and abstracts to use.
How Do I Use Metadata In WebMake?
WebMake has built-in, efficient support for metadata. A
metadatum is like a normal content item, except that it is exposed to all other
pages in the WebMake file. It is accessible both to other pages in
the site (as $[contentname.metaname]) and to other
content items within the same page (as
$[this.metaname]).
In addition, WebMake caches metadata in the site cache file between runs, so
that a subsequent partial site build will not require loading all the content
text, just to read a page title.
Note that content items representing metadata cannot, themselves, have
metadata.
What Metadata Should I Use?
The items marked (built-in) are supported directly inside WebMake, and used
internally for functionality like building site maps and indices. All the
other suggested metadata names here are just that, suggestions, which support
commonly-required functionality.
Also note that the names are case-insensitive; they're just capitalised here
for presentation.
-
Title
-
the title of a content item. The default title for
content items is inferred from the content text where possible,
or (Untitled) if no title can be found. (built-in)
-
Score
-
a number representing the "priority" of a content
item; used to affect how the item should be ranked in a list of
stories. The default value is 50. Items with the same score will
be ranked alphabetically by title. (built-in)
-
Abstract
-
a short summary of a content item.
-
Up
-
used to map the site's content; this metadata indicates the
content item that is the parent of the current content item. This metadatum
is used to generate dynamic sitemaps. (built-in)
-
Section
-
the section of a site under which a story should be
filed.
-
Author
-
who wrote the item.
-
Approved
-
has this item been approved by an editor; used to
support workflow, so that content items need to be approved before
they are displayed on the site.
-
Visible_Start
-
the start of an item's "visibility window",
ie. when it is listed on an index page. (TODO: define a recommended
format for this, or replace with DC.Coverage.temporal)
-
Visible_End
-
the end of an item's "visibility window",
ie. when it is listed on an index page.
-
DC.Publisher
-
a Dublin Core metadatum. The organisation or
individual that publishes the entire site.
The Dublin Core is a whole load of suggested metadata names and formats,
which can be used either to replace or supplement the optional metadata named
above. Regardless of whether you replace or supplement the metadata above
internally, it is definitely recommended to use the DC names for metadata
that's made visible in the output HTML through conventional HTML <meta>
tags.
Built-In Metadata
These are some built-in "magic" items of metadata that do not need to be
defined manually. Instead, they are automatically inferred by WebMake itself:
-
declared
-
the item's declaration order. This is a number
representing when the content item was first encountered in the
WebMake file; earlier content items have a lower declaration order.
Useful for sorting.
-
url
-
the first <out> URL which contains that content
item (you should order your <out> tags to ensure each story's
"primary" page is listed first, or set ismainurl=false on the
"alternative" output pages, if you plan to use this). See also the
get_url() method on the HTML::WebMake::Content object.
-
is_generated
-
0 for items loaded from a <content> or
<contents> tag, 1 for items created by Perl code using the
add_content() function.
-
mtime
-
The modification date, in UNIX time_t
seconds-since-the-epoch format, of the file the content item was
loaded from. Handy for sorting.
Why Use Metadata
Support for metadata is an important CMS feature.
It is used by Midgard and Microsoft's SiteServer, and is available as
user-contributed code for Manila. It provides copious benefits
for flexible index and sitemap generation, and, with the addition of an
Approved tag, adds initial support for workflow.
It allows the efficient generation of site maps, back/forward
navigation links, and breadcrumb trails, and
enables index pages to be generated using Perl code easily and in a
well-defined way.
The <wmmeta> Tag
WebMake can load meta-data directly from the content text,
using the <wmmeta> tag.
This tag is automatically stripped from the content when the content is
referenced. It can be used either as an XML-style empty tag, similar to the
HTML <meta> tag, if it ends in />:
<wmmeta name="Title" value="Story 1, blah blah" />
or with start and end tags, for longer bits of content:
<wmmeta name="Abstract">
Story 1, just another story. Blah blah blah foo bar baz etc.
</wmmeta>
As you can see, each item of metadata needs a name and a value. The
latter format reads the value from the text between the start and end tags.
Example
<content name="foo">
<wmmeta name="Title" value="Foo" />
<wmmeta name="Abstract">
Foo is all about fooing.
</wmmeta>
Foo foo foo foo bar. etc.
</content>
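To make the "read the metadata, then strip the tag" behaviour concrete, here is a minimal Python sketch. It is not how WebMake parses content: the regular expressions, the function name split_metadata, and the assumption that name always precedes value are illustrative only.

```python
import re

# Empty-tag form: <wmmeta name="..." value="..." />
EMPTY_TAG = re.compile(r'<wmmeta\s+name="([^"]*)"\s+value="([^"]*)"\s*/>\s*')
# Start/end-tag form: <wmmeta name="...">...</wmmeta>
PAIRED_TAG = re.compile(r'<wmmeta\s+name="([^"]*)"\s*>(.*?)</wmmeta>\s*',
                        re.DOTALL)


def split_metadata(text):
    """Return (metadata, body): the metadata found, and the text without tags."""
    metadata = {}

    def collect(match):
        # Store the value under a lower-cased name, then drop the tag.
        metadata[match.group(1).lower()] = match.group(2).strip()
        return ""

    body = EMPTY_TAG.sub(collect, text)
    body = PAIRED_TAG.sub(collect, body)
    return metadata, body.strip()


source = '''<wmmeta name="Title" value="Foo" />
<wmmeta name="Abstract">
Foo is all about fooing.
</wmmeta>
Foo foo foo foo bar. etc.'''

metadata, body = split_metadata(source)
print(metadata)
print(body)
```

The body that remains is just the final line of text, which matches the statement that the tag is stripped when the content is referenced.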
The <metadefault> Tag
Metadata is usually embedded inside a content item using the <wmmeta>
tag. However, this can be a chore for lots of content items, so to make
things easier, you can specify default metadata settings, using the
<metadefault> tag.
Specify this tag before the content items in question, and those content items
will all be tagged with the metadata you set.
Like the attrdefault tag, this tag can be used either in a
scoped mode, or in a command mode.
Scoped Mode
"Scoped" mode uses opening (<metadefault>) and closing
(</metadefault>) tags; the metadata is only set on content items
between the two tags.
A warning about "scoped" mode: WebMake does not use
a fully-correct XML parser to parse the XML in the .wmk file, so if you nest
<metadefault> tags, it may not parse them correctly; instead, the
first closing </metadefault> tag found will be used.
Command Mode
Command mode uses standalone tags (<metadefault ... />); the
metadata are set until the end of the WebMake file, or until you change
them with another <metadefault> tag.
Attributes
-
name
-
the metadatum's name, e.g. Title, Section,
etc. This is required.
-
value
-
the metadatum's value. This is optional. If the
value is not specified, the metadatum will be removed from the list of
default metadata.
Example
Using the scoped style:
<metadefault name="section" value="tags_and_attributes">
<content name="chunk_1.txt">...</content>
<content name="chunk_2.txt">...</content>
<content name="chunk_3.txt">...</content>
<content name="chunk_4.txt">...</content>
</metadefault>
Or, in the "command" style:
<metadefault name="section" value="tags_and_attributes" />
<content name="chunk_1.txt">...</content>
<content name="chunk_2.txt">...</content>
<content name="chunk_3.txt">...</content>
<content name="chunk_4.txt">...</content>
<metadefault name="section" />
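The command-mode behaviour can be pictured as a running table of defaults that each content item snapshots when it is declared. The Python sketch below models that idea; the directive tuples and the function name apply_metadefaults are hypothetical and exist only for this illustration.

```python
def apply_metadefaults(directives):
    """Walk a flat list of directives and return {content_name: metadata}.

    Each directive is ("metadefault", name, value_or_None) or
    ("content", content_name). This is a model, not WebMake's parser.
    """
    defaults = {}
    tagged = {}
    for directive in directives:
        if directive[0] == "metadefault":
            _, name, value = directive
            if value is None:
                defaults.pop(name.lower(), None)   # no value: drop the default
            else:
                defaults[name.lower()] = value
        else:
            _, content_name = directive
            tagged[content_name] = dict(defaults)  # snapshot current defaults
    return tagged


wmk = [
    ("metadefault", "section", "tags_and_attributes"),
    ("content", "chunk_1.txt"),
    ("content", "chunk_2.txt"),
    ("metadefault", "section", None),
    ("content", "chunk_3.txt"),
]
result = apply_metadefaults(wmk)
print(result)
```

In this model chunk_1.txt and chunk_2.txt receive the section default, while chunk_3.txt, declared after the value-less tag, does not.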
The <metatable> Tag
Metadata is usually embedded inside a content item using the <wmmeta>
tag. However, sometimes you may want to tag a content item with metadata from
outside, if the text of the content is not under your control; or you may want
to tag metadata to an object that is not text-based, such as an image.
The metatable tag allows you to do this, and in bulk. You list a table of
content names and the metadata you want to attach to each content item, in
tab-, comma-, pipe-separated-value, or XML format.
By default, the table is read from between the <metatable> and
</metatable> tags. However, if you set the src attribute,
the table will be read from the location specified, instead.
Delimiter-Separated-Value Format
Firstly, pick a delimiter character, such as |. Set the delimiter
attribute to this character.
Next, the first line of the metatable lists the metadata you wish to set; it
must start with the value . (a single full stop). This indicates to WebMake
that the line defines the metadata to be set.
Finally, list as many lines of metadata as you like; the first value on the
line is the name of the content item you wish to attach the metadata to. From
then on, the other values on the line are the values of the metadata.
So, for example, consider this table, from the WebMake documentation:
<metatable delimiter="|">
.|title|abstract
Main.pm|HTML::WebMake::Main|module documentation
PerlCodeLibrary.pm|HTML::WebMake::PerlCodeLibrary|module documentation
Content.pm|HTML::WebMake::Content|module documentation
EtText2HTML.pm|Text::EtText::EtText2HTML|module documentation
HTML2EtText.pm|Text::EtText::HTML2EtText|module documentation
webmake|webmake(1)|script documentation
ettext2html|ettext2html(1)|script documentation
ethtml2text|ethtml2text(1)|script documentation
</metatable>
This will set Main.pm.title to HTML::WebMake::Main,
Main.pm.abstract to module documentation, etc.
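The three steps above (delimiter, header line starting with a full stop, then one line per content item) can be sketched in Python as follows. This is an illustrative reader for the format as described here, not WebMake's own parser; the function name parse_metatable is an assumption.

```python
def parse_metatable(table_text, delimiter="|"):
    """Parse a delimiter-separated metatable into {content_name: {meta: value}}."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split(delimiter)
    if header[0] != ".":
        raise ValueError("the first line must start with '.'")
    meta_names = [name.lower() for name in header[1:]]

    metadata = {}
    for line in lines[1:]:
        # The first value names the content item; the rest are metadata values.
        content_name, *values = line.split(delimiter)
        metadata[content_name] = dict(zip(meta_names, values))
    return metadata


table = """.|title|abstract
Main.pm|HTML::WebMake::Main|module documentation
webmake|webmake(1)|script documentation
"""
parsed = parse_metatable(table)
print(parsed["Main.pm"]["title"])       # HTML::WebMake::Main
print(parsed["webmake"]["abstract"])    # script documentation
```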
XML Format
The XML block is surrounded with a <metaset> tag, and contains
<target> blocks naming the content items the enclosed metadata items are
associated with.
Inside the <target> blocks, <meta> tags name each metadatum, and
enclose the metadatum's value.
<metaset>
<target id="foo.txt">
<meta name="title">
This is Foo.txt's title.
</meta>
</target>
</metaset>
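For comparison with the delimiter-separated format, the XML layout can be read with a few lines of Python. This sketch uses the standard xml.etree.ElementTree module rather than anything from WebMake; the function name parse_metaset and the trimming of surrounding whitespace from values are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_metaset(xml_text):
    """Parse a <metaset> block into {content_name: {meta_name: value}}."""
    root = ET.fromstring(xml_text)
    metadata = {}
    for target in root.findall("target"):
        entries = metadata.setdefault(target.get("id"), {})
        for meta in target.findall("meta"):
            # Assumption: whitespace around the enclosed value is not significant.
            entries[meta.get("name").lower()] = (meta.text or "").strip()
    return metadata


xml_block = """<metaset>
  <target id="foo.txt">
    <meta name="title">
      This is Foo.txt's title.
    </meta>
  </target>
</metaset>"""
parsed = parse_metaset(xml_block)
print(parsed)
```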
Using <metatable> To Tag Non-Content Items
Often, you will need to attach metadata to non-content items, such as images,
or HTML files that are not generated by WebMake. Here's how to do this.
First, load the URLs of the items using a <media> tag. Then
create an empty content item for each one (possibly automated using the
<for> tag). The URLs from the <media> tag will
automatically take precedence over the URLs of the fake content items.
Then use a metatable, as above, to set the metadata you wish to use.
The <attrdefault> Tag
Attributes are usually specified inside a content item's <content> or <contents> tags, or, for output files, inside
the <out> tag. However, this can be a chore if you have many
items to set attributes on, so, to make things easier, you can specify default
attributes using the <attrdefault> tag.
Specify this tag before the content items or output files in question, and
those items will all be tagged with the attributes you set.
Like the metadefault tag, this tag can be used either in a
scoped mode, or in a command mode.
Scoped Mode
"Scoped" mode uses opening (<attrdefault>) and closing
(</attrdefault>) tags; the attributes are only set on content items
or output files between the two tags.
A warning about "scoped" mode: WebMake does not use
a fully-correct XML parser to parse the XML in the .wmk file, so if you nest
<attrdefault> tags, it will not parse them correctly; instead, the
first closing </attrdefault> tag found will be used.
Command Mode
Command mode uses standalone tags (<attrdefault ... />); the
attributes are set until the end of the WebMake file, or until you change
them with another <attrdefault> tag.
Attributes
-
name
-
the attribute's name, e.g. up, map,
etc. This is required.
-
value
-
the attribute's value. This is optional. If the
value is not specified, the attribute will be removed from the list of
default attributes.
Example
Using the scoped style:
<attrdefault name="format" value="text/html">
<content name="chunk_1.txt">...</content>
<content name="chunk_2.txt">...</content>
<content name="chunk_3.txt">...</content>
<content name="chunk_4.txt">...</content>
</attrdefault>
Or, in the "command" style:
<attrdefault name="format" value="text/html" />
<content name="chunk_1.txt">...</content>
<content name="chunk_2.txt">...</content>
<content name="chunk_3.txt">...</content>
<content name="chunk_4.txt">...</content>
<attrdefault name="format" />
Contents for the 'Magic Variables' section
The ${IMGSIZE} Magic Variable
This reference provides an easy way to automatically add image size
information to an <img> tag, for example:
<img src="foo.gif" ${IMGSIZE}>
Would become:
<img src="foo.gif" height=30 width=11>
It requires that the Image::Size Perl module be installed; otherwise,
it does nothing.
The $(TOP/) Magic Variable
This URL reference always evaluates to a relative path from the current
output page to the top level of the site.
Note that setting the EtTextHrefsRelativeToTop option will cause all URLs
in Text::EtText blocks, which don't start with a slash or a protocol
specification, to be made relative to the top-level of the site.
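To show what "a relative path to the top level" means in practice, here is a small Python sketch that derives such a prefix from an output filename. It is a guess at the general idea, not WebMake's code: the function name top_prefix is hypothetical, and the exact string WebMake produces for a page already at the top level (empty here) is an assumption.

```python
import posixpath


def top_prefix(out_file):
    """Relative path from the directory of `out_file` up to the site's top level."""
    directory = posixpath.dirname(posixpath.normpath(out_file))
    if not directory:
        return ""                       # already at the top level
    depth = len(directory.split("/"))   # one "../" per directory level
    return "../" * depth


print(top_prefix("index.html"))
print(top_prefix("docs/index.html"))
print(top_prefix("docs/ref/page.html"))
```

A page written to docs/ref/page.html would therefore reach a top-level image with a URL such as ../../logo.png (the image name is only an example).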
The ${WebMake.*} Magic Variables
WebMake defines several magic variables that expand to useful information about
the current environment. These are as follows. Each one is illustrated with
the value at the time this documentation was generated.
-
WebMake.Version
-
The version of WebMake
that generated this site. (1.2)
-
WebMake.GeneratorString
-
A generator string for
WebMake; this is in the form WebMake/v.vv where v.vv
is the version number of WebMake. (WebMake/1.2)
-
WebMake.Who
-
The username of the person who generated
the site. (jm)
-
WebMake.Time
-
The time the site was last generated.
(Thu May 3 13:01:51 2001)
-
WebMake.OutFile
-
The filename used in the current <out> tag.
(allinone.html)
-
WebMake.OutName
-
The name used in the current <out> tag.
(allinone)
-
WebMake.PerlLib
-
The directory WebMake expects to find
Perl code library files (ie. plugins) in. (/nethome/jm/ftp/webmake/lib/HTML/WebMake/PerlLib)
Contents for the 'Format Converters' section
The Text::EtText Format Converter
This converter converts from Text::EtText, a simple plain-text format, to
HTML. Like most simple text markup formats (POD, setext, etc.), EtText markup
handles the usual things: insertion of <P> tags, header recognition
and markup. However, it also adds a powerful link markup system.
EtText is no longer included in WebMake; instead it must be downloaded
separately from http://ettext.taint.org/, where there is also a more
detailed set of documentation. However, this page is retained in WebMake
for quick reference.
Basic Text Markup
If you leave blank lines between paragraphs, <p> and
</p> tags will be inserted in the correct places.
EtText does quite a good job of this.
Words wrap and fill automatically, so there's no need to worry about wrapping
before 80 characters. (It's good form to do so anyway, in case other people
ever need to edit your text, though.)
A paragraph consisting of a line of 10 or more consecutive - or _ signs will
be converted to an <hr> tag.
Sections of text between pairs of certain characters will be turned into
markup, as follows:
EtText | Tag Used | Result
**text** | <strong> | text
__text__ | <em> | text
##text## | <code> | text
& signs that have whitespace on either side will be converted
to &amp; entities automatically.
Text indented from the left margin will be converted into a <P>
paragraph wrapped in a <blockquote> -- unless it starts with a
* , - , + or o character
followed by whitespace, in which case it's interpreted as a list item; see
Lists below.
Another exception to the above rule is that text indented by only 1 space, or
on lines starting in the first column with two colon characters, will be
surrounded by <pre> tags.
If you find writing HTML tag-pairs manually annoying, EtText includes an idea
from Latte; balanced-tag generation. Wrap the text to be tagged with
the name of the tag followed immediately by a { character on the left, and a }
character on the right. In other words,
strong{text}
will be rendered as
<strong>text</strong>
or, in other words, text in bold. This can be nested, so strong{text
with i{italic} bits} will be rendered as
<strong>text with <i>italic</i> bits</strong>.
In addition, the balanced-tag support has a bonus feature, in that it supports
CSS classes; follow the name of the tag with a full stop and the class, and
it will use that class, like so:
i.green{foo}
will be rendered as
<i class="green">foo</i>
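The balanced-tag rules (tag name, optional .class, braces, nesting) can be modelled with a short Python sketch. This is not EtText's implementation; the regular expression and the function name expand_balanced_tags are assumptions, and the sketch would also rewrite any other word that happens to be followed directly by braces.

```python
import re

# A tag name, an optional ".class", then {text} containing no nested braces.
INNERMOST = re.compile(
    r'\b([A-Za-z][A-Za-z0-9]*)(?:\.([A-Za-z_][\w-]*))?\{([^{}]*)\}')


def expand_balanced_tags(text):
    """Expand tag{text} and tag.class{text} into HTML tag pairs."""
    def to_html(match):
        tag, css_class, inner = match.groups()
        attrs = f' class="{css_class}"' if css_class else ""
        return f"<{tag}{attrs}>{inner}</{tag}>"

    # Expand innermost pairs first, repeating until nothing changes,
    # so that nested constructs are handled from the inside out.
    previous = None
    while previous != text:
        previous = text
        text = INNERMOST.sub(to_html, text)
    return text


print(expand_balanced_tags("strong{text}"))
print(expand_balanced_tags("strong{text with i{italic} bits}"))
print(expand_balanced_tags("i.green{foo}"))
```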
Lists
A paragraph indented from the left margin (by either spaces or tabs, or both),
and starting with a * , - , + or
o character followed by whitespace, will be converted into a list
item (<li> tag).
The same goes for indented paragraphs that start with the string
1. , followed by whitespace. However the default list tag in this
case will be an <ol>...</ol> list. Any positive integer
followed immediately by a full stop and a space will do the trick. (BTW: I
used to use # to do this, but I preferred the WikiIdea, it
looks better.)
(Compatibility note: previous versions of EtText required that the
<ul> or <ol> tags be written manually. This is no
longer the case.)
Some text editors (such as vim) will reformat list items automatically,
assuming that you want the text to line up with the start of the text, instead
of the bullet-point character, on the previous line, like so:
  - this is a list item. We should make sure that
    blah blah etc. etc.
WebMake supports this.
Indented paragraphs that start with term: tab rest of paragraph will be converted
into definition lists (this is another StolenFromWikiIdea). They
look like this:
-
Foo
-
Blah blah blah etc.
Sidebars and Side Images
If you wish to display an image, or small sidebar, beside a paragraph of text,
use the <etleft> and <etright>
tags. These are rendered as a one-row, two-column
<table> wrapping the paragraph and the sidebar, as
follows:
<etleft><img src=bubba.png></etleft>This is the main
paragraph body. Foo bar baz blah blah blah etc.
Is displayed as:
[image: bubba.png] | This is the main paragraph body. Foo bar baz blah blah blah etc.
<etright><img src=bubba.png></etright>This is the
main paragraph body. Foo bar baz blah blah blah etc.
Is displayed as:
This is the main paragraph body. Foo bar baz blah blah blah etc. | [image: bubba.png]
When HTML and EtText Collide
HTML tags can be used freely throughout an EtText document. However, in some
situations, you may wish to preserve whitespace, avoid paragraph tags being
added, etc.; to use your own HTML without meddling from EtText, wrap it in an
<!--etsafe-->...<!--/etsafe-->
tag pair; this will protect it.
Note that text blocks wrapped in <pre>,
<listing> and <xmp> tags are
automatically protected in this way; the <!--etsafe-->
tag pair is not required.
EtText adds two entities, &etsqi; and &etsqo;. These represent
[ and ] respectively, and are used to protect a square-bracketed
piece of text from being interpreted as a link URL (see EtText Links
below).
EtText Links
As well as the standard <a href=url>...</a> link
specification used in HTML, EtText will automatically add href tags for URLs
and email addresses that occur in the text. In addition, EtText supports its
own link format, as follows.
The basic concept is of a word or "quoted set of words" followed by a link
label in [square brackets], like this: "this is a link"
[label].
The href used in the link is then defined at another point in the document, as
an indented line like this:
[label]: http://url...
Text and markup can be enclosed in the quotes, everything quoted will become
part of the link text. Single words or HTML tags do not need to be quoted, so
this will work correctly:
<img src="http://jmason.org/license_plate.jpg"
width="10" height="10" /> [homepage]
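The label mechanism described above can be sketched in Python as a two-step process: collect the indented [label]: url definitions, then rewrite each reference. This is a simplified model, not EtText's parser. The function name resolve_links is hypothetical, and the sketch only handles quoted phrases and single words as link text, not HTML tags containing spaces.

```python
import re

# An indented definition line:   [label]: http://url...
DEFINITION = re.compile(r'^[ \t]+\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$', re.MULTILINE)
# A "quoted phrase" or a single word, followed by [label].
REFERENCE = re.compile(r'(?:"([^"]+)"|(\S+))\s+\[([^\]]+)\]')


def resolve_links(text):
    """Replace `"link text" [label]` with an <a> tag, using the definitions."""
    links = {label: url for label, url in DEFINITION.findall(text)}
    text = DEFINITION.sub("", text)     # definitions do not appear in output

    def to_anchor(match):
        quoted, word, label = match.groups()
        if label not in links:
            return match.group(0)       # unknown label: leave the text alone
        return f'<a href="{links[label]}">{quoted or word}</a>'

    return REFERENCE.sub(to_anchor, text).strip()


sample = 'Visit "my home page" [home] today.\n  [home]: http://example.org/\n'
print(resolve_links(sample))
```

The URL http://example.org/ and the label home are placeholders chosen for the example.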
Glossary Links
EtText also supports a concept called glossary links; if you define a
link, the name of that link will automatically become a href if enclosed in
quotes. For example:
[Justin Mason]: http://jmason.org/
will mean that any occurrence of the name "Justin Mason", in quotes, in
any EtText content chunk or file in the site, becomes a link to that
address. These links are stored in the WebMake cache file.
Quoted bits of text that do not map to an entry in the glossary are not
converted to links (unless they're followed by a square-bracketed link-label
reference).
URLs, such as http://webmake.taint.org/ , and email addresses, such as
jm@nospam-jmason.org, are automatically converted into links to that same
address.
Blocking EtText Link Interpretation
To block interpretation as a link, replace square brackets with the HTML
entities &etsqi; and &etsqo;, which map to [ and ]
respectively; replace quote characters, ", with two apostrophes,
''. If that doesn't do the trick, wrap the entire section of text
with the <!--etsafe-->...<!--/etsafe--> tags.
Similar Systems
EtText-like plain-text-to-markup conversion systems have a long history. The
first time I came across the concept was with Setext, which was
included with Tony Sanders' Plexus web server, back in September 1993.
Yes, 1993. Setext has been around for a while!
WikiWikiWeb is quite a recent, well-established system which uses
a similar markup style.
Userland's Frontier includes a text-to-markup conversion
system as well.
Some well-known sites that use their own converters to convert
plain-text to markup include http://www.blogger.com/, http://slashdot.org/
(for comments) and http://www.advogato.org/.
Jorn Barger maintains an impressive summary of etext formats at his Robot
Wisdom site. Skip down to section 3, Internet etext
standards, for the directly-relevant stuff.
Zope and ZWiki use a format called StructuredText, which again comes from
WikiLand. There's some interesting work going on there with the STXDocument
object, which is a web-manageable object that contains information marked up
in the structured text format.
The POD Format Converter
This converter converts from POD to HTML, using Tom Christiansen's
Pod::Html module.
POD is a powerful, but simple, editable-text format for marking up
manual-page-style documentation. See the "perlpod" manual page in your Perl
documentation for more information on the POD format.
Things to watch out for in WebMake's support for POD:
-
Anything before the <BODY> tag, or after the </BODY>
tag, in the generated output is stripped, so that the POD output can be
embedded in HTML pages without requiring a page of its own.
-
WebMake allows options to pod2html to be specified using
the podargs attribute of the <content> tag; see below.
-
If you are reading POD documentation embedded inside other files,
you should probably use the "asis" attribute on the content items in
question, otherwise all sorts of weird things could happen as WebMake tries
to interpret Perl variable references and so on! See the <content> documentation for details on "asis".
Specifying Options to the POD Translator
If you want to specify pod2html options to the converter, just
put them in a string as a podargs attribute of the <content> tag,
like so:
<content name="some_pod" podargs="--noindex">
...
</content>
The HTML Cleaner
The HTML cleaner is a powerful filter which can polish grotty, messy HTML into
fully-standards-compliant glory. By default, all output of format
text/html (the default format) will be passed through it.
It is controlled using the clean parameter of the <out> tag.
The features to be used should be listed in this parameter's value, separated
by whitespace.
Here are the features available:
-
pack - Compress the HTML, removing all white space that is not
part of an attribute's value, or inside <xmp> or <pre> tags.
-
nocomments - Trim all comments.
-
addimgsizes - Add image sizes to <img> tags if they do not
already specify them.
-
cleanattrs - Quote all attributes in opening tags, and lowercase
all tag names.
-
addxmlslashes - Add XML-style slashes to the end of empty-element
tags, such as <hr>, <img> etc.
-
fixcolors - Fix colors that do not start with a # character, so that
they do.
The feature string all can be used to include all cleaning modes.
This is the default.
Contents for the 'Module Documentation' section
HTML::WebMake::Content
Content - a content item.
<{perl
$cont = get_content_object ("foo.txt");
[... etc.]
}>
This object allows manipulation of WebMake content items directly.
-
$text = $cont->get_name();
-
Return the content item's name.
-
$text = $cont->as_string();
-
A textual description of the object for debugging purposes; currently its
name.
-
$fname = $cont->get_filename();
-
Get the filename or datasource location that this content was loaded from.
Datasource locations look like this:
proto:protocol-specific-location-data, e.g. file:blah/foo.txt or
http://webmake.taint.org/index.html.
-
@filenames = $cont->get_deps();
-
Return an array of filenames and locations that this content depends on,
i.e. the filenames or locations that it contains variable references to.
-
$flag = $cont->is_generated_content();
-
Whether or not a content item was generated from Perl code, or is metadata.
Generated content items cannot themselves hold metadata.
-
$val = $cont->expand()
-
Expand a content item, as if in a curly-bracket content reference. If the
content item has not been expanded before, the current output file will be
noted as the content item's ''main'' URL.
-
$val = $cont->expand_no_ref()
-
Expand a content item, as if in a curly-bracket content reference. The
current output file will not be used as the content item's ''main'' URL.
-
$val = $cont->get_metadata($metaname);
-
Get an item of this object's metadata, e.g.
$score = $cont->get_metadata("score");
The metadatum is converted to its native type, e.g. score is returned as an integer, title as a string, etc. If the metadatum is not provided, the default value for
that item, defined in HTML::WebMake::Metadata, is used.
-
$score = $cont->get_score();
-
Return a content item's score.
-
$title = $cont->get_title();
-
Return a content item's title.
-
$modtime = $cont->get_modtime();
-
Return a content item's modification date, in UNIX time_t format, ie.
seconds since Jan 1 1970.
-
$order = $cont->get_declared();
-
Returns the content item's declaration order. This is a number representing
when the content item was first encountered in the WebMake file; earlier
content items have a lower declaration order. Useful for sorting.
-
@kidobjs = $cont->get_kids ($sortstring);
-
Get the child content items for this item. The ''child'' content items are
items that use this content as their up metadatum.
Returns a list of content objects in unsorted order.
-
@kidobjs = $cont->get_sorted_kids ($sortstring);
-
Get the child content items for this item. The ''child'' content items are
items that use this content as their up metadatum.
Returns a list of content objects sorted by the provided sort string.
-
$text = $cont->get_url();
-
Get a content item's URL. The URL is defined as the first page listed in
the WebMake file's out tags which refers to that item of content.
Note that, in some cases, the content item may not have been referred to
yet by the time its get_url() method is called. In this case,
WebMake will insert a symbolic tag, hold the file in memory, and defer
writing the file in question until all other output files have been
processed and the URL has been found.
HTML::WebMake::Main
HTML::WebMake - a simple web site management system, allowing an entire
site to be created from a set of text and markup files and one WebMake
file.
my $f = new HTML::WebMake::Main ();
$f->readfile ($filename);
$f->make();
my $failures = $f->finish();
exit $failures;
WebMake is a simple web site management system, allowing an entire site to
be created from a set of text and markup files and one WebMake file.
It requires no dynamic scripting capabilities on the server; WebMake sites
can be deployed to a plain old FTP site without any problems.
It allows the separation of responsibilities between the content editors,
the HTML page designers, and the site architect; only the site architect
needs to edit the WebMake file itself, or know perl or WebMake code.
A multi-level website can be generated entirely from 1 or more WebMake
files containing content, links to content files, perl code (if needed),
and output instructions. Since the file-to-page mapping no longer applies,
and since elements of pages can be loaded from different files, this means
that standard file access permissions can be used to restrict editing by
role.
Since WebMake is written in perl, it is not limited to command-line
invocation; using the HTML::WebMake::Main module directly allows WebMake to be run from other Perl scripts, or even
mod_perl (WebMake uses use strict throughout, and temporary globals are used only where strictly necessary).
-
$f = new HTML::WebMake::Main
-
Constructs a new HTML::WebMake::Main object. You may pass the following attribute-value pairs to the
constructor.
-
force_output
-
Force output. Normally if a file is already up to date, it is not modified.
This will force the file to be re-made.
-
force_cache_rebuild
-
Force the cached metadata and dependency data for the site to be rebuilt.
Normally this is used to speed up partial rebuilds of the site. This option
implies force_output.
-
risky_fast_rebuild
-
Run more quickly, but take more risks. Normally, dynamic content, such as
Perl sections, sitemaps, or navigation links, are always considered to be
in need of rebuilding, as mapping their dependencies is often very
difficult or impossible. This switch forces them to be ignored for
dependency-tracking purposes, and so an output file that depends on them
will not be rebuilt unless a normal content item on that page changes.
-
base_href
-
Rewrite links to be absolute URLs based at this URL. By default, links are
specified as relative wherever possible.
-
base_dir
-
Generate output, and look for support files (images etc.), relative to this
directory.
-
paranoid
-
Paranoid mode; do not allow perl code evaluation or accesses to directories
above the WebMake file.
-
debug
-
Debug mode; more output.
-
$f->set_option ($optname, $optval);
-
Set a WebMake option. Currently supported options are:
-
$f->readfile ($filename)
-
Read and parse the given WebMake file.
-
$f->make ()
-
Make all outputs, based on the WebMake files read earlier.
-
$num_failures = $f->finish();
-
Finish with a WebMake object and dispose of its internal open files etc.
Returns the number of serious failure conditions that occurred (files that
could not be created, etc.).
See also http://webmake.taint.org/
for more information.
webmake, ettext2html, ethtml2text, HTML::WebMake, Text::EtText::EtText2HTML, Text::EtText::EtHTML2Text
Justin Mason <jm /at/ jmason.org>
WebMake is distributed under the terms of the GNU Public License.
The latest version of this library is likely to be available from CPAN as
well as:
http://webmake.taint.org/
HTML::WebMake::PerlCodeLibrary
PerlCodeLibrary - a selection of functions for use by perl code embedded in
a WebMake file.
<{perl
$foo = get_content ($bar);
[... etc.]
# or:
$foo = $self->get_content ($bar);
[... etc.]
}>
These functions allow code embedded in a <{perl}> or
<{perlout}> section of a WebMake file to be used to script the
generation of content.
Each of these functions is defined both as a standalone function and as a
function on the PerlCode object. Code in one of the <{perl*}> sections can access this PerlCode object as the $self variable. If you plan to use WebMake from mod_perl or in a threaded
environment, be sure to call them as methods on $self.
-
@names = content_matching ($pattern);
-
Find all items of content that match the glob pattern $pattern . If
$pattern begins with the prefix RE:, it is treated as a regular expression. The list of items returned is not
in any logical order.
-
@objs = content_names_to_objects (@names);
-
Given a list of content names, convert to the corresponding list of content
objects, ie. objects of type HTML::WebMake::Content.
-
$obj = get_content_object ($name);
-
Given a content name, convert to the corresponding content object, ie.
objects of type HTML::WebMake::Content.
-
@names = content_objects_to_names (@objs);
-
Given a list of objects of type HTML::WebMake::Content, convert to the corresponding list of content name strings.
-
@sortedobjs = sort_content_objects ($sortstring, @objs);
-
Sort a list of content objects by the sort string $sortstring . See ''sorting.html'' in the WebMake documentation for details on sort
strings.
-
@names = sorted_content_matching ($sortstring, $pattern);
-
Find all items of content that match the glob-style pattern $pattern . The list of items returned is ordered according to the sort string $sortstring . If $pattern begins with the prefix RE:, it is treated as a regular expression.
See ''sorting.html'' in the WebMake documentation for details on sort
strings.
This, by the way, is essentially implemented as follows:
my @list = $self->content_matching ($pattern);
@list = $self->content_names_to_objects (@list);
@list = $self->sort_content_objects ($sortstring, @list);
return $self->content_objects_to_names (@list);
-
$str = get_content ($name);
-
Get the item of content named $name . Equivalent to a $ {content_reference}.
-
@list = get_list ($name);
-
Get the item of content named $name, but in Perl list format. It is assumed that
the list is stored in the content item in whitespace-separated format.
-
set_content ($name, $value);
-
Set a content chunk to the value provided. This content will not appear in
a sitemap, and navigation links will never point to it.
Returns the content object created.
-
set_list ($name, @values);
-
Set a content chunk to a list containing the values provided, separated by
spaces. This content will not appear in a sitemap, and navigation links
will never point to it.
Returns the content object created.
-
set_mapped_content ($name, $value, $upname);
-
Set a content chunk to the value provided. This content will appear in a
sitemap and the navigation hierarchy. $upname should be the name of its parent content item. This item must not be
metadata, or other dynamically-generated content; only first-class mapped
content can be used.
Returns the content object created.
-
del_content ($name);
-
Delete a named content chunk.
-
@names = url_matching ($pattern);
-
Find all URLs (from <out> and <media> tags) whose name matches the glob-style pattern $pattern. The names of the URLs, not the URLs themselves, are returned. If $pattern begins with the prefix RE:, it is treated as a regular expression.
-
$url = get_url ($name);
-
Get a named URL. Equivalent to an $ (url_reference).
-
set_url ($name, $url);
-
Set a URL to the value provided.
-
del_url ($name);
-
Delete a URL.
-
$listtext = make_list ($itemname, @namelist);
-
Generate a list by iterating through @namelist, setting the content item
item to the current name, and interpreting the content chunk named
$itemname. This content chunk should refer to $ {item} appropriately.
Each resulting block of content is appended to $listtext, which is
finally returned.
See the news_site.wmk sample site for an example of this in use.
-
define_tag ($tagname, \&handlerfn, @required_attributes);
-
Define a tag for use in content items. Any occurrences of this tag, with at
least the set of attributes defined in @required_attributes, will cause the
handler function referred to by handlerfn to be called.
Handler functions are called as follows:
handler ($tagname, $attrs, $text, $perlcode);
Where $tagname is the name of the tag, $attrs is
a reference to a hash containing the attribute names and the values used in
the tag, and $text is the text between the start and end tags.
$perlcode is the PerlCode object, allowing you to write proper
object-oriented code that can be run in a threaded environment or from
mod_perl. This can be ignored if you like.
This function returns an empty string.
-
define_empty_tag ($tagname, \&handlerfn, @required_attributes);
-
Define a tag for use in content items. This is identical to define_tag
above, but is intended for use to define ''empty'' tags, ie. tags which
occur alone, not as part of a start and end tag pair.
The handler in this case is called with an empty string for the
$text argument.
-
define_wmk_tag ($tagname, \&handlerfn, @required_attributes);
-
Define a tag for use in the WebMake file.
Aside from operating on the WebMake file instead of inside content items,
this is otherwise identical to define_tag above.
-
define_empty_wmk_tag ($tagname, \&handlerfn, @required_attributes);
-
Define an empty, aka. standalone, tag for use in the WebMake file.
Aside from operating on the WebMake file instead of inside content items,
this is otherwise identical to define_empty_tag above.
-
$obj = get_root_content_object();
-
Get the content object representing the ''root'' of the site map. Returns
undef if no root object exists, or the WebMake file does not contain a
<sitemap> command.
-
$name = get_current_main_content();
-
Get the ''main'' content on the current output page. The ''main'' content
is defined as the most recently referenced content item which (a) is not
generated content (perl code, sitemaps, breadcrumb trails etc.), and (b)
has its map attribute set to ``true''.
Note that this API should only be called from a deferred content reference;
otherwise the ''main'' content item may not have been referenced by the
time this API is called.
undef is returned if no main content item has been referenced.
-
$main = get_webmake_main_object();
-
Get the current WebMake interpreter's instance of the HTML::WebMake::Main
object. Virtually all of WebMake's functionality and internals can be
accessed through this.
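The handler calling convention described for define_tag can be sketched as a plain Perl function. The tag name shout, its level attribute, and the function name handle_shout are all hypothetical, chosen only for illustration; the sketch also assumes that whatever the handler returns stands in for the tag in the content, which is not stated above. The handler is called directly here so that the example runs without WebMake.

```perl
use strict;
use warnings;

# Hypothetical handler for a hypothetical "shout" tag. The arguments follow
# the documented convention: handler ($tagname, $attrs, $text, $perlcode).
sub handle_shout {
    my ($tagname, $attrs, $text, $perlcode) = @_;

    # $attrs is a reference to a hash of attribute names and values.
    my $level = $attrs->{level};

    # Assumption: the returned string replaces the tag in the content.
    return "<h$level>" . uc ($text) . "</h$level>";
}

# Inside a <{perl}> section this would be registered with something like:
#   $self->define_tag ("shout", \&handle_shout, "level");
# Here the handler is called directly, with undef for the PerlCode object.
my $html = handle_shout ("shout", { level => 2 }, "hello", undef);
print "$html\n";    # prints "<h2>HELLO</h2>"
```

Passing "level" as a required attribute in the registration call means, per the description above, that the handler is only invoked for tags that carry at least that attribute.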
Contents for the 'Manual Pages' section
webmake(1)
webmake - a simple web site management system, allowing an entire site to
be created from a set of text and markup files and one WebMake file.
webmake [option ...]
webmake [option ...] [-f webmakefile]
webmake [option ...] [-R dir_or_file]
WebMake is a simple web site management system, allowing an entire site to
be created from a set of text and markup files and one WebMake file.
It requires no dynamic scripting capabilities on the server; WebMake sites
can be deployed to a plain old FTP site without any problems.
It allows the separation of responsibilities between the content editors,
the HTML page designers, and the site architect; only the site architect
needs to edit the WebMake file itself, or know perl or WebMake code.
A multi-level website can be generated entirely from 1 or more WebMake
files containing content, links to content files, perl code (if needed),
and output instructions. Since the file-to-page mapping no longer applies,
and since elements of pages can be loaded from different files, this means
that standard file access permissions can be used to restrict editing by
role.
Text can be edited as standard HTML, converted from plain text (using the
included Text::EtText module), or converted from any other format by adding
a conversion method to the WebMake::FormatConvert module.
Since URLs can be referred to symbolically, pages can be moved around and
URLs changed by changing just one line. All references to that URL will
then change automatically.
Content items and output URLs can be generated, altered, or read in
dynamically using perl code. Perl code can even be used to generate other
perl code to generate content/output URLs/etc., recursively.
-
-f
-
The WebMake file to read and generate output from. If this option is not
supplied, the default behaviour is to search the current directory and its
parents for a file ending in .wmk .
-
-F
-
Force output. Normally if a file is already up to date, it is not modified.
This will force the file to be re-made.
-
-r
-
Run more quickly, but take more risks. Normally, dynamic content, such as
Perl sections, sitemaps, or navigation links, is always considered to be
in need of rebuilding, as mapping its dependencies is often very
difficult or impossible. This switch forces such content to be ignored for
dependency-tracking purposes, and so an output file that depends on it
will not be rebuilt unless a normal content item on that page changes.
-
-b basehref
-
Rewrite links to be absolute URLs based at this URL. By default, links are
specified as relative wherever possible.
-
-d basedir
-
Generate output, and look for support files (images etc.), relative to this
directory.
-
-p
-
Paranoid mode; do not allow perl code evaluation or accesses to directories
above the WebMake file.
-
-D
-
Debug mode; more output.
-
-L
-
Debug level; how much debug output to produce. 0 means no debug output, 3
means lots.
-
-C dir
-
Change to this directory before reading files or generating output.
-
-R dir_or_file
-
If dir_or_file is a directory, change to that directory, or if it is a file, change to
that file's parent directory, before starting.
The webmake command is part of the HTML::WebMake Perl module. Install this as a normal Perl module, using perl -MCPAN -e shell, or by hand.
No environment variables, aside from those used by perl, are required to be
set.
See also: webmake, ettext2html, ethtml2text, HTML::WebMake, Text::EtText
Author: Justin Mason <jm /at/ jmason.org>
Prerequisites: HTML::Entities, File::Spec, File::Path, File::Basename, Carp, Cwd
Image::Size is required to support the IMGSIZE tag. If this tag is not used, or if the
module is not available, webmake can still operate acceptably.
Contents for the 'Plugins and Libraries' section
csvtable_tag.wmk
< use plugin="csvtable_tag" />
< csvtable [delimiter="char"] [HTML table attributes] >
[...cells...]
< /csvtable >
This WebMake Perl library provides a tag to allow HTML tables to be
constructed, quickly, using a tab-, comma-, or pipe-separated value table.
Firstly, pick a delimiter character, such as | . Set the delimiter
attribute to this character.
Each line of the CSV table will become a < TR >; each
delimiter-separated cell will be enclosed in a < TD > tag pair.
Attributes for the HTML table tag itself can be provided as attributes to
this tag; they will be passed through into the resulting < TABLE >
tag.
By default, items inside the tables are represented as < TD > cells,
with no attributes. Certain special line prefixes allow control over
formatting of table items, as follows. These are all case-insensitive, and
whitespace after them will be stripped; but they must start on the first
character of the line (no leading spaces), and, despite how they're
rendered here, should not contain any spaces between the angle brackets.
Blank lines are skipped.
-
< !-- .... -- >
-
Comments, a la HTML.
-
< csvfmt >
-
The rest of the line is used to specify the format to be used for each line
afterwards, until the end of the < csvtable >, or until the next
< csvfmt > line.
The line should end in a < /csvfmt > closing tag.
Specify a < tr >...< /tr > block, with $1, $2, $3, etc. for the
numbered cells (counting from 1). For example:
< csvfmt >< tr >< td >$1< /td >< td >$2< /td >< td >$3< /td >< /tr >< /csvfmt >
< csvtable delimiter="|" >
< !-- heading -- >
< csvfmt >< tr >< th >$1< /th >< th >$2< /th >< th >$3< /th >< /tr >< /csvfmt >
First Name|Surname|Title
< !-- contents -- >
< csvfmt >< tr >< td >$1< /td >< td >$2< /td >< td >$3< /td >< /tr >< /csvfmt >
Justin|Mason|JAPH
Foo|Bar|Baz
< /csvtable >
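The way a csvfmt format line is applied to each delimited line can be sketched in standalone Perl. This is not the plugin's own code: the function name format_csv_line is hypothetical, and the sketch only illustrates the substitution of $1, $2, $3 by the numbered cells as described above.

```perl
use strict;
use warnings;

# Expand one delimited line using a csvfmt-style template, in which
# $1, $2, ... stand for the numbered cells (counting from 1).
sub format_csv_line {
    my ($template, $delimiter, $line) = @_;

    # \Q...\E quotes the delimiter so that "|" is matched literally;
    # the limit of -1 keeps trailing empty cells.
    my @cells = split /\Q$delimiter\E/, $line, -1;

    # Replace each $N with cell N, or with nothing if the cell is missing.
    my $row = $template;
    $row =~ s/\$([1-9]\d*)/defined $cells[$1 - 1] ? $cells[$1 - 1] : ''/ge;
    return $row;
}

# The template is single-quoted so that Perl does not interpolate $1 itself.
my $fmt = '<tr><td>$1</td><td>$2</td><td>$3</td></tr>';
print format_csv_line ($fmt, '|', 'Justin|Mason|JAPH'), "\n";
# prints "<tr><td>Justin</td><td>Mason</td><td>JAPH</td></tr>"
```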
Thanks to Chris Barrett; he suggested this tag.
download_tag.wmk
< use plugin="download_tag" />
< download file="filename.dat" [text="template"] />
This WebMake Perl library provides a quick shortcut to make links to files
for download.
The attributes supported are as follows:
-
file="filename.dat"
-
The filename to link to. If a file by this filename does not exist, a
warning will be printed.
Filenames should be specified relative to one of the following:
-
the top level of the site
-
-
the output file which contains the tag (not recommended, as it precludes
the tag being used in another output file in a different directory)
-
-
a directory named in the FileSearchPath WebMake option
-
-
text="template"
-
The link text to be used. The following content items are defined for use
inside the link text:
-
download.path
-
The real path to the file.
-
download.href
-
The path to the file, relative to the current output file.
-
download.name
-
The file's name, without directories.
-
download.mdate
-
The file's modification date, in ctime() format, e.g. Thu Mar 01 20:54:34
2001.
-
download.mtime
-
The file's modification date, in UNIX time_t format.
-
download.size_in_k
-
The file's size, in kilobytes (rounded up).
-
download.size
-
The file's size, in bytes.
-
download.owner
-
The file's owner.
-
download.group
-
The file's group.
-
download.tag_attrs
-
The remaining attributes of the download tag.
template can be a $ {content_reference}. The default template is:
< a href="$ {download.href}" $ {download.tag_attrs}>$ {download.name}
($ {download.size_in_k}k)< /a>
Note that this means that any unrecognised attributes of the download tag
itself will become attributes of the A tag.
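As a rough illustration of what the default template produces, the following standalone Perl sketch fills it in for one file. The function names size_in_k and default_download_link are hypothetical, and the sketch assumes that a kilobyte means 1024 bytes, which the text above does not specify. It does not touch the filesystem; the href and size are passed in by hand.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# File size in kilobytes, rounded up. Assumption: 1k = 1024 bytes.
sub size_in_k {
    my ($bytes) = @_;
    return int (($bytes + 1023) / 1024);
}

# Fill in the default template: href, remaining tag attributes, the
# file's name without directories, and its size in kilobytes.
sub default_download_link {
    my ($href, $bytes, $tag_attrs) = @_;
    my $name = basename ($href);
    my $k = size_in_k ($bytes);
    return qq{<a href="$href" $tag_attrs>$name (${k}k)</a>};
}

print default_download_link ("files/filename.dat", 1025, 'class="dl"'), "\n";
# prints '<a href="files/filename.dat" class="dl">filename.dat (2k)</a>'
```

The class="dl" attribute stands in for an unrecognised attribute on the download tag, which ends up on the A tag as noted above.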
FileSearchPath - WebMake option
dump_vars.wmk
dump_vars.wmk - dump all WebMake variables and content items
< use plugin="dump_vars" />
$ {DumpVars_names}
$ {DumpVars_full}
Some debugging help. If you include this file in your WebMake file, it will
define these content items:
-
$ {DumpVars_names}
-
This content contains a list of the names of all content items defined.
-
$ {DumpVars_full}
-
This content contains a dump of all content items defined, including their
names and their values. It excludes $ {DumpVars_names} and $ {DumpVars_full}.
navtree.wmk
< use plugin="navtree" />
< navtree name=... sitemap=...
opennode=... closednode=...
thisnode=... thisleaf=...
leaf=... depth=... />
This WebMake plugin provides the navtree tag.
navtree operates similarly to the sitetree tag, but displays only a subset of all the site's nodes; it will map all of
the top-level nodes of the site, the parent nodes of the current page,
their direct children, and the current page plus its children up to depth depth. The effect is similar to a tree-view-based file browser, like Windows
Explorer.
This differs from the sitetree tag in that sitetree does not support displaying the current page's children.
So, for a site like this:
-
+ Section 1
-
-
+ Section 1 Subsection 1
-
-
+ Section 1 Subsection 2
-
-
+ Section 2
-
-
+ Section 2 Subsection 1
-
-
+ Section 2 Subsection 2
-
A reference to the site tree on page Section 1 would result in a site tree like this:
-
- Main Page
-
-
- Section 1
-
-
+ Section 2
-
Display of each page's entry in the tree is performed by expanding one of
the 5 template content items named in the tag's attributes: closednode,
opennode, thisnode, thisleaf or leaf. See the sitemap tag documentation for more details on how to use these (note however that
the is_node variable is not available for sitetrees).
-
name
-
The name of the sitetree object. To include a sitetree in a page, refer to
it using this name, as a deferred reference.
-
sitemap
-
The name of the sitemap. The sitetree requires a sitemap, as the sitemap is
responsible for mapping out the site and defining which pages and content
items are included.
-
closednode
-
A content item which is evaluated to display a ''closed'' node, ie. a node
which is not on the path to the current page.
-
opennode
-
A content item which is evaluated to display an ''open'' node, one which is
on the path to the current page. As for the sitemap tag's node attribute, this content item must include a reference to the list variable, which will contain all the entries for the pages beneath it in
the hierarchy.
-
rootnode
-
A content item which is evaluated to display an ''open'' root node. It
defaults to opennode if not specified. It may be used to generate a ''multirooted'' tree (a
forest). In that case you should create a dummy root content item (the
sitemap code is upset if you don't have one single root) and create a rootnode template that outputs only the list with appropriate decorations.
-
thisnode
-
A content item which is evaluated to display the current page if it is an
inner node, that is, if it has children. Iff depth > 0, thisnode must include a reference to the list variable.
-
thisleaf
-
A content item which is evaluated to display the current page if it is a
leaf.
-
leaf
-
A content item which is evaluated to display a leaf-node page, one which
has no pages beneath it in the hierarchy.
-
depth
-
How many levels beneath the current page should be listed. 0 means none
(the behavior of the sitetree tag). The default is 1, which lists the direct children of the current
node.
The following variables (content items) are defined for use in templates:
-
title
-
The title metadatum of the node.
-
score
-
The score metadatum of the node.
-
name
-
The name of the node.
-
url
-
The URL of the node. It should be referenced using a URL reference ($ (url)).
-
level
-
The level of the node, that is, how deep it is in the tree. The root node has
level 0, its children 1, their children 2, and so on.
-
sublvl
-
The level beneath the current page. This is similar to level, except that the current page is considered the root. It is -1 for nodes that are not descended
from the current page.
-
left
-
This is depth for nodes above the current node, and depth - sublvl for the descendants of the current node.
-
is_leaf
-
This is 1 for leaf nodes and 0 for inner nodes (both closed and open).
-
list
-
This is the list of children, which should be output by open nodes.
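The definitions of level and sublvl can be illustrated with a small standalone Perl sketch. This is not the plugin's code: the %parent table, the page names, and both function names are hypothetical, and the sketch assumes that the current page itself has sublvl 0, reading ''the current page is considered the root'' literally.

```perl
use strict;
use warnings;

# Hypothetical site: each page name maps to its parent (undef for the root).
my %parent = (
    'Main Page'              => undef,
    'Section 1'              => 'Main Page',
    'Section 1 Subsection 1' => 'Section 1',
    'Section 2'              => 'Main Page',
);

# level: how deep a node is in the tree; the root has level 0.
sub level {
    my ($node) = @_;
    my $depth = 0;
    while (defined $parent{$node}) {
        $node = $parent{$node};
        $depth++;
    }
    return $depth;
}

# sublvl: like level, but counted from the current page; -1 for nodes
# that are not descended from the current page.
sub sublvl {
    my ($node, $current) = @_;
    my $steps = 0;
    while (defined $node) {
        return $steps if $node eq $current;
        $node = $parent{$node};
        $steps++;
    }
    return -1;
}

# Print both values for every page, with 'Section 1' as the current page.
for my $page (sort keys %parent) {
    printf "%-24s level=%d sublvl=%d\n",
        $page, level ($page), sublvl ($page, 'Section 1');
}
```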
Thanks to Jan Hudec <bulb /at/ ucw.cz>, who provided this tag.
safe_tag.wmk
< use plugin="safe_tag" />
< safe>
...some data with HTML tags or WebMake references
< /safe>
<{perl
$safe_text = make_safe ($unsafe_text);
}>
This WebMake Perl library provides a way to ``make safe'' WebMake, EtText
or HTML data, escaping all metacharacters appropriately so that content
references, EtText links or HTML tags are not interpreted.
sitetree.wmk
< use plugin="sitetree" />
< sitetree name=... sitemap=...
opennode=... closednode=...
thispage=... leaf=... />
This WebMake Perl library provides the sitetree tag.
Sitetree operates similarly to the built-in sitemap tag, but displays only a subset of all the site's nodes; it will map all
of the top-level nodes of the site, and then only the parent nodes of the
current page. The effect is similar to a tree-view-based file browser, like
Windows Explorer.
In terms of differences in usage, where sitemap creates a single map which includes every page in the site, sitetree maps only the pages up to and including the current page, and generates a
map for each individual output page.
So, for a site like this:
-
+ Section 1
-
-
+ Section 1 Subsection 1
-
-
+ Section 1 Subsection 2
-
-
+ Section 2
-
-
+ Section 2 Subsection 1
-
-
+ Section 2 Subsection 2
-
A reference to the site tree on page Section 1 Subsection 1 would result in a site tree like this:
-
- Section 1
-
-
- Section 1 Subsection 1
-
-
+ Section 2
-
Display of each page's entry in the tree is performed by expanding one of
the 4 template content items named in the tag's attributes: closednode,
opennode, thispage, or leaf. See the sitemap tag documentation for more details on how to use these (note however that
the is_node variable is not available for sitetrees).
-
name
-
The name of the sitetree object. To include a sitetree in a page, refer to
it using this name, as a deferred reference.
-
sitemap
-
The name of the sitemap. The sitetree requires a sitemap, as the sitemap is
responsible for mapping out the site and defining which pages and content
items are included.
-
closednode
-
A content item which is evaluated to display a ''closed'' node, ie. a node
which is not on the path to the current page.
-
opennode
-
A content item which is evaluated to display an ''open'' node, one which is
on the path to the current page. As for the sitemap tag's node attribute, this content item must include a reference to the list variable, which will contain all the entries for the pages beneath it in
the hierarchy.
-
thispage
-
A content item which is evaluated to display the current page.
-
leaf
-
A content item which is evaluated to display a leaf-node page, one which
has no pages beneath it in the hierarchy.
Thanks go to Alex Canady, who came up with the idea for this one.