WebMake Documentation (version 1.1)

Data Sources for the <contents> and <media> Tags

Contents or URLs can be searched for using the <contents> or <media> tags, which allow you to search a data source (directory, delimiter-separated-values file, database etc.) for a pattern.

Currently two data source protocols are defined, file: and svfile: . More will probably follow, especially if other people contribute them, hint hint ;)

file: is the default protocol, if none is specified.

Attributes Supported By Datasource Tags

src

All datasources require this attribute, which specifies a protocol and path, in a URL-style syntax:

protocol:path

name

This attribute is used to specify the pattern of data, under this path, which will be converted into content items. The part of the data's location which matches this name pattern will become the name of the item. Typically, glob patterns, such as "*.txt" or ".../*.html" are used.

prefix

The items' names can be further modified by specifying a prefix and/or suffix; these strings are prepended or appended to the raw name to make the name the content is given.

suffix

See above.

namesubst

a Perl-formatted s// substitution, which is used to convert source filenames to content names.

nametr

a Perl tr// translation, which is used to convert source filenames to content names.

listname

a name of a content item. This content item will be created, and will contain the names of all content items picked up by the <contents> or <media> search.

In addition, the attributes supported by the content tag can be specified as attributes to <contents>, including format, up, map, etc.

The content blocks picked up from a <contents> search can also contain meta-data, such as headlines, visibilty dates, workflow approval statuses, etc. by including metadata.

The file: Protocol

The file: protocol loads content from a directory; each file is made into one content chunk. The src attribute indicates the source directory, the name attribute indicates the glob pattern that will pick up the content items in question. The filename of the file will be used as the content chunk's name.

<contents src="stories" name="*.txt" />

Note that the files in question are not actually opened until their content chunks are referenced using ${name} or get_content("name").

Normally only the top level of files inside the src directory are added to the content set. However, if the name pattern starts with .../, the directory will be searched recursively:

<contents src="stories" name=".../*.txt" />

The resulting content items will contain the full path from that directory down, i.e. the file stories/dir1/foo/bar.txt exists, the example above would define a content item called ${dir1/foo/bar.txt}.

The svfile: Protocol

The svfile: protocol loads content from a delimiter-separated-file; the src attribute is the name of the file, the name is the glob pattern used to catch the relevant content items. The namefield attribute specifies the field number (counting from 1) which the name pattern is matched against, and the valuefield specifies the number of the field from which the content chunk is read. The delimiter attribute specifies the delimiter used to separate values in the file.

<contents src="svfile:stories.csv" name="*" namefield=1 valuefield=2 delimiter="," />

Adding New Protocols

New data sources for <contents> and <media> tags are added by writing an implementation of the DataSourceBase.pm module, in the HTML::WebMake::DataSources package space (the lib/HTML/WebMake/DataSources directory of the distribution).

Every data source needs a protocol, an alphanumeric lowercase identifier to use at the start of the src attribute to indicate that a data source is of that type.

Each implementation of this module should implement these methods:

new ($parent)

instantiate the object, as usual.

add ()

add all the items in that data source as content chunks. (See below!)

get_location_url ($location)

get the location (in URL format) of a content chunk loaded by add().

get_location_contents ($location)

get the contents of the location. The location, again, is the string provided by add().

get_location_mod_time ($location)

get the current modification date of a location for dependency checking. The location, again, is in the format of the string provided by add().

Notes:

  • If you want add() to read the content immediately, call $self->{parent}->add_text ($name, $text, $self->{src}, $modtime).

  • add() can defer opening and reading content chunks straight away. If it calls $self->{parent}->add_location ($name, $location, $lastmod), providing a location string which starts with the data source's protocol identifier, the content will not be loaded until it is needed, at which point get_location_contents() is called.

  • This location string should contain all the information needed to access that content chunk later, even if add() was not been called. Consider it as similar to a URL. This is required so that get_location_mod_time() (see below) can work.

  • All implementations of add() should call $fixed = $self->{parent}->fixname ($name); to modify the name of each content chunk appropriately, followed by $self->{parent}->add_file_to_list ($fixed); to add the content chunk's name to the filelist content item.

  • Data sources that support the <media> tag need to implement get_location_url, otherwise an error message will be output.

  • Data sources that support the <contents> tag, and defer reading the content until it's required, need to implement get_location_contents, which is used to provide content from a location set using $self->{parent}->add_location().

  • Data sources that support the <contents> tag need to implement get_location_mod_time. This is used to support dependency checking, and should return the modification time (in UNIX time() format) of that location. Note that since this is used to compare the modification time of a content chunk from the previous time webmake was run, and the current modification time, this is called before the real data source is opened.

WebMake Documentation (version 1.1)