v1.0, 2001-11-26
Revision 1.3
Robert (\rho) Barta
Bond University
This document has no formal status. It is a technical report of the local university.
After an introduction and motivation, we present the core concepts in a short tutorial, before turning to formal language specification.
You can find the running example used throughout the tutorial at the Bond Topicmap Server. You might want to peruse an online converter.
Since the stabilisation of XTM, an XML-based notation for Topic Maps, the interest in authoring Topic Maps has increased.
While the automatic generation of topic maps in XTM from backend databases can easily be achieved, manual authoring is tedious and error-prone. One option is to use XML-aware development tools, such as XML editors. While feasible, generic XML editors offer little help beyond syntactic conformance. Another option is to use integrated development environments for Topic Maps (server- or client-side) as they appear on the market.
AsTMa is a linear, textual notation for Topic Map information. The motivation was to create a shorthand notation for authoring, in contrast to XTM, which is suited to interchange purposes. AsTMa has the following design objectives:
At this stage AsTMa does not fulfill all of the above objectives. As outlined in more detail in the section about conformance, AsTMa is not (yet) as expressive as XTM, as this has not been a prime concern. Still, AsTMa is sufficiently rich to prototype medium-sized topic maps.
The following setting assumes that the AsTMa text will either be directly understood by a particular Topic Map processing software, or that a specialized processor will first convert the AsTMa text stream into an XTM stream.
A single line such as

    filesystem (software)

already defines a topic (as explained below). If there is more to a topic (or an association), this information goes on follow-up lines:
    filesystem (software)
    bn: File System
An empty line thus separates items such as topics and associations. On any line, white space can be used before and after keywords and special characters; it is silently ignored.
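As an illustration, the empty-line rule can be sketched in Python. The helper name `split_sections` is hypothetical and not part of any AsTMa tool; the sketch ignores comments and continuation lines, and it keeps indentation because later stages may still need it.

```python
def split_sections(text: str) -> list[list[str]]:
    """Group consecutive non-empty lines of an AsTMa text into items."""
    sections: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip() == "":
            # an empty (or white-space only) line closes the current item
            if current:
                sections.append(current)
                current = []
        else:
            # keep leading white space: indentation may still matter later
            current.append(line.rstrip())
    if current:  # the last item may end with the stream, not a blank line
        sections.append(current)
    return sections


doc = "filesystem (software)\nbn: File System\n\nlinux (os)\n"
print(split_sections(doc))
# [['filesystem (software)', 'bn: File System'], ['linux (os)']]
```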
Any line can also contain a comment, such as

    filesystem (software) # more information will follow

Such comments are discarded by any processor and serve only internal documentation purposes. If you would like a comment to appear in the processor output, then this comment MUST begin at the start of a line:
AsTMa:

    # I will survive and (hopefully)
    # the line structure will not
    # be broken

XTM:

    <!-- I will survive and (hopefully)
    the line structure will not
    be broken -->
Comments on consecutive lines are treated as one comment. Any non-comment line signals the end of such a group. Also, any '-->' occurrence within a comment is converted into '--_>', interspersing one underscore character.
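The following Python sketch shows one way a converter could turn such a comment group into an XML comment. `comment_to_xml` is a hypothetical name, and we read the escaping rule as inserting one underscore, which yields `--_>`. Note that XML 1.0 also forbids the string `--` anywhere inside a comment, which this rule on its own does not prevent.

```python
def comment_to_xml(lines: list[str]) -> str:
    """Turn a group of consecutive '#' lines into a single XML comment."""
    if not all(line.startswith("#") for line in lines):
        raise ValueError("global comment lines must start with '#'")
    # the comment text starts in the 2nd column; surrounding blanks are stripped
    text = "\n".join(line[1:].strip() for line in lines)
    # '-->' would end the XML comment early, so an underscore is interspersed
    text = text.replace("-->", "--_>")
    return f"<!-- {text} -->"


print(comment_to_xml(["# I will survive and (hopefully)",
                      "# the line structure will not",
                      "# be broken"]))
```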
The line

    filesystem (software)

indicates the definition of a topic with the id filesystem, which is an instance of another topic, software:
AsTMa:

    filesystem (software)

XTM:

    <topic id="filesystem">
      <instanceOf>
        <topicRef xlink:href="#software"/>
      </instanceOf>
      <baseName>
        <baseNameString>filesystem</baseNameString>
      </baseName>
    </topic>
As we did not provide a base name, the topic id filesystem is also assumed to be the topic's base name. While this heuristic works fine for some words, it does not work well for others, say,

    linux-distribution (software)
Any AsTMa processor is free to apply other heuristics, such as substituting dashes with blanks, looking up third-party databases or leaving the id as it is. With the first of these, the example above translates to:

AsTMa:

    linux-distribution (software)

XTM:

    <topic id="linux-distribution">
      <instanceOf>
        <topicRef xlink:href="#software"/>
      </instanceOf>
      <baseName>
        <baseNameString>linux distribution</baseNameString>
      </baseName>
    </topic>

Of course, the author can enforce a particular base name:
AsTMa:

    RedHat-Linux-sparc (linux-distribution-port)
    bn: RedHat Linux for SPARC

XTM:

    <topic id="RedHat-Linux-sparc">
      <instanceOf>
        <topicRef xlink:href="#linux-distribution-port"/>
      </instanceOf>
      <baseName>
        <baseNameString>RedHat Linux for SPARC</baseNameString>
      </baseName>
    </topic>
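A processor's choice of base name could look like the following Python sketch. The function `basename_for` and its `dashes_to_blanks` switch are hypothetical; dash substitution is just one of the heuristics the text allows, and an explicit bn line always takes precedence.

```python
from typing import Optional


def basename_for(topic_id: str, explicit: Optional[str] = None,
                 dashes_to_blanks: bool = True) -> str:
    """Pick the base name of a topic (the heuristic is processor-specific)."""
    if explicit is not None:
        # an explicit 'bn:' line always wins
        return explicit
    if dashes_to_blanks:
        # one possible heuristic: dashes become blanks
        return topic_id.replace("-", " ")
    # otherwise the topic id is used as it is
    return topic_id


print(basename_for("linux-distribution"))  # linux distribution
print(basename_for("RedHat-Linux-sparc", "RedHat Linux for SPARC"))
```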
In a similar fashion, you can also add occurrences to topics, either as resource references:

AsTMa:

    linux (os)
    bn: Linux kernel
    oc: http://www.kernel.org/

XTM:

    <topic id="linux">
      <instanceOf>
        <topicRef xlink:href="#os"/>
      </instanceOf>
      <baseName>
        <baseNameString>Linux kernel</baseNameString>
      </baseName>
      <occurrence>
        <resourceRef xlink:href="http://www.kernel.org/"/>
      </occurrence>
    </topic>

or as inline data (XTM resourceData):
AsTMa:

    linux-port-on-sparc (linux-port)
    bn: SPARC Linux port
    oc: http://www.sparc.org/linux.shtml
    in: The kernel and kernel modules \
        are 64-bit on sparc64, \
        userland is still 32-bit, \
        and in fact the same as on sparc32.

XTM:

    <topic id="linux-port-on-sparc">
      <instanceOf>
        <topicRef xlink:href="#linux-port"/>
      </instanceOf>
      <baseName>
        <baseNameString>SPARC Linux port</baseNameString>
      </baseName>
      <occurrence>
        <resourceRef xlink:href="http://www.sparc.org/linux.shtml"/>
      </occurrence>
      <occurrence>
        <resourceData>The kernel and kernel mod....</resourceData>
      </occurrence>
    </topic>
If appropriate, you can also type topic characteristics:
AsTMa:

    reiserfs (filesystem)
    bn: Reiser File System, ReiserFS
    oc (download): http://www.namesys.com/download.html

XTM:

    <topic id="reiserfs">
      <instanceOf>
        <topicRef xlink:href="#filesystem"/>
      </instanceOf>
      <baseName>
        <baseNameString>Reiser File System, ReiserFS</baseNameString>
      </baseName>
      <occurrence>
        <instanceOf>
          <topicRef xlink:href="#download"/>
        </instanceOf>
        <resourceRef xlink:href="http://www.namesys.com/download.html"/>
      </occurrence>
    </topic>
To scope a characteristic, use '@' to introduce a particular context:
AsTMa:

    RedHat-Linux-sparc (linux-distribution-port)
    bn: RedHat Linux for SPARC
    bn @ deutsch : RedHat Linux für SPARC

XTM:

    <topic id="RedHat-Linux-sparc">
      <instanceOf>
        <topicRef xlink:href="#linux-distribution-port"/>
      </instanceOf>
      <baseName>
        <baseNameString>RedHat Linux for SPARC</baseNameString>
      </baseName>
      <baseName>
        <scope>
          <topicRef xlink:href="#deutsch"/>
        </scope>
        <baseNameString>RedHat Linux für SPARC</baseNameString>
      </baseName>
    </topic>
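To make the mapping concrete, here is a Python sketch that builds the XTM for the scoped example with the standard library's `xml.etree.ElementTree`. The helpers `topic_ref` and `add_base_name` are hypothetical, and the XTM default namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"              # ElementTree's spelling of xlink:href
ET.register_namespace("xlink", XLINK)  # serialize it with the 'xlink' prefix


def topic_ref(parent: ET.Element, target: str) -> ET.Element:
    """Append <topicRef xlink:href="#target"/> to parent."""
    return ET.SubElement(parent, "topicRef", {HREF: f"#{target}"})


def add_base_name(topic: ET.Element, name: str,
                  scope: Optional[str] = None) -> ET.Element:
    """Append a <baseName>, optionally scoped by a single topic."""
    bn = ET.SubElement(topic, "baseName")
    if scope is not None:
        topic_ref(ET.SubElement(bn, "scope"), scope)
    ET.SubElement(bn, "baseNameString").text = name
    return bn


topic = ET.Element("topic", id="RedHat-Linux-sparc")
topic_ref(ET.SubElement(topic, "instanceOf"), "linux-distribution-port")
add_base_name(topic, "RedHat Linux for SPARC")
add_base_name(topic, "RedHat Linux für SPARC", scope="deutsch")
print(ET.tostring(topic, encoding="unicode"))
```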
Associations may or may not have a particular association type. This type, itself a topic, is given inside a pair of parentheses:

    (kernel-patch-provides-feature)
    ...

If the association has no explicit type, the type is simply omitted by writing ().
Associations also have a number of members playing roles:
AsTMa:

    (kernel-patch-provides-feature)
    feature: reiserfs
    platform: i386
    patch: generic-reiserfs-patch-2.4.x

XTM:

    <association>
      <instanceOf>
        <topicRef xlink:href="#kernel-patch-provides-feature"/>
      </instanceOf>
      <member>
        <roleSpec>
          <topicRef xlink:href="#feature"/>
        </roleSpec>
        <topicRef xlink:href="#reiserfs"/>
      </member>
      <member>
        <roleSpec>
          <topicRef xlink:href="#platform"/>
        </roleSpec>
        <topicRef xlink:href="#i386"/>
      </member>
      <member>
        <roleSpec>
          <topicRef xlink:href="#patch"/>
        </roleSpec>
        <topicRef xlink:href="#generic-reiserfs-patch-2.4.x"/>
      </member>
    </association>
For better readability, you may want to indent the roles:

    (kernel-patch-provides-feature)
        feature:  reiserfs
        platform: i386
        patch:    generic-reiserfs-patch-2.4.x
The very first non-empty line within the document MUST provide a name (identifier) of the topic map itself:
AsTMa:

    sparclinux : iso-8859-1

XTM:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <topicMap id="sparclinux"
              xmlns = 'http://www.topicmaps.org/xtm/1.0/'
              xmlns:xlink = 'http://www.w3.org/1999/xlink'>
Optionally, you can specify a particular encoding, as in the example above. If none is provided, the encoding defaults to iso-8859-1.
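Here is a Python sketch of how the header line could be turned into the XTM prolog. `xtm_prolog` is a hypothetical helper. It applies the iso-8859-1 default and rejects names containing blanks, following the `name` production in the grammar summary.

```python
def xtm_prolog(header: str) -> str:
    """Translate the first AsTMa line 'name [: encoding]' into the XTM prolog."""
    name, _, encoding = header.partition(":")
    name = name.strip()
    encoding = encoding.strip() or "iso-8859-1"  # default encoding
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid topic map name: {name!r}")
    return (f'<?xml version="1.0" encoding="{encoding}"?>\n'
            f'<topicMap id="{name}"'
            " xmlns='http://www.topicmaps.org/xtm/1.0/'"
            " xmlns:xlink='http://www.w3.org/1999/xlink'>")


print(xtm_prolog("sparclinux : iso-8859-1"))
```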
For authors who have no access to a general macro expansion environment, the language supports a rudimentary macro facility, which comes in handy for abbreviating long strings. The idea is to first declare a macro via

    de=http://www.topicmaps.org/xtm/1.0/language.xtm#de

somewhere towards the beginning of the document and then use this definition for, say, scoping:
AsTMa:

    oc @ &de; (press-release) : http://www....

XTM:

    <occurrence>
      <scope>
        <topicRef xlink:href="http://www.topicmaps.org/xtm/1.0/language.xtm#de"/>
      </scope>
      <instanceOf>
        <topicRef xlink:href="#press-release"/>
      </instanceOf>
      <resourceRef xlink:href="http://www.s...."/>
    </occurrence>
Every instance of &de; throughout the document will be expanded to the above URL.
While these macros cannot have parameters, they will be evaluated recursively, i.e. macros can contain other macros.
It goes without saying that the notation &...; may collide with XML entities. It is the author's responsibility to take care of that.
In the following we use grammars (extended BNFs) and regexps (regular expressions) to define the syntax of an AsTMa instance.
As AsTMa is line-oriented, we introduce eol as a special terminal to symbolize the end of a line (or the end of the text stream, for that matter). AsTMa processors use the start and the end of a line as natural boundaries of AsTMa information. If for some reason, say readability, a line n has to be broken into several lines, a backslash character \ at the very end of line n indicates that the follow-up line n+1 logically belongs to n.
Example:
    bn: this line is too long, so\
        we continue it on the next line
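The continuation rule can be sketched in Python as follows. `join_continuations` is a hypothetical helper. The text does not say how the two pieces are glued together. This sketch assumes that the backslash is dropped, that the follow-up line's leading white space is stripped, and that the pieces are joined with a single blank.

```python
def join_continuations(lines: list[str]) -> list[str]:
    """Merge physical lines ending in a backslash into logical lines."""
    logical: list[str] = []
    pending = None  # text collected from previous backslash-terminated lines
    for line in lines:
        # assumption: pieces are glued together with a single blank
        text = line if pending is None else pending + " " + line.lstrip()
        if text.endswith("\\"):
            pending = text[:-1].rstrip()  # drop the backslash, keep collecting
        else:
            logical.append(text)
            pending = None
    if pending is not None:  # the stream ended right after a backslash
        logical.append(pending)
    return logical


print(join_continuations(["bn: this line is too long, so\\",
                          "    we continue it on the next line"]))
# ['bn: this line is too long, so we continue it on the next line']
```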
If any number of white-space characters may occur at a specific point in the grammar, we symbolize this as ws*. If at least one white-space character is required, we use ws+. Before and after terminals (keywords and special characters), white space can be used freely; this is not made explicit in the grammar productions. Moreover, any text is stripped of white space at its beginning and its end:
string --> [ ws* ] { character } [ ws* ]
In many cases topic identifiers will be used to refer to other topics. To underline what kind of topic is expected in specific cases (without enforcing this with an AsTMa processor) the syntax uses typed non-terminals, like role-topic-id or type-topic-id.
A real AsTMa processor is free to skip the transformation step and to load AsTMa instances directly, as long as the result is equivalent.
instance --> { comment-section } name-encode-section { section }

An AsTMa processor has to honor the following processing constraints:
section --> comment-section | topic-definition-section | association-definition-section | macro-definition-section
Example:
    linux: iso-8859-1

The instance will be called linux, and the encoding used will be iso-8859-1.
name-encode-section --> name [ ':' encoding ] eol

A particular AsTMa instance is a character string in a particular character encoding. The AsTMa processor transparently passes all characters to the equivalent virtual XTM instance. To make this happen, an AsTMa instance has to specify the appropriate character encoding at the beginning of the stream. This encoding will also be the XML encoding used in the virtual XTM instance. The name will become the value of the id attribute of the topicMap element.
A global comment section is introduced by a # at the very beginning of a line. Any following text (i.e. starting with the 2nd column) is regarded as comment text. Any directly subsequent line which also begins with a # is accumulated into the comment. A global comment section is terminated by the beginning of some other top-level section or by an empty line.
It is up to the AsTMa processor to process comments. When converting into XTM, the processor may create XML comments; when loading the instance directly for other purposes, it may discard them.
Example:
    # warning from the information secretary
    # Linux is a communist conspiracy against the free Microsoft world

The comment contains two lines; the #s themselves are discarded.
comment-section --> { '#' string eol }

A local comment is introduced by a # NOT at the beginning of the line. Local comments reach until the end of the current line and are discarded by AsTMa processors.
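Here is a minimal Python sketch for removing local comments. The grammar does not say how a # inside a URI (for instance the fragment in language.xtm#de) is told apart from a comment. The hypothetical helper `strip_local_comment` assumes that a local comment starts only at a # preceded by white space.

```python
import re

# Assumption: a local comment starts at a '#' preceded by white space, so that
# URI fragments such as 'language.xtm#de' are not mistaken for comments.
LOCAL_COMMENT = re.compile(r"\s+#.*$")


def strip_local_comment(line: str) -> str:
    """Remove a trailing local comment from one AsTMa line."""
    if line.startswith("#"):
        return line  # a global comment line is handled elsewhere
    return LOCAL_COMMENT.sub("", line)


print(strip_local_comment("filesystem (software) # more information will follow"))
```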
topic-definition --> topic-id [ '(' list-of-type-topic-ids ')' ] eol { ws* topic-characteristic eol }
The topic-id stands for the topic identifier which HAS to be provided. This identifier SHOULD follow the XML identifier rules although an AsTMa processor might not directly enforce this.
The topic identifier itself has to be placed at the beginning of the line; if the optional list of identifiers inside '()' is specified, the topic identifiers mentioned therein are interpreted as types of that topic. The list entries are separated by one (or more) white spaces:
list-of-topic-ids --> topic-id { ws+ topic-id }
Example:
    linux (operating-system-technology open-source-software)
    ...

This defines a topic with the id linux and with two types.
Every topic definition will result in a topic element in the virtual XTM instance. The topic-id will be used as value for the ID attribute. For every topic type an instanceOf element will be generated inside this topic containing a topicRef element with the xlink:href attribute having the value of the type id.
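In Python, a topic header line could be parsed and mapped as sketched below. `parse_topic_header` and `header_to_xtm` are hypothetical names. The regular expression is our reading of the production above, and the default base name is omitted.

```python
import re

# topic-id at the start of the line, optionally followed by '(' type ids ')'
TOPIC_HEADER = re.compile(r"^(?P<id>[^\s()#]+)\s*(?:\((?P<types>[^)]*)\))?\s*$")


def parse_topic_header(line: str) -> tuple[str, list[str]]:
    """Split 'id (type1 type2 ...)' into the topic id and its type ids."""
    m = TOPIC_HEADER.match(line)
    if m is None:
        raise ValueError(f"not a topic header: {line!r}")
    # type ids are separated by one or more white-space characters
    return m.group("id"), (m.group("types") or "").split()


def header_to_xtm(line: str) -> str:
    """Emit the <topic> element with one <instanceOf> per type."""
    topic_id, types = parse_topic_header(line)
    out = [f'<topic id="{topic_id}">']
    for t in types:
        out.append(f'  <instanceOf><topicRef xlink:href="#{t}"/></instanceOf>')
    out.append("</topic>")
    return "\n".join(out)


print(header_to_xtm("linux (operating-system-technology open-source-software)"))
```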
Any topic definition section can contain any number of topic characteristics. They have to directly follow the first line of the topic definition section; their order is irrelevant, though. Every such characteristic is indicated by a (case-sensitive) keyword, followed by scoping and/or typing information (if appropriate), a colon (:) and the value of this particular characteristic. To improve readability, white space is allowed before the keyword.
Example:
    linux (operating-system-technology open-source-software)
    bn: Linux Operating System
    oc (download): http://www.linux.org/

This defines a topic with the id linux, with an explicit base name and a typed occurrence.
The following defaults are used for a topic if no explicit characteristic is defined:
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic
basename-characteristic --> 'bn' [ '@' scope-topic-id ] ':' string

The string will be used 'as-is' for the text content of the baseNameString element inside the baseName element of the virtual XTM instance. If a scope topic id exists, then this id will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above baseName element.
resourceRef-characteristic --> 'oc' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ] ':' string

The string SHOULD be a valid URI. An AsTMa processor might perform checks on this URI.
For each resource occurrence, a resourceRef element will be created in the virtual XTM instance, with the string used as its xlink:href attribute. That element will be nested inside an occurrence element inside the current topic element.
If a scope topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside a scope element inside the above occurrence element.
If a type topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside an instanceOf element inside the above occurrence element.
resourceData-characteristic --> 'in' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ] ':' string

The string will be used 'as-is' as text inside a resourceData element inside an occurrence element inside the current topic element.
If a scope topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside a scope element inside the above occurrence element.
If a type topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside an instanceOf element inside the above occurrence element.
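The three characteristic productions can be recognized with one regular expression, as in this Python sketch. The names `CHARACTERISTIC`, `Characteristic` and `parse_characteristic` are hypothetical. Because URIs contain colons, the sketch takes the first colon after the optional scope and type as the separator, and it rejects a type on bn, which the grammar does not allow.

```python
import re
from typing import NamedTuple, Optional

CHARACTERISTIC = re.compile(
    r"^\s*(?P<kind>bn|oc|in)"                 # case-sensitive keyword
    r"\s*(?:@\s*(?P<scope>[^\s():]+))?"       # optional '@' scope-topic-id
    r"\s*(?:\(\s*(?P<type>[^\s()]+)\s*\))?"   # optional '(' type-topic-id ')'
    r"\s*:(?P<value>.*)$"                     # first ':' starts the value
)


class Characteristic(NamedTuple):
    kind: str                # 'bn', 'oc' or 'in'
    scope_id: Optional[str]
    type_id: Optional[str]
    value: str


def parse_characteristic(line: str) -> Characteristic:
    m = CHARACTERISTIC.match(line)
    if m is None:
        raise ValueError(f"not a topic characteristic: {line!r}")
    if m.group("kind") == "bn" and m.group("type"):
        raise ValueError("base names cannot be typed")
    # the value is stripped of leading and trailing white space
    return Characteristic(m.group("kind"), m.group("scope"),
                          m.group("type"), m.group("value").strip())


print(parse_characteristic("oc (download): http://www.namesys.com/download.html"))
```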
association-definition --> '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }
Example:
    (kernel-patch-provides-feature)
    feature: reiserfs
    platform: i386
    patch: generic-reiserfs-patch-2.4.x

This defines an association of type kernel-patch-provides-feature with three roles: feature, platform and patch.
At most one topic identifier can occur as the association type. If none is provided, the XTM default is assumed.
Association members are defined line-by-line according to
association-member --> role-topic-id ':' { member-topic-id }

In the virtual XTM instance, an association element will be created inside the current topic map for every association definition in AsTMa. Every member will result in the creation of a member element inside this association element.
The role-topic-id will be used as the value of the xlink:href attribute of a topicRef element inside a roleSpec element inside the member.
Every single member-topic-id is used as the value of the xlink:href attribute of a topicRef element inside the member.
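Putting the association rules together gives the following Python sketch, which uses `xml.etree.ElementTree`. `association_to_xtm` is hypothetical. The grammar does not fix how several member ids on one line are separated, so the sketch assumes white space.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"              # ElementTree's spelling of xlink:href
ET.register_namespace("xlink", XLINK)


def association_to_xtm(lines: list[str]) -> ET.Element:
    """Build an XTM <association> from an AsTMa association section."""
    header, *members = [ln.strip() for ln in lines]
    if not (header.startswith("(") and header.endswith(")")):
        raise ValueError(f"not an association header: {header!r}")
    assoc = ET.Element("association")
    type_id = header[1:-1].strip()
    if type_id:  # '()' means: no explicit type, keep the XTM default
        instance_of = ET.SubElement(assoc, "instanceOf")
        ET.SubElement(instance_of, "topicRef", {HREF: f"#{type_id}"})
    for line in members:
        role, sep, players = line.partition(":")
        if not sep:
            raise ValueError(f"member line without ':': {line!r}")
        member = ET.SubElement(assoc, "member")
        role_spec = ET.SubElement(member, "roleSpec")
        ET.SubElement(role_spec, "topicRef", {HREF: f"#{role.strip()}"})
        # assumption: several member topic ids are separated by white space
        for player in players.split():
            ET.SubElement(member, "topicRef", {HREF: f"#{player}"})
    return assoc


assoc = association_to_xtm(["(kernel-patch-provides-feature)",
                            "    feature:  reiserfs",
                            "    platform: i386",
                            "    patch:    generic-reiserfs-patch-2.4.x"])
print(ET.tostring(assoc, encoding="unicode"))
```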
Macros are an ad-hoc facility to abbreviate frequent strings and, as such, resemble SGML entities. AsTMa processors MAY implement this directly or MAY use external mechanisms, not covered by AsTMa, for macro processing.
A macro is defined via
macro-definition --> macro-name '=' string eol

and basically binds a string value to a name. This name, in turn, can be used in subsequent AsTMa definitions like an entity reference:
macro-reference --> '&' macro-name ';'

Macros can contain other macros; it is up to the AsTMa processor to warn about circular definitions.
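Here is a Python sketch of recursive macro expansion. The names `MACRO_DEFINITION`, `MACRO_REFERENCE` and `expand` are assumptions, and so is the identifier pattern used for macro names. References to unknown names (such as XML entities like `&amp;`) are left untouched. Where the text only asks for a warning on circular definitions, this sketch raises a `ValueError`.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"  # assumed shape of a macro name
MACRO_DEFINITION = re.compile(rf"^(?P<name>{NAME})\s*=\s*(?P<value>.*?)\s*$")
MACRO_REFERENCE = re.compile(rf"&(?P<name>{NAME});")


def expand(text: str, macros: dict[str, str],
           active: tuple[str, ...] = ()) -> str:
    """Recursively replace &name; by its definition."""
    def substitute(m: re.Match) -> str:
        name = m.group("name")
        if name not in macros:
            return m.group(0)  # not a macro, e.g. an XML entity: leave it alone
        if name in active:
            chain = " -> ".join(active + (name,))
            raise ValueError(f"circular macro definition: {chain}")
        # expand the definition itself, remembering which macros are open
        return expand(macros[name], macros, active + (name,))
    return MACRO_REFERENCE.sub(substitute, text)


macros: dict[str, str] = {}
for line in ["xtm=http://www.topicmaps.org/xtm/1.0/", "de=&xtm;language.xtm#de"]:
    m = MACRO_DEFINITION.match(line)
    macros[m.group("name")] = m.group("value")

print(expand("oc @ &de; (press-release) : http://www....", macros))
# oc @ http://www.topicmaps.org/xtm/1.0/language.xtm#de (press-release) : http://www....
```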
Having a separate (non-standardized) language does, however, open the way for developing AsTMa into a constraint or even a query and manipulation language.
special character in id, encoding according to www.isi.edu/in-notes/iana/assignments/character-sets,
causal stream, no implicit merging (not even with ids)
conformant converter is free to produce any additional topic/assocs appropriate??
in terms of an abstract pattern processor?
causal streams (really?) , performance at bigger maps
specifying additional behavior? auto complete, defaults for assocs (types?)
instance --> { comment-section } name-encode-section { section }
comment-section --> { '#' string eol }
name-encode-section --> name [ ':' encoding ] eol
section --> comment-section | topic-definition-section | association-definition-section | macro-definition-section
topic-definition --> topic-id [ '(' list-of-type-topic-ids ')' ] eol { ws* topic-characteristic eol }
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic
basename-characteristic --> 'bn' [ '@' scope-topic-id ] ':' string
resourceRef-characteristic --> 'oc' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ]':' string
resourceData-characteristic --> 'in' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ]':' string
association-definition --> '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }
association-member --> role-topic-id ':' { member-topic-id }
macro-definition --> macro-name '=' string eol
list-of-topic-ids --> topic-id { ws+ topic-id }
string --> [ ws* ] { character } [ ws* ]
name --> ... any identifier not containing blanks