Robert Barta Bond University
rho@bond.edu.au Copyright © 2002 Robert Barta
AsTMa= is part of the AsTMa language family, which was designed to facilitate not only authoring but also constraining and querying topic maps. After an introduction and motivation, we present the core concepts in a short tutorial before turning to the formal language specification.
This document has no formal status. It is a technical report of Bond University.
Since the stabilisation of XTM, an XML-based notation for Topic Maps, interest in authoring Topic Maps has increased.
While the automatic generation of topic maps in XTM from backend databases can easily be achieved, manual authoring is tedious and error-prone. One option is to use XML-aware development tools, such as XML editors. While feasible, generic XML editors offer little help beyond syntactic conformance. Another option is to use the integrated development environments for Topic Maps (server- or client-side) that are appearing on the market.
AsTMa= is a linear, textual notation for Topic Map information. The motivation was to create a shorthand notation for authoring, in contrast to XTM, which is mainly suited to interchange. AsTMa= has the following design objectives:
At this stage, AsTMa= does not fulfill all of the above objectives. As outlined in more detail in the section about conformance, AsTMa= is not (yet) as expressive as XTM, as this has not been a prime concern. Still, AsTMa= is sufficiently rich to prototype medium-sized topic maps.
The following setting assumes that the AsTMa= text will be either directly understood by a particular Topic Map processing software or that a specialized processor will first convert the AsTMa= text stream into an XTM stream.
AsTMa= is line-oriented: a piece of information is terminated by the end of the line. A single line containing

    filesystem (software)

already defines a topic (as explained below). If there is more to a topic (or an association), this information goes on follow-up lines:
    filesystem (software)
    bn: File System
An empty line, thus, separates items like topics and associations. On any line white-spaces can be used before and after keywords and special characters. They are silently ignored.
Any line can also contain a comment, introduced by the character '#' preceded by a white-space character:
    filesystem (software) # more information will follow

Such comments are discarded by any processor and serve only for internal documentation.
Please note that (starting from Rev1.5) such a comment must have at least one blank before the '#'. This allows hassle-free notation of URIs containing a '#', because the fragment (XPointer) part is not mistaken for a comment. The blanks between the text and the '#' are ignored.
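The Rev1.5 rule can be illustrated with a small Python sketch. It assumes the processor looks at one line at a time and that a '#' in the first column (a global comment) is handled elsewhere; the names LOCAL_COMMENT and strip_local_comment are hypothetical.

```python
import re

# Local comment (Rev1.5): one or more white-space characters, a '#',
# and everything up to the end of the line. The blanks are removed too.
LOCAL_COMMENT = re.compile(r"\s+#.*$")


def strip_local_comment(line: str) -> str:
    """Remove a trailing local comment from a single AsTMa= line (sketch)."""
    if line.startswith("#"):
        # '#' in the first column starts a global comment: keep it intact.
        return line
    # A '#' glued to the preceding text (e.g. a URI fragment) is not matched.
    return LOCAL_COMMENT.sub("", line)


print(strip_local_comment("filesystem (software) # more information will follow"))
# filesystem (software)
print(strip_local_comment("oc: http://www.example.org/doc.html#part2"))
# oc: http://www.example.org/doc.html#part2
```

The second call shows why the blank matters: without it, the fragment part of the (illustrative) URI would be cut off.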
If you would like to have a comment in the processor output, then this comment MUST begin at the start of a line:
AsTMa=:

    # I will survive and (hopefully)
    # the line structure will not
    # be broken

XTM:

    <!-- I will survive and (hopefully)
    the line structure will not
    be broken -->
Comments on consecutive lines are treated as one comment. Any non-comment line ends such a group. Also, any '-->' occurring within a comment is converted into '--_>' (one underscore character inserted) to avoid problems in the resulting XML code.
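A minimal Python sketch of this grouping follows. The function names are hypothetical, and dropping a single blank after each '#' is an assumption, not something the text prescribes. The '-->' replacement inserts one underscore as described above; note that XML 1.0 forbids the sequence '--' anywhere inside a comment, so a stricter processor would have to escape more than just '-->'.

```python
from typing import List


def _close(lines: List[str]) -> str:
    # Join the collected comment lines into one XML comment.
    return "<!-- " + "\n".join(lines) + " -->"


def global_comments_to_xml(lines: List[str]) -> List[str]:
    """Collect runs of '#'-in-column-1 lines into XML comments (sketch).

    Consecutive comment lines become one comment; any other line ends the
    run. Non-comment lines are otherwise ignored by this function.
    """
    comments: List[str] = []
    current: List[str] = []
    for line in lines:
        if line.startswith("#"):
            text = line[1:]
            if text.startswith(" "):
                text = text[1:]              # drop one blank after '#' (assumption)
            current.append(text.replace("-->", "--_>"))
        elif current:                        # a non-comment line ends the group
            comments.append(_close(current))
            current = []
    if current:                              # group running to the end of input
        comments.append(_close(current))
    return comments


example = [
    "# I will survive and (hopefully)",
    "# the line structure will not",
    "# be broken",
    "filesystem (software)",
]
print(global_comments_to_xml(example)[0])
```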
The line

    filesystem (software)

indicates the definition of a topic with id filesystem, which is an instance of another topic, software:
AsTMa=:

    filesystem (software)

XTM:

    <topic id="filesystem">
      <instanceOf><topicRef xlink:href="#software"/></instanceOf>
      <baseName><baseNameString>filesystem</baseNameString></baseName>
    </topic>
As we did not provide a base name, the topic id filesystem is also assumed to be the topic's base name. While this heuristic works fine for some words, it does not work well for others, say,

    linux-distribution (software)
Any AsTMa= processor is free to apply other heuristics, such as substituting dashes by blanks, looking up 3rd-party databases or leaving the id as it is. With the first of these heuristics, the topic above would become:

AsTMa=:

    linux-distribution (software)

XTM:

    <topic id="linux-distribution">
      <instanceOf><topicRef xlink:href="#software"/></instanceOf>
      <baseName><baseNameString>linux distribution</baseNameString></baseName>
    </topic>

Of course, the author can enforce a particular base name:
AsTMa=:

    RedHat-Linux-sparc (linux-distribution-port)
    bn: RedHat Linux for SPARC

XTM:

    <topic id="RedHat-Linux-sparc">
      <instanceOf><topicRef xlink:href="#linux-distribution-port"/></instanceOf>
      <baseName><baseNameString>RedHat Linux for SPARC</baseNameString></baseName>
    </topic>
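The base name choice discussed above boils down to a small decision, sketched here in Python. The function basename_for and its dashes_to_blanks flag are hypothetical; they model only two of the heuristics mentioned (reusing the id, or substituting dashes by blanks) and let an explicit bn: value take precedence.

```python
from typing import Optional


def basename_for(topic_id: str, explicit: Optional[str] = None,
                 dashes_to_blanks: bool = False) -> str:
    """Return the base name a processor might use for a topic (sketch).

    An explicit 'bn:' value always wins. Otherwise the topic id is reused,
    optionally with dashes replaced by blanks.
    """
    if explicit is not None:
        return explicit
    if dashes_to_blanks:
        return topic_id.replace("-", " ")
    return topic_id


print(basename_for("filesystem"))                                 # filesystem
print(basename_for("linux-distribution", dashes_to_blanks=True))  # linux distribution
print(basename_for("RedHat-Linux-sparc", "RedHat Linux for SPARC", True))
```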
In a similar way, you can also add occurrences to topics:
AsTMa=:

    linux (os)
    bn: Linux kernel
    oc: http://www.kernel.org/

XTM:

    <topic id="linux">
      <instanceOf><topicRef xlink:href="#os"/></instanceOf>
      <baseName><baseNameString>Linux kernel</baseNameString></baseName>
      <occurrence><resourceRef xlink:href="http://www.kernel.org/"/></occurrence>
    </topic>
This covers resource references. Inline data (XTM resourceData) can be added as well, using in:
AsTMa=:

    linux-port-on-sparc (linux-port)
    bn: SPARC Linux port
    oc: http://www.sparc.org/linux.shtml
    in: The kernel and kernel modules \
        are 64-bit on sparc64, \
        userland is still 32-bit, \
        and in fact the same as on sparc32.

XTM:

    <topic id="linux-port-on-sparc">
      <instanceOf><topicRef xlink:href="#linux-port"/></instanceOf>
      <baseName><baseNameString>SPARC Linux port</baseNameString></baseName>
      <occurrence><resourceRef xlink:href="http:....linux.shtml"/></occurrence>
      <occurrence><resourceData>The kernel ...</resourceData></occurrence>
    </topic>
If appropriate, you can also type topic characteristics:
AsTMa=:

    reiserfs (filesystem)
    bn: Reiser File System, ReiserFS
    oc (download): \
        http://www.namesys.com/download.html

XTM:

    <topic id="reiserfs">
      <instanceOf><topicRef xlink:href="#filesystem"/></instanceOf>
      <baseName><baseNameString>Reiser ....</baseNameString></baseName>
      <occurrence>
        <instanceOf><topicRef xlink:href="#download"/></instanceOf>
        <resourceRef xlink:href="http:...download.html"/>
      </occurrence>
    </topic>
To scope a characteristic you use '@' to introduce a particular context:
AsTMa=:

    RedHat-Linux-sparc (linux-distribution-port)
    bn: RedHat Linux for SPARC
    bn @ deutsch : RedHat Linux für SPARC

XTM:

    <topic id="RedHat-Linux-sparc">
      <instanceOf><topicRef xlink:href="#linux-distribution-port"/></instanceOf>
      <baseName>
        <baseNameString>RedHat Linux for SPARC</baseNameString>
      </baseName>
      <baseName>
        <scope><topicRef xlink:href="#deutsch"/></scope>
        <baseNameString>RedHat Linux für SPARC</baseNameString>
      </baseName>
    </topic>
Associations may or may not have a particular association type. This topic type is provided inside a () pair.
    (kernel-patch-provides-feature)
    ...

If the association has no explicit type, the type can be omitted by writing ().
Associations also have a number of members playing roles:
AsTMa=:

    (kernel-patch-provides-feature)
    feature: reiserfs
    platform: i386
    patch: generic-reiserfs-patch-2.4.x

XTM:

    <association>
      <instanceOf><topicRef xlink:href="#kernel-patch-provides-feature"/></instanceOf>
      <member>
        <roleSpec><topicRef xlink:href="#feature"/></roleSpec>
        <topicRef xlink:href="#reiserfs"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#platform"/></roleSpec>
        <topicRef xlink:href="#i386"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#patch"/></roleSpec>
        <topicRef xlink:href="#generic-reiserfs-patch-2.4.x"/>
      </member>
    </association>
For better readability, you may want to indent the roles:

    (kernel-patch-provides-feature)
        feature: reiserfs
        platform: i386
        patch: generic-reiserfs-patch-2.4.x
Topics are said to reify subjects. If the subject itself is not directly addressable, a topic in a topic map acts as a representative of the subject. If it is addressable, the topic can identify the subject directly and unambiguously via a URI.
If a subject indicator (an identification of a particular subject that is not necessarily unique) is known, it can be provided via sin:
AsTMa=:

    linux (os)
    bn: Linux kernel
    oc: http://www.kernel.org/
    sin: http://dmoz.org/.../Linux/

XTM:

    <topic id="linux">
      <instanceOf><topicRef xlink:href="#os"/></instanceOf>
      <baseName><baseNameString>Linux kernel</baseNameString></baseName>
      <occurrence><resourceRef xlink:href="http://www.kernel.org/"/></occurrence>
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://dmoz.org/.../Linux/"/>
      </subjectIdentity>
    </topic>
A topic can have several sin lines. A value that is not a URI is taken to be a reference to another topic in the map:

AsTMa=:

    linux (os)
    bn: Linux kernel
    ...
    sin: http://dmoz.org/.../Linux/
    sin: linux-os

XTM:

    <topic id="linux">
      <instanceOf><topicRef xlink:href="#os"/></instanceOf>
      <baseName><baseNameString>Linux kernel</baseNameString></baseName>
      ...
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="http://dmoz.org/.../Linux/"/>
        <topicRef xlink:href="#linux-os"/>
      </subjectIdentity>
    </topic>
Where the topic can be linked unambiguously to the subject in question, we can use the AsTMa= reifies clause:
AsTMa=:

    linux-kernel-site (web-site) reifies http://www.linux.org/
    bn: Linux kernel Site
    ...

XTM:

    <topic id="linux-kernel-site">
      <instanceOf><topicRef xlink:href="#web-site"/></instanceOf>
      <baseName><baseNameString>Linux kernel Site</baseNameString></baseName>
      ...
      <subjectIdentity>
        <resourceRef xlink:href="http://www.linux.org/"/>
      </subjectIdentity>
    </topic>
There is no special format or syntax for an AsTMa= Topic Map instance. All text blocks within the document are regarded as part of the map.
Optionally, you can control the name (id) of the Topic Map. This, though, may be relevant only to your local topic map processor, so there is no counterpart of this in XTM. If used, the very first non-empty line within the document MUST provide this name (identifier) of the topic map itself:
AsTMa=:

    sparclinux : iso-8859-1

XTM:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <topicMap id="sparclinux"
      xmlns = 'http://www.topicmaps.org/xtm/1.0/'
      xmlns:xlink = 'http://www.w3.org/1999/xlink'>
Additionally, you may specify a particular encoding, as in the example above. If none is provided, the encoding defaults to iso-8859-1.
Any local AsTMa implementation may also provide special commands or syntactical forms to control these aspects of your map. Please consult the appropriate documentation.
In the following we use grammars (extended BNFs) and regexps (regular expressions) to define the syntax of an AsTMa= instance.
As AsTMa= is line-oriented, we introduce eol as a special terminal symbolizing the end of the line (or, for that matter, the end of the text stream). AsTMa= processors use the start and the end of a line as the natural boundaries of AsTMa= information. If, say for readability, a line n has to be broken into several lines, a backslash character \ at the very end of line n indicates that the follow-up line n+1 logically belongs to n.
Example:
    bn: this line is too long, so\
    we continue it on the next line
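The continuation rule can be sketched in Python as follows. The text does not say how white space at the join is treated, so this sketch assumes the pieces are joined with exactly one blank (trailing blanks before the backslash and leading blanks on the follow-up line are dropped); join_continuations is a hypothetical name.

```python
from typing import List


def join_continuations(text: str) -> List[str]:
    """Merge physical lines ending in a backslash with their follow-up line."""
    logical: List[str] = []
    pending = ""  # text carried over from backslash-terminated lines
    for raw in text.splitlines():
        # Join with a single blank; this white-space policy is an assumption.
        line = (pending + " " + raw.lstrip()) if pending else raw
        if line.endswith("\\"):
            pending = line[:-1].rstrip()  # drop the backslash and trailing blanks
        else:
            logical.append(line)
            pending = ""
    if pending:  # the stream ended directly after a backslash
        logical.append(pending)
    return logical


print(join_continuations("bn: this line is too long, so\\\nwe continue it on the next line"))
# ['bn: this line is too long, so we continue it on the next line']
```

Because the follow-up line is left-stripped, indented continuation lines (as in the in: example of the tutorial) join cleanly as well.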
If we allow any number of white spaces (\s) to occur at a specific point in the grammar, then we symbolize this as ws*. If we require at least one white space to appear somewhere, then we use ws+ to denote this. Before and after terminals (keywords and special characters), white spaces can be used freely; this is not made explicit in the grammar productions. Moreover, any text is stripped of white space at its beginning and its end:
string --> [ ws* ] { character } [ ws* ]
In many cases topic identifiers will be used to refer to other topics. To underline what kind of topic is expected in specific cases (without enforcing this with an AsTMa= processor) the syntax uses typed non-terminals, like role-topic-id or type-topic-id.
For the sake of the following definition, it is assumed that the processor converts an AsTMa= instance into an equivalent XTM instance, henceforth referred to as the virtual XTM instance. The following semantic definition of AsTMa= is based on this transformation and not on a Topic Map processing model, which is still being developed at the time of writing.
A real AsTMa= processor is free to skip the transformation step and to directly load AsTMa= instances as long as this is equivalent regarding a Topic Map processing model.
An AsTMa= instance has the following structure:
instance --> { comment-section } name-encode-section { section }
section --> comment-section | topic-definition-section | association-definition-section
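One way a processor might split the lines following the name/encoding line into sections is sketched below in Python. It assumes local comments and line continuations have already been handled and that global comment lines only occur between sections. The section kind is guessed from the first line: '(' or '@' marks an association, '%' a one-line directive (as described in the paragraph on directives), anything else a topic. The names classify and split_sections are hypothetical.

```python
from typing import List, Tuple

Section = Tuple[str, List[str]]  # (kind, lines of the section)


def classify(first_line: str) -> str:
    """Guess a section's kind from its first line (heuristic)."""
    if first_line.startswith("%"):
        return "directive"
    if first_line.lstrip().startswith(("(", "@")):
        return "association"
    return "topic"


def split_sections(lines: List[str]) -> List[Section]:
    """Split the lines after the name/encoding line into typed sections."""
    sections: List[Section] = []
    kind, block = "", []

    def flush() -> None:
        nonlocal kind, block
        if block:
            sections.append((kind, block))
        kind, block = "", []

    for line in lines:
        if not line.strip():              # an empty line ends any section
            flush()
        elif line.startswith("#"):        # '#' in column 1: global comment
            if kind != "comment":
                flush()
                kind = "comment"
            block.append(line)
        elif kind in ("", "comment"):     # first line of a new section
            flush()
            kind = classify(line)
            block.append(line)
            if kind == "directive":       # a directive is exactly one line
                flush()
        else:                             # characteristic or member line
            block.append(line)
    flush()
    return sections
```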
An AsTMa= processor has to honor the following processing constraints:
Any AsTMa= instance has to be named by a string matching \S+ (anything not containing white space). It is up to the AsTMa= processor how this name will be utilized.
name-encode-section --> name [ ':' encoding ] eol
Example:
    linux: iso-8859-1

The instance will be called linux and the encoding used will be iso-8859-1.
A particular AsTMa= instance is a character string in a particular character encoding. The AsTMa= processor transparently passes all characters to the equivalent virtual XTM instance. To make this happen, an AsTMa= instance has to specify the appropriate character encoding at the beginning of the stream. This encoding will also be the XML encoding used in the virtual XTM instance. The name will become the value of the id attribute of the topicMap element.
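The name/encoding line and the resulting start of the virtual XTM instance might be produced as in this Python sketch. It assumes the iso-8859-1 default from the tutorial and reproduces the topicMap start tag of the sparclinux example; parse_name_encoding and xtm_header are hypothetical names.

```python
import re
from typing import Tuple

DEFAULT_ENCODING = "iso-8859-1"  # default stated in the tutorial part


def parse_name_encoding(line: str) -> Tuple[str, str]:
    """Split 'name [: encoding]' into its two parts (sketch)."""
    name, sep, encoding = line.partition(":")
    name, encoding = name.strip(), encoding.strip()
    if not re.fullmatch(r"\S+", name):  # the name must not contain white space
        raise ValueError(f"invalid topic map name: {name!r}")
    return name, (encoding if sep and encoding else DEFAULT_ENCODING)


def xtm_header(name: str, encoding: str) -> str:
    """Start of the virtual XTM instance, mirroring the tutorial example."""
    return (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        f'<topicMap id="{name}"\n'
        "  xmlns='http://www.topicmaps.org/xtm/1.0/'\n"
        "  xmlns:xlink='http://www.w3.org/1999/xlink'>"
    )


print(xtm_header(*parse_name_encoding("sparclinux : iso-8859-1")))
```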
A global comment section is introduced by a # at the very beginning of a line. Any following text (i.e. starting with the 2nd column) is regarded as comment text. Any directly subsequent line which also begins with a # is accumulated into the comment. A global comment section is terminated by the beginning of some other top-level section or by an empty line.
It is up to the AsTMa= processor to process comments. When converting into XTM, the processor may create XML comment sections, when directly loading it for other purposes the comment may be discarded.
Example:
    # warning from the information secretary
    # Linux is a communist conspiracy against the free Microsoft world

The comment contains two lines; the #s themselves are discarded.
comment-section --> { '#' string eol }

A local comment is introduced by a # NOT at the beginning of the line AND following at least one white-space character (changed in Rev1.5). Local comments include ALL white-space characters before the '#' (changed in Rev1.5) and reach until the end of the current line. They are discarded by AsTMa= processors.
A topic definition section follows the syntax
topic-definition --> [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ] eol { ws* topic-characteristic eol }
The topic-id stands for the topic identifier and HAS to be provided. This identifier SHOULD follow the XML identifier rules although an AsTMa= processor might not directly police this.
The topic identifier itself has to be placed at the beginning of the line; if the optional list of identifiers inside '()' is specified, the topic identifiers mentioned therein are interpreted as types of that topic. The list entries are separated by one (or more) white spaces:
list-of-*-topic-ids --> topic-id { ws+ topic-id }
Example:
    linux (operating-system-technology open-source-software)
    ...

This defines a topic with id linux and with two types.
Every topic definition will result in a topic element in the virtual XTM instance. The topic-id will be used as value for the ID attribute. For every topic type an instanceOf element will be generated inside this topic containing a topicRef element with the xlink:href attribute having the value of the type id.
Optionally, it can be specified that a particular topic reifies a particular addressable resource. To name this resource unambiguously, a URI must be provided after the keyword 'reifies'. The URI may point to an external resource, but also to another topic in the map. The AsTMa= processor will generate a resourceRef element within the subjectIdentity element of that topic.
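A topic header line, including the optional tid: prefix, type list and reifies clause, can be matched with a single regular expression. The Python sketch below (TOPIC_HEADER and topic_start are hypothetical names) emits only the opening elements; the characteristics and the closing </topic> are left out. It places subjectIdentity directly after the instanceOf elements, which is the order the XTM 1.0 DTD prescribes for the content of topic.

```python
import re
from typing import List

# [ 'tid' ':' ] topic-id [ '(' type ... ')' ] [ 'reifies' string ]
TOPIC_HEADER = re.compile(
    r"^(?:tid\s*:\s*)?"                  # optional 'tid:' prefix
    r"(?P<id>[^\s()]+)"                  # the topic id, at the start of the line
    r"\s*(?:\((?P<types>[^()]*)\))?"     # optional list of type ids
    r"\s*(?:reifies\s+(?P<uri>\S+))?"    # optional reified resource
    r"\s*$"
)


def topic_start(line: str) -> List[str]:
    """Translate a topic header line into the opening XTM elements (sketch)."""
    m = TOPIC_HEADER.match(line)
    if m is None:
        raise ValueError(f"not a topic header: {line!r}")
    out = [f'<topic id="{m["id"]}">']
    for type_id in (m["types"] or "").split():   # one instanceOf per type
        out.append(f'  <instanceOf><topicRef xlink:href="#{type_id}"/></instanceOf>')
    if m["uri"]:                                  # 'reifies' -> resourceRef
        out.append(f'  <subjectIdentity><resourceRef xlink:href="{m["uri"]}"/>'
                   "</subjectIdentity>")
    return out


print("\n".join(topic_start("linux (operating-system-technology open-source-software)")))
```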
Any topic definition section can contain any number of topic characteristics. They have to directly follow the first line of the topic definition section; their order is irrelevant, though. Every such characteristic is indicated by a (case-sensitive) keyword, followed by scoping and/or typing information (if appropriate), a colon (:) and the value of this particular characteristic. To improve readability, white spaces are allowed before the keyword.
Example:
    linux (operating-system-technology open-source-software)
    bn: Linux Operating System
    oc (download): http://www.linux.org/

This defines a topic with id linux with an explicit base name and a typed occurrence.
The following defaults are used for a topic if no explicit characteristic is defined:
All topic characteristics result in the generation of an appropriate XTM element inside the current topic element of the virtual XTM instance:
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic | subject-identity
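Since every characteristic line has the same shape (keyword, optional '@' scope, optional '(' type ')', ':' and value), a processor could recognise all four kinds with one regular expression. The Python sketch below assumes local comments and continuations were already removed; CHARACTERISTIC, Characteristic and parse_characteristic are hypothetical names, and the checks rejecting a type on bn/sin and a scope on sin mirror the productions in this section.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# keyword [ '@' scope ... ] [ '(' type ')' ] ':' value
CHARACTERISTIC = re.compile(
    r"^\s*(?P<kw>bn|oc|in|sin)"              # case-sensitive keyword
    r"\s*(?:@\s*(?P<scope>[^():]*?))?"       # optional scope topic ids
    r"\s*(?:\(\s*(?P<type>[^\s()]+)\s*\))?"  # optional type topic id
    r"\s*:(?P<value>.*)$"                    # ':' followed by the value
)


@dataclass
class Characteristic:
    keyword: str
    scope: List[str]
    type_id: Optional[str]
    value: str


def parse_characteristic(line: str) -> Characteristic:
    """Split one characteristic line into its parts (sketch)."""
    m = CHARACTERISTIC.match(line)
    if m is None:
        raise ValueError(f"not a topic characteristic: {line!r}")
    kw = m["kw"]
    scope = (m["scope"] or "").split()
    if kw in ("bn", "sin") and m["type"]:
        raise ValueError(f"'{kw}' does not take a type")
    if kw == "sin" and scope:
        raise ValueError("'sin' does not take a scope")
    return Characteristic(kw, scope, m["type"], m["value"].strip())


print(parse_characteristic("bn @ deutsch : RedHat Linux für SPARC"))
```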
A (scoped) base name can be defined via
basename-characteristic --> 'bn' [ '@' list-of-scope-topic-ids ] ':' string

The string will be used as-is for the text content of the baseNameString element inside the baseName element of the virtual XTM instance. If a list of scope topic ids exists, each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above baseName element.
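A possible Python rendering of this mapping is sketched below; scope_xml and basename_xml are hypothetical names. One deviation from "as-is" is deliberate: the sketch escapes '&', '<' and '>' with xml.sax.saxutils.escape, since an unescaped '&' or '<' in a base name would make the virtual XTM instance ill-formed.

```python
from typing import List
from xml.sax.saxutils import escape


def scope_xml(scope: List[str]) -> str:
    """<scope> with one topicRef per scope topic id (empty if unscoped)."""
    if not scope:
        return ""
    refs = "".join(f'<topicRef xlink:href="#{t}"/>' for t in scope)
    return f"<scope>{refs}</scope>"


def basename_xml(name: str, scope: List[str]) -> str:
    """baseName element for a 'bn' characteristic (sketch)."""
    return ("<baseName>" + scope_xml(scope)
            + f"<baseNameString>{escape(name)}</baseNameString>"
            + "</baseName>")


print(basename_xml("RedHat Linux für SPARC", ["deutsch"]))
```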
A (scoped and/or typed) resourceRef occurrence is defined with
resourceRef-characteristic --> 'oc' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string

The string SHOULD be a valid URI. An AsTMa= processor might perform checks on this URI.
For each resource occurrence, a resourceRef element will be created in the virtual XTM instance, with the string used as its xlink:href attribute. That element will be nested inside an occurrence element inside the current topic element.
If a list of scope topic ids is provided, each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above occurrence element.
If a type topic id is provided, this id will be used as the value of the xlink:href attribute of a topicRef element inside an instanceOf element inside the above occurrence element.
A (scoped and/or typed) resourceData occurrence is defined with
resourceData-characteristic --> 'in' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string

The string will be used as-is as text inside a resourceData element inside an occurrence element inside the current topic element.
If a list of scope topic ids is provided, each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above occurrence element.
If a type topic id is provided, this id will be used as the value of the xlink:href attribute of a topicRef element inside an instanceOf element inside the above occurrence element.
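Both occurrence forms differ only in their innermost element, so one function can cover them. In this Python sketch (occurrence_xml is a hypothetical name) instanceOf precedes scope, which follows the element order of occurrence in the XTM 1.0 DTD; escaping the value is an addition beyond the "as-is" wording, made so the output stays well-formed XML.

```python
from typing import Optional, Sequence
from xml.sax.saxutils import escape


def occurrence_xml(keyword: str, value: str, scope: Sequence[str] = (),
                   type_id: Optional[str] = None) -> str:
    """occurrence element for an 'oc' or 'in' characteristic (sketch)."""
    parts = ["<occurrence>"]
    if type_id:                          # typed occurrence -> instanceOf
        parts.append(f'<instanceOf><topicRef xlink:href="#{type_id}"/></instanceOf>')
    if scope:                            # scoped occurrence -> scope
        refs = "".join(f'<topicRef xlink:href="#{t}"/>' for t in scope)
        parts.append(f"<scope>{refs}</scope>")
    if keyword == "oc":                  # resource reference (a URI)
        href = escape(value, {'"': "&quot;"})
        parts.append(f'<resourceRef xlink:href="{href}"/>')
    elif keyword == "in":                # inline resource data
        parts.append(f"<resourceData>{escape(value)}</resourceData>")
    else:
        raise ValueError(f"not an occurrence keyword: {keyword!r}")
    parts.append("</occurrence>")
    return "".join(parts)


print(occurrence_xml("oc", "http://www.namesys.com/download.html", type_id="download"))
```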
A subject identity is defined by
subject-identity --> 'sin' ':' string

If the string follows the syntax of a URI and contains a URI scheme, it will be used as a subject indicator reference, i.e. as the value of the xlink:href attribute of a subjectIndicatorRef element inside a subjectIdentity element inside the current topic.
Otherwise, AsTMa= assumes this string to be a topic reference and will instead add a topicRef element, creating a relative link to that string.
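The URI-or-topic decision can be sketched in Python by testing for a leading URI scheme, using the scheme syntax of RFC 3986 (a letter followed by letters, digits, '+', '-' or '.', then ':'). Collecting all sin values, plus an optional reifies URI, into one subjectIdentity element follows the earlier examples; HAS_SCHEME and subject_identity_xml are hypothetical names.

```python
import re
from typing import List, Optional

# URI scheme per RFC 3986: a letter, then letters, digits, '+', '-' or '.', then ':'.
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def subject_identity_xml(sin_values: List[str], reifies: Optional[str] = None) -> str:
    """subjectIdentity element built from 'sin' values and 'reifies' (sketch)."""
    refs = []
    if reifies:                               # from a 'reifies' clause
        refs.append(f'<resourceRef xlink:href="{reifies}"/>')
    for value in sin_values:
        if HAS_SCHEME.match(value):           # looks like a URI: subject indicator
            refs.append(f'<subjectIndicatorRef xlink:href="{value}"/>')
        else:                                 # otherwise: reference to a topic
            refs.append(f'<topicRef xlink:href="#{value}"/>')
    if not refs:
        return ""
    return "<subjectIdentity>" + "".join(refs) + "</subjectIdentity>"


print(subject_identity_xml(["http://dmoz.org/.../Linux/", "linux-os"]))
```

A topic id used with sin normally cannot be mistaken for a URI, because XML ids may not contain a colon.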
An association definition section follows the syntax
association-definition --> [ '@' list-of-scope-topic-ids ] '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }
Example:
    (kernel-patch-provides-feature)
    feature: reiserfs
    platform: i386
    patch: generic-reiserfs-patch-2.4.x

This defines an association of type kernel-patch-provides-feature with three roles: feature, platform and patch.
If a list of scope topic ids exists, each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the association element.
At most one topic identifier can occur as the association type. If none is provided, the XTM default will be assumed.
Association members are defined line-by-line according to
association-member --> role-topic-id ':' { member-topic-id }
In the virtual XTM instance for every association definition in AsTMa=, an association element will be created inside the current topic map. Every member will result in the creation of a member element inside this association element.
The role-topic-id will be used as the value of the xlink:href attribute of a topicRef element inside a roleSpec element inside the member.
Every single member-topic-id is used as the value of the xlink:href attribute of a topicRef element inside the member.
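A complete association section could be translated as in the following Python sketch. It assumes the member-topic-ids on one line are separated by white space (the production leaves the separator implicit) and that local comments were already stripped; ASSOC_HEADER, ref and association_xml are hypothetical names.

```python
import re
from typing import List

# [ '@' scope ... ] '(' [ association-type ] ')'
ASSOC_HEADER = re.compile(
    r"^\s*(?:@\s*(?P<scope>[^()]*?)\s*)?"   # optional scope topic ids
    r"\(\s*(?P<type>[^\s()]*)\s*\)\s*$"     # '(' optional type id ')'
)


def ref(topic_id: str) -> str:
    return f'<topicRef xlink:href="#{topic_id}"/>'


def association_xml(lines: List[str]) -> str:
    """Translate one association section (header + member lines) into XTM."""
    m = ASSOC_HEADER.match(lines[0])
    if m is None:
        raise ValueError(f"not an association header: {lines[0]!r}")
    out = ["<association>"]
    if m["type"]:                       # no type given: the XTM default applies
        out.append(f"  <instanceOf>{ref(m['type'])}</instanceOf>")
    scope = (m["scope"] or "").split()
    if scope:
        out.append("  <scope>" + "".join(ref(t) for t in scope) + "</scope>")
    for line in lines[1:]:
        role, sep, members = line.partition(":")
        if not sep:
            raise ValueError(f"not an association member: {line!r}")
        out.append("  <member>")
        out.append(f"    <roleSpec>{ref(role.strip())}</roleSpec>")
        for member in members.split():  # assumption: members separated by blanks
            out.append(f"    {ref(member)}")
        out.append("  </member>")
    out.append("</association>")
    return "\n".join(out)


print(association_xml(["(kernel-patch-provides-feature)",
                       "  feature: reiserfs",
                       "  platform: i386",
                       "  patch: generic-reiserfs-patch-2.4.x"]))
```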
AsTMa= allows processors to define any number of directives. The only (syntactic) constraint the language imposes is that such a directive must occupy exactly one line, which has to begin with the character '%'.
The precise semantics of each such directive is entirely at the discretion of the AsTMa= processor. Suggestions for directives are:
The following unresolved issues exist:
Experience with inexperienced authors has shown that topic maps written in AsTMa= are usually much richer and less erroneous than maps written in XTM. Apparently, these authors are less distracted by the XML code and can concentrate more on the content.
instance                    --> { comment-section } name-encode-section { section }
comment-section             --> { '#' string eol }
name-encode-section         --> name [ ':' encoding ] eol
section                     --> comment-section | topic-definition-section | association-definition-section
topic-definition            --> [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ] eol { ws* topic-characteristic eol }
topic-characteristic        --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic | subject-identity
basename-characteristic     --> 'bn' [ '@' list-of-scope-topic-ids ] ':' string
resourceRef-characteristic  --> 'oc' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string
resourceData-characteristic --> 'in' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string
subject-identity            --> 'sin' ':' string
association-definition      --> [ '@' list-of-scope-topic-ids ] '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }
association-member          --> role-topic-id ':' { member-topic-id }
list-of-*-topic-ids         --> topic-id { ws+ topic-id }
string                      --> [ ws* ] { character } [ ws* ]
name                        --> any identifier not containing blanks