Robert Barta Bond University
rho@bond.edu.au Copyright © 2002 Robert Barta
AsTMa= is part of the AsTMa language family, which was designed to facilitate authoring, constraining, and querying topic maps. This document provides a formal language definition consisting of a syntax and of mapping rules that specify how particular AsTMa= language constructs are mapped into their XTM equivalents.
This document has no formal status. It is a technical report of Bond University.
In the following we use grammars (extended BNFs) and regexps (regular expressions) to define the syntax of an AsTMa= instance.
An AsTMa= instance is any (linear) text which follows the rules given in this specification. Such a text can reside in a file on a file system or can be created on the fly from an external data source. As AsTMa= is line-oriented, we introduce eol as a special terminal to symbolize the end of the line (or the end of the text stream, for that matter). AsTMa= processors use the start and the end of a line as natural boundaries of the Topic Map information it contains. If for some reason, say readability, a line n has to be broken into several lines, a backslash character \ at the very end of line n indicates that the following line n+1 continues line n.
Example: Lines will be collapsed into one
bn: this line is too long, so\
we continue it on the next line
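The collapsing rule can be sketched in a few lines of Python. This is only an illustration and not part of the specification: the function name join_continuations is hypothetical, and the sketch assumes that the backslash and the line break are simply dropped, with no separator inserted between the two physical lines.

```python
from typing import List


def join_continuations(text: str) -> List[str]:
    """Collapse physical lines ending in a backslash into logical lines.

    Assumption (not stated in the specification): the backslash and the
    line break are removed and nothing is inserted in their place.
    """
    logical: List[str] = []
    pending = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            pending += raw[:-1]
        else:
            logical.append(pending + raw)
            pending = ""
    if pending:
        # The text stream ended right after a continuation marker.
        logical.append(pending)
    return logical
```

A processor would typically run such a step before any other parsing, so that all later rules can rely on one logical line per construct.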
If we allow any number of white spaces to occur at a specific point in the grammar, then we symbolize this as ws*. If we require at least one white space to appear somewhere, then we use ws+ to denote this. Before and after terminals (keywords and special characters), white spaces can be used freely; this is not made explicit in the grammar productions, to keep the grammar readable. Moreover, any text is stripped of white spaces at its beginning and its end:
string --> [ ws* ] { any character } [ ws* ] eol
In many cases topic identifiers will be used to refer to other topics. To underline what kind of topic is expected in specific cases (without necessarily enforcing this with an AsTMa= processor) the syntax uses typed non-terminals, like role-topic-id or type-topic-id.
An AsTMa= processor is any program which is capable of parsing and processing an AsTMa= instance. For the sake of the following definition it is assumed that the processor will convert an AsTMa= instance into an equivalent XTM instance, henceforth referred to as the virtual XTM instance. The following semantic definition of AsTMa= is based on this transformation and not on the Topic Map processing models (SAM, TMPM) which are under development at the time of writing.
An AsTMa= processor is free to skip the transformation step and to load AsTMa= instances directly, as long as this results in the same semantics as the virtual XTM instance.
An AsTMa= instance has the following structure:
instance --> { comment } name-encode { section }
section --> comment | topic-definition | association-definition
An AsTMa= processor has to honor the following processing constraints:
A particular AsTMa= instance is a character string in a particular character encoding. The AsTMa= processor transparently passes all characters to the equivalent virtual XTM instance. To make this happen, an AsTMa= instance CAN specify the appropriate character encoding at the beginning of the text stream. This encoding will also be the XML encoding used in the virtual XTM instance. Any AsTMa= instance CAN be named by a string matching [^\s]+ (anything not containing white spaces). It is up to the AsTMa= processor how this name will be utilized. The name MAY become the value of the ID attribute of the top-level topicMap element.
name-encode --> name [ ':' encoding ] eol
Example: The following AsTMa= instance will be named linux and will assume any text to be encoded in iso-8859-1.
linux: iso-8859-1
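As a rough sketch of how a processor might read this first line, the following Python function splits it into the instance name and the optional encoding. The function name parse_name_encode is hypothetical, and the sketch assumes that the name itself contains no colon, which the grammar does not state explicitly.

```python
from typing import Optional, Tuple


def parse_name_encode(line: str) -> Tuple[str, Optional[str]]:
    """Split a name-encode line into (name, encoding or None).

    Assumption: the first ':' separates the name from the encoding.
    """
    name, sep, encoding = line.partition(":")
    name = name.strip()
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid instance name in line: {line!r}")
    encoding = encoding.strip() if sep else ""
    # An absent or empty encoding is reported as None.
    return name, (encoding or None)
```

For the example above, the function yields the name linux and the encoding iso-8859-1; what the processor does with an absent encoding is left open here, as it is in the text.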
It is up to an AsTMa= processor how to process comments. When converting into XTM, the processor MAY create XML comment sections. When an instance is loaded directly, any comment may also be discarded.
comment --> { '#' string eol }
AsTMa= supports global and local comments.
A global comment section is introduced by a # at the very beginning of a line (column 1). Any following text (i.e. starting with the 2nd column) is regarded as comment text. Any directly subsequent lines which also begin with a # are accumulated into the comment. A global comment section is terminated by the beginning of some other top-level section or by an empty line.
Example: The following comment contains 3 lines, the '#'s themselves are discarded.
# warning from the information secretary
# Linux is a communist conspiration
# against the free Microsoft world
A local comment is introduced by a # NOT at the beginning of the line AND following at least one white-space character (changed starting with Rev1.5). Local comments include ALL white-space characters before the '#' (changed starting with Rev1.5) and reach until the end of the current line.
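The local comment rule can be illustrated with a small regular expression. The following Python sketch is an assumption about how a processor might implement it (the name strip_local_comment is hypothetical): it removes a '#' that is preceded by at least one white-space character, together with that white space and everything up to the end of the line, and leaves global comment lines untouched.

```python
import re

# At least one white-space character, then '#', then the rest of the line.
_LOCAL_COMMENT = re.compile(r"\s+#.*$")


def strip_local_comment(line: str) -> str:
    """Remove a local comment, including the white space in front of '#'."""
    if line.startswith("#"):
        # A '#' in column 1 starts a global comment, handled elsewhere.
        return line
    return _LOCAL_COMMENT.sub("", line, count=1)
```

Because the '#' must follow white space, a URI with a fragment identifier such as http://www.linux.org/#top is not cut off by this rule.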
A topic definition section follows the syntax
topic-definition --> [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ] { 'is-reified-by' uri } eol { ws* topic-characteristic eol }
The topic-id stands for the topic identifier and HAS to be provided. This identifier SHOULD follow the XML identifier rules although an AsTMa= processor may not directly police this.
The topic identifier itself has to be placed at the beginning of the line. If the optional list of type identifiers is specified, the topic identifiers mentioned therein are interpreted as types of that topic. The list entries are separated by one (or more) white spaces:
list-of-*-topic-ids --> topic-id { ws+ topic-id }
Example: A topic with id linux and with two types.
linux (operating-system-technology open-source-software) ...
Every topic definition will result in a topic element in the virtual XTM instance. The topic-id will be used as value for the ID attribute. For every topic type an instanceOf element will be generated inside this topic containing a topicRef element with the xlink:href attribute having the value of the type id.
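The mapping of a topic header line can be sketched with Python's xml.etree.ElementTree. This is an illustrative sketch, not the reference transformation: the function name topic_header_to_xtm is hypothetical, the 'reifies' and 'is-reified-by' clauses are ignored, the XTM element namespace is omitted for brevity, the ID attribute is written as a lowercase id, and type ids are assumed to be written as local references with a leading '#'.

```python
import re
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"
ET.register_namespace("xlink", XLINK_NS)

# Optional 'tid:' prefix, the topic id, and an optional '(type ...)' list.
_TOPIC_HEADER = re.compile(
    r"^(?:tid\s*:\s*)?(?P<id>[^\s()]+)\s*(?:\(\s*(?P<types>[^)]*?)\s*\))?"
)


def topic_header_to_xtm(line: str) -> ET.Element:
    """Build a topic element from the first line of a topic definition."""
    match = _TOPIC_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a topic definition: {line!r}")
    topic = ET.Element("topic", {"id": match.group("id")})
    for type_id in (match.group("types") or "").split():
        instance_of = ET.SubElement(topic, "instanceOf")
        # Assumption: local topics are referenced as '#' + id.
        ET.SubElement(instance_of, "topicRef", {XLINK_HREF: "#" + type_id})
    return topic
```

Calling ET.tostring(topic_header_to_xtm(...), encoding="unicode") on the linux example would serialize a topic element with one instanceOf child per type.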
Optionally, it can be specified that a particular topic reifies a particular addressable resource. To name this resource unambiguously, a URI must be provided after the keyword 'reifies'. The URI may point to an external resource, but also to another topic in the map. The AsTMa= processor will generate a resourceRef element within the subjectIdentity element of that topic.
Example: A topic reifies a subject which is an addressable resource.
linux-org (web-site) reifies http://www.linux.org/ ...
Additionally, any number of other topics can be named which reify the current topic. The effect is similar to the above except that all these topics will be generated with the subjectIdentity defined within them (and not in the current topic).
Example: A topic is reified by another topic.
linux-org (web-site) is-reified-by kernel-org ...
Any topic definition section can include any number of topic characteristics; their order is irrelevant. Every such characteristic is indicated by a (case-sensitive) keyword, followed by scoping and/or typing information (if appropriate), a colon (:) and the value for this particular characteristic. To improve readability, white spaces are allowed before the keyword.
Example: A topic with id linux with an explicit base name and a typed occurrence.
linux (operating-system-technology open-source-software)
bn: Linux Operating System
oc (download): http://www.linux.org/
The following defaults are used for a topic when no explicit characteristic is defined:
All topic characteristics result in the generation of an appropriate XTM element inside the current topic element of the virtual XTM instance:
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic | subject-identity
A (scoped) base name can be defined via
basename-characteristic --> 'bn' [ '@' list-of-scope-topic-ids ] ':' string
The string will, after being stripped of leading and trailing blanks, be used as the text content of the baseNameString element inside the baseName element of the virtual XTM instance. If a list of scope topic ids exists, then each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above baseName element.
A (scoped and/or typed) resourceRef occurrence is defined with
resourceRef-characteristic --> 'oc' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' uri
The uri SHOULD be a valid URI. An AsTMa= processor MAY perform checks on this URI.
For each resource occurrence, a resourceRef element will be created in the virtual XTM instance, with the uri being used as its xlink:href attribute. That element will be nested inside an occurrence element inside the current topic element.
If a list of scope topic ids is provided, then each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above occurrence element.
If a type topic id is provided, then this id will be used as the value of the xlink:href attribute of a topicRef element inside an instanceOf element inside the above occurrence element.
A (scoped and/or typed) resourceData occurrence is defined with
resourceData-characteristic --> 'in' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string
The string will, after being stripped of leading and trailing blanks, be used as text inside a resourceData element inside an occurrence element inside the current topic element.
If a list of scope topic ids is provided, then each of these ids will be used as the value of the xlink:href attribute of a topicRef element inside a scope element inside the above occurrence element.
If a type topic id is provided, then this id will be used as the value of the xlink:href attribute of a topicRef element inside an instanceOf element inside the above occurrence element.
A subject identity is defined by
subject-identity --> 'sin' ':' uri
This uri will be used as a subject indicator reference, i.e. as the value of the xlink:href attribute of a subjectIndicatorRef element inside a subjectIdentity element inside the current topic.
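The four characteristic productions above share one shape: a keyword, optional scope and type information, a colon and a value. The following Python sketch parses a single characteristic line along these lines. The names Characteristic and parse_characteristic are hypothetical, and the regular expression is one possible reading of the grammar rather than a normative one.

```python
import re
from typing import List, NamedTuple, Optional


class Characteristic(NamedTuple):
    kind: str                # 'bn', 'oc', 'in' or 'sin'
    scopes: List[str]        # scope topic ids, possibly empty
    type_id: Optional[str]   # type topic id, if any
    value: str               # string or uri, stripped of blanks


_LINE = re.compile(
    r"^\s*(?P<kind>bn|oc|in|sin)"
    r"\s*(?:@\s*(?P<scopes>[^(:]+?))?"
    r"\s*(?:\(\s*(?P<type>[^)\s]+)\s*\))?"
    r"\s*:\s*(?P<value>.*?)\s*$"
)


def parse_characteristic(line: str) -> Characteristic:
    """Parse one topic characteristic line into its components."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a topic characteristic: {line!r}")
    kind = match.group("kind")
    scopes = (match.group("scopes") or "").split()
    type_id = match.group("type")
    # The grammar allows a type only for 'oc' and 'in', and neither
    # scope nor type for 'sin'.
    if kind == "sin" and (scopes or type_id):
        raise ValueError("'sin' takes neither scope nor type")
    if kind == "bn" and type_id:
        raise ValueError("'bn' cannot be typed")
    return Characteristic(kind, scopes, type_id, match.group("value"))
```

Only the first colon after the keyword part acts as the separator, so colons inside a URI value are preserved.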
An association definition section follows the syntax
association-definition --> [ '@' scope-topic-id ] '(' [ association-type-topic-id ] ')' { 'is-reified-by' uri } eol { ws* association-member eol }
Example: An association of type kernel-patch-provides-feature with three roles (feature, platform and patch) and their respective players.
(kernel-patch-provides-feature)
feature: reiserfs
platform: i386
patch: generic-reiserfs-patch-2.4.x
If a list of scope topic ids exists, then these ids will be used as values each for the attribute xlink:href for a topicRef element inside a scope element inside the association element.
At most one topic identifier can occur as association type. If no type is specified the XTM default will be assumed.
Association members are defined line-by-line according to
association-member --> role-topic-id ':' { member-topic-id }
In the virtual XTM instance for every association definition in the AsTMa= instance an association element will be created inside the current topic map. Every member will result in the creation of a member element inside this association element.
The role-topic-id will be used as the value of the xlink:href attribute of a topicRef element inside a roleSpec element inside the member.
Every single member-topic-id is used as the value of the xlink:href attribute of a topicRef element inside the member.
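The member mapping described above can be sketched as follows, again using Python's xml.etree.ElementTree. The function name association_to_xtm is hypothetical, and several points are assumptions of this sketch rather than statements of the specification: the 'is-reified-by' clause is ignored, topic ids are written as local references with a leading '#', the XTM element namespace is omitted, and the association type is emitted as an instanceOf element in the way XTM 1.0 types associations.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_HREF = f"{{{XLINK_NS}}}href"
ET.register_namespace("xlink", XLINK_NS)

# Optional '@ scope', then '(' optional type ')'.
_HEADER = re.compile(
    r"^(?:@\s*(?P<scope>[^\s(]+)\s*)?\(\s*(?P<type>[^)\s]*)\s*\)"
)


def _ref(parent: ET.Element, topic_id: str) -> ET.Element:
    # Assumption: local topics are referenced as '#' + id.
    return ET.SubElement(parent, "topicRef", {XLINK_HREF: "#" + topic_id})


def association_to_xtm(lines: List[str]) -> ET.Element:
    """Build an association element from a header line and member lines."""
    header, *member_lines = [ln.strip() for ln in lines if ln.strip()]
    match = _HEADER.match(header)
    if match is None:
        raise ValueError(f"not an association definition: {header!r}")
    assoc = ET.Element("association")
    if match.group("type"):
        _ref(ET.SubElement(assoc, "instanceOf"), match.group("type"))
    if match.group("scope"):
        _ref(ET.SubElement(assoc, "scope"), match.group("scope"))
    for line in member_lines:
        role, sep, players = line.partition(":")
        if not sep or not players.split():
            raise ValueError(f"member needs a role and a player: {line!r}")
        member = ET.SubElement(assoc, "member")
        _ref(ET.SubElement(member, "roleSpec"), role.strip())
        for player in players.split():
            _ref(member, player)
    return assoc
```

Each member line thus becomes one member element holding a roleSpec for the role and one topicRef per player.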
Associations can be reified in the same way as topics.
Example: The above association is reified as statement-1.
(kernel-patch-provides-feature) is-reified-by statement-1
feature: reiserfs
platform: i386
patch: generic-reiserfs-patch-2.4.x
AsTMa= allows processors to define any number of directives. The only (syntactic) constraint the language imposes is that such a directive must occupy exactly one line, which has to begin with the character '%'.
The precise semantics of each such directive is completely under the discretion of the AsTMa= processor. Suggestions for directives are:
The following unresolved issues exist:
instance --> { comment } name-encode { section }
comment --> { '#' string eol }
name-encode --> name [ ':' encoding ] eol
section --> comment | topic-definition | association-definition
topic-definition --> [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ] eol { ws* topic-characteristic eol }
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic | subject-identity
basename-characteristic --> 'bn' [ '@' list-of-scope-topic-ids ] ':' string
resourceRef-characteristic --> 'oc' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string
resourceData-characteristic --> 'in' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string
subject-identity --> 'sin' ':' string    (note: must be a valid URI)
association-definition --> [ '@' scope-topic-id ] '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }    (note: at least one member)
association-member --> role-topic-id ':' { member-topic-id }    (note: at least one player)
list-of-any-topic-ids --> topic-id { ws+ topic-id }
any-topic-id --> topic-id
string --> [ ws* ] { character } [ ws* ]    (note: stripped of blanks)
name --> /\w+/    (note: in regexp notation)