AsTMa= (Asymptotic Topic Map Notation, Authoring)

Robert Barta Bond University

rho@bond.edu.au

Copyright © 2002 Robert Barta

AsTMa= is part of the AsTMa language family, which was designed to facilitate authoring, constraining, and querying topic maps. After an introduction and motivation, we present the core concepts in a short tutorial before turning to the formal language specification.

This document has no formal status. It is a technical report of Bond University.

v1.0, 2002-06-02, Revision 1.7, Draft

Introduction

Since the stabilisation of XTM, an XML-based notation for Topic Maps, the interest in authoring Topic Maps has increased.

While the automatic generation of topic maps from backend databases into XTM can easily be achieved, manual authoring is tedious and error-prone. One option is to use XML-aware development tools, such as XML editors. While feasible, generic XML editors offer little help beyond syntactic conformance. Another option is to use integrated development environments for Topic Maps (server or client-side) as they appear on the market.

AsTMa= is a linear, textual notation for Topic Map information. The motivation was to create an authoring shorthand notation in contrast to XTM which is mainly suitable for interchange purposes. AsTMa= has the following design objectives:

AsTMa design objectives

Minimum of effort:
A converter should be able to interpret the intention of the author in a specific context reducing the verbosity of the language.
Minimal use of special characters and keywords:
Banning of [(&^%$}] delimiters should increase the usability of the language. This also reduces the need to escape these special characters once they belong to the information.
Asymptotic with regard to XTM:
The language should not have a built-in syntactic sound barrier making it impossible to reach the same expressiveness as XTM.
Keep things together:
The author should not be forced to split up (topic) information into several fragments, which have to be merged via TNC by a followup Topic Map processing stage.

At this stage AsTMa= does not fulfill all of the above objectives. As outlined in more detail in the section about conformance, AsTMa= is not yet as expressive as XTM, as this has not been a prime concern. Still, AsTMa= is sufficiently rich to prototype medium-sized topic maps.


Tutorial

The following setting assumes that the AsTMa= text will be either directly understood by a particular Topic Map processing software or that a specialized processor will first convert the AsTMa= text stream into an XTM stream.

Basics

AsTMa= is line oriented. This means that pertinent information is terminated with the end of the line. A single line containing

   filesystem (software)
already defines a topic (as explained below). If there is more to a topic (or an association) this information will be on follow-up lines:
   filesystem (software)
   bn: File System

An empty line, thus, separates items like topics and associations. On any line white-spaces can be used before and after keywords and special characters. They are silently ignored.
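The separation rule can be illustrated with a small sketch. It is written in Python, which is not part of AsTMa=; the function name split_sections is hypothetical, and treating whitespace-only lines as empty is an assumption. It is one possible reading of the rule, not a reference implementation.

```python
def split_sections(text):
    """Split AsTMa= text into items (topics, associations) at empty lines.

    Assumption: a line containing only white space counts as empty.
    """
    sections, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # An empty line closes the item collected so far.
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections
```

Each returned item is the list of lines belonging to one topic or association.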

Any line can also contain a comment introduced by the character '#' (following a white-space character):

   filesystem (software) # more information will follow
Such comments will be discarded by any processor and are only for internal documentation purposes.

Please note that (starting from Rev1.5) such a comment must have at least one blank before the '#'. This allows the hassle-free notation of URIs containing a '#', so that the XPointer part is not interpreted as a comment. The blanks between the text and the '#' are ignored.
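A minimal Python sketch of this rule follows. The helper name strip_local_comment is hypothetical, and the sketch assumes it is applied only to lines that are not global comments (those starting with '#' in the first column).

```python
import re

# A local comment starts at a '#' that is preceded by at least one
# white-space character; that white space belongs to the comment.
_LOCAL_COMMENT = re.compile(r"\s+#.*$")


def strip_local_comment(line):
    """Return the line without a trailing local comment, if there is one."""
    return _LOCAL_COMMENT.sub("", line)
```

A '#' directly attached to preceding text, as in a URI with a fragment, is left alone because no white space precedes it.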

If you would like to have a comment in the processor output, then this comment MUST begin at the start of a line:

AsTMa= source:
# I will survive and (hopefully)
#     the line structure will not
#        be broken
Equivalent XTM:
<!--  I will survive and (hopefully)
          the line structure will not
             be broken -->

Comments on consecutive lines will be treated as one comment. Any non-comment line signals the end of such a group. Also, any '-->' occurrence within a comment will be converted into '--_ >' (one underscore character) to avoid problems in the resulting XML code.
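The grouping and escaping described above might be sketched as follows in Python. The function name xml_comments is hypothetical, and the replacement string is copied from the text as printed; a real processor may differ in both respects.

```python
# Replacement for '-->' inside comment text, copied from the text as printed.
ESCAPED_CLOSE = "--_ >"


def xml_comments(lines):
    """Turn groups of consecutive '#' lines into XML comment strings."""
    groups, current = [], []
    for line in lines:
        if line.startswith("#"):
            # Keep everything after the '#', including indentation.
            current.append(line[1:])
        elif current:
            # Any non-comment line ends the current group.
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return [
        "<!--" + "\n".join(group).replace("-->", ESCAPED_CLOSE) + "-->"
        for group in groups
    ]
```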

Topics

The line

   filesystem (software)
defines a topic with the id filesystem, which is an instance of another topic, software:

AsTMa= source:
  filesystem (software)
Equivalent XTM:
<topic id="filesystem">
   <instanceOf>
     <topicRef xlink:href="#software"/>
   </instanceOf>
   <baseName>
     <baseNameString>filesystem</baseNameString>
   </baseName>
</topic>

As we did not provide a base name, the topic id filesystem is also assumed to be the topic's base name. While this heuristic approach works fine for some words, it does not work well with others, say,

   linux-distribution (software)

Any AsTMa= processor is free to apply any other heuristics, such as:

AsTMa= source:
   linux-distribution (software)
Equivalent XTM:
<topic id="linux-distribution">
  <instanceOf>
    <topicRef xlink:href="#software"/>
  </instanceOf>
  <baseName>
     <baseNameString>linux distribution</baseNameString>
  </baseName>
</topic>

substituting dashes with blanks, looking up 3rd-party databases, or leaving it as it is. Of course, the author can enforce a particular base name:

AsTMa= source:
   RedHat-Linux-sparc (linux-distribution-port)
   bn: RedHat Linux for SPARC
Equivalent XTM:
<topic id="RedHat-Linux-sparc">
  <instanceOf>
    <topicRef 
      xlink:href="#linux-distribution-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>RedHat Linux 
              for SPARC</baseNameString>
  </baseName>
</topic>

Similarly, you can also add occurrences to topics:

AsTMa= source:
  linux (os)
  bn: Linux kernel
  oc: http://www.kernel.org/
Equivalent XTM:
<topic id="linux">
  <instanceOf>
    <topicRef xlink:href="#os"/>
  </instanceOf>
  <baseName>
     <baseNameString>Linux kernel</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.kernel.org/"/>
  </occurrence>
</topic>

This covers resource references. Inline data (XTM resourceData) is added in the same way:

AsTMa= source:
  linux-port-on-sparc (linux-port)
  bn: SPARC Linux port
  oc: http://www.sparc.org/linux.shtml
  in: The kernel and kernel modules \
      are 64-bit on sparc64, \
      userland is still 32-bit, \
      and in fact the same as on sparc32.
Equivalent XTM:
<topic id="linux-port-on-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>SPARC Linux 
                               port</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http:....linux.shtml"/>
  </occurrence>
  <occurrence>
    <resourceData>The kernel ...</resourceData>
  </occurrence>
</topic>

Types and Scopes

If appropriate, you can also type topic characteristics:

AsTMa= source:
reiserfs (filesystem)
bn: Reiser File System, ReiserFS
oc (download): \
     http://www.namesys.com/download.html
Equivalent XTM:
<topic id="reiserfs">
  <instanceOf>
    <topicRef xlink:href="#filesystem"/>
  </instanceOf>
  <baseName>
     <baseNameString>Reiser ....</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
       <topicRef xlink:href="#download"/>
    </instanceOf>
    <resourceRef xlink:href="http:...download.html"/>
  </occurrence>
</topic>

To scope a characteristic you use '@' to introduce a particular context:

AsTMa= source:
RedHat-Linux-sparc (linux-distribution-port)
bn: RedHat Linux for SPARC
bn @ deutsch : RedHat Linux für SPARC
Equivalent XTM:
<topic id="RedHat-Linux-sparc">
  <instanceOf>
    <topicRef
        xlink:href="#linux-distribution-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>
        RedHat Linux for SPARC
     </baseNameString>
  </baseName>
  <baseName>
     <scope>
        <topicRef xlink:href="#deutsch"/>
     </scope>
     <baseNameString>
        RedHat Linux für SPARC
     </baseNameString>
  </baseName>
</topic>

Associations

Associations may or may not have a particular association type. This topic type is provided inside a () pair.

(kernel-patch-provides-feature)
...
If the association has no explicit type, the type can be omitted by writing ().

Associations also have a number of members playing roles:
AsTMa= source:
(kernel-patch-provides-feature)
feature: reiserfs
platform: i386
patch:   generic-reiserfs-patch-2.4.x
Equivalent XTM:
<association>
  <instanceOf>
    <topicRef xlink:href="#kernel-patch-provides-feature"/>
  </instanceOf>
  <member>
     <roleSpec>
       <topicRef xlink:href="#feature"/>
     </roleSpec>
     <topicRef xlink:href="#reiserfs"/>
  </member>
  <member>
     <roleSpec>
       <topicRef xlink:href="#platform"/>
     </roleSpec>
     <topicRef xlink:href="#i386"/>
  </member>
  <member>
     <roleSpec>
       <topicRef xlink:href="#patch"/>
     </roleSpec>
     <topicRef xlink:href="#generic-reiserfs-patch-2.4.x"/>
  </member>
</association>

For better readability you may want to indent the roles:

  (kernel-patch-provides-feature)
      feature: reiserfs
      platform: i386
      patch:   generic-reiserfs-patch-2.4.x

Identification and Reification

Topics are said to reify subjects. If the subject itself is not directly addressable, a topic in a topic map acts as a representative of the subject. If it is addressable, the topic can directly and unambiguously refer to the subject via a URI.

In case a subject indicator (a not necessarily unique identification for a particular subject) is known, it can be provided via sin:
AsTMa= source:
linux (os)
bn: Linux kernel
oc: http://www.kernel.org/
sin: http://dmoz.org/.../Linux/
Equivalent XTM:
<topic id="linux">
  <instanceOf>
    <topicRef xlink:href="#os"/>
  </instanceOf>
  <baseName>
     <baseNameString>Linux kernel</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.kernel.org/"/>
  </occurrence>
  <subjectIdentity>
     <subjectIndicatorRef 
         xlink:href="http://dmoz.org/.../Linux/"/>
  </subjectIdentity>
</topic>
Several such subject indicators can be provided for a single topic. If the indicator string provided contains a URI scheme, then AsTMa assumes a reference to an (external) subject indicator. Otherwise, AsTMa will assume this to be a reference to a local topic (topicRef):
AsTMa= source:
linux (os)
bn: Linux kernel
...
sin: http://dmoz.org/.../Linux/
sin: linux-os
Equivalent XTM:
<topic id="linux">
  <instanceOf>
    <topicRef xlink:href="#os"/>
  </instanceOf>
  <baseName>
     <baseNameString>Linux kernel</baseNameString>
  </baseName>
  ...
  <subjectIdentity>
     <subjectIndicatorRef 
         xlink:href="http://dmoz.org/.../Linux/"/>
     <topicRef 
         xlink:href="#linux-os"/>
  </subjectIdentity>
</topic>
It is clear from the syntax that only a single such resource reference can be specified.

In the case where the topic can be unambiguously linked to the subject in question, we can use AsTMa's reifies clause:

AsTMa= source:
linux-kernel-site (web-site) reifies http://www.linux.org/
bn: Linux kernel Site
...
Equivalent XTM:
<topic id="linux-kernel-site">
  <instanceOf>
    <topicRef xlink:href="#web-site"/>
  </instanceOf>
  <baseName>
     <baseNameString>Linux kernel Site</baseNameString>
  </baseName>
  ...
  <subjectIdentity>
     <resourceRef 
         xlink:href="http://www.linux.org/"/>
  </subjectIdentity>
</topic>
The subject provided can be external (by providing a full URI) or can also be a local resource, such as another topic or an association.

Topic Maps

There is no special format or syntax for an AsTMa= Topic Map instance. All text blocks within the document are regarded as part of the map.

Optionally, you can control the name (id) of the Topic Map. This, though, might only be relevant to your local topic map processor, so there is no counterpart of this in XTM. If so, then the very first non-empty line within the document MUST provide this name (identifier) of the topic map itself:
AsTMa= source:
sparclinux : iso-8859-1
Equivalent XTM:
<?xml version="1.0" encoding="iso-8859-1"?>
<topicMap id="sparclinux"
          xmlns       = 'http://www.topicmaps.org/xtm/1.0/'
          xmlns:xlink = 'http://www.w3.org/1999/xlink'>

Additionally, you may specify a particular encoding, as in the example above. If not provided, the encoding defaults to iso-8859-1.

Any local AsTMa implementation may also provide special commands or syntactical forms to control these aspects of your map. Please consult the appropriate documentation.


Language Specification

General Syntax

AsTMa= instance
An AsTMa= instance is any (linear) text which follows the rules given in this specification. Such a text can reside in a file on a file system or can also be created on the fly from an external data source.

In the following we use grammars (extended BNFs) and regexps (regular expressions) to define the syntax of an AsTMa= instance.

As AsTMa= is line-oriented, we introduce eol as special terminal to symbolize the end of the line (or the end of the text stream, for that matter). AsTMa= processors use the start and the end of a line as a natural boundary of AsTMa= purported information. If for some reason, say readability, a line n has to be broken into several lines, the backslash character \ at the very end of a line n indicates that the followup line n+1 logically belongs to n.

Example:

bn: this line is too long, so\
    we continue it on the next line
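The continuation rule might be implemented along the following lines. This is a Python sketch with a hypothetical function name; the specification does not say how the white space around the join is treated, so joining the two parts with exactly one blank is an assumption.

```python
def join_continuations(lines):
    """Merge physical lines ending in a backslash into logical lines.

    Assumption: the parts are joined with exactly one blank.
    """
    logical, pending = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            # This line continues the previous one.
            line = pending + " " + line.lstrip()
        if line.endswith("\\"):
            # Drop the backslash and wait for the follow-up line.
            pending = line[:-1].rstrip()
        else:
            logical.append(line)
            pending = None
    if pending is not None:
        # The text ended directly after a continuation marker.
        logical.append(pending)
    return logical
```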

If we allow any number of white spaces (\s) to occur at a specific point in the grammar, then we symbolize this as ws*. If we require at least one white space to appear somewhere, then we use ws+ to denote this. Before and after terminals (keywords and special characters), white spaces can be used freely, although this is not made explicit in the grammar productions. Moreover, any text is stripped of white spaces at its beginning and its end:

string --> [ ws* ] { character } [ ws* ]

In many cases topic identifiers will be used to refer to other topics. To underline what kind of topic is expected in specific cases (without enforcing this with an AsTMa= processor) the syntax uses typed non-terminals, like role-topic-id or type-topic-id.

General Semantics

AsTMa= processor
An AsTMa= processor is any program which is capable of parsing and processing an AsTMa= instance.

For the sake of the following definition, it is assumed that the processor will convert an AsTMa= instance into an equivalent XTM instance, further on referred to as the virtual XTM instance. The following semantic definition of AsTMa= is based on this transformation and not on a Topic Map processing model, which is still under development at the time of writing.

A real AsTMa= processor is free to skip the transformation step and to directly load AsTMa= instances as long as this is equivalent regarding a Topic Map processing model.

Overall Structure

An AsTMa= instance has the following structure:

instance --> { comment-section } name-encode-section { section }
section --> comment-section | topic-definition-section | association-definition-section

An AsTMa= processor has to honor the following processing constraints:

Naming and Encoding

Any AsTMa= instance has to be named by a string matching [^\s]+ (anything not containing white spaces). It is up to the AsTMa= processor how this name will be utilized.

name-encode-section --> name [ ':' encoding ] eol

Example:

linux: iso-8859-1
The instance will be called linux and the used encoding will be iso-8859-1.
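A small Python sketch of how a processor might read this line is given below. The function name parse_name_encoding is hypothetical; the default encoding iso-8859-1 is taken from the tutorial section on Topic Maps.

```python
def parse_name_encoding(line, default_encoding="iso-8859-1"):
    """Parse a line of the form: name [ ':' encoding ]."""
    name, _, encoding = line.partition(":")
    name = name.strip()
    # The name must be non-empty and must not contain white space.
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("invalid topic map name: %r" % line)
    return name, encoding.strip() or default_encoding
```

The returned pair corresponds to the topic map id and the XML encoding of the virtual XTM instance.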

A particular AsTMa= instance is a character string in a particular character encoding. The AsTMa= processor transparently passes all characters to the equivalent virtual XTM instance. To make this happen, an AsTMa= instance has to specify the appropriate character encoding at the beginning of the stream. This encoding will also be the XML encoding used in the virtual XTM instance. The name will become the value of the ID attribute of the topic map declaration.

Comments

A global comment section is introduced by a # at the very beginning of a line. Any following text (i.e. starting with the 2nd column) is regarded as comment text. Any directly subsequent line which also begins with a # is accumulated into the comment. A global comment section is terminated by the beginning of some other top-level section or by an empty line.

It is up to the AsTMa= processor to process comments. When converting into XTM, the processor may create XML comment sections, when directly loading it for other purposes the comment may be discarded.

Example:

# warning from the information secretary
# Linux is a communist conspiration against the free Microsoft world
The comment contains two lines, the #s themselves are discarded.

comment-section --> { '#' string eol }
A local comment is introduced by a # NOT at the beginning of the line AND following at least one white-space character (changed in Rev1.5). Local comments include ALL white-space characters before the '#' (changed in Rev1.5) and reach until the end of the current line. They are discarded by AsTMa= processors.

Topics

A topic definition section follows the syntax

topic-definition --> [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ] eol { ws* topic-characteristic eol }

The topic-id stands for the topic identifier and HAS to be provided. This identifier SHOULD follow the XML identifier rules although an AsTMa= processor might not directly police this.

The topic identifier itself has to be placed at the beginning of the line; if the optional list of identifiers inside '()' is specified, the topic identifiers mentioned therein are interpreted as types of that topic. The list entries are separated by one (or more) white spaces:

list-of-*-topic-ids --> topic-id { ws+ topic-id }

Example:

linux (operating-system-technology open-source-software)
...
Defines a topic with id linux and with two types.
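The first line of a topic definition could be parsed roughly as follows. This Python sketch uses a hypothetical function name and a regular expression derived from the production above; it does not check the XML identifier rules.

```python
import re

# [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ]
_TOPIC_HEADER = re.compile(
    r"""
    ^(?:tid\s*:\s*)?                       # optional 'tid:' prefix
    (?P<id>[^\s()]+)                       # topic identifier
    \s*(?:\((?P<types>[^)]*)\))?           # optional list of type ids
    \s*(?:reifies\s+(?P<reifies>\S.*?))?   # optional reified resource
    \s*$
    """,
    re.VERBOSE,
)


def parse_topic_header(line):
    """Return (topic id, list of type ids, reified URI or None)."""
    match = _TOPIC_HEADER.match(line)
    if match is None:
        raise ValueError("not a topic definition: %r" % line)
    types = (match.group("types") or "").split()
    return match.group("id"), types, match.group("reifies")
```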

Every topic definition will result in a topic element in the virtual XTM instance. The topic-id will be used as value for the ID attribute. For every topic type an instanceOf element will be generated inside this topic containing a topicRef element with the xlink:href attribute having the value of the type id.

Optionally, it can be specified that a particular topic reifies a particular addressable resource. To unambiguously name this resource, a URI must be provided after the keyword 'reifies'. The URI may point to an external resource, but also to another topic in the map. The AsTMa processor will generate a resourceRef element within the subjectIdentity element of that topic.

Topic characteristics

Any topic definition section can contain any number of topic characteristics. They have to directly follow the start of a topic definition section; the order is irrelevant, though. Every such characteristic is indicated by a (case-sensitive) keyword, followed by scoping and/or typing information (if appropriate), a colon (:) and the value for this particular characteristic. To improve readability, white spaces are allowed before the keyword.

Example:

linux (operating-system-technology open-source-software)
  bn: Linux Operating System
  oc (download): http://www.linux.org/
Defines a topic with id linux with an explicit base name and a typed occurrence.

Following defaults are used for a topic in the case there is no explicit characteristic defined:

All topic characteristics result in the generation of an appropriate XTM element inside the current topic element of the virtual XTM instance:

topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic | subject-identity

baseName

A (scoped) base name can be defined via

basename-characteristic --> 'bn' [ '@' list-of-scope-topic-ids ] ':' string
The string will be used 'as-is' for the text content of the baseNameString element inside the baseName element of the virtual XTM instance. If a list of scope topic ids exists, then these ids will be used as values each for the attribute xlink:href for a topicRef element inside a scope element inside the above baseName element.

resourceRef occurrence

A (scoped and/or typed) resourceRef occurrence is defined with

resourceRef-characteristic --> 'oc' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ]':' string
The string SHOULD be a valid URI. An AsTMa= processor might perform checks on this URI.

For each resource occurrence, a resourceRef element will be created in the virtual XTM instance with the string being used as xlink:href attribute. That element will be nested inside an occurrence element inside the current topic element.

If a list of scope topic ids is provided, then these ids will be used as values each for the attribute xlink:href for a topicRef element inside a scope element inside the above occurrence element.

If a type topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside an instanceOf element inside the above occurrence element.

resourceData occurrence

A (scoped and/or typed) resourceData occurrence is defined with

resourceData-characteristic --> 'in' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ]':' string
The string will be used 'as-is' as text inside a resourceData element inside an occurrence element inside the current topic element.

If a list of scope topic ids is provided, then these ids will be used as values each for the attribute xlink:href for a topicRef element inside a scope element inside the above occurrence element.

If a type topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside an instanceOf element inside the above occurrence element.
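The bn, oc and in productions share one shape, so a single parser can cover them. The Python sketch below uses a hypothetical function name and is deliberately permissive: it accepts a type on bn lines as well, which the grammar does not allow.

```python
import re

# keyword [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ] ':' string
_CHARACTERISTIC = re.compile(
    r"""
    ^\s*(?P<keyword>bn|oc|in)
    \s*(?:@\s*(?P<scopes>[^(:]*?))?          # optional scope topic ids
    \s*(?:\(\s*(?P<type>[^)\s]+)\s*\))?      # optional type topic id
    \s*:\s*(?P<value>.*?)\s*$                # value, stripped of white space
    """,
    re.VERBOSE,
)


def parse_characteristic(line):
    """Return (keyword, scope ids, type id or None, value)."""
    match = _CHARACTERISTIC.match(line)
    if match is None:
        raise ValueError("not a topic characteristic: %r" % line)
    scopes = (match.group("scopes") or "").split()
    return match.group("keyword"), scopes, match.group("type"), match.group("value")
```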

subject identity

A subject identity is defined by

subject-identity --> 'sin' ':' string
If the string follows the syntax of a URI and contains a URI scheme, this string will be used as subject indicator reference as the value for the attribute xlink:href for a subjectIndicatorRef element inside a subjectIdentity element inside the current topic.

Otherwise, AsTMa assumes this string to be a topic reference and will add a topicRef element instead creating a relative link with that string.
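The distinction between the two cases can be sketched in Python as follows. The function name is hypothetical, and the scheme test (a letter followed by letters, digits, '+', '.' or '-', then a colon) is an assumption about what "contains a URI scheme" means here.

```python
import re
from xml.sax.saxutils import quoteattr

# Assumed scheme test: a letter, then letters, digits, '+', '.', '-', then ':'.
_URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def subject_identity_child(value):
    """Return the XTM element, as text, generated for one sin: value."""
    value = value.strip()
    if _URI_SCHEME.match(value):
        # External subject indicator.
        return "<subjectIndicatorRef xlink:href=%s/>" % quoteattr(value)
    # Otherwise a relative link to a topic in the same map.
    return "<topicRef xlink:href=%s/>" % quoteattr("#" + value)
```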

Associations

An association definition section follows the syntax

association-definition --> [ '@' list-of-scope-topic-ids ] '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }

Example:

(kernel-patch-provides-feature)
   feature:  reiserfs
   platform: i386
   patch:    generic-reiserfs-patch-2.4.x
This defines an association of type kernel-patch-provides-feature with three roles feature, platform and patch.

If a list of scope topic ids exists, then these ids will be used as values each for the attribute xlink:href for a topicRef element inside a scope element inside the association element.

At most one topic identifier can occur as association type. If not provided the XTM default will be assumed.

Association members are defined line-by-line according to

association-member --> role-topic-id ':' { member-topic-id }

In the virtual XTM instance for every association definition in AsTMa=, an association element will be created inside the current topic map. Every member will result in the creation of a member element inside this association element.

The role-topic-id will be used as the value of the xlink:href attribute of a topicRef element inside a roleSpec element inside the member.

Every single member-topic-id is used as the value of the xlink:href attribute of a topicRef element inside the member.
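The mapping from an association definition to the virtual XTM instance might look as follows in Python, using the standard xml.etree.ElementTree module. The function names are hypothetical, and the sketch ignores an optional scope prefix on the first line.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def _topic_ref(parent, topic_id):
    """Append <topicRef xlink:href="#topic_id"/> to parent."""
    ref = ET.SubElement(parent, "topicRef")
    ref.set("{%s}href" % XLINK, "#" + topic_id)
    return ref


def association_to_xtm(lines):
    """Build an association element from one association definition."""
    header, *member_lines = [line.strip() for line in lines if line.strip()]
    assoc = ET.Element("association")
    # The type id sits between the parentheses and may be empty.
    type_id = header[header.index("(") + 1:header.rindex(")")].strip()
    if type_id:
        _topic_ref(ET.SubElement(assoc, "instanceOf"), type_id)
    for line in member_lines:
        role, _, players = line.partition(":")
        member = ET.SubElement(assoc, "member")
        _topic_ref(ET.SubElement(member, "roleSpec"), role.strip())
        # Each member-topic-id becomes a topicRef inside the member.
        for player in players.split():
            _topic_ref(member, player)
    return assoc
```

Calling ET.tostring on the result yields the XML text of the association element.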

Directives

AsTMa= allows processors to define any number of directives. The only (syntactic) constraint the language imposes is that such a directive may only occupy exactly one line, which has to begin with the character '%'.

The precise semantics of each such directive is completely at the discretion of the AsTMa processor. Suggestions for directives are:

Consult the processor's documentation.


Language Issues

Completeness
The language is far from being complete relative to the expressiveness of XTM. Current limitations are:
Border cases
There are - due to the declarativeness of the language - some cases in which AsTMa= processors may produce non-intuitive results, depending on the implementation approach taken.
Relationship to LTM
The existence of LTM itself has shown that there is definitely a need for a textual notation for authors who are used to the expressiveness of textual user interfaces. Following differences have been identified:

Open Issues

Following unresolved issues exist:


Conclusions

Experiences with inexperienced authors have shown that topic maps written in AsTMa= are usually much richer and less erroneous than maps in XTM. Obviously, these authors are less distracted by the XML code and can concentrate more on productive issues.


Credits


Appendix: Changes


Appendix: Syntax

AsTMa= Syntax
instance --> { comment-section } name-encode-section { section }
comment-section --> { '#' string eol }
name-encode-section --> name [ ':' encoding ] eol
section --> comment-section | topic-definition-section | association-definition-section
topic-definition --> [ 'tid' ':' ] topic-id [ '(' list-of-type-topic-ids ')' ] [ 'reifies' string ] eol { ws* topic-characteristic eol }
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic | subject-identity
basename-characteristic --> 'bn' [ '@' list-of-scope-topic-ids ] ':' string
resourceRef-characteristic --> 'oc' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ]':' string
resourceData-characteristic --> 'in' [ '@' list-of-scope-topic-ids ] [ '(' type-topic-id ')' ]':' string
subject-identity --> 'sin' ':' string
association-definition --> [ '@' list-of-scope-topic-ids ] '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }
association-member --> role-topic-id ':' { member-topic-id }
list-of-*-topic-ids --> topic-id { ws+ topic-id }
string --> [ ws* ] { character } [ ws* ]
name --> ... any identifier not containing blanks