AsTMa, Asymptotic Topic Map Notation

v1.0, 2001-11-26
Revision 1.3

Robert (\rho) Barta
Bond University

0. Overview

This document has no formal status. It is a technical report of the local university.

After an introduction and motivation, we present the core concepts in a short tutorial, before turning to formal language specification.

You can find the running example used throughout the tutorial at the Bond Topicmap Server. You might want to peruse an online converter.

1. Introduction

Since the stabilisation of XTM, an XML-based notation for Topic Maps, the interest in authoring Topic Maps has increased.

While the automatic generation of topic maps from backend databases into XTM can easily be achieved, manual authoring is tedious and error-prone. One option is to use XML-aware development tools, such as XML editors. While feasible, generic XML editors offer little help beyond syntactic conformance. Another option is to use integrated development environments for Topic Maps (server or client-side) as they appear on the market.

AsTMa is a linear, textual notation for Topic Map information. The motivation was to create an authoring shorthand notation in contrast to XTM which is suitable for interchange purposes. AsTMa has the following design objectives:

Minimum of effort:
A converter should be able to interpret the intention of the author in a specific context reducing the verbosity of the language.
Minimal use of special characters and keywords:
Banning of [(&^%$}] delimiters should increase the usability of the language. This also reduces the need to escape these special characters once they belong to the information.
Asymptotic with regard to XTM:
The language should not have a built-in syntax sound barrier making it impossible to reach the same expressiveness as XTM.
Keep things together:
The author should not be forced to split up (topic) information into several fragments, which have to be merged via TNC by a followup Topic Map processing stage.

At this stage AsTMa does not fulfill all of the above objectives. As outlined in more detail in the section about conformance, AsTMa is not (yet) as expressive as XTM, that not having been a prime concern. Still, AsTMa is sufficiently rich to prototype medium-sized topic maps.

2. Tutorial

The following setting assumes that the AsTMa text will be either directly understood by a particular Topic Map processing software or that a specialized processor will first convert the AsTMa text stream into an XTM stream.

2.1 Basics

AsTMa is line oriented. This means that pertinent information is terminated with the end of the line. A single line containing
   filesystem (software)

already defines a topic (as explained below). If there is more to a topic (or an association) this information will be on follow-up lines:

   filesystem (software)
   bn: File System

An empty line, thus, separates items such as topics and associations. On any line, white space can be used before and after keywords and special characters; it is silently ignored.
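
The separation rule above can be sketched in a few lines of Python. This is only an illustration, not part of AsTMa: the function name split_blocks is hypothetical, the sketch assumes that a line consisting solely of white space counts as empty, and it ignores comments and line continuation.

```python
def split_blocks(text):
    """Group consecutive non-empty lines into blocks (one per topic or association)."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            # Non-empty line: it belongs to the item currently being collected.
            current.append(line.rstrip())
        elif current:
            # Empty line (assumption: whitespace-only counts as empty) ends the item.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


sample = """filesystem (software)
bn: File System

(kernel-patch-provides-feature)
  feature: reiserfs
"""
for block in split_blocks(sample):
    print(block)
```

Each returned block would then be handed to a topic or association parser, depending on its first line.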

Any line can also contain a comment, like

   filesystem (software) # more information will follow

Such comments will be discarded by any processor and are only for internal documentation purposes. If you would like to have a comment in the processor output, then this comment MUST begin at the start of a line:

AsTMa:
# I will survive and (hopefully)
#     the line structure will not
#        be broken

XTM:
<!--  I will survive and (hopefully)
          the line structure will not
             be broken -->

Comments on consecutive lines will be treated as one comment. Any non-comment line signals the end of such a group. Also, any '-->' occurrence within a comment will be converted into '--_ >', interspersing one underscore character.

2.2 Topics

The line
   filesystem (software)

indicates the definition of a topic with id filesystem, which is an instance of another topic, software:

AsTMa:
filesystem (software)

XTM:
<topic id="filesystem">
   <instanceOf>
     <topicRef xlink:href="#software"/>
   </instanceOf>
   <baseName>
     <baseNameString>filesystem</baseNameString>
   </baseName>
</topic>

As we did not provide a base name, the topic id filesystem is also assumed to be the topic's base name. While this heuristic approach works fine for some words, it does not work well with others, say,

   linux-distribution (software)

Any AsTMa processor is free to apply any other heuristics, such as:

AsTMa:
linux-distribution (software)

XTM:
<topic id="linux-distribution">
  <instanceOf>
    <topicRef xlink:href="#software"/>
  </instanceOf>
  <baseName>
     <baseNameString>linux distribution</baseNameString>
  </baseName>
</topic>

substituting dashes by blanks, looking up 3rd-party databases or leaving it as it is. Of course, the author can enforce a particular base name:

AsTMa:
RedHat-Linux-sparc (linux-distribution-port)
bn: RedHat Linux for SPARC

XTM:
<topic id="RedHat-Linux-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-distribution-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>RedHat Linux for SPARC</baseNameString>
  </baseName>
</topic>
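
The base name behaviour described so far (an explicit bn wins, otherwise the processor applies a heuristic to the topic id) might be sketched as follows. The function names are hypothetical, and replacing dashes by blanks is just one of the heuristics a processor is free to choose.

```python
def default_base_name(topic_id):
    """One possible heuristic: replace the dashes in the topic id by blanks."""
    return topic_id.replace("-", " ")


def base_names_for(topic_id, explicit_base_names):
    """Use the author's 'bn:' values if there are any, else fall back to the heuristic."""
    return list(explicit_base_names) or [default_base_name(topic_id)]


print(base_names_for("linux-distribution", []))
print(base_names_for("RedHat-Linux-sparc", ["RedHat Linux for SPARC"]))
```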

In a similar way, you can also add occurrences to topics:

AsTMa:
linux (os)
bn: Linux kernel
oc: http://www.kernel.org/

XTM:
<topic id="linux">
  <instanceOf>
    <topicRef xlink:href="#os"/>
  </instanceOf>
  <baseName>
     <baseNameString>Linux kernel</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.kernel.org/"/>
  </occurrence>
</topic>

This covers resource references; inline data (XTM resourceData) can be added as well:

AsTMa:
linux-port-on-sparc (linux-port)
bn: SPARC Linux port
oc: http://www.sparc.org/linux.shtml
in: The kernel and kernel modules \
    are 64-bit on sparc64, \
    userland is still 32-bit, \
    and in fact the same as on sparc32.

XTM:
<topic id="linux-port-on-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>SPARC Linux port</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.sparc.org/linux.shtml"/>
  </occurrence>
  <occurrence>
    <resourceData>The kernel and kernel mod....</resourceData>
  </occurrence>
</topic>

If appropriate, you can also type topic characteristics:

AsTMa:
reiserfs (filesystem)
bn: Reiser File System, ReiserFS
oc (download): http://www.namesys.com/download.html

XTM:
<topic id="reiserfs">
  <instanceOf>
    <topicRef xlink:href="#filesystem"/>
  </instanceOf>
  <baseName>
     <baseNameString>Reiser File System, ReiserFS</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
       <topicRef xlink:href="#download"/>
    </instanceOf>
    <resourceRef xlink:href="http://www.namesys.com/download.html"/>
  </occurrence>
</topic>

To scope a characteristic you use '@' to introduce a particular context:

AsTMa:
RedHat-Linux-sparc (linux-distribution-port)
bn: RedHat Linux for SPARC
bn @ deutsch : RedHat Linux für SPARC

XTM:
<topic id="RedHat-Linux-sparc">
  <instanceOf>
    <topicRef xlink:href="#linux-distribution-port"/>
  </instanceOf>
  <baseName>
     <baseNameString>RedHat Linux for SPARC</baseNameString>
  </baseName>
  <baseName><scope><topicRef xlink:href="#deutsch"/></scope>
     <baseNameString>RedHat Linux für SPARC</baseNameString>
  </baseName>
</topic>

2.3 Associations

Associations may or may not have a particular association type. This topic type is provided inside a () pair.

(kernel-patch-provides-feature)
...

If the association has no explicit type, the type can be omitted by writing ().

Associations also have a number of members playing roles:

AsTMa:
(kernel-patch-provides-feature)
feature: reiserfs
platform: i386
patch:   generic-reiserfs-patch-2.4.x

XTM:
<association>
  <instanceOf>
    <topicRef xlink:href="#kernel-patch-provides-feature"/>
  </instanceOf>
  <member>
     <roleSpec>
       <topicRef xlink:href="#feature"/>
     </roleSpec>
     <topicRef xlink:href="#reiserfs"/>
  </member>
  <member>
     <roleSpec>
       <topicRef xlink:href="#platform"/>
     </roleSpec>
     <topicRef xlink:href="#i386"/>
  </member>
  <member>
     <roleSpec>
       <topicRef xlink:href="#patch"/>
     </roleSpec>
     <topicRef xlink:href="#generic-reiserfs-patch-2.4.x"/>
  </member>
</association>

For better readability you may want to indent the roles:

  (kernel-patch-provides-feature)
      feature: reiserfs
      platform: i386
      patch:   generic-reiserfs-patch-2.4.x

2.4 Topic Maps

The very first non-empty line within the document MUST provide a name (identifier) of the topic map itself:

AsTMa:
sparclinux : iso-8859-1

XTM:
<?xml version="1.0" encoding="iso-8859-1"?>
<topicMap id="sparclinux"
          xmlns       = 'http://www.topicmaps.org/xtm/1.0/'
          xmlns:xlink = 'http://www.w3.org/1999/xlink'>

Optionally, you can specify a particular encoding, as in the example above. If not provided, the encoding defaults to iso-8859-1.

2.5 Macros

For authors who have no access to a general macro expansion environment, the language supports a rudimentary macro facility which comes in handy for abbreviating long strings. The idea is to first declare a macro via

  de=http://www.topicmaps.org/xtm/1.0/language.xtm#de

somewhere towards the beginning of the document and then use this definition for, say, scoping:

AsTMa:
oc @ &de; (press-release) : http://www....

XTM:
<occurrence>
  <scope><topicRef xlink:href="http://www.topicmaps.org/xtm/1.0/language.xtm#de"/></scope>
  <instanceOf>
     <topicRef xlink:href="#press-release"/>
  </instanceOf>
  <resourceRef xlink:href="http://www.s...."/>
</occurrence>

Every instance of &de; throughout the document will be expanded to the above URL.

While these macros cannot have parameters, they will be evaluated recursively, i.e. macros can contain other macros.

It goes without saying that the notation &...; may collide with other XML entities. It lies in the responsibility of the author to take care of that.

3. Language Specification

In the following we use grammars (extended BNFs) and regexps (regular expressions) to define the syntax of an AsTMa instance.

As AsTMa is line-oriented, we introduce eol as special terminal to symbolize the end of the line (or the end of the text stream, for that matter). AsTMa processors use the start and the end of a line as a natural boundary of AsTMa purported information. If for some reason, say readability, a line n has to be broken into several lines, the backslash character \ at the very end of a line n indicates that the followup line n+1 logically belongs to n.

Example:

bn: this line is too long, so\
    we continue it on the next line
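
A minimal Python sketch of this continuation rule, assuming the input is a list of physical lines without their line terminators. The function name join_continuations is hypothetical. The specification does not say whether the leading white space of the follow-up line is collapsed, so the sketch keeps it unchanged.

```python
def join_continuations(lines):
    """Merge physical lines ending in a backslash with their follow-up lines."""
    logical_lines = []
    parts = []
    for line in lines:
        if line.endswith("\\"):
            # Drop the trailing backslash and wait for the follow-up line.
            parts.append(line[:-1])
        else:
            parts.append(line)
            logical_lines.append("".join(parts))
            parts = []
    if parts:
        # The text stream ended right after a continuation marker.
        logical_lines.append("".join(parts))
    return logical_lines


physical = ["bn: this line is too long, so\\",
            "    we continue it on the next line"]
print(join_continuations(physical))
```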

If we allow any number of white spaces to occur at a specific point in the grammar, then we symbolize this as ws*. If we require at least one white space to appear somewhere, then we use ws+ to denote this. Before and after terminals (keywords and special characters), white spaces can be used freely, although this is not made explicit in the grammar productions. Moreover, any text is stripped of white spaces at its beginning and its end:

string --> [ ws* ] { character } [ ws* ]

In many cases topic identifiers will be used to refer to other topics. To underline what kind of topic is expected in specific cases (without enforcing this with an AsTMa processor) the syntax uses typed non-terminals, like role-topic-id or type-topic-id.

3.1 Definitions

AsTMa instance
An AsTMa instance is any (linear) text which follows the rules given in this specification. Such a text can reside in a file on a file system or can also be created on the fly from an external data source.
AsTMa processor
An AsTMa processor is any program which is capable of parsing and processing an AsTMa instance. For the sake of the following definition, it is assumed that the processor will convert an AsTMa instance into an equivalent XTM instance, hereafter referred to as the virtual XTM instance. The following semantic definition of AsTMa is based on this transformation and not on a Topic Map processing model, which is still being developed at the time of writing.

A real AsTMa processor is free to skip the transformation step and to directly load AsTMa instances as long as this process is equivalent.

3.2 Overall Structure

An AsTMa instance has the following structure:
instance --> { comment-section } name-encode-section { section }
section --> comment-section | topic-definition-section | association-definition-section | macro-definition-section
An AsTMa processor has to honor the following processing constraints:

3.3 Naming and Encoding

Any AsTMa instance has to be named by a string matching [^\s]+ (anything not containing white spaces). It is up to the AsTMa processor how this name will be utilized.

Example:

linux: iso-8859-1
The instance will be called linux and the used encoding will be iso-8859-1.
name-encode-section --> name [ ':' encoding ] eol
A particular AsTMa instance is a character string in a particular character encoding. The AsTMa processor transparently passes all characters to the equivalent virtual XTM instance. To make this happen, an AsTMa instance has to specify the appropriate character encoding at the beginning of the stream. This encoding will also be the XML encoding used in the virtual XTM instance. The name will become the value of the ID attribute of the topic map declaration.
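
As an illustration, the name-encode-section could be parsed like this in Python. The function name is hypothetical; the iso-8859-1 default is taken from the tutorial, and the sketch assumes that the name itself contains no colon.

```python
def parse_name_encoding(line, default_encoding="iso-8859-1"):
    """Split 'name [: encoding]' into a (name, encoding) pair."""
    # Assumption: the first colon separates the name from the encoding.
    name, _, encoding = line.partition(":")
    name = name.strip()
    encoding = encoding.strip() or default_encoding
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("instance name must be non-empty and free of white space")
    return name, encoding


print(parse_name_encoding("linux: iso-8859-1"))
print(parse_name_encoding("sparclinux"))
```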

3.4 Comments

A global comment section is introduced by a # at the very beginning of a line. Any following text (i.e. starting with the 2nd column) is regarded to be comment text. Any directly subsequent line which also begins with a # is accumulated into the comment. A global comment section is terminated by the beginning of some other top level section or an empty line.

It is up to the AsTMa processor to process comments. When converting into XTM, the processor may create XML comment sections, when directly loading it for other purposes the comment may be discarded.

Example:

# warning from the information secretary
# Linux is a communist conspiration against the free Microsoft world
The comment contains two lines, the #s themselves are discarded.
comment-section --> { '#' string eol }
A local comment is introduced by a # NOT at the beginning of the line. Local comments reach until the end of the current line and are discarded by AsTMa processors.
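
The two kinds of comments might be handled as in the following Python sketch. The function names are hypothetical. As a labeled assumption that goes beyond the specification, a local comment is only recognized when its # is preceded by white space, so that a # inside a URL such as language.xtm#de is left alone.

```python
import re

# Assumption: a local comment starts at a '#' that follows white space.
LOCAL_COMMENT = re.compile(r"\s+#.*$")


def strip_local_comment(line):
    """Remove a trailing local comment; global comment lines are returned as is."""
    if line.startswith("#"):
        return line
    return LOCAL_COMMENT.sub("", line)


def global_comment_text(comment_lines):
    """Join a group of '#' lines into one comment, discarding the '#' characters."""
    return "\n".join(line[1:] for line in comment_lines)


print(strip_local_comment("filesystem (software) # more information will follow"))
print(global_comment_text(["# first line", "# second line"]))
```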

3.5 Topics

A topic definition section follows the syntax
topic-definition --> topic-id [ '(' list-of-type-topic-ids ')' ] eol { ws* topic-characteristic eol }

The topic-id stands for the topic identifier which HAS to be provided. This identifier SHOULD follow the XML identifier rules although an AsTMa processor might not directly enforce this.

The topic identifier itself has to be placed at the beginning of the line; if the optional list of identifiers inside '()' is specified, the topic identifiers mentioned therein are interpreted as types of that topic. The list entries are separated by one (or more) white spaces:

list-of-topic-ids --> topic-id { ws+ topic-id }

Example:

linux (operating-system-technology open-source-software)
...
Defines a topic with id linux and with two types.

Every topic definition will result in a topic element in the virtual XTM instance. The topic-id will be used as value for the ID attribute. For every topic type an instanceOf element will be generated inside this topic containing a topicRef element with the xlink:href attribute having the value of the type id.
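
The mapping of the first line of a topic definition onto the virtual XTM instance can be sketched with Python's xml.etree.ElementTree. The function name and the regular expression are illustrative assumptions. The sketch prefixes type ids with '#', as the tutorial examples do, and does not check the XML identifier rules.

```python
import re
import xml.etree.ElementTree as ET

# topic-id at the very beginning of the line, optionally followed by '( types )'.
TOPIC_HEADER = re.compile(r"^(?P<id>[^\s()]+)\s*(?:\((?P<types>[^)]*)\))?\s*$")


def topic_header_to_xtm(line):
    """Build a topic element with one instanceOf child per listed type."""
    match = TOPIC_HEADER.match(line)
    if match is None:
        raise ValueError("not a topic definition line: " + line)
    topic = ET.Element("topic", {"id": match.group("id")})
    for type_id in (match.group("types") or "").split():
        instance_of = ET.SubElement(topic, "instanceOf")
        # Assumption: type ids are local references, hence the leading '#'.
        ET.SubElement(instance_of, "topicRef", {"xlink:href": "#" + type_id})
    return topic


topic = topic_header_to_xtm("linux (operating-system-technology open-source-software)")
print(ET.tostring(topic, encoding="unicode"))
```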

3.6 Topic characteristics

Any topic definition section can contain any number of topic characteristics. They have to directly follow the start of a topic definition section; their order is irrelevant, though. Every such characteristic is indicated by a (case sensitive) keyword, followed by scoping and/or typing information (if appropriate), a colon (:) and the value for this particular characteristic. To improve readability, white spaces are allowed before the keyword.

Example:

linux (operating-system-technology open-source-software)
  bn: Linux Operating System
  oc (download): http://www.linux.org/
Defines a topic with id linux with an explicit base name and a typed occurrence.

The following defaults are used for a topic if no explicit characteristic is defined:

All topic characteristics result in the generation of an appropriate XTM element inside the current topic element of the virtual XTM instance:
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic
baseName
A (scoped) base name can be defined via
basename-characteristic --> 'bn' [ '@' scope-topic-id ] ':' string
The string will be used 'as-is' for the text content of the baseNameString element inside baseName element of the virtual XTM instance. If a scope topic id exists, then this id will be used as value for the attribute xlink:href for a topicRef element inside a scope element inside above baseName element.
resourceRef occurrence
A (scoped and/or typed) resourceRef occurrence is defined with
resourceRef-characteristic --> 'oc' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ] ':' string
The string SHOULD be a valid URI. An AsTMa processor might perform checks on this URI.

For each resource occurrence, in the virtual XTM instance a resourceRef element will be created with the string being used as xlink:href attribute. That element will be nested inside an occurrence element inside the current topic element.

If a scope topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside a scope element inside the above occurrence element.

If a type topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside an instanceOf element inside the above occurrence element.

resourceData occurrence
A (scoped and/or typed) resourceData occurrence is defined with
resourceData-characteristic --> 'in' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ] ':' string
The string will be used 'as-is' as text inside a resourceData element inside an occurrence element inside the current topic element.

If a scope topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside a scope element inside the above occurrence element.

If a type topic id is provided, then this id will be used as value for the attribute xlink:href for a topicRef element inside an instanceOf element inside the above occurrence element.
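
The three characteristic productions share one shape (keyword, optional scope, optional type, colon, string), so a single regular expression can serve as a rough Python sketch of a line parser. The names Characteristic and parse_characteristic are hypothetical. Rejecting a type on bn follows the grammar above; how ambiguous colons inside scope ids should be resolved is not specified and is left to regex backtracking here.

```python
import re
from typing import NamedTuple, Optional


class Characteristic(NamedTuple):
    keyword: str           # 'bn', 'oc' or 'in'
    scope: Optional[str]   # scope-topic-id, if any
    type: Optional[str]    # type-topic-id, if any
    value: str             # the string, stripped of surrounding white space


CHARACTERISTIC = re.compile(
    r"^\s*(?P<keyword>bn|oc|in)\s*"
    r"(?:@\s*(?P<scope>[^\s(]+)\s*)?"
    r"(?:\(\s*(?P<type>[^\s)]+)\s*\)\s*)?"
    r":\s*(?P<value>.*?)\s*$"
)


def parse_characteristic(line):
    """Parse one 'bn', 'oc' or 'in' line of a topic definition."""
    match = CHARACTERISTIC.match(line)
    if match is None:
        raise ValueError("not a topic characteristic: " + line)
    if match.group("keyword") == "bn" and match.group("type") is not None:
        raise ValueError("the grammar allows no type on a base name")
    return Characteristic(match.group("keyword"), match.group("scope"),
                          match.group("type"), match.group("value"))


print(parse_characteristic("  oc (download): http://www.linux.org/"))
print(parse_characteristic("bn @ deutsch : RedHat Linux für SPARC"))
```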

3.7 Associations

An association definition section follows the syntax
association-definition --> '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }

Example:

(kernel-patch-provides-feature)
   feature:  reiserfs
   platform: i386
   patch:    generic-reiserfs-patch-2.4.x
This defines an association of type kernel-patch-provides-feature with three roles feature, platform and patch.

At most one topic identifier can occur as association type. If not provided, the XTM default will be assumed.

Association members are defined line-by-line according to

association-member --> role-topic-id ':' { member-topic-id }
In the virtual XTM instance for every association definition in AsTMa, an association element will be created inside the current topic map. Every member will result in the creation of a member element inside this association element.

The role-topic-id will be used as the value of the xlink:href attribute of a topicRef element inside a roleSpec element inside the member.

Every single member-topic-id is used as the value of the xlink:href attribute of a topicRef element inside the member.
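
The mapping of an association definition onto XTM elements can be sketched in Python with xml.etree.ElementTree. The function name is hypothetical. The sketch assumes that several member-topic-ids on one line are separated by white space (by analogy with the list of topic types) and that all ids are local references written with a leading '#', as in the tutorial.

```python
import xml.etree.ElementTree as ET


def association_to_xtm(lines):
    """Turn the lines of one association definition into an association element."""
    header = lines[0].strip()
    if not (header.startswith("(") and header.endswith(")")):
        raise ValueError("association must start with '(' [ type ] ')'")
    association = ET.Element("association")
    type_id = header[1:-1].strip()
    if type_id:
        instance_of = ET.SubElement(association, "instanceOf")
        ET.SubElement(instance_of, "topicRef", {"xlink:href": "#" + type_id})
    for line in lines[1:]:
        role, colon, players = line.partition(":")
        if not colon:
            raise ValueError("association member needs a ':' after the role")
        member = ET.SubElement(association, "member")
        role_spec = ET.SubElement(member, "roleSpec")
        ET.SubElement(role_spec, "topicRef", {"xlink:href": "#" + role.strip()})
        # Assumption: member-topic-ids are separated by white space.
        for player in players.split():
            ET.SubElement(member, "topicRef", {"xlink:href": "#" + player})
    return association


example = [
    "(kernel-patch-provides-feature)",
    "   feature:  reiserfs",
    "   platform: i386",
    "   patch:    generic-reiserfs-patch-2.4.x",
]
print(ET.tostring(association_to_xtm(example), encoding="unicode"))
```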

3.8 Macros

Macros are an ad-hoc facility to abbreviate frequent strings and resemble as such SGML entities. AsTMa processors MAY directly implement this or also MAY use external mechanisms not covered inside AsTMa for macro processing.

A macro is defined via

macro-definition --> macro-name '=' string eol
and basically binds a string value to a name. This name, in turn, can be used in subsequent AsTMa definitions like an entity reference:
macro-reference --> '&' macro-name ';'
Macros can contain other macros. It is up to the AsTMa processor to warn about the existence of circular definitions.
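
A possible Python sketch of macro definition and recursive expansion follows. The function names are hypothetical. Two choices are assumptions rather than requirements of AsTMa: references to undefined names are left untouched (they might be XML entities), and a circular definition raises an error instead of merely producing a warning.

```python
import re

MACRO_REFERENCE = re.compile(r"&([^\s;&]+);")


def parse_macro_definition(line):
    """Split 'macro-name = string' at the first '='."""
    name, equals, value = line.partition("=")
    if not equals or not name.strip():
        raise ValueError("not a macro definition: " + line)
    return name.strip(), value.strip()


def expand_macros(text, macros, _active=()):
    """Replace every &name; by its value, expanding nested macros recursively."""
    def replace(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)  # unknown reference: leave it as it is
        if name in _active:
            raise ValueError("circular macro definition: " + name)
        return expand_macros(macros[name], macros, _active + (name,))
    return MACRO_REFERENCE.sub(replace, text)


macros = dict([parse_macro_definition(
    "de=http://www.topicmaps.org/xtm/1.0/language.xtm#de")])
print(expand_macros("oc @ &de; (press-release) : http://www....", macros))
```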

4. Language Issues

Completeness
The language is far from being complete relative to the expressiveness of XTM. On second thought, though, the expressiveness of XTM is not necessarily the ultimate goal. In the same way as XTM lacks all kinds of application specific constraints, AsTMa currently only covers the authoring aspect of TM processing.

Having a separate (not standardized) language opens the avenue, though, for developing AsTMa into a constraint language or even into a query and manipulation language.

Border cases
There are - due to the declarativeness of the language - some cases in which AsTMa processors may produce non-intuitive results, depending on the implementation approach taken.
Relationship to LTM
The existence of LTM itself has shown that there is definitely a need for a textual notation for authors who are used to the expressiveness of textual user interfaces. Although LTM may have some shortcomings relative to AsTMa, it might prove valuable in other contexts. For instance, it is not possible in AsTMa to define topics directly within an association, whereas AsTMa's line orientation is more convenient in text editors or when fighting alongside the sed-awk-perl-grep monsters.

5. Conclusions

First experiences with the language have shown that most parsing situations can be disambiguated by an AsTMa processor. The following unresolved issues exist:

Credits

Appendix: Syntax

instance --> { comment-section } name-encode-section { section }
comment-section --> { '#' string eol }
name-encode-section --> name [ ':' encoding ] eol
section --> comment-section | topic-definition-section | association-definition-section | macro-definition-section
topic-definition --> topic-id [ '(' list-of-type-topic-ids ')' ] eol { ws* topic-characteristic eol }
topic-characteristic --> basename-characteristic | resourceRef-characteristic | resourceData-characteristic
basename-characteristic --> 'bn' [ '@' scope-topic-id ] ':' string
resourceRef-characteristic --> 'oc' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ] ':' string
resourceData-characteristic --> 'in' [ '@' scope-topic-id ] [ '(' type-topic-id ')' ] ':' string
association-definition --> '(' [ association-type-topic-id ] ')' eol { ws* association-member eol }
association-member --> role-topic-id ':' { member-topic-id }
macro-definition --> macro-name '=' string eol
list-of-topic-ids --> topic-id { ws+ topic-id }
string --> [ ws* ] { character } [ ws* ]
name --> ... any identifier not containing blanks

© 2001 Robert Barta, Bond University