<message id="<kimber.80.0013E65F@passage.com>" date="2987369631">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 00:53:51 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.80.0013E65F@passage.com>
References: <33v8g7$ksc@urmel.informatik.rwth-aachen.de>
Subject: Re: HyTime problems

[Jens Meggers]

|   Working on a HyTime conforming DTD to represent Hypermedia-Mail, I
|   stumbled across some problems:
|
|   1. Using FCSLOCK with a !percent view!:
|   I want to use FCSLOCK to pic up a part of some graphic objects
|   (like JPEG). To adress that parts in percentage terms I have to define a
|   FCS with the dimension 100 x 100. 
|   In some HyTime examples the FCS is only defined in the DTD and there is
|   no FCS-element within the document instance. To force no error during the
|   SGML-parsing process, the attribut  IMPFCS IDREF #REQUIRED is changed to
|   IMPFCS NAME #REQUIRED. This seems to be a very incorrect solution.

In fact, the declared value prescription of "IDREF" for IMPFCS in the
standard is a typo corrected to "NAME" in the Catalog of Architectural
Forms.  This is because you are specifying the *element type name* of the
FCS-form element used to define the FCS, not the ID of an instance of the
FCS-form element.  (Just as the axisdef attribute of the FCS element form
is refering to the element types of AXIS-form elements).

|   The problem is, that if you would put the FCS in the document instance,
|   you have to generate a minimum of one event (\<!element fcs - O
|   (evsched|wand|baton)+ >, but there is no use to define an instance of
|   the percent-fcs, if it is only constructed for fcslock.

Ah, but the evsched element form has a content model of (evgrp|event)*, so
you can have an empty event schedule.

|   2. Using FCSLOCK to pic up a range of video-frames:
|   is it correct, that there is no way to pic up a range of video-frames
|   from different videos with the same FCS(lock) ?
|   I think, that axis-dimension of the fcs, used in fcslock, have to be
|   changed for each video with different lenght.

I'm not sure what you mean, but it would depend on how the location source
for the FCSLoc element was defined.  Normally, a single video object would
be the location source, e.g.:

\<!DOCTYPE Sample PUBLIC "..." [
 \<!ENTITY Video1 SYSTEM "video1.jpeg" NDATA JPEGVideo>
]>
\<sample>
 \<nameloc id=video1>\<nmlist nametype=entity>Video1\</nmlist>\</nameloc>
 \<fcsloc id="frame-range-1" locsrc=video1 impfcs="frameFCS">
  \<extent>\<dimlist>1 5\</dimlist>\</extent>\<!-- First 5 frames -->
 \</fcsloc>
 ...
 \<p>
 \<vidlink linkend=frame-range-1>Click here to see some video\</vidlink>
 ...
\</sample>

However, if you are expecting support for the multloc option, the location
source for the fcsloc could be a multiple location consisting of several
video objects, which would be treated as a single object from which you
could select multiple ranges by specifying multiple dimension
specifications within each dimension list.

You could also locate multiple ranges of frames by using multiple FCSLoc 
elements and then locating them via a nameloc, e.g.:

\<!DOCTYPE Sample PUBLIC "..." [
 \<!ENTITY Video1 SYSTEM "video1.jpeg" NDATA JPEGVideo>
 \<!ENTITY Video2 SYSTEM "video2.jpeg" NDATA JPEGVideo>
]>
\<sample>
 \<nameloc id=video1>\<nmlist nametype=entity>Video1\</nmlist>\</nameloc>
 \<nameloc id=video2>\<nmlist nametype=entity>Video2\</nmlist>\</nameloc>
 \<fcsloc id="frame-range-1" locsrc=video1 impfcs="frameFCS">
  \<extent>\<dimlist>1 5\</dimlist>\</extent>\<!-- First 5 frames -->
 \</fcsloc>
 \<fcsloc id="frame-range-2" locsrc=video2 impfcs="frameFCS">
  \<extent>\<dimlist>10 4\</dimlist>\</extent>\<!-- 10th through 13th frames -->
 \</fcsloc>
 \<nameloc id=multi-frame-range>
  \<nmlist nametype=element>frame-range-1 frame-range-2\</nmlist>
 \</nameloc>
 ...
 \<p>
 \<vidlink linkend=multi-frame-range>Click here to see some video\</vidlink>
 ...
\</sample>


|   3. The dimspec order of the extlist in the FCSLOCK content:
|   does the order of the fcs axis-definition determine the order of the
|   dimspecs in the fcslock-content ?

Yes.  Note that in a real application, you could define different element
types for the dimlist or dimspec elements to make their association to
particular axes clear and consistent:

\<!ELEMENT Extent - - (MyFCSExtentList+) >
\<!ELEMENT MyFCSExtentList - O (AxisADimlist, AxisBDimlist) >
\<!ELEMENT (AxisADimlist | AxisBDimlist) - - (Dimspec+) >
\<!ATTLIST (AxisADimlist | AxisBDimlist)
   HyTime   NAME #FIXED "Dimlist"
>

|   4. How can I describe this Situation:
|   \<if the picture is selected, then start the video presentation.>
|   Can I use a hyperlink to start an event ?

You can use a hyperlink to make an object the content of an event using the
accessed anchor link feature.  Accessed anchor link lets you say "when an
object is traversed to as the anchor of a particular hyperlink, consider it
to be contained in this event."  This allows you to model dynamic behavior
by using hyperlinks to describe "state transitions" (e.g., traversing a
hyperlink is the act of moving from one state to the next).  For the
accessed anchor feature, you can specify either a link type or a link
instance as the thing that does the controlling.  In the case of starting a
particular event, you would probably key off of a specific hyperlink
instance, for example:

\<!DOCTYPE Sample PUBLIC "..." [
 \<!ENTITY Video1 SYSTEM "video1.jpeg" NDATA JPEGVideo>
 \<!ENTITY Graphic1 SYSTEM "Graphic.gif" NDATA GIF>
 \<!ELEMENT ShowObjectLink - - (#PCDATA) >
 \<!ATTLIST ShowObjectLink 
      HyTime  NAME #FIXED "ilink"
      anchrole NAMES #FIXED "button object-to-show"
      linkends IDREFS #REQUIRED -- Can omit first linkend if contextual --
      intra    NAMES #FIXED "A E" -- One-way link from button to object --
 >
 \<!ENTITY % user.hyperlinks "ShowObjectLink" -- Hook to public part of DTD -->
]>
\<Sample>
 \<nameloc id=video1>\<nmlist nametype=entity>Video1\</nmlist>\</nameloc>
 \<VideoDisplay HyTime="fcs">
  \<evsched>
   \<event exspec=video-window-loc>
    \<accanch>\<!-- Accessed anchor list specification -->
     \<acclink linkid=video-1-link
              accrole=object-to-show
              accend=anchor>
     \<!-- Use "object-to-show" anchor of link with ID "video-1-link" and show 
          the anchor object, not its endterm -->
    \</Accanch>
    \<extlist id=video-window-loc>\<!-- 200x200 window at origion -->
     \<extent>
      \<dimlist>1 200\</dimlist>
      \<dimlist>1 200\</dimlist>
     \</extent>
    \</extlist>
   \</event>
 \</evsched>
\</VideoDisplay>
 ...
\<p>\<ShowObjectLink id=show-video-1 linkends="video1">
  \<graphic object=Graphic1>This is a graphic\</graphic>
\</showobjectlink>

In a real application, the event schedule that defines the video display
might be defined centrally and re-used.  You might also use the accessed
link element type feature instead of the accessed link feature so that any
ShowObjectLink that was traversed would result in the object traversed to
being shown.

Also, this example looks more verbose than it needs to be because I've
shown all the elements and not taken advantage of minimization.

Note that you don't necessarily need an event schedule for this sort of
function, as your application could define the meaning of traversing to a
particular anchor of a particular link type to mean "present the anchor in
a window".  For example, DynaText provides a way to specify this behavior
in its own style language.  However, using the FCS is a more generic way to
do it, providing a more interchangeable specification of the author's
intent.

-- 
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<novlepubCvFq3t.HFA@netcom.com>" date="2987384681">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 05:04:41 UT
From: Novell Epub \<novlepub@netcom.com>
Organization: Novell Electronic Publishing
Message-ID: \<novlepubCvFq3t.HFA@netcom.com>
Subject: Job opening at Novell


[This notice was posted last week with replies directed to Wayne Taylor at
Novell.  We have reason to suspect that Wayne's mail connection may be
acting up, so I am reposting this.  Please direct replies to
\<novlepub@netcom.com> this time.  If you replied to the earlier post,
please resend your reply to the new address.  -- Jon Bosak, Novell
Corporate Publishing Services]

Novell, a leading worldwide computer networking company, has the following
opportunity:

                Sr. Software Engineer/Hypertext Specialist

Duties include:

  - Research, specify, develop and maintain our online document preparation
    tools.

  - Assist business units in the specification of hardware to support
    authoring, SGML conversion and preparation tools.

  - Support authoring groups in troubleshooting problems that arise in the
    preparation of documents for online delivery.

  - Communicate with the development and support staffs of our tools
    vendors.

  - Assist business units in negotiating and managing conversion
    outsourcing and other contract work related to online document
    preparation.

  - Monitor and participate in the development of industry standards for
    electronic document delivery.

  - Assist the department head in researching and developing our corporate
    information delivery strategy.

Skills Needed:

  - Significant C or C++ programming abilities in UNIX, Windows, DOS, or
    OS/2

  - Significant experience working with SGML

To be considered for this position, please send your resume in ASCII or
uuencoded PostScript to novlepub@netcom.com.

Jon Bosak
Novell Corporate Publishing Services
</message>
<message id="<CvFzG1.7y1@exodus.iti.gov.sg>" date="2987396785">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 08:26:25 UT
From: Ernie Quah Cheng Hai \<chenghai@ncb.gov.sg>
Organization: ncb
Message-ID: \<CvFzG1.7y1@exodus.iti.gov.sg>
Subject: SGML Conference

SGML Asia Pacific

For interested readers, 

The above conference would be held in Singapore from 10-12 Oct 94.

Dr Charles Goldfard would be one of the keyword speaker.

It's not free !!!

1st participant US$800, 2nd US$600, 3rd US$400

Call +1 703 519 8160 for registration.
</message>
<message id="<85905A7945@pc>" date="2987398357">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 08:52:37 UT
From: Michael Schwantner \<MSCH@fiz-karlsruhe.de>
Organization: Fachinformationszentrum Karlsruhe
Message-ID: <85905A7945@pc>
References: <1994Aug28.062554.1@east.pima.edu>
Subject: Re: TEI info, please

[Gloria McMillan]

|   I was just wondering if there is a second usenet group on TEI.  I am in
|   writing and literature and have been reading about TEI.  I have lots of
|   questions, but don't want to post them to the wrong group.
|   
|   Does a separate TEI news group exist?

There is a TEI mailing list:
Send a mail to listserv@uicvm.uic.edu.
Leave the subject blank, in the message body write:

subscribe tei-l \<your first name> \<your last name>

Here is an excerpt from the welcome message:

"I would like to welcome you as a subscriber to the electronic discussion
group, TEI-L, at the University of Illinois at Chicago.  The Text Encoding
Initiative (TEI) is an international project to establish guidelines for
the encoding of machine-readable textual material for research; this
discussion group has been set up in order to disseminate information about
the TEI and to enable discussion of the TEI guidelines while they are under
development.  We hope that the TEI-L list will prove useful in providing
information and encouraging discussion as the TEI progresses toward
completion and formal publication of the TEI Guidelines For Text Encoding
and Interchange.  ..."
</message>
<message id="<STEINARB.94Aug31113631@flame.falch.no>" date="2987400991">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 09:36:31 UT
From: Steinar Bang \<steinarb@falch.no>
Organization: Falch Hurtigtrykk, Oslo, Norway
Message-ID: \<STEINARB.94Aug31113631@flame.falch.no>
References: <33v8g7$ksc@urmel.informatik.rwth-aachen.de>
Subject: "Was: HyTime problems" \<Hypermedia-Mail>

[Jens Meggers]

|   Working on a HyTime conforming DTD to represent Hypermedia-Mail, ...

What is "hypermedia-mail"?

Just curious.

- Steinar
-- 
\<html>
\<hl>Support your local \<a href="http://www.falch.no/~steinarb/dod/">
\<strong>DoD\</strong>\</a> chapter.
\</html>
</message>
<message id="<344h1o$55h@hopper.acm.org>" date="2987410936">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 12:22:16 UT
From: Dave Peterson \<davep@acm.org>
Organization: ACM Network Services
Message-ID: <344h1o$55h@hopper.acm.org>
References: <342me6INNn9i@moon.cis.ohio-state.edu>
Subject: Re: What is the relationship between entities and attributes?

[James Webster Saunders]

|   I came across the following reference in "Practical SGML":
|   
|   \<Artwork name="FILE1 FILE2"> 
|   
|   where FILE1 and FILE2 are entities.  In this case, the entities are
|   external files containing the tag data.  If a tagged document were
|   being parsed without a DTD and it contained such a reference, how would
|   one distinguish that this value is indeed a list of entities?

"parsing without a DTD" sounds akin to compiling without a programming
language specified -- how would you compile a FORTRAN program without
specifying that it's FORTRAN?

I think you need to explain in more detail what you want to accomplish.

Dave Peterson
SGMLWorks!

davep@acm.org
</message>
<message id="<52070.loeffen@ruulet.let.ruu.nl>" date="2987411266">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 12:27:46 UT
From: Arjan Loeffen \<Arjan.Loeffen@let.ruu.nl>
Message-ID: <52070.loeffen@ruulet.let.ruu.nl>
Subject: Search

Dear reader,

I'd be interested to hear how available SGML editing environments perform a
search/replace.  Do you know of an editor that allows you to search
cross-element boundary, as in the following example:

    \<p>Therefore we \<hp>must</> find such a person.</>

    (search for "we must find" is successful).

and

    \<p>Therefore we\<footnote>I mean our department.</> must
    find such a person.</>

    (search for "we must find" is successful).

I'd like to hear how these programs deal with this, especially if contexts
or types of elements are checked to decide if the elements should be
ignored/included there.

Thanks in advance.

Arjan.
-- 
Arjan Loeffen           Achter de Dom 22-24  ++31+30536417  voice work
Faculty of Arts         3512JP Utrecht       ++31+206623817 voice home
University of Utrecht   The Netherlands      ++31+309221    fax work
</message>
<message id="<CvGBIr.MzM@heeg.de>" date="2987412435">
Newsgroups: comp.lang.smalltalk,comp.text.sgml
Date: 01 Sep 1994 12:47:15 UT
From: Hasko Heinecke \<hasko@heeg.de>
Organization: Georg Heeg Objektorientierte Systeme, Dortmund, FRG
Message-ID: \<CvGBIr.MzM@heeg.de>
References: \<CvCpKs.18C@world.std.com>
Subject: Re: Smalltalk tools for SGML

[Joe Berkovitz]

|   Does anyone know of any Smalltalk tools out there for parsing or
|   otherwise processing SGML input?  Any pointers would be greatly
|   appreciated!

Georg Heeg - Objektorientierte Systeme, the company I work for, is
currently developing an SGML parser for ParcPlace's Smalltalk.  For more
information, please contact me vie phone, fax, or email.

Hasko Heinecke
Georg Heeg - Objektorientierte Systeme
Baroper Str. 337
D-44227 Dortmund
Germany

Tel: +49-231-97599-0
Fax: +49-231-97599-20

Email: hasko@heeg.de, info@heeg.de
-- 
+-------------------------------------------------------+
| Hasko Heinecke speaking for myself only               |
| I _never_ mean what I say - and nobody else does...   |
+-------------------------------------------------------+
</message>
<message id="<jsaunder-010994104514@slip2-16.acs.ohio-state.edu>" date="2987423114">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 15:45:14 UT
From: James Saunders \<jsaunder@magnus.acs.ohio-state.edu>
Organization: the ohio state university
Message-ID: \<jsaunder-010994104514@slip2-16.acs.ohio-state.edu>
References: <342me6INNn9i@moon.cis.ohio-state.edu> <344h1o$55h@hopper.acm.org>
Subject: Re: What is the relationship between entities and attributes?

[Dave Peterson]

|   "parsing without a DTD" sounds akin to compiling without a programming
|   language specified--how would you compile a FORTRAN program without
|   specifying that it's FORTRAN?
|   
|   I think you need to explain in more detail what you want to accomplish.

We are involved in a project that allows construction of a DTD for a
collection of tagged documents that do not have a DTD.  As such, this is
not an easily solved problem.  My question was, what information that is
contained in a tags attribute, or in other tags in the same document,
allows you, without knowing the DTD, to determine that the tag attributes
declared value is CDATA, ENTITY, ENTITIES etc.

ie  \<artwork file = "pic12">      pic12      is an entity
    \<artwork size = 100mm 200mm> "100m 200m" is not an entity

-- 
James W. Saunders  Research Assistant | Graduate Student
                   Office of Research | Dept. of Geography
                   OCLC               | OSU
                   saunders@oclc.org  | jsaunder@magnus.acs.ohio-state.edu
</message>
<message id="<DMEGGINS.94Sep1122108@aix1.uottawa.ca>" date="2987425268">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 16:21:08 UT
From: David Megginson \<dmeggins@aix1.uottawa.ca>
Organization: Department of English, University of Ottawa
Message-ID: \<DMEGGINS.94Sep1122108@aix1.uottawa.ca>
References: <342me6INNn9i@moon.cis.ohio-state.edu>
Subject: Re: What is the relationship between entities and attributes?

[James Webster Saunders]

|   I came across the following reference in "Practical SGML": 
|
|   \<Artwork name="FILE1 FILE2">
|
|   where FILE1 and FILE2 are entities.  In this case, the entities are
|   external files containing the tag data.  If a tagged document were
|   being parsed without a DTD and it contained such a reference, how would
|   one distinguish that this value is indeed a list of entities?

Exactly.  That is why SGML files are incomplete without a DTD.  By the same
token, how could you tell that \&FILE2; was an external data entity without
a declaration in the DTD to guide you?

|   There seems to be a tremendous amount of flexibility in SGML to specify
|   things as attributes.  In this example, however, this seems to lead to
|   ambiguity in the tagging as to what is contained in the attributes
|   declared value.  Any enlightenment on this issue would be appreciated.

That's why we declare notations, entities, and attribute lists in the DTD,
so that the processing software will know _exactly_ what to do in a case
like this.

Just out of curiosity, why _do_ you need to parse the document without a
DTD?

David
-- 
David Megginson                Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca       Ottawa, Ontario, CANADA  K1N 6N5
dmeggins@acadvm1.uottawa.ca    Phone: +1 613 564 6850 (Office)
ak117@freenet.carleton.ca             +1 613 564 9175 (FAX)
</message>
<message id="<1994Sep1.180913.5161@ast.saic.com>" date="2987431753">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 18:09:13 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep1.180913.5161@ast.saic.com>
References: \<DMEGGINS.94Aug26094925@aix1.uottawa.ca>
Subject: Re: Is #CURRENT a good thing? / TEI gripe

[David Megginson]

|   Exactly -- it would be slightly more useful that way, but not much.  I
|   imagine that it was included in the (SGML) standard, along with a few
|   other poorly-thought-out features (RANK, DATATAG, etc), because of
|   pressure from an older generation of computer hacks, trained in the
|   days when saving ten bytes from a file (at the cost of transparency and
|   easy maintenence) could actually matter.  I was disappointed to find
|   TEI using #CURRENT, but I would like to emphasise that this is a very
|   small disappointment with an otherwise outstanding piece of
|   collaborative work.

Your use of the word "imagine" seems to be accurate here, but that's about
all.  The first sentence of paragraph 4.6.5 (RANK Feature) in the handbook
begins:

"RANK is a concession to application design practices in the early days of
generic coding......"  Not much imagination required there.  Your use of
the phrase "poorly-thought-out" is, I feel, unkind and inappropriate.  The
purpose of #CURRENT is clearly stated in the standard to provide a means
for inheriting a default from a parent element.  With respect to the misuse
of #CURRENT determining the current level of an element, I should like to
point out that the level of an elemnet is defined as the RANK of the
element in the standard and should be readily available to an application
from any well designed parser.

I have found from experience that the handling of "RANK" or level in the
outspec dtd by means of FOSI's is woefully inadequate.  There, the context
must be explicitly stated for nested elements.  e.g., context="item seqlist
item seqlist" The "RANK" or level must be maintained by the FOSI programmer
and the only operations that may be done on the counter is to reset it or
increment it.  No counter arithmetic or logic is provided.  Counters may be
used to contruct strings for labeling or TOCs and non numeric schemes such
as UC Roman or LC alpha are supported.

My point is that the parser can easily determined the level or rank of each
element and should make it easily available to the application.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep1.181753.19063@tin.monsanto.com>" date="2987432273">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 18:17:53 UT
From: Joel Finkle \<jjfink@skcla.monsanto.com>
Organization: Searle R\&D
Message-ID: <1994Sep1.181753.19063@tin.monsanto.com>
References: <342gak$bnt@finnegan.iol.ie>
Subject: Re: SGML to postscript/PDF

[Steve Pepper]

|   The answer depends a lot on your particular requirements.  Any of the
|   "quasi-WYSIWYG" SGML-editors with support for PostScript printers will
|   allow you to produce PostScript output directly from SGML - once you
|   have set up your style sheets, of course.  But they only offer fairly
|   basic formatting capabilities - e.g., no hyphenation.  Examples:
|   Author/Editor, Adept Editor.

One added value of a "direct" SGML to PDF output is retention of the
structure within the PDF file.  This can be done by implementing PDF
bookmarks, links, and annotations.

Through the use of the Acrobat Distiller, "pdfmark" postscript commands may
be embedded in a postscript document already to implement these features.
All it takes is a Postscript output that will embed these codes in the
file.

When I spoke with someone at Adobe some months ago, he said that they would
like to be able to retain SGML tagging within a PDF document.  I expect
that the pdfmark is the first step toward that, but it needs a tremendous
enhancement.

Joel
-- 
Joel Finkle
Searle R\&D
jjfink@skcla.monsanto.com

"And when I die don't bury me / in a box in a cemetery.
Out in the garden would be much better,
I could be pushin' up home grown tomatoes" -- Guy Clark, "Home Grown Tomatoes"
</message>
<message id="<345987$5g9@finnegan.iol.ie>" date="2987435719">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 19:15:19 UT
From: Sean Mc Grath \<digitome@iol.ie>
Organization: Ireland On-Line
Message-ID: <345987$5g9@finnegan.iol.ie>
Subject: Re: SGML to Postscript/PDF

[Jeffrey McArthur]

|   How direct is *direct*.  With TeX, it is possible to feed your SGML
|   file directly into TeX (with a proper set of macro loaded) and output a
|   DVI file.

[Steve Pepper]

|   Hope this helps.  If not, perhaps you need to bore us with the details
|   of your reasons for needing to go direct from SGML to PostScript.

Thanks for both replies...

My situation is that a large collection of SGML documents exist on an
unknown platform on which my transformation from SGML to .PS or .PDF format
must take place. Given that I do not know *anything* about the platform as
of yet I was hoping to avoid concerns about whether or not third party
programs (such as TeX) will run on the target platform by seeing if I can
go direct.

My needs in terms of formatting it must be said are pretty basic. The odd
moveto/lineto sprinkled amongst text of at most three fonts in a handful of
point sizes - that is it. If I could coerce the client into living with
mono spaced fonts my problems would be solved but I doubt if he will go for
that :-)

I cannot help feeling that someone, somewhere has written a nice C function
library with function calls like :-

    newpage()			- Start new postscript page
    setfont(x)			- Select font
    boldon()			- Turn on bold face
    centretext("Hello World")   - Display text centered on the current line
    Para ("I am a paragraph")   - Output text as a paragraph, wrapping text
				  as required

I have that awful "I'm missing something here" feeling.  Can anyone put me
out of my mysery?

Regards,
-- 
Sean Mc Grath	digitome@iol.ie
Digitome Ltd., Ballina, Co. Mayo, Ireland Tel: +353 96 72092
</message>
<message id="<345a7t$480@Starbase.NeoSoft.COM>" date="2987436733">
Newsgroups: alt.culture.usenet,alt.culture.internet,news.misc,comp.infosystems.www.misc,comp.text,comp.text.sgml
Date: 01 Sep 1994 19:32:13 UT
From: Cameron Laird \<claird@Starbase.NeoSoft.COM>
Organization: NeoSoft Internet Services +1 713 684 5969
Message-ID: <345a7t$480@Starbase.NeoSoft.COM>
References: <344tf2$ks7@Starbase.NeoSoft.COM> \<q3cPkq530EyZ057yn@oslonett.no>
Subject: Re: The construction of FAQs

[Cameron Laird]

|   Here's the question: is there any good reason not to construct all FAQs
|   from now on in HTML, rather than plaintext?  More generally,

[Peter N. M. Hansteen]

|   Try this: a significant number of people (I myself am one) use rather
|   traditional text-based tools to browse newsgroups, and either do not
|   have easy access to html or (like myself) read news and associated
|   documents off-line.  This in turn means that the usefulness of live
|   links to other documents etc which html is famous for is greatly
|   reduced.

		[other apt observations]
			.
			.
			.

One of my realizations for the week is that HTML is not just about live
links; its formatting or stylistic standards define an advance over the
character-based plaintext with which we're all familiar.  HTML is not the
best language for quasi-static distribution of documents (PostScript is one
example of a better one), but I'm thinking of these advantages, in the
context of FAQs, which are written by few, and read by many:

1.  HTML browsers are easily available;

2.  knowledge of HTML will diffuse explosively on the back of the WWW; and

3.  the references in an HTMLized document are more likely to be accurate.
    This is just a common-place of software engineering; if a URL in a
    plaintext document is mistyped, people using it will correct the
    spelling on the fly while tapping out their own ftp requests.  On the
    other hand, a live URL that's wrong will quickly reveal itself, and
    invite correction.  Thus, an HTMLized FAQ will have fewer faults of
    this sort, which is a benefit even to the people who don't have a live
    connection.

-- 

Cameron Laird		ftp://ftp.neosoft.com/pub/users/claird/home.html
claird@Neosoft.com (claird%Neosoft.com@uunet.uu.net)	+1 713 267 7966
claird@litwin.com (claird%litwin.com@uunet.uu.net)  	+1 713 996 8546
</message>
<message id="<1994Sep1.194818.21381@ast.saic.com>" date="2987437698">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 19:48:18 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep1.194818.21381@ast.saic.com>
References: \<jsaunder-010994104514@slip2-16.acs.ohio-state.edu>
Subject: Re: What is the relationship between entiti

[James Saunders]

|   We are involved in a project that allows construction of a DTD for a
|   collection of tagged documents that do not have a DTD.  As such, this
|   is not an easily solved problem.  My question was, what information
|   that is contained in a tags attribute, or in other tags in the same
|   document, allows you, without knowing the DTD, to determine that the
|   tag attributes declared value is CDATA, ENTITY, ENTITIES etc.
|   
|   ie  \<artwork file = "pic12">      pic12      is an entity
|       \<artwork size = 100mm 200mm> "100m 200m" is not an entity

In general, nothing.  All that information is in the DTD.  Whoever tagged
the document in the first place did it according to some DTD because that
is what told hime what tags he had available, what attributes the could
have, and in what context they could occur.

In this case, you are somewhat fortunate in that your document designer
used names which are mildly indicative of their nature.  e.g., I might
guess, with some confidence, that "pic2" is an artwork file since it
contains the word FILE and files are file entities.  This is why the
standard requires the DTD to be part of the document.  So if you have a
document without a DTD, it's not really an SGML document.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<9409012114.AA19892@source.asset.com>" date="2987442854">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 21:14:14 UT
From: "Claude L. Bullard" \<bullardc@source.asset.com>
Message-ID: <9409012114.AA19892@source.asset.com>
Subject: SGML New User Requesting General Information

After my posting to "newbies" on the basics of SGML, I received a number of
kind and complementary postings suggesting that I'd written something of
benefit to the community.  Thank y'all very much.  Nothing works better on
me than applause.

Ed Brachman at Interleaf made some very good suggestions about
improving the post.

|   (1) Where you compare a DTD to an 'ADT', I'm not sure you're doing the
|   newbie any favors.  I assume that 'ADT' stands for 'abstract data type'
|   -- but I'm not sure that calling a DTD an ADT helps even for those of
|   us who can make that assumption, and I'll bet that it goes over the
|   heads of at least some of the newbies you otherwise do a good job of
|   talking to.

Quite right, Ed.  But every newbie is not new to every thing.  Some are
well-educated comp-sci types who just want to know what SGML IS-OR-Ain't.
A few well-placed "buzzWords" gives them the clues they are looking for.
Unless they are navigating the nets with Mosaic, they are hopefully not
true innocents.  Even in that case, it behooves an inquiring mind to get
dictionaries for subjects they want to study.  In this and future posts on
the subject, I am consciously trying to propagate a meme:

MyMEME: SGML can be used for more than books.  To do it, one must get
Beyond The Book Metaphor... or finally realize that a lot of things we
think of as software are really automated books without binders.

SERMON FOLLOWS (caveat emptor):

It is our imaginations that place the harshest limits on our creativity.
Since only divine beings create ex nihilo, the rest of us get our sparks
from what we ingest through the nearest convenient port.  In this case, the
newbie would do well to look up the subjects of Abstract Data Type and
automata theory and compare them to what SGML has to offer.  From however
much of this they can absorb, they may begin, by contrast and comparison,
to form their own unique and even heretical ideas for SGML applications.
Great Jumping Horny Toads, they may start to think for themselves rather
than blandly accepting the latest brain-dead application.

The strength of SGML is its capacity for rigorously defining applications.
Sure, the language has gaps and can't *do everything*.  But it can be used
by imaginative people to do far more than has already been done, and if
enough people believe that and try, maybe we can break the miasma of
"control" through registering DTDs.

... we may even find a cure for boredom.  Chapters are BORRING!

To achieve this, we should learn to treat SGML like the dog that it is: a
standard for marking up data.  Nothing more.  How data should be marked up
so that it becomes information, what generic identifiers should be called,
how they should be grouped, this is where we can be endlessly creative
rather than spending our days endlessly debating if this or that feature of
the language or a certain DTD are *holy* (unless that's how you have
fun. In that case, party on.)

Freedom to seek their own path, to direct their own evolution, to think
"funny" thoughts and discover by experience if these thoughts are *holy*,
that is what I advocate for newbies.  To be free one must choose to think.
To think well, one must become knowledgeable. To stay free, one should
freely choose what one should know.

... But first, one must eat *good food*.  Enough sermon..

|   (2) In noting points about the badbook DTD, you claim that the use of
|   the PUBLIC keyword implies that the DTD has been formally registered
|   with some body empowered to do formal registrations...

Hmm, I was trying to say the opposite.  That while the identifier appears
to make that claim, there is no way to check it.  In other words, no
*magical hand* flies around the universe to ensure that the type is
registered.  It points to the concept of *agreement among trusted parties*
as the basis for almost everything SGML can do for parties of more than
one.

|   The only problem I've ever seen is where a system references such
|   identifiers, but offers you *no* way to get at the relevant "publicly"
|   identified material.

Yep.  Such systems are impolite and shouldn't be invited to *parties where
the band is too loud and one has to resort to sign language*.  That is one
use of SGML: sign language for systems that party together than go to their
own *domain* afterwards.

|   (3) In discussing the ISBN attribute, it might be worthwhile to use it
|   as an illustration of the lack of semantics in SGML.  Naming the
|   attribute 'ISBN' does *not* mean that there's anything connected with
|   SGML that checks its value to see if it's a valid ISBN -- as the value
|   in you example in fact is not.

Quite true and part of the subject of the next post where I will complete
the example.  However, SGML is extended by HyTime and HyTime does give one
a way to set a check for that via lextypes.  Whether your system can DO
that checking is another subject: features matching.

|   (4) Parochially, it bugs me that there's no mention of Interleaf in
|   your list of SGML products.  I know that it's just your personal list.

It is.  Actually, I woke up that night worried about the numbers of vendors
I didn't mention (no lie). Interleaf and its products are fine stuff from
what I gather and I have nothing but respect for the company and its never
ending struggle to get WYSIWYG and DTD-centered applications to
interoperate.  But in the fifteen odd years I've done this for a living,
I've never had the pleasure of using an Interleaf product, so it's like
talking about the attributes of a lady I've never courted....  presumptious
and a good way to eliminate a potentially thrilling life experience.

Cheers to one and all!

Len Bullard
</message>
<message id="<346397$ar9@news.delphi.com>" date="2987442952">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 21:15:52 UT
From: Jeffrey McArthur \<j_mcarthur@BIX.com>
Organization: ATLIS Publishing
Message-ID: <346397$ar9@news.delphi.com>
Subject: Re: SGML to Postscript/PDF

TeX is one of the most widely ported programs in existance.  It is actually
quite hard to find a platform that it does not exist on.  The output of TeX
is a .dvi file.  The .dvi file was created with portability in mind.  You
can copy a .dvi file from an IBM Mainframe to a Cray, to a CDC Cyber, to a
PC, to a Mac, to an Amiga, to an Atari, and so on.  David Fuchs (sp?) and
Donald Knuth went to a lot of trouble to make sure the output of TeX was
transparent from one machine to another.

So unless you are running on something REALLY odd, there will be a version
of TeX available.  Converting to Postscript is another matter.  The source
code to DVIPS is available and written in moderately portable C, it has not
been ported to as many platforms as TeX.  But it is available for most
platforms.

On the other hand, installing TeX and DVIPS is not very easy...

-- 
    Jeffrey M\\kern-.05em\\raise.5ex\\hbox{\\b c}\\kern-.05emArthur
    a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
    phone: +1 301 210 6655
    fax:   +1 301 210 4999
    home:  +1 410 290 6935

The opinions express are mine.  They do not reflect the opinions of my
employer.  My access to the Internet is not paid for by my employer.
</message>
<message id="<345oko$aa5@zip.eecs.umich.edu>" date="2987451480">
Newsgroups: alt.culture.usenet,alt.culture.internet,news.misc,comp.infosystems.www.misc,comp.text.sgml,comp.text
Followup-To: alt.culture.internet
Date: 01 Sep 1994 23:38:00 UT
From: Jim Jewett \<jimj@quip.eecs.umich.edu>
Organization: University of Michigan EECS Dept.
Message-ID: <345oko$aa5@zip.eecs.umich.edu>
References: <344tf2$ks7@Starbase.NeoSoft.COM> \<q3cPkq530EyZ057yn@oslonett.no> <345a7t$480@starbase.neosoft.com>
Subject: Re: The construction of FAQs

[Followups slashed]

In article <345a7t$480@starbase.neosoft.com>,
Cameron Laird \<claird@Starbase.NeoSoft.COM> wrote:
>In article \<q3cPkq530EyZ057yn@oslonett.no>,
>Peter N. M. Hansteen \<peternm@oslonett.no> wrote:
>>In article <344tf2$ks7@Starbase.NeoSoft.COM>, Cameron Laird wrote:

>>> Is there any good reason not to construct all FAQs from now on in HTML, 
>>> rather than plaintext?

>> [Many people use traditional text tools, often offline, and don't have 
>> access  to the live links.]

>One of my realizations for the week is that HTML is not just
>about live links; its formatting or stylistic standards define
>an advance over the character-based plaintext with which we're
>all familiar.

There is, however, a lot to be said for familiarity.  LaTeX is far
better than ASCII, but there are people who don't have LaTeX around.
If it is done lightly (eg, no figures, no really absurd escapes, etc)
it isn't even very intrusive -- but I remember when I would see 
LaTeX and wonder if I knew how to read it.  I imagine many people would
simply have decided that they didn't.  If FAQs start using entities
and lots of anchors and more tags than line breaks, this will be a 
step backwards, because it will lose the portability.


>HTML is not the best language for quasi-static distribution of 
>documents (PostScript is one example of a better one),

This is what made me respond... I will often grab and read something
in ASCII.  I will often decide not to in postscript.  This would be
true even if they were the same size.  Why?

At home, I can't read postscript at all (except as raw text, which
is sometimes doable, and often painful).

At work, I sit in front of a workstation designed to display postscript
well -- and a large number of the documents I FTP are basically unreadable.

The nearest I can figure, the person writing them made assumptions about
fonts that I don't have, or expected them to be printed rather than viewed.

For truly interesting documents, I literally scroll through the blasted
raw postscript in another window to see what was supposed to be in place
of black blotches, or areas too big to view, or areas overwritten, or 
areas in an unreadable font...

I would much rather read the ascii without the formatting info to obscure
it.  ASCII almost always wins on the portability issue.

_________    Have a favorite group or mailing list?  Describe it to
    |                   grouprev+@pitt.edu 
 jJ |    Take only memories.            jimj@eecs.umich.edu 
\\__/     Leave not even footprints.     jewett+@pitt.edu 

</message>
<message id="<dshema-0109941701410001@kaleetan.rt.cs.boeing.com>" date="2987452784">
Newsgroups: comp.text.sgml
Date: 01 Sep 1994 23:59:44 UT
From: Dave Shema \<dshema@grace.rt.cs.boeing.com>
Organization: Boeing Computer Services
Message-ID: \<dshema-0109941701410001@kaleetan.rt.cs.boeing.com>
Subject: Underscores and the Sgmls validating parser?

My SGML data contains entities and attributes with underscores ("_").  I am
trying to use Sgmls, derived from ARCSGML, as the validating parser.  This
parser does not seem to like underscores.

In the declarations file, I have tried modifying the list of name
characters that can occur at positions other than the first position:

   NAMING LCNMCHAR "-._"
          UCNMCHAR "-._"

The message:

    sgmls: Unsupported feature at sgml_decl, line 32 in declaration parameter 92:
           Character number 95 is not supported as an additional name character

 is displayed by the parser.


--------------------- partial sgml declaration file ---------------------------
.
.
.
SYNTAX    
SHUNCHAR  CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
         18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET   "ISO 646-1983//CHARSET International Reference Version
(IRV)//ESC 2/5 4/0"
DESCSET   0  128  0
FUNCTION  RE          13
          RS          10
          SPACE       32
          TAB  SEPCHAR 9 
NAMING    LCNMSTRT ""
          UCNMSTRT ""
          LCNMCHAR "-._"
          UCNMCHAR "-._"
          NAMECASE GENERAL YES
                   ENTITY   NO
DELIM     GENERAL  SGMLREF
          SHORTREF SGMLREF
NAMES     SGMLREF
QUANTITY  SGMLREF
          LITLEN 650
          NAMELEN 32
.
.
.

Short of going into the C code and changing values in the tables (what we
ended up doing with ARCSGML), can this parser be encouraged to accept
underscores?

Thanks !!
   Dave

-- 
dave shema  (dshema@grace.rt.cs.boeing.com)
</message>
<message id="<CvHBxA.Aqq@world.std.com>" date="2987459614">
Newsgroups: alt.culture.usenet,alt.culture.internet,news.misc,comp.infosystems.www.misc,comp.text.sgml,comp.text
Followup-To: news.misc
Date: 02 Sep 1994 01:53:34 UT
From: Tom O Breton \<tob@world.std.com>
Organization: BREnterprises
Message-ID: \<CvHBxA.Aqq@world.std.com>
References: <345oko$aa5@zip.eecs.umich.edu>
Subject: Re: The construction of FAQs

[ Follow-ups to news.misc (Since I can't think of a place that really fits) ]

jimj@quip.eecs.umich.edu (Jim Jewett) writes:
> This is what made me respond... I will often grab and read something
> in ASCII.  I will often decide not to in postscript.  This would be
> true even if they were the same size.  Why?

Absolutely. I've often had the experience of ftping to look for a file,
only to find that it's in Postscript and not getting it.

Felt bad about it too, 'cause I presume the author did more work than
straight ASCII would entail, but in my eyes rendered it unusable (unless
I filter it into straight ASCII)

        Tom

-- 
finger me for how Tehomega is coming along (at tob@world.std.com)
Author of The Burning Tower (from TomBreton@delphi.com) (weekly in
rec.games.frp.archives)
</message>
<message id="<34678j$l9o@umd5.umd.edu>" date="2987466451">
Newsgroups: alt.culture.usenet,alt.culture.internet,news.misc,comp.infosystems.www.misc,comp.text.sgml
Date: 02 Sep 1994 03:47:31 UT
From: Ram Samudrala \<ram@elan1.carb.nist.gov>
Organization: The Centre for Advanced Research in Biotechnology
Message-ID: <34678j$l9o@umd5.umd.edu>
References: <344tf2$ks7@Starbase.NeoSoft.COM> \<q3cPkq530EyZ057yn@oslonett.no> <345a7t$480@Starbase.NeoSoft.COM>
Subject: Re: The construction of FAQs

[Cameron Laird]

|   One of my realizations for the week is that HTML is not just about live
|   links; its formatting or stylistic standards define an advance over the
|   character-based plaintext with which we're all familiar.

Right.  It offers as logical framework for your writing, much the way TeX
does, though I am still a lot more comfortable with LaTeX than I am with
HTML.  HTML does offer a certain amount of abstraction, but it's not
flexible enough for me.  In general, I'd like more control over sizes (?)
and the fonts within the document.  It is however a lot easier for the
uninitiated to learn, than TeX is.

I think your writing ability improves if you write using HTML or TeX,
personally.

--Ram

    Ram Samudrala                              ram@elan1.carb.nist.gov
    Reagan became senile.  Bush already was.  Clinton acts like he is.
</message>
<message id="<1994Sep2.060104.2194@sserve.cc.adfa.oz.au>" date="2987474464">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 06:01:04 UT
From: Tom Worthington \<tomw@ccadfa.cc.adfa.oz.au>
Organization: Australian Defence Force Academy, Canberra, Australia
Message-ID: <1994Sep2.060104.2194@sserve.cc.adfa.oz.au>
Subject: Electronic Document Management Pocket Guide available

A "Pocket Guide to the Management of Electronic Documents in the Australian
Public Service" is now available by FTP:

	URL: ftp://archie.au/ACS/edocgd.html

Those without a HTML document browser can send a message to
"listproc@www0.cern.ch" with "www ftp://archie.au/ACS/edocgd.html" in the
body of the message.  A text version of the document will be sent back.
Please note that this is an experimental service provided by CERN.

This leaflet has been produced as a ready reference for APS managers
responsible for, or concerned about, the control of electronically stored
corporate records.  It provides a condensed guide to the principles for
management of electronic documents, as set out in the report "Management of
Electronic Documents in the Australian Public Service", published in 1993
by the Commonwealth Government's Information Exchange Steering Committee
(IESC).

The IESC is an advisory body, responsible for providing guidance to
Commonwealth agencies on policies and strategic directions relating to
Information Technology and related issues, including telecommunications.
For further details of the IESC contact Max McGregor (e-mail:
max.mcgregor@finance.ausgovfinance.telememo.au, ph: +61 6 263 3553, fax:
+61 6 263 2276).

PS: Don't miss (because I am talking at it):

	PLAYING FOR KEEPS: An electronic Records Management Conference
	Hosted by Australian Archives
	Canberra Australia 8-10 November 1994
	For details e-mail: acts@ozemail.edu.au
	Phone: +61 6 2573299 or Fax: +61 6 2573256


-- 
Posted by Tom Worthington \<tomw@adfa.oz.au>
Chair of the IESC Electronic Document Management Subcommittee
& Senior Policy Advisor, Data Administration Standards
Department of Defence
Room B-3-25, Russell Offices, Canberra ACT 2600, Australia
Ph: +61 6 2651258, Fax: +61 6 2653601, Pager: +61 6 2856209
X.400: 
G=Tom;S=Worthington;OU=CM-DIMP;O=HQADF;P=ausgovdefencenet;A=telememo;C=au
2 September, 1994  File no: HQ 93-33989
</message>
<message id="<1994Sep2.091013.413@ittpub>" date="2987478613">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 07:10:13 UT
From: "William D. Lindsey" \<william@ittpub.nl>
Message-ID: <1994Sep2.091013.413@ittpub>
References: <9409012114.AA19892@source.asset.com>
Subject: Re: SGML New User Requesting General Information

[Len Bullard]

|   After my posting to "newbies" on the basics of SGML, I received a
|   number of kind and complementary postings suggesting that I'd written
|   something of benefit to the community.  Thank y'all very much.  Nothing
|   works better on me than applause.

Add my name to the membership list of the appreciative audience of that
article.  I found it to be a very readable introduction to SGML and have
saved a copy to share with computer-aware people who have expressed
curiosity about the subject.

I'd like to take this opportunity to nominate Mr. Bullard for the
un-official post of "Maintainer of the comp.text.sgml FAQ".  He's made a
fine start.

cheers,

Bill Lindsey                     bill@ittpub.nl
</message>
<message id="<1994Sep2.081209.24886@falch.no>" date="2987482329">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 08:12:09 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep2.081209.24886@falch.no>
References: <342me6INNn9i@moon.cis.ohio-state.edu> <344h1o$55h@hopper.acm.org> \<jsaunder-010994104514@slip2-16.acs.ohio-state.edu>
Subject: Re: What is the relationship between entities and attributes?

[James Saunders]

|   We are involved in a project that allows construction of a DTD for a
|   collection of tagged documents that do not have a DTD.  As such, this
|   is not an easily solved problem.  My question was, what information
|   that is contained in a tags attribute, or in other tags in the same
|   document, allows you, without knowing the DTD, to determine that the
|   tag attributes declared value is CDATA, ENTITY, ENTITIES etc.
|
|   ie  \<artwork file = "pic12">      pic12      is an entity
|       \<artwork size = 100mm 200mm> "100m 200m" is not an entity

If you know the SGML declaration (or can safely assume that the reference
concrete syntax is being used), the actual characters used in the attribute
values will give you some clues.

For example, "pic12" *could* be an entity, because it starts with a valid
name start character and only contains valid name characters.
(Unfortunately, it could be anything at all, except a number token or
number token list.)

"size = 100mm 200mm" - because it lacks literal delimiters - cannot in any
case be a single attribute specification.  If it is valid SGML it can only
be equivalent to "size = '100mm' otheratt = '200mm'".

In this case your DTD construction algorithm would be able to infer that
the attribute 'size' could only have the declared value NMTOKEN, NMTOKENS,
NUTOKEN, NUTOKENS or CDATA (because it doesn't start with a valid name
start character).  It could also infer that the declared value of attribute
'otheratt' (whose name would be unknown, of course) is a name token group,
of which '200mm' is a member (because otherwise the omission of the
attribute name would have been illegal).  Neither of these attributes could
in any case be an entity.

If your second example had read '\<artwork size = "100mm 200mm">', size
could still not have a declared value ENTITY, because its specified value
contains a space, which is not a valid name character.  Nor could it be of
type ENTITIES, because both tokens start with invalid name start
characters.  For the same reasons, size could not be ID, IDS, IDREF,
IDREFS, NAME or NAMES, etc.  The only possible interpretations would be
CDATA, NMTOKENS or NUTOKENS.

Note that even if you are unable to make any assumptions about the SGML
declaration, an attribute value that begins with a digit can only ever be
CDATA, a name token (not a name), a number or a number token.  It can
*never* be an entity, because (as far as I understand) it is not possible
to give digits the role of name start character.

Best regards,

Steve
-- 
</(pepper)steve>                                   pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 2216 3040                                fax +47 2216 2350
</message>
<message id="<345s2r$85m@felix.dircon.co.uk>" date="2987483230">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 08:27:10 UT
From: Bruce Hunter \<bruce@sgml.dircon.co.uk>
Organization: The Direct Connection Ltd
Message-ID: <345s2r$85m@felix.dircon.co.uk>
Subject: ActiveSystems

Anyone got an email address for ActiveSystems, or any experience with their
advertised products ActiveSearch and ActiveServer?

regards,

	Bruce Hunter
	SGML Systems Engineering
	bruce@sgml.dircon.co.uk
</message>
<message id="<1994Sep2.110522.414@ittpub>" date="2987485521">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 09:05:21 UT
From: "William D. Lindsey" \<bill@ittpub.nl>
Message-ID: <1994Sep2.110522.414@ittpub>
References: \<dshema-0109941701410001@kaleetan.rt.cs.boeing.com>
Subject: Re: Underscores and the Sgmls validating parser?

[Dave Shema]

|   My SGML data contains entities and attributes with underscores ("_").
|   I am trying to use sgmls, derived from ARCSGML, as the validating
|   parser.  This parser does not seem to like underscores.
:
|   Short of going into the C code and changing values in the tables (what
|   we ended up doing with ARCSGML), can this parser be encouraged to
|   accept underscores?

No.  The only way I could get underscores to be accepted by sgmls was by
hacking (gently) the source.

Around line 957 of sgmldecl.c (sgmls-1.1) change 
from:
               else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
                        && c != '.' && c != '-') {
to:
               else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
                        && c != '.' && c != '-' && c != '_' ) {

This change had the unintended side effect of altering the results for
test019 in the test suite.  I haven't studied the test carefully, but it
may be that the new results are correct.

I hope this helps.

Bill 
-- 
Bill Lindsey                       bill@ittpub.nl
</message>
<message id="<1994Sep2.114316.415@ittpub>" date="2987487795">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 09:43:15 UT
From: "William D. Lindsey" \<bill@ittpub.nl>
Message-ID: <1994Sep2.114316.415@ittpub>
References: <1994Sep2.110522.414@ittpub>
Subject: Re: Underscores and the Sgmls validating parser?

[William D. Lindsey]

|   This change had the unintended side effect of altering the results for
|   test019 in the test suite.  I haven't studied the test carefully, but
|   it may be that the new results are correct.

I apologize for this mis-information.  Upon closer inspection, I found that
I had edited the test019.sgm source file at some point.  The patch to
sgmldecl.c causes NO differences in any of the test results.

Bill 
--  
Bill Lindsey                       bill@ittpub.nl
</message>
<message id="<CvI0t3.14H7@hawnews.watson.ibm.com>" date="2987491862">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 10:51:02 UT
From: Christoph Altenhofen \<caltenhofen@vnet.ibm.com>
Organization: IBM Germany, European Networking Center, Heidelberg
Message-ID: \<CvI0t3.14H7@hawnews.watson.ibm.com>
References: <345s2r$85m@felix.dircon.co.uk>
Subject: Re: ActiveSystems

[Bruce Hunter]

|   Anyone got an email address for ActiveSystems, or any experience with
|   their advertised products ActiveSearch and ActiveServer?

Their address:

Active Systems, Inc.
11 Holland Avenue, Suite 700
Ottawa, Ontario K1Y 4S1
Tel: +1 613 729 2043
Fax: +1 613 729 2874
E-Mail: sales@ctmg.isis.org

I sent a Fax to them to get some informations about ActiveServer and
ActiveSearch and they answered very quickly.

Up to now, I only studied the infos I got in respond of my Fax, but it
sounds quite interesting.

So is there anybody out there having experience with these products or any
other comercial products for databasing SGML documents (as DynaBase, BASIS
SGMLserver etc.)?

Any hints are welcome.

Christoph
-- 
* Christoph Altenhofen      /IBM Deutschland Informationssysteme GmbH
*                          |               European Networking Center
*  Tel.: +6221 / 59 - 4503 |        Dept. Open Document Communication
*  FAX : +6221 / 59 - 3400  \\  Vangerowstr. 18  *  D-69115 Heidelberg
*  IBMMAIL: DEIBMSW6         \\__________                      Germany
*  e-mail : CALTENHOFEN at VNET.IBM.COM  \\__________________________/
*           christo%limmat.heidelbg.ibm.com@ibmpa.awdpa.ibm.com
*  X-400  : C=DE;A=IBMX400;P=IBMMAIL;S=ALTENHOFEN;G=ALTENHC
</message>
<message id="<kimber.81.000ABA5B@passage.com>" date="2987509416">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 15:43:36 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.81.000ABA5B@passage.com>
References: <342me6INNn9i@moon.cis.ohio-state.edu>
Subject: Re: What is the relationship between entities and attributes?

[James Webster Saunders]

|   I came across the following reference in "Practical SGML":
|
|   \<Artwork name="FILE1 FILE2"> 
|
|   where FILE1 and FILE2 are entities.  In this case, the entities are
|   external files containing the tag data.  If a tagged document were
|   being parsed without a DTD and it contained such a reference, how would
|   one distinguish that this value is indeed a list of entities?

It's not SGML if there's not a DTD, so you can't meaningfully process a
document instance without a DTD.  With the DTD there is no ambiguitity
because the declared value prescription indicates what the attribute takes
as a value, in this case, ENTITIES.

-- 
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<DKU.94Sep2170924@zarniwoop.pc-labor.uni-bremen.de>" date="2987510961">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 16:09:21 UT
From: Dirk Kutscher \<dku@zarniwoop.pc-labor.uni-bremen.de>
Organization: PC-Labor der Universitaet Bremen
Message-ID: \<DKU.94Sep2170924@zarniwoop.pc-labor.uni-bremen.de>
Subject: data format of mapping files

Hi,

Does someone know if information about the data format of the replacement
files that allow translating a sgml document to a desired output format can
be obtained somewhere?

Or maybe someone could tell me the meaning of the "+" characters in the
mapping files...  (I have the qwertz dtd as an example.)

Thanks.

-- 
Bye,
	Dirk
</message>
<message id="<347sdh$5is@news1.digex.net>" date="2987520881">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 18:54:41 UT
From: Tommie Usdin \<acg-sgml@access3.digex.net>
Message-ID: <347sdh$5is@news1.digex.net>
References: <33fi64$grh@sernews.raleigh.ibm.com>
Subject: Re: Request for SGML 94 info

Excerpts from the Program for SGML '94

For more information or a full program contact:
Graphic Communications Association
100 Daingerfield Road
Alexandria, VA  22314-2888
Phone: 703/519-8160
Fax: 703/548-2867
E-Mail: blake@access.digex.net


                                   SGML '94
                              November 7-10, 1994
                        Sheraton Premiere, Tysons Corner
                                 Vienna, Virginia



Sunday, November 6
The Just Enough Tutorial Series
Marcy Thompson, Manager of Education and Training, SoftQuad 
Inc., Tutorial Coordinator

9:00 am-12:00 noon
Just Enough Concepts
   Introduction to SGML with no prerequisites. What is SGML? 
   Who uses it? How do they use it? How does it work?

Just Enough Syntax
   Introduction to SGML with no prerequisites. Basic overview 
   of SGML followed by a survey of SGML markup.

1:00 pm-4:00 pm
Just Enough Syntax and Just Enough Concepts (continued)

Just Enough Databases
   How does SGML mesh with document databases? Discussion of 
   full text, relational and object-oriented approaches.

Just Enough Electronic Delivery
   An overview of methods of delivering SGML documents 
   electronically.

Just Enough Paper Publishing
   What must you do to an SGML document to turn it into a 
   printed document? 

9:00 am-5:00 pm
OmniMark User Group

6:30 pm-10:30 pm        
Opening Reception and Dinner Aboard a Potomac River Cruise

Monday, November 7: General Session
9:00 am
Opening Remarks
Yuri Rubinsky, President, SoftQuad Inc., Conference Chairman

9:15 am
The Year in Review
   Yuri Rubinsky and B. Tommie Usdin, Vice President, ATLIS 
   Consulting Group, Conference Co-chair

10:00 am
Conference Keynote: State of the Web - The NCSA Mosaic View 
of the World's Largest SGML Application
   Joseph Hardin, Associate Director, Software Development 
   Group, National Center for Supercomputing Applications, 
   University of Illinois, Champaign-Urbana

11:00 am  Poster Session

Application Track  1:45 pm: Document Engineering at the Canadian 
Department of National Defence CALS Office.
   Ken Holman, Vice President, R\&D, Microstar Software Ltd.

2:30 pm: An SGML-based News Agency
   Tibor Tscheke, Managing Director, STEP GmbH 

3:30 pm : Towards an SGML-based Architecture for 
Operations and Maintenance Documentation in the 
Telecommunications Industry
   Wolfgang Weber, Systems Analyst, Siemens AG

4:15 pm: SGML Environment for Developers of Product 
Data Exchange Standards
   Lisa Phillips, Computer Scientist, and Joshua Lubell, 
   Computer Scientist, National Institute of Standards and 
   Technology

7:30 pm-10:00 pm: Evening Workshop: Table Handling,
Session Chair: Eric Severson, Executive Vice President, 
Avalanche Development

Theory Track
1:45 pm: Project YAO and Other News
   Dr. Charles F. Goldfarb, Principal Consultant, Information 
   Managment Consulting

2:30 pm: Document Conversion - How Does SGML Markup 
Acquire Behavior? 
   Kevin Allen, Senior Systems Engineer, InfoAccess

3:30 pm: SGML, DTD Design and Coding, In-House Publishing, 
Corporate Image, and Synergy
   Robert Erfle and Gunter von Zadow, IBM European Networking 
   Center

4:15 pm: SGML Model for Statistical Tables
   Dianne Kennedy, Vice President Strategic Systems, 
   ActiveSystems

7:30 pm-10:00 pm: Evening Workshop: Graphic Representation 
of Structure
   Session Chair: B. Tommie Usdin, Vice President, ATLIS
    Consulting Group

Tuesday, November 8
Management Track
8:30 am: SGML Is Not a Solution
   Marcy Thompson, Manager of Education & Training, SoftQuad 
   Inc.

9:00 am: Reconciling Internal and Interchange Requirements, or How to 
Survive the Industry Initiative
   Lani Hajagos, FrameBuilder Marketing Manager, Frame 
   Technology Corporation

10:00am: The Human Aspects of Using SGML
   Astrid E. Jenssen and Tone Irene Sandahl, University 
   Center for Information Technology Services, University of 
   Oslo

10:45 am: Practical Approaches to SGML Page Composition
   Francois Chahuneau, Director, AIS Berger Levrault

11:30 am: Implementation Issues and Project Management
   John W. Oster II, Principal Consultant, McAfee & McAdam, 
   Ltd.

1:00 pm: The Impact of SGML on Training in an Organization
   Jeanne El Andaloussi, Manager, Document Engineering Group, 
   Bull S. A.

1:45 pm: Reusing Information through SGML Building Blocks
   John J. Shockro, CEA, Incorporated

2:30 pm: Management of SGML Documents
   Eric Severson, Executive Vice President, Avalanche and 
   Ludo van Vooren, Director Customer Solutions, Interleaf

3:30 pm: SGML: It's Not Just for Documents Anymore
   Kurt Conrad, Internal Consultant, Boeing Computer Services

System Overview Track
Tools and Technologies for SGML-Based Information Systems

8:30 am: Introduction
   Mary Laplante, Executive Director, SGML Open

9:00 am: DTD, Application, and System Utilities
   Debbie Lapeyre, Consultant, ATLIS Consulting Group

10:00 am: Parsers, Transformers, and Conversion Tools
   Pamela Gennusa, Director, Database Publishing Systems, 
   Inc.

11:00 am: Editors and Authoring Systems
   Paul Grosso, Vice President, ArborText

1:00 pm: Databases, Document and Workflow Management
   Michael Sperberg-McQueen, Academic Computing Center, 
   University of Illinois at Chicago

2:00 pm: Electronic Delivery
   Tim Bray, Senior Vice President of Technology, OpenText 
   Corporation

3:15 pm: Layout and Composition
   Mark Walters, Editor, Seybold Publications


General Session 4:15 pm
Poster Session

6:00 pm Author Signing Party


7:30 pm-9:30 pm  Evening Workshop
SGML Open Panel: SGML and the Internet
   Session Chair: Larry Bohn, Interleaf

Wednesday, November 9
General Session
8:30 am: The Golden DTD: Using Data-Centered DTD's to 
Meet Business Goals
   Gregory S. Vaughan, Senior Technical Consultant, Database 
   Publishing Systems, Inc.

9:15 am: RealSGML: Digital Service Bulletins for Commercial 
Aviation
   Harry Summerfield, President, Zandar Corporation and 
   Freelon Hunter, Project Manager, Boeing Commercial 
   Airplane Group

10:00 am: Alchemy for the Masses: Automating the Construction 
of SGML Conversion Applications
   David Sklar, Director of Applications, Electronic Book 
   Technologies

11:00 am Poster Session

1:15 pm-3:00 pm  Product News Flashes

7:00 pm-10:00 pm  Product Demonstration Table Tops 

Thursday, November 10: Theory Track 8:30 am
The Whys, Whats and Hows of Partial Documents in SGML
   Eric Freese, Principal Software Developer, Information 
   Dimensions, Inc.     

9:00 am: SUBDOC, A Useful Construct for Publishing
   Mike Maziarka, Frame/Datalogics

9:30 am: Is SHORTREF Still Meaningful?
   John McFadden, President, Exoterica

10:15 am: Encoding SDIF in the Multipurpose Internet Mail Extensions 
(MIME)
   Edward Levinson, Technical Director, Accurate Information 
   Systems, Inc.

10:45 am: File Format for Documents Containing both Logical Structures 
and Layout Structures
   Makoto Murata, Fuji Xerox

11:15 pm: Simplified Authoring for a Complex DTD
   Keith Fabling, Lead Publications Engineer (CTAS), Boeing 
   Commercial Airplanes and Michael A. Murray, Senior 
   Principal Scientist, Boeing Computer Services

11:45 am: Creating SGML Objects for End-Users
   Jean Paoli, Technical Director, Grif S.A.

Application Track
8:30 am: Converting More Than 1 Million Pages to SGML
   Richard Barth, Director of Operations, Data Conversion 
   Laboratory

9:00 am: Conversion to SGML
   Bill Preacher, Managing Director, Pindar Infotek

9:30 am: Creation of Electronic Technical Manuals Using SGML
   James Frizzel, Software Development Engineer, Docucon

10:15 am: The Conversion of Legacy Technical Documents into IETMs; A 
NAVAIR Phase II SBIR Progress Report
   Timothy E. Billington, Senior Information Engineer, 
   Information Engineering Group, Aquidneck Management 
   Associates, Ltd.

10:45 am: Document Type Definitions: A Case Study for A Common Set of 
High Level Tags for an IETM
   Michael Graser, Martin Marietta Corporation

11:15 am: Building an SGML-based IETM
   Alan Porter, Technology Development Executive, OMI 
   Logistics Ltd.

11:45 am: The Reality of Military Document Analysis
   Lawrence A. Beck, Northrop Grumman Data 
   Systems/InfoConversion, and Lewis M. McCormack, Northrop 
   Grumman Aerospace & Electronics

General Session
1:00 pm
Lunch and Closing Keynote: Pushing the SGML Paradigm
   Jean-Pierre Gaspart, Managing Director, Associated 
   Consultants and Software Engineers (A.C.S.E)

Registration Form
I am attending (check all boxes that apply):

Just Enough Tutorials, Sunday, November 6
 _ Just Enough SGML Syntax
 _ 9:00 am-12:00 noon*
 _ 9:00 am-4:00 pm 
Just Enough SGML Concepts
 _ 9:00 am-12:00 noon*
 _ 9:00 am-4:00 pm 
* You may choose to only attend for the first half of the 
day. This gives you the opportunity to attend one of the 
other courses in the afternoon.
1:00 pm-4:00 pm (only)
 _ Just Enough Databases
 _ Just Enough Electronic Delivery
 _ Just Enough Paper Publishing

Tutorial Fees:
   Full Day Rates (9:00am-4:00pm)
   $145/GCA Member discount $110
   Half Day Rates (9:00am-noon or 1:00-4:00 pm)
   $85/GCA Member discount $50

Tutorial fees are additional. If you plan to attend the 
conference, you must pay a conference registration fee as 
well.

   SGML '94 Conference
      November 7-10, 1994
      Registration fee $845
      GCA member discount $640
      Educational Institution* discount $502
   *Accredited University or College

I would like to participate as a vendor in the product 
demonstration.*
      Nonmember rate $650/two 6-foot tables
      Member rate $425/two 6-foot tables

   Cruise Ship Dandy Reception and Dinner
      Sunday, November 6, 1994
      $10 Conference Registrant
      $30 Guest Fee


Organization Type (check one):
 _ Corporate Graphic Services
 _ Government
 _ Graphic Educational Institute
 _ Industry Media
 _ Manufacturer/Equipment
 _ Manufacturer/Material
 _ Manufacturer/Systems & Services
 _ Publisher
 _ Printer
 _ Reference Publisher
 _ Services Provider
 _ Software Develpeer
 _ Other_______________________________

Registrant Information:
Indicate Name as it will appear on badge:
    Name:
    Title:
    Company/Institution:
    Address:
    City:
    State or Province:
    Postal Code:
    Country:
    Area Code/Phone:
    FAX:
    Date:

Billing Information:
Check enclosed (Make checks payable to Graphic Communications 
Association)

Credit Card: 
     _ Visa
     _ MasterCard
     _ American Express
    Card number:
    Expiration date:
    Signature:
</message>
<message id="<3487pg$a9r@cmcl2.NYU.EDU>" date="2987532528">
Newsgroups: comp.text.sgml
Date: 02 Sep 1994 22:08:48 UT
From: Oliver Lefevre \<lefevre@acfcluster.nyu.edu>
Organization: New York University, New York, NY
Message-ID: <3487pg$a9r@cmcl2.NYU.EDU>
Subject: SCRIPT replacement

IBM used to have a thing called SCRIPT which worked on mainframes and DOS
PCs and, if I understand correctly, implements some sort of SGML.  SCRIPT
source code looks very much like TeX, albeit with different commands.  I
don't have access to the whole suite of SCRIPT software but I would like to
be able to process SCRIPT files nonetheless and print the output.  Does
anybody know how this might be done?  (NB: 1/ I am fairly conversant with
TeX 2/ I am no multi-millionaire, so commercial SGML tools might not be of
much help...)

Thank you very much beforehand

Olivier Lefevre
NYU Medical School, NY
</message>
<message id="<34a0g6$esh@news1.digex.net>" date="2987590598">
Newsgroups: comp.text.sgml
Date: 03 Sep 1994 14:16:38 UT
From: Paul Robinson \<tdarcos@access1.digex.net>
Organization: Tansin A. Darcos & Company, Silver Spring, MD USA
Message-ID: <34a0g6$esh@news1.digex.net>
Subject: On Line Specifications for SGML?

Is there a set of specifications which are available via FTP or otherwise
that describe the specifications for SGML?  Or a quick list?  Or should I
look at one of the source programs that translates SGML, such as a browser,
if one is available for MSDOS or Unix?

-- 
Reports on Security Problems: To Subscribe write PROBLEMS-REQUEST@TDR.COM
Paul Robinson - paul@tdr.com / tdarcos@MCIMail.com / tdarcos@access.digex.net
Voted "Largest Polluter of the (IETF) list" by Randy Bush \<randy@psg.com>
Voted "Largest Polluter of digex.general" by Mike \<voss@orange.digex.net>
</message>
<message id="<LENST.94Sep3231241@lysita.lysator.liu.se>" date="2987615561">
Newsgroups: gnu.emacs.sources,comp.text.sgml
Date: 03 Sep 1994 21:12:41 UT
From: Lennart Staflin \<lenst@lysator.liu.se>
Organization: Lysator Computer Society, Link|ping University, Sweden
Message-ID: \<LENST.94Sep3231241@lysita.lysator.liu.se>
Subject: ANNOUNCE: PSGML 0.4b2 -- SGML-mode for Emacs

PSGML version 0.4b2 is now available at

	ftp.lysator.liu.se: /pub/sgml/psgml-0.4b2.tar.gz

This is a bug-fix release.

PSGML is a major mode for editing SGML documents.  It works with GNU Emacs
19.19 and later or with Lucid Emacs 19.9 and later.  PSGML contains a
simple SGML parser and can work with any DTD.  Functions provided includes
menus and commands for inserting tags with only the contextually valid
tags, identification of structural errors, editing of attribute values in a
separate window with information about types and defaults, and structure
based editing.

This is still a beta version, but it is documented and reasonably stable.

More information is available in the World Wide Web
\<http://www.lysator.liu.se/projects/about_psgml.html>

-- 
Lennart Staflin  \<lenst@lysator.liu.se>
You are in a twisty little maze of URLs, all alluring.
</message>
<message id="<1994Sep4.035723.20957@sq.sq.com>" date="2987639843">
Newsgroups: comp.text.sgml,comp.infosystems.www.users
Date: 04 Sep 1994 03:57:23 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep4.035723.20957@sq.sq.com>
References: \<jdmCv9BDB.KqA@netcom.com> \<h0ZOkyr285y3071yn@netcom.com>
Subject: Re: Is there a HoTMetaL HTML temple for FAQs?

[James D. Murray]

|   I'm putting together a FAQ and I'd like to make it available as a Web
|   HTML file in addition to the traditional ASCII version.  I'm looking
|   into using HoTMetaL as the editing application (MS Windows)

[Frank McNeil]

|   Great?  Once created it will be really easy to update in HoTMetaL, due
|   to the structured way HoTMetaL presents tags and text.

Thanks for the plug :-)

[James D. Murray]

|   and I'd like to know if anyone has bothered to put together a template
|   for FAQs?

Not that we have seen, but if you make one and want to send it to us, we'll
certainly consider it including it in a future release of HoTMetaL.

HoTMetaL templates are simply ASCII HTML files, there's nothing special
about them -- so you can simply put files in your templates directory.
Also, you can have multiple documents open, and use copy/paste between
them.

Lee

(HoTMetaL is available for ftp at the sites listed in my .signature, and
probably lots of other sites too...)

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<CvLxK7.Gys@lut.ac.uk>" date="2987674279">
Newsgroups: comp.text.sgml
Date: 04 Sep 1994 13:31:19 UT
From: Martin Hamilton \<martin@mrrl.lut.ac.uk>
Organization: c/o Loughborough University, UK
Message-ID: \<CvLxK7.Gys@lut.ac.uk>
References: <33v8g7$ksc@urmel.informatik.rwth-aachen.de> \<STEINARB.94Aug31113631@flame.falch.no>
Subject: Re: Hypermedia-Mail (Was: HyTime problems)

[Steinar Bang]

|   What is "hypermedia-mail"?

text/html :-)
</message>
<message id="<34cqo7$lj1@news.ycc.yale.edu>" date="2987683015">
Newsgroups: comp.infosystems.www.misc,comp.text.sgml
Date: 04 Sep 1994 15:56:55 UT
From: Una Smith \<una@doliolum.biology.yale.edu>
Organization: Yale University, Department of Biology
Message-ID: <34cqo7$lj1@news.ycc.yale.edu>
References: <344tf2$ks7@Starbase.NeoSoft.COM> \<q3cPkq530EyZ057yn@oslonett.no> <345a7t$480@starbase.neosoft.com>
Subject: Re: The construction of FAQs

[Cameron Laird]

|   Here's the question: is there any good reason not to construct all FAQs
|   from now on in HTML, rather than plaintext?

I think I've got the attribution correct.  My apologies if not.

The reason I've stuck to plain text is that none of the document
description packages out there are sufficiently easy to set up that it
would be worth my time right now to convert my FAQ into one of them.  Given
my readership, it is imperative that I be able to produce attractive plain
text versions of the FAQ.  HTML does not do this nicely enough, especially
when URLs are used; I've seen really dreadful versions of my FAQ that were
HTMLized by others and then re-extracted to plain text.

I would prefer to use something like LaTeX, that would keep the internal
references correct without manual editing.  And I know there are
LaTeX-to-HTML converters out there, so I am tempted to switch over.  The
most commonly requested form is PostScript, for printed handouts during
workshops.  Ideally, I could keep a single source document, and it could be
filtered through any document format driver on demand: PostScript, Acrobat,
RTF, HTML, you name it.  We're moving in that direction, but aren't nearly
there yet.  For now, I'll stick to the low road: plain text.

-- 
	Una Smith			smith-una@yale.edu

Department of Biology, Yale University, New Haven, CT  06520-8104  USA
</message>
<message id="<tedgCvM86M.6yp@netcom.com>" date="2987688045">
Newsgroups: comp.text.sgml
Date: 04 Sep 1994 17:20:45 UT
From: Edgar Gilchrist \<tedg@netcom.com>
Message-ID: \<tedgCvM86M.6yp@netcom.com>
Subject: Storing SGML in a database

I am confronted with a classic documentation problem: producing multiple,
port specific versions of a generic document.  I would like to consider an
SGML/database approach.  I envision that the document could be stored in a
database as distinct SGML-tagged modules. In the case of a port-specific
variation in a module, I imagine that there would be a distinct record,
encoded in such a way that it would be correctly included when the
port-specific document was assembled.

Now, I know that this is not a new approach.  Can anyone point me to
technical articles, etc., that may exist on this subject?

Thanks,

  Ted Gilchrist
</message>
<message id="<1994Sep5.055215.27280@sserve.cc.adfa.oz.au>" date="2987733135">
Newsgroups: comp.text.sgml
Date: 05 Sep 1994 05:52:15 UT
From: Tom Worthington \<tomw@ccadfa.cc.adfa.oz.au>
Organization: Australian Defence Force Academy, Canberra, Australia
Message-ID: <1994Sep5.055215.27280@sserve.cc.adfa.oz.au>
References: <1994Sep2.060104.2194@sserve.cc.adfa.oz.au>
Subject: Re: Electronic Document Management Pocket Guide available

[Tom Worthington]

|	PLAYING FOR KEEPS: An electronic Records Management Conference
|	Hosted by Australian Archives
|	Canberra Australia 8-10 November 1994
|	For details e-mail: acts@ozemail.edu.au
|	Phone: +61 6 2573299 or Fax: +61 6 2573256

The e-mail address for ACTS was incorrect. The correct address is:

	acts@ozemail.com.au

-- 
Posted by Tom Worthington \<tomw@adfa.oz.au>
Chair of the IESC Electronic Document Management Subcommittee
& Senior Policy Advisor, Data Administration Standards
Department of Defence
Room B-3-25, Russell Offices, Canberra ACT 2600, Australia
Ph: +61 6 2651258, Fax: +61 6 2653601, Pager: +61 6 2856209
X.400: 
G=Tom;S=Worthington;OU=CM-DIMP;O=HQADF;P=ausgovdefencenet;A=telememo;C=au
5 September, 1994  File no: HQ 93-33989
</message>
<message id="<34f23l$17u@felix.dircon.co.uk>" date="2987756085">
Newsgroups: comp.text.sgml
Date: 05 Sep 1994 12:14:45 UT
From: Bruce Hunter \<bruce@sgml.dircon.co.uk>
Organization: SGML Systems Engineering
Message-ID: <34f23l$17u@felix.dircon.co.uk>
References: \<dshema-0109941701410001@kaleetan.rt.cs.boeing.com> <1994Sep2.110522.414@ittpub>
Subject: Re: Underscores and the Sgmls validating parser?

[William D. Lindsey]

|   No.  The only way I could get underscores to be accepted by sgmls was
|   by hacking (gently) the source.
|   
|   Around line 957 of sgmldecl.c (sgmls-1.1) change 
|   from:
|                  else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
|                           && c != '.' && c != '-') {
|   to:
|                 else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
|                         && c != '.' && c != '-' && c != '_' ) {

A much simpler (and recommended) method is just to provide SGMLS with an
SGML Declaration in which the underscore character is made a valid NAME
character.

in the NAMING section just add the underscore character to the UCNMCHAR and
LCNMCHAR declarations, as in

NAMING
	UCNMCHAR	"-._"
	LCNMCHAR	"-._"

regards,
-- 
	Bruce Hunter
	SGML Systems Engineering
	bruce@sgml.dircon.co.uk
</message>
<message id="<1994Sep5.152354.5468@exoterica.com>" date="2987767434">
Newsgroups: comp.text.sgml
Date: 05 Sep 1994 15:23:54 UT
From: "Eric R. Skinner" \<ers@exoterica.com>
Organization: Exoterica Corporation
Message-ID: <1994Sep5.152354.5468@exoterica.com>
References: \<north.194.00119726@knoware.nl>
Subject: Re: SGML Declarations, why ?

In article \<north.194.00119726@knoware.nl> north@knoware.nl (Simon North)
asks a few questions about the SGML Declaration and mentions the lack of
detailed explanations of this area of SGML.

To risk boring the lot of you who have seen this mentioned before,
Exoterica has a 40-page explanation entitled "Understanding the SGML
Declaration" which goes through the whole thing in detail.

It's free - just write to info@exoterica.com and request a copy.  Provide
your air mail address as we will mail you the document.

Regards,
-- 
Eric R. Skinner                          ers@exoterica.com
Exoterica Corporation                  Tel +1 613 722 1700
Ottawa, Canada                         Fax +1 613 722 5706
Product information:                    info@exoterica.com
</message>
<message id="<19940905.4851@naggum.no>" date="2987782176">
Newsgroups: comp.text.sgml
Date: 05 Sep 1994 19:29:36 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940905.4851@naggum.no>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com>
Subject: HyTime critique (was CONCUR usefulness existence proof)

this was becoming too long, so I cut the places where I agree with Eliot.
that's not to imply that if I don't comment, I agree, but that should be
obvious if you read further.

[W. Eliot Kimber]

|   I also do not consider marked sections to be generally useful or
|   robust, both because there is no dynamism of marked sections controls
|   and because marked sections are not bound to element boundaries, which
|   can make them difficult to manage and work with.

I don't understand what "dynamism of marked section controls" refers to, so
I can't address that.

however, if I replace "marked sections" with "entities" in the rest of this
sentence, I end up with a strong argument against entities!  maybe this is
unfair, since we all agree that entities are very useful, but it shows that
arguments that are used negatively about one thing and positively about
another aren't very good arguments and probably hide the _real_ arguments.
that marked sections are constrained to entity boundaries in my opinion
should make them far easier to deal with than entities that aren't bound to
element boundaries, and thus not so difficult to manage or work with.  they
are also not declared in a distant prolog, which should make them easier to
deal with.

I'd like to know that "dynamism of marked section controls" means, because
the other argument is now dismissed.

|   I was referring to the robustness of the functionality HyTime represents.  

you said: "My take on CONCUR is that what CONCUR can do weakly HyTime
allows you to do robustly."  since there isn't much description of "the
functionality [that] HyTime represents" in the standard, I interpreted this
to refer to the syntax of HyTime, just as CONCUR refers to the syntax of
_its_ mechanism.

CONCUR is relatively solid in the way it combines two documents into one.
the syntax is clear, and all the information required to split them apart
again is localized to the seems between the two documents.  with some
fairly straight-forward constraints, it is possible to change either
document without ripping the seems.  this is not dependent on any concept
of location or ID's or anything.  it just happens to be a fairly easily
identified aspect of character strings.

now, can HyTime do _this_ more robustly than CONCUR does?  from my limited
understanding of HyTime, I can see several contradictions to this claim.

depending on what you consider CONCUR to do, I'm sure it's possible to
reduce CONCUR to a basically useless mechanism and then show HyTime to be
superior.  however, we should compare by means of the highest possible
level of expressibility for both solutions.

now, mind you, I don't think CONCUR should be in the standard any more than
DATATAG and RANK should be (all can be outboarded without much trouble), so
I'm a priori interested in anything that can replace it, but that doesn't
mean I buy any old argument about which is weaker and stronger.  I want
truth in advertising, but I find hypermedia to be more hype than media, and
dangerously low on testable facts and hypotheses; HyTime fits this bill.

|   These functions do not necessarily lend themselves to the sort of
|   comfortably computable processing that computer scientists seem to
|   prefer.

I'll gladly take your word for it, but I'm not sure you understand just how
damaging to your own cause this statement is.

|   HyTime reflects both the reality of data and the need for generality.

the "reality of the data" is that if you can't engage them in "comfortably
computable processing", they are and will remain just so many bits.

|   This is a silly argument and I'm surprised you are making it, as it
|   completely misses the point of HyTime and the problems it tries to
|   solve.

sorry to say so, Eliot, but despite valiant efforts both on your part and
by several other people, the "point" of HyTime is still a widely dispersed
mist consisting mainly of hype and vaporware.  I do know which problems
HyTime tries to solve.  you should have discovered by now that I don't
think HyTime succeeds in solving those problems in a way which will make
computer scientists (or, heck, Microsoft-influenced coders) adopt it.  that
is, the solution is inelegant, clumsy and invites serious criticism that
you and others brush off with statements such as that quoted above.

|   HyTime doesn't do validation of document type declarations *because
|   it's incomputable in some cases*.

although I don't think you know the force of this statement, either, I'll
accept it at face value and quote you on it.  I think, however, that you
should be a little more careful with what you say.  there _are_ computer
scientists reading this, and they _may_ be wondering whether to adopt
HyTime or leave it in paper.

|   Also, it is essential that HyTime allow document types that allow
|   invalid constructs -- that's the only way HyTime can hope to peacefully
|   integrate with existing applications that predate HyTime or that have
|   other requirements that do not always allow HyTime conformance.

that's the "only way"?  geez.  as far as I could understand, the whole idea
with HyTime was that it should not require modification of the documents
into which you point.  Integrated Open Hypermedia, right?  this I always
interpreted to mean that you would have to build in some HyTime support in
your applications such that they would not _need_ to mess with extant
document types or instances, but could reference them from outside.  now
all of a sudden we _have_ to mess with extant document types?  what next?

I do understand that you need to emphasize that document types cannot
always be validated, but that's not my question.  my question is: if and
when I want to validate it, can I?  you don't answer this question.

|   But of course, there's no reason you can't define document types that
|   are *guaranteed* to produce structurally valid HyTime documents.  If
|   you want that level of comfort, define your document types that way.  I
|   certainly recommend it if at all possible, but there are many cases for
|   which it would be impossible or inappropriate (the TEI, Docbook, and
|   HTML come to mind immediately).

I was asking whether it was possible to know, a priori, whether a document
type conforms to HyTime, that is, whether a document type would always
produce conforming HyTime documents without having to parse all the
possible instances of that document type.  I have received no answer to
this question, just a bunch of slippery sales-talk that says nothing one
way or the other.  "there's no reason you can't" does not tell me how I
can, or even _whether_ I can.  and you turn the whole issue on its head
with your "if you want that level of comfort, define your document types
that way".  don't you _understand_ what I'm asking for?  the question is:
HOW DO I KNOW THAT WHAT I _THINK_ IS "GUARANTEED" TO PRODUCE STRUCTURALLY
VALID HYTIME DOCUMENTS ACTUALLY WILL?  "define your document types that
way" is just evading the question.  if I were only a wee bit nervous about
HyTime before this, I would close my eyes and hope it would pass away
afterwards.

|   Obviously this is a specious argument given the requirement stated
|   above.  Also, I'm not sure what you mean by "loose".  If by loose you
|   mean that you can leave the choice of what form an element conforms to
|   to authors, again, HyTime has to allow that level of flexibility, but
|   you don't need to put in your document types.

I have understood the concept and the execution of the architectural forms,
Eliot.  this is not HyTime 101, anymore.  I do not see any value of this
approach, however, because it dislocates the information that it tries to
convey, and so is inherently unstable.  to be precise, not any value above
competing proposals.  one of the "winning arguments" of HyTime was that
people should be able to look to this solution and adopt it rather than
reinvent the wheel.  now, HyTime only tells those who come to look that a
"wheel" is "sort of round-like" (only expressed in seriously convoluted
language), which could have been valuable if you didn't already _know_ that
wheels are circular.  HyTime is a catalog of hypermedia concepts and ideas,
with a hell of a lot of syntax that you end up not using.  some of those
ideas are brilliant, but the syntax only provides a major haystack in which
to look for them.

SGML is rigid, which is one of its strength (but also a liability when some
argue that loosening up the rigidity would break the whole thing), and it
provides very specific, known freedoms and disallows others.  what does
HyTime do?  there are so many options I get lost in them; I can't validate
a DTD against what I think could be expressed as element classes; there are
redundant ways to do some things, and only one way to do other things that
really need several ways because they differ in use or purpose.  what is
HyTime's model of object rendition and modification?  what is the semantics
associated with all these concepts that would form the foundation for its
application?  I can look at all this, and I can understand about 25% of it.
sifting through the uselessly intricate indirections of the syntax to see
where things might apply, I only wind up with more than a brainful of loose
threads.

I don't have a pressing need to waste months of my life to grok something
that I am steadily losing confidence in to boot.  the more I hear about how
great HyTime is from others, the more I get a sinking feeling that they're
trying to have _me_ confirm to _them_ how great it is.  sorry, I don't work
that way.  if it weren't for this, I maybe wouldn't think, or say I think,
that it's all a carefully plotted hoax.

|   The association could not be tighter than fixed attribute values naming
|   the element type forms.

oh, it could.

you have yourself stated the major argument for getting HyTime out as a
standard, and I can confirm from firsthand information that this is it: it
needed to be out to beat the other contenders.  you argue strongly that
it's here, that it uses SGML as it exists today, and that we must make do
with what we have.  now hear this: there is much evidence that the time
that was spent on HyTime was consciously and deliberately _not_ spent on
"reviewing" SGML.  ISO committees are free to propose amendments to their
standards at _any_ time, and getting some of the ideas in HyTime into SGML
would be a piece of cake.  there was, in other words, a conscious decision
not to revise SGML, but to go with the existing standard, despite mounting
evidence that SGML needs to be amended and extended.  now HyTime is used as
an excuse not to extend SGML with simpler, cleaner, and better ways to do
the things that HyTime does because HyTime already "does the job".

at this point, we're looking at a fact of history that says that HyTime is
here, and a better SGML is not.  swell!  it didn't have to be this way, and
there is no critical mass of HyTime documents out there, and none that uses
the standard, anyway, since the HyTime Catalog "version 2.0" fixes several
seriously broken things in it.  so we can fix it.  we can go back and say
that we made a mistake in pushing it so hard while it still couldn't stand
on its own, and that we're going back to the drawing board.  only the worst
of cowards are afraid to acknowledge failure when it stares them in the
eye, and no amount of political defeat can outdo a broken standard that
remains unfixed in terms of loss of credibility.

but to counter your actual argument: an element does not become an instance
of an element type form just because it says so.  a number of factors are
involved, such as congruency of attribute lists and content models.  I want
to know, and I want to know that I have made a valid HyTime-conforming DTD,
or I won't _use_ HyTime: the risk of running into a serious problem down
the road because of an easily fixable but undetectable flaw in the design
makes me all queasy and nervous.  and, as you have yourself so eloquently
demonstrated above, I _can't_ know.

to be clear: the only way I can know that an element type conforms to an
element type form is to do an exhaustive search on all the possible actual
contents according to the content model of the element type form as well as
that of the the element type.  if there is a "content set" in the latter
set that are not in the former, I have disproved the assertion.  that is,
in the dreaded computer scientist terminology, we're establishing a less-
than-or-equal relation for a set of regular expressions.  this is known to
be a hard problem.  not in the programming department, but in the "tell me
the result before the sun goes nova" department.

HyTime could have given us computer scientists significantly improved sleep
if a description of what it means to conform to an element type form had
been provided that would restrict this problem to bring it down into the
"comfortably computable" (real) world.  the fact that such restrictions are
_not_ included I attribute to ignorance of computability theory, indeed of
computer science, and this makes those arguments to the effect that it's
the computer scientists' fault that things are more complicated than they
can "comfortably" handle, more than just specious.  it makes those
arguments obtain a tinge of arrogance towards the people who are set to do
the "dirty work" of implementing their "clean design".  may I suggest that
they just refuse?

but it's not just the use of attributes that makes HyTime loose.  it's the
use of the one space of unique ID's as the _universal_ naming and reference
mechanism.  this comes from the fact that to set up all the thingies that
need ID's on them, you need a whole "prolog" just for that, sometimes in
one place, sometimes scattered all over the place.  some like to label them
"meta-objects" and parade them far and wide as the solution to everything.
suffice to say there are more schools of thought than this one.

[Erik Naggum]

|   further, there is one obvious drawback with using more than one
|   document to store a given piece of information: synchronization.  we
|   have discussed this in the context of static documents, where this
|   problem is not so pronounced as in the general case.  using a variety
|   of addressing mechanisms, HyTime is able to use indirection and
|   relative addressing in many very useful ways.  however, the association
|   is inherently loose.  (my proposal to use relative addresses is also
|   loose, but I never claimed it was robust.)

[W. Eliot Kimber]

|   I would like to better understand why you think HyTime location
|   addressing is not robust.  If robustness refers only to validation,
|   then these are two separate issues, and your earlier definition of
|   robustness does not apply to addressing in general.  HyTime location
|   addressing is as robust as the data you're addressing allows.  If
|   everything has an ID, addressing couldn't be more robust.  If, on the
|   other hand, you're trying to locate characters in the content of
|   unidentified elements, then things are a bit shakier.  But at least
|   HyTime gives you sufficient indirection so that you can choose the best
|   binding of locations to make your addresses as persitent as possible.
|   This is what I think of as robustness.

I think we're talking a bit past each other.  let me take another spin on
this one.  by asking for robustness I want to address the testability of
something under stress.  in this case, I want to know whether a particular
change will affect the links that depend on (reference) this document.  put
another way, I want to be able to compute a dependency graph all the way
down to the character if I have to, in order to show whether changing
_this_ will require an update _there_ or not, and usefully, _where_ it
would require an update.  now, since HyTime provides mechanisms to point
into documents that don't know that they are the target of links, this is a
hard nut to crack.  still, for a bounded object set (one of HyTime's better
ideas), it is computable, provided you can wait, or tolerate an answer like
"try, and see what happens".

the HTML linking mechanism is not conceptually much weaker than HyTime's
(although in practice it is much weaker), yet reference (link) maintenance
with only a few changing documents is becoming a significant problem.  the
structure built by HTML is extremely unstable.  the indirection in HyTime
does help, but the fundamental problem is that while HyTime includes the
right concept, activity tracking, there isn't even a hint at how this would
or should be implemented in practice.  it is a distinctly non-trivial
exercise.

my assessment of HyTime is not unlike my assessment of SGML: both rate A+
for concepts, but HyTime rates a D- for specification, whereas SGML rates a
C.  (while I'm rating the SGML-related standards, let me add that so far (I
got DSSSL at the WG8 meeting that ended last week), DSSSL looks like a
straight A for both concepts and specification.)

I could immediately see that SGML was a Good Thing, and assumed that it
shouldn't be all that much of a problem to implement it cleanly.  with a
lot of experience in disproving this assumption, I was a bit more cautious
about HyTime.  I have come to conclude that it would have been an excellent
idea to spend a few more years on HyTime before it was released to the
world in the guise of an International Standard.  in retrospect, it was a
very bad thing for me to (help) vote this thing through ISO without
understanding it.  I deeply regret that I vouched for it, but life doesn't
have an "undo" function.  the lesson to learn is to study those DIS'es
carefully and not to confuse technical merit with political pressure.  as
far as I have heard over the past couple years, more national bodies have
gotten the message.  this should provide the necessary impetus to stop the
heedless rushing I expect when SGML comes up for "review" in 1996.

hope you enjoyed your labor day.

#\<Erik>
</message>
<message id="<Cvp2p5.BAn@inter.NL.net>" date="2987820904">
Newsgroups: comp.text.sgml
Date: 06 Sep 1994 06:15:04 UT
From: "W.D. Lindsey" \<lindsey@inter.NL.net>
Organization: NLnet
Message-ID: \<Cvp2p5.BAn@inter.NL.net>
References: \<dshema-0109941701410001@kaleetan.rt.cs.boeing.com> <1994Sep2.110522.414@ittpub> <34f23l$17u@felix.dircon.co.uk>
Subject: Re: Underscores and the Sgmls validating parser?

[Bruce Hunter]

|   A much simpler (and recommended) method is just to provide SGMLS with
|   an SGML Declaration in which the underscore character is made a valid
|   NAME character.
|
|   in the NAMING section just add the underscore character to the UCNMCHAR
|   and LCNMCHAR declarations, as in
|
|   NAMING
|   	UCNMCHAR	"-._"
|   	LCNMCHAR	"-._"

Have you tried this with sgmls version 1.1?  I could not get it to work
until I made the patch.  That was the point.  The NAMING declaration won't
work without patching the source, the patch has no effect without the
NAMING declaration you suggest.

Cheers,

-Bill

Bill Lindsey            william@ittpub.nl
-- 
William D. Lindsey                                  Prinses Irenelaan 13
						    2341 TP Oegstgeest
  lindsey@inter.NL.NET                              the Netherlands
</message>
<message id="<CvpoI4.DIz@news.cis.umn.edu>" date="2987848829">
Newsgroups: comp.text.sgml
Date: 06 Sep 1994 14:00:29 UT
From: R A Milowski \<milor001@maroon.tc.umn.edu>
Organization: University of Minnesota
Message-ID: \<CvpoI4.DIz@news.cis.umn.edu>
References: <345987$5g9@finnegan.iol.ie>
Subject: Re: SGML to Postscript/PDF

[Sean Mc Grath]

|   I cannot help feeling that someone, somewhere has written a nice C
|   function library with function calls like :-
|
|   	newpage()			- Start new postscript page

In PostScript:

    showpage

Note: Do not do a "showpage" for the first page!

|   	setfont(x)			- Select font

In PostScript:

/name getfont size scalefont setfont

Note: I may be wrong about this part, I'd have to look it up.

|   	boldon()			- Turn on bold face

To change to bold in PostScript, you must change the font to a boldface
font.  Thus, this is just like the setfont() above.

|   	centretext("Hello World")	- Display text centered on the
|   					  current line

You'd have to write a nasty little PostScript procedure to do this but the
final syntax would be:

   ("Hello World") PCenterText

Where "PCenterText" is the PostScript procedure.

|   	Para ("I am a paragraph")	- Output text as a paragraph, 
|   					  wrapping text as required

Again, you could (or someone) could write a small little procedure to set
and wrap the string into a paragraph.

   ("I am a paragraph") PSetPara

|   I have that awful "I'm missing something here" feeling.  Can anyone put
|   me out of my mysery?

Not really, what you want is a simple solution. 

BTW, I have done this before.  Thus, it is possible.  Unfortunately, I
don't have my hands on either the PostScript or the SGML->PS interface.

-- 
R. Alexander Milowski
SGML Operations Manager        milor001@maroon.tc.umn.edu
Microcom Inc.                  +1 612 825 4132
SGML Consulting -- "The SGML Solutions Experts"
</message>
<message id="<34i0br$eng@Mercury.mcs.com>" date="2987852603">
Newsgroups: comp.text.sgml
Date: 06 Sep 1994 15:03:23 UT
From: "William G. Lederer" \<wgl@MCS.COM>
Organization: Another MCSNet Subscriber, Chicago's First Public-Access Internet!
Message-ID: <34i0br$eng@Mercury.mcs.com>
References: <9408311946.AA22157@helium.biomol.uci.edu>
Subject: Re: Search for SGML parser generator

[Louise Falevsky]

|   I am working on a project for Protein Science Journal to electronically
|   publish the journal on the WWW.  I will be using the ISO 12083 DTD as a
|   basis for the SGML document markup.  I will be using WAIS to index the
|   SGML marked-up documents.  I want to create a parser from the 12083 DTD
|   so that I can parse the SGML documents for WAIS indexing.  I have tried
|   to use SGML-ASP on another DTD and have had no luck in creating a
|   parser.  The generator creates a grammer, but the doc_parser and output
|   *.asp are not created.  I have now tried the HTML.dtd and other dtd's
|   and am still unsuccessful.  I have not been able to find authors Sylvia
|   von Egmond and Jos Warmer E-mail addresses for direct help.

I can locate his e-mail address for you if you really want it, but my
suggestion is to get a copy of sgmls, which is publically available,
including source code.  It is known to work on many platforms.  Check with
Archie for location of sgmls.  I would give you the exact location, but it
is in my other office.
</message>
<message id="<1994Sep6.160507.4201@ast.saic.com>" date="2987856307">
Newsgroups: comp.text.sgml
Date: 06 Sep 1994 16:05:07 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep6.160507.4201@ast.saic.com>
References: <9409012114.AA19892@source.asset.com>
Subject: Re: SGML New User Requesting General Informatio

[Claude L. Bullard]

|   After my posting to "newbies" on the basics of SGML, I received a
|   number of kind and complementary postings suggesting that I'd written
|   something of benefit to the community.  Thank y'all very much.  Nothing
|   works better on me than applause.
|   
|   Ed Brachman at Interleaf made some very good suggestions about
|   improving the post.
|   
|   |   (1) Where you compare a DTD to an "ADT", I'm not sure you're doing
|   |   the newbie any favors.  I assume that "ADT" stands for "abstract
|   |   data type" -- but I'm not sure that calling a DTD an ADT helps even
|   |   for those of us who can make that assumption, and I'll bet that it
|   |   goes over the heads of at least some of the newbies you otherwise
|   |   do a good job of talking to.
|   
|   Quite right, Ed.  But every newbie is not new to every thing.  Some are
|   well-educated comp-sci types who just want to know what SGML IS-OR-Ain't.
|   A few well-placed "buzzWords" gives them the clues they are looking
|   for.  Unless they are navigating the nets with Mosaic, they are
|   hopefully not true innocents.  Even in that case, it behooves an
|   inquiring mind to get dictionaries for subjects they want to study.  In
|   this and future posts on the subject, I am consciously trying to
|   propagate a meme:
|   
|   MyMEME: SGML can be used for more than books.  To do it, one must get
|   Beyond The Book Metaphor... or finally realize that a lot of things we
|   think of as software are really automated books without binders.
|   
|   SERMON FOLLOWS (caveat emptor):

I have to agree with Len here.  I have found that for folks with a computer
science background, all I have to do is say something like " A DTD is sort
of a Meta-Grammer or EBNF specification which determines what all the
allowable sentences are in the language of a particular document type" and
they get it instantly.  However to get the same concept over to those who
never took a formal language theory course, I have to say something like "A
tagged SGML document consists mostly of tagged items.  The DTD specifies
what tags can appear in a particular document type and in what context they
can appear, e.g., You can have a subparagraph tag inside a paragraph tag,
but not the other way around."  and most of them usually get it.  The rest
need to be walked through numerous examples and then most get it.  Some
will never get it ;-).

In summary, I think it's best to include all three approaches in order of
increasing complexity with perhaps a caveat as to whom the paragraph is
intended for.  Too bad we don't have Hyperlinks here; then we could use a
"SkillTrack" attribute :-).

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep6.161652.5304@ast.saic.com>" date="2987857012">
Newsgroups: comp.text.sgml
Date: 06 Sep 1994 16:16:52 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep6.161652.5304@ast.saic.com>
References: <34f23l$17u@felix.dircon.co.uk>
Subject: Re: Underscores and the Sgmls validating pa

[Bruce Hunter]

|   A much simpler (and recommended) method is just to provide SGMLS with
|   an SGML Declaration in which the underscore character is made a valid
|   NAME character.
|   
|   in the NAMING section just add the underscore character to the UCNMCHAR
|   and LCNMCHAR declarations, as in
|   
|   NAMING
|   	UCNMCHAR	"-._"
|   	LCNMCHAR	"-._"

Well that's just the problem.  When you try this SGMLS complains about
"Unsupported Feature" and about character 95.  I assume the original poster
did just what you suggested.  I know I did and it failed.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<Cvpyzw.Mw3@news.cis.umn.edu>" date="2987862429">
Newsgroups: comp.text.sgml
Date: 06 Sep 1994 17:47:09 UT
From: R A Milowski \<milor001@maroon.tc.umn.edu>
Organization: University of Minnesota
Message-ID: \<Cvpyzw.Mw3@news.cis.umn.edu>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com> <19940905.4851@naggum.no>
Subject: Re: HyTime critique (was CONCUR usefulness existence proof)

Not to stray too much, but, could either Erik or Eliot (both preferrably)
clarify an issue for me on HyTime:

Might one call HyTime an "introspective" standard in the sense that it
operates assuming that the SGML document is a resource to be queried?
Whereas, DSSSL operates in a outward fashion using the SGML document as a
starting point.

If so, doesn't an introspective standard have a weak link to its resources
since it needs to "query" or "link" into the document to perform its tasks?

Thus, SGML provides a rigid framework and HyTime "softens" that rigidity.

I don't claim to be an expert on HyTime, so please *no flaming*!

-- 
R. Alexander Milowski
SGML Operations Manager        milor001@maroon.tc.umn.edu
Microcom Inc.                  +1 612 825 4132
SGML Consulting -- "The SGML Solutions Experts"
</message>
<message id="<34j3n6$it4$1@heifetz.msen.com>" date="2987888806">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 01:06:46 UT
From: Edward Vielmetti \<emv@garnet.msen.com>
Organization: Msen, Inc. -- Ann Arbor, MI (account info: +1 313 998-4562)
Message-ID: <34j3n6$it4$1@heifetz.msen.com>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com> <19940905.4851@naggum.no>
Subject: Re: HyTime critique (was CONCUR usefulness existence proof)

[W. Eliot Kimber]

|   These functions do not necessarily lend themselves to the sort of
|   comfortably computable processing that computer scientists seem to
|   prefer.

[Erik Naggum]

|   I'll gladly take your word for it, but I'm not sure you understand just
|   how damaging to your own cause this statement is.

If you can't compute things, then code can't be written, then work can't be
done, and people move on to something else they can implement.  From this I
conclude that HyTime can be safely ignored since it's unlikely that anyone
will be able to specify that an implementation meets any testing standards.

|   the HTML linking mechanism is not conceptually much weaker than
|   HyTime's (although in practice it is much weaker), yet reference (link)
|   maintenance with only a few changing documents is becoming a
|   significant problem.  the structure built by HTML is extremely
|   unstable.  the indirection in HyTime does help, but the fundamental
|   problem is that while HyTime includes the right concept, activity
|   tracking, there isn't even a hint at how this would or should be
|   implemented in practice.  it is a distinctly non-trivial exercise.

Right.  For that matter, the Internet is "extremely unstable", in the sense
that failure happens all over the place and sometimes gets in the way of
things working.  It is possible to make reference (link) maintenance work
better, if you have sufficiently motivated people doing the publishing; it
is also possible to have reference (link) creation work better if you have
sufficiently clued in people writing authoring tools.

I am intrigued by the DSSSL comments, Erik, if you can give a good overview
for those of us who can only stomach standards discussions once every few
months it would be welcome.

-- 
  Edward Vielmetti, vice president for research, Msen Inc. emv@Msen.com
Msen Inc., 320 Miller, Ann Arbor MI  48103 +1 313 998 4562 (fax: 998 4563)
</message>
<message id="<34j9k3$a55@deep.rsoft.bc.ca>" date="2987894851">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 02:47:31 UT
From: Tim Bray \<a07893@giant.rsoft.bc.ca>
Organization: MIND LINK! Communications Corp., Langley, BC, Canada
Message-ID: <34j9k3$a55@deep.rsoft.bc.ca>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com> <19940905.4851@naggum.no>
Subject: Re: HyTime critique (was CONCUR usefulness existence proof)

This is an important discussion, since there are a lot of people out there
worrying about whether they have to worry about Hytime.

Summary: HyTime has problems, but we should use it anyhow.

I think the specification of HyTime is ungodly bad and it worries me
profoundly that people like Eliot, who've been sort of living this stuff
for a long time now, admit to not understanding significant parts of it.
And I basically just don't buy the sweeping evangelism coming from Eliot,
Dr. G, and others, about how HyTime is the answer to everything.

BUT!  Everyone I know who's doing anything serious with SGML is also doing
one or both of hypertext and multimedia.  Right now, they're building their
own hypertext/multimedia machinery at the application level.  HyTime
apparently provides a flexible, portable way to encode all of the linkage &
traversal facilities you need to do these things.  And, like SGML, while
the standard is impenetrable, HyTime markup in text is actually pretty easy
to read and figure out what it's doing.

Given the above, from now on, when I have a customer who's going to be
doing any of this stuff, I'm going to advise them to encode it in HyTime.
There may well be solid commercial products on the market that portably and
robustly do the right thing, and if not, you were going to have to build it
yourself anyhow.  So the decision seems like a no-brainer to me.

Cheers, Tim Bray, Open Text Corporation
</message>
<message id="<34ja24$ac1@deep.rsoft.bc.ca>" date="2987895300">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 02:55:00 UT
From: Tim Bray \<a07893@giant.rsoft.bc.ca>
Organization: MIND LINK! Communications Corp., Langley, BC, Canada
Message-ID: <34ja24$ac1@deep.rsoft.bc.ca>
References: <345987$5g9@finnegan.iol.ie> \<CvpoI4.DIz@news.cis.umn.edu>
Subject: Re: SGML to Postscript/PDF

I once typeset a book directly from SGML into PS, even did the paragraph
filling in PS code right in the interpreter, the book came out looking
great.  (1-page modules, so no page breaking logic required).

Conclusions: 1. It can be done.  2. It's a lousy idea; for the same reason
that writing an SGML parser in assembly language would be wrong.

Cheers, Tim Bray, Open Text Corporation
</message>
<message id="<34jf0n$a1q@Russell.Stanford.EDU>" date="2987900375">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 04:19:35 UT
From: Syun Tutiya \<tutiya@Russell.Stanford.EDU>
Organization: Stanford University
Message-ID: <34jf0n$a1q@Russell.Stanford.EDU>
References: \<vanyel.778192354@camelot>
Subject: Re: SGML in Asian languages?

[Alan Williams]

|   In a recent discussion, some of the people with whom I work have
|   brought up the question of how to do SGML tagging (specifically
|   sentential tagging) in documents written in ideographic languages, like
|   Chinese.  Does anyone out there have any insight/experience on this
|   particular matter?

Although I do not understand what Alan means by sentential tagging, we do
not have any particular problem with the partially ideographic languages
like Japanese.  Characters are no problem as long as your parser
understands SGML Declaration and SGMLS parses documents quite nicely.
ISO 8879 is now translated into Japanese as JIS X 4151.

Syun Tutiya
Chiba University
</message>
<message id="<34kmac$9e2@ruby.ora.com>" date="2987940620">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 15:30:20 UT
From: Terry Allen \<terry@ora.com>
Organization: O'Reilly & Associates, Inc.
Message-ID: <34kmac$9e2@ruby.ora.com>
Subject: US standards publishers?

I'm looking for US providers of ISO specs (I'm interested in the
forthcoming DSSSL spec).  The only source I've used was OMNICOM, and I'm
looking for alternatives.  If you know a good one, please send me email.
Thanks.

-- 
Terry Allen
terry@ora.com
</message>
<message id="<34kmfj$1217@rs18.hrz.th-darmstadt.de>" date="2987940787">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 15:33:07 UT
From: Joachim Schrod \<schrod@iti.informatik.th-darmstadt.de>
Organization: TH Darmstadt, FG Systemprogrammierung
Message-ID: <34kmfj$1217@rs18.hrz.th-darmstadt.de>
References: <9409012114.AA19892@source.asset.com> <1994Sep6.160507.4201@ast.saic.com>
Subject: Re: SGML New User Requesting General Informatio

[Claude L. Bullard]

|   After my posting to "newbies" on the basics of SGML, I received a
|   number of kind and complementary postings suggesting that I'd written
|   something of benefit to the community.  Thank y'all very much.  Nothing
|   works better on me than applause.

I'd like to add more applause -- and some critical comments, below... :-)

A few well-placed "buzzWords" gives them the clues they are looking for.

[Bob Agnew]

|   I have to agree with Len here.  I have found that for folks with a
|   computer science background, all I have to do is say something like " A
|   DTD is sort of a Meta-Grammer or EBNF specification which determines
|   what all the allowable sentences are in the language of a particular
|   document type."

Actually, I think this "buzzWord" is better than the ADT.  A DTD is not an
ADT, it does not define any semantics.  And defining abstract semantics is
all an ADT is about... (cf. Barbara H. Liskov and S. N. Zilles:
"Programming with Abstract Data Types", SIGPLAN Symposium on VHLL, 1974.
That's the seminal article that introduced the term ADT, btw.)

	Joachim

-- 
Joachim Schrod			Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany
</message>
<message id="<34knpj$1c7@deep.rsoft.bc.ca>" date="2987942131">
Newsgroups: comp.text.sgml,comp.text
Date: 07 Sep 1994 15:55:31 UT
From: Tim Bray \<a07893@giant.rsoft.bc.ca>
Organization: Open Text Corporation
Message-ID: <34knpj$1c7@deep.rsoft.bc.ca>
Subject: Electronic Document Viewers - Your Chance to Plug Yours

At SGML '94, I'm going to be doing a survey talk on document viewing
technology.  (There's a whole series of these survey talks on various
aspects of the techonology).  The idea is to introduce the design issues
and provide a framework for classifying the products that are out there.

The purpose of this note is to solicit input.  I know of the existence of
the following viewing systems, and have at least some information on hand
about them.  If your product isn't on the list, or it is but you think I
should get an up-to-date demo or technical literature, please send it to
me, either on email or Tim Bray, Open Text, 101-1965 W. 4th Ave.,
Vancouver, B.C. Canada V6J 1M8.

Since this is a technology survey, I don't mind talking about
projects-in-progress.

I AM PERFECTLY AWARE that the following list include some products with
little or no connection to SGML, and products that vary wildly in their
features and capabilities.

Vendor        Product        Comment
------        -------        -------
Adobe         Acrobat        Have heard SGML story, is there a pos'n paper?
Bellcore      Superbook
EBT           Dynatext
Folio         Views          I could use new literature
HaL           Olias
IBM           Bookmaster     Is that the actual product name?
Interleaf     WorldView      Any recent SGML news?
NCSA          Mosaic         There's a PC version called Chaos?
                             other WWW viewers?
NoHands       Common Ground
Northern Tel  Helmsman       Still being marketed?  No recent sightings
Open Text     Lector
SoftQuad      Explorer       I know, SQ didn't write it
WAIS          WAIS           ...and there are loads of Z39.50 clients
Westinghouse  Pathways       
WordPerfect   (lost name)    Need more info

Thanks in advance for any info.  Come to SGML '94!

Cheers, Tim Bray, Open Text Corporation
</message>
<message id="<1994Sep7.163644.9188@ast.saic.com>" date="2987944604">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 16:36:44 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep7.163644.9188@ast.saic.com>
References: <34kmfj$1217@rs18.hrz.th-darmstadt.de>
Subject: Re: SGML New User Requesting General Inform

[Joachim Schrod]

|   Actually, I think this "buzzWord" is better than the ADT.  A DTD is not
|   an ADT, it does not define any semantics.  And defining abstract
|   semantics is all an ADT is about... (cf. Barbara H. Liskov and S. N.
|   Zilles: "Programming with Abstract Data Types", SIGPLAN Symposium on
|   VHLL, 1974. That's the seminal article that introduced the term ADT,
|   btw.)

OK -- Back to defending Len.  The paradigm that I used here, that of a
meta-grammar for a regular expression syntax, is only one of many possible.
It is probably the most fundamental in that it borrows directly from the
standard and its formulation and is most useful for explaining the
mechanics of DTDs and how they relate to document instances.  However, an
SGML document instance can at once be all of the following things:

1) A stream of characters.
2) A string.
3) An entity.
4) An instance of a Concrete Data Type.
5) A specification for an Abstract Data Type.
6) A tree structure.
7) Input data to an application. 
8) A content labeled document.
9) An ostensibly innocuous document with secret information embedded in unknown ways.
10) The secret to life, the universe, and all that.
11) .... your entries here .........

Consider the following discussion:

I use the word CDT or concrete data type in that purely abstract classes
are not supported directly by SGML itself; however most people use the term
ADT when they really mean a CDT.  The confusion derives from how the term
"abstract" is used since a "type abstraction" can be "concrete".  A DTD
might be regarded as the specification part of a concrete class; it does
not include the methods.  They belong to the semantics of the application
and usage.  One certainly could write a DTD which describes a meta grammar
for defining pure virtual C++ classes; however, a document of this type is
still a string of characters.  What the DTD does tell us however, is what
tags can occur in a document and in what context they can occur.  In short
it specifies how the semantic elements of an application may be embedded in
a stream of characters and what characters may be used, etc.

Clearly, it can be useful to have more than one paradigm for a document and
a document type.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<cjh-0709941150520001@perfmac.cray.com>" date="2987944896">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 16:41:36 UT
From: Chris Hector \<cjh@cray.com>
Organization: Cray Research Inc.
Message-ID: \<cjh-0709941150520001@perfmac.cray.com>
Subject: What DTD to use for Man-Pages?

I want to put some man pages into SGML and I am looking for a DTD.  I would
like something tailored toward traditional UNIX man pages rather than an
all-encompassing DTD.

Does anyone have a favorite - or a place that I can look for DTD's?

Thanks in advance for your help.

Chris

-- 
Chris Hector                    | cjh@cray.com
Cray Research, Inc              | Opinions expressed are my own 
655-F Lone Oak Dr.              | and not necessarily those of my employer.
Eagan, MN 55121
</message>
<message id="<kimber.82.000DE9CD@passage.com>" date="2987952882">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 18:54:42 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.82.000DE9CD@passage.com>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com> <19940905.4851@naggum.no>
Subject: Re: HyTime critique

The last part of the subject post is to the effect "I find HyTime to be
inelegant and insufficiently precise" or, in the words of the horse in Ren
and Stimpy cartoons: "No sir, I don't like it."

To be honest, I'm quite surprised by this response and somewhat puzzled by
it.  I'm not going to try to argue about it because Erik has clearly made
up his mind and I doubt there's anything I could say that would change it.
However, I think Erik's complaint largely misses the key point of HyTime,
which is that it attempts to standardize what people were either doing or
were going to do as they moved their SGML applications to include more
hypermedia aspects.  Even if HyTime is only a stopgap waiting for a more
elegant solution to these problems, I think it is still necessary.  Of the
various solutions to these problems defined within the context of existing
SGML functionality, HyTime appears to be the most complete.  That, coupled
with its status as an international standard was enough for me to want to
invest time in understanding it.  Having gained a fairly deep understanding
of HyTime, I am now satisfied that it solves the problems I and my
customers are faced with in a way that is practical to implement and of
lasting benefit.  Of course, I come from the point of view of a data owner
whose primary concern is protecting his investment in his data and
protecting it from those who would attempt to usurp control to make their
own lives easier.  That means some of the concerns Erik puts uppermost are
of less importance to me.

[W. Eliot Kimber]

|   I also do not consider marked sections to be generally useful or
|   robust, both because there is no dynamism of marked sections controls
|   and because marked sections are not bound to element boundaries, which
|   can make them difficult to manage and work with.

[Erik Naggum]

|   I don't understand what "dynamism of marked section controls" refers
|   to, so I can't address that.

Marked sections as means of doing configuration management lack the sort
dynamic control needed to do sophisticated version control.  Specifically,
there is no way to control the inclusion or exclusion of marked sections
based on the local properties of the data (e.g., attribute values of the
elements themselves).  It is also difficult to express conditions based on
multiple variables and boolean combinations.  In designing the InfoMaster
architecture, we decided that marked sections were insufficient for
providing the level of configuration management needed for IBM documents
(and documents of similar complexity).  We chose instead to define an
application-specific mechanism that uses the properties of elements to
determine what data to retrieve.  This solution required only that our
document types provide sufficient flexibility in grouping data and elements
so that authors had a reasonable level of control.  By codifying this
mechanism in the InfoMaster Architecture we hoped to promote it as a
general solution shared by a number of applications (and as it happens, the
retrieval semantic can be expressed as a set of HyQ queries used to build
an "inclusion list" of objects to be processed, so that our
application-specific mechanism can be expressed in HyTime terms, making the
specification of the mechanism interchangeable and directly implementable
or processable by HyTime-based processors with the necessary
functionality).

|   however, if I replace "marked sections" with "entities" in the rest of
|   this sentence, I end up with a strong argument against entities!

I suppose.  The fact that marked sections are not necessarily bound to
element boundaries is not the real problem with marked sections, although
it can lead to complications, which are essentially the same complications
you can have with entities that don't align to element boundaries, so
perhaps it is not really a valid objection to marked sections in general.

|   maybe this is unfair, since we all agree that entities are very useful,
|   but it shows that arguments that are used negatively about one thing
|   and positively about another aren't very good arguments and probably
|   hide the _real_ arguments.  that marked sections are constrained to
|   entity boundaries in my opinion should make them far easier to deal
|   with than entities that aren't bound to element boundaries, and thus
|   not so difficult to manage or work with.  they are also not declared in
|   a distant prolog, which should make them easier to deal with.

I think there must be a typo here ("are constrained" should be "are not
constrained").  The main difficulty with marked sections presented by their
not being bound to element boundaries is ensuring that when you move data
relative to marked section boundaries you have moved it appropriately.  You
would have the same difficulty with entities in an editor that presented a
view of an entire document, including any general text entities, as a
single flow.

[W. Eliot Kimber]

|   I was referring to the robustness of the functionality HyTime
|   represents.

[Erik Naggum]

|   you said: "My take on CONCUR is that what CONCUR can do weakly HyTime
|   allows you to do robustly."  since there isn't much description of "the
|   functionality [that] HyTime represents" in the standard, I interpreted
|   this to refer to the syntax of HyTime, just as CONCUR refers to the
|   syntax of _its_ mechanism.

I was probably not clear about the use of CONCUR to which I was referring.
The TEI has suggested that CONCUR can be used to represent the combination
of a base text plus changes to it, commentary, and other layered
information that scholars often work with when analyzing texts and other
artifacts.  While it is probably the case that CONCUR can in fact do this,
it has been my contention that the relationships at work can be better
represented through the use of HyTime hyperlinks.  Here's why I think so:

1. Hyperlinks in HyTime must represent defined relationships, allowing you
   to be arbitrarily precise about what a given relationship means.  CONCUR
   does not provide this.

2. As the number of layers increase for a given document, the markup weight
   will become quite large, forcing the processing system to deal with all
   layers (even if that only means reading the character stream
   representing the document). However, with a hyperlink-based approach,
   each layer can be independent of the other layers as well as the base
   document, letting authors or applications choose only those layers they
   want or need to process. This provides a great deal of flexibility in how
   data is organized and managed.

3. Because the layers can be independent of the base document, they can be
   processed and interchanged independent of the document itself.

4. The location methods of HyTime allow much greater choice of association
   than CONCUR can because you are not constrained to what you can identify
   with elements in a given document type. Aggregate anchors, span
   locations, and locations in multiple documents all become possible.

In short, by using a hyperlink-based approach, your freedom, flexibility,
and potential for interchange and re-use is vastly improved.

It may also be that I have an instinctive distrust of any data
representation method that appears to bind too much information directly in
the base information.

|   sorry to say so, Eliot, but despite valiant efforts both on your part
|   and by several other people, the "point" of HyTime is still a widely
|   dispersed mist consisting mainly of hype and vaporware.  I do know
|   which problems HyTime tries to solve.  you should have discovered by
|   now that I don't think HyTime succeeds in solving those problems in a
|   way which will make computer scientists (or, heck, Microsoft-influenced
|   coders) adopt it.

But the question is not whether computer scientists will adopt HyTime, but
whether or not information owners will adopt it.  Remember that SGML and
HyTime, and by extension computers and software vendors, exist to serve the
needs of information owners.  I think the existence proofs of tools like
SoftQuad Explorer, MarkMinder/HyMinder, and my own Perl hacks should be
sufficient to demonstrate that HyTime is more than hype and vaporware.
HyTime has already solved for me problems that make its use compelling, all
other concerns notwithstanding.

Do computer scientists in general accept SGML?  Does it matter?  I don't
think so.  APL2 is one of the most elegant programming systems in
existence.  Computer scientists love it.  So what?  I guess that having
spent my entire professional life in industry solving immediate problems
with the tools at hand has taught me that all the computer science I need I
learned in data structures 101.  This is a bad attitude and I try to
moderate it.  Nevertheless, focus on the computer science aspects has, for
me, been much more of a barrier to solving problems than an aid.  Thus my
bad attitude and my "what me worry" approach. The question I typically ask
is "can I think of way to solve this problem?"  If the answer is yes,
that's all I need.  If the answer is no, then I ask "Is this problem
provably unsolvable?"  If it is, I don't try to solve it.  If it isn't, I
try harder to think of a solution.

Part of the problem for me may be that many of the problems posed by text
processing and hypermedia are relatively mundane and easy to solve using
basic data processing techniques.  Optimal solutions may need more
sophistication (for example, optimizing the parsing and indexing of SGML
data for later retrieval), but that optimization occurs within the context
of the larger problem space for which the parameters have already been
defined, like the use of SGML and HyTime.  Given the problem "how do I
index SGML data so that I can resolve HyTime locations quickly" I will
certainly look to computer scientists to find a solution, but the right
answer cannot be "don't use HyTime".  It's not a question of can or can't
but a question of optimization.  I guess I know that once I've demonstrated
*a* solution to a problem, I know I can interest people in finding an
*optimized* solution to a problem, assuming that the solution to the
problem provides some compelling benefit (and of course, I only get paid to
solve problems of compelling benefit, so that's all I try to solve).  In
many cases, the benefit is so compelling that any reasonable solution is
worth the effort.  And you can be reasonably assured that once a solution
is in place, you can justify developing or acquiring an optimal solution.

HyTime reflects the reality of the data we have and the things we want to
do to it -- it doesn't impose any processing you wouldn't have one way or
another.  Therefore, I know that solutions to problems exposed by the use
of HyTime have the same degree of solvability and optimization potential
whether HyTime were used or not.  In other words, if the HyTime solution
can't be implemented or made to perform, no other equivalent solution will
either.  I also know that HyTime is flexible enough that you can impose the
constraints necessary to make a system practical even if the general
solution possible with HyTime is not practical today.

[W. Eliot Kimber]

|   HyTime doesn't do validation of document type declarations *because
|   it's incomputable in some cases*.

[Erik Naggum]

|   although I don't think you know the force of this statement, either,
|   I'll accept it at face value and quote you on it.  I think, however,
|   that you should be a little more careful with what you say.  there
|   _are_ computer scientists reading this, and they _may_ be wondering
|   whether to adopt HyTime or leave it in paper.

I think I was wrong on this point.  I think you *can* tell computationally
whether or not a given DTD allows *only* valid HyTime documents.  See the
discussion below.

[W. Eliot Kimber]

|   Also, it is essential that HyTime allow document types that allow
|   invalid constructs -- that's the only way HyTime can hope to peacefully
|   integrate with existing applications that predate HyTime or that have
|   other requirements that do not always allow HyTime conformance.

[Erik Naggum]

|   that's the "only way"?  geez.  as far as I could understand, the whole
|   idea with HyTime was that it should not require modification of the
|   documents into which you point.  Integrated Open Hypermedia, right?
|   this I always interpreted to mean that you would have to build in some
|   HyTime support in your applications such that they would not _need_ to
|   mess with extant document types or instances, but could reference them
|   from outside.  now all of a sudden we _have_ to mess with extant
|   document types?  what next?

I'm not sure I follow your argument at all.  The point is merely that if
you want to add HyTime support to an existing document type without at the
same time making all of your existing documents invalid, you have to allow
by HyTime and non-HyTime constructs in the same document type.  How could
it be otherwise?  It doesn't have anything to do with documents to which
you point, it only has to do with documents you are putting the pointers
in.

|   I do understand that you need to emphasize that document types cannot
|   always be validated, but that's not my question.  my question is: if
|   and when I want to validate it, can I?  you don't answer this question.

Without working out the algorithm, I would say that the answer is yes.  I
can definitely tell whether a not a document type *only* allows the
creation of valid types.  This is a simple transform by which you change
the GIs in the DTD into the architectural form names of those elements and
then compare the resulting content models to the content models of the
architectural forms as defined in the standard.

There may, however, be some complicating factor I haven't thought of.

[W. Eliot Kimber]

|   But of course, there's no reason you can't define document types that
|   are *guaranteed* to produce structurally valid HyTime documents.  If
|   you want that level of comfort, define your document types that way.  I
|   certainly recommend it if at all possible, but there are many cases for
|   which it would be impossible or inappropriate (the TEI, Docbook, and
|   HTML come to mind immediately).

[Erik Naggum]

|   I was asking whether it was possible to know, a priori, whether a
|   document type conforms to HyTime, that is, whether a document type
|   would always produce conforming HyTime documents without having to
|   parse all the possible instances of that document type.

I think the answer is definitely yes, as explained above.  On the other
hand, the HyTime-defined content models are simple enough that you can
validate a DTD by inspection for the most part.  It is only the subelements
of architectural forms other than HyBrid and HyDoc that pose any real
constraints, and those content models are all fairly simple and easy to
check.  Ditto for attribute lists.  The only danger I can see is the use of
inclusions, by which you might inadvertently allow invalid subelements
within HyTime-unique elements (e.g., allowing HyBrid-form elements within
nameloc-form elements or something).

The other aspects of validation that HyTime documents are subject to are
not determined by the DTD, but are a function of how various location
methods are used in combination or the association between anchor roles and
link ends, things that can only be validated in instances.

|   but to counter your actual argument: an element does not become an
|   instance of an element type form just because it says so.  a number of
|   factors are involved, such as congruency of attribute lists and content
|   models.

I think it's clear now that that can be checked easily enough.  In fact,
the more I think about it, the more I think that my original assumption,
that the reason DTDs are not validated is because they *can't* be, is in
fact wrong.  The reason DTDs are not validated is because you don't always
want to have a DTD that only allows HyTime documents, not because you can't
tell if a DTD does only allow valid HyTime documents, because you can check
that.

|   but it's not just the use of attributes that makes HyTime loose.  it's
|   the use of the one space of unique ID's as the _universal_ naming and
|   reference mechanism.  this comes from the fact that to set up all the
|   thingies that need ID's on them, you need a whole "prolog" just for
|   that, sometimes in one place, sometimes scattered all over the place.
|   some like to label them "meta-objects" and parade them far and wide as
|   the solution to everything.  suffice to say there are more schools of
|   thought than this one.

I'm not sure what you're getting at here, either.  What would you use other
than unique IDs?  How are SGML unique identifiers any different from key
fields in relational databases or object IDs in object-oriented databases?
There has to be some sort of defined name space with guaranteed unique
names.  I'm not getting your point.  And I'm not sure what you mean by a
"prolog".

|   I think we're talking a bit past each other.  let me take another spin
|   on this one.  by asking for robustness I want to address the
|   testability of something under stress.  in this case, I want to know
|   whether a particular change will affect the links that depend on
|   (reference) this document.  put another way, I want to be able to
|   compute a dependency graph all the way down to the character if I have
|   to, in order to show whether changing _this_ will require an update
|   _there_ or not, and usefully, _where_ it would require an update.  now,
|   since HyTime provides mechanisms to point into documents that don't
|   know that they are the target of links, this is a hard nut to crack.
|   still, for a bounded object set (one of HyTime's better ideas), it is
|   computable, provided you can wait, or tolerate an answer like "try, and
|   see what happens".

I agree this is a difficult problem, but what's your point?  It's
difficulty is inherent in the problem itself, not in the use of HyTime to
represent documents that expose it.  Dependency tracking of references is
solved by resolving a set of addresses and seeing what you hit.  In the
abstract the problem is not difficult to solve at all.  The problem is
optimizing the system so that it performs acceptably and being able to
detect that for some forms of address, the answer cannot be found in a
reasonable period of time.  But I think these problems are independent of
the use of HyTime and would be present in any system of equivalent
flexibility and utility.  Certainly existing indexing and retrieval systems
suggest that practical solutions can be found and implemented.  At the same
time, users have to understand the implications of asking certain
questions.  This is no different for HyTime-based hypermedia than it is for
SQL databases.  This is the price of function and power.

-- 
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<34l495$8qi@Starbase.NeoSoft.COM>" date="2987954917">
Newsgroups: soc.culture.scientists,comp.text.sgml,comp.infosystems.www.misc,soc.libraries.talk
Followup-To: comp.text.sgml,comp.infosystems.www.misc
Date: 07 Sep 1994 19:28:37 UT
From: Cameron Laird \<claird@Starbase.NeoSoft.COM>
Organization: NeoSoft Internet Services +1 713 684 5969
Message-ID: <34l495$8qi@Starbase.NeoSoft.COM>
Subject: HTML style: bibliographic elements

I'm looking for guidance on good HTML style in bibliographic citation.  For
now, when I'm writing a WWWable text, and I want to refer to a book, I
italicize

	\<I>Book Title\</I>

the title; I simply plaintext-quote article titles:

	"Explanation of things", \<I>Journal
	of everything\</I>, volume ...

Is there yet a sense of "common practice" in this domain?  What is it?

I've narrowed follow-ups.

-- 
Cameron Laird		ftp://ftp.neosoft.com/pub/users/claird/home.html
claird@Neosoft.com (claird%Neosoft.com@uunet.uu.net)	+1 713 267 7966
claird@litwin.com (claird%litwin.com@uunet.uu.net)  	+1 713 996 8546
</message>
<message id="<Cvs23v.Ix7@HQ.Ileaf.COM>" date="2987960106">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 20:55:06 UT
From: Ed Blachman \<edb@hq.ileaf.com>
Organization: Interleaf, Inc.
Message-ID: \<Cvs23v.Ix7@HQ.Ileaf.COM>
References: <9409012114.AA19892@source.asset.com>
Subject: Re: SGML New User Requesting General Information

[Claude L. Bullard]

|   After my posting to "newbies" on the basics of SGML, I received a
|   number of kind and complementary postings suggesting that I'd written
|   something of benefit to the community.  Thank y'all very much.  Nothing
|   works better on me than applause.

As I've told Len in email, and just in case it's not clear, I mean to join
in the applause for your posting.  It's just that as an (amateur) editor, I
applaud by picking nits....

|   Ed B[l]achman at Interleaf made some very good suggestions about
|   improving the post.
|   
||  (1) Where you compare a DTD to an "ADT", I'm not sure you're doing the
||  newbie any favors.  I assume that "ADT" stands for "abstract data type"
||  -- but I'm not sure that calling a DTD an ADT helps even for those of
||  us who can make that assumption, and I'll bet that it goes over the
||  heads of at least some of the newbies you otherwise do a good job of
||  talking to.
|   
|   Quite right, Ed.  But every newbie is not new to every thing.

True; you have to pick an audience, and write for it.  If I were writing
for an audience of folks who could do the mapping of ADT to Abstract Data
Type, I'd use Bob Agnew's EBNF/metagrammar lingo -- it happens to work
better for me.  But you're writing this, not me.

On the other hand, in my limited experience, most of the people to whom
I've tried to explain SGML are (a) literate, (b) intelligent and (c) not
familiar with ADTs, EBNF, automata theory or much else of computer science.
This is so much the case that I have problems connecting the term "newbie"
with folks who are literate in CS; so I assumed you were writing for the
folks I've talked with... hence my comment.  But I tend to make my SGML-
explanation attempts in social rather than professional contexts....

Maybe you could add a footnote-type paragraph that would expand ADT to
Abstract Data Type, briefly explain the concept, and point to a reference
on the topic?  That might satisfy your goal (of broadening readers'
horizons) while giving non-CS types a chance to understand DTDs enough to
follow the rest of the excellent stuff you wrote.

||  (2) In noting points about the badbook DTD, you claim that the use of
||  the PUBLIC keyword implies that the DTD has been formally registered
||  with some body empowered to do formal registrations...
|   
|   Hmm, I was trying to say the opposite.  That while the identifier
|   appears to make that claim, there is no way to check it.

My apologies for a lack of clarity.  I understood what you were saying; I
meant to disagree that "the identifier appears to make that claim", and
that this made the use of an unregistered public id "bad".  I think that
unregistered public ids are a useful tool when understood, and (in
conjunction with stuff like SGML Open's Entity Management Resolution) a
worthwhile piece in the kind of agreement among trusted parties you rightly
note as being necessary for successful interoperability.

-- ed
</message>
<message id="<34l9cm$f4d@Starbase.NeoSoft.COM>" date="2987960150">
Newsgroups: comp.text.frame,comp.unix.advocacy,comp.text.sgml,comp.infosystems.www.misc,comp.society.futures,news.groups.questions,misc.writing,alt.culture.internet
Followup-To: comp.text.frame
Date: 07 Sep 1994 20:55:50 UT
From: Cameron Laird \<claird@Starbase.NeoSoft.COM>
Organization: NeoSoft Internet Services +1 713 684 5969
Message-ID: <34l9cm$f4d@Starbase.NeoSoft.COM>
Subject: Electronic publication: miscellaneous questions

Should I attend Frame Technology Corporation's seminar (sales presentation)
on FrameViewer 4?

Background: I'm strategizing about organizational info-centers.  My current
employer, for example, ought to be able to say to its engineering staff,
"fire up

	Mosaic http://www.ourserver/infocenter.html

when you come in Monday, and find everything -- telephone lists, calendars,
procedure manuals, ..."  What's a good way to implement this?

Here are some things I know:

1.  I like Frame Technology.  I'm comfortable with FM, and I've seen
    winning applications built with help engines relying on FM.  On the
    other hand, this involves cash outlays, and licensing, and I have to
    make a *very* good case to justify those.

2.  Microsoft's Help model is successful.  On the other hand, it doesn't
    know about networks, and I don't expect it to for some time.

3.  Somebody's going to come out on top in the next couple of years.  We
    who read these newsgroups *know* that a distributed hypertext system is
    what we need for lots of situations; the only question is whether to
    implement it with technology from a WWW model, or Adobe, or Microsoft,
    or WordPerfect, or ...  This is a decision that matters, too, because,
    although one can anticipate filters that will unify the different
    models, they'll probably take as long to build as it took to make
    Word-to-WordPerfect transformations as painless as they now are.

My current approach: distribute hypertext in HTML, with appropriate Mosaic
clients on existing hardware.  Generate the HTML files by writing in
hyper-linked FM, and then transforming.  None of the fm2html-like filters
yet satisfy me, but that's the best I can see.  Should I care about
FrameViewer?  Who does?

Related question: anyone have interesting stories about distributed
publishing of software?  There'll come a day when serious applications
include embedded references to remote hypertexts.  Has it happened yet?

I'm open to suggestions on which newsgroups like to mull over such topics.
For now, I've narrowed follow-ups to comp.text.frame alone.

-- 
Cameron Laird		ftp://ftp.neosoft.com/pub/users/claird/home.html
claird@Neosoft.com (claird%Neosoft.com@uunet.uu.net)	+1 713 267 7966
claird@litwin.com (claird%litwin.com@uunet.uu.net)  	+1 713 996 8546
</message>
<message id="<19940907.4888@naggum.no>" date="2987961105">
Newsgroups: comp.text.sgml
Date: 07 Sep 1994 21:11:45 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940907.4888@naggum.no>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com> <19940905.4851@naggum.no> \<kimber.82.000DE9CD@passage.com>
Subject: Re: HyTime critique (was CONCUR usefulness existence proof)

[W. Eliot Kimber]

|   The last part of the subject post is to the effect "I find HyTime to be
|   inelegant and insufficiently precise" or, in the words of the horse in
|   Ren and Stimpy cartoons: "No sir, I don't like it."

I'm sorry I can't find a witty response to that, so I have to admit that
you win this debate.  congratulations, Eliot!  and good luck.

#\<Erik>
-- 
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<16834.778996760@moose.cs.indiana.edu>" date="2987985560">
Newsgroups: comp.text.sgml,comp.infosystems.www.misc
Date: 08 Sep 1994 03:59:20 UT
From: Marc VanHeyningen \<mvanheyn@cs.indiana.edu>
Organization: Computer Science Dept, Indiana University
Message-ID: <16834.778996760@moose.cs.indiana.edu>
References: <34l495$8qi@starbase.neosoft.com>
Subject: Re: HTML style: bibliographic elements

[Cameron Laird]

|   I'm looking for guidance on good HTML style in bibliographic citation.
|   For now, when I'm writing a WWWable text, and I want to refer to a
|   book, I italicize
|
|   	\<I>Book Title\</I>
|
|   the title; I simply plaintext-quote article titles:
|
|   	"Explanation of things", \<I>Journal
|   	of everything\</I>, volume ...
|   
|   Is there yet a sense of "common practice" in this domain?  What is it?

\<CITE>.
--
Marc VanHeyningen  \<http://www.cs.indiana.edu/hyplan/mvanheyn.html>
</message>
<message id="<1994Sep8.105825.421@ittpub>" date="2988003504">
Newsgroups: comp.text.sgml
Date: 08 Sep 1994 08:58:24 UT
From: "William D. Lindsey" \<william@ittpub.nl>
Message-ID: <1994Sep8.105825.421@ittpub>
References: <34kmac$9e2@ruby.ora.com>
Subject: Re: US standards publishers?

[Terry Allen]

|   I'm looking for US providers of ISO specs (I'm interested in the
|   forthcoming DSSSL spec).  The only source I've used was OMNICOM, and
|   I'm looking for alternatives.  If you know a good one, please send me
|   email.  Thanks.

How about:
 American National Standards Institute
 11 West 42nd Street, 13th floor
 New York, NY 10036 USA
 (212) 642-4900,  fax: (212) 302-1286

Now, who can direct me to the source in The Netherlands?

Regards,

Bill
-- 
william@ittpub.nl    
  (BTW if you've sent mail sent to "bill@ittpub.nl", please resend
   to the above address.  I never recieved it.)
</message>
<message id="<1994Sep8.090450.9514@fel.tno.nl>" date="2988003890">
Newsgroups: comp.text.sgml
Date: 08 Sep 1994 09:04:50 UT
From: "S. A. van Merrienboer" \<svma4@fel.tno.nl>
Organization: TNO Physics and Electronics Laboratory
Message-ID: <1994Sep8.090450.9514@fel.tno.nl>
Summary: Can anybody mail me a SGML, HyTime, or HTML FAQ?
Subject: SGML, HyTime, HTML FAQ's?

I'm desperate for a SGML, HyTime or HTML FAQ, because I'm new on these
subjects.  Can anybody mail me these FAQ if they do exist?

Thanks

Siem van Merrienboer
svmerrienboer@fel3.fel.tno.nl
</message>
<message id="<34ms88$eqg@pheidippides.axion.bt.co.uk>" date="2988012232">
Newsgroups: comp.text.sgml
Date: 08 Sep 1994 11:23:52 UT
From: Roger Reading \<readingr@leeds.syntegra.bt.co.uk>
Organization: SYNTEGRA - The systems integration business of BT
Message-ID: <34ms88$eqg@pheidippides.axion.bt.co.uk>
References: <34kmfj$1217@rs18.hrz.th-darmstadt.de> <1994Sep7.163644.9188@ast.saic.com>
Subject: Re: SGML New User Requesting General Inform

[Claude L. Bullard]

|   After my posting to "newbies" on the basics of SGML.

Is it possible for someone to forward a copy of the document, or post the
location, mentioned above.  I have search the conference but cannot find
the location of said document.

Regards
Roger Reading

-- 
Roger V. Reading
Applications Manager			readingr@fleet.syntegra.bt.co.uk
Syntegra				+44 1252 777779
</message>
<message id="<rknox.779038140@cnj>" date="2988027643">
Newsgroups: comp.text.sgml,comp.text
Date: 08 Sep 1994 15:40:43 UT
From: "Rita E. Knox" \<rknox@cnj.digex.net>
Message-ID: \<rknox.779038140@cnj>
Keywords: document modelling
Subject: Looking for speakers for Documation '95

I am chairing a session at Documation '95 -- "the international forum for
document management applications, document system technology and
interoperability solutions" -- which will be held from March 7-9 at the
Long Beach Convention Center, Long Beach, CA.  A description of the session
follows:

------------------------------------------------------------- 

Session for Documation '95 
Session Chair:  Rita E. Knox

Title:  Data-Driven Documentation: Modelling Issues

Summary: There are many advantages to identifying content in document data
bases.  Among other things it supports cross-referencing, hypertext
navigation, automated data verification and update, and auto-generation of
document components.  Such content must be identified in a meaningful way
-- there must be a correspondence between the document content definition
and the natural structure of the information being documented.  However, at
the same time that document automation experts are developing content
models to support documentation uses, there are domain experts who are
developing content models to support many applications other than
documentation.  Where does the line between these potentially redundant
efforts fall?  What work should each "side" of the industry perform and how
might the efforts be coordinated?  This session explores these issues by
providing examples from different industries where such concurrent
modelling efforts are in progress.

Suggested topic areas: 
   -- Law/Legal publishing 
   -- Pharmaceutical/New Drug Applications 
   -- General Information/Newspaper Publishing 
   -- Product Data Exchange (STEP)/Technical Documentation
------------------------------------------------------------- 

I am looking for 2-3 speakers to participate in this session.  Potential
speakers may be working in one of the suggested topic areas or in some
other area where basic domain modelling and documentation modelling may be
occurring simultaneously.  Interested individuals should send me a brief
abstract (500 words) describing their proposed presentation that would
address this topic. (Please send to either \<knox@kanda.com> or
\<rknox@cnj.digex.net>) I will respond to all submissions no later than 1
November when I have reviewed all abstracts and made a selection.  Thanks.

-- Rita Knox
-- 
Rita E. Knox, Ph.D.                v: 908.576.8678
Knox\&Assocs/Martin Hensel Corp.    f: 908.576.8679
167 Winding Way                    knox@kanda.com 
Little Silver, NJ 07739         OR rknox@cnj.digex.net
</message>
<message id="<kimber.83.000CD03A@passage.com>" date="2988035323">
Newsgroups: comp.text.sgml
Date: 08 Sep 1994 17:48:43 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.83.000CD03A@passage.com>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com> <19940905.4851@naggum.no> \<Cvpyzw.Mw3@news.cis.umn.edu>
Subject: Re: HyTime critique (was CONCUR usefulness existence proof)

[R A Milowski]

|   Not to stray too much, but, could either Erik or Eliot (both
|   preferrably) clarify an issue for me on HyTime:
|
|   Might one call HyTime an "introspective" standard in the sense that it
|   operates assuming that the SGML document is a resource to be queried?
|   Whereas, DSSSL operates in a outward fashion using the SGML document as
|   a starting point.

I think this is a reasonable approach, although I'm not sure I precisely
understand the implications of this classification.  HyTime is certainly
"loose" in that it is essentially a framework that provides opportunities
to hook things together.  The framework has to be flexible enough to allow
a wide variety of applications and implementation methods to work together.
This means HyTime can't be as rigid or constrained as any given application
might be.  For example, HyTime allows an application to define what lexical
type specification language it wants use, what query language it wants to
use, and what property sets it wants to define, even though the standard
also defines its own versions of these.  I consider this a strength of
HyTime, since it makes it more likely that existing applications will adopt
it if they don't have to re-implement things they already provide (for
example, if you were writing a Perl application to do HyTime processing, it
would be easiest to define Perl regular expressions as your lexical type
language, since Perl already provides the functions you need).  HyTime is
an application architecture, and while it's goal is to enable interchange
among disparate applications, each different application will have
different requirements as to what degree of interchange is needed or is
practical.

Another aspect of HyTime is that it provides an architecture within which
other standardization efforts could be done.  For example, HyTime property
sets provide a way to create a interchangeable definition of a set of
application-specific properties.  This provides an opportunity for
application vendors or specific industries to define standard property
sets.  For example, full-text retrieval vendors could define property sets
that enable access to the various full-text properties they provide (e.g.,
proximity, lexical variation, relevance ranking, etc.), either on their own
or as an industry.  I could see applications providing property sets with
products along with other APIs and drivers they might provide.  No single
standard can hope to address all these areas directly.  Rather, HyTime
defines a set of simple but powerful abstraction and indirection mechanisms
that can be used to help data and applications interoperate.

-- 
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<94251.155547MICHAEL@MAINE.MAINE.EDU>" date="2988042946">
Newsgroups: comp.text.sgml,comp.infosystems.www.misc
Date: 08 Sep 1994 19:55:46 UT
From: Michael Johnson \<michael@maine.maine.edu>
Organization: University of Maine System
Message-ID: <94251.155547MICHAEL@MAINE.MAINE.EDU>
References: <34l495$8qi@Starbase.NeoSoft.COM>
Subject: Re: HTML style: bibliographic elements

[Cameron Laird]

|   I'm looking for guidance on good HTML style in bibliographic citation.
|   For now, when I'm writing a WWWable text, and I want to refer to a
|   book, I italicize
|
|           \<I>Book Title\</I>
|
|   the title; I simply plaintext-quote article titles:
|
|           "Explanation of things", \<I>Journal
|           of everything\</I>, volume ...
|   
|   Is there yet a sense of "common practice" in this domain?  What is it?

To put in my $0.02 (2\&#162;) you should at least use the \<CITE> and \</CITE>
tags where you are currently using \<I> and \</I>, since they are intended to
be used for a literary citation.

In general, I'd assume that accepted practice in bibliographic format in
general (c.g., _The Elements of Style_) is appropriate for HTML docs also.
Anywhere you would underline in a typed bibliography (i.e., titles) is an
appropriate place to use the \<CITE>\</CITE> tags.

-- 
Michael Johnson, Relay Technology, Inc.
michael@maine.maine.edu, michaelj@relay.relay.com
"I will choose a path that's clear. I will choose Free Will." -- Neil Peart
</message>
<message id="<34olar$cms@news.delphi.com>" date="2988052727">
Newsgroups: comp.text.sgml
Date: 08 Sep 1994 22:38:47 UT
From: Jeffrey McArthur \<j_mcarthur@bix.com>
Organization: ATLIS Publishing
Message-ID: <34olar$cms@news.delphi.com>
Subject: SGML FAQ

                                 The

                      Un-Official, Non-Sanctioned

                      Frequently Asked Questions

                                 List

                            Version 0.0.0

                          Date: September 8, 1994

          Compiled by Jeffrey McArthur (j_mcarthur@bix.com)

Subject: Table of Contents

0.  About This Release
    0.1     Why is it so late?
    0.2     Why is it not in SGML?
    0.3     When is the next release?
    0.4     Why is the file broken into pieces?

1.  General Information
    1.1     Notes about the FAQ
    1.2     What is Markup?
    1.3     What is Tagging?
    1.4     What is SGML?
    1.5     Why go to all the trouble of using SGML?
    1.6     History of SGML
    1.7     What is ISO 8879?
    1.8     What is a DTD?
    1.9     What is a parsing?
    1.10    What is legacy data?


2.  SGML Language Features
    2.1     Elements
    2.2     Attributes
    2.3     Entities
    2.4     Comments
    2.5     Notation
	2.6		Processing Instructions


3.  Parsers
    3.1     ARC SGML
    3.2     SGMLS
    3.3     Exoterica SGML Kernel


4.  Converting Data To SGML
    4.1     By hand
    4.2     Lex/Flex
    4.3     OmniMark
    4.4     Tagwrite
    4.5     Programming Language of your choice


5.  Printing SGML
    5.1     With LaTeX
    5.2     With Plain TeX
    5.3     With Troff
    5.4     Interleaf
    5.5     Quark Express
    5.6     Ventura Publisher
    5.7     PageMaker



14. Utilities
    14.1    Author/Editor


15. Publications
    15.1    TAG


15. Vendor Information

16. Consultants

17. SGML Archive.



Subject:  Chapter 0.

About This Release

Subject: 0.1     Why is it so late and incomplete?

I am no longer working 12-16 hour days.  I am only working 11-13 hours a
day.  I even took two days off over the Labor Day weekend.  So the FAQ is
late because I have just not had enough strength to work on it.
Someday, in the near future, I will have some free time to work on the FAW.
 Until then, this is all you get.

Subject: 0.2     Why is it not in SGML?

I am just to lazy to convert it at this time.



Subject: Chapter 1.

General Information

Subject:  1.1   Notes about the FAQ

There is no officially maintained FAQ for comp.text.sgml.  This is an
attempt to solve the most frequently asked question on this
newsgroup, "where is the FAQ?".  Rather than start an rwar about who
is right or wrong or if there should be a FAQ at all I decided that
it would be in my best interest to provide a skeleton structure to a
non-official FAQ.

This is only the rough outline of what is to follow, hopefully.
Ideally, the FAQ should be organized as an SGML document.  But to
start with, this is just an ASCII text file.  But looking to the
future, what DTD should the FAQ use?

If you to help with this FAQ, please send any updates or comments to
j_mcarthur@bix.com. The only way this FAQ will be developed is with
help from others.

One word of warning, since I am starting this FAQ, it will reflect my
opinions.

Subject:    1.2     What is Markup?

Using a highlighter pen to emphasize passages in a book is "marking
up" the book.  The highlights show passages that are important to the
reading.  Underlining is another form of markup.  It is not possible
to use a highlighter on an electronic document.  To implement
electronic markup a variety of ideas have been developed.  

Subject:    1.3     What is Tagging?

ASCII has become the most commonly used form of information exchange.
Almost every word processor has the ability to import and export an
ASCII text file.  The problem with ASCII text files the 95 printing
characters of the 7-bit ASCII definition do not provide any
information about the structure or the format of the document.

Several methods have been developed to specify additional information
in an ASCII text file.  FORTRAN used a "C" in the first column of a
punched card to specify a comment.  This was one of the simplest
forms of tagging.  All the comment lines were tagged with a "C" in
the first column.  Pascal allowed comments to be placed almost
anywhere.  This was done by introducing a start comment sequence,
"(*", and an end comment sequence, "*)".

The basic idea is to use a recognizable sequence of characters to
define parts of a document.  Each special sequence of characters is
called a "tag".  Below is a list of tags used in some computer
languages:

    Language        Start Tag               End Tag

    FORTRAN         C in first column       column 72
    Pascal          (*                      *)
    C               /*                      */
    Basic           REM in first column     end of line

The process of adding special sequences of characters to an
electronic document is call tagging.  Tagging is a method of "marking
up" an electronic document.

Over time, comments in computer languages have changed.  One of the
more interresting changes is the ability to nest comments.  Most of
the newer Algol family of languages allow comments to nest.  This
included Modula-2, Oberon, and Oberon-2.

The ability to nest is important in tagging.  This allows using
the same notation over and over again.  The meaning of a tag
become context dependant.  In Modula-2 the first "(*" starts the
comment.  Following "(*" not only continue the comment but add
the requirement of an additional "*)" to end the comment.
For each start comment tag, there is an end comment tag.


Subject:    1.4     What is SGML?

One of the problems with tagging is determining what tags to use.
SGML takes the concept of tagging one step further.  It creates a
method for defining a set of tags.  This is why some people refer to
SGML as a meta-language. SGML does not define a set of tags.  It is a
tool to define a set of tags.

But SGML does a more than just define the tags.  There are tools that
will take a tagged electronic document and compare it to the set of
defined tags and see if the document follows the definition.  The
process of doing this validation is called "parsing".

With SGML you can define a "tag grammar", and you check see whether
a text conforms to that grammar.


Subject:    1.5     History of SGML

Erik Naggum \<erik@naggum.no> wrote a short history in the officially
sanctioned FAQ.  If I get his permission I will include it here.  Or
if anyone would care to write one I will post it here.

Subject     1.6     Why go to all the trouble of using SGML?

SGML is not as easy to use as Word Perfect or Microsoft Word.  Most
word processor programs are very easy to use.  You just type.  Little
thought is given to the structure of what is written.  Style sheets
provide some outline capabilities, but they do not force the document
to match the style.

SGML can be tyrannical in its enforcement of structure.  The major
advantage of SGML is the enforced consistency.  Documents must follow
the defined structure, or they will not parse.

One major advantage of enforcement of structure is consistancy.
The data must follow a predefined set of rules reguarding its
structure.

Another possible advantage is separating the "form" of the
document from the "content" of the document.  With a word processor,
you are always aware of the form.  Style sheets do help; but
the layout of the document is bound to the data.  SGML may
help separate the content.

One common misconception is that SGML tags can only define the
structure of a document.  It is possible to create a SGML document
where the tags only describe the form.  An example of this is
in tables.  Many table models only describe the formatting
of the table.  There is no attempt to represent any structure on
the data other than the format.

If SGML is used to separate the form from the content of a
document, then it is much easier to create new "forms" from
the same data.  For example, if a document is written using
a word processor it may be very difficult to change all the
bold italic listings in the document to bold san-serif.  If the
form is completely separate from the content, then the actual
format of the document is specified outside of the document
itself.

This is the answer of why to use SGML.  If the document is
to be used only one time, for example a letter to a friend, there
is no reason to use SGML.  On the other hand, if the letter is
to be placed into a system that is searched and/or printed many
times in many different ways, then SGML is a major advantage.




Subject:    1.7     What is ISO 8879?

SGML is an ISO standard.  ISO 8879 is the definition of SGML.  The
definitive document is: The SGML Handbook; Oxford University Press,
1990; ISBN 0-19-853737-9; by Charles F. Goldfarb.

If you are serious about SGML, this book is a must.  It is a very
hard to read document. Also for a book that wants to show off the
power of SGML the typesetting is awful.  The indexes are almost
useless because there is no distinction between a simple reference and
a full description (to see a much better computer generated index
look at the index to The TeX Book, ISBN 0-201-13447-0 (hard) and ISBN
0-201-13448-9 (soft)).

Subject:    1.8     What is a DTD?

A Document Type Definition (DTD) is an electronic document that
defines a tagging structure.  A DTD specifies where each tag is
allowed.  For example, a novel is made up of a set of chapters.
Each chapter is made up of one or more sections.  Each section
is made up of one or more paragraphs.  A DTD contains statements
that define this relationship.  DTD is the name for a tag grammar.


Subject:    1.9     What is a parsing?

Webster's defines parsing as: to break (a sentence) down into
parts, explaining the grammatical form, function, and interrelation
of each part.  This is not exactly what we mean by parsing in SGML.
Parsing in SGML is done via a parser.  A parser is a computer
program that breaks down an electronic document into its parts and
compares the form of the document based on the SGML tags to the
form described in the DTD.

Parsing is a check of conformance of a text to the grammar described
in the DTD.

Parsing is what separates SGML and other word processing formats.
For example, in the case of a novel, this would mean that paragraphs
only occur inside of sections, and sections only occur inside
chapter.  A word processor does not enforce those requirements.


Subject     1.10    What is legacy data?

Legacy data is a term used by some to refer to data that has not been
converted to SGML.  The choice of terms is rather unfortunate.  It
gives the impression that nothing good could have been done prior to
SGML.

There are two issues options in converting legacy data.  Change the
existing data to match the DTD, or change the DTD to allow the
structures in the existing data.  The question is simple: what should
define the DTD: the idealized model for new data, or the real-world
existing data.  As anyone who has done any work in physics realizes,
working with real-world data can be a very difficult task.

SGML enforces the structure defined by the DTD.  But it is relatively
easy to create a DTD that is totally unsuitable for a set of data.
It is also possible to create a DTD that is so loose that no structure
at all is enforced.  Converting existing data generally requires
a lot of compromise.

If you have more than a couple of meg of unstructured data and want
to convert it to SGML you will end up making massive changes to both
the data and the DTD; unless you are very, very lucky.



Subject     2.

SGML Language Features

The syntax used to define a document tag definition.  This section
is used to provide a quick overview of of SGML and is not a complete
description.  Also the following is not exactly correct.  There are
predefined names for all the parts of each SGML statement.  Although
needed, the names build a wall to understanding for the novice.
One aim of this FAQ is to make SGML easy to understand.  So the
following discussion will not use the proper names.

This section has a few endnotes.  They will be represented by
parens around a roman numeral.

This section was very hard to write.  Is anyone willing to take
this section over?  It is hard to explain in simple terms the
intricacies of SGML declarations.


Subject     2.1    Elements

Element are the basic building blocks of an SGML document.  Each
element defines at least one tag.  One of the most common tags is
one to define a paragraph.  Below is a simple paragraph definition:

    \<!ELEMENT para - - (#PCDATA) >

There are 5 pieces to the tag(i).  The first piece is "\<!ELEMENT" (ii).
This tells the parser that an element is being defined.  The word
ELEMENT can be in any case (iii).  Following the word element is the
name of the tag to be defined.  In this case we are defining "para"
as a paragraph tag.  Actually two tags are defined.  The start and
the end paragraph tags.  The start tag looks like this: \<para>.
The end tags looks like this: \</para>.  The third piece controls when
the start and end tags are required.  There are four values this
piece can have.  Below is a table showing what the values are and
what they mean:

    - -     Both the start and end tag are required
    - O     The start tag is required, the end tag is optional
    O -     The end tag is required the start tag is optional.
    O O     Both the start and end tag are optional.

The "O" can be either upper or lowercase.

The next piece defines the content of the tag.  In the case of
the paragraph tag only PCDATA is allowed.  PCDATA means parsable
character data.  The meaning is somewhat complex.  But in general
this is used to specify that a paragraph can have text (and a few
other things) inside it.  But no other tag can occur inside a
paragraph. (iv)

The content of a tag is actually a regular expression.  below is a
table showing the regular expression operators supported in SGML:

    ?       Zero or one occurrence
    *       Zero or more occurrences
    +       One or more occurrences
    |       or
    &       and. (a & b) means that both \<a> and \<b> must occur but
            in any order
    ,       and. (a, b) means that both \<a> and \<b> must occur but
            the order must be \<a>\<b>

The definition of section can be defined as:

    \<!ELEMENT sect  - -  (p+) >

The final piece of a element declaration is the end or the ">".

Now that all the parts of the element declaration have been defined
the paragraph tag can be used.  Below is a set of paragraphs
showing how the tag is used:

	\<para>Alex felt the melancholy stealing over him again.
	Nostalgia?  For that germ-infested ball of mud?  Not
	possible.  He could barely remember it.  Snapshots from
	childhood; a chaotic montage of memories.  He had fallen
	down the cellar steps once in a childhood home he scarcely
	recalled.  Tumbling, arms flailing, head thumping hard
	against the concrete floor.  He hadn't been hurt; not
	really.  He'd been too small to mass up enough kinetic
	energy.  But he recalled the terror vividly.  Now he was a
	lot bigger, and he would fall a lot farther.\</para>


----------------------------------------------------------------

(i)     This is actually a lie, there are more than 5 pieces. as a question
        to the astute reader, how many pieces are there?

(ii)    Another lie.

(iii)   Although usually true, this is a lie.

(iv)    It is impossible to tell from the paragraph tag what tags are
        allowed inside.  The statement says that nothing at this point
        is defined inside the paragraph.  The content of the paragraph
        can be changed by exceptions on tags that include a paragraph
        as part of their content.

----------------------------------------------------------------

Subject     2.2    Attributes

Each tag can have a set of attributes.  Attributes allow additional
information to be attached to the tag.  The paragraph example above
works fine for simple paragraphs.  But what about lists?  Lists are
little more that a sequence of special paragraphs.  Defining a simple
list is relatively easy:

	\<!ELEMENT list  - -  (item | list)+ >
	\<!ELEMENT item  - -  (#PCDATA) >

This works fine for simple lists.  But there is no way to specify
if the list is to be numbered, or bulletted, or whatever.  Attributes
provide the way to specify the type of list easily.

    \<!ATTLIST list  type  (bullet | number | dash )    "bullet" >

There are six parts to the attribute list (i).  The first part is
the "\<!ATTLIST" (ii).  The second part specifies for what tag attributes
are begin specified.  In this case, the attributes are for the tag
"list".  The next part specifies the name of the attribute.  In
this case the name is "type".  The next part defines the possible
options.  The next part defines the default value (iii).  Finally
the ">" ends the attribute list.

Attributes are specified as part of the tag.  \<list> would define
a bullet list because the type of the list is not specified and
the default is bullet.  \<list type="number"> would specify a numbered
list.  \<list type="dash"> would specify a dashed list.

There are a wide variety in the types of attributes.  The example
above is only to give an idea of some of the uses of attributes.
A full description would be longer than the entire FAQ.

----------------------------------------------------------------

(i)     This is actually a lie, there are more than six pieces. as a question
        to the astute reader, how many pieces are there?

(ii)    Another lie.

(iii)   Implied and required are also possible options.  Implied means
        the parser should be able to determine the value from context.
        Required means the value must be specified.

----------------------------------------------------------------


Subject     2.3    Entities

Entities are one of the most complex topics in SGML.  This is only a
very brief overview of what they are.

There are two general catagories of entities:  external and parameter. (i)
External entities refer to something outside the current document.
Parameter entities are macros used inside of a dtd.

SGML uses parameter entities to define macro replacements in a dtd.
For example in the list example above, the list of types is far
from complete.  The list of types can get quite long.  Also the
list may be different from document to document.  A parameter entity
would make it easier to change the list of types.  Below is
the example of the same list using a parameter entity.

    \<!ENTITY % listtypes "bullet | number | dash" >

    \<!ATTLIST list  type  (%listtypes;)    "bullet" >

Parameter entities are similar to tags in that they have a start
character and an end character (ii).  "%" is used as the start
character.  ";" is used as the end character. (iii) In the attribute
list for the list tag the list of possible types is a macro.

In the entity declaration notice the space between the "%" and
"listtypes".  This space is mandatory for parameter entities.

Entity definitions are somewhat unusual.  It is not an error to
define an entity several different ways.  But only the first
definition is used.  This is counter to most macro processing
computer languages.  It is important to remember that the
first definition is what counts.

External entities are more complex that the simple macro
replacements of parameter entities.  The idea is similar.  What
external entities do is allow a document to refer to an external
file or definition.

----------------------------------------------------------------

(i)     There is a third category, default entities.  These are rare.

(ii)    In many instances the end character is optional.

(iii)   The start and end can be defined to be other characters.

-- 
    Jeffrey M\\kern-.05em\\raise.5ex\\hbox{\\b c}\\kern-.05emArthur
    a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
    phone: +1 301 210 6655            ATLIS Publishing
    fax:   +1 301 210 4999            12001 Indian Creek Court
    home:  +1 410 290 6935            Beltsville, MD  20705

The opinions express are mine.  They do not reflect the opinions of my
employer.  My access to the Internet is not paid for by my employer.
</message>
<message id="<34o9g6$ebd@news.xs4all.nl>" date="2988058566">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 00:16:06 UT
From: Jan Grootenhuis \<jang@xs4all.nl>
Organization: XS4ALL, networking for the masses
Message-ID: <34o9g6$ebd@news.xs4all.nl>
References: <34kmac$9e2@ruby.ora.com> <1994Sep8.105825.421@ittpub>
Subject: Re: US standards publishers?

[William D. Lindsey]

|   Now, who can direct me to the source in The Netherlands?

Nederlands Normalisatie-Instituut
Kalfjeslaan 2, Postbus 5090
2600 GB  DELFT
Tel. (+31) 15 690.188
</message>
<message id="<34ofpb$e6b@apollo.it.luc.edu>" date="2988065003">
Newsgroups: comp.text.desktop,soc.culture.china,alt.chinese.text,alt.hypertext,comp.text,comp.text.frame,comp.text.interleaf,comp.text.sgml
Date: 09 Sep 1994 02:03:23 UT
From: Ron Du \<rdu@fermat.math.luc.edu>
Organization: Loyola University of Chicago
Message-ID: <34ofpb$e6b@apollo.it.luc.edu>
Subject: IBM mainframe \<Script> to WordPerfect conversion

Hi, there.  Does anybody know if there is software package that can do the
conversion from Script to WordPerfect or some other modern text format?  A
professor at my school has only weeks to live.  He really want to see his
writing in old styled Script printed out.  To my knowledge, Script is no
longer supported anywhere.

Thanks.  Your help is greatly appreciated.  Please email to me directly at
rdu@math.luc.edu

Runqing Du
</message>
<message id="<34oh9dINNci4@oasys.dt.navy.mil>" date="2988066541">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 02:29:01 UT
From: Betty Harvey \<harvey@navysgml>
Organization: Advanced Information Systems Branch, DTMB, CDNSWC
Message-ID: <34oh9dINNci4@oasys.dt.navy.mil>
References: <9408311946.AA22157@helium.biomol.uci.edu> <34i0br$eng@Mercury.mcs.com>
Subject: Re: Search for SGML parser generator

[William G. Lederer]

|   I can locate his e-mail address for you if you really want it, but my
|   suggestion is to get a copy of sgmls, which is publically available,
|   including source code.  It is known to work on many platforms.  Check
|   with Archie for location of sgmls.  I would give you the exact
|   location, but it is in my other office.

You can ftp the Unix source code for sgmls from the "anonymous ftp" archive
of navysgml.dt.navy.mil in upload/sgml.tar.  We use sgmls as one of our
parsers.

				Betty

-- 
Betty Harvey  \<harvey@oasys.dt.navy.mil>     | David Taylor Model Basin
Advanced Information Systems Branch          | Carderock Division
Code 183                                     | Naval Surface Warfare
Bethesda, Md.  20084-5000                    |   Center
                                             | DTMB,CD,NSWC   
URL:  http://navysgml.dt.navy.mil/betty.html |          
</message>
<message id="<34pakp$mfk@rs18.hrz.th-darmstadt.de>" date="2988092505">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 09:41:45 UT
From: Joachim Schrod \<schrod@iti.informatik.th-darmstadt.de>
Organization: TH Darmstadt, FG Systemprogrammierung
Message-ID: <34pakp$mfk@rs18.hrz.th-darmstadt.de>
References: <34kmfj$1217@rs18.hrz.th-darmstadt.de> <1994Sep7.163644.9188@ast.saic.com>
Subject: Re: SGML New User Requesting General Inform

[Bob Agnew]

|   I use the word CDT or concrete data type in that purely abstract
|   classes are not supported directly by SGML itself; however most people
|   use the term ADT when they really mean a CDT.

Grmmbl.  That's exactly the problem.  Many people tell that SGML is about
formatting. We shouldn't encourage such fog on terms.

|   A DTD might be regarded as the specification part of a concrete class;
|   it does not include the methods.

Yes, if Len would have written "the signature of an ADT", or "the protocol"
for that matter, I wouldn't have made my remark.


But let me hasten to add again, this is nitpicking.  Len's text is great,
and for those of you who missed it: Get it from the SGML Archive at
Darmstadt:

	ftp.th-darmstadt.de [130.83.55.75]
	directory pub/text/sgml/documentation/
	file SGML.overview

Cheers,
	Joachim

-- 
Joachim Schrod			Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany
</message>
<message id="<34pas2$mfk@rs18.hrz.th-darmstadt.de>" date="2988092738">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 09:45:38 UT
From: Joachim Schrod \<schrod@iti.informatik.th-darmstadt.de>
Organization: TH Darmstadt, FG Systemprogrammierung
Message-ID: <34pas2$mfk@rs18.hrz.th-darmstadt.de>
References: <9409012114.AA19892@source.asset.com> \<Cvs23v.Ix7@HQ.Ileaf.COM>
Subject: Re: SGML New User Requesting General Information

[Ed Blachman]

|   On the other hand, in my limited experience, most of the people to whom
|   I've tried to explain SGML are (a) literate, (b) intelligent and (c)
|   not familiar with ADTs, EBNF, automata theory or much else of computer
|   science.

But to those who are familiar with these terms, SGML is very confusing at
the start.  (That's because many terms are used that have a slightly
different meaning in CS, particularily in the formal language and compiler
construction field.)

At least I had large problems first. :-)  As an example: I was reading
about abstract and concrete reference syntax and thought "great, typical
stuff".  Nope, it's simply the configuration of the lexical analysis, i.e.,
of the way lexems are created during parsing. It took me a while to get rid
of this ami faux. :-(

Cheers,
	Joachim

-- 
Joachim Schrod			Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany
</message>
<message id="<34pb1n$mfk@rs18.hrz.th-darmstadt.de>" date="2988092919">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 09:48:39 UT
From: Joachim Schrod \<schrod@iti.informatik.th-darmstadt.de>
Organization: TH Darmstadt, FG Systemprogrammierung
Message-ID: <34pb1n$mfk@rs18.hrz.th-darmstadt.de>
References: <34kmfj$1217@rs18.hrz.th-darmstadt.de> <1994Sep7.163644.9188@ast.saic.com> <34ms88$eqg@pheidippides.axion.bt.co.uk>
Subject: Re: SGML New User Requesting General Inform

[Roger Reading]

|   Is it possible for someone to forward a copy of the document, or post
|   the location, mentioned above.  I have search the conference but cannot
|   find the location of said document.

It's surely somewhere in Erik's SGML archive.  But for those of you who
have a better Net connection to Germany than to Norway, you can go to the
SGML archive at Darmstadt:

	ftp.th-darmstadt.de [130.83.55.75]
	directory pub/text/sgml/

that holds almost anything as well.  (Actually, no c.t.sgml backlogs.)  The
posting in question is stored in .../documentation/SGML.overview.

Enjoy,
	Joachim

-- 
Joachim Schrod			Email: schrod@iti.informatik.th-darmstadt.de
ftp.th-darmstadt.de, Administration
SGML Archive at pub/text/sgml/
</message>
<message id="<34poef$34g@netnews.upenn.edu>" date="2988106639">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 13:37:19 UT
From: Nathan Sivin \<nsivin@mail.sas.upenn.edu>
Organization: University of Pennsylvania
Message-ID: <34poef$34g@netnews.upenn.edu>
Subject: Basics

I just subscribed to this newsgroup, and have read ca. 120 postings.  #4
was a simple request for information about getting a copy of the basic
instructions for using SGML, with an outline of the markup.  I used IBM GML
for several years in publishing a journal, and would like to use the new
standard.

No one bothered to reply to this basic query.  I am glad to see that so
many people are deep in the esoteric aspects of the subject, but it would
be a sign of generosity to answer such a simple question.  It might lead to
my getting interested in the cutting edge too.

-- 
Nathan Sivin
History and Sociology of Science
University of Pennsylvania
Philadelphia PA 19104-3325
</message>
<message id="<19940909.4927@naggum.no>" date="2988135748">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 21:42:28 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940909.4927@naggum.no>
Subject: SGML and its enemies

those of you who have read my past articles know that I am an ardent
defender of SGML, that I view longevity of information as an important
goal, and for this language in particular, and that I have spent an
enormous amount of time (and also lots of my own money) in disseminating
information on and about SGML.  this has not changed.  what _has_ changed,
is the focus that I want to bring to discussions and dissemination of
information on SGML, including HyTime and DSSSL.

one common way to express the value of SGML is to berate inferior products
and paradigms.  I have been guilty of this stupidity myself, so I know of
what I speak.  rather than view these as enemies, we can learn from them --
we _should_ learn from them, as it is obvious that they are controlling a
large segment of the markets that SGML should have been in a position to
control, and we need to understand why, if only to protect the information
we invest in.  this is becoming even more important as others are starting
to implement SGML, and, despite the ridiculous attitude problems from the
SGML and HyTime proponents, begin to take SGML very seriously.

in this process, I have come to conclude that SGML contains a number of
very serious errors in its specification that we need to address because
they are the reason SGML is not the winner the ideas on which it is based
are.  in short, these are the features that tell any computer scientist or
good programmer _not_ to implement SGML.  these are important because we
are at a stage of the adoption of SGML where smarter and more knowledgeable
people than those who wrote the SGML standard will make short-cuts and have
solid reasons for what they do.  ridiculing them for this is not only
silly, it is self-defeating.

some prominent figures in the SGML community are thus actively paranoid and
split the world into two warring factions: friends, and enemies out to get
them.  as long as you are one of the friends, it makes a whole lot of sense
to look over your shoulder and prepare for attack.  only, there comes no
attack.  when years go by and you still no attacks coming, just people
minding their own business, more or less ignoring you, some "friends" will
inevitably realize that it all has just been a bunch of scare tactics to
make all "friends" cautious in their public discussion of some of SGML's
very obvious flaws, perhaps even to the point where they defend the flaws
because other "friends" may mistake them for "enemies" if they don't.
well, my apologies to you all.

having worked closely with Steve Pepper (of SGML-Tools fame) for over a
year, I have been greatly dismayed by the quality and the approach taken in
the tools that he has (against all odds) been able to master and make do
his bidding.  my tip o' the hat for this major effort, but my computer
scientist schooling tells me that there is something gravely at fault with
a language and/or its tools that make you work so hard to accomplish your
tasks.  it should have been a lot simpler; why is it not?  why is the
formatting software available with SGML so crummy?  why do people accept
and standardize really stupid DTD's and get away with it (I am not thinking
of the TEI work)?  why does HyTime have to use such mind-bogglingly complex
indirections to do what it says it needs to do?

could it be because strong sentiments against computer science have a long
history in the SGML community?  could it be because programmers who are
told of people who want to protect their data from them only become
somewhat puzzled at this "novel idea" and shrug it all off as nutty?  is
this the reason that we only find special-purpose tools that focus on the
publishing aspect of SGML and very little general-purpose software that
reads and writes SGML as its favorite input and output data formats?

let me take a specific example: in the Unix community (which is itself
actively scorned by those who already scorn computer scientists), we have
tools like `lex' and `yacc', which make defining input languages for your
programs a relatively clean and nice task.  YACC accepts specifications for
what is known as LALR grammars, a subset of LR(1), and produces code in a
variety of languages to process the language described by the grammar.
together with LEX, which is used to produce lexical analyzers, one ends up
not having to do the same boring parsing chores over and over again,
simplifying life for the programmers, and enabling them to spend saved time
on more productive tasks.  now, SGML is specified so that it cannot be
described or used with LEX or YACC.  consequence: Unix programmers shrug.

YACC is a true metalanguage, a language only used to describe languages;
when the program is finished, nobody sees the YACC specification.  SGML is
sometimes referred to as a metalanguage, but there are serious flaws in
this categorization: users see SGML, not just the programmers; moreover,
the programmers can't use the DTD to produce code.  whole applications are
defined in and using SGML as its frame of reference.  but it still is not
completely misguided to refer to SGML as a metalanguage, because the
application will have to deal with the elements similarly to what the
grammar productions that YACC deals with.  furthermore, it is irrelevant
which _form_ the input data had by the time the application wants to deal
with it.  thus, SGML is an external representation of something.  but what?

the SGML rapporteur group (RG) in WG8 has been discussing some abstract
idea of a "model" for SGML for some time, and at the Oslo meeting spent
three days discussing vocabulary for some hitherto unknown concept of
object-oriented data modeling.  my interpretation of such activity is that
we do not really know what we are trying to talk about.  which means: SGML
is the external representation of nothing in particular.  which means: it
cannot be expressed in a known and generally acceptable way as classes or
data structures in any programming language, with or without functions
operating on them.  so SGML remains a string of characters that delimits
"data" from "markup", without any further meaning.  if you are as serious
about SGML as I am, this should be a wake-up call.

we all impute a model to SGML when we're using it, for whatever purpose,
but it remains clear that the mapping to a reasonable set of classes or
data structures is not well-defined.  what this does to the prospect of
processing SGML with general tools should be obvious.

HyTime presents sets of disconnected "properties" whose types are all --
surprise -- undefined (but probably strings).  this is marketed as a means
to allow access to all sorts of things in the information that the document
is deemed to contain, but the crucial point is missing: connecting the
disconnected properties to the document and the processing task is still
left to the application, and the specification is of little help in
figuring out these connections.

in the same vein, the Element Structure Information Set (which is _not_
standardized, recommended, voted on, or otherwise accepted by WG8 as more
than an ordinary committee document, which only requires asking for a
document number from the convener, and is therefore suitable only for
internal reference, much unlike the way it is being used) is a subset of
the information in the document, which is, characteristically, tied to an
unidentified processing model, similar to the way the Recognition Modes of
clause 9.6 in ISO 8879 ties SGML down to a parsing model about which little
is known except that it must be inefficient.  ESIS was proposed as part of
the conformance testing project, which does some useful work in defining
what "conformance" should mean.  (absent a coherent idea of the semantics
of an SGML document, I fail to see the value of it, but let's not digress.)

one of the things that becomes bothersome as one looks carefully at SGML is
that some "features" have the capacity to alter the syntax of the language,
modify the protocol (interaction) between the parser and the application,
and to perturb the data structures one might wish to exchange and use.
that is, although it might seem that SGML would have one abstract syntax,
the syntax declaration part of the SGML declaration is insufficient to map
the abstract syntax to a particular concrete syntax.  the feature-dependent
syntax makes it very hard to make general tools to process SGML, or to make
tools that parse SGML with other general tools.  moreover, the fact that
</> is parsed as three data characters if SHORTTAG NO is specified in the
SGML declaration, and an end-tag if SHORTTAG YES is specified (as is true
of many other interesting aspects of SGML parsing), makes it hard to write
a parser that is general enough to handle "SGML" and yet talking to its
caller (application) through a well-defined protocol.  (the alternative is
simply to say that </> is an error unless SHORTTAG YES is specified.)  it
is thus virtually necessary to make a parser for one set of features, only.
conversion between a given set of features in a given syntax and another
set of features and/or another syntax ranges from hard to impossible.

in the general case, SGML does not exist -- it is nothing in particular.
"SGML" is a haze of loosely connected languages with a common core that
evades semantic description.  to get a fix on SGML, you need to make a
whole series of decisions about the syntax(es) that you want to use, the
complexity of the interface(s), and the way you wish to use the data you
get out of the parser, which you most probably will have to write yourself
if you want to do something interesting.  this is obviously a very, very
unsatisfactory situation.  (the reason this works is that most users will
take somebody else's decisions as if they were required by SGML, and will
use them without due consideration, thereby blaming SGML for shortcomings
in those decisions.  take the value of NAMELEN, for instance, which some
regard as cast in stone.)  to top this off, it is a nuisance to work with
"variant concrete syntaxes" through SGML declarations in every file.

as a guiding principle for the revision of SGML, it has been stated that
all conforming documents according to the first edition, shall and must
remain conforming according to the second edition.  it has also been stated
at various occasions that WG8 does not standardize API's, which I think is
very good.  however, this means that the document as seen by the parser and
particularly the data model that applies to it, will change, as is most
likely required by the changes that are made to the standard.  one is left
to wonder what one shall do with a character string which, technically
speaking, remains conforming, but which is no longer accessible in the same
way, and why it is so important for this character string to be the ruler
of the application universe.  the requirement that a conforming document
according to the first edition of the SGML standard also be a conforming
document according to the second edition is probably of the same scare
tactics quality used to suppress discussion, and will inhibit SGML from
becoming a standard that computer scientists can _also_ find useful.

the conclusion that you have all been waiting for, and possibly dreading if
you've understood everything I have been saying, is that SGML has only one
enemy: SGML itself.  _this_ is what must be rectified.

get involved in WG8 before it is too late!  it is not only your data, it is
the whole framework within which you design your applications that is at
stake.  there is a limit to how much you are going to save if the tools you
need are several orders of magnitude more expensive than those of inferior
technical solutions.  the rest of the world has discovered that computer
science makes some serious wonders with their data possible, while SGML
lags behind, clutching an obsolete parsing specification like its life
depended on it, while it is actually this parsing specification that will
be its demise.  it is clear that WG8 will not let go without a fight.  SGML
is, however, much bigger and much more important than WG8, and users and
vendors world wide will need to make their views known, and to direct this
effort to a healthy completion.  in 1996, there is another five-year review
vote for SGML.  that isn't very far off, and if you're not there (or at
least represented) when the decisions are made in WG8, another important
principle in the review process will destroy your chances of coming through
with your comments: it is _very_ unlikely that one meeting will be able to
reverse the decisions of a former meeting, regardless of who shows up at
which of these meetings.  positively stated: a vote will not be reversed
just because it is in minority at a later meeting.  negatively stated: a
vote will be ruled by the first minority to band together to preclude all
future votes.  with the low number of people attending the meetings, you
are all in grave danger of having tricks pulled on you before you know it.

if you have invested in SGML to protect your information, now would be a
good time to act accordingly.  the best way to protect information is to
keep it alive, not to mummify it.

#\<Erik>
</message>
<message id="<1994Sep9.230515.11823@ast.saic.com>" date="2988140715">
Newsgroups: comp.text.sgml
Date: 09 Sep 1994 23:05:15 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep9.230515.11823@ast.saic.com>
References: <34olar$cms@news.delphi.com>
Subject: Re: SGML FAQ

[Jeffrey McArthur]

|                                    The
|   
|                         Un-Official, Non-Sanctioned
|   
|                         Frequently Asked Questions
|   
|                                    List
|   
|                               Version 0.0.0
|   
|                             Date: September 8, 1994
|   
|             Compiled by Jeffrey McArthur (j_mcarthur@bix.com)
|   
|   Subject: Table of Contents

KUDOS!!  BRAVO!!!  THANKYOU

Jeffery is trying to help save this group.  Thankyou!  I would like to
contribute.

Jeffery asks:

|   This is only the rough outline of what is to follow, hopefully.
|   Ideally, the FAQ should be organized as an SGML document.  But to start
|   with, this is just an ASCII text file.  But looking to the future, what
|   DTD should the FAQ use?

Tagging the doc will make it a little harder to read, so maybe there need
to be a tagged and untagged version?  If a DTD doesn't exist, I hacked one
together and started to tag Jeffery's FAQ.  I can automate this and post
it, but first I thought we all outght to argue for a few years about the
DTD ;-).  Seriously, if you can improve it, please do or maybe post an
existing one.  This is a quick, 30 minute hack and needs a lot of work.
I'd be happy to act as a focal point for the dtd and would be willing to
parse it and maintain it.  I just tagged a few lines by hand and parsed it
with sgmls.  It seemed to like it.  The SGML declaration needs adjusting
down, but I just grabbed one.  Here's what I have so far: (Oh go ahead and
flame me, I can take it.)

\<!SGML "ISO 8879:1986"

CHARSET
BASESET "ISO 646-1983//CHARSET International Reference Version
         (IRV)//ESC 2/5 4/0"
DESCSET     0    9   UNUSED
     9    2   9
    11    2   UNUSED
    13    1   13
    14   18   UNUSED
    32   95   32
   127    1   UNUSED
BASESET   "ISO Registration Number 100//CHARSET ECMA-94
           Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET   128         32   UNUSED
   160          5   32
   165          1   UNUSED
   166         88   38
   254          1   127
   255          1   UNUSED


CAPACITY     SGMLREF
TOTALCAP     175000
GRPCAP        70000
ATTCAP        60000
SCOPE        DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
   18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET "ISO  646-1983//CHARSET International Reference  Version
         (IRV)//ESC 2/5 4/0"
DESCSET    0          128          0
FUNCTION   RE          13
           RS          10
           SPACE       32
           TAB        SEPCHAR      9
NAMING     LCNMSTRT   ""
    UCNMSTRT   ""
    LCNMCHAR   "-."
    UCNMCHAR   "-."
    NAMECASE   GENERAL    YES
        ENTITY     NO
DELIM      GENERAL    SGMLREF
    SHORTREF   NONE
NAMES      SGMLREF
QUANTITY   SGMLREF    LITLEN     2048
        NAMELEN      32
        ATTCNT       80
FEATURES
MINIMIZE   DATATAG   NO   OMITTAG    YES   RANK       NO    SHORTTAG   NO
LINK       SIMPLE    NO   IMPLICIT   NO    EXPLICIT   NO
OTHER      CONCUR    NO   SUBDOC     NO    FORMAL     YES
APPINFO    NONE >
\<!DOCTYPE FAQ [
\<!-- First attempt at a FAQ DTD RAA V 0.0.0 Sept. 9 1994 -->
\<!ELEMENT FAQ - - (front, body, rear)>
\<!ATTLIST FAQ version NMTOKEN #REQUIRED>
\<!-- The following definition for date tries to eliminate ambiguities
     due to different systems of date notation in common use          -->
\<!ELEMENT date         - o  EMPTY >
\<!ATTLIST date Month  (Jan | Feb | Mar | Apr | May | Jun |
                       Jul | Aug | Sep | Oct | Nov | Dec | None) None
               Day     NUMBER        "0"
               Year    NUMBER        #REQUIRED >

\<!-- The "Done" element is a dummy element used to inform the application
         that an element's attributes are ready for processing.       -->

\<!ELEMENT done  - o       EMPTY>
\<!ELEMENT front - o (title, date, editor, toc*, foreword*)>
\<!ELEMENT foreword - o (#PCDATA)>
\<!ELEMENT editor - o (#PCDATA)>
\<!ELEMENT toc - o (tocentry+)>
\<!ELEMENT tocentry - o (#PCDATA)>
\<!ELEMENT body - o (section+)>
\<!ELEMENT section - o (title, (info | QA)*)>
\<!ELEMENT item - o (para*)>
\<!ATTLIST item id ID #REQUIRED
          subject CDATA #REQUIRED>

\<!ELEMENT info - o (para*)>
\<!ATTLIST info id ID #REQUIRED
          subject CDATA #REQUIRED>
\<!ELEMENT QA - o (question,answer+)>
\<!ATTLIST QA id ID #REQUIRED>
\<!ELEMENT question - o (#PCDATA)>
\<!ELEMENT answer - o (#PCDATA)>
\<!ELEMENT para - o (#PCDATA)>
\<!ELEMENT rear - o (notes* , bibliography*, appendix*)>

\<!ELEMENT appendix - o (para*)>
\<!ELEMENT bibliography - -  (bibitem*) >
\<!ELEMENT bibitem      - -  (author+, title, publisher, city?,
                                report-num?, date, done) >
\<!ATTLIST bibitem  id ID #IMPLIED
                   label CDATA #IMPLIED
                   pubtype (book | periodical | conf-proc | tech-rep |
                              manual | unpublished ) book
                   pubnum CDATA ""
                   isbn CDATA ""
                   >
\<!ELEMENT (publisher | report-type | report-num | author | city)  - o  (#PCDATA)>
\<!ELEMENT (title) - o  (#PCDATA)>
\<!ELEMENT notes - o (note+)>
\<!ELEMENT note - o (#PCDATA)>
\<!ELEMENT newline - o EMPTY>
 ]>

\<faq version="0.0.0">
\<front>\<title>The Un-Official, Non-Sanctioned Frequently
Asked Questions List \</title>
\<date month="sep" day="8" year="1994">
\<editor> Jeffrey McArthur (j_mcarthur@bix.com)\</editor>
\<toc>\<tocentry>\</tocentry>\</toc>
\<foreword>\</foreword>\</front>
\<body>\<section>\<title>General Information\</title>
\<info id="I1.1" subject="Notes about the FAQ">
\<para>There is no officially maintained FAQ for comp.text.sgml.  This is an
attempt to solve the most frequently asked question on this newsgroup,
"where is the FAQ?".  Rather than start an war about who is right or wrong
or if there should be a FAQ at all I decided that it would be in my best
interest to provide a skeleton structure to a non-official FAQ.\</para>
\<para>This is only the rough outline of what is to follow, hopefully.
Ideally, the FAQ should be organized as an SGML document.  But to start
with, this is just an ASCII text file.  But looking to the future, what DTD
should the FAQ use?\</para>
\<para>If you to help with this FAQ, please send any updates or comments to
j_mcarthur@bix.com. The only way this FAQ will be developed is with help
from others.\</para>
\<para>One word of warning, since I am starting this FAQ, it will reflect my
opinions.\</para>
\</info>
\<qa id="I1.2">
\<question>What is Markup\</question>
\<answer>Using a highlighter pen to emphasize passages in a book is "marking
up"the book.  The highlights show passages that are important to the
reading.  Underlining is another form of markup.  It is not possible to use
a highlighter on an electronic document.  To implement electronic markup a
variety of ideas have been developed.\</answer>
\</qa>
\</section>
\</body>
\<rear>\</rear>
\</faq>

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<34rjsj$5ol@deep.rsoft.bc.ca>" date="2988167507">
Newsgroups: comp.text.sgml
Date: 10 Sep 1994 06:31:47 UT
From: Tim Bray \<a07893@giant.rsoft.bc.ca>
Organization: MIND LINK! Communications Corp., Langley, BC, Canada
Message-ID: <34rjsj$5ol@deep.rsoft.bc.ca>
References: <19940909.4927@naggum.no>
Subject: Re: SGML and its enemies

This is response to Erik's recent cri-de-coeur.

The thesis behind SGML - that markup should be structural/descriptive
rather than presentational, and its important corollary, that presentation
should be wherever possible computed at run-time, are in my view so
self-evidently true that there is no room left for argument.

The inventors of SGML deserve immense credit for noticing this and getting
a standard defined and, mostly by force of will, getting the world to start
noticing and using it.

This doesn't mean that SGML's design is the one for the future.  I've never
understood why, as Erik wonders, it has to belong to the difficult LL(1)
class of grammars, or why the language and metalanguage have to be
different, or why it has to look so much like OS JCL, or why minimization
mechanisms have to exist, and a whole bunch of other things.  And some very
smart/determined people have tried to convince me.

Perhaps SGML is the OS/360 of descriptive markup systems.  Too big, full of
things that in hindsight could have been better.  But here and implemented
and working.  Quite likely the Unix of descriptive markup systems is
running in a back-room somewhere or maybe just a gleam in a hacker's eye.
But I don't think WG8 or anybody else is going to "fix" SGML any more than
IBM "fixed" OS/360.

Descriptive markup is so much better than any alternative that SGML,
whatever its problems, is still the way to go for any information you care
about keeping around for the long term.  At least when the bright new idea
comes along, you'll have a fighting chance of getting from here to there if
you used SGML.

On one point I strongly disagree with Erik; his put-down of SGML because of
the difficulty of mapping to page or screen presentation.  That's just a
hard problem, partly because the competition (competently typeset
conventional paper pages) is so good.  I don't think it's going to be made
much easier any time soon by improvements to the markup scheme.  But it's
still easier to solve that problem than the ones you get from having your
data locked up in proprietary presentational markup.

Cheers, Tim Bray, Open Text Corporation

PS: Reactions to the DSSSL draft from those in the know would be really
    welcome in this group.
</message>
<message id="<ogawa.1129568071H@news.teleport.com>" date="2988169231">
Newsgroups: comp.text.sgml
Date: 10 Sep 1994 07:00:31 UT
From: Arthur Ogawa \<ogawa@teleport.com>
Organization: TeX Consultants
Message-ID: \<ogawa.1129568071H@news.teleport.com>
References: <34olar$cms@news.delphi.com>
Subject: Re: SGML FAQ

[Jeffrey McArthur]

|                                    The
|                         Un-Official, Non-Sanctioned
|                         Frequently Asked Questions
|                                    List
|                               Version 0.0.0

My kudos also to Jeffrey, and to Bob Agnew.  Let's all pitch in and help
flesh this thing out.  Newbie posts are rather common here, so there's a
real need for the FAQ.

-- 
Arthur Ogawa, TeX Consultants, Kaweah CA 93237-0051
Ph: +1 209 561 4585, FAX: -4584
PGP Key: finger -l ogawa@teleport.com
</message>
<message id="<34sapj$gb1@news.delphi.com>" date="2988175636">
Newsgroups: comp.text.sgml
Date: 10 Sep 1994 08:47:16 UT
From: Jeffrey McArthur \<j_mcarthur@bix.com>
Organization: ATLIS Publishing
Message-ID: <34sapj$gb1@news.delphi.com>
Subject: Re: SGML and its enemies

[Tim Bray]

|   On one point I strongly disagree with Erik; his put-down of SGML
|   because of the difficulty of mapping to page or screen presentation.
|   That's just a hard problem, partly because the competition (competently
|   typeset conventional paper pages) is so good.  I don't think it's going
|   to be made much easier any time soon by improvements to the markup
|   scheme.  But it's still easier to solve that problem than the ones you
|   get from having your data locked up in proprietary presentational
|   markup.

I disagree.  Having developed both page and screen presentation systems, I
can tell you, based on my experience, that improvements in the markup
scheme can make the job much, much easier.

-- 
    Jeffrey M\\kern-.05em\\raise.5ex\\hbox{\\b c}\\kern-.05emArthur
    a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
    phone: +1 301 210 6655
    fax:   +1 301 210 4999
    home:  +1 410 290 6935

The opinions express are mine.  They do not reflect the opinions of my
employer.  My access to the Internet is not paid for by my employer.
</message>
<message id="<19940910.4934@naggum.no>" date="2988191090">
Newsgroups: comp.text.sgml
Date: 10 Sep 1994 13:04:50 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940910.4934@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca>
Subject: Re: SGML and its enemies

[Tim Bray]

|   This doesn't mean that SGML's design is the one for the future.

the crux of my point -- that if we are planning for information longevity
with SGML, it'd better _be_ there 10, 20, 50 years down the road, and be
deemed worth implemented by people who need it.  the cost of implementing
SGML is so high that this itself is a strong argument _against_ SGML, and
thus by implication, against information longevity (because SGML is the
only good solution to this problem so far), and _this_ is the tragedy that
I'm trying to prevent by making SGML palatable to the computer scientists
and system designers, and to make it possible to use existing tools to
build SGML systems, rather than having to build them from scratch because
SGML is gratuitously "different".  that SGML today has a virtual monopoly
on the good ideas should never be used as an excuse not to be better than
any real or imagined competition.

one of the battle cries of SGML is that you pay a little more up front and
get more back later.  but when you pay up front all the time and "later"
remains just that, and you realize that because of a number of stupid
misfeatures, you will _continue_ to pay more up front for SGML than for any
competing solution, you need a foresight that is hard to find in today's
corporate culture to really stick with SGML for the next 50 years.

the bright spots of light today are those who look ahead and are willing to
take the costs.  I want to make sure that they will never have to fight to
keep using SGML because of mounting costs.  as we know, getting "down" from
SGML to something else is relatively cheap and easy, but getting "up", can
be horribly expensive.  this means that we need more to convince people to
stay with SGML than some juicy carrot on a long stick, or they will at most
surf on SGML until something better (cheaper, etc) comes along.  this I
only see possible if can make SGML a data format _and_ computer science
standard that people can apply without wasting months of their lives
banging their heads against the walls of the specification.

|   But I don't think WG8 or anybody else is going to "fix" SGML any more
|   than IBM "fixed" OS/360.

ISO committees have a mandate from the national bodies that makes this more
than possible.  some features in SGML are badly written first drafts, while
others are polished final copy.  I have no intent to mess with the final
copy, but the first drafts do need attention, and they needed it four years
ago.  the committee has been stalling for so long that we need to make
these things happen in an orderly fashion, instead of a vote to withdraw
the entire SGML standard at one of the five-year reviews.

|   Descriptive markup is so much better than any alternative that SGML,
|   whatever its problems, is still the way to go for any information you
|   care about keeping around for the long term.  At least when the bright
|   new idea comes along, you'll have a fighting chance of getting from
|   here to there if you used SGML.

agreed, but if you can't get there from here because the costs of using
SGML outweigh the costs of going for something else for the foreseeable
future, then what?  the world doesn't stand still, and an inflexible,
almost brittle, standard such as SGML doesn't withstand pressure well.

|   On one point I strongly disagree with Erik; his put-down of SGML
|   because of the difficulty of mapping to page or screen presentation.

no wonder you disagree -- I never said anything remotely _like_ this.  I am
and have been concerned about mapping to _programming_ languages and their
data representation schemes.  please don't confuse cause with effect -- we
_observe_ that SGML formatting systems are a crummy lot, but I fault the
difficulty of mapping SGML constructs into something a programmer can deal
with in some of the world's many suitable programming languages as the
cause for this.  this, of course, has no effect on the rest of the problems
of building a good formatting system, which do, as you say, remain hard.  I
observe that those who are willing to tackle these problems are not willing
to tackle problems of about the same size just to use some standard, at
least not until they've made a billion dollars selling their proprietary
solution.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<DMEGGINS.94Sep10120556@aix1.uottawa.ca>" date="2988201956">
Newsgroups: comp.text.sgml
Date: 10 Sep 1994 16:05:56 UT
From: David Megginson \<dmeggins@aix1.uottawa.ca>
Organization: Department of English, University of Ottawa
Message-ID: \<DMEGGINS.94Sep10120556@aix1.uottawa.ca>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca>
Subject: Re: SGML and its enemies

I believe that SGML may just be a stopping point on the way to structured,
easily-interchangeable information; however, it is also very likely that
SGML will leave its mark on whatever comes along afterwards.  For example,
even non-SGML specialists are starting to get used to seeing \<tag> and
\&entity;, and on the lexical level, these parts of the Reference Concrete
Syntax will probably turn up again in whatever replaces SGML.

The best thing about SGML -- even as it stands now -- is that because
any single SGML document has a clearly-defined lexical and syntactic
structure, it will be a relatively trivial (and entirely automatic)
problem to convert the terabytes of currently-existing SGML documents
to whatever standard comes next.

David
-- 
David Megginson                Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca       Ottawa, Ontario, CANADA  K1N 6N5
dmeggins@acadvm1.uottawa.ca    Phone: +1 613 564 6850 (Office)
ak117@freenet.carleton.ca             +1 613 564 9175 (FAX)
</message>
<message id="<34sn5m$gel@deep.rsoft.bc.ca>" date="2988203638">
Newsgroups: comp.text.sgml
Date: 10 Sep 1994 16:33:58 UT
From: Tim Bray \<a07893@giant.rsoft.bc.ca>
Organization: MIND LINK! Communications Corp., Langley, BC, Canada
Message-ID: <34sn5m$gel@deep.rsoft.bc.ca>
Subject: Why have DTD's?

At the HyTime workshop in Vancouver, Wayne Wohler presented a report on a
project he'd done to produce a DTD for DTD's.  I can't stop thinking about
this.  Suppose I had one of these, either one that captured the DTD syntax
directly, or one that used reference concrete and then had a perfect 2-way
filter.  I forget which way Wayne was going, but he said he thought he'd
captured the whole standard.

Then I could edit DTD's with an SGML editor.  I'd only have one language
class to deal with.  It would be easier for people to learn SGML.  The more
I think about this, the more it seems like a good idea.

Am I missing something obvious?  Anyone else doing this?

Cheers, Tim Bray, Open Text Corporation
</message>
<message id="<chet.97.002D8176@lds.com>" date="2988287313">
Newsgroups: comp.text.sgml
Date: 11 Sep 1994 15:48:33 UT
From: Chet Ensign \<chet@lds.com>
Organization: Logical Design Solutions, Inc.
Message-ID: \<chet.97.002D8176@lds.com>
Subject: Glushko explains 6 factors for SGML success to SGML Forum NY


SGML Forum of New York, Inc.

Minutes of the General Meeting, May 9, 1994.

This article summarizes the May general meeting of the SGML Forum of New
York.  The meeting was held on May 9, 1994 at McGraw-Hill Inc, New York.


Presentations

Robert Glushko, Passage Systems

After the announcements, Bob Glushko of Passage Systems took the floor to
tell us what organizational factors spell "success" when migrating to SGML.
His ideas developed from those he originally expressed in "7 Ways to Make
Sure Your Hypertext Project Fails," an article he published in the Society
of Technical Communications Journal.  Bob has found that the ideas, which
originally applied only to hypertext projects, are equally applicable when
the topic is SGML.

The key question is, can you predict, going in, whether or not a company
will succeed in adopting SGML.  Bob believes that you can, based on his
experiences with a number of companies over a number of years.  "Just think
of this as a kind of composite case study," he suggested.


First, a Deity's-Eye View of the Migration

Bob asserted that by now, just about everybody who has heard of it accepts
the notion that at some future date, having brought SGML into their
operation will have been a good thing to do.  The benefits of having
documents stored in a structured, non-proprietary markup scheme are clear.

The question is; is SGML a good thing to do now?  Because whereas sticking
with proprietary, non-SGML systems will continue to provide incremental
improvements in productivity, adopting SGML is going to provoke an initial
drop in benefits before you start to realize its advantages.  "You're going
to have to say to yourself; 'For some period of time, I'll be worse off.
It may not be a long time, but do I want to take that hit now?'"  He
graphed the relationship between moving forward with SGML and non-SGML
systems like this.


BENEFITS |                                          *  SGML 
         |                                        .
         |                                     .
         |                                   .
         |                                .
         |                             . 
         |                           =      =  =  = *  Non-SGML
         |                     =   .
         |              =      .
         |      *  =      .
         |       . . .  
         +-----------------------------------------------  
TIME 
               Now                                  Later

The transistion costs are inevitable because you are talking about
migrating from a less demanding medium (the non-SGML products) to a more
demanding medium (SGML).  There's going to be a drop in productivity while
people learn new ways of doing things.  There's going to be delays while
old documents are converted and brought up-to-date.  In any document
production system, there's no escaping the dependencies between:

  * The extent and explicitness of the markup,

  * The processes required to create that markup, and 

  * The functionality of the software that exploits the markup.

In each of these areas, SGML systems are more demanding than proprietary,
non-SGML systems.  "Many SGML projects fail," Bob said; "because the
difficulties in reaching its benefits are not made clear up front.

So, even though moving to SGML is a good idea (because once you get past
that transition point, the benefits keep on increasing), many people say;
"Not on my shift.  Why should I be the one who pays for the long term
benefits to the company?"  Instead, they are inclined to say; "We'll wait
and make the jump when it gets easier."  They hope to avoid that initial
decrease in productivity, etc., and reach the transition point painlessly.


BENEFITS |                                          *  SGML 
         |                                        .
         |                                     .
         |                                   .
         |                                .
         |                             . 
         |                           .=  =  =  =  = *  Non-SGML
         |                     .=   
         |              .=   
         |      *  .=    
         |      
         +-----------------------------------------------  
TIME 
               Now                                  Later


Bob said; "Uh-uh.  Can't be done."  All that does is delay the initial pain
of transition.  You spend more time learning bad habits, creating pretty
documents with non-standard layouts and styles, and you put off the
inevitable reckoning with the demands for consistency and structure that
SGML depends upon.

If you are not going to adopt SGML now, Bob suggested that you should at
least start practicising good habits to make the transition easier and
shorter.  Start training your writers to think about documents in a more
structured way.  Start using a simple set of consistent formatting and
encourage people to stick to those structures.


The Respective Roles of Technology and Process

People typically have one of three views of SGML.  First, there's the tool
view: "How many of you want to use SGML because you want to use
'Hyper-Multi-Super-Dyna-Frame-Text- Card-Explorer-Mosaic-Book'?" Bob asked.

Then there's the media view: You want to adopt SGML because it's the format
that will give you tightly coupled, totally interactive text, graphics,
audio, video, animation, virtual reality etc., etc.

Finally, there's the process view.  This view says that SGML is a way to
deliver a reliable, structured, end-to-end authoring and production system
for use, reuse and more reuse of your documents.  This one's the dull view.

None of these views, taken alone, is sufficient to support the move to
SGML.  The truth, for every organization, is that adopting SGML involves
evolution in both its technology and process dimensions.  What kinds of
change an organization faces, and their chances of success, depend on the
where they stand in the technology/process framework.


High     |    S                                      SGML 
         |                                           VISION
         |                                       
         |                                  
PROCESS  |                                
TECH-    |                            
NOLOGY   |                           
         |                           
         |        
         |                  
Low      |    F                                      J
         +-----------------------------------------------  
              Low                                    High  

Bob said that whenever he is asked to help a company move to SGML, he
starts by analyzing it's "capabilities culture" -- where the company fits
in the technology/process framework:

  * S's are the "Suits."  These are conservative, techno-averse
    organizations that have a book and do everything by it.  They are low
    on the technology-evolution scale but advanced in the art of evolving
    procedures and processes and living by them.

  * J's are the "Jeans."  Their first love in life is to hang-ten on the
    bleeding edge of the technology but they don't much care for rules.
    They are way on down the technology highway, and they don't intend to
    let things like rulebooks spoil the fun.

  * F's are the "Furs."  They are scared of both process and technology and
    inclined to live in caves.

The "capabilities culture" affects the decision about how to migrate to
SGML and a company's chances of success.  Ideally, the migration path for
any of these groups would be a straight line.  "Suits" would just buy some
technology.  "Jeans" would learn to live with some standards and
procedures.  "Furs" would have to come out of their caves and smell the
roses.

But the actual migration path always involves some disruption to your way
of doing things.  "New technology always destabilizes existing processes,"
Bob said; "and the less 'process' you have in place, the bigger the
disruption will be."  The "Suits" will experience a process disruption,
because even though they have advanced procedures, SGML always causes
processes to be redefined and changed.  The "Jeans" will find that life
gets harder, because SGML makes demands on their free-form life style.
They'll be inclined to try it and then drop it.  In fact, Bob said; "I have
never seen SGML brought into a company by the research teams."

Bob asserted that it is the companies further along with process evolution
that tend to succeed at SGML because procedures act as a "safety net."  The
better thought out and implemented the process, the more independent it
tends to be of the technology and the less impact the technology is likely
to have on it.


The Key Factors for Success

Having helped a number of companies tackle SGML and electronic publishing
over the years, Bob has been in the position to observe both successes and
failures.  Out of that experience, he has concluded that 6 factors affect
whether a company will succeed or fail at adopting SGML.  A company need
not have all 6 in order to succeed.  A factor may not be necessary, a
factor may not be sufficient; but if your organization is low on all of
them -- well, don't bet your paycheck on the outcome.

Bob's factors for success are:

  * Customer-centered project justification and requirements.

  * Explicit publications process.

  * Content-centered process.

  * Standard authoring tools and styles.

  * Low reliance on contract/external authors.

  * Mechanisms and policies for systematic author orientation & training,
    technology evaluation.


Customer-Centered Justification

Bob said that if you answer the question "Why do you want to use SGML?"
with responses like; "Our competition is doing it" or "We need to go to
CD-ROM so we can cut our printing costs" then you are probably getting off
on the wrong foot.

On the other hand, if your response starts with a phrase like; "We've
surveyed our customers and they want..." then you are more likely to be
doing it for reasons that relate directly to the heart of your business.
Customer-centered initiatives show the following characteristics:

  * There's less incentive to cut corners than when the project is driven
    solely by cost-reduction.

  * It is easier to sustain internal support and resources over the long
    term.

  * The systems are more likely to be based on the documents the customers
    actually use, rather than the ones that happened to be lying around.

  * You'll be less likely to accept small-scale demos as
    "proof-of-concept."


Explicit Publications Process

Look around.  Is there a "Getting Started" handbook for new authors?  Is
there a published style guide or list of the corporate standards?  Does it
get used?  If so, then you have a documented publishing process that people
are likely to follow.

Explicit publications processes are important because they help avoid
wasted or duplicated efforts.  They also minimize cultural clashes between
different groups in your organization because the corporate goals are
visible.

Explicit publications processes also document the flow of information; its
source, who does what to it, and where it finally goes.  It becomes much
easier to locate problems in the publishing cycle when you know exactly
where to look for them.

You must, Bob said, design your process to synchronize the technological
and the organizational boundaries.  "If your writers just throw their stuff
over the wall to some place else, without knowing what effect their stuff
has on the people who have to take it from there, then they have no
motivation to become more aware of the structure of the documents they are
creating."  An explicit publishing process makes the impact of each
person's job clear.


Content-Centered Process

"A print orientation hides a lot of sins in the authoring process," Bob
declared.  When writers are only concerned with getting a document to
"print right", they aren't being concerned with whether it is structured
right -- or whether the content is making sense. In the extreme case, this
leads to "Macbeth Multimedia" -- titles that are full of sound-bites and
fury but not much else.


Standard Authoring Tools and Styles

An organization that has at least standardized on a common set of tools is
in a better position to migrate to SGML than one where people use a
potpourri of word processors and DTP programs.  The format conversion
problem is simplified.  Also, it indicates that the organization has
already recognized the value of standards.

Common tool sets also get people into the habit of learning from one
another.  It creates an open environment where people support one another
-- and that has the effect of minimizing the impact of technology change.


Low Reliance on Outsiders

Companies that rely heavily on temporary authors are not investing in their
employees.  And the employees are not investing in them.  "People who
aren't part of the culture don't have the time or the reason to learn how
the organization works and figure out how they can make a better
contribution."

When a company is investing in its employees, it is easier to justify
investing in new tools and training -- key elements in adopting SGML.  It
is also easier to provide incentives that encourage the writers to focus on
structure awareness and compliance to the standards.


Mechanisms for Author Training and Technology Evaluation

Where there are explicit standards and policies regarding training,
orientation and evaluation, there is less fear of short-term transition
costs.  The company's culture is more likely to focus on and respond to
benefits and successes than to worry about the impact change will have on
the present way of doing things.

In summary, to successfully migrate to SGML:

  * Identify your customers and find out what they really need.

  * Develop a strategy that's consistent with your "capabilities culture."

  * Make effective use of the installed base of technology and process that
    you already have in place.

  * Take small steps and profit from the benefits that you garner along the
    way.

  * Identify and remedy technology and procedural deficits in your
    organization.

  * Synchronize the processes and technologies across the organization's
    boundaries.

  * Re-engineer the publications process.

  * Embody your standards in the supporting technology. As Bob put it; "The
    technology should be invisible."

The floor was then opened to questions.

In response to a question asking what you do when your company's job is
working with outside authors, Bob said that you can try to find ways to
encourage them to do things in a more structured, organized fashion.  For
example, you might give them a template and tell them that if they use it,
you can cut the publishing time in half.  Or maybe you can alter the
royalty scheme.

Since the Jeans are never successful at implementing SGML, another audience
member asked, what do you say about the WWW folks?  Bob replied that they
are mainly groups of academics doing projects for each other.  "There
wouldn't be such an explosion in HTML and Mosaic consulting services if
this stuff really worked in a production environment."

Bob then stressed that he wasn't saying the R\&D people didn't belong.  But,
he said, they are there to do a different job.  "Jeans are supposed to fool
around and find new technology.  They get rewarded for hacking a path
through the jungle.  SGML is about paving the streets and building the
MacDonalds along the side."

Bob Glushko is Chief Scientist for Passage Systems, Inc., a provider of
SGML training and consulting services, and maker of PassagePRO, document
management software for SGML-based publishing.  Bob can be reached at
"glushko@passage.com" on the internet, or at +1 415 390 0911.



Vendor Presentation

Robert Glushko, PassagePRO

After a brief intermission, Bob continued the meeting with a talk about his
company, Passage SYSTEMS, INC., and their product PassagePRO(TM).
Unfortunately, the Sun workstation on which PassagePRO was going to be
demonstrated malfunctioned and Bob was only able to show us slides of
screens.

Passage Systems provides products, services and system integration for
companies adopting electronic publishing, especially those moving to SGML.
Their main office is in Mountain View, CA, with additional offices in
Pittsburgh, PA and Austin, TX.

Passage Systems offers:

  * PassagePRO, an SGML-based document management and production system.

  * Templates, DTDs and other conversion tools to support transformation to
    SGML-based publishing in organizations using non-SGML tools.

  * Conversion of legacy documents.

  * Consultation services, including customer surveys, capability
    assessments, document analysis, process re- engineering and migration
    strategies.

  * Training.

PassagePRO is an SGML-based document management and publication production
system designed around three functional parts: the workflow management
component, the production management component and the document management
component.  PassagePRO, Bob explained, models the physical work of
producing documents in an organization by taking the complex processes of
conversion, document routing, version- control, etc.  and abstracting them
to an on-screen representation of the entire publishing cycle.  Each person
in the production cycle can focus on the jobs where they add value to the
finished product (such as researching, writing and reviewing) and let
PassagePRO handle the mechanical chores.

Defining Processes

Bob defined work flow as "tasks done by humans to satisfy a development
process."  For example, a typical publishing work flow could consist of the
tasks:

[Research]  ->  [Write]  ->  [Edit]  ->  [Review]  ->  [Release]

Each of these tasks would be assigned to one or more people who would be
responsible for completing them.

Bob defined production flow as "tasks done by computers to create needed
outputs."  For example, an organization might have three production
processes producing three different final documents:

  \<Filter WP to           \<Filter FrameMaker    \<Compose text
   Rainbow DTD>            to DocBook DTD>       to PostScript>
      |                         |                     |
  \<Transform to           \<Validate DocBook     \<Compose graphics
   DocBook DTD>            SGML>                  to PostScript>
      |                         |                     |
  \<Build book             \<Build book           \<Print book>
   for DynaText>           for OLIAS>

PassagePRO makes it possible for an organization to integrate workflow and
production flow.  Authors can preview the final result of their work at any
stage of the project.

         ... [Write]  ->  [Edit]  ->  [Review]  ... 
                |                         |
           \<Filter WP to             \<Filter WP to 
            Rainbow DTD>              Rainbow DTD>  
                |                         | 
           \<Transform to            \<Transform to   
            DocBook DTD>             DocBook DTD>  
                |                         |  
           \<Build book              \<Build book     
            for DynaText>            for DynaText>  

This has an important benefit for organizations; authors no longer simply
"throw their text over to wall" to some distant group charged with doing
the final production.  In the old model, before production flow could be
automated, writers didn't worry about how their work affected production
staff because they never got to know them.  With PassagePRO, writers can
see how their material will look in the final document at any point in the
production cycle.  Writers now have a supportive production environment
that helps them "get it right from the start."

Because of PassagePRO's client/server design and extensive programming
"hooks" for external tools, it can also be used to tie existing tools
together to create automated production flows as well as to integrate new
tools into the environment.  This is particularly helpful where
organizations already have proprietary, non-SGML writing systems in place.
Passage SYSTEMS has designed PassagePRO to enable "native SGML authoring
and conversion from traditional word processors to coexist."


An Object-Oriented Production Environment

PassagePRO is built on an object-oriented framework.  An organization's
various tools are registered with the system so that documents "know" what
tools, work flows and production flows (word processors, conversion
programs, DTDs, etc.) are associated with them at any given point in the
production cycle.  This offers a key benefit to the organization; because
documents know what to do with themselves, writers and production staff
don't need to.  Instead of spending time (and hence money) learning and
executing purely physical tasks, they can do work that adds value to the
final product.

PassagePRO includes a "Document Debugger" function to help writers learn to
think about their documents structurally.  This is important where writers
are working in word processing programs that must be converted into SGML.
PassagePRO is set up so that error messages come up in language that the
writer can understand.

The debug function typically works like this.  An author sends his word
processed document off for production by selecting the production process
off a menu.  A conversion program is then run to convert the document into
SGML and parse it.  If the file parses correctly, the document continues on
to the next step in the production flow.

If the document has structural errors, an error listing is returned to the
writer's display.  The writer then knows that some things have to be fixed
in his document.  When he selects an error off the list, the document is
opened in the proper program to the point where the error occurs.  The
writer is also shown an "advice field."  The advice in the field can be
information written when the system was first set up, or it can be advice
added by other writers who have previously made this same kind of mistake.
Thus, as writers work with documents in the production environment, they
can enter and share a growing body of knowledge that future writers can
profit from.


Document Management

PassagePRO sits on top of a database that tracks information about your
documents.  Writers can always answer questions like "who else is affected
if I change this graphic?"  The writer can also query attributes about any
objects to which he has access.  PassagePRO can show structural views of
the document providing information about who else and what else affects
this particular piece the writers is working on.

The architecture of the system is a client/server database.  Currently,
this transaction layer runs on a Versant OODB.  Passage SYSTEMS is also
porting it to Sybase and Bob said that the system was designed from the
start to ease porting to other platforms.

Clients currently run on a number of UNIX platforms and Bob said that they
are building other clients like Windows.  "The best way to get us on your
platform," Bob said; "is to hire us and then we can do the port."

Passage Systems, Inc. is located in Mountain View, California. Their phone
number is +1 415 390 0912. Bob works out of Pittsburgh. He can be reached
at +1 412 362 3356 or at "glushko@passage.com" on the internet.
</message>
<message id="<chet.97.002D8176@lds.com>" date="2988287313">
Newsgroups: comp.text.sgml
Date: 11 Sep 1994 15:48:33 UT
From: Chet Ensign \<chet@lds.com>
Organization: Logical Design Solutions, Inc.
Message-ID: \<chet.97.002D8176@lds.com>
Subject: Glushko explains 6 factors for SGML success to SGML Forum NY


SGML Forum of New York, Inc.

Minutes of the General Meeting, May 9, 1994.

This article summarizes the May general meeting of the SGML Forum of New
York.  The meeting was held on May 9, 1994 at McGraw-Hill Inc, New York.


Presentations

Robert Glushko, Passage Systems

After the announcements, Bob Glushko of Passage Systems took the floor to
tell us what organizational factors spell "success" when migrating to SGML.
His ideas developed from those he originally expressed in "7 Ways to Make
Sure Your Hypertext Project Fails," an article he published in the Society
of Technical Communications Journal.  Bob has found that the ideas, which
originally applied only to hypertext projects, are equally applicable when
the topic is SGML.

The key question is, can you predict, going in, whether or not a company
will succeed in adopting SGML.  Bob believes that you can, based on his
experiences with a number of companies over a number of years.  "Just think
of this as a kind of composite case study," he suggested.


First, a Deity's-Eye View of the Migration

Bob asserted that by now, just about everybody who has heard of it accepts
the notion that at some future date, having brought SGML into their
operation will have been a good thing to do.  The benefits of having
documents stored in a structured, non-proprietary markup scheme are clear.

The question is; is SGML a good thing to do now?  Because whereas sticking
with proprietary, non-SGML systems will continue to provide incremental
improvements in productivity, adopting SGML is going to provoke an initial
drop in benefits before you start to realize its advantages.  "You're going
to have to say to yourself; 'For some period of time, I'll be worse off.
It may not be a long time, but do I want to take that hit now?'"  He
graphed the relationship between moving forward with SGML and non-SGML
systems like this.


BENEFITS |                                          *  SGML 
         |                                        .
         |                                     .
         |                                   .
         |                                .
         |                             . 
         |                           =      =  =  = *  Non-SGML
         |                     =   .
         |              =      .
         |      *  =      .
         |       . . .  
         +-----------------------------------------------  
TIME 
               Now                                  Later

The transistion costs are inevitable because you are talking about
migrating from a less demanding medium (the non-SGML products) to a more
demanding medium (SGML).  There's going to be a drop in productivity while
people learn new ways of doing things.  There's going to be delays while
old documents are converted and brought up-to-date.  In any document
production system, there's no escaping the dependencies between:

  * The extent and explicitness of the markup,

  * The processes required to create that markup, and 

  * The functionality of the software that exploits the markup.

In each of these areas, SGML systems are more demanding than proprietary,
non-SGML systems.  "Many SGML projects fail," Bob said; "because the
difficulties in reaching its benefits are not made clear up front.

So, even though moving to SGML is a good idea (because once you get past
that transition point, the benefits keep on increasing), many people say;
"Not on my shift.  Why should I be the one who pays for the long term
benefits to the company?"  Instead, they are inclined to say; "We'll wait
and make the jump when it gets easier."  They hope to avoid that initial
decrease in productivity, etc., and reach the transition point painlessly.


BENEFITS |                                          *  SGML 
         |                                        .
         |                                     .
         |                                   .
         |                                .
         |                             . 
         |                           .=  =  =  =  = *  Non-SGML
         |                     .=   
         |              .=   
         |      *  .=    
         |      
         +-----------------------------------------------  
TIME 
               Now                                  Later


Bob said; "Uh-uh.  Can't be done."  All that does is delay the initial pain
of transition.  You spend more time learning bad habits, creating pretty
documents with non-standard layouts and styles, and you put off the
inevitable reckoning with the demands for consistency and structure that
SGML depends upon.

If you are not going to adopt SGML now, Bob suggested that you should at
least start practicising good habits to make the transition easier and
shorter.  Start training your writers to think about documents in a more
structured way.  Start using a simple set of consistent formatting and
encourage people to stick to those structures.


The Respective Roles of Technology and Process

People typically have one of three views of SGML.  First, there's the tool
view: "How many of you want to use SGML because you want to use
'Hyper-Multi-Super-Dyna-Frame-Text- Card-Explorer-Mosaic-Book'?" Bob asked.

Then there's the media view: You want to adopt SGML because it's the format
that will give you tightly coupled, totally interactive text, graphics,
audio, video, animation, virtual reality etc., etc.

Finally, there's the process view.  This view says that SGML is a way to
deliver a reliable, structured, end-to-end authoring and production system
for use, reuse and more reuse of your documents.  This one's the dull view.

None of these views, taken alone, is sufficient to support the move to
SGML.  The truth, for every organization, is that adopting SGML involves
evolution in both its technology and process dimensions.  What kinds of
change an organization faces, and their chances of success, depend on the
where they stand in the technology/process framework.


High     |    S                                      SGML 
         |                                           VISION
         |                                       
         |                                  
PROCESS  |                                
TECH-    |                            
NOLOGY   |                           
         |                           
         |        
         |                  
Low      |    F                                      J
         +-----------------------------------------------  
              Low                                    High  

Bob said that whenever he is asked to help a company move to SGML, he
starts by analyzing it's "capabilities culture" -- where the company fits
in the technology/process framework:

  * S's are the "Suits."  These are conservative, techno-averse
    organizations that have a book and do everything by it.  They are low
    on the technology-evolution scale but advanced in the art of evolving
    procedures and processes and living by them.

  * J's are the "Jeans."  Their first love in life is to hang-ten on the
    bleeding edge of the technology but they don't much care for rules.
    They are way on down the technology highway, and they don't intend to
    let things like rulebooks spoil the fun.

  * F's are the "Furs."  They are scared of both process and technology and
    inclined to live in caves.

The "capabilities culture" affects the decision about how to migrate to
SGML and a company's chances of success.  Ideally, the migration path for
any of these groups would be a straight line.  "Suits" would just buy some
technology.  "Jeans" would learn to live with some standards and
procedures.  "Furs" would have to come out of their caves and smell the
roses.

But the actual migration path always involves some disruption to your way
of doing things.  "New technology always destabilizes existing processes,"
Bob said; "and the less 'process' you have in place, the bigger the
disruption will be."  The "Suits" will experience a process disruption,
because even though they have advanced procedures, SGML always causes
processes to be redefined and changed.  The "Jeans" will find that life
gets harder, because SGML makes demands on their free-form life style.
They'll be inclined to try it and then drop it.  In fact, Bob said; "I have
never seen SGML brought into a company by the research teams."

Bob asserted that it is the companies further along with process evolution
that tend to succeed at SGML because procedures act as a "safety net."  The
better thought out and implemented the process, the more independent it
tends to be of the technology and the less impact the technology is likely
to have on it.


The Key Factors for Success

Having helped a number of companies tackle SGML and electronic publishing
over the years, Bob has been in the position to observe both successes and
failures.  Out of that experience, he has concluded that 6 factors affect
whether a company will succeed or fail at adopting SGML.  A company need
not have all 6 in order to succeed.  A factor may not be necessary, a
factor may not be sufficient; but if your organization is low on all of
them -- well, don't bet your paycheck on the outcome.

Bob's factors for success are:

  * Customer-centered project justification and requirements.

  * Explicit publications process.

  * Content-centered process.

  * Standard authoring tools and styles.

  * Low reliance on contract/external authors.

  * Mechanisms and policies for systematic author orientation & training,
    technology evaluation.


Customer-Centered Justification

Bob said that if you answer the question "Why do you want to use SGML?"
with responses like; "Our competition is doing it" or "We need to go to
CD-ROM so we can cut our printing costs" then you are probably getting off
on the wrong foot.

On the other hand, if your response starts with a phrase like; "We've
surveyed our customers and they want..." then you are more likely to be
doing it for reasons that relate directly to the heart of your business.
Customer-centered initiatives show the following characteristics:

  * There's less incentive to cut corners than when the project is driven
    solely by cost-reduction.

  * It is easier to sustain internal support and resources over the long
    term.

  * The systems are more likely to be based on the documents the customers
    actually use, rather than the ones that happened to be lying around.

  * You'll be less likely to accept small-scale demos as
    "proof-of-concept."


Explicit Publications Process

Look around.  Is there a "Getting Started" handbook for new authors?  Is
there a published style guide or list of the corporate standards?  Does it
get used?  If so, then you have a documented publishing process that people
are likely to follow.

Explicit publications processes are important because they help avoid
wasted or duplicated efforts.  They also minimize cultural clashes between
different groups in your organization because the corporate goals are
visible.

Explicit publications processes also document the flow of information; its
source, who does what to it, and where it finally goes.  It becomes much
easier to locate problems in the publishing cycle when you know exactly
where to look for them.

You must, Bob said, design your process to synchronize the technological
and the organizational boundaries.  "If your writers just throw their stuff
over the wall to some place else, without knowing what effect their stuff
has on the people who have to take it from there, then they have no
motivation to become more aware of the structure of the documents they are
creating."  An explicit publishing process makes the impact of each
person's job clear.


Content-Centered Process

"A print orientation hides a lot of sins in the authoring process," Bob
declared.  When writers are only concerned with getting a document to
"print right", they aren't being concerned with whether it is structured
right -- or whether the content is making sense. In the extreme case, this
leads to "Macbeth Multimedia" -- titles that are full of sound-bites and
fury but not much else.


Standard Authoring Tools and Styles

An organization that has at least standardized on a common set of tools is
in a better position to migrate to SGML than one where people use a
potpourri of word processors and DTP programs.  The format conversion
problem is simplified.  Also, it indicates that the organization has
already recognized the value of standards.

Common tool sets also get people into the habit of learning from one
another.  It creates an open environment where people support one another
-- and that has the effect of minimizing the impact of technology change.


Low Reliance on Outsiders

Companies that rely heavily on temporary authors are not investing in their
employees.  And the employees are not investing in them.  "People who
aren't part of the culture don't have the time or the reason to learn how
the organization works and figure out how they can make a better
contribution."

When a company is investing in its employees, it is easier to justify
investing in new tools and training -- key elements in adopting SGML.  It
is also easier to provide incentives that encourage the writers to focus on
structure awareness and compliance to the standards.


Mechanisms for Author Training and Technology Evaluation

Where there are explicit standards and policies regarding training,
orientation and evaluation, there is less fear of short-term transition
costs.  The company's culture is more likely to focus on and respond to
benefits and successes than to worry about the impact change will have on
the present way of doing things.

In summary, to successfully migrate to SGML:

  * Identify your customers and find out what they really need.

  * Develop a strategy that's consistent with your "capabilities culture."

  * Make effective use of the installed base of technology and process that
    you already have in place.

  * Take small steps and profit from the benefits that you garner along the
    way.

  * Identify and remedy technology and procedural deficits in your
    organization.

  * Synchronize the processes and technologies across the organization's
    boundaries.

  * Re-engineer the publications process.

  * Embody your standards in the supporting technology. As Bob put it; "The
    technology should be invisible."

The floor was then opened to questions.

In response to a question asking what you do when your company's job is
working with outside authors, Bob said that you can try to find ways to
encourage them to do things in a more structured, organized fashion.  For
example, you might give them a template and tell them that if they use it,
you can cut the publishing time in half.  Or maybe you can alter the
royalty scheme.

Since the Jeans are never successful at implementing SGML, another audience
member asked, what do you say about the WWW folks?  Bob replied that they
are mainly groups of academics doing projects for each other.  "There
wouldn't be such an explosion in HTML and Mosaic consulting services if
this stuff really worked in a production environment."

Bob then stressed that he wasn't saying the R\&D people didn't belong.  But,
he said, they are there to do a different job.  "Jeans are supposed to fool
around and find new technology.  They get rewarded for hacking a path
through the jungle.  SGML is about paving the streets and building the
MacDonalds along the side."

Bob Glushko is Chief Scientist for Passage Systems, Inc., a provider of
SGML training and consulting services, and maker of PassagePRO, document
management software for SGML-based publishing.  Bob can be reached at
"glushko@passage.com" on the internet, or at +1 415 390 0911.



Vendor Presentation

Robert Glushko, PassagePRO

After a brief intermission, Bob continued the meeting with a talk about his
company, Passage SYSTEMS, INC., and their product PassagePRO(TM).
Unfortunately, the Sun workstation on which PassagePRO was going to be
demonstrated malfunctioned and Bob was only able to show us slides of
screens.

Passage Systems provides products, services and system integration for
companies adopting electronic publishing, especially those moving to SGML.
Their main office is in Mountain View, CA, with additional offices in
Pittsburgh, PA and Austin, TX.

Passage Systems offers:

  * PassagePRO, an SGML-based document management and production system.

  * Templates, DTDs and other conversion tools to support transformation to
    SGML-based publishing in organizations using non-SGML tools.

  * Conversion of legacy documents.

  * Consultation services, including customer surveys, capability
    assessments, document analysis, process re- engineering and migration
    strategies.

  * Training.

PassagePRO is an SGML-based document management and publication production
system designed around three functional parts: the workflow management
component, the production management component and the document management
component.  PassagePRO, Bob explained, models the physical work of
producing documents in an organization by taking the complex processes of
conversion, document routing, version- control, etc.  and abstracting them
to an on-screen representation of the entire publishing cycle.  Each person
in the production cycle can focus on the jobs where they add value to the
finished product (such as researching, writing and reviewing) and let
PassagePRO handle the mechanical chores.

Defining Processes

Bob defined work flow as "tasks done by humans to satisfy a development
process."  For example, a typical publishing work flow could consist of the
tasks:

[Research]  ->  [Write]  ->  [Edit]  ->  [Review]  ->  [Release]

Each of these tasks would be assigned to one or more people who would be
responsible for completing them.

Bob defined production flow as "tasks done by computers to create needed
outputs."  For example, an organization might have three production
processes producing three different final documents:

  \<Filter WP to           \<Filter FrameMaker    \<Compose text
   Rainbow DTD>            to DocBook DTD>       to PostScript>
      |                         |                     |
  \<Transform to           \<Validate DocBook     \<Compose graphics
   DocBook DTD>            SGML>                  to PostScript>
      |                         |                     |
  \<Build book             \<Build book           \<Print book>
   for DynaText>           for OLIAS>

PassagePRO makes it possible for an organization to integrate workflow and
production flow.  Authors can preview the final result of their work at any
stage of the project.

         ... [Write]  ->  [Edit]  ->  [Review]  ... 
                |                         |
           \<Filter WP to             \<Filter WP to 
            Rainbow DTD>              Rainbow DTD>  
                |                         | 
           \<Transform to            \<Transform to   
            DocBook DTD>             DocBook DTD>  
                |                         |  
           \<Build book              \<Build book     
            for DynaText>            for DynaText>  

This has an important benefit for organizations; authors no longer simply
"throw their text over to wall" to some distant group charged with doing
the final production.  In the old model, before production flow could be
automated, writers didn't worry about how their work affected production
staff because they never got to know them.  With PassagePRO, writers can
see how their material will look in the final document at any point in the
production cycle.  Writers now have a supportive production environment
that helps them "get it right from the start."

Because of PassagePRO's client/server design and extensive programming
"hooks" for external tools, it can also be used to tie existing tools
together to create automated production flows as well as to integrate new
tools into the environment.  This is particularly helpful where
organizations already have proprietary, non-SGML writing systems in place.
Passage SYSTEMS has designed PassagePRO to enable "native SGML authoring
and conversion from traditional word processors to coexist."


An Object-Oriented Production Environment

PassagePRO is built on an object-oriented framework.  An organization's
various tools are registered with the system so that documents "know" what
tools, work flows and production flows (word processors, conversion
programs, DTDs, etc.) are associated with them at any given point in the
production cycle.  This offers a key benefit to the organization; because
documents know what to do with themselves, writers and production staff
don't need to.  Instead of spending time (and hence money) learning and
executing purely physical tasks, they can do work that adds value to the
final product.

PassagePRO includes a "Document Debugger" function to help writers learn to
think about their documents structurally.  This is important where writers
are working in word processing programs that must be converted into SGML.
PassagePRO is set up so that error messages come up in language that the
writer can understand.

The debug function typically works like this.  An author sends his word
processed document off for production by selecting the production process
off a menu.  A conversion program is then run to convert the document into
SGML and parse it.  If the file parses correctly, the document continues on
to the next step in the production flow.

If the document has structural errors, an error listing is returned to the
writer's display.  The writer then knows that some things have to be fixed
in his document.  When he selects an error off the list, the document is
opened in the proper program to the point where the error occurs.  The
writer is also shown an "advice field."  The advice in the field can be
information written when the system was first set up, or it can be advice
added by other writers who have previously made this same kind of mistake.
Thus, as writers work with documents in the production environment, they
can enter and share a growing body of knowledge that future writers can
profit from.


Document Management

PassagePRO sits on top of a database that tracks information about your
documents.  Writers can always answer questions like "who else is affected
if I change this graphic?"  The writer can also query attributes about any
objects to which he has access.  PassagePRO can show structural views of
the document providing information about who else and what else affects
this particular piece the writers is working on.

The architecture of the system is a client/server database.  Currently,
this transaction layer runs on a Versant OODB.  Passage SYSTEMS is also
porting it to Sybase and Bob said that the system was designed from the
start to ease porting to other platforms.

Clients currently run on a number of UNIX platforms and Bob said that they
are building other clients like Windows.  "The best way to get us on your
platform," Bob said; "is to hire us and then we can do the port."

Passage Systems, Inc. is located in Mountain View, California. Their phone
number is +1 415 390 0912. Bob works out of Pittsburgh. He can be reached
at +1 412 362 3356 or at "glushko@passage.com" on the internet.
</message>
<message id="<RAMAN.94Sep11164345@arctic.crl.dec.com>" date="2988305025">
Newsgroups: comp.text.sgml
Date: 11 Sep 1994 20:43:45 UT
From: TV Raman \<raman@arctic.crl.dec.com>
Organization: Digital Cambridge Research Laboratory
Message-ID: \<RAMAN.94Sep11164345@arctic.crl.dec.com>
Subject: SGML/HTML: An obfuscated markup language?

Most of the readers of this group probably read Erik Naggum's excellent
article on comp.text.sgml about the need to make SGML more compliant with
CS terminology and tools.


I agree whole heartedly with Erik, (I owe most of whatever I know about
SGML to his excellent postings on this group over the years) I started
working on the problem of recognizing structure from HTML documents after
having implemented a system that did the same for TeX/LaTeX.  I was hoping
that HTML would be easier to extract structure from.

Far from it, it's been a struggle.  What's worse, most of the SGML tools
seem to be totally incomprehensible.  Every DTD or specification document I
read is littered liberally with ISO standard numbers, (which make no sense
to me) and though I know I should not complain about surface syntax, I find
the syntactic presentation of DTD's extremely difficult to absorb.

Considering that an HTML document represents a fairly simple hierarchical
structure, why not start describing it as such?

This would make the task of writing parsers easier, and also encourage good
HTML.  Currently, the definition of valid HTML is so inaccessible even to
the practicing computer scientist, leave alone the author of a document,
that the only validation being used is "Does Mosaic display this document?"
according to some subjective measure of "correct display".

At present, I have a hard time understanding for example what kinds of
nesting are allowed by a particular DTD, when reading the HTML spec, I just
resorted to the descriptive statements for each of the elements.

I spent considerable time installing/understanding SGMLS a couple of months
ago, and after fighting hard even managed to find a DTD for HTML on the net
as well as the other files necessary to make sgmls parse simple HTML
documents.  But the whole process of running sgmls is so obfuscated, I
can't remember all the things I needed to do now, and after retrieving the
latest DTD just gave up on trying to validate my documents using it and
sgmls as a waste of time.  This state of affairs is frightening!

I may be wrong, but I somehow get the impression that the whole story
regarding SGML/HTML has been made more complicated/obfuscated than it needs
to be.

I know these are radical statements, but something needs to be done to make
SGML/HTML validation and processing more palatable, or we'll have to spend
the rest of our careers retrofitting our documents into kluges like Mosaic.

--Raman 
</message>
<message id="<350969$8n4@sundog.tiac.net>" date="2988320393">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 00:59:53 UT
From: "Keith M. Corbett" \<kmc@specialform.com>
Organization: Special Form Software
Message-ID: <350969$8n4@sundog.tiac.net>
Subject: Q on using sgmls in DOS, managing entities

I'm using sgmls under DOS.  I'm curious to learn how others have setup
their environment, in terms of command files, managing entities,
concatenating input files, etc.

Here's my scheme: I have a top level SGML directory, with a subdirectory
for each DTD.  To parse a document entity, I run a script that requires the
name of the source file (in the current directory) and the doctype.  The
script names the DTD file and file.SGM as input to sgmls.exe, and directs
errors to file.ERR, output to file.SGK.

Actually, my 1st script expected SGML declaration and DTD to be in separate
files; the HTMLPLUS DTD file has concatenated them, so now I have 2
different scripts.

DOS scripting is so primitive! Is there a better way?

I have SGML_PATH set as follows:

  D:\\SGML\\%N\\%N.DTD;D:\\SGML\\ENTITIES\\%N

This is my first stab at resolving public entity references, putting them
in one directory and adding that to SGML_PATH.  Is there a better way?
What might SGML_PATH look like if, say, all DTDs and entities were in the
same directory?

Suggestions?

 -kmc
</message>
<message id="<19940912.4980@naggum.no>" date="2988323702">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 01:55:02 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940912.4980@naggum.no>
Subject: SGML declaration and record boundaries

record boundary processing in SGML has a long history of vigorous debate in
this newsgroup.  this is not about record boundary processing.

rather, it is about the relationship of production 186 "function character
identification" in the SGML declaration with the external world, and some
ramifications of realizing that there this relationship is an illusion.

the Portable Open-ended/Object-oriented/Other-buzzword Entity Manager (or
"POEM"), my pet project for a while, takes the job of converting lines in
the host file system into the RS- and RE-bracketed records that the SGML
parser want to see, very seriously.  in doing so, it has become obvious
that the "function characters" that are listed in the function character
identification clause and the numbers they are assigned have no remaining
relationship with the actual codes in the external entities or in the codes
that are passed between the entity manager and the parser.

specifically, it is not possible to specify that only one character code
serves as line terminator in the host file system, even though this is
frequently the case.  in practice, people who work with SGML files move
them around freely between tools on various machines that use differing
conventions.  Unix systems use LF, Macintosh uses CR, some arcane systems
use the NL code from C1, yet others some other control code, VMS uses
either records or CRLF or LF, and MVS systems use length-encoded records.
yet SGML wants to see uniform records boundaries.  this is a good idea.

the situation is similar to what happens when entities are exhausted.  an
Entity end (Ee) signal was invented to take care of communicating this
situation from entity manager to parser.  some backward systems still use
end of file markers in the data, such as MS-DOS with its ^Z, but the Ee is
a signal, not data character.  how the Ee is supposed to be communicated is
left as an exercise to the implementor, and most did it right without much
trouble.

the trouble starts with the record boundary "characters", and it is that
from reading the standard they actually appear to be characters.  there is
even a special construct to talk about function characters by name, the
"character reference".  there is also a puzzling note on the distinction
between the function characters and a numeric character reference to the
same "character", which is an ordinary data character.  this distinction
has meaning for the record boundary characters, but not for any other
"function characters", because they are not special to the interaction
between the parser and the application program.

clearly, the record boundary "characters" are not characters at all.  they
are signals the same way the Entity end is a signal.

this means that the SGML declaration tries to say something about the
external world, but the standard contradicts this in its description of the
transformation into record boundaries.  the resolution is obvious: the SGML
declaration talks about what the parser will see from the _entity manager_,
and the codes are redundant and immaterial.  this has some interesting and
some disturbing ramifications.

the SGML parser is intentionally layered on top of an entity manager that
takes care of the dirty work of dealing with the file system, or possibly
some other source of entity text.  some simple-minded implementations have
made the entity manager largely a "null layer" that just pushes characters
it reads from "files".  HyTime obsoletes this view in that it requires some
fairly complicated services from the entity manager, such as extracting
substrings from an entity.  the application program is also expected to
talk with the entity manager about objects that the SGML document
references by name.  presumably, notation processors will also want to talk
to the external world through the entity manager.

further analysis of this separation shows that the SGML parser is not even
allowed to make any assumptions about the real world from what it learns
from the entity manager.  the standard doesn't say it outright, but must
have viewed the entity manager as a slave that does the parser's bidding.
if so, it should have said it much, much stronger.  as it stands now, the
entity manager has the freedom to provide the SGML parser with whatever it
wants.  the only responsibility that might tame it is that users won't be
very happy if the entity manager does surprising or useless things.  it
can, and should, however, do some useful things, and it can do more than
just translate lines into records.

exploiting this vague relationship for all it's worth, what follows is a
shameless plug for some of the design ideas in POEM.

POEM allows several "storage systems", such as files as the trivial one,
output from programs and network resources as more advanced ones, and
filter functions such as fragments and concatenations of the output from
other storage systems, line terminator handling, of course, and as many
others as the application desires.  since all of these can be viewed as
functions that operate on an argument list and return a stream from which
characters can be read, it comes with its own language to build such
functions.  this internal language is probably not suited to write system
identifiers in as they appear in external identifiers in entity
declarations, so this led to another (not new) realization: that the system
identifier is a string of characters that only has meaning to the entity
manager, and only if the entity manager chooses does it have a connection
to the external world, possibly through any number of transformations.

POEM provides the mechanisms for the application program to translate any
character string into the internal functional language, such that we're no
longer talking about just a "file name", but a string that when interpreted
turns into a function that takes a string as its argument and returns a
stream that will read the file that the string names, in the trivial case,
but with unlimited potential to do more useful stuff.  we note that all
file systems deal with various handles on their files that have little or
nothing to do with the actual filenames or pathnames that users see, and
that deriving these handles involves much parsing and table lookups.  POEM
brings the same kind of abstraction to SGML, only now with user-defined
parsing functions, and user-defined mapping functions and filters.

this can be used to do many things.  suppose you would like to store all
your SGML documents as the bare document instance, and prefer that the
system find the attendant SGML and document type declarations for you.  you
_could_ make a program that would read the first few characters of an
instance, and produce a complicated specification for the entire document,
but you would rather write a POEM function that would look up the name of
the document type it finds after reading the first few characters of the
entity in a table to find the SGML declaration and the document type
declaration, arrange to read those files, and resume reading the instance
afterwards.

suppose you would like to edit SGML documents (or parts thereof) on a
variety of systems with varying character sets and other conventions.  the
SGML declaration refers to a single character set that applies to all
entities.  this is clearly not useful, but if the entity manager could
arrange for the character sets to be translated into that natural on the
host system where you run the SGML parser, including an SGML declaration
that fits this character set, you could have truly transparent access to
entities across any of the systems you work on.  this is a true application
of the principle of system independence.

another natural function for POEM is to handle the lookups for any public
identifiers and convert them to system identifiers.  this requires a little
more cooperation from the SGML parser, however, so I'll discuss it sometime
later.

more readily applicable is the possibility of shipping entire SGML
documents (remember that an SGML document is a list of entities) as a
single file, and let the entity manager sort out which is which and where
in the file each starts and ends.  the substring facility can be used to do
this, and we would suddenly avoid the hassle of using SDIF or other
unimplemented packaging schemes.

yet, what do you do if you don't have a parser that can talk to POEM?  I've
thought of that.  the SGML parser is now only a user of the services of the
entity manager, and it reads a character stream.  another user could read
the same character stream and produce individual files that can be read by
an SGML parser at a later, separate stage.  this other user could then
change the system and public identifiers to point to the newly created
files instead of the original specifications, thereby subsuming part of the
functionality that was so insufficiently described in SDIF that nobody has
implemented it.

do I think this will revolutionize entity management?  you bet.  do I think
this will largely obsolete the SGML declaration?  yup, that too, since it
does not talk about the real document, only the document as seen through
the filter called the entity manager.  the major, perhaps more disturbing,
ramification of this is, however, that the SGML declaration pretends to say
something about the real world, but cannot possibly do that in practice, so
must be revised to include information that is only relevant to the
document per se, and exclude everything else.

another possibility opening up to us is that we can start experimenting
with a new SGML right away, just by making suitable filter functions
between the files and the parser.  then, when the revised SGML comes out,
we can gradually accept the new SGML directly by the parsers.  this also
goes for HyTime's obscure syntax for element classes.  wrap the parser in a
front-end and a back-end like this, and you can design a whole new
interface that won't know it's SGML unless you ask real hard questions.
call it embedded SGML.

_this_ is the future of SGML (at least as one computer scientist sees it).

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<ogawa.1129730907A@news.teleport.com>" date="2988332067">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 04:14:27 UT
From: Arthur Ogawa \<ogawa@teleport.com>
Organization: TeX Consultants
Message-ID: \<ogawa.1129730907A@news.teleport.com>
Subject: Major SGML bibliography update

The following will be of interest to subscribers to c.t.s:

Date: Sat, 10 Sep 94 14:06:50 MDT
From: "Nelson H. F. Beebe" \<beebe@math.utah.edu>
To: "Andrew Dobrowolski" \<aed@arbortext.com>,
    "Arthur Ogawa" \<ogawa@teleport.com>,
    "Barbara N Beeton" \<bnb@math.ams.org>,
    "Betsy Dale" \<BJD@arbortext.com>,
    "Christina Thiele" \<cthiele@ccs.carleton.ca>,
    "David F Brailsford" \<dfb@Cs.Nott.AC.UK>,
    "David R Evans" \<dre@Cs.Nott.AC.UK>,
    "Edward A. Fox" \<fox@vt.edu>,
    "Eric van Herwijnen" \<ERIC@CERNVM.BITNET>,
    "George D. Greenwade" \<bed_gdg@SHSU.edu>,
    "Jeffrey McArthur" \<j_mcarthur@BIX.com>,
    "Joachim Schrod" \<schrod@iti.informatik.th-darmstadt.de>,
    "Michel Goossens" \<GOOSSENS@crnvma.cern.ch>,
    "Nico Poppelier" \<N.POPPELIER@elsevier.nl>,
    "Terry Allen" \<terry@ora.com>,
    "Robert W. McGaffey" \<mcgaffeyrw@ornl.gov>
Cc: beebe@math.utah.edu
Subject: Major SGML bibliography update

I have just installed on ftp.math.utah.edu in the directory /pub/tex/bib
these files:

-rw-rw-r--   1 beebe    staff     372193 Sep 10 13:34 sgml.bib
-rw-rw-r--   1 beebe    staff       2368 Sep 10 13:34 sgml.ltx
-rw-rw-r--   1 beebe    staff      10746 Sep 10 13:37 sgml.sok
-rw-rw-r--   1 beebe    staff      39909 Sep 10 13:37 sgml.twx

The index file in that directory describes the collection in more detail.
These files are mirrored to other Internet archive sites, including the
Comprehensive TeX Archive Network (CTAN) hosts; the command "finger
ctan@pip.shsu.edu" will provide a list of them.

These files are also accessible via an e-mail server: mail to
\<tuglib@math.utah.edu> with the text

        help
        send index from tex/bib

will get you started.

The file sgml.bib is a BibTeX bibliography of publications about SGML.
There are 204 entries in the current version, of which 197 have been added
or changed since my return from the TeX User Group TUG'94 meeting in Santa
Barbara, CA, last month.

The updates have been collected from numerous sources, including major
Internet bibliography collections described briefly at the end of the index
file, and the ACM Computing Archive and IEEE INSPEC CD ROMs.

Feel free to pass on this information to others in the SGML community, and
if you are aware of other bibliographies on SGML, or sources thereof,
please let me know so that such activities can be coordinated.

-- 
Nelson H. F. Beebe                      Tel: +1 801 581 5254
Center for Scientific Computing         FAX: +1 801 581 4148
Department of Mathematics, 105 JWB      Internet: beebe@math.utah.edu
University of Utah
Salt Lake City, UT 84112, USA

-- 
Arthur Ogawa, TeX Consultants, Kaweah CA 93237-0051
Ph: +1 209 561 4585, FAX: -4584
PGP Key: finger -l ogawa@teleport.com
</message>
<message id="<351a33$t9c@rs18.hrz.th-darmstadt.de>" date="2988354083">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 10:21:23 UT
From: Joachim Schrod \<schrod@iti.informatik.th-darmstadt.de>
Organization: TH Darmstadt, FG Systemprogrammierung
Message-ID: <351a33$t9c@rs18.hrz.th-darmstadt.de>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca>
Subject: Re: SGML and its enemies

[Tim Bray]

|   This doesn't mean that SGML's design is the one for the future.  I've
|   never understood why, as Erik wonders, it has to belong to the
|   difficult LL(1) class of grammars,

????  If it would really be LL(1) it would be great, there are a lot of
tools to parse LL(1).  In fact, it's easier than LALR(1).  Even if it would
be LL(k), for an arbitrary but fixed k, it would be simple: PCCTS is a
widely used compiler-construction toolkit, available for Unix and other
operating systems.

|   or why the language and metalanguage have to be different, 

Interesting.  I would love to have it even more different.  I.e., I would
like to give an abstract syntax description and then a concrete syntax
description, just like it's done in `normal' definitions of programming
languages.  (And I'm using here the computer science terms, not the SGML
ones.)  And then one could attach semantic functions more easily.

|   or why it has to look so much like OS JCL,

:-)  DROP RS!

|   why minimization mechanisms have to exist,

Because SGML systems are too expensive?

|   PS: Reactions to the DSSSL draft from those in the know would be really
|       welcome in this group.

Another voice for this wish.

	Joachim

-- 
Joachim Schrod			Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany
</message>
<message id="<Cw0sFw.ExF@undergrad.math.uwaterloo.ca>" date="2988367484">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 14:04:44 UT
From: Warren Baird \<wjbaird@undergrad.math.uwaterloo.ca>
Organization: University of Waterloo
Message-ID: \<Cw0sFw.ExF@undergrad.math.uwaterloo.ca>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <19940910.4934@naggum.no>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   the crux of my point -- that if we are planning for information
|   longevity with SGML, it'd better _be_ there 10, 20, 50 years down the
|   road, and be deemed worth implemented by people who need it.

I don't think this is true.  Going from format-based markup to
structure-based markup is always going to be expensive.  However if a
document is marked up structurally, converting to another structure-based
markup language should be quite cheap, and probably automatic.

I'm sure that there will be _a_ standard structural markup language in the
future, and even if it isn't SGML, we'll be able to convert SGML documents
to the new standard fairly easily (assuming that you're using SGML in a
structured fashion, of course).

Warren
</message>
<message id="<Cw0t3o.JDD@walter.bellcore.com>" date="2988368339">
Newsgroups: comp.infosystems.www,comp.text.sgml
Date: 12 Sep 1994 14:18:59 UT
From: Wayne Scott \<wws@cc.bellcore.com>
Organization: Bellcore
Message-ID: \<Cw0t3o.JDD@walter.bellcore.com>
References: \<dszuc.51.2E6526EF@trl.oz.au> <1994Sep1.175049.12216@exoterica.com>
Subject: Re: SGML

The September issue of IEEE Spectrum had a good article on SGML.

Wayne
-- 
	wws@cc.bellcore.com
I'm just a soul whose intentions are good,
Oh Lord, please don't let me be misunderstood.
</message>
<message id="<Cw0wnK.M0B@undergrad.math.uwaterloo.ca>" date="2988372944">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 15:35:44 UT
From: Warren Baird \<wjbaird@undergrad.math.uwaterloo.ca>
Organization: University of Waterloo
Message-ID: \<Cw0wnK.M0B@undergrad.math.uwaterloo.ca>
Subject: Specifying NAMELEN in DTD

I'm creating some DTDs, and I'd like to use element names longer than 8
characters (I'm using sgmls as my parser).  Is there someway that I can
specify NAMELEN in the DTD itself, or does it have to be specified in the
SGML declaration for each document instance based on the DTD?

If it can be done in the DTD, could someone send me a DTD fragment showing
how it's done?  Likewise if it has to be done in the SGML declaration,
could someone send me a sample SGML declaration that does the trick?  We
don't have an SGML book yet, and trying to modify the SGML declaration in
the sgmls documentation has gotten me exactly nowhere.

Thanks,

Warren
</message>
<message id="<ogawa.1129771980H@news.teleport.com>" date="2988373140">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 15:39:00 UT
From: Arthur Ogawa \<ogawa@teleport.com>
Organization: TeX Consultants
Message-ID: \<ogawa.1129771980H@news.teleport.com>
References: <19940909.4927@naggum.no>
Subject: Re: SGML and its enemies

I felt intrigued by Erik's post, but wish I had better clarification of some
points.

[Erik Naggum]

|   there is something gravely at fault with a language and/or its tools
|   that make you work so hard to accomplish your tasks.  it should have
|   been a lot simpler.

In what sense does SGML or its tools make one work overly hard to
accomplish what tasks?  I have some idea of what Erik might have in mind,
but I'd like to hear more concretely.

|   why is the formatting software available with SGML so crummy?

I got my start with SGML creating formatting systems for SGML document
suites, so I guess that puts me in the category of "SGML for publishing".
But I think Erik is referring to shrinkwrapped formatting systems here,
like ArborText SGML Publisher and others.  My general impression is that
the commercial formatters available with SGML are no more "crummy" than
commercial formatters *not* directly associated with SGML (like FrameMaker,
Quark XPress and the like).

|   is this the reason that we only find special-purpose tools that focus
|   on the publishing aspect of SGML and very little general-purpose
|   software that reads and writes SGML as its favorite input and output
|   data formats?

In my work, I and my clients create and maintain SGML documents via tools
like SoftQuad Author/Editor.  In what sense does A/E (as an example) fall
short of general-purpose software reading and writing SGML?  We rely
heavily on the ability of A/E to create an SGML document in minimal form
(i.e., no omitted tags, no shortrefs, etc).  This, for us, makes A/E and
like systems useful primarily because of the ability to output "good" SGML.

|   ...YACC accepts specifications for what is known as LALR grammars, a
|   subset of LR(1) ...now, SGML is specified so that it cannot be
|   described or used with LEX or YACC...

And then another contributor mentioned that SGML was an LL(1) language.
Would someone like to explain for (my sake) what their terminology means?
I'm refering to LALR, LR, and LL grammar.  I wish I understood how
essential is the need for SGML to fall in the more-difficult-to-parse LL
class.

|   YACC is a true metalanguage, a language only used to describe
|   languages; when the program is finished, nobody sees the YACC
|   specification.  SGML is sometimes referred to as a metalanguage, but
|   there are serious flaws in this categorization: users see SGML, not
|   just the programmers; moreover, the programmers can't use the DTD to
|   produce code.  whole applications are defined in and using SGML as its
|   frame of reference.  but it still is not completely misguided to refer
|   to SGML as a metalanguage, because the application will have to deal
|   with the elements similarly to what the grammar productions that YACC
|   deals with.  furthermore, it is irrelevant which _form_ the input data
|   had by the time the application wants to deal with it.  thus, SGML is
|   an external representation of something.  but what?

I've been trying to pick apart this paragraph for awhile without success.
"Nobody sees the YACC specification": does this refer to the specification
of the YACC metalanguage, or the YACC-language specification of the target
language?  Under SGML, users see the markup: yes, but not inevitably (A/E
like other front-ends can hide the tags).  I know Erik has a point here,
but I am not even getting to first base....

Several points Erik makes I not only understand but heartily accept.  One
was that SGML parser/validators are rarer and more expensive than they
otherwise would be (due to the basic difficulty of implementing a parser
and validator for SGML), and the sad consequences that flowed from this.
Another was the advisability of participation in WG8, although I don't
understand how to go about beginning to do this.

But other of Erik's criticisms of SGML simply don't arise in my work with
commercial publishers -- constructs that he deprecates, we don't use, for
instance.  So we remain blithely unaware (although this does not imply lack
of sympathy) of the problems that he is struggling with.  I accept that
there are lots of parts of the SGML standard that make it difficult to
create tools for its use.  I tend to see those parts as worthy of
avoidance.  I don't use them, and wouldn't be sad to see them disappear.

Perhaps at some future date, a capable system with less archaic baggage
will come along and displace SGML.  Others in this group have hinted at
this as well.  Frankly, a system that would provide for elements,
attributes, and content model, along with a vehicle (like entities) for
representing objects (like pi characters) that cannot be typed at the
keyboard, would pretty well satisfy the requirements of most projects I
have worked on.  Specialities of the language that save typing are of no
interest.  Am I missing something in my short list?

But I'll state for the record that I see the lack of support for document
structure and meaning in competing systems to exclude them from
consideration in the work I am doing.  I've been using descriptive markup
since before SGML, after all.

-- 
Arthur Ogawa, TeX Consultants, Kaweah CA 93237-0051
Ph: +1 209 561 4585, FAX: -4584
PGP Key: finger -l ogawa@teleport.com
</message>
<message id="<1994Sep12.155128.4112@sqwest.wimsey.bc.ca>" date="2988373888">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 15:51:28 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep12.155128.4112@sqwest.wimsey.bc.ca>
References: \<kimber.71.000C0D7A@passage.com> <19940820.4480@naggum.no> \<kimber.76.0010D808@passage.com>
Subject: Re: CONCUR usefulness existence proof

[W. Eliot Kimber]

|   HyTime doesn't do validation of document type declarations *because
|   it's incomputable in some cases*.

People keep saying this to me.  Can someone show me a small incomputable
example?  Please?

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<1994Sep12.155413.6906@ast.saic.com>" date="2988374053">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 15:54:13 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.155413.6906@ast.saic.com>
References: <34sn5m$gel@deep.rsoft.bc.ca>
Subject: Re: Why have DTD's?

[Tim Bray]

|   At the HyTime workshop in Vancouver, Wayne Wohler presented a report on
|   a project he'd done to produce a DTD for DTD's.  I can't stop thinking
|   about this.  Suppose I had one of these, either one that captured the
|   DTD syntax directly, or one that used reference concrete and then had a
|   perfect 2-way filter.  I forget which way Wayne was going, but he said
|   he thought he'd captured the whole standard.
|   
|   Then I could edit DTD's with an SGML editor.  I'd only have one
|   language class to deal with.  It would be easier for people to learn
|   SGML.  The more I think about this, the more it seems like a good idea.
|   
|   Am I missing something obvious?  Anyone else doing this?

Yes.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep12.155719.7590@ast.saic.com>" date="2988374239">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 15:57:19 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.155719.7590@ast.saic.com>
References: \<Cw0wnK.M0B@undergrad.math.uwaterloo.ca>
Subject: Re: Specifying NAMELEN in DTD

[Warren Baird]

|   I'm creating some DTDs, and I'd like to use element names longer than 8
|   characters (I'm using sgmls as my parser).  Is there someway that I can
|   specify NAMELEN in the DTD itself, or does it have to be specified in
|   the SGML declaration for each document instance based on the DTD?

Has to be in the SGML declaration.

|   If it can be done in the DTD, could someone send me a DTD fragment
|   showing how it's done?  Likewise if it has to be done in the SGML
|   declaration, could someone send me a sample SGML declaration that does
|   the trick?  We don't have an SGML book yet, and trying to modify the
|   SGML declaration in the sgmls documentation has gotten me exactly
|   nowhere.

See my Friday post of the FAQ DTD.  It has a SGML declaration attached.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep12.161338.10280@ast.saic.com>" date="2988375218">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 16:13:38 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.161338.10280@ast.saic.com>
References: <1994Sep12.155413.6906@ast.saic.com>
Subject: Re: Why have DTD's?

[Tim Bray]

|   At the HyTime workshop in Vancouver, Wayne Wohler presented a report on
|   a project he'd done to produce a DTD for DTD's.  I can't stop thinking
|   about this.  Suppose I had one of these, either one that captured the
|   DTD syntax directly, or one that used reference concrete and then had a
|   perfect 2-way filter.  I forget which way Wayne was going, but he said
|   he thought he'd captured the whole standard.
|   
|   Then I could edit DTD's with an SGML editor.  I'd only have one
|   language class to deal with.  It would be easier for people to learn
|   SGML.  The more I think about this, the more it seems like a good idea.
|   
|   Am I missing something obvious?  Anyone else doing this?

[Bob Agnew]

|   Yes.

OK -- sorry for the terse (smarta..) reply.  It is quite easy to write a
parser for DTDs since they are SGML documents and satisfy the syntax of ISO
8879.  When you do this, you are parsing against the standard's syntax and
showing that the DTD complies with that syntax.  When you parse a document
against a DTD (just incidentally written in SGML) you are certifying that
the document uses only the tags and entities defined in the DTD are used in
the document and that they are only used in the context specified in the
DTD.

Consider the following SGML document instance fragment

\<para>The boy stood on the burning dock,\<newline>
\<indent in=5em>his feet were full of blisters;\<dataitem reftype=1496>
\<newline>he tore his pants on a rusty nail,\<newline>\</dataitem>
\<indent in=5em>and now he wears his sister's.\<newline>\</para>

1) What does the \<dataitem> tag do?
2) Is the newline allowed right before the paragraph end tag?
3) Could \<para> stand for paranormal instead of paragraph?

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<9409121728.AA26391@source.asset.com>" date="2988379728">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 17:28:48 UT
From: "Claude L. Bullard" \<bullardc@source.asset.com>
Message-ID: <9409121728.AA26391@source.asset.com>
Subject: Article Mentions HyTime/SGML

While we are debating the technical merits of SGML, HyTime, and the
absolute meaning of esoteric terms such as Abstract Data Type, some of the
readers of this board may be interested in an article published in the
September '94 issue of Communications of the ACM.  Entitled, "The New
Media", and written by hypertext veterans Roy Rada and George S. Carson,
the article is a succinct discussion of the "why" of HyTime/SGML in
hypermedia if not the "how".  As has been pointed out by others, often as
not, the problems of getting *agreement among parties with mutual
interests* is determining when these mutual interests will be settled to
whose advantage.

...standards, as in music, are always bound in time, and the timing of the
thing, has both a foreground and a background timing.  Quoting Miyamoto
Musashi loosely, "determining the background and foreground timing is
essential to victory over an opponent."

I'm not convinced that SGML has "enemies", but it certainly has
competitors.  To be an enemy of a markup language, is to be a fool for no
one of good sense is inimical of a hammer.  One may find that many objects
are not "nails".  To be a competitor, is to be aware that in the market,
one can only sell what is desirable or necessary.  A carpenter always needs
a hammer; a cabinet, however, can always be glued.  The nail is not always
desired by the cabinet, nor necessary to the carpenter.  But when something
needs to be "nailed", there's nothing so desirable as a good hammer.

If the hypermedia industry is to ever become as profitable as the music
industry, its products must be as common and as standard as cabinets.  Only
time will tell if we are to use glue or nails, but we must use something if
we are to compete, and if we intend to have a business in five years, we
must have common choices.

For now and until somebody comes up with a better solution, SGML and HyTime
are the choices available.  Where they can be improved, do it.  But the
time to use them is now.

Len Bullard
</message>
<message id="<1994Sep12.175005.20847@ast.saic.com>" date="2988381005">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 17:50:05 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.175005.20847@ast.saic.com>
References: <1994Sep12.155128.4112@sqwest.wimsey.bc.ca>
Subject: Re: CONCUR usefulness existence proof

[W. Eliot Kimber]

|   HyTime doesn't do validation of document type declarations *because
|   it's incomputable in some cases*.

[Marcy Thompson]

|   People keep saying this to me.  Can someone show me a small
|   incomputable example?  Please?

I wonder what Godel would make of this?  Anyone have Kurt's e-mail address
;-)

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<3524vs$re6@news.manassas.ibm.com>" date="2988381628">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 18:00:28 UT
From: J. VanHorne \<jvanhorne@vnet.ibm.com>
Organization: LORAL Federal Systems - Owego
Message-ID: <3524vs$re6@news.manassas.ibm.com>
Keywords: tags, math
Subject: tag names and math syntax

Two questions:

1.  From looking at the tag names associated with various flavors of SGML,
    it seems to me that the flexibility of SGML is a disadvantage in this
    area because every DTD can name the paragraph tags in a different way.
    This presents difficulties for those of us trying to hand-code
    documents in ASCII form.  The "intelligent SGML editors" tend to be
    very pricey and only work on graphical terminals.  Things were bad
    enough with troff, LaTeX, and script having different names for
    paragraphs, ordered lists, etc.  Now it looks like the different
    varieties of SGML text markup (CALS, etc) are as diverse as their
    tag-language-formatter predecessors.  When SGML standards are invented,
    WHY can't the standard give a NAME to the paragraph and emphasis tags
    -- it would make working with SGML a _lot_ less confusing -- after all,
    there are types of paragraph tags that we all use, such as body
    paragraphs, numbered lists, ordered lists, etc.  The DTD could just set
    up the ATTRIBUTES of these paragraphs, such as numbering style,
    circumstances of use, etc.

    My background involves publishing with LaTEX, IBM Bookmaster/Script,
    troff, Framemaker, Interleaf, WordPerfect, and Word.  In some ways, tag
    languages such as Bookmaster and LaTEX are superior to the WYSIWYG
    tools in that you can have unlimited levels of imbed files -- in
    BookMaster's case, every row of a table can be a different file.  So
    config management is much simpler and cheaper (it can be implemented
    using RCS or SCCS) for the tag languages than it is for the other
    tools.  This is why my area is interested in SGML- it will let us keep
    our present config management methods, and let people work with their
    present ascii text terminals instead of having to purchase X-stations
    or high-powered PCs with hi-res graphics.  It really seems to me that
    SGML is much more complicated than it has to be for most publishing
    tasks.  Of course, I realize that the formatting spec (FOSI) is another
    matter entirely, and I am not speaking of the FOSI here - just the DTD.

2.  Is there any standard or spec for SGML formatting of mathematical text?
    As in question 1, will the formatting tags be NAMED?  If so, where can
    I obtain a copy of this standard/spec?

Sorry if question 1 seems to have a whining tone, but I truly believe this
situation is holding back acceptance of SGML on a wider scale.  SGML seems
like a GREAT IDEA......until you work with some of the tools and practical
implementations of it.  There _is_ such a thing as _too_ much flexibility!
Also, speaking as a programmer, this flexibility makes it that much harder
to create publishing tools for working with SGML.

J. VanHorne
</message>
<message id="<1994Sep12.182537.26192@ast.saic.com>" date="2988383137">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 18:25:37 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.182537.26192@ast.saic.com>
Keywords: AAP ISO 12083
Subject: AAP DTDs and ISO-12083

There have been quite a few requests for the AAP dtds on this group
recently.  I had replied with a copy of the AAP public entity which came
with my Arbortext distribution.  Apparently, this is not the latest
revision of the AAP DTD which eventually became an ANSI standard, ANSI
Z39.59-1988.  These DTDs have now been revised and extended to become an
ISO standard ISO-12083:1993.  Dr. Paul Grosso of Arbortext Inc. who chaired
the ISO committee which revised the math and table parts of the dtds,
kindly provided us with electronic copies of these dtds.  I have placed
these on actd.saic.com on directory pub/SGML/ISO-12083 where they are
available via anonymous FTP.  There are four DTDs

	book.dtd
	article.dtd
	maths.dtd
        serial.dtd

I have also placed the older version of the AAP dtd on pub/SGML/AAP.  I
will place the latest revision there when I can get it.  Dr. Grosso has
also kindly provided some ordering information for the various standards
which I have reproduced herein:

The AAP standards are held by the Electronic Publishing Special Interest
Group (EPSIG) that had been associated with OCLC, but has since been taken
over jointly by GCARI (Graphics Communication Association) and McAfee and
McAdam.  The contact addresses I have are:

Ms. Robin Canami		Michelle Wulffaerat
McAfee and McAdam		GCARI
PO Nox 328			PO Box 2888
Bel Air, MD  21014-0328		Alexandria, VA  22314-2888
USA				USA
+1 410 893 1340			+1 703 519 8184

Either contact should eventually be able to help you order the AAP
Electonic Manuscript Markup documents (the "old" AAP stuff) and the ISO
12083 standard.  These organizations also give tutorials on the ISO 12083
standard.

Furthermore, you should be able to order a copy of ISO 12083 from ANSI:

American National Standards Institute
11 West 42nd Street, 13th Floor
New York, NY  10036
USA

Also, one can order ANSI NISO standards via:

NISO Press
PO Box 338
Oxon Hill, MD  20750-0338
USA
+1 800 282-NISO (6476)
or +1 301 567-9522

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep12.183416.27650@ast.saic.com>" date="2988383656">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 18:34:16 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.183416.27650@ast.saic.com>
References: <9409121728.AA26391@source.asset.com>
Subject: Re: Article Mentions HyTime/SGML

[Claude L. Bullard]

|   While we are debating the technical merits of SGML, HyTime, and the
|   absolute meaning of esoteric terms such as Abstract Data Type, some of
|   the readers of this board may be interested in an article published in
|   the September '94 issue of Communications of the ACM.  Entitled, "The
|   New Media", and written by hypertext veterans Roy Rada and George
|   S. Carson, the article is a succinct discussion of the "why" of
|   HyTime/SGML in hypermedia if not the "how".  As has been pointed out by
|   others, often as not, the problems of getting *agreement among parties
|   with mutual interests* is determining when these mutual interests will
|   be settled to whose advantage.
|   
|   ....standards, as in music, are always bound in time, and the timing of
|   the thing, has both a foreground and a background timing.  Quoting
|   Miyamoto Musashi loosely, "determining the background and foreground
|   timing is essential to victory over an opponent."

Very aptly put Len.  I am also reminded of something I read in "Management
Secrets of Attila the Hun" which went something like "always choose your
battles wisely.." i.e., "never start a war you don't think you can win."
Somehow this stayed with me.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<46067.loeffen@ruulet.let.ruu.nl>" date="2988384464">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 18:47:44 UT
From: Arjan Loeffen \<Arjan.Loeffen@let.ruu.nl>
Message-ID: <46067.loeffen@ruulet.let.ruu.nl>
Subject: CONCUR / marked sections

Dear reader,

In concurrent markup of a document (concurrent document type definitions)
it is normally assumed that all data characters are represented as data
content of all concurrent elements -- each document however has a different
'interpretation' of the data, expressed by the element that covers the
character sequence.

On the other hand, some data may be local to a single view.  For instance,
if a play is encoded both conform a drama DTD and a linguistic DTD, some
annotation inserted for a scene should not be reflected in the linguistic
encoding of the play.

If not using external references using a query language such as HyQ or the
extended pointer syntax of the TEI (that do not address concurrency
issues), I understand that such disalignments may still be recorded using
marked sections.  However, the technique is yet unclear to me.  Can anyone
give me a hint here?  A good example may do wonders.

Thanks in advance,
Arjan.
</message>
<message id="<1994Sep12.185857.6904@sqwest.wimsey.bc.ca>" date="2988385137">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 18:58:57 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep12.185857.6904@sqwest.wimsey.bc.ca>
References: <33fi64$grh@sernews.raleigh.ibm.com> <347sdh$5is@news1.digex.net>
Subject: Re: Request for SGML 94 info

Excerpts from the Program for SGML '94

|   Sunday, November 6
|
|   The Just Enough Tutorial Series
|   Marcy Thompson, Manager of Education and Training, SoftQuad 
|   Inc., Tutorial Coordinator
|   
|   9:00 am-12:00 noon
|   Just Enough Concepts
|      Introduction to SGML with no prerequisites. What is SGML? 
|      Who uses it? How do they use it? How does it work?
|   
|   Just Enough Syntax
|      Introduction to SGML with no prerequisites. Basic overview 
|      of SGML followed by a survey of SGML markup.
|   
|   1:00 pm-4:00 pm
|   Just Enough Syntax and Just Enough Concepts (continued)

Just a clarification: These tutorials (Just Enough Syntax and Just Enough
Concepts) are not *continued* after lunch, they are repeated after lunch.
Thus, a beginner could take one in the morning and the other in the
afternoon and end up with an overview of both concepts and syntax.

The other three afternoon tutorials are intended for people with at least a
passing acquaintance with SGML.  There is no expertise or experience
required, though.  The idea is that if you took either of the morning
tutorials, you should be prepared for any of the special topic afternoon
sessions.

|   Just Enough Databases
|      How does SGML mesh with document databases? Discussion of 
|      full text, relational and object-oriented approaches.
|   
|   Just Enough Electronic Delivery
|      An overview of methods of delivering SGML documents 
|      electronically.
|   
|   Just Enough Paper Publishing
|      What must you do to an SGML document to turn it into a 
|      printed document? 

Marcy

-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<masonjd.94.2E74AAB1@ornl.gov>" date="2988386609">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 19:23:29 UT
From: James David Mason \<masonjd@ornl.gov>
Organization: Convenor, ISO/IEC JTC1/SC18/WG8
Message-ID: \<masonjd.94.2E74AAB1@ornl.gov>
References: <19940909.4927@naggum.no>
Subject: Re: SGML and its enemies

In article <19940909.4927@naggum.no> Erik Naggum \<erik@naggum.no> made a
plea for more participation in the work of ISO/IEC JTC1/SC18/WG8, the group
responsible for the creation, maintenance, and revision of SGML.  Some of
the follow-on articles have asked how this can be done.

Without getting involved in any of the philosophical/religious issues
related to the revision of SGML (my position requires that I stay somewhat
neutral to them), I heartily second Erik's plea for participation.  We need
more than just the few hardcore faithful who have stuck with the project
for years on end.  It would be good to have more participants from both the
computer-science community and the user community working on the project.

(In case some of the computer-science types wonder why SGML has such a
convoluted grammar, a good bit of it is because almost all of the original
creators were end users, mostly from the technical-publishing community,
who were thinking first of all how to solve a batch of practical problems
-- hence all the "neat tricks" and simplifications like markup minimization
-- and not language designers.)

Active participation in WG8 comes through participation in national
standards bodies (e.g., ANSI, BSI, DIN, AFNOR, TSCJ, NNI) that certify
participants to the working group.  Liaison bodies like the SGML Users'
Group can also send participants, but their opinions do not have the weight
of those offered by the people sent by national standards organizations.
So the place to start is at the grass roots, with your national standards
group.  If anyone needs help on finding the appropriate standards body,
please contact me, and I'll try to help.

(In the U.S., there will be a meeting of the national committee for SC18,
ANSI X3V1, next week in Boulder, Colorado.  For information, contact the
chairman, Mr. Rudolf M. Riess, Digital Equipment Corp., 30 Porter Road /
LJO2-C12, Littleton, MA 01460, Telephone: +1 508 486 2019, Facsimile: +1
508 486 2013, Internet: riess@jokur.enet.dec.com.)

-- 
Dr. James D. Mason
(WG8 Convenor)
Oak Ridge National Laboratory
Information Management Services
Bldg. 2506, M.S. 6302, P.O. Box 2008
Oak Ridge, TN  37831-6302   U.S.A.
Telephone: +1 615 574-6973
Facsimile: +1 615 574-6983
Network: masonjd @ ornl.gov
</message>
<message id="<940912192434_71140.2011_HHE92-1@CompuServe.COM>" date="2988386674">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 19:24:34 UT
From: Paul Hermans <71140.2011@compuserve.com>
Message-ID: <940912192434_71140.2011_HHE92-1@CompuServe.COM>
Subject: ANNOUNCEMENT SGML BeLux '94

                           "EXPERIENCE THE POWER OF SGML" 

                First annual Conference on the practical use of SGML


Arenbergkasteel, Heverlee (Leuven), Belgium, October 12th

A one day conference with 11 SGML-professionals willing to share their
know-how and experience with you!

With Microsoft's and WordPerfect's opinions on the future of documents and
the role of SGML.

-----------------------------------------------------------------------------
the Seminar Programme

08h30-09h00 Registration and coffee

09h00-09h10 Welcome
Jan Engelen (KULeuven-Caps) and Paul Hermans (SGML BeLux)

09h10-09h55 Case : The adoption and implementation of SGML within Shell
A business case of adopting SGML as a key component in a document
management strategy.

09h55-10h35 Object-Oriented Information and DTD design
Steffen R. Frederikson (Information Mapping Europe) and Paul Hermans (Pro
Text) A practical introduction to the method of Information Mapping and the
translation of this methodology into a SGML DTD.

10h35-11h05 Coffee and visit to exhibition

11h05-11h35 A typesetters tale on SGML
Lawrie Stevens and Steven Van de Bergh (Fotek)
A case history of their day-to-day experience with SGML: examination of
tools and software, the problems encountered, the benefits, the client
requirements, the costs,...

11h35-12h20 Document processing based on architectural forms, with ICADD as
an example.
Klaus Harbo (Exoterica) and Bart Bauwens (Caps)
The use of architectural forms for DTD to DTD conversion.

12h30-14h00 Lunch and visit to exhibition

14h00-14h45 SGML conversion issues: in search for a methodology
Francois Chahuneau (AIS)
An attempt to classify available methodological approaches as well as
several case studies in both down-translation and up-translation.

14h45-15h15 Coffee and visit to exhibition

15h15-16h00 The future of documents according to WordPerfect
WordPerfect's opinions on issues like: OpenDoc versus OLE, SGML and
compound elements and SGML versus ODA.  A demonstration of IntelliTag, WP's
tool for converting WP documents to SGML.

16h00-16h45 The future of documents according to Microsoft
Microsoft's opinions on issues like: OpenDoc versus OLE, SGML and compound
elements and SGML versus ODA.  A demonstration of SGML Author for Microsoft
Word.

16h45-17h00 Conclusion

17h00-19h00 Reception and visit to exhibition

-----------------------------------------------------------------------------
Seminar Booking Details


Places are limited

There are only 120 places available.  They will be assigned in order of
inscription.  So please make your reservation immediately.


The Vendor Exhibition

Parallel to the Conference Programme there is a vendor exhibition.  To be
seen on the floor are products from: Frame, Exoterica, AIS, SoftQuad, EBT,
Microstar, Information Mapping Europe, Zandar, Oracle, Folio, Microsoft,
Xyvision, Time Lux, WordPerfect, ...


When?

Wednesday 12 October 1994
from 09h00 till 19h00
reception and registration as from 8h30


Where

Arenbergkasteel, Kardinaal Mercierlaan, 3001 Heverlee (Leuven), Belgium


Conference fees

Members: 4.000,- BEF
Non-Members: 9.000,- Bef
This fee includes your participation to the Conference, access to the
vendor exhibition, the lunch, the coffee breaks, the reception and your
personal Binder of the Conference Proceedings (hardcopy and electronic).


How to make your reservation

Make your reservation by completing the registration form and returning it
before October 7th to:

KUL, Departement Elektrotechniek, Afd. TEO
SGML Belux vzw/asbl, Bart Bauwens
Kardinaal Mercierlaan 94
3001  Heverlee (Leuven), Belgium

Or fax to: +32 16 221855
e-mail : bauwens@esat.kuleuven.ac.be


For more information:

SGML Belux, Bart Bauwens, +32 16 220931 extension 1119



-----------------------------------------------------------------------------
Registration Form "Experience the power of SGML"

First Annual Conference on the practical use of SGML

October 12th 1994, Heverlee (Leuven), Belgium

To make your reservation, complete the registration form and return it
before October 7th to:

KUL, Departement Elektrotechniek, Afd. TEO
SGML Belux vzw/asbl, Bart Bauwens
Kardinaal Mercierlaan 94
3001 Heverlee (Leuven), Belgium

Or fax to: +32 16 22185
e-mail : bauwens@esat.kuleuven.ac.be


Yes, I want to attend the SGML Conference

Personal details

Name
Company/Organization
Address
Daytime Phone
Fax


Conference Fees

The conference fees includes your participation to the Conference, access
to the vendor exhibition, the lunch, the coffee breaks, the reception and
your personal Binder of the Conference Proceedings.

O Members     4.000,- BEF
O Non-members 9.000,- BEF


Conference Proceedings

Extra copies of the proceedings can be obtained at 300,- BEF per copy.
Please indicate the format and the number of copies required.

O Hardcopy   ..... x 300,- BEF = 
O Electronic ..... x 300,- BEF = 


Method of payment

Advance payment is required in order to attend the conference. 

The total of my fees is
Conference fee
Extra proceedings
TOTAL

O I will pay by bank transfer to account 068-2144301-46 and provide proof
  of payment at the conference registration desk.

O I will pay at the conference registration desk.


Invoicing

Invoices will be sent upon request.  Check the appropriate boxes below if
you require an invoice.  Note that advanced payment is still requested.

O Please send me an invoice
O Please reference purchase order number .....

- Advance payment is required in order to attend the conference.
- An applicant may cancel a registration whereupon SGML BeLux will charge
  an administration fee of 1.500,- plus V.A.T.
- A colleague can always take your place.
- SGML BeLux reserves the right at any time to vary the time, location and
  presenters of seminars.

Date
Signature
</message>
<message id="<DMEGGINS.94Sep12164545@aix1.uottawa.ca>" date="2988391545">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 20:45:45 UT
From: David Megginson \<dmeggins@aix1.uottawa.ca>
Organization: Department of English, University of Ottawa
Message-ID: \<DMEGGINS.94Sep12164545@aix1.uottawa.ca>
References: <46067.loeffen@ruulet.let.ruu.nl>
Subject: Re: CONCUR / marked sections

[Arjan Loeffen]

|   On the other hand, some data may be local to a single view.  For
|   instance, if a play is encoded both conform a drama DTD and a
|   linguistic DTD, some annotation inserted for a scene should not be
|   reflected in the linguistic encoding of the play.

Here's a rough example, though I might have misunderstood the standard a
bit (I do not find the notes on CONCUR entirely clear):

  \<!DOCTYPE foo [
    \<!ELEMENT foo    - - (div+)>
    \<!ELEMENT div    - - (#PCDATA)>
    \<!ENTITY % onlyfoo "INCLUDE">
    \<!ENTITY % onlybar "IGNORE">
  ]>
  \<!DOCTYPE bar [
    \<!ELEMENT bar    - - (div+)>
    \<!ELEMENT div    - - (#PCDATA)>
    \<!ENTITY % onlyfoo "IGNORE">
    \<!ENTITY % onlybar "INCLUDE">
  ]>
  <(foo)foo>
  <(bar>bar>
  \<div>This is a common division\</div>
  <(bar)div><(foo)div>This division will end here</(bar)div)><(bar)div)>
  for one DTD, but here</(foo)div><(foo)div> for the other\<![%onlyfoo;[,
  and this data will appear only in the foo
  instance\]]>.</(foo)div></(bar)div>
  </(foo)foo>
  </(bar)bar>

[It gives me a headache just looking at it!]

Essentially, the trick is that one DTD will declare a parameter entity
"INCLUDE", while another will declare "IGNORE", so that the data is
included in only one document instance.  Hope this helps.

David
-- 
David Megginson                Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca       Ottawa, Ontario, CANADA  K1N 6N5
dmeggins@acadvm1.uottawa.ca    Phone: +1 613 564 6850 (Office)
ak117@freenet.carleton.ca             +1 613 564 9175 (FAX)
</message>
<message id="<9409082335.AA04091@netprod1.gateway.bsis.com>" date="2988398144">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 22:35:44 UT
From: James Biddell \<jbiddel@gateway.bsis.com>
Message-ID: <9409082335.AA04091@netprod1.gateway.bsis.com>
Subject: NC SGML Users Group Meeting

The North Carolina SGML Users Group will be holding our next meeting on
September 15, from 7:00 pm to 9:00 pm.  The meeting place will be in Cary,
North Carolina at the SAS Institute training center auditorium.  SAS is
located off Highway 40 on Harrison Ave, west of Raleigh.  We encourage
anyone interested in SGML-related issues to join us.  Past meetings have
featured presentations/discussions on Hytime and Mathpack.  For more
information, contact Mary Kroeger at +1 919 319 4648 or
\<mkroeger@gateway.bsis.com>


                                                               ^
                                                        Durham |

                                       _traffic light      |
                                      \\                   /|
                                       \\                 / |
------------Harrison Ave.---------------X------------------|-------
                                        |                  | /
                                        |                  |/  exit 287
                           -------     [|] gatehouse       |
                           |bldg.|      |                  |
                           |  L  |      |                  |
                           |     |      |                  |
                           -------      |                  | I-40
                                        |                  |
                                        |                  |
               .......walk..........    |                  |
               .     ------------- .    |    --------      |
        ..........   | aud. .    | .    |    |      |      |
        .        .   |.......    |\<enter|    | bldg.|      |
        . park   .   |           |      |    |      |      |
        .        .   |   bldg.   |      |    |   E  |      |
        .        .   |           |      |    |      |      |
        .        .   |    F      |      |    |      |
        ..........   |           |      |    |      |
            |        |           |      |    --------
            |        ------------       |
            |                           |               Raleigh |
            |                           |                       V
            ----------------------------------
                                        |
                                        |

-- 
Jim Biddell,
Treasurer/Secretary
NC SGML Users Group
</message>
<message id="<1994Sep12.224544.15386@ast.saic.com>" date="2988398744">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 22:45:44 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep12.224544.15386@ast.saic.com>
Subject: Footnotes in FOSIs

Anyone have any experience with footnote reference elements in FOSIs.
Adept handles footnote elements and xref elements.  I had to modify the
38784C FOSI to handles xrefs to different elements e.g., tables, fuigures,
equations, bibliography, etc.  I was considering adding a footnote
reference type to xref and getting rid of ftnref altogether.  I'm still
trying to figure out how the FOSI engine knows to fill in the xrefstr with
the right reference title like [3] for a bib reference.  Anyone know how
this is done?

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<novlepubCw1GKr.4AK@netcom.com>" date="2988398762">
Newsgroups: comp.text.sgml
Date: 12 Sep 1994 22:46:02 UT
From: Jon Bosak \<novlepub@netcom.com>
Organization: Novell Electronic Publishing
Message-ID: \<novlepubCw1GKr.4AK@netcom.com>
Subject: FOSI consultant sought

Novell is looking for a FOSI consultant who can configure Adept Publisher
to produce output conforming to our existing corporate style from software
documentation marked up in DocBook.  In addition to having a lot of
experience with FOSIs and Adept Publisher, this person will know enough
about FrameMaker to be able to understand the formatting embodied in our
current FrameMaker templates and will preferably be located within driving
distance of our offices in San Jose, California, or Summit, New Jersey.
Some work has already been done on a prototype, so the whole job should
probably take no more than two to four weeks.

Persons interested in bidding on this contract should send mail to
novlepub@netcom.com.

-- 
+------------------------------------------------------------------------+
|  Jon Bosak  Novell Corporate Publishing Services  novlepub@netcom.com  |
+------------------------------------------------------------------------+
</message>
<message id="<352q8f$hjr@inxs.ncren.net>" date="2988403406">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 00:03:26 UT
From: Mark Shelman \<shelman@src.org>
Organization: Semiconductor Research Corporation
Message-ID: <352q8f$hjr@inxs.ncren.net>
References: <19940909.4927@naggum.no> \<ogawa.1129771980H@news.teleport.com>
Subject: Re: SGML and its enemies

[Arthur Ogawa]

|   Perhaps at some future date, a capable system with less archaic baggage
|   will come along and displace SGML.  Others in this group have hinted at
|   this as well.  Frankly, a system that would provide for elements,
|   attributes, and content model, along with a vehicle (like entities) for
|   representing objects (like pi characters) that cannot be typed at the
|   keyboard, would pretty well satisfy the requirements of most projects I
|   have worked on.  Specialities of the language that save typing are of
|   no interest.  Am I missing something in my short list?

Well, Art, don't forget external file entities.  I can't tell you how
useful they are in processing (autotagging, validating, moving around)
hundreds of small documents that eventually get merged into something like
a single DynaText book without ever concatenating SGML files.  Some of
Erik's ideas sound like they would make this easier, at least for the
developer.

BTW, not all SGML editors make it as easy to edit the component parts of a
master document made up of lots of file entity refs (though I don't think
this is a fault of the standard, since Arbortext does it pretty well with
the current standard.)

-- 
Mark Shelman
Semiconductor Research Corp
</message>
<message id="<352v7p$2pk@emoryu1.cc.emory.edu>" date="2988408505">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 01:28:25 UT
From: "Benjamin K. Belton" \<ilabkb@unix.cc.emory.edu>
Organization: Emory University
Message-ID: <352v7p$2pk@emoryu1.cc.emory.edu>
Subject: Avalanche SGML add-in for Word6.0?

I have read somewhere about an add-in for Word 6.0 from Avalanche software,
but locate any info.  Does anyone have any details about this?

Thanks in advance.

-- 
Keith Belton, AICP, Landscape Architect extraordinaire
ilabkb@unix.cc.emory.edu
\<A HREF="http://www.gatech.edu/andy/keith.belton.html">My home page.\</A>
</message>
<message id="<north.208.00147AFD@knoware.nl>" date="2988408521">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 01:28:41 UT
From: Simon North \<north@knoware.nl>
Organization: Directions Documentation Engineers
Message-ID: \<north.208.00147AFD@knoware.nl>
Subject: Errors in IETM DTD, repeat request

We have been trying to import the MIL standard IETM DTDs into Microstar's
Near & Far but get repeated errors.  Then I suddenly remembered someone
posting a notification of some errors and their fixes.  Would it please be
possible for someone to e-mail or post them once again??  (either at home
address: \<north@knoware.nl> or work address: \<db512@hgl.signaal.nl>).

Thanks a lot. 
</message>
<message id="<1994Sep12.225455@opal.tufts.edu>" date="2988417295">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 03:54:55 UT
From: Fekade \<faytaged@opal.tufts.edu>
Organization: Tufts University - Medford, MA
Message-ID: <1994Sep12.225455@opal.tufts.edu>
Subject: RTF to SGML converter -HELP!!

Hi,

I was wondering if there are any public domain or commercial programs out
there that convert from RTF to SGML.  It is one big pain to convert RTF
files from Word, quark, etc to SGML.  Tables, figures & images are
particularly hard.  The program I have right now produces output that needs
to be cleaned up.  It takes about 1-2 hours of editing to clean up.  I
would appreciate it if someone could point me to the proper place.

-Fekade
</message>
<message id="<1994Sep13.081101.21054@edf.fr>" date="2988432661">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 08:11:01 UT
From: Daniel Glazman \<daniel.glazman@der.edf.fr>
Organization: EDF Direction des Etudes et Recherches
Message-ID: <1994Sep13.081101.21054@edf.fr>
References: <34poef$34g@netnews.upenn.edu>
Subject: Re: Basics

[Nathan Sivin]

|   I just subscribed to this newsgroup, and have read ca. 120 postings.
|   #4 was a simple request for information about getting a copy of the
|   basic instructions for using SGML, with an outline of the markup.  I
|   used IBM GML for several years in publishing a journal, and would like
|   to use the new standard.

Warning: SGML is ****NOT**** GML version n++...

GML is (much) more a formatting language than a structured markup language.
You'll encounter the following problems moving from this proprietary format
to SGML:

1) conversion; if all your documents are written "in the same way", maybe
   you can translate 90 % of them in SGML according to a restrictive DTD
   including only *some* GML equivalent tags.  Otherwise, I suggest you
   keep GML for your existing files without any hope of conversion

2) restriction; as I said in point 1, its is quite impossible to use a DTD
   including all the GML existing tags and user macros.

3) restitution; GML builds a printable image of a document but you need a
   SGML composer to build a such paper from a SGML document.  See the
   discussion in this newsgroup about SGML->Postscript conversion.

\</Daniel>
</message>
<message id="<3543bfINN1cs@oasys.dt.navy.mil>" date="2988445487">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 11:44:47 UT
From: Betty Harvey \<harvey@navysgml.dt.navy.mil>
Organization: Advanced Information Systems Branch, DTMB, CDNSWC
Message-ID: <3543bfINN1cs@oasys.dt.navy.mil>
References: <34rjsj$5ol@deep.rsoft.bc.ca> <19940910.4934@naggum.no> \<Cw0sFw.ExF@undergrad.math.uwaterloo.ca>
Subject: Re: SGML and its enemies

[Tim Bray]

|   This doesn't mean that SGML's design is the one for the future.

[Erik Naggum]

|   the crux of my point -- that if we are planning for information
|   longevity with SGML, it'd better _be_ there 10, 20, 50 years down the
|   road, and be deemed worth implemented by people who need it.

[Warren Baird]

|   I don't think this is true.  Going from format-based markup to
|   structure-based markup is always going to be expensive.  However if a
|   document is marked up structurally, converting to another
|   structure-based markup language should be quite cheap, and probably
|   automatic.
|   
|   I'm sure that there will be _a_ standard structural markup language in
|   the future, and even if it isn't SGML, we'll be able to convert SGML
|   documents to the new standard fairly easily (assuming that you're using
|   SGML in a structured fashion, of course)

I vowed I wouldn't enter any of these esoteric arguments about the the
validity, rightness/wrongness, of SGML, but I just can't help myself
sometimes.  One area that I see everyone is forgetting about SGML is it can
not only identify format or structure (if this is what you want to do), it
can also identify content th of the data.  If the data is really worth
taking the effort to save and put into SGML, it is worth identifying the
content of the data.  Granted creating DTD's and SGML instances based on
structure is a lot easier than content.

The organization wanting their information in SGML needs to make value
judgements: Is this information critical to our mission?  Does this
information have longevity?  Will this information be reused?  If the
answer to any of those questions are yes then it is worthwhile for the
organization to 'content tag' their information.  If the answer is no then
a good solution would be an easy to use, very simple format/structure based
DTD like HTML.

I personally see this as one of the benefits to SGML -- it can be as
simplistic or all emcompassing depending upon the need and application.

					Betty

-- 
Betty Harvey  \<harvey@oasys.dt.navy.mil>     | David Taylor Model Basin
Advanced Information Systems Branch          | Carderock Division
Code 183                                     | Naval Surface Warfare
Bethesda, Md.  20084-5000                    |   Center
                                             | DTMB,CD,NSWC   
URL:  http://navysgml.dt.navy.mil/betty.html |          
</message>
<message id="<Cw2Muu.Iro@undergrad.math.uwaterloo.ca>" date="2988453558">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 13:59:18 UT
From: Warren Baird \<wjbaird@undergrad.math.uwaterloo.ca>
Organization: University of Waterloo
Message-ID: \<Cw2Muu.Iro@undergrad.math.uwaterloo.ca>
References: <34rjsj$5ol@deep.rsoft.bc.ca> <19940910.4934@naggum.no> \<Cw0sFw.ExF@undergrad.math.uwaterloo.ca> <3543bfINN1cs@oasys.dt.navy.mil>
Subject: Re: SGML and its enemies

[Betty Harvey]

|   I vowed I wouldn't enter any of these esoteric arguments about the the
|   validity, rightness/wrongness, of SGML, but I just can't help myself
|   sometimes.  One area that I see everyone is forgetting about SGML is it
|   can not only identify format or structure (if this is what you want to
|   do), it can also identify content th of the data.  If the data is
|   really worth taking the effort to save and put into SGML, it is worth
|   identifying the content of the data.  Granted creating DTD's and SGML
|   instances based on structure is a lot easier than content.

Could you expand on what you mean by content markup?  How is it different
from structural markup?

Warren Baird
</message>
<message id="<354cpbE6p5@uni-erlangen.de>" date="2988455147">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 14:25:47 UT
From: Markus Kuhn \<mskuhn@cip.informatik.uni-erlangen.de>
Organization: Student Pool, CSD, University of Erlangen, Germany
Message-ID: <354cpbE6p5@uni-erlangen.de>
References: <34kmac$9e2@ruby.ora.com> <1994Sep8.105825.421@ittpub>
Subject: Re: US standards publishers?

[William D. Lindsey]

|   How about:
|
|	American National Standards Institute
|	11 West 42nd Street, 13th floor
|	New York, NY 10036 USA
|	+1 212 642 4900,  fax: +1 212 302 1286
|
|   Now, who can direct me to the source in The Netherlands?

The "Standards-FAQ" in news.answers (or in any FAQ server,
e.g. rtfm.mit.edu pub/usenet/news.answers/) contains a list of all ISO
member bodies on this planet and they'll sell you the stuff.

Markus

-- 
Markus Kuhn, Computer Science student -- University of Erlangen,
Internet Mail: \<mskuhn@cip.informatik.uni-erlangen.de> - Germany
WWW Home: \<http://wwwcip.informatik.uni-erlangen.de/user/mskuhn>
</message>
<message id="<ricko.779470133@ee.uts.EDU.AU>" date="2988458933">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 15:28:53 UT
From: Rick Jelliffe \<ricko@ee.uts.edu.au>
Organization: University of Technology, Sydney
Message-ID: \<ricko.779470133@ee.uts.EDU.AU>
References: <19940909.4927@naggum.no>
Subject: Re: SGML and its enemies

Why not just bite the bullet and have SGML mark 2?

Ordinary SGML can be frozen, and quite happily be used for the the things
it is currently used for.

The things SGML2 should have are:

1) a formal (usable, complete) grammar
2) a formal reference parser model
3) a standard written either in a real language using only words that
   appear in a known dictionary or in some algebra, but not in a mish mash
   of jargon giving the appearance of formality
4) an altered SGML declaration to handle multibyte character sets more 
   obviously.

Anything else (inheritance, transparent elements, etc) are secondary.

This SGML2 should have a simpler minimum feature set (e.g., most basic
consists of just elements + data (no attributes, no entities, no omittag,
no RE rules)).  This would allow very simple minimal parsers to to written.
Surely the big thing that holds SGML back is it is so difficult to
understand, let alone implement a minimal parser.

-ricko
</message>
<message id="<1994Sep13.103315.24191@driftwood.cray.com>" date="2988459195">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 15:33:15 UT
From: "Chris J. Hector" \<cjh@cray.com>
Message-ID: <1994Sep13.103315.24191@driftwood.cray.com>
References: <1994Sep12.225455@opal.tufts.edu>
Subject: Re: RTF to SGML converter -HELP!!

[faytaged@opal.tufts.edu]

|   I was wondering if there are any public domain or commercial programs
|   out there that convert from RTF to SGML.  It is one big pain to convert
|   RTF files from Word, quark, etc to SGML.  Tables, figures & images are
|   particularly hard.  The program I have write now produces output that
|   needs to be cleaned up.  It takes about 1-2 hours of editing to clean
|   up.  I would appreciate it if someone could point me to the proper
|   place.

If you can live with the HTML DTD, you can use rtftohtml.  See
ftp://ftp.cray.com/src/WWWstuff/RTF/rtftohtml_overview.html for details.

Chris

-- 
cjh@cray.com	   ! ftp://ftp.cray.com/src/WWWstuff/Chris_Hector.html
Cray Research Inc. ! My opinions are my own
655-F Lone Oak Dr. ! 
Eagan, MN 55121	   ! "Ya better start swimmin or you'll sink like a stone
+1 612 683 5858	   ! for the times they are a-changin"   - BobD
</message>
<message id="<354gp0INNjmi@oasys.dt.navy.mil>" date="2988459232">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 15:33:52 UT
From: Betty Harvey \<harvey@oasys.dt.navy.com>
Organization: Advanced Information Systems Branch, DTMB, CDNSWC
Message-ID: <354gp0INNjmi@oasys.dt.navy.mil>
References: \<Cw0sFw.ExF@undergrad.math.uwaterloo.ca> <3543bfINN1cs@oasys.dt.navy.mil> \<Cw2Muu.Iro@undergrad.math.uwaterloo.ca>
Subject: Re: SGML and its enemies

[Warren Baird]

|   Could you expand on what you mean by content markup?  How is it
|   different from structural markup?

When you tag the structure of the data you identify pieces of information,
i.e., chapter, title, paragraphs, etc.  Content data can include things
such as task, part, part number, address, etc.  Content tagging information
requires a great deal more effort than strictly structure tagging but it is
easier to identify the data and enable it to be used over and over again.

Betty

-- 
Betty Harvey  \<harvey@oasys.dt.navy.mil>     | David Taylor Model Basin
Advanced Information Systems Branch          | Carderock Division
Code 183                                     | Naval Surface Warfare
Bethesda, Md.  20084-5000                    |   Center
                                             | DTMB,CD,NSWC   
URL:  http://navysgml.dt.navy.mil/index.html |          
</message>
<message id="<1994Sep13.162422.2076@cs.nott.ac.uk>" date="2988462262">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 16:24:22 UT
From: Martijn Koster \<m.koster@nexor.co.uk>
Organization: NEXOR Ltd
Message-ID: <1994Sep13.162422.2076@cs.nott.ac.uk>
References: \<RAMAN.94Sep11164345@arctic.crl.dec.com>
Subject: Re: SGML/HTML: An obfuscated markup language?

[TV Raman]

|   Far from it, it's been a struggle.  What's worse, most of the SGML
|   tools seem to be totally incomprehensible.

That's pretty much the same for all non-SGML'ers, including me.

|   I spent considerable time installing/understanding SGMLS a couple of
|   months ago, and after fighting hard even managed to find a dtd for html
|   on the net as well as the other files necessary to make sgmls parse
|   simple html documents.  But the whole process of running SGMLS is so
|   obfuscated, i can't remember all the things I needed to do now, and
|   after retrieving the latest DTD just gave up on trying to validate my
|   documents using it and SGMLS as a waste of time.  This state of affairs
|   is frightening!

Absolutely, I posted along the same lines a few months ago.

As there seems little active effort by the SGML people to make SGML simple
for Web users (with the exception of the HotMetal people), and after a
number of requests from Web users I finally give in.  I have written a
document on how to setup psgml and sgmls, on

    \<URL:http://web.nexor.co.uk/mak/doc/html/sgml-lib/html-sgml.html>

Included in it is a tar archive of my /usr/local/lib/sgml, which is most
the confusion I encountered.

This should help people to get started -- it would certainly have saved me
precious evenings.  It's mostly from memory, and I only half know what I'm
talking about, but it's a start.

I would really appreciate comments on it from more expert people.

|   I know these are radical statements, but something needs to be done to
|   make sgml/HTML validation and processing more palatable, or we'll have
|   to spend the rest of our careers retrofitting our documents lto kluges
|   like Mosaic.

For some more radical statements see \<URL:http://web.nexor.co.uk/mak/
doc/html/sgml-lib/html-sgml.html#soapbox>.  But I believe that for now
we're stuck with SGML as-is, and should try to make the best of that.

Regards,

-- Martijn
-- 
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
Telephone: +44 115 9 520576
WWW: http://web.nexor.co.uk/mak/mak.html
</message>
<message id="<1994Sep13.170905.10083@ast.saic.com>" date="2988464945">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 17:09:05 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep13.170905.10083@ast.saic.com>
References: <354gp0INNjmi@oasys.dt.navy.mil>
Subject: Re: SGML and its enemies

[Warren Baird]

|   Could you expand on what you mean by content markup?  How is it
|   different from structural markup?

[Betty Harvey]

|   When you tag the structure of the data you identify pieces of
|   information, i.e., chapter, title, paragraphs, etc.  Content data can
|   include things such as task, part, part number, address, etc.  Content
|   tagging information requires a great deal more effort than strictly
|   structure tagging but it is easier to identify the data and enable it
|   to be used over and over again.

Whoops!  I've got to be more careful with my semantics from now on.  I
think that I've been inadvertently equating these two meanings sometimes
because I refer to the structure of the document to mean it's "logical" or
"usage-based" structure rather than its document structure which I infer is
what you mean by structure in your usage.  I guess this is a result of
working with Mil-spec Data Item Descriptions (DIDS) in which the document
structure reflects the logical structure.

For example, when I use a \<requirement> tag, it is clearly a case of
content labeling.  But since the document is a requirements document, it is
logically "structured" around requirements.  I'm not nit-picking here.  I
just perceive a semantic problem and I am looking for some genius on the
group to disambiguate it.  Any ideas?

Also, you mention the difficulty of content labeling a document.  I have
been working on ways to content-label legacy software product data.  There
are some automatic taggers on the market with a modicum of AI or expert
rules to help identify fragments of text as candidates for content labeling
but they are not yet adequate for my purposes.  As an example, if a
paragraph contains a "shall" than it's probably a candidate for a
\<requirement> tag, but should it use a \<requirement type=derived>,
\<requirement type=implied> or a \<requirement type=stated> tag?  This begins
to border on natural language recognition but then SGML always has that
kind of implication for me.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep13.172502.13685@ast.saic.com>" date="2988465902">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 17:25:02 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep13.172502.13685@ast.saic.com>
References: <1994Sep13.162422.2076@cs.nott.ac.uk>
Subject: Re: SGML/HTML: An obfuscated markup languag

[Martijn Koster]

|   As there seems little active effort by the SGML people to make SGML
|   simple for Web users (with the exception of the HotMetal people), and
|   after a number of requests from Web users I finally give in.  I have
|   written a document on how to setup psgml and sgmls, on
|
|       \<URL:http://web.nexor.co.uk/mak/doc/html/sgml-lib/html-sgml.html>
|
|   Included in it is a tar archive of my /usr/local/lib/sgml, which is
|   most the confusion I encountered.

Your efforts to help others is laudable, but please be kind.  Many people
on this group read their news on VT100s (next thing up the evolutionary
ladder after the teletype and the VT52) and have never heard of WEB, much
else be in a position to help you with it.  The application of SGML to the
WEB server is significant to the future of SGML, but it is somewhat off the
mainstream of SGML applications.

(Just an opinion -- Oh no, where'd I put that asbestos...)

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep13.192657.6145@ast.saic.com>" date="2988473217">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 19:26:57 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep13.192657.6145@ast.saic.com>
Keywords: FOSI footnote 38784
Subject: Footnote References in Arbortext 38784C FOSIs

Hallelujah!!!  After many tries, I have succeeded in implementing
autoindexed footnote references in an Arbortext 38784C style FOSI.  As I
mentioned in a previous post, I implemented this by totally ignoring the
"ftnref" element which does not work with Adept and added another attribute
choice to my xref tag, namely "footnote".  To refer to an footnote, I use a
\<xref xidtype=footnote xrefid=whatever> tag where "whatever" is the ID
attribute of the footnote being referenced.  I hacked the FOSI to savetext
and att fillval the same variable "refnum" that the other xrefid variants
use.  I have no idea why this works but the Arbortext FOSI engine must do
some magic when it sees the "refnum" variable used in a fillval.  In a fit
of insanity, I though that I might be able to patch up the ftnref element
in the same fashion but after little reflection that wouldn't work because
it's not a xref element and so the magic wouldn't be triggered (I think).
None of this is documented anywhere of course.  Who says that FOSI
programming is difficult?  The changes to the footnote e-i-c tag were to
add:

\<att>
\<fillval attname="id" fillcat="savetext" fillchar="textid">
\<charsubset>
\<savetext conrule="refnum"\</charsubset>
\</att>

The additions to the xref e-i-c were:

\<att>
\<specval attname="xidtype" attval="footnote">
\<specval attname="pretext" attval="#NONE">
\<specval attname="posttext" attval="#NONE">
\<charsubset>
\<font style="sanserif" size="7pt" offset="4pt">
\<usetext source="xrefstr">\</usetext>
\</charsubset>
\</att>

The DTD element for xref now looks like this (obviously footnote is not the
first thing that I've added):


\<!ELEMENT xref     - o  EMPTY >
\<!ATTLIST xref     xrefid IDREF #REQUIRED
                   xidtype (text | figure | table | bibitem | para0 |
			    subpara1 | subpara2 | subpara3 | section |
			    footnote) #REQUIRED
                   pretext CDATA #IMPLIED
                   posttext CDATA #IMPLIED
                   %secur; >

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<35559q$4ht@lo-fan.jpl.nasa.gov>" date="2988480250">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 21:24:10 UT
From: Grace Tamashiro \<grace@jaguar.jpl.nasa.gov>
Organization: JPL
Message-ID: <35559q$4ht@lo-fan.jpl.nasa.gov>
References: <352v7p$2pk@emoryu1.cc.emory.edu>
Subject: Re: Avalanche SGML add-in for Word6.0?

Keith-

Here's the info to contact John Payne of Avalanche Corp. 

947 Walnut Street
Boulder Co., 80302

Phone: +1 303 449 5032
FAX: +1 303 449 3246
john@support.avalanche.com

-- 
Grace Tamashiro, grace@jaguar.jpl.nasa.gov
</message>
<message id="<truly.467.000F8B4F@lunemere.com>" date="2988484353">
Newsgroups: comp.text.sgml
Date: 13 Sep 1994 22:32:33 UT
From: Truly Donovan \<truly@lunemere.com>
Organization: La Lunemere
Message-ID: \<truly.467.000F8B4F@lunemere.com>
References: <34poef$34g@netnews.upenn.edu> <1994Sep13.081101.21054@edf.fr>
Subject: Re: Basics

[Daniel Glazman]

|   Warning: SGML is ****NOT**** GML version n++...
|
|   GML is (much) more a formatting language than a structured markup
|   language.  You'll encounter the following problems moving from this
|   proprietary format to SGML:
|
|   1) conversion; if all your documents are written "in the same way",
|      maybe you can translate 90 % of them in SGML according to a
|      restrictive DTD including only *some* GML equivalent tags.
|      Otherwise, I suggest you keep GML for your existing files without
|      any hope of conversion
|
|   2) restriction; as I said in point 1, its is quite impossible to use a
|      DTD including all the GML existing tags and user macros.
|
|   3) restitution; GML builds a printable image of a document but you need
|      a SGML composer to build a such paper from a SGML document. See the
|      discussion in this newsgroup about SGML->Postscript conversion.

What are you calling "GML"?  Conceptually, the only difference between GML
and SGML is the absence of the formal discipline of a DTD.

You need a GML composer just as much as you need an SGML composer to build
a printable image.

Truly Donovan
</message>
<message id="<ROBIN.94Sep13221748@utafll.utafll.uta.edu>" date="2988505068">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 04:17:48 UT
From: Robin Cover \<robin@utafll.uta.edu>
Organization: UT Arlington
Message-ID: \<ROBIN.94Sep13221748@utafll.utafll.uta.edu>
Subject: ISO 12083 DTDs (bogus)

[Bob Agnew]

|   There have been quite a few requests for the AAP dtds on this group
|   recently.  I had replied with a copy of the AAP public entity which
|   came with my Arbortext distribution.  Apparently, this is not the
|   latest revision of the AAP DTD which eventually became an ANSI
|   standard, ANSI Z39.59-1988.  These DTDs have now been revised and
|   extended to become an ISO standard ISO-12083:1993.  Dr. Paul Grosso of
|   Arbortext Inc. who chaired the ISO committee which revised the math and
|   table parts of the dtds, kindly provided us with electronic copies of
|   these dtds.  I have placed these on actd.saic.com on directory
|   pub/SGML/ISO-12083 where they are available via anonymous FTP.  There
|   are four DTDs
|   
|   	book.dtd
|   	article.dtd
|   	maths.dtd
|           serial.dtd
|   
|   I have also placed the older version of the AAP dtd on pub/SGML/AAP.  I
|   will place the latest revision there when I can get it.  Dr. Grosso has
|   also kindly provided some ordering information for the various
|   standards which I have reproduced herein:
|   
|   The AAP standards are held by the Electronic Publishing Special
|   Interest Group (EPSIG) that had been associated with OCLC, but has
|   since been taken over jointly by GCARI (Graphics Communication
|   Association) and McAfee and McAdam.  The contact addresses I have are:
|   
|   Ms. Robin Canami		Michelle Wulffaerat
|   McAfee and McAdam		GCARI
|   PO Nox 328			PO Box 2888
|   Bel Air, MD  21014-0328		Alexandria, VA  22314-2888
|   USA				USA
|   +1 410 893 1340			+1 703 519 8184
|   
|   Either contact should eventually be able to help you order the AAP
|   Electonic Manuscript Markup documents (the "old" AAP stuff) and the ISO
|   12083 standard.  These organizations also give tutorials on the ISO
|   12083 standard.
|   
|   Furthermore, you should be able to order a copy of ISO 12083 from ANSI:
|   
|   American National Standards Institute
|   11 West 42nd Street, 13th Floor
|   New York, NY  10036
|   USA

Thanks to Bob for putting up the DTDs, but there is some question about
their authenticity.  I looked at them on the FTP site Bob referenced
(yesterday), and the article DTD has the same infelicity that's printed in
paper copies of "ISO 12983" sold in the US -- which, I hear from other
sources, are buggy and not the "official, published" ISO.

I don't know what's happening here, but I'd be grateful for an explanation
from someone who knows.  Look at the article DTD: it contains a duplicate
entity declaration for ISOpub.  When I spotted this and a couple other
"mistakes," I asked in a semi-public forum "what gives?" and was told that
ISO 12083 is not yet published, and that the paper version being sold (as
well as the electronic copies of the DTDs circulated by EPSIG) is not the
genuine article.  Really?

I'm a little disappointed, since the copy I bought (from EPSIG I think,
perhaps from GCA) was advertised as real.  In fact, the copy I have
contains a note (promoting the standard for NISO review/adoption) as
follows: "Following is the complete and final text of ISO 12083 Electronic
manuscript preparation and markup."

I might ask for my $75.00 back, but I don't know who's to blame for this
state of affairs, including misleading advertising.

I do think -- ISO's need to sell paper copies of standards notwithstanding
-- that it's a good idea to put up the ISO 12083 DTDs.  But let's get the
corrected and "official" copies, if possible.
</message>
<message id="<1994Sep14.100544.27199@sqwest.wimsey.bc.ca>" date="2988525944">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 10:05:44 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep14.100544.27199@sqwest.wimsey.bc.ca>
References: <1994Sep12.155128.4112@sqwest.wimsey.bc.ca> <1994Sep12.175005.20847@ast.saic.com>
Subject: Re: CONCUR usefulness existence proof

[W. Eliot Kimber]

|   HyTime doesn't do validation of document type declarations *because
|   it's incomputable in some cases*.

[Marcy Thompson]

|   People keep saying this to me.  Can someone show me a small
|   incomputable example?  Please?

[Bob Agnew]

|   I wonder what Godel would make of this? Anyone have Kurt's e-mail
|   address ;-)

Okay, I'll accept anything my dissertation advisor would have accepted:

-- an example

-- a rigourous demonstration that one can always construct an example

-- a correct reductio ad absurdum argument (I was taught Real Analysis
   by Bishop, so this can be taken as an actual concession)

Now, can someone demonstrate to me that this claim (that HyTime validation
of DTDs is incomputable in some cases) is true?

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<KJETIL.94Sep14104659@rigel.spd.eee.strathclyde.ac.uk>" date="2988528419">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 10:46:59 UT
From: Kjetil Rossavik \<kjetil@rigel.spd.eee.strathclyde.ac.uk>
Organization: Signal Processing Division, Strathclyde University, Scotland
Message-ID: \<KJETIL.94Sep14104659@rigel.spd.eee.strathclyde.ac.uk>
Subject: SMSL?

Hi,

I have been asked to make a presentation about SGML, HTML and SMSL.  The
first two I think I know enough about (for the presentation purposes anyway
:-)), but SMSL is a complete mystery to me.  I searched the web using
Veronica and the W3 index in Switzerland, but to no avail.  Can anybody
help?

Regards,
Kjetil.
</message>
<message id="<356uqv$kt0@Starbase.NeoSoft.COM>" date="2988539167">
Newsgroups: comp.infosystems.www.misc,alt.comp.shareware,comp.text.sgml
Date: 14 Sep 1994 13:46:07 UT
From: Cameron Laird \<claird@Starbase.NeoSoft.COM>
Organization: NeoSoft Internet Services +1 713 684 5969
Message-ID: <356uqv$kt0@Starbase.NeoSoft.COM>
References: <344tf2$ks7@Starbase.NeoSoft.COM> <34qulm$1vt@usenet.srv.cis.pitt.edu> \<VAIZKI.94Sep10094154@lk-hp-21.hut.fi> <354dfs$n05@Starbase.NeoSoft.COM>
Subject: Re: The maintenance of URL-endowed pages (was: The construction of FAQs) [LONG]

[Cameron Laird]

|   'Know what we need even before that?  A static analysis tool for
|   validating HREFs.  I'm completely serious; having such a

In email, John M. Troyer gently, but comprehensively, answered my questions:

    http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html

-- 
Cameron Laird		ftp://ftp.neosoft.com/pub/users/claird/home.html
claird@Neosoft.com (claird%Neosoft.com@uunet.uu.net)	+1 713 267 7966
claird@litwin.com (claird%litwin.com@uunet.uu.net)  	+1 713 996 8546
</message>
<message id="<3572a2$l3o@news.xs4all.nl>" date="2988542722">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 14:45:22 UT
From: Jan Grootenhuis \<jang@xs4all.nl>
Organization: XS4ALL, networking for the masses
Message-ID: <3572a2$l3o@news.xs4all.nl>
References: <354gp0INNjmi@oasys.dt.navy.mil> <1994Sep13.170905.10083@ast.saic.com>
Subject: Re: SGML and its enemies

[Bob Agnew]

|   For example, when I use a \<requirement> tag, it is clearly a case of
|   content labeling.  But since the document is a requirements document,
|   it is logically "structured" around requirements.  I'm not nitpicking
|   here.  I just perceive a semantic problem and I am looking for some
|   genius on the group to disambiguate it.  Any ideas?

The Seybold Report on Publishing Systems, Vol. 20, Nr. 7 (December 1990)
had an article by George Alexander and Mark Walter, A fresh look at SGML,
that also discussed structural vs content tagging, and referred to an
article by Dale Waldt on the same subject.

Hope this helps,

Jan
</message>
<message id="<1994Sep14.144615.22676@ast.saic.com>" date="2988542775">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 14:46:15 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep14.144615.22676@ast.saic.com>
References: \<ROBIN.94Sep13221748@utafll.utafll.uta.edu>
Subject: Re: ISO 12083 DTDs (bogus)

[Robin Cover]

|   Thanks to Bob for putting up the DTDs, but there is some question about
|   their authenticity.  I looked at them on the FTP site Bob referenced
|   (yesterday), and the article DTD has the same infelicity that's printed
|   in paper copies of "ISO 12983" sold in the US -- which, I hear from
|   other sources, are buggy and not the "official, published" ISO.
|   
|   I don't know what's happening here, but I'd be grateful for an
|   explanation from someone who knows.  Look at the article DTD: it
|   contains a duplicate entity declaration for ISOpub.  When I spotted
|   this and a couple other "mistakes," I asked in a semi-public forum
|   "what gives?" and was told that ISO 12083 is not yet published, and
|   that the paper version being sold (as well as the electronic copies of
|   the DTDs circulated by EPSIG) is not the genuine article.  Really?
|   
|   I'm a little disappointed, since the copy I bought (from EPSIG I think,
|   perhaps from GCA) was advertised as real.  In fact, the copy I have
|   contains a note (promoting the standard for NISO review/adoption) as
|   follows: "Following is the complete and final text of ISO 12083
|   Electronic manuscript preparation and markup."
|   
|   I might ask for my $75.00 back, but I don't know who's to blame for
|   this state of affairs, including misleading advertising.
|   
|   I do think -- ISO's need to sell paper copies of standards
|   notwithstanding -- that it's a good idea to put up the ISO 12083 DTDs.
|   But let's get the corrected and "official" copies, if possible.

I forwarded this to Dr. Grosso, although I know he reads this group anyway. 
Perhaps Dr. Grosso can shed some light on this.  Please accept my apologies.

Are BOGUS dtds better than no dtds at all?? ;--((

-- 
"DTDs are like the Ten Commandments - its's pretty hard to get a hold of the
 original manuscripts and the author has been unreachable for some time"
</message>
<message id="<Cw4K32.GCF@news.cis.umn.edu>" date="2988542967">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 14:49:27 UT
From: R A Milowski \<milor001@maroon.tc.umn.edu>
Organization: University of Minnesota
Message-ID: \<Cw4K32.GCF@news.cis.umn.edu>
References: <19940909.4927@naggum.no> \<ricko.779470133@ee.uts.EDU.AU>
Subject: Re: SGML and its enemies

[Rick Jelliffe]

|   Why not just bite the bullet and have SGML mark 2?
|   
|   Ordinary SGML can be frozen, and quite happily be used for the the
|   things it is currently used for.

I agree.  If people can make predictions that "something" will replace SGML
down the road if the standard doesn't change, why not just create a
successor to SGML that solve problems within the standard and formalizes it
and, in doing so, create SGML's replacement.

|   The things SGML2 should have are:
|   1) a formal (usable, complete) grammar
|   2) a formal reference parser model
|   3) a standard written either in a real language using only words that
|      appear in a known dictionary or in some algebra, but not in a mish
|      mash of | jargon giving the appearance of formality
|   4) an altered SGML declaration to handle multibyte character sets more
|      obviously.

The two things I can see would be:

1. SGML seems almost "object oriented" with it methods exchangeable.  It
   would be interesting to explore this.

2. The founding concepts of SGML (and HyTime) can be based on formal set
   theory.  It would be beneficial that SGML not only have a formal
   specification of the standard but also a formal abstract model
   (preferably based on some assemblance of mathematics).

-- 
R. Alexander Milowski
SGML Operations Manager        milor001@maroon.tc.umn.edu
Microcom Inc.                  (612) 825 - 4132
SGML Consulting -- "The SGML Solutions Experts"
</message>
<message id="<357n5f$fhd$1@mhade.inhouse.compuserve.com>" date="2988564079">
Newsgroups: comp.text.sgml
Date: 14 Sep 1994 20:41:19 UT
From: Alex Baluta <74667.2675@compuserve.com>
Message-ID: <357n5f$fhd$1@mhade.inhouse.compuserve.com>
Subject: Re: Q: commercial SGML products: advice

I am interested in comments and comparisons of available SGML (or HTML)
products.  If you could direct me to published comparisons, that would be
nice.  However, I am also interested in brief user comments.  Specifiaclly,
I am interested in the products usefulness, robustness, buginess, support,
etc.

What is the best SGML editor/document composition software?

Again, specifically, I am interested in comments about the following
products:

InContext
SoftQuad Author/Editor
SoftQuad HoTMetaL
Microstar's Near & Far

Thanks in advance for your time and comments.

please address responses to:

74667,2675 on Compuserve.

-- 
..............................................Alex Baluta
</message>
<message id="<358d2d$229@news.u.washington.edu>" date="2988586509">
Newsgroups: comp.infosystems.www.misc,alt.comp.shareware,comp.text.sgml
Date: 15 Sep 1994 02:55:09 UT
From: Douglas Brick \<dbrick@u.washington.edu>
Organization: University of Washington
Message-ID: <358d2d$229@news.u.washington.edu>
References: <344tf2$ks7@Starbase.NeoSoft.COM> <34qulm$1vt@usenet.srv.cis.pitt.edu> \<VAIZKI.94Sep10094154@lk-hp-21.hut.fi> <354dfs$n05@Starbase.NeoSoft.COM> <356uqv$kt0@Starbase.NeoSoft.COM>
Subject: Re: The maintenance of URL-endowed pages (was: The construction of FAQs) [LONG]

[Cameron Laird]

|   'Know what we need even before that?  A static analysis tool for
|   validating HREFs.

[Cameron Laird]

|   In email, John M. Troyer gently, but comprehensively, answered my questions:
|   
|   	http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html

This is a very nice program, but it tends to say that a *lot* of links that
are perfectly good are bad.  One that I like better, though it is slower,
is:

gopher://gopher.math.psu.edu/11/pub/sibley 
LinkCheck (a perl script that checks urls in an html doc to see if
they are still available)

The program also requires a few auxiliary perl scripts which can be
obtained from the same site.  Here is the complete list from the author's
"about.linkcheck" file:

Requirements:
    perl 4.0 (obviously)

    mconnect (usually /usr/etc/mconnect if you have it)
    -- or --
    mconnect.pl
        Available from the same place you get linkcheck.
        Not a substitute for mconnect generally; just enough
        for what linkcheck needs.
        mconnect.
        mconnect.pl requires perl 4.0 and the perl packages
        sock.pl and telnet.pl available from the perl archive
        at the University of Florida:
        ftp://ftp.cis.ufl.edu/pub/perl/scripts

    ftpcheck
        if you want to check ftp links.
        Available from the same place you get linkcheck
        Requires perl 4.0 and the ftplib.pl perl package
        available from U Florida perl archive as above.

Of course, even the best crafted url checker can't do much about links that
have been redirected by hand: e.g., though the following is a perfectly
good url:

ftp://ftp.u.washington.edu/public/Brick/Hot/best.html

All you'll get if you connect to it is:


                           THE "ESSENTIAL" INTERNET
                                       
   This page has moved to:
   
     http://saul2.u.washington.edu:8080/Hot/best.html
     
   Please change any links that you have in your lists to reflect this
   change
   
   Thanks much--db
   
   
     _________________________________________________________________
   
   Douglas Brick / dbrick@u.washington.edu
   
   Last modified: Tue Sep 6 20:37:12 1994

-- 
We have staked the whole future of American civilization, not upon the
power of government, far from it.  We have staked the future of all of our
political institutions upon the capacity of mankind of self-government;
upon the capacity of each and all of us to govern ourselves, to control
ourselves, to sustain ourselves according to the Ten Commandments of God.

                                James Madison (1751-1836)
</message>
<message id="<1994Sep15.041701.697@sq.sq.com>" date="2988591421">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 04:17:01 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep15.041701.697@sq.sq.com>
References: \<DMEGGINS.94Aug24203405@aix1.uottawa.ca> <94237.143509U35395@uicvm.uic.edu> \<ogawa.1128310225K@news.teleport.com>
Subject: Re: Is #CURRENT a good thing? / TEI gripe

[Arthur Ogawa]

|   I've been working lately with a DTD in which an elment, say SEC,
|   appears within its own content model, somehting like:
|   
|   \<!ELEMENT SEC (P+,SEC)>
:
|   I have always wondered what others would think of this approach,
|   because it is not by any means the same as that used in the bulk of
|   DTDs I have encountered.

I have always done it this way and still find S1, S2, S3... more than a
little strange, especially when the same people allow LIST within LIST,
instead of having LIST1, LIST2 and so forth.

I think this is often a holdover from typesetting.

In Author/Editor you get a big win, because you can easily use cut or copy
and then paste a section in at a different level, perhaps using the
Structure View.

I have encountered typesetting systems that can't handle recursion at all,
and I think that's nearer to an argument.  But a simple Balise script could
turn \<Sec> \<Sec>... into \<Sec1> \<Sec2>... or could add an attribute.

For one DTD I wrote, I had \<smallsection> elements; the theory was that
people wouldn't nest them too deeply because they had such a long name, but
I am not sure how well it worked in practice.  I think not badly, three
years on they are still using the system :-)


I suppose for completeness I have heard an argument that authors are too
stupid to cope, or are too easily confused, but I have met few writers who
can't count to 6 or so, I don't believe this.  Or I have misunderstood it.

It _is_ true that Recursion is a difficult concept for many people.  Listen
to people's difficulty with \<DL> on the HTML list, or the WWW groups: The
experimental model

	\<!Element dl - - (dt*,dd)+>

turns out to be difficult for many people to grasp, who have not otherwise
been introduced to SGML.

Anyone who has been into a kitchen and seen nested pastry cutters, biscuit
(`cookie') patterns or cake tins has a concrete model for the first few
levels of recursion, however, and actual document instances don't use
abstract recursion.

I'd say go with nested elements and have a simpler DTD.

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<358jvt$9sd@hopper.acm.org>" date="2988592657">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 04:37:37 UT
From: Waysys \<waysys@acm.org>
Organization: ACM Network Services	
Message-ID: <358jvt$9sd@hopper.acm.org>
Subject: Conversion of Pagemaker Documents

Can anyone suggest an approach or tools for converting Pagemaker documents
to SGML?
</message>
<message id="<1994Sep15.050330.4223@sq.sq.com>" date="2988594210">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 05:03:30 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep15.050330.4223@sq.sq.com>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de>
Subject: Re: SGML and its enemies

This article contains a long rant, and a proposal for a formal SGML.  Skip
to the Proposal tag if you only want that.

[Tim Bray]

|   This doesn't mean that SGML's design is the one for the future.  I've
|   never understood why, as Erik wonders, it has to belong to the
|   difficult LL(1) class of grammars,

[Joachim Schrod]

|   ???? If it would really be LL(1) it would be great, there are a lot of
|   tools to parse LL(1).

As I understand it, SGML is LL(1) because a DTD that specifies a grammar
that is not LL(1) is illegal.  However, a DTD is capable of specifying an
unrestricted LL* grammar with unbounded lookahead required.

\<rant>

Consider the following illegal content model:

\<!Element BadBoy - -
    ((A*,B?,A*,C?,A*,B,A*)*,A,B,C)
>

      1  2  3  4  5  6 7    8 9

Here, if you get
	\<BadBoy>\<A>\<A>\<A>\<B>
you don't know if the \<B> should match at (2) or (6) above.
This is `ambiguous' in SGML terms, as I understand it.

The implication of this is that an SGML parser not only has to cope with
some fairly complex syntax of its own, but also has to detect ambiguity.
Now, in this case it's pretty easy; our product (RulesBuilder) says

    mieza!lee> mkrls -a a.dtd
    Compiling a.dtd into a.rls.

    Error in Document Type Declaration at offset 55 of the input stream,
    on line 2 of the document: Ambiguous content model.
    An instance of element A could simultaneously match two or more tokens
    in the content model.

In effect, you have to have something like Yacc (only with a slightly more
powerful model, it's possible that byacc or eyacc could do it) built in
to the parser.

The same sort of thing goes for the lexical analysis as well.

And the macro expansion via entities gives a whole new meaning to `arcane',
with the end of each entity introducing an out-of-band Ee token that is or
is not allowed in various places., Did you know that you can't put comments
in a content model??  Try

    \<!Element Standard - -
	(
	    Title,
	    Scope, --* as per IEEE *--
	    Definitions,
	    Requirements, --* as per IEE p. 375 para iii *--
	    %OtherStuff;
	)
    >

And then there's the nonsense about #EMPTY elements being forbidden a close
tag, just to make sure you always have to be able to read a DTD to do
anything with a document.  If it wasn't for that, you could take a fully
expanded (no OMITTAG, no `obfuscatory entities' as the good Prof. calls
'em) instance and make an in-memory data structure without needing the DTD.
You couldn't tell if it was right or not, but often you don't care.

One solution is to avoid #EMPTY.  Another might have been to have had a
different syntax, e.g.

	<#NAME ATT="xxx">

using # (say) to mark an empty element.

Challenge: write an attribute with both ' and " inside it.  Now do it
without using entities.

OK, you can use the MSSCHAR thing (if I have named it right), which works a
bit like a \\ in most of Unix.  But as far as I can tell, the behaviour of
MSSCHAR is undefined if it should occur at the end of an entity.  It is
only defined if there is a character following it within the same entity.
[sorry I am at home & don't have the ref.]

\</rant>

\<ConnectingBit>

So I agree with Erik.

I also agree with others: In a way, it doesn't matter.  It's like Microsoft
Windows: it's technically not very appealing, but it does what a lot of
people want, and those people are not in a position to say that it's not as
good as some other system they have never heard of, or to demand Display
PostScript, TNT or whatever.  It has become a de facto standard.

If you stick close to the reference concrete syntax, SGML is also a de
facto standard, and it works pretty well.  A lot of people are using it,
one way or another -- especially when you include WWW -- and solving real
problems.

Could it be improved?  Yes.

\</ConnectingBit>

\<Proposal>

There's an article in Scientific American this month (Sept. 1994) about the
`software crisis' of unreliable software.  The article points out how the
use of formal specifications and mathematical rigour helps to make programs
that are on time, that work, that are reliable, that do what was expected
of them.

I would like to suggest defining a proper subset of SGML with the following
properties.  I'll call it SGML/F for now.

* a BNF grammar for the abstract syntax
* restrictions on content models to force regularity
* reference concrete syntax with
	NAMECASE NO (i.e. same for elements and entities)
	NAMELEN 0 (unrestricted)
	no CAPACITY
	no SHORTTAG, OMITTAG, DATATAG, SHORTREF, LINK, CONCUR, etc.
* a formally specified concrete syntax

All SGML/F documents would also be SGML documents, but the reverse would
not necessarily be true.

I can envision an SGML/FX (X=extended), with
* EMPTY elements distinguished syntactically
* a distinction between #PCDATA that can be empty and (#PCDATA)+
  (i.e. textual content required)
* a regular input syntax -- e.g. comment equivalent to whitespace
* a formal specification (e.g. in Z?  I don't know Z well enough)

but I am not proposing that, because SGML/FX documents would not also be
SGML documents, even though a `down-translation' might be possible.

Anyone for SGML/F?

\</Proposal>

\</Article>
\<Signature>

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<1994Sep15.071753.14319@falch.no>" date="2988602273">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 07:17:53 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep15.071753.14319@falch.no>
References: <358jvt$9sd@hopper.acm.org>
Subject: Re: Conversion of Pagemaker Documents

[waysys@acm.org]

|   Can anyone suggest an approach or tools for converting Pagemaker
|   documents to SGML?

How about:

1.   Export document to RTF-format, making sure the "Export format codes"
     box is checked in the Export... dialog box.

Then either:

2a.  Convert to Rainbow format using rtf2rb (available from ftp.ifi.uio.no)
     and use an SGML transformation tool (Balise, Omnimark, SGML Hammer,
     CoST) to transform from Rainbow to the DTD of your choice; or

2b.  Convert to Rainbow format as above, and then use one of the new
     interactive tagging tools (e.g. EBT's DynaTag or ArborText's
     PowerPaste) to tag according to your DTD; or

2c.  Use Avalanche's FastTAG to process the RTF file and produce SGML
     output directly using visual recognition techniques.

Which approach is best for you will depend on many factors:

- how much money can you spend on the tools?
- how much time do you want to spend mastering them?
- how consistently are your PageMaker files marked up (in particular: to
  what extent can structure be inferred directly from style names, without
  recourse to visual recognition techniques)?

There are, of course, other ways to get from PageMaker to SGML, but they
are even more convoluted.  The basic problem is that PageMaker, unlike
Ventura, QuarkXPress, FrameMaker and others, is unable to export to an
easily readable ASCII-based format without losing in-line markup.  (You can
export to ASCII with "Export format codes", but the only codes you retain
are the paragraph style names.)

For more information about the tools mentioned here, see my Whirlwind Guide
to SGML Tools, available from ftp.falch.no.

Best regards,

Steve
-- 
</(pepper)steve>                                   pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 2216 3040                                fax +47 2216 2350
                      "Life begins at 0x28"
</message>
<message id="<359a8r$qep@rs18.hrz.th-darmstadt.de>" date="2988616411">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 11:13:31 UT
From: Joachim Schrod \<schrod@iti.informatik.th-darmstadt.de>
Organization: TH Darmstadt, FG Systemprogrammierung
Message-ID: <359a8r$qep@rs18.hrz.th-darmstadt.de>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   This article contains a long rant, and a proposal for a formal SGML.

I like the proposal, btw.

|   As I understand it, SGML is LL(1) because a DTD that specifies a
|   grammar that is not LL(1) is illegal.

Looking through my bibliography archives I realized that I should have
known this myself... ;-)

@techreport{markup:brueggemann:92.1,
 author = {Anne Br{\\"u}ggemann-Klein and Derick Wood},
 title = {Unambiguous Regular Expressions and {SGML} Document Grammars},
 month = nov,
 year = 1992,
 institution = uni-ontario,
 address = csd # {, } # london-ca,
 type = tr,
 number = 337,
 library = {own/markup},
 note = {Submitted for publication},
 annote = {Available from AFA {\\tt 
    ftp.csd.uwo.ca:/pub/csd-technical-reports/337/}.
 \\js{} This report shows the relationship between traditional formal
    languages and SGML doctype specifications. In particular, it's a data
    point to show that the usage of terms in the SGML community does not
    conform to the canonical meaning in CS\\@. The report shows that one can
    decide if a given regular expression is equivalent to a valid SGML
    doctype, and that one can do this transformation. The algorithmus for
    the transformation is exponential, it is not known if this is a
    time-optimal algorithm.
 }
}

Actually, somebody posted in comp.compiler.tools.pccts (PCCTS is a compiler
toolset for languages with LL(k) grammars) that he has written a SGML
grammar.  Sad to say, this article has expired at our site, if he posts
again, I'll ask him if one can distribute this grammar.  Might be of
interest for some here.

Thanks for your correction,

	Joachim

-- 
Joachim Schrod			Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany
</message>
<message id="<JJC.94Sep15122331@jclark.jclark.com>" date="2988617011">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 11:23:31 UT
From: James Clark \<jjc@jclark.com>
Organization: None, London, England
Message-ID: \<JJC.94Sep15122331@jclark.jclark.com>
References: \<dshema-0109941701410001@kaleetan.rt.cs.boeing.com> <1994Sep2.110522.414@ittpub>
Subject: Re: Underscores and the Sgmls validating parser?

[Dave Shema]

|   My sgml data contains entities and attributes with underscores ("_").
|   I am trying to use Sgmls, derived from ARCSGML, as the validating
|   parser.  This parser does not seem to like underscores.
:
|   Short of going into the c code and changing values in the tables (what
|   we ended up doing with ARCSGML), can this parser be encouraged to
|   accept underscores?

[William D. Lindsey]

|   No.  The only way I could get underscores to be accepted by sgmls was
|   by hacking (gently) the source.
|   
|   Around line 957 of sgmldecl.c (sgmls-1.1) change 
|   from:
|   
|                  else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
|                           && c != '.' && c != '-') {
|   
|   to:
|                 else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
|                         && c != '.' && c != '-' && c != '_' ) {
|   

Unfortunately that patch makes sgmls non-conforming: it will incorrectly
process documents that use an underscore both as a name start character and
as a short reference delimiter.

The following modification to your patch allows underscore to be added only
to LCNMCHAR or UCNMCHAR.  So far as I can tell, this shouldn't cause any
problems.  Allowing it to be added to LCNMSTRT and UCNMSTRT is much more
complicated.

*** sgmldecl.c.~38~	Wed Mar 23 11:52:50 1994
--- sgmldecl.c	Thu Sep 15 12:10:40 1994
***************
*** 992,998 ****
  	       if (c < 0)
  		    bad = 1;
  	       else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
! 			&& c != '.' && c != '-') {
  		    int class = lextoke[c];
  		    if (class == SEP || class == SP || class == NMC
  			|| class == NMS || class == NU)
--- 992,999 ----
  	       if (c < 0)
  		    bad = 1;
  	       else if ((char_flags[c] & (CHAR_SIGNIFICANT | CHAR_MAGIC))
! 			&& c != '.' && c != '-'
! 			&& !(c == '_' && i >= 2)) {
  		    int class = lextoke[c];
  		    if (class == SEP || class == SP || class == NMC
  			|| class == NMS || class == NU)

James Clark
jjc@jclark.com
</message>
<message id="<359g54INNus@oasys.dt.navy.mil>" date="2988622436">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 12:53:56 UT
From: Betty Harvey \<harvey@oasys.dt.navy.mil>
Organization: Advanced Information Systems Branch, DTMB, CDNSWC
Message-ID: <359g54INNus@oasys.dt.navy.mil>
References: <354gp0INNjmi@oasys.dt.navy.mil> <1994Sep13.170905.10083@ast.saic.com>
Subject: Re: SGML and its enemies

[Bob Agnew]

|   Whoops!  I've got to be more carefull with my semantics from now on.  I
|   think that I've been inadvertently equating these two meanings
|   sometimes because I refer to the structure of the document to mean it's
|   "logical" or "usage-based" structure rather than its document structure
|   which I infer is what you mean by structure in your usage.  I guess
|   this is a result of working with Mil-spec Data Item Descriptions (DIDS)
|   in which the documnet structure reflects the logical structure.

I guess I have been working in the MIL-Specs area too long.  Historically,
in the DoD community the SGML Mil Standards (MIL-M-28001) has been equated
with structural tagging, i.e., paragraph, chapter, etc.  The IETM spec
(MIL-D-87269) was considered to be the content tagging Mil standard.
MIL-M-28001 was originally written for paper-based publishing.  However, we
are moving away from paper and into the electronic arena.  There are some
who think that SGML cannot be used to tag content.  I disagree.

|   For example, when I use a \<requirement> tag, it is clearly a case of
|   content labeling.  But since the document is a requirements document,
|   it is logically "structured" around requirements.  I'm not nitpicking
|   here.  I just perceive a semantic problem and I am looking for some
|   genius on the group to disambiguate it.  Any ideas?
|   
|   Also, you mention the difficulty of content labeling a document.  I
|   have been working on ways to content-label legacy software product
|   data.  There are some automatic taggers on the market with a modicum of
|   AI or expert rules to help identify fragments of text as candidates for
|   content labeling but they are not yet adequate for my purposes.  As an
|   example, if a paragraph contains a "shall" than it's probably a
|   candidate for a \<requirement> tag, but should it use a \<requirement
|   type=derived>, \<requirement type=implied> or a \<requirement
|   type=stated> tag?  This begins to border on natural language
|   recognition but then SGML always has that kind of implication for me.

I don't think I said it was difficult, just takes more effort.  What I mean
is that if you are tagging data to the content level, you will be using
more tags and more time will go into tagging each content specific piece of
data.  Take for instance the MIL-M-38784C DTD.  There are content tags
included in this DTD.  There are approximately 170 tags.  are approximately
170 different ELEMENTS.  Now look at the HTML DTD which is based strictly
on structure and format, there are approximately 50 tags.  If the HTML DTD
was difficult to tag it wouldn't be as popular as it currently is.  Every
Website (which is really difficult to estimate) has one or more people who
are creating HTML documents and most of them don't even realize that it is
SGML.  Most people that I talk to who are creating HTML documents and you
ask them about SGML get a glazed look in their eye and say 'What's that!".
I guarantee the 'common folk' won't be tagging data to our 38784C DTD.

					Betty

-- 
Betty Harvey  \<harvey@oasys.dt.navy.mil>     | David Taylor Model Basin
Advanced Information Systems Branch          | Carderock Division
Code 183                                     | Naval Surface Warfare
Bethesda, Md.  20084-5000                    |   Center
                                             | DTMB,CD,NSWC   
URL:  http://navysgml.dt.navy.mil/betty.html |          
</message>
<message id="<9409151355.AA03364@lars.texcel.no>" date="2988626102">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 13:55:02 UT
From: Johan Henrikson \<johan@texcel.no>
Message-ID: <9409151355.AA03364@lars.texcel.no>
Subject: FOSI Course

Texcel AB in Sweden will be holding a FOSI course at their office in
Stockholm.

The course will be 3 days between Tuesday 27 and Thursday 29 September.

The course includes topics such as:

Page layout: margins, columns, text width, running headers and footers

General formatting issues: point size, indent, justification, etc

Cross referencing, table of contents and source document attributes
Formatting analysis and much more.


If you are interested in more details you are welcome to contact:

Johanna Kuhn
Texcel AB
Storsaetragraend 12
127 39 Skaerholmen
Sweden

Tel: +46 8 708 80 15
Fax: +46 8 708 80 16
email: johanna@texcel.no
</message>
<message id="<779638731snz@light.demon.co.uk>" date="2988627531">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 14:18:51 UT
From: Richard Light \<Richard@light.demon.co.uk>
Message-ID: <779638731snz@light.demon.co.uk>
Subject: Email address for SGML Users' Group

The SGML Users' Group can now be contacted by email.  The address is:

    dpsl!gew@visionware.co.uk

(This actually gets you through to the indefatigable Gaynor West, who can
deal with membership subscriptions, orders for publications, etc.)

If you want to contact one of the officers of the Group, you have the
choice of:

    dpsl!plg@visionware.co.uk (Pam Gennusa, Chair)
    richard@light.demon.co.uk (Richard Light, Treasurer)

Richard Light
</message>
<message id="<1994Sep15.183206.20463@ast.saic.com>" date="2988642726">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 18:32:06 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep15.183206.20463@ast.saic.com>
References: <1994Sep14.100544.27199@sqwest.wimsey.bc.ca>
Subject: Re: CONCUR usefulness existence proof

[Marcy Thompson]

|   Okay, I'll accept anything my dissertation advisor would have accepted:
|   
|   -- an example
|   
|   -- a rigourous demonstration that one can always construct an example
|   
|   -- a correct reductio ad absurdum argument (I was taught Real Analysis
|      by Bishop, so this can be taken as an actual concession)
|   
|   Now, can someone demonstrate to me that this claim (that HyTime 
|   validation of DTDs is incomputable in some cases) is true?
|   

Does validation include verification of proper run-time interpretation of
attributes and PIs?

(This is a serious question, at least I intended it to be.)

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<m0qlLz0-000C9wC@newman>" date="2988644160">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 18:56:00 UT
From: Matt Timmermans \<mtimmerm@newman.microstar.com>
Message-ID: \<m0qlLz0-000C9wC@newman>
References: <1994Sep12.155128.4112@sqwest.wimsey.bc.ca> <1994Sep12.175005.20847@ast.saic.com> <1994Sep14.100544.27199@sqwest.wimsey.bc.ca>
Subject: Re: CONCUR usefulness existence proof

[Marcy Thompson]

|   Now, can someone demonstrate to me that this claim (that HyTime
|   validation of DTDs is incomputable in some cases) is true?

There are two types of HyTime conformance.

1)  Strictly speaking, in order for a DTD to be HyTime conforming, it must
    only be _possible_ to create an instance of that DTD which is HyTime
    conforming.

2)  According to the definition assumed in this thread (I think), a DTD is
    HyTime conforming if it is _impossible_ to create an instance of that
    DTD which is _not_ HyTime conforming.

Now, there are two types of DTDs which must be considered:

A)  Your DTD has #FIXED values for the HyTime attribute on all of the 
    relevant elements.

B)  It doesn't

If A), then computing 1) thunks down to a series of regular expression
intersection operations, and computing 2) thunks down to a series of
regular expression differences.  Both of these are computable in
exponential time.

If B), then your DTD does _not_ conform according to 2), because someone
could always enter "Cabbage" in the HyTime attribute.

The case remaining is computing 1) if B).

I don't know of anyone who has proven this to be incomputable, but it
_feels_ incomputable.  That doesn't mean that this kind of validation can't
be done, however -- only that it can't be done perfectly.  There are
reasonably robust ways to _prove_ that a DTD is conforming, but they
haven't been proven to work for _all_ conforming DTDs.

If you are concerned about this type of safety, then simply consider a DTD
to be non-conforming if you can't prove that it conforms.  Normally, if you
can't prove that a DTD conforms, then it is unlikely that a human author
would be able to create a conforming instance anyway.

\</Matt>

-- 
Matt Timmermans               | Phone:  +1 613 727-5696
Microstar Software Ltd.       | Fax:    +1 613 727-9491
34 Colonnade Rd. North        | BBS:    +1 613 727-5272
Nepean Ontario CANADA K2E-7J6 | E-mail: mtimmerm@microstar.com
</message>
<message id="<vanyel.779657984@camelot>" date="2988646863">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 19:41:03 UT
From: Alan Williams \<vanyel@camelot.bradley.edu>
Organization: Bradley University
Message-ID: \<vanyel.779657984@camelot>
Subject: SGML editor for Japanese, Chinese, Arabic?

Does anyone know of the existence of an SGML editor that is capable of
displaying Chinese, Japanese, and/or Arabic characters?  And while I'm on
the line of thought, does anyone know if ISO standard character sets exist
for those languages?

Thanks for your help.

-- 
____  Alan Williams               \\ "If you prick us, do we not bleed?  If you
\\  /  vanyel@camelot.bradley.edu   \\ tickle us, do we not laugh?  If you poison
 \\/   awilliam@heartland.bradley.edu\\ us, do we not die?"
                                     \\ --Shakespeare, _The Merchant of Venice_
</message>
<message id="<1994Sep15.203350.18210@sqwest.wimsey.bc.ca>" date="2988650030">
Newsgroups: comp.text.sgml
Date: 15 Sep 1994 20:33:50 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep15.203350.18210@sqwest.wimsey.bc.ca>
References: <1994Sep14.100544.27199@sqwest.wimsey.bc.ca> <1994Sep15.183206.20463@ast.saic.com>
Subject: Re: CONCUR usefulness existence proof

[Marcy Thompson]

|   Okay, I'll accept anything my dissertation advisor would have accepted:
|   
|   -- an example
|   
|   -- a rigourous demonstration that one can always construct an example
|   
|   -- a correct reductio ad absurdum argument (I was taught Real Analysis
|      by Bishop, so this can be taken as an actual concession)
|   
|   Now, can someone demonstrate to me that this claim (that HyTime 
|   validation of DTDs is incomputable in some cases) is true?
|   

[Bob Agnew]

|   Does validation include verification of proper run-time interpretation
|   of attributes and PIs?
|   
|   (This is a serious question, at least I intended it to be.)

I dunno.  Part of the reason I asked the question is because I suspect that
the people who say it don't actually have a clear idea of what they mean
when they say it.  I hope that in answering this question, they will not
only clear up my personal confusion but contribute to the general
understanding of what's what with HyTime.

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<35bcqq$alo@news.delphi.com>" date="2988668520">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 01:42:00 UT
From: Jeffrey McArthur \<j_mcarthur@BIX.com>
Organization: ATLIS Publishing
Message-ID: <35bcqq$alo@news.delphi.com>
References: \<DMEGGINS.94Aug24203405@aix1.uottawa.ca> <94237.143509U35395@uicvm.uic.edu> \<ogawa.1128310225K@news.teleport.com> <1994Sep15.041701.697@sq.sq.com>
Subject: Re: Is #CURRENT a good thing? / TEI gripe

[Liam R. E. Quin]

|   I have encountered typesetting systems that can't handle recursion at
|   all, and I think that's nearer to an argument.  But a simple Balise
|   script could turn \<Sec> \<Sec>... into \<Sec1> \<Sec2>... or could add an
|   attribute.

Our experience has been that some of our keyboarding staff gets a bit lost
in the depths of recursive structures.  We have found that adding a few
comments the SGML helps a lot.  Instead of just \<Sec>\<Sec> we would do
something like:

\<Sec>\<!-- Sec Level 1 -->
\<Sec>\<!-- Sec Level 2 -->

This helps a lot, particularly if you are dealing with data that can go
several levels deep (we have one that routinely goes 5 levels deep) and can
the data can stay at any one level for several hundred lines.

Comments help a lot.
-- 
    Jeffrey M\\kern-.05em\\raise.5ex\\hbox{\\b c}\\kern-.05emArthur
    a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
    phone: +1 301 210 6655
    fax:   +1 301 210 4999
    home:  +1 410 290 6935

The opinions express are mine.  They do not reflect the opinions of my
employer.  My access to the Internet is not paid for by my employer.
</message>
<message id="<35atv0$ed9@hamblin.math.byu.edu>" date="2988669344">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 01:55:44 UT
From: Bradley D Stoddard \<brad@bert.cs.byu.edu>
Organization: Brigham Young University, Utah
Message-ID: <35atv0$ed9@hamblin.math.byu.edu>
Subject: SGML to RTF conversion tools?

I am researching the creation of conversion tools to take SGML to RTF.  Has
this type of work already been done?  If it has, would someone tell me who
has done it.

At this time I am working on a thesis proposal about creating a compiler
that would convert from SGML to RTF (and later other formats.)  My idea is
that I can use the output of the SGMLS parser as input into my compiler.
My hope is that using the output from SGMLS will reduce the complexity of
my compiler.

I have found a lot of information about SGML, but I haven't found much
about creating compilers for SGML.  I have the source code for the SGMLS
parser (although I have not taken a hard look at it), but the only other
information I have found has been in this news group.

This isn't simply an educational execise.  I have need at my work to
convert SGML to RTF so I can create MS Windows help files.

I have written a C compiler (very limited) before, so compiler creation is
not new to me.  My main interest at this time is understanding what has
already been done so I can learn from it and then further the subject.

Thanks for your help.

Brad Stoddard
CS graduate student

brad@bert.cs.bye.edu
</message>
<message id="<1994Sep16.092322.17139@edf.fr>" date="2988696202">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 09:23:22 UT
From: Daniel Glazman \<daniel.glazman@der.edf.fr>
Organization: EDF Direction des Etudes et Recherches
Message-ID: <1994Sep16.092322.17139@edf.fr>
References: <34poef$34g@netnews.upenn.edu> <1994Sep13.081101.21054@edf.fr> \<truly.467.000F8B4F@lunemere.com>
Subject: Re: Basics

[Truly Donovan]

|   What are you calling "GML"?  Conceptually, the only difference between
|   GML and SGML is the absence of the formal discipline of a DTD.

GML offers formatting tags like line breaks, page breaks, indentation
control, ...  In GML, the content and its composed image are intimately
linked.  Not in SGML.

\</Daniel>
</message>
<message id="<Cw7xLq.CAE@actrix.gen.nz>" date="2988700765">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 10:39:25 UT
From: Gary Houston \<ghouston@actrix.gen.nz>
Organization: Actrix Information Exchange
Message-ID: \<Cw7xLq.CAE@actrix.gen.nz>
References: \<RAMAN.94Sep11164345@arctic.crl.dec.com> \<D> <1994Sep13.162422.2076@cs.nott.ac.uk>
Subject: Re: SGML/HTML: An obfuscated markup language?

[Martijn Koster]

|   As there seems little active effort by the SGML people to make SGML
|   simple for Web users (with the exception of the HotMetal people), and
|   after a number of requests from Web users I finally give in.  I have
|   written a document on how to setup psgml and sgmls, on
|
|       \<URL:http://web.nexor.co.uk/mak/doc/html/sgml-lib/html-sgml.html>
|
|   Included in it is a tar archive of my /usr/local/lib/sgml, which is
|   most the confusion I encountered.

Have a look also at gf:

   URL:ftp://ftp.th-darmstadt.de/pub/text/sgml/misc/gf-0.44.tar.gz

This now uses the draft HTML-2.0 DTD and may make validation easier.

Gary 
</message>
<message id="<122535@cup.portal.com>" date="2988701700">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 10:55:00 UT
From: David Dave Nuttall \<dnuttall@cup.portal.com>
Organization: The Portal System (TM)
Message-ID: <122535@cup.portal.com>
Subject: SGML Document Analyst(s)

At least one and possibly two positions for persons skilled in the analysis
of existing documents and the creation of DTDs, FOSIs and style sheets
related to CD-ROM equivalents are needed for work in Oklahoma City, OK.
(Actually Norman, OK.)  Length of the assignment is 6-8 months full time.
Start 11/1/94 or possibly sooner.  (no pun intended!)

I would consider it a plus if applicants have UNIX GUI design/ programming
experience, and/or ArborText, Near & Far, or EBT experience.  This is a
chance to participate in a very high end SGML "conversion" project.

For immediate consideration, E-mail detailed resume to:
\<dnuttall@centech.com> or \<dave@csoftec.csf.com>

FAX or regular mail to me at:
+1 210 736 9663
Dave Nuttall
Century Technologies, Inc. (CENTECH)
4335 Piedras Drive West, Ste 130
San Antonio, TX 78228

Please:  no phone calls until I make contact with you!
(No admin assistant at the moment so your patience is appreciated!)
Independent contractor types are OK with me.  Ability to
perform is what counts.
</message>
<message id="<35c2mc$rab@usenet.srv.cis.pitt.edu>" date="2988706956">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 12:22:36 UT
From: "David J. Birnbaum" \<djbpitt+@pitt.edu>
Organization: University of Pittsburgh
Message-ID: <35c2mc$rab@usenet.srv.cis.pitt.edu>
References: \<vanyel.779657984@camelot>
Subject: restricting the data content of elements

Is there any way to restrict the data content of an SGML element?  For
example, if I want to create an element \<foo> that can only contain the
character "a", is there a way to write this restriction into a DTD?
Suppose I want to create an element \<foo> that can only contain the entity
"%bar;".  Is an entity legal as part of a content description?

I have heard conflicting reports about what is legal in the content of a
element declaration, and I don't know my way around the standard well
enough to be able to find an answer.  Can someone point me in the right
direction?

Thanks,

David
-- 
David J. Birnbaum                 djbpitt+@pitt.edu
The Royal York Apartments, #802   Voice: +1 412 687 4653
3955 Bigelow Boulevard            Fax:   +1 412 624 9714
Pittsburgh, PA 15213 USA
</message>
<message id="<1994Sep16.122341.8482@calspan.com>" date="2988707021">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 12:23:41 UT
From: Matthew Stringer \<stringer@calspan.com>
Organization: Calspan Advanced Technology Center
Message-ID: <1994Sep16.122341.8482@calspan.com>
Subject: Attempting to aquire DTDs

I am presently working on the evaluation of a number of SGML and DTD
editors and will soon be evaluating other SGML utilities.  I need to know
of a location, preferably on the net and preferably by FTP from which I
could aquire a number of DTDs to help provide me with an objective test
case.  Response by email to the business address below or to this group
would be appreciated.  I would be happy to trade notes with anyone
currently engaged in a similar software search.  Thank you.

-- 
*------------------------------------------------------------------------------
*- Matthew S. Stringer   Software Engineer   Calspan Advanced Technology Center
*- Business related email - 	stringer@calspan.com   voice-(716)632-7500x5119
*- Personal email - 		stringer@cs.buffalo.edu       fax-(716)631-6722
*- The opinions stated here are my own and do not reflect on my employer.
</message>
<message id="<1994Sep16.154404.4457@ast.saic.com>" date="2988719044">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 15:44:04 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep16.154404.4457@ast.saic.com>
References: <359g54INNus@oasys.dt.navy.mil>
Subject: Re: SGML and its enemies

[Betty Harvey]

|   I don't think I said it was difficult, just takes more effort.

OK -- But I thought you were talking about tags that were much more
advanced than the simple tags in 38784C.  About the hardest contextual
element to recognize in 38784C is a step which might look like a list item
or a paragraph.  On the other hand, in tagging a SRS, how does the "fast
tagger" distinguish a Safety-Requirement from a Human-Requirement?

|   mean is that if you are tagging data to the content level, you will be
|   using more tags and more time will go into tagging each content
|   specific piece of data.  Take for instance the MIL-M-38784C DTD.  There
|   are content tags included in this DTD.  There are approximately 170
|   tags.

I count 149 with my list elements tool, but that may have an error in it.

|   are approximately 170 different ELEMENTS.  Now look at the HTML dtd
|   which is based strictly on structure and format, there are
|   approximately 50 tags.  If the HTML DTD was difficult to tag it
|   wouldn't be as popular as it currently is.  Every Website (which is
|   really difficult to estimate) has one or more people who are creating
|   HTML documents and most of them don't even realize that it is SGML.
|   Most people that I talk to who are creating HTML documents and you ask
|   them about SGML get a glazed look in their eye and say "What's that!".
|   I guarantee the "common folk" won't be tagging data to our 38784C dtd.

Well, I just gave a 300 page Technical Manual with 100+ drawings to a tech
writer who had never seen a DTD before and had never heard of SGML and it
took him about 40 hours to tag the document with ArborText without any but
the most cursory instructions.  I admit that he was a very seasoned tech
writer and he was familiar with 38784C paper documents.  If the tags are in
the domain expertise of the document author, then I don't think its a
problem.

But in the previous discussion I was talking about software which
automatically tags legacy software product documentation with little or no
human intervention.Furthermore, I had to add 86 tags to the 38784C DTD to
get the software requirements spec DTD, about 25% of which were trivial
document info.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep16.154720.5080@ast.saic.com>" date="2988719240">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 15:47:20 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep16.154720.5080@ast.saic.com>
References: <1994Sep15.050330.4223@sq.sq.com>
Subject: Re: SGML and its enemies

[Tim Bray]

|   This doesn't mean that SGML's design is the one for the future.  I've
|   never understood why, as Erik wonders, it has to belong to the
|   difficult LL(1) class of grammars,

[Joachim Schrod]

|   ????  If it would really be LL(1) it would be great, there are a lot of
|   tools to parse LL(1).

[Liam R. E. Quin]

|   As I understand it, SGML is LL(1) because a DTD that specifies a
|   grammar that is not LL(1) is illegal.  However, a DTD is capable of
|   specifying an unrestricted LL* grammar with unbounded lookahead
|   required.

I thought it said that it had to be LL(0), i.e., no look ahead is ever
required to determine the next state.
</message>
<message id="<jfritchCw8DoL.D27@netcom.com>" date="2988721598">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 16:26:38 UT
From: Jeanne Fritch \<jfritch@netcom.com>
Message-ID: \<jfritchCw8DoL.D27@netcom.com>
Keywords: Oil & Gas, SGML
Summary: Looking for information
Subject: SGML in the Oil & Gas Industry?

I'm looking for any information about SGML in the Oil and Gas Industry.  Is
there an existing DTD?  Are there any case studies available?

Any information is greatly appreciated.  Please reply to my email address.

-- 
Jeanne Fritch
JF Consulting
Boulder, CO
email: jfritch@netcom.com
</message>
<message id="<35ch8u$oqn@elna.ethz.ch>" date="2988721886">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 16:31:26 UT
From: Wiedmer Hans Ulrich \<wiedmer@iwf.mabp.ethz.ch>
Organization: Inst. for Machine Tools and Manufacturing, ETHZ
Message-ID: <35ch8u$oqn@elna.ethz.ch>
Subject: greek letters in ISO 12083

Hi all,

can anybody tell me how to mark up greek letters using the ISO 12083 DTD
(and only the ISO 8859-1 latin 1 alphabet)?  E.g., the character Rho, or
another one?

Any hints are greatly appreciated.

Kind regards
John

-- 
John (Hans Ulrich) Wiedmer, CAD/CAM Group,            Phone : +41 1 632 4819
Swiss Federal Institute of Technology (ETH)           Fax:    +41 1 632 1159
Laboratory of Machine Tools and Manufacturing (IWF), Leonhardstr. 27
CH-8092 Zurich, Switzerland                 e-mail: wiedmer@iwf.mabp.ethz.ch
</message>
<message id="<1994Sep16.173206.19843@midway.uchicago.edu>" date="2988725526">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 17:32:06 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep16.173206.19843@midway.uchicago.edu>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca>
Subject: "was Re: SGML and its enemies" \<Tractability>

[Tim Bray]

|   This doesn't mean that SGML's design is the one for the future.  I've
|   never understood why, as Erik wonders, it has to belong to the
|   difficult LL(1) class of grammars, or why the language and metalanguage
|   have to be different, or why it has to look so much like OS JCL, or why
|   minimization mechanisms have to exist, and a whole bunch of other
|   things.  And some very smart/determined people have tried to convince
|   me.

Erik's point is that the people who designed SGML to start with simply did
not know enough about the design of formal languages to guarantee
computational tractability.  There is also a lot of government money behind
SGML now, which is probably what is perpetuating the language -- since no
one has ever even heard of SGML in the popular market.

SGML was a great idea in some respects, and a lousy one in others.  It will
linger on in various subsets (HTML, TEI, and so on).  Let's hope that the
designers of these subsets will have the foresight to take the findings of
modern automaton theory into account, and come up with something we can all
actually use and implement with a minimum of fuss.

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<1994Sep16.173802.20269@midway.uchicago.edu>" date="2988725882">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 17:38:02 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep16.173802.20269@midway.uchicago.edu>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <19940910.4934@naggum.no>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   ...and _this_ is the tragedy that I'm trying to prevent by making SGML
|   palatable to the computer scientists and system designers, and to make
|   it possible to use existing tools to build SGML systems, rather than
|   having to build them from scratch because SGML is gratuitously
|   "different"

But it isn't just "different."  It is comparatively intractable.  My
question is this: Is this intractability necessary?  Is there something
about the system that required straying from the LR(1) standard -- from the
sort of language that is guaranteed to be parsable in nearly linear time
using standard parser generators?

Or did a bunch of bozos who had never really thought much about the theory
of parsing design the language?  Did the question ever arise in the initial
stages?

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<1994Sep16.174154.20566@midway.uchicago.edu>" date="2988726114">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 17:41:54 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep16.174154.20566@midway.uchicago.edu>
References: <19940910.4934@naggum.no> \<Cw0sFw.ExF@undergrad.math.uwaterloo.ca> <3543bfINN1cs@oasys.dt.navy.mil>
Subject: Re: SGML and its enemies

[Betty Harvey]

|   I vowed I wouldn't enter any of these esoteric arguments about the the
|   validity, rightness/wrongness, of SGML, but I just can't help myself
|   sometimes.  One area that I see everyone is forgetting about SGML is it
|   can not only identify format or structure (if this is what you want to
|   do), it can also identify content of the data.

Meaningless point.  The identification of content is more a philosophy of
markup than a formal specification.  It's not as if macros and style sheets
in many commerical wordprocessors don't do essentially this....

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<19940916T182000Z.erik@naggum.no>" date="2988728400">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 18:20:00 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940916T182000Z.erik@naggum.no>
References: <1994Sep15.050330.4223@sq.sq.com> <1994Sep16.154720.5080@ast.saic.com>
Subject: Re: SGML and its enemies

[Bob Agnew]

|   I thought it said that it had to be LL(0), i.e. no look ahead is ever
|   required to determine the next state.

LL parsing engines (usually "recursive descent parsers") do not have any
memory they can use to determine what to do, so they must look ahead at
least one token.  LL(0) grammars would describe only the one language.

LR parsing engines have a stack on which an unbounded number of items can
be pushed.  this is called "shift".  sooner or later a "reduce" must occur,
at which point any number of stack elements are reduced to a production,
which is then pushed back for more "reduce" operations.  the reduce
operation is the "matching" between tokens and productions.  shift occurs
if no match can be made yet.  if the reduce operation can be performed
without looking at anything besides the stack, and reduces all elements
that it looks at to a production, it is LR(0).  if the reduce operation
cannot determine what to do solely on the basis of the tokens on the stack,
but must consider "future" tokens that are not themselves part of the
reduction, then the grammar becomes LR(k), for a positive, non-zero k.

LL parsers are most often code-driven (that is, state is known by the
function call stack), while LR parsers are most often table-driven.

we see trivially that LL(1) is a subset of LR(0), since the look-ahead in
LL(1) can be pushed on the stack, and immediately reduced.  thus one may
speak of an LR(0) with a bounded stack depth, but this is not usually very
interesting.

the terminology in ISO 8879 that has to do with parsing comes from a world
unrelated to the established practices of computer science.  as almost all
reinventions of wheels, this particular wheel is not particularly round.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<35coomINN8st@oasys.dt.navy.mil>" date="2988729558">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 18:39:18 UT
From: Betty Harvey \<harvey@oasys.dt.navy.mil>
Organization: Advanced Information Systems Branch, DTMB, CDNSWC
Message-ID: <35coomINN8st@oasys.dt.navy.mil>
References: <3524vs$re6@news.manassas.ibm.com>
Keywords: tags, math
Subject: Re: tag names and math syntax

[J. VanHorne]

|   2.  Is there any standard or spec for SGML formatting of mathematical
|       text?  As in question 1, will the formatting tags be NAMED?  If so,
|       where can I obtain a copy of this standard/spec?

There are two math tagging standards that I am know about.  One is the
MathPac tags which are used in CALS.  The other is the AAP Math tags.
Again, the Don Gignac from David Taylor Model Basin has written "A User's
Guide to MIL-M-28001B SGML Declarations for Mathemetical Formulae".  I am
not aware of any SGML publishing system that uses the MathPac tags.

				Betty

Navy DTD/FOSI Repository under WWW:  http://navysgml.dt.navy.mil/sgml.html
                      (Still Under Construction)

-- 
Betty Harvey  \<harvey@oasys.dt.navy.mil>     | David Taylor Model Basin
Advanced Information Systems Branch          | Carderock Division
Code 183                                     | Naval Surface Warfare
Bethesda, Md.  20084-5000                    |   Center
                                             | DTMB,CD,NSWC   
URL:  http://navysgml.dt.navy.mil/betty.html |          

</message>
<message id="<9409161912.AA27426@source.asset.com>" date="2988731555">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 19:12:35 UT
From: "Claude L. Bullard" \<bullardc@source.asset.com>
Message-ID: <9409161912.AA27426@source.asset.com>
References: <3524vs$re6@news.manassas.ibm.com>
Keywords: tags, math
Subject: Re: tag names and math syntax

[J. VanHorne]

|   1.  From looking at the tag names associated with various flavors of
|       SGML, it seems to me that the flexibility of SGML is a disadvantage
|       in this area because every DTD can name the paragraph tags in a
|       different way.  When SGML standards are invented, WHY can't the
|       standard give a NAME to the paragraph and emphasis tags -- it would
|       make working with SGML a _lot_ less confusing -- after all, there
|       are types of paragraph tags that we all use, such as body
|       paragraphs, numbered lists, ordered lists, etc.  The DTD could just
|       set up the ATTRIBUTES of these paragraphs, such as numbering style,
|       circumstances of use, etc.....
:
|   Sorry if question 1 seems to have a whining tone, but I truly believe
|   this situation is holding back acceptance of SGML on a wider scale.
|   SGML seems like a GREAT IDEA......until you work with some of the tools
|   and practical implementations of it.  There _is_ such a thing as _too_
|   much flexibility!  Also, speaking as a programmer, this flexibility
|   makes it that much harder to create publishing tools for working with
|   SGML.

WARNING: Tis a rant that follows.  Busy readers need not consume.

Whining on this subject is OK.  It's safe to say that a lot of us who are
SGMLists have heard this particular complaint only a few million times.
The answer and the problem are the same.  If you want or need flexibility,
you accept responsibility and effort.  So much of how one perceives SGML
depends on which role one plays in its practice.  I have a programmer
manager actor at our company who always points to this construct:

\<!ELEMENT foo - - (#PCDATA | anything)* >

and says that any language that allows that should be banned from the
planet.  When I point out to him that its better practice to

\<!ELEMENT foo - - (leaf | anything)*>
\<!ELEMENT leaf - - (#PCDATA) >

he smiles and says that this kind of practice proves his point.  He thinks
that SGML would be better off without DTDs too because they are just too
hard for the average author to understand.  Where we can find this mythical
author, he never tells me.  They are all using WORD he says and won't
bother with SGML until it is simpler and probably not even then.

We must, he says, leave the specification and implementation of
applications to professionals who have made a career out of protecting us
from our lack of knowledge about computers and have advanced their careers
to the enlightened status of managers by capturing and keeping author's
work in applications that our company owns.  An unfair competitive
advantage, he tells me, is his fondest dream.

Bullfeathers!! (as politely as I can say it.)  I may not be writing
spreadsheet software, but writing a DTD ain't that hard.  Writing a parser
for one is another issue, but since I can get one off of the Internet, the
issue is a non-starter.  So he will have to look elsewhere in his quest for
power, money, glory and a parking space next to the front door.

I confess: I have a degree in English and Music.  I have programmed before
and will again but that is seldom my job.  Put another way, I am a USER of
SGML, not a systems developer.  I commiserate with the problems of the
developers and attempt wherever possible not to use the features of SGML
that make them cry at night.  If the system can't do it, my requirements
are moot issues anyway.  Fortunately, SGML allows me to use the features I
need most of which are supported by any serious SGML system and to ignore
the rest.  I can get the work done in most cases without these features, so
the idea of a practical subset of SGML is and always has been a reality for
me.  Why would I need a formal definition to help me there?  Explain to me
in simple terms why others do.  I know about the holes in SGML just like I
know about the holes in my pockets.  I don't put the keys to my car in
them.

I was a technical writer for many years and learned SGML because DoD paid
for that education.  It did so when in the Eighties it was discovered that
the WYSIWYG products of that time could not reproduce the complex formats
of minimal technical manuals.  We discovered, to our chagrin, that we could
not even move our data from one of these products to another.  It was
discovered that in the name of efficiency and "leveraging the market", the
implementors of these systems had made all of the "important decisions" for
us so we shouldn't burden our earnest but undereducated humanities brains
with these *decisions*.

OOPS!  Their implementations couldn't do the job.  They discovered that a
line printer paradigm just wasn't adequate to handle the myriad
complexities of documents and that the editors with whom they consulted had
only shallow knowledge.  Editors know how to spell and how to layout a
page, but those are the last phase of technical writing and shouldn't drive
the requirements for the other 98 per cent of the task.  Calling data a
\<para> is just one way to name it and not very conducive to searching by
data type.  (Boltzman's Law: S=KlogW: interpreted in loose terms for
humans, the meaningfulness of a term is inversely proportional to the
number of objects which it identifies.  If you want to explore that
concept, get a book on chaos or complexity theory, or look up Claude
Shannon, or The Tao of Objects.)

I may not (don't, no doubt) understand all of the subtleties of compiler
design, parsing, binding to byte sizes, floating point processor
instructions, etc.  Many of the complaints levied against SGML's formal
definition are justified and should be answered.  For that purpose,
Dr. James Mason and Erik Naggum and users like myself, ask, no beg, that if
you do, please participate in the ISO standards work.

But the point at which we are asked or commanded to use one set of tag
names for all applications, we will have abandoned everything that SGML has
achieved to date.  The CALS program tried that.  So far after several years
and thousands of hours of honest if politically constrained effort, they
are just now beginning to actually construct an online registry that allows
one to access the results of that effort.

And any contract administrator at anytime can waive the requirements.

Courtesy of HyTime, we have architectural forms that give us a technique
for declaring that an "element type" is an instance of a "class".  That's
loose, but close enough for discussion.  However, if you don't have a way
to process "instances" of that "class" you aren't much better off than you
were with a tag name that your software can recognize.  I'm told that Steve
DeRose, Lynne Price and others are working on ways to better "objectify"
SGML.  They are brilliant people and will succeed.  Meanwhile, I reserve
the right to keep on writing my own DTDs to do the job I need to do when I
need to do it according to my best decisions about what the job is and how
it should be done.

I enjoy that freedom because of SGML, and though they are certainly
expensive, some very talented and dedicated companies provide software
systems that guarantee I can keep on enjoying it.  If I choose to use any
of the less expensive and less capable systems that hardwire the classes
and methods to the tag names, I'm free to do that too.  Right now, I would
much rather navigate the Internet using a Mosaic browser than try to learn
the complexities of the other methods.  Not that I can't learn them, but
the time involved is expensive, so any application despite my objections,
that satisfies my need as I define it, suits my energy budget.  Of course,
I can't use my DTDs for that application, but I can read the information
others choose to put in that form.

Other approaches and applications will be developed as soon as they are
profitable.  It is SGML that makes it possible to choose among them and not
lose the very freedom I preserve at any cost.  Yes, WWW hardwires the tags
of HTML for some reason.  In so doing, it's applications are cut off,
except by translation, from the other large bodies of SGML data.  How is
that defended?  Usually with unverified numbers like "20 million users" and
by the fact that they are licensing the code.  Money talks.  |-)

Claims that "the Web is the largest application of SGML and the only
application that ever made people notice SGML" are pretty thin.  Check out
the huge libraries of law or automotive design in SGML.  WWW is the
application that induced large corporations to violate the netiquette about
not using the Internet to post advertisements.  That development was
inevitable.  A giant unregulated broadcast network only needs ease of
access to make it attractive to seekers of competitive advantage.  CBers
discovered that years ago, but the government steps in when they use it to
broadcast the latest hit from the country charts.

Who regulates the Web?
Nobody.
Who regulates SGML?
ISO.
Who regulates my SGML applications?
Me and my customer.

That's a good deal.  Don't let it go.

The WebMakers have asked for input to their work so I cheer them on because
I'm not technical enough to help.  I'm not against the Web.  It's a fine
SGML application.  I'm against false advertising and hype.  It does a
disservice to the SGML community and to the developers of the software that
supports the Web.

But we are talking about containers of information.

This is the nature of information: it exists at a place in space and time
and for a point of view that modifies the containers into which one puts
it.  Because of that, SGML is, at this time and place and IMHO, a way we
can define the containers and enable others to test our definitions.  Can
the method by which we communicate our intent be improved?  Certainly.  Can
the computability of that expression of intent be improved?  A lot of very
talented and serious people are working on that and you have been invited
by the convenor of that group, Dr. James Mason, to join them.  Will the
results of that effort come quickly?  Not if past efforts are an indicator.
Standardization is a multi-dimensional effort, moves like ice, takes the
form of whatever terrain it traverses, and melts at the first sign of heat.
That is, it is a human effort.  Does it succeed?  Yes.  The need is
recognized and ambitious talent steps up to the challenge.

To paraphrase an old bad movie, "Humans are at their best when things are
at their worst.  That is what makes them beautiful and unique."  To quote
Eliot Kimber, "SGML is for Humans."  The software and the hardware and the
developers will just have to cope.

Len Bullard
</message>
<message id="<datCw8Lr8.3xu@netcom.com>" date="2988732067">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 19:21:07 UT
From: Daniel Tauber \<dat@netcom.com>
Message-ID: \<datCw8Lr8.3xu@netcom.com>
Keywords: SGML Word for Windows Microsoft
Summary: I saw a Microsoft SGML demo
Subject: Microsoft SGML

I just saw a demo of Microsoft's SGML add-in for Word for Windows 6.0 at
Seybold San Francisco and thought readers of this group would be interested
in what I saw.  Let me preface this by saying that Microsoft was showing a
early beta of the software and are planning on making many user interface
improvements before it is available.

Microsoft SGML consists of two pieces of software.  One allows the SGML
administrator to map a DTD to a set of Word for Windows styles.  In
addition to mapping elements to styles, you can also map elements based on
the document content.  So if you have paragraphs like:

Out of My Later Years: Albert Einstein.

that has a Word style assigned to it, you can map everything up to the
colon to one element and everything after the colon to another element.
For this to work the author has to put in the correct punctuation in
addition to assigning style names.

Word does not check the document while the author is entering it against
the DTD.  The document is checked against the DTD when you convert it to
SGML.

MS will sell it for around $495.  Available by end of year.

You will not need copies of it on every machine.  You can create templates
with the correct style names and distribute those to your authors.  Only
the person doing the Word to SGML translating needs the full package.
</message>
<message id="<19940916T205025Z.erik@naggum.no>" date="2988737425">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 20:50:25 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940916T205025Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   As I understand it, SGML is LL(1) because a DTD that specifies a
|   grammar that is not LL(1) is illegal.  However, a DTD is capable of
|   specifying an unrestricted LL* grammar with unbounded lookahead
|   required.

the grammar in ISO 8879 is not LL(1).  in fact, the productions in the
standard do not satisfy LL(k) for any fixed k, i.e., parsing strictly
according to the grammar requires unbounded token look-ahead.  a simple
rewrite of a few productions fixes k, and I think might even cause it to
become 1.  that is, SGML as a language can be expressed with an LL(1)
grammar, and can thus be said to be LL(1), but this requires some
cleverness in implementing the specification that should not have been
required of every implementor.

the grammar in ISO 8879 attempts to describe a constraint on the "grammar"
that the DTD describes that it be LL(1).  it could do this in fifteen
words, and be done with it, instead of making a whole lot of fuss that does
not quite manage to communicate that constraint.

there are two interesting things with content models and DTDs.  first, a
content model can (in the theoretical sense) describe a grammar that
requires unbounded look-ahead.  the (not explicit) rules for when tags
shall be inferred may be open to interpretations that allow more look-ahead
than the content models do.  I do not think such interpretations are within
the freedoms of interpretation when taking the rest of the standard into
consideration.  (thus, one might flippantly say that the standard requires
unbounded look-ahead for the human reader.)

in consequence, the DTD cannot wreak more havoc than the maximally havoc-
wreaking content model of an individual element.  this is why it works to
check only the content models for ambiguity.  (_despite_ the mealy-mouthed
wording that tag omission is forbidden if it would cause an ambiguity,
leading to all sorts of wrong conclusions.)

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940916T211444Z.erik@naggum.no>" date="2988738884">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 21:14:44 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940916T211444Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   Consider the following illegal content model:
|   
|   \<!Element BadBoy - -
|   	((A*,B?,A*,C?,A*,B,A*)*,A,B,C)
|   >
:
|   This is `ambiguous' in SGML terms, as I understand it.

(A?, A) is also ambiguous and a little more instructional since this
simpler case can be converted to a non-ambiguous content model through a
straight-forward rewrite.  (although you tend to hit the stupid ceilings
set by various quantities and may have to increase those if you do anything
interesting.)

but take your example

    \<!ELEMENT badboy - - ((A*,B?,A*,C?,A*,B,A*)*,A,B,C)>

which cannot be disambiguated within the framework of SGML because it
requires backtracking (a far more precise term than "look-ahead").  that
is, it is only satisfied if A, B, C are found in that order at the _end_ of
the expression.  if it decides to test for this, and the token is followed
by more tokens, a parser must discard the possibility of a match and start
the loop over.  if it does not test for this first, but for the repeated
group, which it should do according to the standard, it will never get to
the end of this content model, because the ending pattern is matched by the
previous pattern.  this expression is ambiguous as a regular expression,
and that is a far more interesting situation than that puny "ambiguity"
that SGML forbids, which is sometimes necessary (for humans, machines
couldn't care less) to be able to deal with complex content models in real
application, and which then requires error-prone and time-consuming
rewrites if you can't ask the computer to do it.

STRONGLY HELD OPINION:
parsers that complain about ambiguous content models but don't offer an
unambiguous rewrite such that the silly rules of ISO 8879 can be followed
to the letter should not be used in production settings.

(from what I know, Exoterica has understood the importance of this.)

|   Here, if you get
|   	\<BadBoy>\<A>\<A>\<A>\<B>
|   you don't know if the \<B> should match at (2) or (6) above.

hmm, apart from the termination issue, this is actually irrelevant, as it
doesn't _matter_ which primitive content token an element matches, since no
actions can be taken on the basis of which token it would match.  (this is
unlike other languages that specify grammars, and another argument for why
SGML is not a "meta-language".)  this observation is the key to be able to
rewrite these things into non-ambiguous content models with impunity.

|       Error in Document Type Declaration at offset 55 of the input
|       stream, on line 2 of the document: Ambiguous content model.  An
|       instance of element A could simultaneously match two or more tokens
|       in the content model.

now take (A*, A), and you get the same result, which is silly.  the model
obviously means "one or more A's", and if that can be accomplished by other
means, do it: (A+).  (however, ((A|B)*,A) is a little harder.)  the
simplistic model employed by SGML on the issue of ambiguity lumps trivial
cases with hard cases, and most DTD designers are afraid of "ambiguous
content models" without understanding that some of them may _trivially_ be
made non-ambiguous.

STRONGLY HELD OPINION: to teach DTD design and even _mention_ the silly
ambiguity rule of SGML is doing students a disservice.  forget the issue,
and show them tools that will simplify their models; if they are
interested, show them ways to do it on their own.  the fact that some of
their clean, neat models come out looking like flood victims should perhaps
be cause for some concern, but not in teaching them how to model things.

IMNSHO, this "requirement" is a joke, and it has caused much bad document
design with extraneous elements thrown around for no good purpose.  my
favorite example of bad design is a note in the standard.  page 415 if you
have [Goldfarb] by your side.

    \<!ELEMENT e ((a, b?), b)>

is ambiguous, it says, and it suggests a rewrite:

    \<!ELEMENT e (f, b)> \<!ELEMENT f (a, b?)>

I don't know _what_ went on when this was written by the obvious rewrite is
of course

    \<!ELEMENT e (a, b, b?)>

|   In effect, you have to have something like Yacc (only with a slightly
|   more powerful model, it's possible that byacc or eyacc could do it)
|   built in to the parser.

nonsense.  content model ambiguity (as currently specified) is statically
checkable with humble means.  the more powerful ambiguity that you have
used as an example _does_ require slightly (but not much) more powerful
mechanism to detect, but if you check for the simple-minded "ambiguity"
that SGML requires, you will catch them all.  moreover, dynamic checking of
ambiguity is simple, and could be employed by any parser with ease.  (look
for a match in the graph, ignore that match and look again.  another hit
means ambiguity.)

|   The same sort of thing goes for the lexical analysis as well.

nonsense.  lexical analysis in SGML is clean compared to the above stuff.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<Cw8srp.C5G@mv.mv.com>" date="2988741156">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 21:52:36 UT
From: Kaikow \<kaikow@standards.com>
Organization: MV Communications, Inc.
Message-ID: \<Cw8srp.C5G@mv.mv.com>
References: \<datCw8Lr8.3xu@netcom.com>
Subject: Re: Microsoft SGML

[Daniel Tauber]

|   I just saw a demo of Microsoft's SGML add-in for Word for Windows 6.0
|   at Seybold San Francisco and thought readers of this group would be
|   interested in what I saw.  Let me prefix this by saying that Microsoft
|   was showing a early beta of the software and are planning on making
|   many user interface improvements before it is available.
:
|   MS will sell it for around $495.  Available by end of year.

Hmmm, it's in early beta test now but will be available by year end.  Less
than 3 months of "testing"????
</message>
<message id="<1994Sep16.215345.21599@sq.sq.com>" date="2988741225">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 21:53:45 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep16.215345.21599@sq.sq.com>
References: <1994Sep15.050330.4223@sq.sq.com> <1994Sep16.154720.5080@ast.saic.com>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   As I understand it, SGML is LL(1) because a DTD that specifies a
|   grammar that is not LL(1) is illegal.  However, a DTD is capable of
|   specifying an unrestricted LL* grammar with unbounded lookahead
|   required.

[Bob Agnew]

|   I thought it said that it had to be LL(0), i.e., no look ahead is ever
|   required to determine the next state.

You may be right, I am not sure.  LL(0) was the stated intent.  However,
since it wasn't achieved, I am not sure it matters :-)

There was an article in this newsgroup some time ago about working out
whether a given content model was ambiguous.  You could probably do the
same for lookahead.

Lookahead is not needed for correct, error-free input marked up according
to a correctly-written DTD.

If you have a document that you know to be correct in this way, you
probably don't need a strict SGML parser for it :-) -- you simply want to
use it.

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<bdonogho.39.00160457@vallona.act.acs.org.au>" date="2988741652">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 22:00:52 UT
From: Bill Donoghoe \<bdonogho@vallona.act.acs.org.au>
Message-ID: \<bdonogho.39.00160457@vallona.act.acs.org.au>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   I would like to suggest defining a proper subset of SGML with the
|   following properties.  I'll call it SGML/F for now.
|
|   * a BNF grammar for the abstract syntax
|   * restrictions on content models to force regularity
|   * reference concrete syntax with
|           NAMECASE NO (i.e. same for elements and entities)
|           NAMELEN 0 (unrestricted)
|           no CAPACITY
|           no SHORTTAG, OMITTAG, DATATAG, SHORTREF, LINK, CONCUR, etc.
|   * a formally specified concrete syntax
|
|   All SGML/F documents would also be SGML documents, but the reverse
|   would not necessarily be true.
|
|   I can envision an SGML/FX (X=extended), with
|   * EMPTY elements distinguished syntactically
|   * a distinction between #PCDATA that can be empty and (#PCDATA)+
|     (i.e. textual content required)
|   * a regular input syntax -- e.g. comment equivalent to whitespace
|   * a formal specification (e.g. in Z?  I don't know Z well enough)
|
|   but I am not proposing that, because SGML/FX documents would not also
|   be SGML documents, even though a `down-translation' might be possible.

Isn't the regular input syntax the reason that SGML/FX documents could not
be SGML documents (N.B. I don't know what Z is!!) ??

|   Anyone for SGML/F?

I think that it is a good idea.

Some other extensions could be:

* allowing model group to specify a range for repeating groups 
  (e.g., (A, B){1,5}  -- ala lex notation)
* incorporating attribute and content syntax specifications
  (ala Erik's proposal or HyTime lexmodel )
* allowing the specification of the maximum recursion depth
  \<!ELEMENT SECTION   - - ( PARA+, SECTION* )  -- I hate short names -- >
  \<!DEPTH   SECTION  "5"  -- NO MORE THAN 5 NESTED SECTIONS -- >

Now for my rant.

My biggest gripe about SGML is that the syntax checking possibilities are
limited mainly to the order and nesting of elements.  Content validation
(e.g., requiring some text in a title) is very important in translating
existing documents into SGML.  I would trade all of the minimization
features of SGML (SHORTTAG, DATATAG, OMITTAG) for the features listed
above.
</message>
<message id="<19940916T223046Z.erik@naggum.no>" date="2988743446">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 22:30:46 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940916T223046Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   And the macro expansion via entities gives a whole new meaning to
|   `arcane', with the end of each entity introducing an out-of-band Ee
|   token that is or is not allowed in various places.,

but it is _not_ a macro expansion!  this must be _the_ source of much
confusion in SGML.  parameter entities are a form of string replacement
that is bounded by the context in which the reference occur, but which obey
many other rules of context that make it highly inappropriate to talk about
them as macro expansion.  (macro expansion works as context-free string
replacement (C, etc) or as language extensions through recursive expression
substitution (LISP).)  parameter entities have to be complete expressions
valid at the point they are inserted, but they are not functions.

the Ee stuff is stupid, and the real meaning is hidden behind the obscure
syntax productions that are burdened with too many tokens that allow and
disallow Ee at seemingly random places.  not so.  transcend the stupid Ee,
and you see that the net effect is to require that a pair of delimiters
occur in the same entity.  sometimes, a delimiter that pairs with another
that occurs in a different entity is an error, and sometimes it is data.
(it is an error for groups, it is data in attribute values.)  this could be
explained _very_ simply, without even resorting to the Ee (which is an
implementation detail, anyway, and it shouldn't be necessary to use it).

|   Did you know that you can't put comments in a content model??

yes...  \<sigh>  this is because, as you say, comments aren't whitespace,
but their own syntactic tokens.

|   And then there's the nonsense about #EMPTY elements being forbidden a
|   close tag, just to make sure you always have to be able to read a DTD
|   to do anything with a document.

I'm not sure this is so bad, really.  and you don't _have_ to read the DTD
to do anything with a document.  you seem to want to forbid minimization,
but then you learn _very_ soon whether an element is empty or not, unless
you're into dealing with invalid SGML documents, but then anything goes.

|   If it wasn't for that, you could take a fully expanded (no OMITTAG, no
|   `obfuscatory entities' as the good Prof. calls 'em) instance and make
|   an in-memory data structure without needing the DTD.  You couldn't tell
|   if it was right or not, but often you don't care.

look, this is really simple.  if you read a start-tag, you assume it has
content.  when you read an end-tag, you close it, but when the element just
outside of an element ends before that element itself ends, you know it was
empty, and you do a tiny tree transformation so that that empty element has
all its children as right siblings, instead.  no big deal.  of course, if
you use languages in which such manipulations are hard, that's your choice.

|   Challenge: write an attribute with both ' and " inside it.  Now do it
|   without using entities.

does numeric character references count as "entities"?  I don't think so.
well, then, what's the problem?

here is a _real_ problem: suppose you want a system identifier with both '
and " inside it.  the stupid syntax makes all the characters in a system
identifier "system characters" _before_ any possible parameter entity
expansion or numeric character references.  bummer!  the numeric character
references can be fixed if you teach your entity manager to undo this
stupidity.  of course, mine does.  I grew tired of the silly restriction.

|   OK, you can use the MSSCHAR thing (if I have named it right), which
|   works a bit like a \\ in most of Unix.  But as far as I can tell, the
|   behaviour of MSSCHAR is undefined if it should occur at the end of an
|   entity.

"undefined" is a special term with very precise connotations in standards.
nothing in SGML is "undefined" in this particular terminology.  unspecified
is a different thing.  however, neither does that apply to this case:

    9.7 Markup Suppression

    An |MSOCHAR| suppresses recognition of markup until an |MSICHAR| or
    entity end occurs.  An |MSSCHAR| does so for the next character in the
    same entity (if any).  [365:10]

that is: if it is followed by the end of the entity, it has no effect.

because some operating system designers got their eyes crossed or had a bad
day or something, using \\ as MSSCHAR has some undesirable side effects on
some operating systems, such as a minor problem in distinguishing between a
\\ that is intended as a path separator, or a \\ that is intended as a quote
for the next character.  doubling them, which is the natural solution in
all languages that use \\ as a quoting character will work, of course,
provided you can teach billions of little bugs (and users) to behave.

fortunately, you can undo this by using a more general syntax for pathnames
in the entity manager, and then you don't have this problem.  giving system
specific pathnames in system identifiers has never been a good idea, anyway.

|   It is only defined if there is a character following it within the same
|   entity.  [sorry I am at home & don't have the ref.]

you _know_ what I think about posting wild claims without references.  boo!

|   There's an article in Scientific American this month (Sept. 1994) about
|   the `software crisis' of unreliable software.  The article points out
|   how the use of formal specifications and mathematical rigor helps to
|   make programs that are on time, that work, that are reliable, that do
|   what was expected of them.

glad to see the research of the past 40 years finally make it to the
popular press!  in time, we will probably see dynamically typed languages
hit the mainstream, instead of the bondage\&discipline of stupid languages
like C++.  problem is that it is easier to prove dynamically typed programs
correct than statically typed programs.  and if the statically typed
programs use "casts" (as any C++ program _must_ do if it does anything
useful, because some things just aren't made for static typing), you lose.

|   I would like to suggest defining a proper subset of SGML with the
|   following properties.  I'll call it SGML/F for now.
|   
|   * a BNF grammar for the abstract syntax

possibly a good idea.

|   * restrictions on content models to force regularity

I'm not sure what this means.

|   * reference concrete syntax with
|   	NAMECASE NO (i.e. same for elements and entities)
|   	NAMELEN 0 (unrestricted)
|   	no CAPACITY
|   	no SHORTTAG, OMITTAG, DATATAG, SHORTREF, LINK, CONCUR, etc.

I would request that _all_ the quantities go take a hike, not just NAMELEN.

why NAMECASE GENERAL NO ENTITIES NO?  that makes them all case-_sensitive_.
do you really want that?  your example with \<!Element BadBoy ...> would no
longer be legal.

and why no SHORTTAG?  it is truly obnoxious that we have to type in values
for FIXED and DEFAULT values, and with stupid quotes as well, when they are
not needed in the DTD.  SHORTTAG is a grab-bag of nonsense, but we could
select a few things from that grab-bag and make SHORTTAG less of a curse.

|   * a formally specified concrete syntax

better yet, a formally specified set of constraints on concrete syntax such
that it would be possible know whether a concrete syntax is contradictory
or not.  this is currently _very_ hard to do.  (why?  beats me.)

|   All SGML/F documents would also be SGML documents, but the reverse would
|   not necessarily be true.
|   
|   I can envision an SGML/FX (X=extended), with

|   * EMPTY elements distinguished syntactically

this is completely unnecessary.

|   * a distinction between #PCDATA that can be empty and (#PCDATA)+
|     (i.e. textual content required)

opens a whole can of worms about data content validation.  I don't think we
want that in SGML.  really.  if you want this, SGML already has NOTATION to
allow such things to be done formally.  it can be used cleverly: what does
a SYSTEM identifier mean for a NOTATION?  the name of the program?  well,
suppose it was code in some reasonably coherent expression language,
instead?  suppose it could access values of attributes, and validate them,
too?  this could make NOTATION _useful_, instead of just sort of hanging
around.  what _is_ HyTime's "lextype" if not another name for notations?
beats me.

|   * a regular input syntax -- e.g. comment equivalent to whitespace

oh, this is my favorite!  a _unified_ syntax, such that no features could
fuck with the syntax (and the term _is_ appropriate for what the do today),
and getting rid of this stupid "separators" that clutter up everything.
oh, yes!

|   * a formal specification (e.g. in Z?  I don't know Z well enough)

not me.  I recently received a draft standard for the simple and small
language called Modula-2, and it has "formal specifications" in it, that
makes it into a veritable phone book of a standard.  they use the Vienna
Definition Method, which, admittedly, is a very powerful formal method.
when a language that began with some 20 pages ends up with 700 pages and is
much larger than Ada 9X ("not your father's Ada", "it doesn't suck"), which
itself is a reasonably formally-described standard, I start to worry.

ODA has formal specifications.

|   but I am not proposing that, because SGML/FX documents would not also be
|   SGML documents, even though a `down-translation' might be possible.

it should be possible to have a common minimum without much trouble, such
that a minimal SGML document (modulo quantity restrictions) was a minimal
SGML/FX document.

|   Anyone for SGML/F?

no thanks.  I have less ambitious goals for the immediate future.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940916T224531Z.erik@naggum.no>" date="2988744331">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 22:45:31 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940916T224531Z.erik@naggum.no>
References: <1994Sep12.155128.4112@sqwest.wimsey.bc.ca> <1994Sep12.175005.20847@ast.saic.com> <1994Sep14.100544.27199@sqwest.wimsey.bc.ca> \<m0qlLz0-000C9wC@newman>
Subject: Re: CONCUR usefulness existence proof

[Matt Timmermans]

|   There are two types of HyTime conformance.

no, there is only one type of HyTime conformance -- that defined in the
standard.

|   1) Strictly speaking, in order for a DTD to be HyTime conforming, it
|      must only be _possible_ to create an instance of that DTD which is
|      HyTime conforming.

HyTime defines conformance between the element type forms and the actual
instance, only.  the intention is that you do not do any conformance
checking in the SGML parser, only with whatever the parser tells you it
found (and that was ESIS at the time the standard was written, despite the
fact that ESIS was only a proposal for the conformance testing standards)
and your HyTime engine has "innate knowledge" of the element type forms you
use with which it compares whatever it gets.  the problem with this is that
it _prohibits_ the kind of useful validation that I want.

|   2)  According to the definition assumed in this thread (I think), a DTD is
|       HyTime conforming if it is _impossible_ to create an instance of that
|       DTD which is _not_ HyTime conforming.

what I and others desire is to be able to say that "if a document conforms
to this DTD, it will also conform to HyTime."  to be able to say this, we
need to get rid of inclusion exceptions, unless they are tied to the
element in which they first were allowed and are treated as truly out of
band with respect to the element contents.  ESIS does not allow for this
progressive view.

|   If A), then computing 1) thunks down to a series of regular expression
|   intersection operations, and computing 2) thunks down to a series of
|   regular expression differences.  Both of these are computable in
|   exponential time.

some of us just _may_ consider that a very bad predicatment.

|   If you are concerned about this type of safety, then simply consider a
|   DTD to be non-conforming if you can't prove that it conforms.
|   Normally, if you can't prove that a DTD conforms, then it is unlikely
|   that a human author would be able to create a conforming instance
|   anyway.

precisely, but nowhere am I allowed to make this kind of heuristic work.
it shall be possible to use a heap of ANY elements and a bunch of
inclusions and still end up with a valid HyTime document "by accident".

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<1994Sep16.224809.28435@sq.sq.com>" date="2988744489">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 22:48:09 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep16.224809.28435@sq.sq.com>
References: <33d253$en6@sundog.tiac.net> \<CONNOLLY.94Aug25104040@austin2.hal.com> <33ijke$oqs@sundog.tiac.net>
Subject: Re: HTML and SGML

[Dan Connolly]

|   The parser in Mosaic considers the '>' in '\<tag>' to be the end of the
|   comment.

[Keith M. Corbett]

|   That this is a horrible mis-feature, we violently agree.  Is there a
|   comprehensive list of such problems so that implementors can know what
|   SGML features to avoid?

Not at the moment.  I'll append a few to this message.

|   Wouldn't it be nice if Mosaic used a decent parser, say, sgmls?  Then
|   one could have some confidence in its implementation as an HTML
|   application of SGML.  (Is that phrased properly this time? ;)

Unfortunately, a lot of HTML documents -- probably most of them -- contain
errors.  One of the ways we're trying to help address this is by making
something like our Author/Editor SGML editor available by anonymous ftp:
HoTMetaL is available at the sites in my .signature, and is a full SGML
editor that is restricted to the HTML DTD.  This helps people to produce
documents that conform to the DTD.  (Although the DTD included is a little
out of date now, and we also included the HTML+ DTD, the next time we ship
we'll be using HTML 2.0 with tables added)

|   Until then I'll use "real" SGML as a front end to filter into HTML.

That's not a bad approach.

[Dan Connolly]

|  The way current implementations parse of comments and attribute values
|  cannot be reconciled with ISO 8879:1986.

[Keith M. Corbett]

|  I'm hopeful that IETF will address these problems.  With only a DTD, and
|  in the absence of an application standard, with implementations going
|  their own way... you know better than I how far HTML *practice* has
|  drifted from theory.

Yes, the IETF working group is doing the best it can to address this and
other problems, in helping the HTML community to move towards SGML.

People on this newsgroup can help a LOT here.

Don't poke fun at HTML as `not proper SGML'.  It can be, and is, since
there is a DTD for it.  Instead, help people to underastand that full SGML
(if you will) is more powerful than HTML, but HTML is a perfectly
legitimate use of SGML.  This remark is not addressed at anyone in
particular, but I've heard a lot of people make silly claims such as `HTML
isn't SGML because \<P> has a #EMPTY content model', which is neither true
(it does not have a #EMPTY content model in the current DTD) nor helpful
(as it would still be perfectly legal, valid HTML even if it \<P> was
empty).

|  I did not mean to condemn HTML the standard -- my complaints are against
|  the ad hoc use of HTML style guides and implementations that do not
|  implement SGML.

We have to help them improve, not simply complain about that.  Dan is one
of the people most active in this area.

Lee

---------------------------------------------------------------------------

Some SGML things to avoid in HTML:

* Marked sections (unimplemented)

* Entities other than Latin1/isopub/isotech that are in ISO 8859-1
  (entities cannot be defined in an instance)

* document subset (the [....] in the DOCTYPE thing :-)) not supported at
  all.

* Comments
  Cannot contain either -- (as per SGML) or >, and may also be terminated
  by a newline (who knows?)

* Processing Instructions
  (sometimes supported, but behaviour actually varies widely, the contents
  may or may not be shown to the user, run as a shell script, formatted
  through TeX, etc.)

* Minimisation:

  NETTAG like \<PRE/the content of pre/ is not supported, although the SGML
  declaration allows it.  (It's allowed so that attribute names can be
  omitted in certain cases)

  SHORTTAG </> is ignored by Mosaic as far as I know.

  OMITTAG is OK for \<P>, but not recommended

* Case sensitivity

  Browsers seem to cope with tags in lower or upper case.  Attribute values
  on forms, however, are required to be in lower case in Mosaic, even
  though they should be case insensitive.


Some laxness in Mosaic (and other browsers probably, I haven't checked):

* Attributes

  Double quotes are not required round attribute values, even where SGML
  requires them for a URL with punctuation in it.

  ID-valued attributes can take any string value, e.g., "46", even though
  they should start with a Name char.

  IDs are not checked for uniqueness

* Unknown Elements

  Unknown elements are generally ignored -- i.e., as if the tag wasn't
  there, but the content was.  The end element might or might not mark the
  end of its container, though.

* Incorrect Nesting

  Try \<I>doing \<B>THIS\</I> with\</B> any SGML parser!  Then try it in
  Mosaic...  it's a fairly common error in existing documents.

* Missing #REQUIRED attributes, or missing required elements

  They might be required in the DTD... not in the browser!

There are others.

On the plus side, Mosaic is a freely available SGML-based browser, and
although it deviates from the standard in a number of ways, the NCSA staff
_are_ working on improving conformance, and are trying to keep their
program very tolerant, so that it will show the document the user wants to
see, even if it's faulty, whenever possible.  They are also very busy...

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<1994Sep16.230125.5004@sqwest.wimsey.bc.ca>" date="2988745285">
Newsgroups: comp.text.sgml
Date: 16 Sep 1994 23:01:25 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, British Columbia, CANADA
Message-ID: <1994Sep16.230125.5004@sqwest.wimsey.bc.ca>
References: <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   STRONGLY HELD OPINION: to teach DTD design and even _mention_ the silly
|   ambiguity rule of SGML is doing students a disservice.  forget the
|   issue, and show them tools that will simplify their models; if they are
|   interested, show them ways to do it on their own.  the fact that some
|   of their clean, neat models come out looking like flood victims should
|   perhaps be cause for some concern, but not in teaching them how to
|   model things.

I used to agree with this.  The problem is that I teach DTD design to
people who do not always have control over what tools will be used.  So I
started teaching students about ambiguity.  And you know what?  A wonderful
and strange thing happened.  When we started including exercises about
ambiguity in our courses, we discovered that these exercises solidified
students' understanding of content model constructs to a much greater
degree than if we had just had the same number of minutes devoted to
additional content model reading and writing exercises.  (We know this,
because to have time for the ambiguity discussion, we had to cut other
basic exercises.)

This effect is real and dramatic.  The ambiguity rule gives me the fidgets.
I wish it weren't there.  However, so long as it is there, I am resigned to
having to teach it.  Too many of our students need it and even those that
don't seem to benefit from it.

Which makes me realize that if this rule is removed from future versions of
the standard (o, yes please!), then I will have to think up some other
exercise to ground students' understanding.  :-)

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<19940917T023842Z.erik@naggum.no>" date="2988758322">
Newsgroups: comp.text.sgml
Date: 17 Sep 1994 02:38:42 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940917T023842Z.erik@naggum.no>
References: <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> <1994Sep16.230125.5004@sqwest.wimsey.bc.ca>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   STRONGLY HELD OPINION: to teach DTD design and even _mention_ the silly
|   ambiguity rule of SGML is doing students a disservice.  forget the
|   issue, and show them tools that will simplify their models; if they are
|   interested, show them ways to do it on their own.  the fact that some
|   of their clean, neat models come out looking like flood victims should
|   perhaps be cause for some concern, but not in teaching them how to
|   model things.

[Marcy Thompson]

|   I used to agree with this.  The problem is that I teach DTD design to
|   people who do not always have control over what tools will be used.  So
|   I started teaching students about ambiguity.  And you know what?  A
|   wonderful and strange thing happened.  When we started including
|   exercises abpout ambiguity in our courses, we discovered that these
|   exercises solidified students' understanding of content model
|   constructs to a much greater degree than if we had just had the same
|   number of minutes devoted to additional content model reading and
|   writing exercises.  (We know this, because to have time for the
|   ambiguity discussion, we had to cut other basic exercises.)

interesting.  my conjecture is that this is because it was an opportunity
to understand how the parser works its way through the content model in an
actual element instance.  did you teach this separately?

it is possible to walk through a content model today without constructing
either a deterministic or indeterministic finite state machine.  with a few
simple tree-walking functions, you basically have it down pat.  that is,
you can draw a cute little tree figure and show how the parser will walk
down a given branch and you can show what happens when there are multiple
branches with the same label on them.

this tree walking is _necessary_ once we move beyond the simplistic
"ambiguity" rules, because we will "resolve" the ambiguity by pushing
elements we have seen on a "stack" if we can't decide, yet.  this is a
slightly more complicated scheme, but much more natural than the present
"do the obvious thing first, then see if you live to regret it" scheme.

for instance, suppose you want a content model (A?, ((A|B)+ | C+)).  to a
human eye, this says that you may want to have an A, but then you want one
or more A's or B's, _or_ one or more C's.  obviously, if you have only one
A, it is the required one.  recall that there is no way a parser or anybody
else can know _which_ "primitive content token" any given element in the
instance will match, so the ambiguity of an initial A is purely in the
reading of this model, not anywhere in the instance.  (in the instance, it
is unambiguously an A, of course.)

we get these initial branches: A, B, C.  if we find B or C, we know what to
do.  if we find A, we do not know what to do, yet, but we know both that we
need to look further to find out, and that whatever we find further down,
it won't make this A invalid, so after A we have these branches: A, B, C.
now we know exactly what we should do with the rest of the model.  an
unambiguous model can be constructed:

    (((B, (A|B)*) | C+) | (A, ((A|B)* | C+)))

now, this is not quite as readable as the first version, is it?  but
through analysis, we know what we must do, and that this does it for us.
should we have to express this in the primitive "unambiguous" form of a
convoluted content model?  I don't think so.  but let me take a natural
second step for a real DTD during development: suppose an optional B may
occur before the C.  in our "ambiguous" model: (A?, ((A|B)+ | (B?, C+)))

the change to this content model is minimal and functional: it says exactly
what we want to say.  now, what happens to the "unambiguous" model?

    ((B, ((A|B)* | C+)) | (A, ((B, ((A|B)* | C+)) | (A, (A|B)*) | C*)))

(whether this does in fact do what is desired, and nothing more, is left as
an exercise to the reader.  note the change from C+ to C*.)

since I can do this, and I can program computers to do this for me, I think
it is criminal to have to write this down in DTDs that other people shall
use.  they will not understand what is going on without _lots_ of effort,
and making a minimal change will cause _lots_ of rewriting.  and SGML was
supposed to be for *humans*?  haha, good joke.  what we have done is to
make the human users suffer because of some misguided idea about parsing
that was, even at the ancient time SGML was defined, a non-issue.

fact is, _humans_ are smart enough to see these things, and to deal with
LR(0) grammars without effort.  it takes a lot of effort to deal with LL(1)
grammars, because they are _unnaturally_ and _artificially_ constrained.
a human _does_ look ahead when reading, does hold off conclusions until
more evidence clicks into place, and is usually willing to suspend a desire
to correct something until more data is available.  but not an SGML parser.
human languages are riddled with examples of how we don't know what a word
means or even its role until several words further down in the sentence.
the dog chased the cat.  the cat the dog chased killed the mouse.  the
mouse the cat the dog chased killed suffered.  perfectly readable.

when you read FEATURES MINIMIZE DATATAG NO OMITTAG YES RANK NO SHORTTAG YES
LINK SIMPLE YES 4 IMPLICIT YES EXPLICIT NO OTHER CONCUR YES 4 SUBDOC YES 10
FORMAL YES APPINFO NONE, it may be a little tricky to see that it _really_
was supposed to be read as ((FEATURES (MINIMIZE (DATATAG NO) (OMITTAG YES)
(RANK YES) (SHORTTAG YES)) (LINK (SIMPLE YES 4) (IMPLICIT YES) (EXPLICIT
YES)) (OTHER (CONCUR YES 4) (SUBDOC YES 10) (FORMAL YES))) (APPINFO NONE)).
even if you suffer from parenthophobia, and have trouble counting all of
them up and down, this task is demonstrably simpler than counting up and
down when there are no parens there to begin with.  the fact that you can
use the _closing_ paren further to your right to determine where you are,
helps enormously.

take attribute lists as another example of unreadable LL(1) grammars:
\<!ATTLIST foobar A CDATA #REQUIRED B NOTATION (X | Y | Z) Z NAME (TEST)
#FIXED TEST ETC NAME NOTATION> which is really \<!ATTLIST foobar (A CDATA
#REQUIRED) (B (NOTATION X | Y | Z) Z) (NAME (TEST) (#FIXED TEST)) (ETC NAME
NOTATION)>.

I use parens to go up and down because that is what a parser will do when
it goes up and down functions called to parse the things it looks at.  in
the document instance, we have found it _very_ useful to explicitly mark
the beginning and end of things, and only when it can be unambiguously
"inferred" what we will begin or end can we "omit" the markers.  not so in
the SGML declaration or the DTD: all "markup declarations" in the DTD are
bracketed, but not their complex parameter lists; none of the constructs in
the SGML declaration are bracketed.  there are even brackets where none are
needed, as in the document type declaration subset -- only because it made
it a lot easier to find where it ended, and that's my point: unless we have
enough information to know where we are, and can find it by reading a bit
backwards and ahead, we have a problem.  if _all_ the information we need
to know where we are comes from what we have already seen (as is the case
with LL(1) languages) the human stack depth is over-run, and we lose.

BTW, the above example is actually legal.  part of the wonderful deal with
LL type languages is that you don't need reserved keywords, because at each
point, you know what you're looking _for_, and "NAME" may be an attribute
name or a declared value or a name token in a name group because the past
context provides the knowledge necessary to decide.  in LR type languages,
you typically have an independent lexical tokenizer that will match a token
anywhere it occurs, and on the stack, reductions will happen anywhere they
occur.  this is called context free grammars.  in contrast, it takes effort
to make LL languages context free, because nothing is barring you from not
doing it.  neither SGML declaration nor DTD in SGML are described by
context free grammars.  it is less obvious whether the instance is context
free, but I don't think it is.  in short, SGML is ideal for hand-crafting
parsers, and hand-crafting parsers is not the ideal.

|   Which makes me realize that if this rule is removed from future versions
|   of the standard (o, yes please!), then I will have to think up some other
|   exercise to ground students' understanding. :-)

did this help? :-)

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<Cw9rzv.47v@cyf-kr.edu.pl>" date="2988786811">
Newsgroups: comp.text.sgml
Date: 17 Sep 1994 10:33:31 UT
From: Julian Zabicki \<atzabick@cyf-kr.edu.pl>
Organization: Academic Computer Centre, CYFRONET
Message-ID: \<Cw9rzv.47v@cyf-kr.edu.pl>
Subject: DAMOS

Hi,

Does anybody know something about project DAMOS, I need source of
information concerning this project.

Thanks in advance
</message>
<message id="<28998.smithn@orvb.saic.com>" date="2988795793">
Newsgroups: comp.text.sgml
Date: 17 Sep 1994 13:03:13 UT
From: "Norman E. Smith" \<smithn@orvb.saic.com>
Message-ID: <28998.smithn@orvb.saic.com>
References: <33d253$en6@sundog.tiac.net> \<CONNOLLY.94Aug25104040@austin2.hal.com> <33ijke$oqs@sundog.tiac.net> <1994Sep16.224809.28435@sq.sq.com>
Subject: RE: HTML and SGML

[Liam R. E. Quin]

|   Unfortunately, a lot of HTML documents -- probably most of them --
|   contain errors.  One of the ways we're trying to help address this is
|   by making something like our Author/Editor SGML editor available by
|   anonymous ftp: HoTMetaL is available at the sites in my .signature, and
|   is a full SGML editor that is restricted to the HTML DTD.  This helps
|   people to produce documents that conform to the DTD.  (Although the DTD
|   included is a little out of date now, and we also included the HTML+
|   DTD, the next time we ship we'll be using HTML 2.0 with tables added)

As someone who has set up a Web server for a customer from day 1 as an SGML
system, (http://www.doe.gov/home.html and http://apollo.osti.gov/home2.html)
I feel compelled to throw my 2 cents in here.  You have to be careful about
which DTD you use.  I want to say up front that I like HoTMetaL for
authoring, but I can't feed the output files directly to the server.
HoTMetaL files are fed to an AWK program that takes out all of the extra
end tags for the empty elements.  Otherwise, we risk putting documents up
that consistently lock up some prevalent version of Mosaic.  Yes, Mosaic
does ignore most tags it does not know about, but an end tag on certain
empty elements is death.  An \</img> tag locks up Mosaic consistently.  Our
site is stuck at the last 16-bit version of the Windows Mosaic, I don't
remember the version number off hand.

Don't bother putting tables in your HTML DTD until Mosaic supports tables
as far as I am concerned.  It will only cause more grief when a user codes
up a table, then finds out that Mosaic can't display it!

|   |   Until then I'll use "real" SGML as a front end to filter into HTML.
|   
|   That's not a bad approach.

It is real SGML; there is a DTD (actually several :-).  It may not be
pretty, but it is SGML!

|   Don't poke fun at HTML as `not proper SGML'.  It can be, and is, since
|   there is a DTD for it.  Instead, help people to underastand that full
|   SGML (if you will) is more powerful than HTML, but HTML is a perfectly
|   legitimate use of SGML.

I agree.

|   Some SGML things to avoid in HTML:

I agree with most of Lee's list, so I will comment only on the ones I
disagree with.

|   * Marked sections (unimplemented)

Mosaic may not support them but, but there is no reason not to use marked
sections in baseline documents before they are parsed.  Yes, I set up
configuration management of HTML documents and use marked sections among
other things to help this along.  We keep a separate baseline copy of all
documents separate from the production version on the Web server.  We also
use entities to manage hyperlinks.  Our production DTD also has several
site specific elements added to help track document status through revision
cycles.  These are also dropped in the parsing process.  Documents are
parsed and turned into "real HTML" that Mosaic likes before being installed
on the server.

Choosing a DTD was a tough choice at the start.  There were several
available.  I ended up getting both the then current HTML DTD and the HTML+
DTD.  I declared the HTML DTD as our official DTD and started merging
features from the + DTD as I verified that Mosaic supported them.  The
biggest example is Forms.  Of course I renamed the DTD to MOSAIC.DTD
(http://apollo.osti.gov/html/osti/eei/encoding/mosaic.html) because it more
or less reflects what Mosaic supports and the features we have used in
documents on our system.  Marked sections have been added to support a
customer requirement that full production and development server areas
exist at the same time.  This lets me keep both the production and
development version of a document in the same file.  I don't want to end up
with multiple files and not knowing which is the production version and
which is the development.

|   * Attributes
|   
|     Double quotes are not required round attribute values, even where SGML
|     requires them for a URL with punctuation in it.

The X-Windows version of Mosaic doesn't like single quotes around attribute
values.  Other versions don't seem to care whether you use single or double
quotes.

|   * Unknown Elements
|   
|     Unknown elements are generally ignored -- i.e., as if the tag wasn't
|     there, but the content was.  The end element might or might not mark
|     the end of its container, though.

Mosaic doesn't like \</img> as noted above.

\<STRONG OPINION FOLLOWS...>
SGML IS THE ONLY WAY TO RUN A WEB SERVER AND MAINTAIN HTML DOCUMENTS!  The
server has been through four major overhauls since February and we have had
no down time because of the switches.  Hyperlinks automatically follow
documents as they get moved around.  In short the current state of the
server would not be possible without SGML!  It has all worked so well that
the customer, the Office of Scientific and Technical Information (OSTI),
was chosen of host the official Home Page for the entire Dept of Energy.
(Official grand opening in about a week.)  The server now has over 10M
bytes of data in hundreds of HTML files including several 100+ page full
text documents.

For all of the comments about DTDs and what type of grammar is used, the
one thing that SGML provides is a wonderful consistency in the data that
allows automation!  For example, we automatically generate a hyperlinked
table of contents for all large documents based on the header tags.  That
is possible because we can guarantee that header tags are not included
inside bullet lists to make Mosaic display text in a large font.  When all
is said and done, the simple fact is that SGML provides a consistent data
format that lends itself to automated manipulation without human
intervention!

SGML IS THE ONLY WAY TO RUN A WEB SERVER AND MAINTAIN HTML DOCUMENTS!

Norm
-- 
Norman E. Smith, CDP               | Internet:   smithn@orvb.saic.com
Science Applications International | Compu$erve: 72745,1566
P.O. Box 2501                      | Ma Bell:    +1 615 481 2186
Oak Ridge Tn. 37830                |
</message>
<message id="<29377.smithn@orvb.saic.com>" date="2988796172">
Newsgroups: comp.text.sgml
Date: 17 Sep 1994 13:09:32 UT
From: "Norman E. Smith" \<smithn@orvb.saic.com>
Message-ID: <29377.smithn@orvb.saic.com>
References: <33d253$en6@sundog.tiac.net> \<CONNOLLY.94Aug25104040@austin2.hal.com> <33ijke$oqs@sundog.tiac.net> <1994Sep16.224809.28435@sq.sq.com> <28998.smithn@orvb.saic.com>
Subject: RE: HTML and SGML

OOPS, I forgot one last point in the previous post:

Managing a Web server is one of the few SGML applications that has
immediate, short term benefits and pay back.  One of the traditional
stumbling blocks with using SGML is the high up front cost and and long
time before perceived pay back.  The reason for this is most of the hard
work of an SGML application has already been done; DTDs (HTML, HTML+, etc)
exist and user applications that don't require SGML knowledge (Mosaic,
Cello, et al.) already exist.  So, all you have to do is feed the Web
server SGML documents...

Norm
-- 
Norman E. Smith, CDP               | Internet:   smithn@orvb.saic.com
Science Applications International | Compu$erve: 72745,1566
P.O. Box 2501                      | Ma Bell:    +1 615 481 2186
Oak Ridge Tn. 37830                |
</message>
<message id="<1994Sep17.161241.13653@sqwest.wimsey.bc.ca>" date="2988807161">
Newsgroups: comp.text.sgml
Date: 17 Sep 1994 16:12:41 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep17.161241.13653@sqwest.wimsey.bc.ca>
References: <19940916T211444Z.erik@naggum.no> <1994Sep16.230125.5004@sqwest.wimsey.bc.ca> <19940917T023842Z.erik@naggum.no>
Subject: Re: SGML and its enemies

[Marcy Thompson]

|   I used to agree with this.  The problem is that I teach DTD design to
|   people who do not always have control over what tools will be used.  So
|   I started teaching students about ambiguity.  And you know what?  A
|   wonderful and strange thing happened.  When we started including
|   exercises abpout ambiguity in our courses, we discovered that these
|   exercises solidified students' understanding of content model
|   constructs to a much greater degree than if we had just had the same
|   number of minutes devoted to additional content model reading and
|   writing exercises.  (We know this, because to have time for the
|   ambiguity discussion, we had to cut other basic exercises.)

[Erik Naggum]

|   interesting.  my conjecture is that this is because it was an
|   opportunity to understand how the parser works its way through the
|   content model in an actual element instance.  did you teach this
|   separately?

Yes, we did.  The ambiguity exercises go like this:

	Here's a content model.  Is it ambiguous?  If so, show me an
	ambiguous instance.  Then fix it.

The human-parsing exercises go like this:

	Here's a valid DTD and a document instance.  Parse the instance.

I think the ambiguity exercises are more effective because they require
actual engagement with the content models, rather than just looking at them
and seeing if a particular instance conforms.

I wonder if exercises like this would have the same effect?

	Here's an element instance.  Here's an element declaration to which
	the instnance does *not* conform.  Alter the content model so that
	the instance *does* conform.  Do this in such a way that everything
	that now conforms continues to conform and (except for the
	particular instance you are trying to get to conform) nothing now
	non-conforming becomes conforming.

I wonder if anyone would every understand those instructions? :-)

Also, and for reasons I can't begin to explain, the ambuigity exercise
effect is far more pronounced with sensory learners.  I think that in order
to replace the ambiguity exercises with something else that has the same
effect (on that glorious day when it's not part of the standard anymore), I
will have to understand why this is.

Thanks for the interesting posting.

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<35gbbv$975@news.xs4all.nl>" date="2988846911">
Newsgroups: comp.text.sgml
Date: 18 Sep 1994 03:15:11 UT
From: Jan Grootenhuis \<jang@xs4all.nl>
Organization: XS4ALL, networking for the masses
Message-ID: <35gbbv$975@news.xs4all.nl>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   ... and most DTD designers are afraid of "ambiguous content models"
|   without understanding that some of them may _trivially_ be made
|   non-ambiguous.

I am (afraid, that is).  Recently, I tried hard to rewrite an existing
grammar into SGML, that specified something like: ((L?, A)*, (L?, B)*)
without using using an intermediate level.  I couldn't.  What are the
recognition rules to decide whether there's a _trivial_ way?

Thanks for bringing up an interesting item.
</message>
<message id="<247@rolben.win-uk.net>" date="2988866253">
Newsgroups: comp.text.sgml
Date: 18 Sep 1994 08:37:33 UT
From: David Dibbens \<ddibbens@rolben.win-uk.net>
Message-ID: <247@rolben.win-uk.net>
Subject: SGML User groups?

Are there any SGML user groups in England?  If so, then I would be very
grateful for any contact information.

Thanks.

-- 
ROLBEN Consultancy Limited  "Your Profit is our Business"
Fax/Answermachine:- 01293 773720 
Voice Business   :- 01293 821055 Mobile 1-2-1:- 0956 567754
</message>
<message id="<19940918T184935Z.erik@naggum.no>" date="2988902975">
Newsgroups: comp.text.sgml
Date: 18 Sep 1994 18:49:35 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940918T184935Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> <35gbbv$975@news.xs4all.nl>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   ... and most DTD designers are afraid of "ambiguous content models"
|   without understanding that some of them may _trivially_ be made
|   non-ambiguous.

[Jan Grootenhuis]

|   Recently, I tried hard to rewrite an existing grammar into SGML, that
|   specified something like : (L?, A)*, (L?, B)* without using using an
|   intermediate level.  I couldn't.

this is a example of where you need a stack to reduce L, x into the
appropriate expression.

|   What are the recognition rules to decide whether there's a _trivial_
|   way?

well, it's a trivial case if it is not on the list of non-trivial cases.
to find non-trivial cases, imagine the repetitions as loops, and see if the
loop exits at more than one place.  this is equivalent to seeing if there
is a need to back-track anywhere to discard a "wrong choice".  another way
to say it is if the loop contains tokens that have meaning after the end of
the loop.  in the case above, the loop exits after the A if followed by a
B, and after the L if an A is followed by an L followed by a B, and this
latter exit will also cause back-tracking over the L.  thus it is not a
trivial case.

disambiguation consists of factoring out tokens from the left side of a
content model.  optional tokens split split the expression in two, one with
the optional token and one without.  if you can make a finite number of
such factoring steps, you have reduced the expression to an non-ambiguous
case.  if you find that you do the same step twice, you cannot produce a
non-ambiguous case.

it is not necessarily trivial to identify a trivial case a priori, but
since the steps to get there are so simple, you can safely assume that it
can't be done it if any of those steps appear to repeat for the same data.
if you can't do it within a few minutes, rethink the solution.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<CwDIxE.8KM@cogsci.ed.ac.uk>" date="2988961680">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 11:08:00 UT
From: Steve Finch \<steve@cogsci.ed.ac.uk>
Organization: Centre for Cognitive Science, Edinburgh, UK
Message-ID: \<CwDIxE.8KM@cogsci.ed.ac.uk>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T223046Z.erik@naggum.no>
Subject: Re: SGML and its enemies

[Liam R. E. Quin]

|   If it wasn't for that, you could take a fully expanded (no OMITTAG, no
|   `obfuscatory entities' as the good Prof. calls 'em) instance and make
|   an in-memory data structure without needing the DTD.  You couldn't tell
|   if it was right or not, but often you don't care.

[Erik Naggum]

[A lot of sense, but...]

|   look, this is really simple.  if you read a start-tag, you assume it
|   has content.  when you read an end-tag, you close it, but when the
|   element just outside of an element ends before that element itself
|   ends, you know it was empty, and you do a tiny tree transformation so
|   that that empty element has all its children as right siblings,
|   instead.  no big deal.  of course, if you use languages in which such
|   manipulations are hard, that's your choice.

But you may have to read 200GB of data before you know the structural
position of the third opening tag (assuming the first can't be empty).
This was surely the main (misconceived) idea behind "no ambiguous content
models".  But as you rightly point out, such ambiguous content models are
unambiguous according to an automatic rewrite of the model, and
everything's trivially structurally unambiguous if minimization is
precluded.  So it seems simply stupid to include this restriction in the
standard, certainly in the hard form it is in now (although we probably
wish to preclude structural ambiguity if we allow minimization, but it's
far better to preclude minimization).

Precluding minimization make for fast unambiguous parsing, transparent
interpretation, isomorphism (almost; caveat marked sections, comments,
possibly entities) between the parse tree and the form (character stream)
of representation, and is generally the main thing which needs to be done
to make SGML attractive to comp-science types.

Steve.
</message>
<message id="<CwDKLM.9FK@cogsci.ed.ac.uk>" date="2988963847">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 11:44:07 UT
From: Steve Finch \<steve@cogsci.ed.ac.uk>
Organization: Centre for Cognitive Science, Edinburgh, UK
Message-ID: \<CwDKLM.9FK@cogsci.ed.ac.uk>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> <35gbbv$975@news.xs4all.nl> <19940918T184935Z.erik@naggum.no>
Subject: Re: SGML and its enemies

[Jan Grootenhuis]

|   Recently, I tried hard to rewrite an existing grammar into SGML, that
|   specified something like : (L?, A)*, (L?, B)* without using using an
|   intermediate level.  I couldn't.

[Erik Naggum]

|   this is a example of where you need a stack to reduce L, x into the
|   appropriate expression.

This is also an example of an over-zealous standard: Its a regular
expression hence unambiguous on single token look-ahead when properly
compiled (hence, BTW, doesn't need a stack).  Just because the syntax of
SGML doesn't let you express it unambiguously according to its definition
of "ambiguous", doesn't mean it is ambiguous, it just means the standard is
silly.

Hypothesis: The specification makes ambiguous any regular expression whose
associated finite state automata cannot be written with precisely one
positive recognition state (since there will be at least one transition
onto non-isomporphic parts of the FSA sharing a symbol from two positive
states, and this will be an uncollapsable ambiguity according to SGML).

Steve.
</message>
<message id="<1994Sep19.114731.19619@cs.nott.ac.uk>" date="2988964051">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 11:47:31 UT
From: Martijn Koster \<mak@nexor.co.uk>
Reply-To: m.koster@nexor.co.uk
Organization: NEXOR Ltd
Message-ID: <1994Sep19.114731.19619@cs.nott.ac.uk>
References: <1994Sep13.162422.2076@cs.nott.ac.uk> <1994Sep13.172502.13685@ast.saic.com>
Subject: Re: SGML/HTML: An obfuscated markup languag

[Martijn Koster]

|   As there seems little active effort by the SGML people to make SGML
|   simple for Web users (with the exception of the HotMetal people), and
|   after a number of requests from Web users I finally give in.  I have
|   written a document on how to setup psgml and sgmls, on
|
|       \<URL:http://web.nexor.co.uk/mak/doc/html/sgml-lib/html-sgml.html>.
|
|   Included in it is a tar archive of my /usr/local/lib/sgml, which is
|   most the confusion I encountered.

[Bob Agnew]

|   Your efforts to help others is laudable, but please be kind.

My apologies if I have offended anyone; that was not the intention.

|   Many people on this group read their news on VT100s (next thing up the
|   evolutionary ladder after the teletype and the VT52) and have never
|   heard of WEB, much else be in a position to help you with it.

[I first started using the Web on a VT100, but that aside :-)]

I didn't mean SGML people helping WWW users with the Web.  Closer attention
to SGML in the Web's starting days would have prevent a lot of hassle, but
that's all past now.

I meant that for non-SGML people it is not easy to get into SGML, and that
we need help there.  The whole jargon, and the lack of userguides, and the
complexity of tools make it very hard to start using SGML.  Several
newcomers have stated this, and this is not specific to Web users.

If I compare this situation to some other places on the net, say the Web
community, or the Mac community, or even the Windows Programmer's
community, I notice that they have many beginners guides, FAQs,
installation documents etc.  Maybe this difference can be attributed to the
fact that the SGML community is very much smaller, and a lot more technical
and theoretical.  However, this does pose a problem for newcomers, people
who want simple means to use the technology, and are not in a position to
dive deep into the wonders of SGML.

So my hope is that gentler introductions will make this technology more
accessible.  So far I'm afraid I have seen little in that direction.  I
would like to think my guide does a bit towards that, but readily admit I'm
out of my depth there.  I really would appreciate any help people can give
me; Have I configured my SGML files correctly, am I using the right tools
etc.

|   The application of SGML to the WEB server is significant to the future
|   of SGML, but it is somewhat off the mainstream of SGML applications.

Sure, I appreciate that.

|   (Just an opinion -- Oh no, where'd I put that asbestos...)

I'm sorry you think they'd be required... :-)

-- Martijn
-- 
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
Telephone: +44 115 9 520576
WWW: http://web.nexor.co.uk/mak/mak.html
</message>
<message id="<m0qmiig-000C9wC@newman>" date="2988969900">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 13:25:00 UT
From: Matt Timmermans \<mtimmerm@newman.microstar.com>
Message-ID: \<m0qmiig-000C9wC@newman>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   \<!ELEMENT badboy - - ((A*,B?,A*,C?,A*,B,A*)*,A,B,C)>
|   ... this expression is ambiguous as a regular expression, and that is a
|   far more interesting situation than that puny "ambiguity" that SGML
|   forbids...

There are no ambiguous regular expressions.  A regular expression has only
one reduction, and it takes at least two for an ambiguity.  Further, this
reduction always takes place at the end of the input string, which implies
that it is _never_ necessary to use more that one token of lookahead.

This begins to hint at the true cruelty of the SGML ambiguity rule.  If
there are no ambiguous regular expressions, then why do there have to be
ambiguous content models?

It is useful to look at exactly what the SGML ambiguity clause does.
11.2.4.3 states that:

a)  It must be possible to recognize each primitive content token in a 
    content model; and

b)  It must be possible to do so with only one token of lookahead.

Most people who gripe about ambiguity complain about the second part, but
it is the first part that is the real problem.  This turns a content model
from a regular expression into a _grammar_, with a production for each
primitive content token.  In other words, it is assumed that an _action_
will be performed at the start or end of every content model object as it
appears in an instance.

Now, where do these actions come from?  Since actions on content are under
the jurisdiction of the application, there _shouldn't_ be any need for SGML
to specify where in the content model these actions are.  Yet there _is_ a
need for a parser to do just that -- to support those abominable aggregate
groups (a\&b\&c\&d).

When common methods are used to convert aggregates into state machines, the
number of states produced explodes exponentially with the size of the
aggregate.  This can be avoided, however, by changing (a\&b\&c) into (a|b|c)*
and placing actions at the beginning or end of each choice to ensure that
it is entered only once.

The bottom line is that if you want to get rid of ambiguity, you have to
get rid of aggregates as well.  I consider this to be a good thing, because
aggregates ruin a lot of the other cool things you could otherwise do with
content models.

\</Matt>

PS.
There are very clever ways to parse aggregates without and ambiguity clause
_or_ an exponential number of states, but I think this is too much for an
International Standard to ask of _everybody_ who implements it.

-- 
Matt Timmermans               | Phone:  +1 613 727-5696
Microstar Software Ltd.       | Fax:    +1 613 727-9491
34 Colonnade Rd. North        | BBS:    +1 613 727-5272
Nepean Ontario CANADA K2E-7J6 | E-mail: mtimmerm@microstar.com
</message>
<message id="<m0qmjuU-000C9wC@newman>" date="2988974460">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 14:41:00 UT
From: Matt Timmermans \<mtimmerm@newman.microstar.com>
Message-ID: \<m0qmjuU-000C9wC@newman>
Subject: SGML Renewal: DTDs

Since we're on the subject of SGML renewal, I'd like to bring up the matter
of DTDs.

Specifically, the DTD is not a metalanguage.  It was not designed as a
metalanguage (third-hand quote from Charles), and it doesn't function very
well as a metalanguage.

The reason it's not a metalanguage is that it doesn't exist on its own.  A
DTD is not an SGML document, but only part of an SGML document.

The content model in a DTD represents a class of element structures.  Since
there is exactly one DTD for every SGML document (there is no way to
declare that a document conforms to some external DTD), and exactly one
actual element structure for every SGML document, the content model
information in the DTD is redundant.  In effect, the content model gives
several examples of what the element structure _could_ be, and is
immediately followed by the actual element structure.  Clearly, the content
model in the DTD is not required.

The DTD is, in fact, only a typing aid.  It's only purposes in SGML are to
provide an easy (?!) way for authors to avoid certain structuring mistakes,
to provide the information necessary to support tag omission, to support
short references, to provide default and fixed attribute values, to provide
a way to specify an attribute value without naming the attribute, and to
provide a convenient place to declare entities and notations.

It must be recognized, however, that people are using the DTD as a
metalanguage because they _need_ a metalanguage.  What I'm proposing is
that the content model part of the DTD should be removed from the SGML
document (and the SGML standard), and replaced with an external
meta-language (in a different standard).  There would still be a place in
an SGML document to declare entities, notations, and various typing-aid
features like default attribute values.

Now, a cool thing happens when an SGML document is not required to include
its content model -- The content model can be represented as an SGML
document!  (If an SGML document was required to include an SGML document,
there would be recursion ad infinitum.)  This would allow you to manipulate
metadocuments (written in the external metalanguage) with the same tools
you use to manipulate documents.

Further, since the strict one-to-one correspondence between documents and
content models would be broken, you could specify multiple metadocuments
for each document.  If the metalanguage was designed so that you could
specify element types without specifying the tag name, then architectural
forms could also be represented, manipulated, and validated as
metadocuments.

Also, if the metalanguage was designed such that metadocuments could be
chained the way lexers and parsers are today, then you would have all the
functionality of CONCUR, plus a whole lot of significantly more useful
things.

SGML would be much simpler and much more expressive.

\</Matt>

-- 
Matt Timmermans               | Phone:  +1 613 727-5696
Microstar Software Ltd.       | Fax:    +1 613 727-9491
34 Colonnade Rd. North        | BBS:    +1 613 727-5272
Nepean Ontario CANADA K2E-7J6 | E-mail: mtimmerm@microstar.com
</message>
<message id="<35kc6rINN4ac@oasys.dt.navy.mil>" date="2988978843">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 15:54:03 UT
From: Betty Harvey \<harvey@oasys.dt.navy.mil>
Organization: Advanced Information Systems Branch, DTMB, CDNSWC
Message-ID: <35kc6rINN4ac@oasys.dt.navy.mil>
References: <33ijke$oqs@sundog.tiac.net> <1994Sep16.224809.28435@sq.sq.com> <28998.smithn@orvb.saic.com>
Subject: Re: HTML and SGML

There are a couple of issues that I would like to address and I hate to
take anyone's comments out of context.  Yes, a lot of the Web documents are
bad and will not parse.  This is mainly because of lack of education in
what HTML really is.  It is SGML.  If you ask 9 out of 10 WWW Server
administrators I bet you they don't understand the relationship of HTML to
SGML.  I have talked to administrators about SGML and you get a blank
stare, mention HTML and you get a totally different reaction.

It is important to educate people.  I never mention HTML without showing
the relationship of SGML and HTML.  It is not important for Server
Administrators to understand all the nuances of SGML but it is important
for them to understand the relationship of HTML to SGML and know how to
create valid documents (instances).

[Norman E. Smith]

|   Don't bother putting tables in your HTML DTD until Mosaic supports
|   tables as far as I am concerned.  It will only cause more grief when a
|   user codes up a table, then finds out that Mosaic can't display it!

I have to disagree with this statement.  Mosaic is only one of many
browsers (in fact it's an excellent browser).  However, waiting until a
specific browser can display the data is like having the cart pull the
horse.  The technology will catch up with the data, as long as the data is
standardized.  This is an important link.  Configuration management of
DTD's is essential in SGML and for HTML.  There are many, many HTML DTDs
floating around the Web.  Which one is the valid one?  It's anyone's guess.

We have the same problem with our Military DTDs.  Once they are authored
and leave they change and there is no configuration management capability.
This is the reason the Navy is establishing the DTD/FOSI Repository because
we cannot keep track of who is doing what.

|   It is real SGML; there is a DTD (actually several :-).  It may not be
|   pretty, but it is SGML!

[Liam R. E. Quin]

|   Don't poke fun at HTML as `not proper SGML'.  It can be, and is, since
|   there is a DTD for it.  Instead, help people to underastand that full
|   SGML (if you will) is more powerful than HTML, but HTML is a perfectly
|   legitimate use of SGML.

[Norman E. Smith]

|   I agree.

I wholeheartedly agree.  

				Betty

-- 
Betty Harvey  \<harvey@oasys.dt.navy.mil>     | David Taylor Model Basin
Advanced Information Systems Branch          | Carderock Division
Code 183                                     | Naval Surface Warfare
Bethesda, Md.  20084-5000                    |   Center
                                             | DTMB,CD,NSWC   
URL:  http://navysgml.dt.navy.mil/betty.html |          
</message>
<message id="<bebb-190994170343@mac_149.ferndown.ate.slb.com>" date="2988979423">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 16:03:43 UT
From: Malcolm Bebb \<bebb@ferndown.ate.slb.com>
Organization: Schlumberger Technologies ATE
Message-ID: \<bebb-190994170343@mac_149.ferndown.ate.slb.com>
References: <1994Sep13.162422.2076@cs.nott.ac.uk> <1994Sep13.172502.13685@ast.saic.com> <1994Sep19.114731.19619@cs.nott.ac.uk>
Subject: Re: SGML/HTML: An obfuscated markup language

[Martijn Koster]

|   I meant that for non-SGML people it is not easy to get into SGML, and
|   that we need help there.  The whole jargon, and the lack of user
|   guides, and the complexity of tools make it very hard to start using
|   SGML.  Several newcomers have stated this, and this is not specific to
|   Web users.

I visit this group from time to time, and I have to agree with the above
comments.  Some of the posts here are absolutely unintelligible to me. 

That might not be a problem for discussions of underlying code and
structures that I, as a user, will never see.  But that isn't the
impression I get.

I keep trying to fend off the perception that SGML is the creation of a
small group of people self-indulgently building themselves an
over-complicated language that only they can understand.  That is probably
most unfair, but it's how it often appears to me.

To recommend adoption of SGML based authoring tools I'd need to be able to
easily see the benefits and be sure that any learning curve would not be
too steep or too long.  Information at that level does not seem to exist.

-- 
Malcolm      bebb@ferndown.ate.slb.com   (Usual disclaimers apply)
Tech Pubs    bebb@embetech.demon.co.uk
</message>
<message id="<1994Sep19.170533.20089@ast.saic.com>" date="2988983133">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 17:05:33 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep19.170533.20089@ast.saic.com>
References: <1994Sep16.174154.20566@midway.uchicago.edu>
Subject: Re: SGML and its enemies

[Betty Harvey]

|   I vowed I wouldn't enter any of these esoteric arguments about the the
|   validity, rightness/wrongness, of SGML, but I just can't help myself
|   sometimes.  One area that I see everyone is forgetting about SGML is it
|   can not only identify format or structure (if this is what you want to
|   do), it can also identify content th of the data.

[Richard L. Goerwitz]

|   Meaningless point.  The identification of content is more a philosophy
|   of markup than a formal specification.  It's not as if macros and style
|   sheets in many commerical wordprocessors don't do essentially this....

And so, I infer from what you say that many commercial wordprocessors
produce tagged documents which can be searched by third party database
programs based on the tags produced by the wordprocessor.  May I have the
names of some of those products please?

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep19.171931.21645@midway.uchicago.edu>" date="2988983971">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 17:19:31 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep19.171931.21645@midway.uchicago.edu>
References: <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> \<m0qmiig-000C9wC@newman>
Subject: Re: SGML and its enemies

[Matt Timmermans]

|   There are no ambiguous regular expressions.  A regular expression has
|   only one reduction, and it takes at least two for an ambiguity.
|   Further, this reduction always takes place at the end of the input
|   string, which implies that it is _never_ necessary to use more that one
|   token of lookahead.
|   
|   This begins to hint at the true cruelty of the SGML ambiguity rule.  If
|   there are no ambiguous regular expressions, then why do there have to
|   be ambiguous content models?

Oh heavens.  Don't tell me that the SGML designers didn't know what right
linear grammars were.  What is this mess we've gotten ourselves into?

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<35kkcc$5ac@finnegan.iol.ie>" date="2988987212">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 18:13:32 UT
From: Seac Mc Grath \<digitome@iol.ie>
Organization: Digitome Ltd.
Message-ID: <35kkcc$5ac@finnegan.iol.ie>
Subject: SGML CD-ROM's from Chadwyck-Healey???

I am led to believe that an organisation called Chadwyck-Healey produce
CD-ROM's of documents in SGML format.  Has anyone heard of them?

Thanks in advance,

-- 
Sean Mc Grath	digitome@iol.ie
Digitome Ltd.
Electronic Publishing
Irish Permenant House,Pearse St.,Ballina,Co. Mayo
Ireland
Tel : +353 96 72092
</message>
<message id="<9409191825.AA20376@mercury>" date="2988987927">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 18:25:27 UT
From: Mary Holstege \<holstege@mercury.kset.com>
Message-ID: <9409191825.AA20376@mercury>
Subject: Parsing EMPTY elements

Parsing elements whose content model is EMPTY without knowledge of the DTD
*does* present difficulties.  Yes, if you find the end tag of an element's
parent before you find an end tag for that element, you can declare that
element empty and promote its putative children to siblings.  The problem
is that you are thereby required to perform a complete parse of the parent
element in order to parse out any child.

Consider an application where you have, say, a database of SGML documents
and a server that feeds pieces of those documents back to a client.  Let's
say further you do not want to require the client to be too SGML savvy,
with its own full-up SGML parser and DTD manager and so on.  Yet this
client may well want to do things like parse out the child labeled "title"
from this hunk of SGML.  This should be trivial if the server provides full
tagging of the returned text (no omitted tags, etc.).

But if any of the children has an empty content model, you have to parse
the entire segment to be able to pull off any one of them that follows the
empty one (inclusive).  This is unfortunate.

Example:

\<segment>
    \<empty1>
    \<child2>
        Short stuff...
    \</child2>
    ... several more K of stuff I don't care about
\</segment>

If I knew empty1 had an empty content model, or if it had an end tag, I
could parse out child2 has soon as I saw its end tag.  As it is, I have no
idea whether child2 is a child of segment or of empty1 until I hit
\</segment>.  Furthermore, if there are multiple \<child2> tags in this
segment at various levels with various interposed EMPTY elements, I may be
required to keep around arbitrary amounts of state information in order to
find the child2 I desire.  If I had the end tag for empty1 I would be able
to do a simple scan through the text buffer maintaining one pointer to the
start tag of the child and one to the end of it.

This is not a minor programmer inconvenience: if the hunk of text is not
some little "segment" but the virtualization of an entire book I am now in
the position of possibly requiring the client to fetch and parse the entire
book in order to extract the first child, with unacceptable implications
for performance and memory utilization.

It seems that one is left with (1) tossing in bogus end tags (2) tossing in
bogus attributes (e.g., "this-is-empty=1") (3) making the client have to
know something about the DTD.  This is a live issue for me at the moment,
and if anyone has any experience or observations about these (or other)
alternatives, I'd love to hear about it.

                -- Mary
                   Holstege@kset.com

-- 
Mary Holstege, Sr. Member of Technical Staff
KnowledgeSet Corporation
555 Ellis Street                    Tel: +1 415 254 5452
Mountain View, CA 94043             FAX: +1 415 254 5451
</message>
<message id="<35klfi$5nv@crl.crl.com>" date="2988988338">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 18:32:18 UT
From: Joe English \<jenglish@crl.com>
Organization: Helpless people on subway trains
Message-ID: <35klfi$5nv@crl.crl.com>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <19940910.4934@naggum.no> <1994Sep16.173802.20269@midway.uchicago.edu>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   ...and _this_ is the tragedy that I'm trying to prevent by making SGML
|   palatable to the computer scientists and system designers, and to make
|   it possible to use existing tools to build SGML systems, rather than
|   having to build them from scratch because SGML is gratuitously
|   "different"

[Richard L. Goerwitz]

|   But it isn't just "different."  It is comparatively intractable.  My
|   ques- tion is this: Is this intractability necessary?  Is there
|   something about the system that required straying from the LR(1)
|   standard - from the sort of language that is guaranteed to be parsable
|   in nearly linear time using standard parser generators?

Actually, SGML content models are considerably easier to process than LR(1)
grammars.  LR parsers and SGML parsers both take nearly linear time to
process a document instance, but it's also possible to construct a parse
table from a DTD in nearly linear time.  The same is not true for LR or LL
grammars.

(The basic difference is that you must examine the entire grammar to build
an LR or LL parser, whereas each element content model can be parsed with a
DFA constructed independently of any other content models.)

The rules for start-tag omission make the DFA construction a little bit
more hairy than necessary, than but not intractable.

-- 
--Joe English

  jenglish@crl.com
</message>
<message id="<19940919.5248@naggum.no>" date="2988988394">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 18:33:14 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940919.5248@naggum.no>
References: <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> \<m0qmiig-000C9wC@newman> <1994Sep19.171931.21645@midway.uchicago.edu>
Subject: Re: SGML and its enemies

[Richard L. Goerwitz]

|   Oh heavens.  Don't tell me that the SGML designers didn't know what
|   right linear grammars were.  What is this mess we've gotten ourselves
|   into?

OK, won't tell you.  some of us are working on ways to get ourselves out of
the mess, without wreaking havoc on existing, rational documents.  it turns
out to be a rather complicated process.

#\<Erik>
-- 
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<CONNOLLY.94Sep19143459@ulua.hal.com>" date="2988992098">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 19:34:58 UT
From: Dan Connolly \<connolly@ulua.hal.com>
Organization: HaL Software Systems, Inc.
Message-ID: \<CONNOLLY.94Sep19143459@ulua.hal.com>
References: <33ijke$oqs@sundog.tiac.net> <1994Sep16.224809.28435@sq.sq.com> <28998.smithn@orvb.saic.com> <35kc6rINN4ac@oasys.dt.navy.mil>
Subject: Re: HTML and SGML [HTML Standard update...]

[Betty Harvey]

|   There are many, many HTML DTDs floating around the Web.  Which one is
|   the valid one?  It's anyone's quess.

I feel like I should make a periodic posting to this newsgroup...

The HTML standardization process is well under way.  There is an IETF
working group.  The document is stable -- it's all over but the crying for
the current revision of the HTML standard.  The working group charter
includes a milestone that says the document should be published (as an
internet RFC) before the december IETF meeting.

For details, see:

	"HTML Specification Review Materials"
	http://www.hal.com/%7Econnolly/html-spec/index.html

A fairly recent release is available via ftp:

	"HTML 2.0 August 22 Release Notes"
	ftp://halsoft.com/halsoft/olias/html-19940822/README.html

You can try it out "interactively" with a forms-based browser at:

	"HaLsoft HTML Validation Service"
	http://www.hal.com/%7Econnolly/html-test/service/validation-form.html

Configuration management, testing, and Q/A have been high priorities in the
development.  It's somewhat out of date, but there is an extensive
changelog at:

	http://www.hal.com/%7Econnolly/html-spec/ChangeLog

Dan
-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<9409191546.tn162850@aol.com>" date="2988992809">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 19:46:49 UT
From: Eileen Quirk \<equirk@aol.com>
Message-ID: <9409191546.tn162850@aol.com>
Subject: SGML Initiatives Info

I am preparing a review of SGML industry initiatives for SGML Open
(description below).  The intent of the review is to present the business
reason for adopting SGML in each of the industry segments and to establish
a compelling need for companies to implement SGML.  The review will also
describe the initiative's history, status, and direction.  If you have
pertinent information, or know of someone who does, please send email to me
at equirk@aol.com.

The first group of industries includes:

    Computing Software
    Telecommunications
    Commercial Aerospace
    Education
    Commercial Publishing
    Automotive

Thanks in advance for your help.  The information you provide will make the
reviews accurate and complete.

Regards,
Eileen

**
SGML Open is the industry consortium dedicated to accelerating the further
adoption, application, and implementation of SGML.  For further information
please contact Mary Laplante, Executive Director, at laplante@sgmlopen.com.
** 
</message>
<message id="<19940919T205828Z.erik@naggum.no>" date="2988997108">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 20:58:28 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940919T205828Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> <35gbbv$975@news.xs4all.nl> <19940918T184935Z.erik@naggum.no> \<CwDKLM.9FK@cogsci.ed.ac.uk>
Subject: Re: SGML and its enemies

[Jan Grootenhuis]

|   Recently, I tried hard to rewrite an existing grammar into SGML, that
|   specified something like: ((L?, A)*, (L?, B)*) without using using an
|   intermediate level.  I couldn't.

[Erik Naggum]

|   this is a example of where you need a stack to reduce L, x into the
|   appropriate expression.

[Steve Finch]

|   This is also an example of an over-zealous standard: It's a regular
|   expression, hence unambiguous on single token look-ahead when properly
|   compiled (hence, BTW, doesn't need a stack).  Just because the syntax
|   of SGML doesn't let you express it unambiguously according to its
|   definition of "ambiguous", doesn't mean it is ambiguous, it just means
|   the standard is silly.

I had to do this by hand, and found (of course) that Steve is right.  so
much for my rash explanations.  sorry, folks.

the content model ((L?, A)*, (L?, B)*) is satisfied iff the tokens are
recognized by this deterministic finite automaton.  (a figure would have
come in handy here, but all the ones I tried to make came out gross.)
starting state is 0, ending state $, invalid transition marked -.  </> is
the end of the containing element.

	state	\<L>	\<A>	\<B>	</>
	  0      1	 2	 3	 $
	  1      -       2       3	 -
	  2      1       2       3       $
	  3	 4       -       3       $
	  4      -       -       3	 -

|   Hypothesis: The specification makes ambiguous any regular expression
|   whose associated finite state automata cannot be written with precisely
|   one positive recognition state (since there will be at least one
|   transition onto non-isomporphic parts of the FSA sharing a symbol from
|   two positive states, and this will be an uncollapsable ambiguity
|   according to SGML).

what makes the above troublesome to describe with SGML content models is
the loop between states 1 and 2, both of which exit to state 3.  it would
be impossible to describe this automaton by a content model even if the
exit to the ending state could be coerced into one, such as with
((L?, A)*, (L?, B)+).

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940919T210901Z.erik@naggum.no>" date="2988997741">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 21:09:01 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940919T210901Z.erik@naggum.no>
References: <19940916T211444Z.erik@naggum.no> <1994Sep16.230125.5004@sqwest.wimsey.bc.ca> <19940917T023842Z.erik@naggum.no> <1994Sep17.161241.13653@sqwest.wimsey.bc.ca>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   interesting.  my conjecture is that this is because it was an
|   opportunity to understand how the parser works its way through the
|   content model in an actual element instance.  did you teach this
|   separately?

[Marcy Thompson]

|   Yes, we did. The ambiguity exercises go like this:
|   
|   	Here's a content model.  Is it ambiguous?  If so, show me
|   	an ambiguous instance.  Then fix it.
|   
|   The human-parsing exercises go like this:
|   
|   	Here's a valid DTD and a document instance.  Parse the instance.
|   
|   I think the ambiguity exercises are more effective because they require
|   actual engagement with the content models, rather than just looking at
|   them and seeing if a particular instance conforms.

in light of the discussion about regular expressions, it may be instructive
to construct deterministic finite state automata to show how the parser
will actually make each step through the model, matching tokens as it goes.

a content model is then strongly ambiguous (as opposed to weak ambiguities,
which vanish in rewriting) if there are loops in the FSA with more than one
exit, otherwise not.  such exercises seem to be very valuable to get a good
grip on regular expressions, and I find myself thinking clearer when I do
these exercises by hand.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940919T214747Z.erik@naggum.no>" date="2989000067">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 21:47:47 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940919T214747Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T223046Z.erik@naggum.no> \<CwDIxE.8KM@cogsci.ed.ac.uk>
Subject: Re: SGML and its enemies

[Erik Naggum]

|   look, this is really simple.  if you read a start-tag, you assume it
|   has content.  when you read an end-tag, you close it, but when the
|   element just outside of an element ends before that element itself
|   ends, you know it was empty, and you do a tiny tree transformation so
|   that that empty element has all its children as right siblings,
|   instead.  no big deal.  of course, if you use languages in which such
|   manipulations are hard, that's your choice.

[Steve Finch]

|   But you may have to read 200GB of data before you know the structural
|   position of the third opening tag (assuming the first can't be empty).
|   This was surely the main (misconceived) idea behind "no ambiguous
|   content models".

maybe I don't understand this.  how much does it take to retrieve a list of
elements that could be empty from a particular DTD?  suppose you could ask
a parser for such information.  you would need to know the following:

  - elements declared EMPTY
  - elements that have a CONREF attribute, and the name of the attribute

that's it, and you will now know whether elements will have end-tags or not.

|   So it seems simply stupid to include this restriction in the standard,
|   certainly in the hard form it is in now (although we probably wish to
|   preclude structural ambiguity if we allow minimization, but it's far
|   better to preclude minimization).

there are two issues here.  first is validation.  I have always believed
that validation should _never_ occur while a document is being used in
production programs.  production use of SGML documents should be limited to
_one_ error message: "this does not appear to be a valid SGML document.
please validate it, and run again."  second is minimization.  the idea is
that you shall be able to omit a construct if that construct unambiguously
can be constructed for you.  I think this is a great idea.  however, it may
need to be improved on a bit in terms of what "unambiguously" means.  some
of the discussions in WG8 are about this topic.  Dave Peterson wrote a
short paper on what it meant for an element to be contextually required
which I think he should post to this group.  Dave?

(this is why I think Mosaic's lenient acceptance of all and sundry bogus
HTML documents is so bad.  it should allow a "validation mode" which would
explain what is wrong and perhaps offer advice to correct the mistakes, and
be strict about accepting only valid documents.  people test their document
to see if they "work" with Mosaic, anyway, so there's no loss in asking
them to do it right.)

|   Precluding minimization make for fast unambiguous parsing, transparent
|   interpretation, isomorphism (almost; caveat marked sections, comments,
|   possibly entities) between the parse tree and the form (character
|   stream) of representation, and is generally the main thing which needs
|   to be done to make SGML attractive to comp-science types.

minimization is not _that_ difficult, and nobody forces you to write a
document out again with minimization, only that you be able to read it.
that's supposed to be a parser's job, and it is not the hardest part of
processing SGML by far.  we must not forget the users who actually have to
type in the document _somehow_.  I don't buy the argument that we should
need special software to author documents in SGML, or convert from some
other format that is much harder to deal with.  thus, authors must be
supported, and minimization is one good way of doing this.  I'm willing to
sacrifice _some_ functionality and flexibility to this end, but not as much
as we do today.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940919T221619Z.erik@naggum.no>" date="2989001779">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 22:16:19 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940919T221619Z.erik@naggum.no>
References: \<m0qmjuU-000C9wC@newman>
Subject: Re: SGML Renewal: DTDs

[Matt Timmermans]

|   (there is no way to declare that a document conforms to some external
|   DTD)

could you elaborate on this, please?  (yes, I do remember what you said
last time, which is why I ask.)

|   The content model in a DTD represents a class of element structures.
|   Since there is exactly one DTD for every SGML document ..., and exactly
|   one actual element structure for every SGML document, the content model
|   information in the DTD is redundant.  In effect, the content model
|   gives several examples of what the element structure _could_ be, and is
|   immediately followed by the actual element structure.  Clearly, the
|   content model in the DTD is not required.

this is a little warped.  there is no one-to-one relationship between a DTD
and a instance.  first, a DTD defines a document _type_, which intends to
show that certain documents have a commonality among them that may be
described formally and thus exploited.  the notion of a document _type_ is
central to why SGML is still a very good idea.  this also applies to the
element type level.  second, a document instance is not an instance of one
and only one DTD.  there are infinitely many DTD's that fit any given
instance, just as there are infinitely many instances that fit a given DTD.
clearly, the _specific_ DTD that is used and intended must be shipped with
the instance if only for identification purposes -- it is not redundant.

further, a DTD is not just a bunch of tags.  a DTD is an elaboration of a
document type as it exists in an SGML application, complete with semantics
of elements, attributes, notations, etc, whether embedded in code or
properly documented elsewhere.  in this way, a DTD is a formal description
of a part of a larger solution to a set of problems.  through the DTD, an
application program can know which documents it can handle, and documents
can know which SGML application they belong to.  clearly not redundant.

the perhaps most important aspect of the DTD, however, is that it makes it
possible to ask the question: "does this document conform to the type that
it purports to conform to?"  only if you have a DTD can you answer this
question.  the idea that a document instance _conform_ to a _type_ is one
of the core ideas of SGML.  what use is it to talk about a type if you
cannot know whether you face an instance of that type or not?  since SGML
is also intended for document interchange (whether to someone else or to
yourself, years from now), severing the link between type and instance has
some rather annoying practical ramifications.

so please note that the document _type_ is what makes SGML necessary.  the
ability to know that something is what it says it is, is extremely helpful
in keeping both data and systems consistent.  in fact, I would argue that
it is the _lack_ of ability to know whether data is consistent which causes
most of the problems in the information technology business.  I think of
SGML as _primarily_ important because of this, and not from a document's
point of view -- but from a software engineering point of view.  that it
came out of the document world is perhaps surprising and carries with it a
lot of unwanted baggage, but it is the software folks who have the most to
gain from implementing SGML.  (thus my cri de coeur, which Tim Bray called
it, to make SGML less hostile to the computer-scientists.)

|   The DTD is, in fact, only a typing aid.  It's only purposes in SGML are
|   to provide an easy (?!) way for authors to avoid certain structuring
|   mistakes, to provide the information necessary to support tag ommssion,
|   to support short references, to provide default and fixed attribute
|   values, to provide a way to specify an attribute value without naming
|   the attribute, and to provide a convenient place to declare entities
|   and notations.

and all this is "only a typing aid"?  I think you're underestimating a few
things quite dramatically.

|   Now, a cool thing happens when an SGML document is not required to
|   include its content model -- The content model can be represented as an
|   SGML document!  (If an SGML document was required to include an SGML
|   document, there would be recursion ad infinitum.)  This would allow you
|   to manipulate metadocuments (written in the external metalanguage) with
|   the same tools you use to manipulate documents.

you can do this even if you don't remove the heart of the document type
concept.  if the content model is isomorphous to another language, and
possibly one described by a set of SGML content models (!), you can map
back and forth all you like.  I know people who do just this, BTW.

|   Further, since the strict one-to-one correspondence between documents
|   and content models would be broken, you could specify multiple
|   metadocuments for each document.  If the metalanguage was designed so
|   that you could specify element types without specifying the tag name,
|   then architectural forms could also be represented, manipulated, and
|   validated as metadocuments.

maybe you need to explain a little more about this purported one-to-one
correspondence that seems to be a core premise to you argumentation.  I'll
hold off my counter-arguments until I'm sure what you're talking about.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<1994Sep19.233219.25809@ast.saic.com>" date="2989006339">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 23:32:19 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep19.233219.25809@ast.saic.com>
References: <1994Sep16.224809.28435@sq.sq.com>
Subject: Re: HTML and SGML

[Liam R. E. Quin]

|   Don't poke fun at HTML as `not proper SGML'.  It can be, and is, since
|   there is a DTD for it.  Instead, help people to underastand that full SGML
|   (if you will) is more powerful than HTML, but HTML is a perfectly
|   legitimate use of SGML.

[various agreements]

\<opinion>
Sorry folks, I can't agree on this one.  The syntax of SGML is rigorously
supported by EBNF productions just like Pascal and Modula-2, etc.  A Pascal
program without the "program" statement is not a Pascal program.  A
Modula-2 module without the "Module" statement is not a Module.
Conversely, an SGML instance without a declaration subset is not an SGML
instance.  It's SGML-like but it's not SGML.
\</opinion>

-- 
"One man's syntax is another man's semantics." 
</message>
<message id="<35l82k$7rn@ixnews1.ix.netcom.com>" date="2989007380">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 23:49:40 UT
From: Roland Alden \<ralden@ix.netcom.com>
Message-ID: <35l82k$7rn@ixnews1.ix.netcom.com>
Subject: HyTime FAQ needed

Can anybody give me a pointer to a FAQ or spec for HyTime?

(post here or email to 71246.260@compuserve.com)
</message>
<message id="<Y095420.940920.S@ozemail.com.au>" date="2989007666">
Newsgroups: comp.text.sgml
Date: 19 Sep 1994 23:54:26 UT
From: Kate Steketee \<steketee@ozemail.com.au>
Organization: OzEmail Pty Ltd
Message-ID: \<Y095420.940920.S@ozemail.com.au>
Subject: Re: Attempting to aquire DTDs

you can get an extensive test suite on CDROM from Software Exoterica.
</message>
<message id="<Pine.SUN.3.90.940919121341.28461A-100000@cnj.digex.net>" date="2989009163">
Newsgroups: comp.infosystems.interpedia,comp.text.sgml,comp.text
Date: 20 Sep 1994 00:19:23 UT
From: "Rita E. Knox" \<rknox@cnj.digex.net>
Message-ID: \<Pine.SUN.3.90.940919121341.28461A-100000@cnj.digex.net>
Subject: Reposting: Speakers sought for Documation '95

Following the asterisks below is a copy of a message I posted about 2 weeks
ago.  I have received a few very interesting responses, but am still
interested in hearing from people working in the legal, pharmaceutical,
automotive, product data exchange, or other application area.  In case you
didn't see the earlier posting, or have been putting off responding, I have
resent it.  All responses are welcome!

A side question: Is the lack of response to my earlier message indicative
of lack of activity in this "primary domain versus documentation" modelling
area?  Do people understand what the issue is about?  I am curious to hear
others' thoughts on the topic.  Thanks.

-- Rita 

************************************
I am chairing a session at Documation '95 -- "the international forum for
document management applications, document system technology and
interoperability solutions" -- which will be held from March 7-9 at the
Long Beach Convention Center, Long Beach, CA.  A description of the session
follows:
-----------------------------------------------------
Session for Documation '95
Session Chair:  Rita E. Knox

Title:  Data-Driven Documentation: Modelling Issues

Summary: There are many advantages to identifying content in document data
bases.  Among other things it supports cross-referencing, hypertext
navigation, automated data verification and update, and auto-generation of
document components.  Such content must be identified in a meaningful way
-- there must be a correspondence between the document content definition
and the natural structure of the information being documented.  However, at
the same time that document automation experts are developing content
models to support documentation uses, there are domain experts who are
developing content models to support many applications other than
documentation.  Where does the line between these potentially redundant
efforts fall?  What work should each "side" of the industry perform and how
might the efforts be coordinated?  This session explores these issues by
providing examples from different industries where such concurrent
modelling efforts are in progress.

Suggested topic areas:
-- Law/Legal publishing
-- Pharmaceutical/New Drug Applications
-- General Information/Newspaper Publishing
-- Product Data Exchange (STEP)/Technical Documentation
-----------------------------------------------------
I am looking for 2-3 speakers to participate in this session.  Potential
speakers may be working in one of the suggested topic areas or in some
other area where basic domain modelling and documentation modelling may be
occurring simultaneously.  Interested individuals should send me a brief
abstract (500 words) describing their proposed presentation that would
address this topic.  I will respond to all submissions no later than 1
November when I have reviewed all abstracts and made a selection.  Thanks.

-- 
******************************************************
Rita E. Knox, Ph.D.                v: 908.576.8678
Knox\&Assocs/Martin Hensel Corp.    f: 908.576.8679
167 Winding Way                    knox@kanda.com 
Little Silver, NJ 07739         OR rknox@cnj.digex.net
******************************************************
</message>
<message id="<1994Sep20.044545.24832@midway.uchicago.edu>" date="2989025145">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 04:45:45 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep20.044545.24832@midway.uchicago.edu>
References: <1994Sep16.174154.20566@midway.uchicago.edu> <1994Sep19.170533.20089@ast.saic.com>
Subject: Re: SGML and its enemies

[Bob Agnew]

|   And so, I infer from what you say that many commercial wordprocessors
|   produce tagged documents which can be searched by third party database
|   programs based on the tags produced by the wordprocessor.  May I have
|   the names of some of those products please?

All I am saying is that content-based markup is a philosophy, and is not
the exclusive property of SGML.  Its use in government and industry, I
might add, has nothing to do with cleanness and tractability of its design.
Do you challenge me on these points?

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<35m115$2ch@agate.berkeley.edu>" date="2989032933">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 06:55:33 UT
From: Merlin \<merlin@violet.berkeley.edu>
Organization: University of California, Berkeley
Message-ID: <35m115$2ch@agate.berkeley.edu>
Subject: Support for Encapsulated PostScript?

I'm in the process of deciding whether to convert a large body of technical
documentation from Framemaker to SGML.  Illustrations are a major factor in
this decision: I have a large investment in hundreds of illustrations done
in Encapsulated PostScript, and I don't want to completely retool the
process by which I produce illustrations.  Support for tables is another
significant factor.  Do existing SGML authoring tools provide adequate
support for tables and for illustrations done in Encapsulated PostScript?
Is there any readily-accessible literature in this area?  Is there public
domain software that provides such support?  Is there any alternative to
calling every vendor of SGML tools and asking these questions?  Any
information you can provide would be appreciated.
</message>
<message id="<CwFoFz.Fv@Newbridge.COM>" date="2989062141">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:02:21 UT
From: Thomas Wilson \<twilson@newbridge.com>
Organization: Newbridge Networks Corporation
Message-ID: \<CwFoFz.Fv@Newbridge.COM>
Subject: looking for HTML reference

Can anybody tell me a good source for HTML so that I can create HTML
documents for use in Mosaic?  Do I need an HTML parser or viewing software?
Any information about HTML and related software would be most welcome.

Tom Wilson
</message>
<message id="<1994Sep20.150649.16105@ast.saic.com>" date="2989062409">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:06:49 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep20.150649.16105@ast.saic.com>
References: \<bebb-190994170343@mac_149.ferndown.ate.slb.com>
Subject: Re: SGML/HTML: An obfuscated markup languag

[Malcolm Bebb]

|   I visit this group from time to time, and I have to agree with the
|   above comments.  Some of the posts here are absolutely unintelligible
|   to me.
|   
|   That might not be a problem for discussions of underlying code and
|   structures that I, as a user, will never see.  But that isn't the
|   impression I get.
|   
|   I keep trying to fend off the perception that SGML is the creation of a
|   small group of people self-indulgently building themselves an
|   overcomplicated language that only they can understand.  That is
|   probably most unfair, but it's how it often appears to me.

SGML is completely defined by the ISO-8879:1986 standard.  There is little
we can do here to change it.  Some folks here are discussing the
shortcomings of the current standard and proposing fixes, revisions,
deleteions, additions....etc.  This is a good and necessary thing lest the
next standard be mandated by a committee without any participation from us.
Yes, there are those of us who are experimenting with languages that you
will never see, but thats probably not what you overheard.  You probably
just overheard some of the implementors or parser writers discussing
efficient ways to handle subtleties of the language so that someday you
might have a fast and small windows dll that parses the SGML you write.
Please do not dampen their efforts.  If you do not understand the subject
of an article, or the first few sentences seem like jibberish to you, then
thats a good hint to skip the article and perhaps kill the thread so that
you won't be troubled by it any more.

|   To recommend adoption of SGML based authoring tools I'd need to be able
|   to easily see the benefits and be sure that any learning curve would
|   not be too steep or too long.  Information at that level does not seem
|   to exist.

There are many articles of that nature posted here from time to time, but
like any news group, there are many visiting factions.  Many people
participate in more than one type of discussion.  There are people from the
following disciplines and many more:

 authors
 publishers
 editors
 compiler writers
 tool smiths
 implementors 
 www users
 document analysists
 language designers
 .
 .

The SGML community is not large enough for us to split off groups like:

  comp.text.sgml.wizards
  comp.text.sgml.authors
  comp.text.sgml.questions
  comp.text.sgml.misc
  .
  .

Even if there were enough, this kind of fragmentation robs the less
advanced groups from the readership of the experts and breeds a kind of
snobbery (have you ever posted a Unix question to comp.unix.wizards.wiz ?)

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<35muc5$264@sundog.tiac.net>" date="2989062981">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:16:21 UT
From: "Keith M. Corbett" \<kmc@specialform.com>
Organization: Special Form Software
Message-ID: <35muc5$264@sundog.tiac.net>
References: <33d253$en6@sundog.tiac.net> \<CONNOLLY.94Aug25104040@austin2.hal.com> <33ijke$oqs@sundog.tiac.net> <1994Sep16.224809.28435@sq.sq.com> <28998.smithn@orvb.saic.com>
Subject: RE: HTML and SGML

[Keith M. Corbett]

|   Until then I'll use "real" SGML as a front end to filter into HTML.

[Liam R. E. Quin]

|   That's not a bad approach.

[Norman E. Smith]

|   It is real SGML; there is a DTD (actually several :-).  It may not be
|   pretty, but it is SGML!

I think I was not clear what I meant by '"real" SGML'.  I meant, I will use
"real" SGML tools (one or another HTML DTD, sgmls, HoTMetaL, etc.) to
pre-process my HTML files, using the "real" features we started out
discussing - such as entities.

|   SGML IS THE ONLY WAY TO RUN A WEB SERVER AND MAINTAIN HTML DOCUMENTS!

YES YES YES!! :>

Norman, it sounds like your application and approach are a lot like mine.
Your comments have been *extremely* valuable to me.  (Yours too, Lee.)

Thanks!!


 -kmc
</message>
<message id="<35mulb$264@sundog.tiac.net>" date="2989063275">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:21:15 UT
From: "Keith M. Corbett" \<kmc@specialform.com>
Organization: Special Form Software
Message-ID: <35mulb$264@sundog.tiac.net>
References: <33ijke$oqs@sundog.tiac.net> <1994Sep16.224809.28435@sq.sq.com> \<CONNOLLY.94Sep19143459@ulua.hal.com>
Subject: Proposal: comp.text.html [Was: HTML and SGML [HTML Standard update...]]

[Dan Connolly]

|   I feel like I should make a periodic posting to this newsgroup...

Your progress reports about the working group will be much appreciated.

I have a more radical idea: would you care to start a comp.text.html 
newsgroup?  Or maybe some other kind soul would volunteer.  (I don't
administer a news feed, and I would not be sufficiently organized to
manage the call for votes.)

 -kmc
</message>
<message id="<1994Sep20.152608.19376@ast.saic.com>" date="2989063568">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:26:08 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep20.152608.19376@ast.saic.com>
References: <1994Sep20.044545.24832@midway.uchicago.edu>
Subject: Re: SGML and its enemies

[Bob Agnew]

|   And so, I infer from what you say that many commercial wordprocessors
|   produce tagged documents which can be searched by third party database
|   programs based on the tags produced by the wordprocessor.  May I have
|   the names of some of those products please?

[Richard L. Goerwitz]

|   All I am saying is that content-based markup is a philosophy, and is
|   not the exclusive property of SGML.  Its use in government and
|   industry, I might add, has nothing to do with cleanness and
|   tractability of its design.  Do you challenge me on these points?

No.  Your response had nothing to do with my original challenge.  I was
challenging your implication that SGML accomplished no more than style
sheets and macros do in existing commercial products.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<35mvgc$l3t@tpd.dsccc.com>" date="2989064140">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:35:40 UT
From: Jeanne Jordan \<jjordan@tp.dsccc.com>
Organization: DSC Communications Corp. Plano, Tx.
Message-ID: <35mvgc$l3t@tpd.dsccc.com>
Subject: syntax

I am new to SGML DTD writing.  We have created the following:

(admonishment* , note*) it is intended to indicate that you can have an
admonishment (repeatable) without a note (repeatable) or a note without
admonishment but if you have both, the admonishment(s) MUST come before the
note(s).

We want to use this in a repeatable OR list:

\<!ELEMENT procedure - - ((admonishment* , note*), step, (step | para |
  (admonishment* , note*) | graphic | term | display | list | reldoc |
   table | tp100)*>

Question:  If I put this in a repeatable OR list does it negate the 
           enforced order?

           If it does, is there a way to enforce the order?

Thank you for any help.

Jeanne Jordan

email:  jjordan@tp.dsccc.com
</message>
<message id="<CwFqA5.6wL@undergrad.math.uwaterloo.ca>" date="2989064524">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:42:04 UT
From: Warren Baird \<wjbaird@undergrad.math.uwaterloo.ca>
Organization: University of Waterloo
Message-ID: \<CwFqA5.6wL@undergrad.math.uwaterloo.ca>
References: <1994Sep16.174154.20566@midway.uchicago.edu> <1994Sep19.170533.20089@ast.saic.com> <1994Sep20.044545.24832@midway.uchicago.edu>
Subject: Re: SGML and its enemies

[Richard L. Goerwitz]

|   All I am saying is that content-based markup is a philosophy, and is
|   not the exclusive property of SGML.  Its use in government and
|   industry, I might add, has nothing to do with cleanness and
|   tractability of its design.  Do you challenge me on these points?

I challenge you to make your points clearer.  Are you talking about SGML
and its use, cleanness, etc., or are you talking about content-based
markup's use, cleanness, etc.

In either case, I'm still not sure what you are saying in your second
sentence.

Maybe I just didn't get enough sleep last night, but if you want comments
on your points, you are going to have to expand upon them.

Warren
</message>
<message id="<35n0r7$nij@spock.ebt.com>" date="2989065511">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 15:58:31 UT
From: Kent Summers \<kjs@ebt.com>
Organization: Electronic Book Technologies, Inc.
Message-ID: <35n0r7$nij@spock.ebt.com>
References: <35kkcc$5ac@finnegan.iol.ie>
Subject: Re: SGML CD-ROM's from Chadwyck-Healey???

[Sean Mc Grath]

|   I am led to believe that an organisation called Chadwyck-Healey produce
|   CD-ROM's of documents in SGML format.  Has anyone heard of them?

they produce a large collection of english poetry in SGML and DynaText.
</message>
<message id="<m0qn81W-000C9wC@newman>" date="2989067160">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 16:26:00 UT
From: Matt Timmermans \<mtimmerm@newman.microstar.com>
Message-ID: \<m0qn81W-000C9wC@newman>
References: \<m0qmjuU-000C9wC@newman> <19940919T221619Z.erik@naggum.no>
Subject: Re: SGML Renewal: DTDs

[Erik Naggum]

|   maybe you need to explain a little more about this purported one-to-one
|   correspondence that seems to be a core premise to you argumentation.
|   I'll hold off my counter-arguments until I'm sure what you're talking
|   about.

Yes, the one-to-one correspondence between DTDs and documents is crucial
here.  I think we can agree that given an enforced one-to-one
correspondence, the type of, say, the memo of October 23rd would be "memo
of October 23rd", and that such type information is indeed redundant.

Note, also, that I'm only talking about DTDs here.  A document can have
other 'types' that are not defined in its DTD, such as "HyTime document" or
"Comp.Text.SGML posting from Matt", or, of course, "SGML Document".  The
association between documents and these types is made by the context in
which the documents are used.  The fact that I'm parsing a document with an
SGML parser, for example, implies that it is of type "SGML document".  The
fact that I'm posting this article to this newsgroup implies that it is of
type "comp.text.sgml posting from Matt".


The crucial fact establishing the one-to-one correspondence between DTDs
and documents is that every SGML document _contains_ its own DTD, as
opposed to _declaring_ conformance to some external DTD.  Since every
document _contains_ one DTD, there must be exactly one DTD for every
document.

When an SGML document begins with:

    \<!DOCTYPE cow PUBLIC "+//ISBN 1-55160::MSL//DTD cow//EN"
    [

it _appears_ to _declare_ that the document conforms to the external type
"+//ISBN 1-55160::MSL//DTD cow//EN".  You and I know, however, that this is
really just an entity reference, equivalent to

    \<!DOCTYPE cow
    [
    \<!ENTITY % thedtd PUBLIC "+//ISBN 1-55160::MSL//DTD cow//EN">
    %thedtd;
    ]>

and that it doesn't really imply any sort of conformance whatsoever,
because one could always

    \<!DOCTYPE cow PUBLIC "+//ISBN 1-55160::MSL//DTD cow//EN"
    [
    \<!ELEMENT cow - - ANY>

It is clear, I hope, that referencing an external DTD subset does not imply
conformance.  Nowhere in the Standard does it say that it _declares_
conformance, either.

Now, since every SGML document _must_ have a content model, and there is no
way _declare_ conformance to an external content model, it follows that
every SGML document must _define_ its own.

Any _real_ notion of type that gets associated with your SGML documents
comes from the context in which those documents are used.


If the above argument doesn't convince, then I'll try it a different way:

Every SGML document has exactly one DTD.

Unless there is a one-to-one correspondence between documents and DTDs, it
must be possible for two documents to share the same DTD.

How, then, do you declare or determine that two documents share the same
DTD?

Where does it say that in the Standard?


What I'm proposing is not that we abandon the notion of type, but simply
that we get rid of DTDs, have FPIs for _real_ types instead of declaration
subsets, and provide a standard, external way for SGML systems to maintain
the content models of _real_ document types as SGML documents.

\</Matt>

-- 
Matt Timmermans               | Phone:  +1 613 727-5696
Microstar Software Ltd.       | Fax:    +1 613 727-9491
34 Colonnade Rd. North        | BBS:    +1 613 727-5272
Nepean Ontario CANADA K2E-7J6 | E-mail: mtimmerm@microstar.com
</message>
<message id="<1994Sep20.170443.7223@ast.saic.com>" date="2989069483">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 17:04:43 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep20.170443.7223@ast.saic.com>
References: <35m115$2ch@agate.berkeley.edu>
Subject: Re: Support for Encapsulated PostScript?

[Merlin]

|   I'm in the process of deciding whether to convert a large body of
|   technical documentation from Framemaker to SGML.  Illustrations are a
|   major factor in this decision: I have a large investment in hundreds of
|   illustrations done in Encapsulated PostScript, and I don't want to
|   completely retool the process by which I produce illustrations.
|   Support for tables is another significant factor.  Do existing SGML
|   authoring tools provide adequate support for tables and for
|   illustrations done in Encapsulated PostScript?  Is there any
|   readily-accessible literature in this area?  Is there public domain
|   software that provides such support?  Is there any alternative to
|   calling every vendor of SGML tools and asking these questions?  Any
|   information you can provide would be appreciated.

Arbortext publisher provides adequate support for converting Frame to SGML
but conversions need to be done manually.  The way I do it is to bring up
frame and Arbortext side by side and cut and paste between them.  Arbortext
uses Island Draw as its draw tool.  Bring up the Island Draw, and select
convert from postscript.  Then store it as a ".drw" file.  You could keep
them in encapsulated postscript and define a different graphics processor
in Arbortext like Adobe Illustrator.  That would be my choice.

Tables have to be cut and pasted cell by cell.  I have looked at the mif
markup generated by frame for tables and it's not that difficult to
translate to SGML.  I would suspect that tools to do this already exist,
but if not, I have found myself another market niche.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<19940920T194408Z.erik@naggum.no>" date="2989079048">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 19:44:08 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940920T194408Z.erik@naggum.no>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <19940910.4934@naggum.no> <1994Sep16.173802.20269@midway.uchicago.edu> <35klfi$5nv@crl.crl.com>
Subject: Re: SGML and its enemies

[Joe English]

|   (The basic difference is that you must examine the entire grammar to
|   build an LR or LL parser, whereas each element content model can be
|   parsed with a DFA constructed independently of any other content
|   models.)

I think we're using the terms LL and LR with SGML in a more metaphorical
sense.  it is true that constructing the whole language would be rather
complicated given the content models are "productions", but since each
element is self-contained and self-delimited, you don't need to do this.

what is being discussed, I think, is whether it makes sense to limit the
DFA's such that they could have described LL(1) grammars.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<webdog-2009941208240001@edu-154.sfsu.edu>" date="2989080504">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 20:08:24 UT
From: Jeff Schwartz \<webdog@aol.com>
Reply-To: 71246.260@compuserve.com
Organization: Information Resources and Technology
Message-ID: \<webdog-2009941208240001@edu-154.sfsu.edu>
References: <35l82k$7rn@ixnews1.ix.netcom.com>
Subject: Re: HyTime FAQ needed

[Roland Alden]

|   Can anybody give me a pointer to a FAQ or spec for HyTime?

Use Mosaic or your favorite web browser to check out URL:

       http://info.cern.ch/hypertext/Standards/HyTime.html

-- 
 /  Jeff Schwartz     /   Halley's Comet  /
/   webdog@aol.com   /    is a myth!     /
</message>
<message id="<1994Sep20.212251.8537@midway.uchicago.edu>" date="2989084971">
Newsgroups: comp.text.sgml
Date: 20 Sep 1994 21:22:51 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep20.212251.8537@midway.uchicago.edu>
References: <1994Sep20.044545.24832@midway.uchicago.edu> <1994Sep20.152608.19376@ast.saic.com>
Subject: Re: SGML and its enemies

[Bob Agnew]

|   Your response had nothing to do with my original challenge.  I was
|   challenging your implication that SGML accomplished no more than style
|   sheets and macros do in existing commercial products.

Which was a silly thing for me to - imply.

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<1994Sep20.161459.1@us.oracle.com>" date="2989095299">
Newsgroups: comp.text.sgml,comp.text.interleaf
Date: 21 Sep 1994 00:14:59 UT
From: Comet \<comet@us.oracle.com>
Organization: Oracle Corporation -- WorldWide Support
Message-ID: <1994Sep20.161459.1@us.oracle.com>
Keywords: Interleaf SGML
Summary: The FAQ seemed incomplete.
Subject: Interleaf SGML?

Is it difficult to use Interleaf to produce SGML-compliant output?

-- 
                                ____________
Comet \<comet@bayvax.decus.org> /\\  _________\\    raves / polyamory / wicca
                               \\ \\ \\______  /    love / chocolate
An Equal-Opportunity Lover      \\ \\ \\Bi/ / /     pen-pals (U.S. Mail) / bi
Watch Out, You Could Be Next!    \\ \\ \\/ / /      photography / hugs / origami
                                  \\ \\/ / /       friends / flying / cats
Hmm.  What does ESC/X do           \\  / /        reading / tarot
on the Model 100, anyway?           \\/_/
</message>
<message id="<19940921T002950Z.erik@naggum.no>" date="2989096190">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 00:29:50 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940921T002950Z.erik@naggum.no>
References: <9409191825.AA20376@mercury>
Subject: Re: Parsing EMPTY elements

[Mary Holstege]

|   Parsing elements whose content model is EMPTY without knowledge of the
|   DTD *does* present difficulties.  ...  The problem is that you are
|   thereby required to perform a complete parse of the parent element in
|   order to parse out any child.

let's back up a bit.  Lee Quin said that if it wasn't for EMPTY "you could
take a fully expanded instance and make an in-memory data structure without
needing the DTD".  I replied that EMPTY did not make this more difficult,
and showed that one could learn from an instance whether a given element
was empty.  that is, the argument did not follow from its premises.

then Steve Finch says basically the same thing you do, that this may pose
some difficulties, but for very different scenarios than Lee Quin used, and
to which I answered.  please re-read his statement: "you could take a fully
expanded instance and make an in-memory data structure without needing the
DTD."  we are obviously not talking about 200GB of data here (Steve Finch).
we are obviously not talking about parsing whole books to retrieve a child
element (Mary Holstege).  we _are_ talking about processes that _do_ read
the entire document into memory before they do anything with it.

given Lee Quin's scenario, it is entirely legitimate to deduce elemental
emptiness from structure that you read into memory, and it would cost you
very little to deduce such, given that you _do_ read the entire structure
into memory.

so you have a choice: parse the whole instance and build an in-memory data
structure without knowing anything about empty elements, or know which
elements are empty, and read only as much as you need.  given this choice,
and 200GB of data or a whole book that I needed to find titles in, I don't
think I would have any trouble deciding to write a utility that told me
which elements could be empty in an instance.

|   It seems that one is left with (1) tossing in bogus end tags (2)
|   tossing in bogus attributes (e.g., "this-is-empty=1") (3) making the
|   client have to know something about the DTD.  This is a live issue for
|   me at the moment, and if anyone has any experience or observations
|   about these (or other) alternatives, I'd love to hear about it.

(1) is chosen by ESIS, which gives you an end-tag "event" after an empty
element, so you have no idea whether an element is empty because it is
declared empty or just because it doesn't have any data.

(2) is the mechanism used by CONREF attributes.  I assume that you don't
want to know about these, either.

(3) is kind of curious in that it assumes that the data the client receives
from the SGML document is completely foreign to it.  i.e., it knows nothing
about the structure of the document, and is presumably completely unaware
of what is being asked.  I think this is a primary example of when you _do_
need the DTD.  but suppose the client was tailored to this document type;
wouldn't it _already_ know that an element would be empty in the instance?
the DTD is supposed to be a formalization of the structure of the data the
application program expects.

so can I assume that this is all an allergic reaction to DTDs?  it doesn't
seem to be a real problem, any way I look at it.  would it help if we could
find ways to express the information in DTD's in a form better suited to
those programs?  if so, I think we could join in the effort to make SGML
more computer-science-friendly.  I don't think we should start by undoing
SGML, but rather see if there are ways to make it easier to deal with.  if
we need to undo things, we should undo things carefully and not without
looking for other options first.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<1994Sep21.011746.14931@rat.csc.calpoly.edu>" date="2989099066">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 01:17:46 UT
From: Anonymous \<rkaye@denali.csc.calpoly.edu>
Organization: Disorganized, Inc.
Message-ID: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: SGML Viewers and Formatters

I've been digging through mounds and mounds of documentation and product
literature on SGML tools.  I've not been able to find anything that comes
close to what I am looking for.

I am looking for an SGML viewer that supports hypertext links and that
takes care of formatting the documents.  We are hoping to publish SGML
documents on a CD-ROM, using a Windows based viewer.

So far, all the tools that I have seen have either lacking features, horrid
user interfaces, or cannot be re-distributed and packaged with our product.
The one package that comes close to what I am looking for is IADS, but it
suffers from the latter two deficiencies.

Can anyone point me to some companies/groups that might fit the bill?

-- 
-ruaok

That was more like an anal rasberry! - gcortez
</message>
<message id="<35o6hs$sua@newsbf01.news.aol.com>" date="2989104124">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 02:42:04 UT
From: Robert Reich \<robert1285@aol.com>
Message-ID: <35o6hs$sua@newsbf01.news.aol.com>
References: <342gak$bnt@finnegan.iol.ie>
Subject: Re: SGML to postscript/PDF

At Seybold SF, I saw a brief demo at the release of Acrobat 2.0, by
Avalanch.  The plugin is designed to abstract SGML from a PDF file.  As for
trying to produce PS direct from SGML, my sugestion is get a good page
layout program that can be automated.

Robert
reichr@moodys.com
</message>
<message id="<35oec9$o29@pop0.rain.rg.net>" date="2989112137">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 04:55:37 UT
From: Steven Kornreich \<steve@eps.com>
Organization: RGNET
Message-ID: <35oec9$o29@pop0.rain.rg.net>
Subject: Xyvision > SGML ??

Are there any good tools to take Xyvision formatted documents into SGML
format via something like ArborText?

-- 
Steven Kornreich
Kornreich Communications
</message>
<message id="<nwu.release.online@fastlane.vicki.net>" date="2989114203">
Newsgroups: alt.etext,alt.journalism,alt.motherjones,alt.politics.datahighway,alt.wired,alt.zines,comp.text,comp.text.sgml,misc.writing,rec.arts.books,rec.arts.int-fiction,rec.arts.prose
Date: 21 Sep 1994 05:30:03 UT
From: Philip Mattera \<slope@panix.com>
Organization: National Writers Union (AFL/CIO)
Message-ID: \<nwu.release.online@fastlane.vicki.net>
Subject: Fairness in Online Book Publishing

WRITERS UNION URGES FAIR TREATMENT OF AUTHORS IN ONLINE BOOK PUBLISHING

New York, September 21, 1994 -- The National Writers Union (NWU) today
issued a call for writers' rights in the emerging field of online book
publishing.

"The distribution of book-length works via networks such as the Internet is
an exciting development for writers," said NWU President Jonathan Tasini,
"but we want to be sure that authors are treated fairly in this new
medium."

As part of its ongoing efforts to assert writers' rights in all new media,
the NWU today released a document called Recommended Principles for
Contracts Covering Online Book Publishing.  This position paper complements
a similar document on CD-ROM and other disc-based electronic books issued
by the NWU last April.

"Online book publishing is an even more embryonic industry than CD-ROMs,"
said NWU Book Grievance Officer Philip Mattera, who led the group that
produced the Recommended Principles.  "But we think it is not too early to
urge online publishers to adopt fair practices in their dealings with
authors."

Mattera said that in analyzing the online publishing business, the NWU
recognized that there is already a wide range of practices in the young
industry.  Some online companies, Mattera noted, simply accept any work
that is submitted on diskette, list it in an online catalogue, and transmit
it to customers.  The more sophisticated ones add hypertext links to the
author's text, which allows readers accessing the work through the
Internet's World Wide Web to click on "hot spots" that automatically call
up related passages from other works or even bits of audio or video.

"Our position is that the terms of the online book contract should reflect
the amount of value added by the online publisher," Mattera said.  "Online
companies that do no editorial work and merely transmit plain-text versions
are in effect acting as mere distributors and should receive a small share
of the income from online sales.  Those that enhance the work through the
addition of hypertext links are entitled to a much larger portion of the
revenues."

The main points in the NWU's Recommended Principles are the following:

 - Writers should retain the copyright on their works, though online
   publishers adding hypertext links may claim copyright on those added
   elements.

 - Because online publishing is still underdeveloped, authors should grant
   publishing rights of limited duration.

 - The division of revenues between authors and online publishers should
   reflect the factors discussed above, but in no case should the author
   receive less than 50 percent of the money collected from customers.
   This reflects the absence of the manufacturing, warehousing, and other
   costs associated with the publishing of traditional print books.

 - Contracts for online books should specify how the work will be made
   available and what promotion efforts the publisher will undertake.

 - Given that traditional "out of print" procedures do not apply, contracts
   for online books should terminate when the publisher stops promoting the
   work.

The 4,000-member National Writers Union works to protect the rights of
freelance writers of all kinds, from journalists and technical writers to
novelists and poets.  Founded in 1983, it is affiliated with the United
Auto Workers union.

As part of its work on new technologies, the NWU helped a group of its
members bring suit last December against several publishers and online
services for distributing electronic versions of articles they had written
as freelancers.  The suit, Tasini v. New York Times et al., is pending in
federal court in New York.


TO RECEIVE THE FULL-TEXT OF THE RECOMMENDED PRINCIPLES (ABOUT 20k), CONTACT
PHILIP MATTERA AT slope@panix.com
</message>
<message id="<35otl5$1kt$1@mhadf.production.compuserve.com>" date="2989127780">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 09:16:20 UT
From: Marc Fresko <100116.1152@CompuServe.COM>
Organization: via CompuServe Information Service
Message-ID: <35otl5$1kt$1@mhadf.production.compuserve.com>
References: <35kkcc$5ac@finnegan.iol.ie>
Subject: Re: SGML CD-ROM's from Chadwyck-Heale..

[Sean McGrath]

|   I am led to believe that an organisation called Chadwyck-Healey produce
|   CD-ROM's of documents in SGML format.  Has anyone heard of them?

As I understand it, Chadwyck-Healey is a leading publisher of scholarly
books, microfilm and CD-ROM.  Telephone +44 223 215512.

Marc Fresko

-- 
Marc Fresko
e-mail 100116.1152@compuserve.com
paper mail 45A Welcomes Road, Kenley CR8 5HA, England
</message>
<message id="<35p9rk$cve@spock.ebt.com>" date="2989140276">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 12:44:36 UT
From: Kent Summers \<kjs@ebt.com>
Organization: Electronic Book Technologies, Inc.
Message-ID: <35p9rk$cve@spock.ebt.com>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: Re: SGML Viewers and Formatters

[Mr. Raytrace]

|   I am looking for an SGML viewer that supports hypertext links and that
|   takes care of formatting the documents.  We are hoping to publish SGML
|   documents on a CD-ROM, using a Windows based viewer.

ruaok --> send email to info@ebt.com containing your surface address.  i'll
send you some product information on dynatext.  i think this is what you're
looking for.  thanks.
</message>
<message id="<35paqr$d4i@spock.ebt.com>" date="2989141275">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 13:01:15 UT
From: Kent Summers \<kjs@ebt.com>
Organization: Electronic Book Technologies, Inc.
Message-ID: <35paqr$d4i@spock.ebt.com>
References: \<jfritchCw8DoL.D27@netcom.com>
Subject: Re: SGML in the Oil & Gas Industry?

[Jeanne Fritch]

|    I'm looking for any information about SGML in the Oil and Gas
|    Industry.  Is there an existing DTD?  Are there any case studies
|    available?

a good person to contact is bob streich at schlumberger --> +1 512 331 3010
</message>
<message id="<35pdo3$5mf@sundog.tiac.net>" date="2989144258">
Newsgroups: comp.text.sgml,comp.text.interleaf
Date: 21 Sep 1994 13:50:58 UT
From: "Keith M. Corbett" \<kmc@specialform.com>
Organization: Special Form Software
Message-ID: <35pdo3$5mf@sundog.tiac.net>
References: <1994Sep20.161459.1@us.oracle.com>
Subject: Re: Interleaf SGML?

[Comet]

|   Is it difficult to use Interleaf to produce SGML-compliant output?

I'm not familiar with other SGML products.  But for applications that
require the powerful functionality of Interleaf, "getting to SGML" is
certainly feasible.  There are several ways to go that provide more or less
integration between publishing and editing:

1) You can use Interleaf as a structured document editor, prior to (or in
   parallel with) adopting SGML.  If your data is already structured,
   conversion to SGML will go more smoothly.  In general good document
   design in in Interleaf produces documents that are more portable than
   those produced in format oriented markup systems.

2) Conversion or filtering from Interleaf out to SGML can be done in many
   ways.  It's possible to translate Ileaf Ascii markup directly (more or
   less) to SGML; some hand editing may be required.  Custom filters can be
   developed in Lisp to automate this process.  (I've heard about a product
   for converting Interleaf documents into SGML, but I haven't seen it.)

3) Interleaf's SGML produt incorporates sophisticated SGML capabilities
   into their standard authoring tools.  Input SGML is converted into
   Interleaf objects (folders, documents, components, graphics etc) based
   on a table of translation rules that trigger Lisp program code.
   Authoring within Interleaf is guided by the DTD.  Entities and
   attributes can be mapped to and from components, graphics, etc.

Some customization and/or software development is usually required to glue
the pieces together in an Interleaf SGML application.  For example,
Interleaf's CALS software is layered on top of the SGML toolkit; the result
is a complete desktop environment for working with CALS data.  Depending on
the complexity of the DTD and publishing requirements, building an
application for incorporating a new DTD into Interleaf could require a
significant development effort; 2 to 6 months for a team of two people
(programmer and application expert) is fairly typical, from what I've seen.

Your mileage may vary.
</message>
<message id="<35pech$d8e@news.xs4all.nl>" date="2989144913">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 14:01:53 UT
From: Jan Grootenhuis \<jang@xs4all.nl>
Organization: XS4ALL, networking for the masses
Message-ID: <35pech$d8e@news.xs4all.nl>
References: <19940909.4927@naggum.no> <34rjsj$5ol@deep.rsoft.bc.ca> <351a33$t9c@rs18.hrz.th-darmstadt.de> <1994Sep15.050330.4223@sq.sq.com> <19940916T211444Z.erik@naggum.no> <35gbbv$975@news.xs4all.nl> <19940918T184935Z.erik@naggum.no> \<CwDKLM.9FK@cogsci.ed.ac.uk>
Subject: Re: SGML and its enemies

[Jan Grootenhuis]

|   Recently, I tried hard to rewrite an existing grammar into SGML, that
|   specified something like: ((L?, A)*, (L?, B)*) without using using an
|   intermediate level.  I couldn't.

[Erik Naggum]

|   the content model ((L?, A)*, (L?, B)*) is satisfied iff the tokens are
|   recognized by this deterministic finite automaton.  (a figure would
|   have come in handy here, but all the ones I tried to make came out
|   gross.)  starting state is 0, ending state $, invalid transition marked
|   -.  </> is the end of the containing element.
|
|   	state	\<L>	\<A>	\<B>	</>
|   	  0      1	 2	 3	 $
|   	  1      -       2       3	 -
|   	  2      1       2       3       $
|   	  3	 4       -       3       $
|   	  4      -       -       3	 -

[Steve Finch]

|   Hypothesis: The specification makes ambiguous any regular expression
|   whose associated finite state automata cannot be written with precisely
|   one positive recognition state (since there will be at least one
|   transition onto non-isomporphic parts of the FSA sharing a symbol from
|   two positive states, and this will be an uncollapsable ambiguity
|   according to SGML).

[Erik Naggum]

|   what makes the above troublesome to describe with SGML content models
|   is the loop between states 1 and 2, both of which exit to state 3.  it
|   would be impossible to describe this automaton by a content model even
|   if the exit to the ending state could be coerced into one, such as with
|   ((L?, A)*, (L?, B)+).

Also, ((L?, A)+, (L?, B)+) wouldn't help much, would it?

I am very grateful for the expert opinions, but I do not understand the
lingo.  I was looking for sort of cheerful statement like: "An ambiguity in
SGML cannot be resolved by trivial reordering if the offending element
first occurs in a (optional/)repeatable group."  Is this true (if less
loosely formulated), or it the truth wider/narrower?

BTW the subject header might suggest I were a SGML enemy: not.

Jan
</message>
<message id="<1994Sep21.154717.5065@ast.saic.com>" date="2989151237">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 15:47:17 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep21.154717.5065@ast.saic.com>
References: <35mvgc$l3t@tpd.dsccc.com>
Subject: Re: syntax

[Jeanne Jordan]

|   I am new to SGML dtd writing.  We have created the following:
|   
|   (admonishment* , note*) it is intended to indicate that you can have an
|   admonishment (repeatable) without a note (repeatable) or a note without
|   admonishment but if you have both, the admonishment(s) MUST come before
|   the note(s).
|   
|   We want to use this in a repeatable OR list:
|   
|   \<!ELEMENT procedure - - ((admonishment* , note*), step, (step | para |
|     (admonishment* , note*) | graphic | term | display | list | reldoc |
|      table | tp100*)>
|   
|   Question:  If I put this in a repeatable OR list does it negate the 
|              enforced order?
|   
|              If it does, is there a way to enforce the order?

Yes, but before I can discuss that intelligently, you need to fix the
syntax error so that I can read the expression correctly.  I count 4 left
parens and 3 right parens.  (I'm not too good at that by hand any more so I
had vi check it.)  After you correct it, what is the scope of the ordering?
Do you intend to enforce the ordering over the entire scope of the
procedure element?

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep21.154845.2078@sq.sq.com>" date="2989151325">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 15:48:45 UT
From: Kate Hamilton \<kate@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep21.154845.2078@sq.sq.com>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: Re: SGML Viewers and Formatters

[Mr. Raytrace]

|   I am looking for an SGML viewer that supports hypertext links and that
|   takes care of formatting the documents.  We are hoping to publish SGML
|   documents on a CD-ROM, using a Windows based viewer.
|
|   So far, all the tools that I have seen have either lacking features,
|   horrid user interfaces, or cannot be re-distributed and packaged with
|   our product....
|
|   Can anyone point me to some companies/groups that might fit the bill?

SoftQuad Explorer has the features you mention:

* a handsome (and publisher-configurable) user interface
* very good document formatting
* hypertext links (HyTime nameloc, treeloc, dataloc; easy to add this
  support to your DTD and to use it)
* explicit support for CD-ROM publishing

and more:

* publisher-defined indexes
* navigators: interactive table of contents (use with any SGML elements:
  make a conventional table of contents -and- a list of illustrations, for
  example)
* webs: user can make annotations, highlights, bookmarks, hypertext links
* views: publisher or user can arrange documents in folders (can also be
  shown as trees)
* in-line graphics or graphics launch in a separate window

and more.

For sales/marketing/implementation materials, contact sales@sq.com or
kate@sq.com.

Kate Hamilton, Manager
Explorer Implementation Group
SoftQuad Inc.
</message>
<message id="<1994Sep21.155952.6616@ast.saic.com>" date="2989151992">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 15:59:52 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep21.155952.6616@ast.saic.com>
References: \<m0qn81W-000C9wC@newman>
Subject: Re: SGML Renewal: DTDs

[Erik Naggum]

|   maybe you need to explain a little more about this purported one-to-one
|   correspondence that seems to be a core premise to you argumentation.
|   I'll hold off my counter-arguments until I'm sure what you're talking
|   about.

[Matt Timmermans]

|   Yes, the one-to-one correspondence between DTDs and documents is
|   crucial here.  I think we can agree that given an enforced one-to-one
|   correspondence, the type of, say, the memo of October 23rd would be
|   "memo of October 23rd", and that such type information is indeed
|   redundant.
:
|   The crucial fact establishing the one-to-one correspondence between
|   DTDs and documents is that every SGML document _contains_ its own DTD,
|   as opposed to _declaring_ conformance to some external DTD.  Since
|   every document _contains_ one DTD, there must be exactly one DTD for
|   every document.


and I was sure the standard said that there could only be one ACTIVE DTD at
any one time and that the default active DTD was the BASE DTD declared in
the prolog in the document declaration subset.  Silly me.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<35plee$gak@icarus.convex.com>" date="2989152142">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 16:02:22 UT
From: Peter Cash \<cash@convex.com>
Organization: The Instrumentality
Message-ID: <35plee$gak@icarus.convex.com>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: Re: SGML Viewers and Formatters

[Mr. Raytrace]

|   I am looking for an SGML viewer that supports hypertext links and that
|   takes care of formatting the documents.  We are hoping to publish SGML
|   documents on a CD-ROM, using a Windows based viewer.

Have you looked at DynaText from Electronic Book Technologies in Rhode
Island?  DT has some warts, but it's the best around.  It has features that
support your requirements.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             |      Die Welt ist alles, was Zerfall ist.     |SirMore@aol.com
Peter Cash   |       (apologies to Ludwig Wittgenstein)      |cash@convex.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
</message>
<message id="<1994Sep21.160427.7598@ast.saic.com>" date="2989152267">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 16:04:27 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep21.160427.7598@ast.saic.com>
References: <1994Sep20.161459.1@us.oracle.com>
Subject: Re: Interleaf SGML?

[Comet]

|   Is it difficult to use Interleaf to produce SGML-compliant output?

Not if you ask an Interleave salesman. ;-)

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<CwHo9p.GL1@gordian.com>" date="2989155229">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 16:53:49 UT
From: Susan Gallagher \<susan@gordian.com>
Organization: Gordian
Message-ID: \<CwHo9p.GL1@gordian.com>
Subject: Frame to SGML -- latest translator?

I looked at Frame to SGML translators a couple of months ago, and was
unable to find anything that was at all simple to use (even to test) --
any notes seemed geared solely towards programmers.

Anyone know where I can find the latest filters?  Ideally, how to get
started with them?

Thanks,

Susan
-- 
susan@gordian.com
Gordian                                +1 714 850 0205
20361 Irvine Ave.                      +1 714 850 0533 FAX
Santa Ana Heights, CA 92707
</message>
<message id="<CONNOLLY.94Sep21120220@ulua.hal.com>" date="2989155740">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 17:02:20 UT
From: Dan Connolly \<connolly@ulua.hal.com>
Organization: HaL Software Systems, Inc.
Message-ID: \<CONNOLLY.94Sep21120220@ulua.hal.com>
References: <35kc6rINN4ac@oasys.dt.navy.mil> <1994Sep19.233219.25809@ast.saic.com>
Subject: Re: HTML and SGML

[Bob Agnew]

|   \<opinion>
|   Sorry folks, I can't agree on this one.  The syntax of SGML is rigously
|   supported by ebnf productions just like Pascal and Modula-2, etc.  A
|   Pascal program without the "program" statement is not a Pascal program.
|   A Modula-2 module without the "Module" statement is not a Module.
|   Conversly, an SGML instance without a declaration subset is not an SGML
|   instance.  It's SGML-like but it's not SGML.
|   \</opinion>

I'm having a hard time making sense of this: "an SGML instance without a
declaration subset is not an SGML instance."  Since when does the instance
part of an SGML document contain a declaration subset?  Section 6.2,
production [2] says:

	[2] SGML document entity = SGML declaration,
		s*, prolog, s*, document instance set, Ee

Definition 4.160 reads:

	4.160 instance (of a document type): The data and markup for a
	hierarchy of elements that conforms to a document type definition

To be precise: the data stream exchanged between HTTP clients and servers,
which more often begins with

	\<title>...\</title>

than

	\<!docytype html ...>
	\<title>...\</title>

is not an SGML document entity.  Nonetheless, HTML is an SGML application
conforming to ISO8879.

To construct an SGML document entity for parsing, if there's no prologue in
the data stream, the entity manager/application program/system adds one.
This is part of the specified relationship between a MIME text/html body
(data streams) and an SGML document conforming to the HTML document type.

I refer you to

	http://www.hal.com/products/sw/olias/Build-html/rzfs0XBMCmF84aK.html

Dan

-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<00984D07.1F4EAFA0.9@vax.ox.ac.uk>" date="2989157344">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 17:29:04 UT
From: Lou Burnard \<lou@vax.ox.ac.uk>
Message-ID: <00984D07.1F4EAFA0.9@vax.ox.ac.uk>
Subject: Book now for TEI Meta Workshop...

                           TEI Meta-Workshop

                       Chicago, 12-16 December 1994

The Text Encoding Initiative is pleased to announce a Workshop for
potential TEI Consultants and Trainers, to be held in Chicago, 12-16
December 1994 (Monday-Friday).  We're calling this a "Metaworkshop" because
its object will be to provide participants with an intensive training in
the teaching of TEI workshops -- not directly to train potential users of
the TEI Guidelines.  Numbers will be strictly limited (probably to about
thirty people), so if it's greatly over-subscribed we'll just have to
organize another one.

Who should attend?

The Workshop will be open to both academic and commercial participants.
Participants should have extensive experience using computers for textual
research, and preferably also familiarity with SGML and the TEI Guidelines.
They must also be willing to teach or help teach one or more multi-day TEI
workshop during the next two years, or to act as a TEI-assigned consultant
to outside projects.  (We don't guarantee to provide such consultancies --
but we want to build up a pool of expertise which the community can draw
on).

I'm interested.  What should I do?

Send either of us a note describing who you are and what you do.  Tell us
what technical expertise you have, where your general interests in using
the TEI lie, and how relevant use of the TEI scheme is to your current
employment.  Please also indicate whether you would need a full or partial
financial subsidy to attend.

What will it cost?

There will be no Workshop fee, but some participants may have to pay their
own travel and lodging, as we have only a limited amount of funding to
cover travel and subsistence.  Free places will be allocated by the
workshop organizers on the basis of your application.  Self-funding
participants will need to find the cost of travel to Chicago, and of a
week's accommodation in University housing (the rates are currently being
negotiated, but are unlikely to exceed $75 a day, all found).

Deadlines?

If you're interested, please let us know as soon as possible, although we
don't expect to be formally reviewing applications and allocating places
until mid-October.  Applications received after October 30th 1994 will not
be considered for this Workshop.

-- 
Michael Sperberg-McQueen        TEI@UIC.EDU
Lou Burnard                     LOU.BURNARD@OUCS.OX.AC.UK
</message>
<message id="<1994Sep21.183230.8900@chemabs.uucp>" date="2989161150">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 18:32:30 UT
From: "Larry W. Virden" \<lvirden@cas.org>
Organization: Nedriv Software and Shoe Shiners, Uninc.
Message-ID: <1994Sep21.183230.8900@chemabs.uucp>
Subject: SGMLS error interpretation 

I am attempting to use sgmls to verify HTML documents, but am having a
problem understanding the error output.  When I am told:

sgmls: SGML error at -, line 85 at ">":
       HTML element not allowed at this point in *DOC element
sgmls: SGML error at -, line 85 at ">":
       BODY element not allowed at this point in HEAD element
sgmls: SGML error at -, line 85 at ">":
       HEAD element ended prematurely; required subelement omitted
sgmls: SGML error at -, line 85 at ">":
       HTML element ended prematurely; required BODY omitted

I am guessing that something very early on in the document must be wrong,
since it is not detecting the \</HEAD> that is coded in the document.  The
problem is that I am not able to determine what the problem is - I don't
even know what a *DOC element is.

Is there a beginner's guide to sgmls that would be of assistance?

-- 
:s Great net resources sought...
:s Larry W. Virden                 INET: lvirden@cas.org
:s \<URL:http://www.mps.ohio-state.edu/cgi-bin/hpp?lvirden_sig.html>
The task of an educator should be to irrigate the desert not clear the forest.
</message>
<message id="<JCS.94Sep21144117@chekov.ebt.com>" date="2989161677">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 18:41:17 UT
From: Jeff Cutler-Stamm \<jcs@chekov.ebt.com>
Organization: Electronic Book Technologies Inc.
Message-ID: \<JCS.94Sep21144117@chekov.ebt.com>
References: <1994Sep12.225455@opal.tufts.edu>
Subject: Re: RTF to SGML converter -HELP!!

EBT will soon release DynaTag which is an interactive SGML conversion tool
which supports RTF, FrameMaker MIF and Interleaf ASCII.  Tables are
supported and images can be extracted and converted by launching an
external application such as HiJaak Pro.

Hope this helps.

--
              ____________________________________________
 ____________| Jeff Cutler-Stamm   email: jcs@ebt.com     |____________
 \\           | Electronic Book     phone: +1 401 421 9550 |           /
  \\          |  Technologies         fax: +1 401 421 9551 |          /
  /          |____________________________________________|          \\
 /____________\\                                          /____________\\
</message>
<message id="<1994Sep21.190227.10470@ast.saic.com>" date="2989162947">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 19:02:27 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep21.190227.10470@ast.saic.com>
References: <35mvgc$l3t@tpd.dsccc.com>
Subject: Re: syntax

[Jeanne Jordan]

|   I am new to SGML dtd writing.  We have created the following:
|   
|   (admonishment* , note*) it is intended to indicate that you can have an
|   admonishment (repeatable) without a note (repeatable) or a note without
|   admonishment but if you have both, the admonishment(s) MUST come before
|   the note(s).
|   
|   We want to use this in a repeatable OR list:
|   
|   \<!ELEMENT procedure - - ((admonishment* , note*), step, (step | para |
|     (admonishment* , note*) | graphic | term | display | list | reldoc |
|      table | tp100*)>
|   
|   Question:  If I put this in a repeatable OR list does it negate the
|              enforced order?
|   

\<ANSWER>Yes.\</ANSWER>

|              If it does, is there a way to enforce the order?
|   

\<ANSWER>Yes.\</ANSWER>

HELP -- I really need help from Erik and the other experts on this.  After
noting that there was a right paren missing, I contacted the author who
sent me back the correct expression:

\<!ELEMENT procedure - - ((admonishment* , note*), step, (step | para |
  (admonishment* , note*) | graphic | term | display | list | reldoc |
   table | tp100))*>

Now the remark about the repeatable or group makes sense.  The author also
confirmed that they wanted the scope of the ordering constraint to be the
entire procedure element.

For ease of manipulation, I rewrote this with single character identifiers.

\<!ELEMENT procedure - - ((a*,n*),s,(s|p|(a*,n*)|g|t|d|l|r|ta|tp))*>

The ordering constraint is roughly equivalent to stating that, of all the
the productions d given by:

    d ::= (a|b|c)*

we only wish to admit those productions in which a "b" is never followed by
an "a".  In my analysis, a was "admonishment", b was "note" and c was all
that other stuff.  I was able to represent the original model group as one
of the following mutually exclusive and exhaustive model groups (I hope):

group 1 (a*,s,(s|p|a*|g|t|d|l|r|ta|tp))
group 2 (a*,n+,s,(s|p|n*|g|t|d|l|r|ta|tp))
group 3 (a*,s,(s|p|(a*,n+)|g|t|d|l|r|ta|tp))
group 4 (n*,s,(s|p|n*|g|t|d|l|r|ta|tp))

Now if my reasoning is correct, any or none of type 1 can appear after
which either exactly one group 3 or exactly one group 4 or neither may
appear.  After either a 2 or a 3 has occured only productions from the
group 4 are admissible.

Thus I can write:

\<!ELEMENT procedure - - ((a*,s,(s|p|a*|g|t|d|l|r|ta|tp))*,
                         ((a*,n+,s,(s|p|n*|g|t|d|l|r|ta|tp))|
                         (a*,s,(s|p|(a*,n+)|g|t|d|l|r|ta|tp)))?,
                         (n*,s,(s|p|n*|g|t|d|l|r|ta|tp))*)>

Now if this is correct, it is still gruesome!  If not, the technique should
still work anyway.  Now, my dilema is as follows:

I originally was going to answer "No it can be done with a seq model group
because they can be parsed by a discrete finite automata and a DFA can't
recognize something which depends on the entire left context."  However, a
little thought said that I could set a flip-flop the first time I saw a "b"
and thereafter only admit productions without "a"s.  Now further reflection
says that the state of a finite automata depends on the entire left
context.

My question is really this -- I was able to reduce the logical requirement
to a seq model group, and I know how to build a finite automata to parse
this production rule, but does the model group correspond to a regular
expression in this case?  If so, how can regular expressions describe
context-free-grammars?  Any help will be appreciated.

-- 
"I should have taken formal grammars before reading Wittgenstein; now every
 thing makes sense."
</message>
<message id="<1994Sep21.203139.19909@sq.sq.com>" date="2989168299">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 20:31:39 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep21.203139.19909@sq.sq.com>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no>
Subject: Re: Parsing EMPTY elements

[Mary Holstege]

|   Parsing elements whose content model is EMPTY without knowledge of the
|   DTD *does* present difficulties.  ...  The problem is that you are
|   thereby required to perform a complete parse of the parent element in
|   order to parse out any child.

[Erik Naggum]

|   let's back up a bit.  Lee Quin said that if it wasn't for EMPTY "you
|   could take a fully expanded instance and make an in-memory data
|   structure without needing the DTD".  I replied that EMPTY did not make
|   this more difficult, and showed that one could learn from an instance
|   whether a given element was empty.  that is, the argument did not
|   follow from its premises.

In the presence of OMITTAG, you cannot distinguish an EMPTY element from
one whose endtag has been omitted.

People who have `bought into' (as they say) the concept of using an SGML
DTD have no problem with the idea of using a full SGML parser for
everything that processes SGML data, or, conversely, from wiring in (as in
Mosaic) the names of the tags and whether they are empty.

It turns out that there's a lot you can do with SGML-like [sic] languages,
where the DTD isn't needed to make some sense of the data.

Sometimes, you don't have a DTD.  In that case, I have a "mkdtd" Unix
shell-script that lets me bring the document into Author/Editor, sometimes
with a little tweaking as my script isn't perfect... but it asks the user
whether to assume OMITTAG or a nested element if it detects a matching
error.  It doesn't do EMPTY :-)

I'd like to see SGML become pervasive.

Imagine if the compilers (C, Pascal, Intercal...) used SGML for the parse
tree, and for the tree that was sent to the optimiser.

Imagine if "sed" worked with SGML.

Imagine if "ls" could produce SGML output.

Well, maybe it sounds as if I'm an SGML fanatic, but if you change "SGML"
to "explicitly marked up structure"...

You could write your own extra C optimization as an sgml-awk script :-)


But the present proliferation of methods for linking DTDs and instances
doesn't help.  I know about the SGML OPEN entity catalogue.  It's a start.
It isn't enough, though.


An SGML subset that you can write a reliable parser for in ten minutes
(after an hour or two reading the spec) would be a big step forward for
SGML ubiquity.  That's part of why I suggested a formally-defined (in
the mathematical sense) "SGML/F" subset of SGML.

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<1994Sep21.203800.20126@sq.sq.com>" date="2989168680">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 20:38:00 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep21.203800.20126@sq.sq.com>
References: <19940910.4934@naggum.no> <1994Sep16.173802.20269@midway.uchicago.edu> <35klfi$5nv@crl.crl.com>
Subject: Re: SGML and its enemies

[Joe English]

|   [...] Actually, SGML content models are considerably easier to process
|   than LR(1) grammars.  [...] but it's also possible to construct a parse
|   table from a DTD in nearly linear time. The same is not true for LR or
|   LL grammars.
|  
|   (The basic difference is that you must examine the entire grammar to
|   build an LR or LL parser, whereas each element content model can be
|   parsed with a DFA constructed independently of any other content
|   models.)

Is this true?  I've never written an SGML parser... but... what about
inclusions and exclusions on containing elements?

|   The rules for start-tag omission make the DFA construction a little bit
|   more hairy than necessary, than but not intractable.

No, clearly -- or there wouldn't be any SGML software :-)

But providing good error recovery for OMITTAG is much harder.

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<m0qnZZQ-000C9wC@newman>" date="2989173060">
Newsgroups: comp.text.sgml
Date: 21 Sep 1994 21:51:00 UT
From: Matt Timmermans \<mtimmerm@newman.microstar.com>
Message-ID: \<m0qnZZQ-000C9wC@newman>
References: <35mvgc$l3t@tpd.dsccc.com>
Subject: Re: syntax

To simplify Jeanne Jordan's question about content models:

You want to mix (a*,b*) into a repeatable or group with other stuff such
that all b's always follow all a's.

The example you gave was equivalent to (a|b|other-stuff)*, which does not
fit the bill.

Some content models which do work are:

( (other-stuff)*, (a,(other-stuff)*)* , (b,(other-stuff)*)* )

( (a|other-stuff)* , (b,(b|other-stuff)*)? )

( (a|other-stuff)* , (b,(other-stuff)*)* )

Notice that all of these use sequences to divide the content model into
sections depending on whether or not a or b can occur there.



The example you originally gave, ((a*,b*)|other-stuff)*, caught my eye
because it shows a wierd subtlety of the SGML ambiguity clause.

The intuitive interpretation of the clause is that if every terminal in the
content model were a production, then the resulting grammar would be LL(1).
If it were true, then the example you gave would be ambiguous because given
two consecutive a's or b's, you would not be able to tell which '*' was
repeating.  The example you gave, however, is not ambiguous.

You would think that (something)+ would be equivalent to ( (something),
(something)* ), but if the 'something' is (a?,b?), then the latter is
ambiguous while the former is not.  This one small wierdness makes it much
more difficult to disambiguate content models automatically.

\</Matt>

-- 
Matt Timmermans               | Phone:  +1 613 727-5696
Microstar Software Ltd.       | Fax:    +1 613 727-9491
34 Colonnade Rd. North        | BBS:    +1 613 727-5272
Nepean Ontario CANADA K2E-7J6 | E-mail: mtimmerm@microstar.com
</message>
<message id="<1994Sep22.010230.29789@chemabs.uucp>" date="2989184550">
Newsgroups: comp.text.sgml
Followup-To: comp.text.sgml
Date: 22 Sep 1994 01:02:30 UT
From: "Larry W. Virden" \<lvirden@cas.org>
Organization: Nedriv Software and Shoe Shiners, Uninc.
Message-ID: <1994Sep22.010230.29789@chemabs.uucp>
References: <1994Sep21.183230.8900@chemabs.uucp>
Subject: Re: SGMLS error interpretation 

Turns out that it was complaining about a signature put on the page by an
external piece of software which had a \<html>\<body>some other stuff that
was legit\</body>\</html> - it wanted a head and title tag in there, even if
the title was empty.

-- 
:s Great net resources sought...
:s Larry W. Virden                 INET: lvirden@cas.org
:s \<URL:http://www.mps.ohio-state.edu/cgi-bin/hpp?lvirden_sig.html>
The task of an educator should be to irrigate the desert not clear the forest.
</message>
<message id="<35qub2$pt0@cleese.apana.org.au>" date="2989194018">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 03:40:18 UT
From: Chris Stevenson \<pcoats@seldon.apanix.apana.org.au>
Organization: cleese.apana.org.au Public Access UNIX +61-8-3736006
Message-ID: <35qub2$pt0@cleese.apana.org.au>
Keywords: Music Notation SGML
Summary: Request for info on Musical Notation in SGML
Subject: Musical Notation??

In an article in BYTE some time ago, on SGML, mention was made of a working
group that was trying to come up with a standard for musical notation,
using SGML or some derivative.

As a musician and a computer programmer, my interest should be obvious.
Does anyone know where I can get any further information on this?

Also, is there a SGML FAQ?  Where can I get copies of SGML and HTML specs??

Please mail me direct if you can help, as I'm not a regular reader of this
newsgroup...

Chris.

-- 

Chris Stevenson 
pcoats@apanix.apana.org.au

</message>
<message id="<9409221505.AA16389@cambric.com>" date="2989195546">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 04:05:46 UT
From: Sherman Schorzman \<sschorz@cambric.com>
Organization: Cambric Graphics
Message-ID: <9409221505.AA16389@cambric.com>
Subject: Need Postscript Files and Guidence

I am new to SGML and am in the middle of a document conversion project
using sgmls.  I want to create postscript files to print so I can check my
work against the costumer's original printed documents.  I have several
DTD's I want to use.  I am under the assumption I need a file called a
FOSI???  If someone with patients could lead me in the correct direction, I
would be forever grateful.

Regards,

-- 
Sherman Schorzman                 internet: sschorz@cambric.com
Cambric Graphics, Inc.             attmail: cambric!sschorz
180 South 300 West                   voice: +1 801 363 6305
Salt Lake City, Utah 84101             fax: +1 801 363 6338
</message>
<message id="<1994Sep22.085510.704@falch.no>" date="2989212910">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 08:55:10 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep22.085510.704@falch.no>
References: \<CwHo9p.GL1@gordian.com>
Subject: Re: Frame to SGML -- latest translator?

[Susan Gallagher]

|   I looked at Frame to SGML translators a couple of months ago, and was
|   unable to find anything that was at all simple to use (even to test) --
|   any notes seemed geared solely towards programmers.  Anyone know where
|   I can find the latest filters? Ideally, how to get started with them?

There is, as you probably know, no way to get from Frame to SGML
_automatically_, because "SGML" can be many different things depending on
your target document type.

So any translation will have to be configured _somehow_.  The choice is
between programming, doing it manually, or using an intelligent,
interactive tool.  You aren't interested in the first of these; the second
isn't a good idea unless you are only talking about a small amount of data.
That leaves you with the third choice.  I know of two products which (I
believe) fall into that category (though I haven't used them myself):

  DynaTag - Electronic Book Technologies, +1 401 421 9550
  PowerPaste - Arbortext, +1 313 996 3566

Both use the Rainbow DTD as an intermediate format between MIF and the
target DTD.

Best regards,

Steve
-- 
</(pepper)steve>                                   pepper@falch.no
5------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 2216 3040                                fax +47 2216 2350
                      "Life begins at 0x28"
</message>
<message id="<35rpvn$khg@usenet.srv.cis.pitt.edu>" date="2989222327">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 11:32:07 UT
From: "David J. Birnbaum" \<djbpitt+@pitt.edu>
Organization: University of Pittsburgh
Message-ID: <35rpvn$khg@usenet.srv.cis.pitt.edu>
Subject: SQ RulesBuilder and TEI P3 DTDs

I remember seeing a brief mention of this problem in this forum before, but
I'd be grateful if anyone could provide more complete information.

SoftQuad's RulesBuilder 3.0 for Microsoft Windows objects to the TEI P3
DTDs.  I do not understand the source of the problem or how to correct it.
I have worked around it by downloading a modified TEI2.DTD and SGMLDECL.TEI
from another user, but I would be much more comfortable if I understood
exactly what it was that had to be changed in the original P3 DTDs (and
why).

Thanks,

David
-- 
David J. Birnbaum                    Voice: +1 412 687 4653
3955 Bigelow Blvd, #802              Fax:   +1 412 624 9714
Pittsburgh, PA 15213 USA             Email: djbpitt+@pitt.edu


</message>
<message id="<1994Sep22.142102.7385@calspan.com>" date="2989232462">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 14:21:02 UT
From: Dale Wiles \<wiles@calspan.com>
Organization: Calspan Advanced Technology Center
Message-ID: <1994Sep22.142102.7385@calspan.com>
Keywords: SGML, AUTO-TAGGER
Summary: I'm looking for one.
Subject: SGML auto-tagging

I'm looking for auto-tagging software for SGML on a Sun platform.

This program would go through an SGML file and make links from the table of
contents to the chapters, link sentences like "see figure foo" to figure
foo, and stuff like that there.  It needs to work with an arbitrary DTD,
not just HTML.  An example of what I'm talking about is FastTags.  It can
be free, or comercial.  Anyone heard of such a beast?

					Dale
-- 
Reply to: wiles@calspan.com
Disclaimer: I'm right, you're wrong, so disclaim this!!! \<Rude hand gesture.>
</message>
<message id="<35s7ia$bvb@news.kreonet.re.kr>" date="2989236234">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 15:23:54 UT
From: Chul-Woong Yang \<cwyang@dbserver.kaist.ac.kr>
Organization: Korea Research Environment Open Network (KREONet)
Message-ID: <35s7ia$bvb@news.kreonet.re.kr>
Keywords: SGML, sgml
Subject: SGML Parser

Hi.

I want to know if I can get free SGML Parser (parse and generate any
intermediate form to use) on internet.

I must parse SGML-document and use it.

Thanks in advance.
-- 
 Yang, Chul-woong      | Mail:cwyang@dbserver.kaist.ac.kr       o ,__o
 MS2 / Research Staff  | Tel :+82-42-869-5998      (Room)       \\-\\_<,
 Database Laboratory   |      +82-42-869-3653    (Office)      (*)/'(*)
 Computer Science Dept.| Fax :+82-42-869-3510 "~~'~~"~"~''~""~~'~~"~~"'~"~"~~'
 KAIST                 | HTTP://dbserver.kaist.ac.kr/WWWHOME/dbman.html#cwyang
</message>
<message id="<1994Sep22.152917.21136@ast.saic.com>" date="2989236557">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 15:29:17 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep22.152917.21136@ast.saic.com>
References: \<CONNOLLY.94Sep21120220@ulua.hal.com>
Subject: Re: HTML and SGML

[Bob Agnew]

|   \<opinion>
|   Sorry folks, I can't agree on this one.  The syntax of SGML is rigously
|   supported by ebnf productions just like Pascal and Modula-2, etc.  A
|   Pascal program without the "program" statement is not a Pascal program.
|   A Modula-2 module without the "Module" statement is not a Module.
|   Conversly, an SGML instance without a declaration subset is not an SGML
|   instance.  It's SGML-like but it's not SGML.
|   \</opinion>

[Dan Connolly]

|   I'm having a hard time making sense of this: "an SGML instance without
|   a declaration subset is not an SGML instance."  Since when does the
|   instance part of an SGML document contain a declaration subset?
|   Section 6.2, production [2] says:
|   
|   	[2] SGML document entity = SGML declaration,
|   		s*, prolog, s*, document instance set, Ee
|   

Indeed, and production [7] in section 7.1 says:

[7]	prolog = other prolog*,
	base document type declaration,
	(document type declaration | other prolog)*,
	(link declaration | other prolog)*

precisely the point.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<35s8kb$bu7@ruby.ora.com>" date="2989237323">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 15:42:03 UT
From: Terry Allen \<terry@ora.com>
Organization: O'Reilly & Associates, Inc.
Message-ID: <35s8kb$bu7@ruby.ora.com>
Subject: General Currency Sign

Just a question from curiosity: what uses are there for the general
currency sign defined in ISOnum

\<!ENTITY curren SDATA "[curren]"--=general currency sign-->

for use in countries that have a local character for currency?
Does it actually get used that way?

Regards,
Terry Allen
</message>
<message id="<9409221625.AA17721@mercury>" date="2989239925">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 16:25:25 UT
From: Mary Holstege \<holstege@mercury.kset.com>
Message-ID: <9409221625.AA17721@mercury>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no>
Subject: Re: Parsing EMPTY elements.

[Erik Naggum]

|   (3) is kind of curious in that it assumes that the data the client
|   receives from the SGML document is completely foreign to it.  i.e., it
|   knows nothing about the structure of the document, and is presumably
|   completely unaware of what is being asked.  I think this is a primary
|   example of when you _do_ need the DTD.  but suppose the client was
|   tailored to this document type; wouldn't it _already_ know that an
|   element would be empty in the instance?  the DTD is supposed to be a
|   formalization of the structure of the data the application program
|   expects.
|   
|   so can I assume that this is all an allergic reaction to DTDs?  it
|   doesn't seem to be a real problem, any way I look at it.  would it help
|   if we could find ways to express the information in DTD's in a form
|   better suited to those programs?  if so, I think we could join in the
|   effort to make SGML more computer-science-friendly.  I don't think we
|   should start by undoing SGML, but rather see if there are ways to make
|   it easier to deal with.  if we need to undo things, we should undo
|   things carefully and not without looking for other options first.

No, it's not a matter of pathological aversion to DTDs.

Consider this scenario:

You have an information server on the far side of a WAN.  This information
server gets requests for documents and portions of documents which it feeds
back to client viewer programs.  These client viewer programs toss the
information up on the screen for people to read.  There may be many
*different* DTDs in the information base.  Today we have CALS, tomorrow
OSF, who knows?  I may connect to a different information server tomorrow
with information stored using a different DTD.  The program shouldn't have
to care, so it must be data driven.  Now, you look at the SGML markup and
you say, geez, I can make this client program trees simple because any
clown can parse start tags, end tags, and attribute values, provided the
server feeds back unminimized SGML.  I can avoid having the client include
a full-up SGML parser, and have to separately request DTDs, and somehow
merge them when it is combining information from multiple documents that
have different DTDs, and know about entity management, etc., etc., and just
know how to parse a very simple data format and display it.[*]

The problem is that if you want to use bona fide SGML as that simple data
format (which would be nice for the purposes of allowing the client to
export and save in some standard format) you find that SGML throws up all
sorts of needless complications.  EMPTY elements not allowing end tags is
probably the least of these difficulties, actually.  The handling of
entities and delimiters is much more difficult.

[*] Yes, you do have to impose some sort of standards with respect to
formatting information.  Separate problem.

                -- Mary
                   Holstege@kset.com
-- 
Mary Holstege, Sr. Member of Technical Staff
KnowledgeSet Corporation
555 Ellis Street                    Tel: +1 415 254 5452
Mountain View, CA 94043             FAX: +1 415 254 5451
</message>
<message id="<35sb9d$809@spock.ebt.com>" date="2989240045">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 16:27:25 UT
From: Kent Summers \<kjs@ebt.com>
Organization: Electronic Book Technologies, Inc.
Message-ID: <35sb9d$809@spock.ebt.com>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu> <1994Sep21.154845.2078@sq.sq.com>
Subject: Re: SGML Viewers and Formatters

[Kate Hamilton]

|   SoftQuad Explorer has the features you mention

of course, there are some add'l issues/features you may wish to
consider that go well beyond what DynaText version 1.5 looked like two
years ago, like:

 - binary compatibility across UNIX, Mac, and Windows platforms

 - internationalization (double-byte display core capable of rendering all
   the european, asian, and cyrillic language sets)

 - complete customization/integration via Systems Integrators Toolkit
   (SIT). the name is taken thankyou.

 - a browser that can scale up to handle _very_ large information
   repositories

 - customers to support moving the product forward into the future

 - doing business with a company that both develops *and* markets their
   software products ;-)
</message>
<message id="<D2938E5C@warehouse.mn.org>" date="2989240680">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 16:38:00 UT
From: Mat Kramer \<mat.kramer@warehouse.mn.org>
Organization: The Warehouse BBS (612) 379-8272
Message-ID: \<D2938E5C@warehouse.mn.org>
Subject: IPF

Does anyone know if IPF (Information Presentation Facility - IBM's help
compiler language for OS/2) is SGML-compliant?  I'm writing an OS/2 help
editor, and I need to parse/import IPF source code.  I've obtained the
ARC-SGML parser, and I've been trying to determine if I can use it.  IPF
actually grew out of IBM's BookMaster, for which I found a DTD sample in
the the ARC package (IBM$BAS.DTD).

What I need to know is: how much of the syntax of an SGML application is
configurable.  For example, I'll list the differences I've found between
IPF and what I consider "standard" SGML.  (Please bear with me if I'm
somewhat naive about SGML -- I don't want to get into more than I have to).

1. IPF allows control words which have a . in column 1.  For instance, .*
   designates a line as a comment.

2. IPF allows insertion of symbols using the \&symbol. syntax, instead of
   the \&symbol; syntax used for SGML entities.

3. Element start and stops have a different syntax: :bold. and :ebold.
   instead of \<bold> and <\\bold>

4. Some attributes don't have values -- their presence is a switch.  For
   example, ":ol compact." for a compact, ordered list.

5. Some attributes are conditionally required or allowed, depending on the
   state of another attribute or the current context.

Are these things configurable with a DTD or the system declaration?  Any
advice on modifying the ARC-SGML to accept IPF?  Or does anyone know if
there is an existing DTD for IPF?  Thanks in advance for any help.

-- 
Mat Kramer = MekTek = Mat.Kramer@warehouse.mn.org
</message>
<message id="<1994Sep22.175159.18529@ast.saic.com>" date="2989245119">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 17:51:59 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep22.175159.18529@ast.saic.com>
Subject: Footnotes in Arbortext 38784C

This should clarify my posting on footnote references in FOSI's.  The
Arbortext supplied FOSI for MIL-M-38784C does not handle footnote
references properly.  Footnotes are sequentially numbered as they are
encountered in a section and are printed in the footer in small font with
the sequential reference number.  This number appears in the text as a
superscript at the point where the \<ftnote> tag was introduced.  This is as
should be.  There is also a \<ftnref> tag.  This tag is supposed to be able
to refer to the ID field of a footnote and should cause the sequential
number of the referenced footnote to appear in the text as a superscript at
the point where the \<ftnref> tag was introduced.

The xrefid attribute of the ftnref tag is used as an IDREF to refer to the
ID attribute of a footnote.  Obviously, this takes some feedback from the
formatting / layout algorithm.  Apparently the feedback mechanism works for
xref but not for ftnref.

In Arbortext, using the \<ftnref xrefid="some-footnote-ID"> does nothing.  I
fixed this by adding a "footnote" enum type to the xref tag's xidtype
attribute and modified the FOSI to handle this case.  Thus instead of

\<ftnref xrefid="vi-setup">

I now use:

\<xref xrefid="vi-setup" xidtype="footnote">

with exactly the desired effect.  Even though this appears slightly
inconvenient for hand tagging, it is actually more convenient under the
publisher because only one kind of cross reference element need be used.
With the DTD mods I included, the "footnote" element just magically appears
as a xrefid choice in the xref markup menu.

A DTD, FOSI, and ATD for a document called "paper" appears in

    actd.saic.com:/pub/SGML/paper

There is a small example of the footnote feature in paper_test.sgml.  These
are available via anonymous FTP.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep22.191606.3632@ast.saic.com>" date="2989250166">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 19:16:06 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep22.191606.3632@ast.saic.com>
References: <35mvgc$l3t@tpd.dsccc.com>
Subject: Re: syntax

[Jeanne Jordan]

|   I am new to SGML dtd writing.  We have created the following:
|   
|   (admonishment* , note*) it is intended to indicate that you can have an
|   admonishment (repeatable) without a note (repeatable) or a note without
|   admonishment but if you have both, the admonishment(s) MUST come before
|   the note(s).
|   
|   We want to use this in a repeatable OR list:
|   
|   \<!ELEMENT procedure - - ((admonishment* , note*), step, (step | para |
|     (admonishment* , note*) | graphic | term | display | list | reldoc |
|      table | tp100)*)>

In addition to the problems with admonishments and notes, there is a more
fundamental problem which is relevant to one we've been discussing.  The
dreaded "Ambiguous Model Group" problem.  I just tried to parse the
procedure element above after defining all the tags it uses.  The Arbortext
parser told me that the step element is ambiguous and cannot be resolved
without look-ahead.  This has to be fixed first!

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep22.194222.22460@falch.no>" date="2989251742">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 19:42:22 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep22.194222.22460@falch.no>
References: <1994Sep22.142102.7385@calspan.com>
Keywords: SGML, AUTO-TAGGER
Subject: Re: SGML auto-tagging

[Dale Wiles]

|   I'm looking for auto-tagging software for SGML on a Sun platform.
|
|   This program would go through an SGML file and make links from the
|   table of contents to the chapters, link sentences like "see figure foo"
|   to figure foo, and stuff like that there.  It needs to work with an
|   arbitrary DTD, not just HTML.  An example of what I'm talking about is
|   FastTags.  It can be free, or comercial.  Anyone heard of such a beast?

When we use the term 'auto-tagging software' we usually mean software that
converts _to_ SGML _from_ some other format (e.g., a word processing
format).  FastTAG is indeed an example of such a product.

What you need is something that converts _from_ SGML to a non-SGML format,
or to another (different and/or enhanced) form of SGML.  That's an SGML
transformation tool.  Examples are

- Balise
- SGML Hammer
- Omnimark
- CoST

The first three of these are commercial; the last one public domain.  All
four support arbitrary DTDs.  See the Whirlwind Guide to SGML Tools

      (ftp://ifi.uio.no/pub/SGML/SGML-Tools)

for more info.

Regards,

Steve
-- 
</(pepper)steve>                                   pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 2216 3040                                fax +47 2216 2350
                      "Life begins at 0x28"
</message>
<message id="<1994Sep22.194412.22620@falch.no>" date="2989251852">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 19:44:12 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep22.194412.22620@falch.no>
References: <35s7ia$bvb@news.kreonet.re.kr>
Keywords: SGML, sgml
Subject: Re: SGML Parser

[Chul-Woong Yang]

|   I want to know if I can get free SGML Parser (parse and generate any
|   intermediate form to use) on internet.

Look for James Clark's "sgmls" at ftp://ifi.uio.no/pub/SGML.

Regards,

Steve
-- 
</(pepper)steve>                                   pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 2216 3040                                fax +47 2216 2350
                      "Life begins at 0x28"
</message>
<message id="<rieger.13.0013CCCB@colin.muc.de>" date="2989252072">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 19:47:52 UT
From: Wolfgang Rieger \<rieger@colin.muc.de>
Organization: BSE Buero fuer Software-Entwicklung
Message-ID: \<rieger.13.0013CCCB@colin.muc.de>
References: <35qub2$pt0@cleese.apana.org.au>
Keywords: Music Notation SGML
Summary: Request for info on Musical Notation in SGML
Subject: Re: Musical Notation??

[Chris Stevenson]

|   In an article in BYTE some time ago, on SGML, mention was made of a
|   working group that was trying to come up with a standard for musical
|   notation, using SGML or some derivative.
|
|   As a musician and a computer programmer, my interest should be obvious.
|   Does anyone know where I can get any further information on this?

Dear Chris:

You are supposedly referring to SMDL, the Standard Music Description
Language.  SMDL is a HyTime application with specialized architectural
forms for the representation of musical constructs (in fact, HyTime evolved
kind of out of the attempt to apply SGML notation to music).

The official name is:

ISO/IEC CD 10743 Information Technology - Standard Music Description Language

A short intro is found in:

Joan M. Smith, SGML and related standards, p. 79-81

Regards

Wolfgang
-- 
Wolfgang Rieger
c/o Buero fuer Software-Entwicklung
Frankfurter Ring 193a
80807 Munich
Germany

Tel.: +89 323 19 93	Fax: +89 323 19 93
</message>
<message id="<rieger.14.0014229A@colin.muc.de>" date="2989253278">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 20:07:58 UT
From: Wolfgang Rieger \<rieger@colin.muc.de>
Organization: BSE Buero fuer Software-Entwicklung
Message-ID: \<rieger.14.0014229A@colin.muc.de>
Keywords: HTML WWW
Summary: DTDs acceptable to WWW clients
Subject: Q: WWW Client DTDs

Concerning the HTML-DTD discussion.  There is a basic HTML DTD and HTML+
and HTML2.0 proposals and some more in different stages of discussion/
standardization flying around on the net.

The problem with those (as I see it) is: nobody guarantees that a WWW
client will accept a document conforming to one of those DTDs (an example
being the DTD distributed with HoTMetal).

To get it the other way round, there is one thing I'd like to now.  Are
there somewhere DTDs defining the tag set and structure certain clients do
accept.  For instance, is there a DTD defining what Mosaic 2.0 a 5 for
windows does accept?  Or for Cello, or for ...?

Such DTDs being available, one could at least check a supposed HTML
document not only against my-very-own-HTML.dtd release 0.5, but against
those for the most widely used WWW clients.

Wolfgang
-- 
Wolfgang Rieger
c/o Buero fuer Software-Entwicklung
Frankfurter Ring 193a
80807 Munich
Germany

Tel.: +89 323 19 93	Fax: +89 323 19 93
</message>
<message id="<19940922T202631Z.erik@naggum.no>" date="2989254391">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 20:26:31 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940922T202631Z.erik@naggum.no>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no> <1994Sep21.203139.19909@sq.sq.com>
Subject: Re: Parsing EMPTY elements

[Erik Naggum]

|   let's back up a bit.  Lee Quin said that if it wasn't for EMPTY "you
|   could take a fully expanded instance and make an in-memory data
|   structure without needing the DTD".  I replied that EMPTY did not make
|   this more difficult, and showed that one could learn from an instance
|   whether a given element was empty.  that is, the argument did not
|   follow from its premises.

[Liam R. E. Quin]

|   In the presence of OMITTAG, you cannot distinguish an EMPTY element
|   from one whose endtag has been omitted.

I really thought you said "fully expanded instance" and meant that OMITTAG
was not used.  of course, if you change the premises of the discussion all
the time, there will never _be_ a valid answer, and we can argue forever.

the answer to your question must then be: "so don't _use_ OMITTAG".

|   People who have `bought into' (as they say) the concept of using an
|   SGML DTD have no problem with the idea of using a full SGML parser for
|   everything that processes SGML data, or, conversely, from wiring in
|   (as in Mosaic) the names of the tags and whether they are empty.

I have mixed feelings about this.  one the one hand, "a full SGML parser"
is not required.  you could very well do with a small library-type parser
that can cannot handle minimization, and which doesn't do all the rest of
the validation that a validating SGML parser must do.  on the other hand, I
recognize that this "small library-type parser" is not trivial to write,
and not _that_ small, and obtaining some sort of agreement on its interface
seems to be complicated.  these are the issues I want to address by making
a more computer-science-friendly SGML, which initially should be a pure
subset of SGML though a list of enforced conventions.

|   I'd like to see SGML become pervasive.

that's my goal, too.

|   Imagine if the compilers (C, Pascal, Intercal...) used SGML for the
|   parse tree, and for the tree that was sent to the optimiser.
|   
|   Imagine if "sed" worked with SGML.
|   
|   Imagine if "ls" could produce SGML output.
|   
|   Well, maybe it sounds as if I'm an SGML fanatic, but if you change
|   "SGML" to "explicitly marked up structure"...

in other words, what if programs could read and write SGML?  I think this
would be a nice situation, but we have to change both the programs _and_
SGML to ever hope to get there.

|   But the present proliferation of methods for linking DTDs and instances
|   doesn't help.

I think a strong expression language and a tree transformation process
would be the way to go.  LINK may be able to hold the connection between
the elements and the expressions applying to it, or some other mechanism
could be used.  however, the current state of affairs is to employ code-
driven traversal of the element hierarchy, and that only shows us that
we're still in the early stages of using SGML programmatically.

|   An SGML subset that you can write a reliable parser for in ten minutes
|   (after an hour or two reading the spec) would be a big step forward for
|   SGML ubiquity.  That's part of why I suggested a formally-defined (in
|   the mathematical sense) "SGML/F" subset of SGML.

"ten minutes"?  wow.  that is an incredibly ambitious goal, and it will
have to include changing the way people write programs.  today they write
their programs in clunky old languages like C and C++ ("recent" and "clunky
old" have no relation) and are willing to suffer through enormous amounts
of pain to get anything done.  ten minutes in this environments means you
have just found out the prototypes of a few functions you need to call.
instead, I think this requires an interpretative development environment
that understands SGML as a native data representation.  lacking any serious
alternatives, this indicates that LISP should be the ideal development
environment, and that SGML documents would just be represented externally
in a slightly different form than ordinary LISP structures and lists.


to be concrete in my suggestions: what if LISP's `read' function, or an
equivalent, could return something you could actually work on with LISP's
enormous set of functions to work on lists?  would you use LISP if it came
with such a function and some reasonable set of functions to work on the
SGML-specific parts of the structures?

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940922T202939Z.erik@naggum.no>" date="2989254579">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 20:29:39 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940922T202939Z.erik@naggum.no>
References: <1994Sep21.203800.20126@sq.sq.com>
Subject: SGML and error recovery

[Liam R. E. Quin]

|   But providing good error recovery for OMITTAG is much harder.

good error recovery in SGML seems to be inordinately difficult.  I have not
looked closely at this topic.  does anybody out there have anything to
offer that wouldn't compromise company secrecy policies?  research reports?
I know Exoterica has done good work here, but is any of this available in
their technical report series?

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<35sqk3$ghb@bmerha64.bnr.ca>" date="2989255747">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 20:49:07 UT
From: Sean Harris \<catulus@bmerhc58.bnr.ca>
Organization: Bell-Northern Research, Ottawa, Canada
Message-ID: <35sqk3$ghb@bmerha64.bnr.ca>
References: <1994Sep21.190227.10470@ast.saic.com> <1994Sep22.191606.3632@ast.saic.com>
Subject: Re: syntax

[Bob Agnew]

|   In addition to the problems with admonishments and notes, there is a
|   more fundamental problem which is relevant to one we've been
|   discussing.  The dreaded "Ambiguous Model Group" problem.  I just tried
|   to parse the procedure element above after defining all the tags it
|   uses.  The Arbortext parser told me that the step element is ambiguous
|   and cannot be resolved without look-ahead.  This has to be fixed first!

Try this out for size.

\<!ENTITY % grp "(step | para | graphic | term | display | list | reldoc | 
                  table | tp100)+, admonishment* , note*">

\<!ELEMENT procedure  - - (admonishment*, note*, step, ((%grp)+ | (note+, 
                         (%grp)*) | (admonishment+, note*, (%grp)*))?)>

Sean.
</message>
<message id="<35ss5o$fe7@net.fonorola.net>" date="2989257335">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 21:15:35 UT
From: Carl Stieren \<stieren@dev.simware.com>
Organization: Simware, Inc.
Message-ID: <35ss5o$fe7@net.fonorola.net>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: Re: SGML Viewers and Formatters

[Mr. Raytrace]

|   I've been digging through mounds and mounds of documentation and
|   product literature on SGML tools.  I've not been able to find anything
|   that comes close to what I am looking for.
|   
|   I am looking for an SGML viewer that supports hypertext links and that
|   takes care of formatting the documents.  We are hoping to publish SGML
|   documents on a CD-ROM, using a Windows based viewer.
|   
|   So far, all the tools that I have seen have either lacking features,
|   horrid user interfaces, or cannot be re-distributed and packaged with
|   our product.  The one package that comes close to what I am looking for
|   is IADS, but it suffers from the latter two deficiencies.
|   
|   Can anyone point me to some companies/groups that might fit the bill?

We are also looking for an SGML viewer that supports hypertext links and
that takes care of formatting the documents, and is Windows-based, since we
too are looking at publising SGML documents on CD-ROM.

The DynaText product from Electronic Book Technologies in Providence, R.I.,
is Windows-based, but I don't know if it supports hypertext links - we
would like context-sensitive links to our User Interface, as in Windows
Help.

In fact, we would also be interested in any product that takes an
SGML-tagged source and produces Windows Help (.hlp) files.  Have you ever
heard of one?

-- 
- Carl Stieren
  Technical Writer
  Simware, Inc., Ottawa, Canada
</message>
<message id="<truly.492.000E5AEF@lunemere.com>" date="2989257674">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 21:21:14 UT
From: Truly Donovan \<truly@lunemere.com>
Organization: La Lunemere
Message-ID: \<truly.492.000E5AEF@lunemere.com>
References: \<D2938E5C@warehouse.mn.org>
Subject: Re: IPF

[Mat Kramer]

|   Does anyone know if IPF (Information Presentation Facility - IBM's help
|   compiler language for OS/2) is SGML-compliant?  I'm writing an OS/2
|   help editor, and I need to parse/import IPF source code.  I've obtained
|   the ARC-SGML parser, and I've been trying to determine if I can use it.
|   IPF actually grew out of IBM's BookMaster, for which I found a DTD
|   sample in the the ARC package (IBM$BAS.DTD).

IPF didn't "grow out" of BookMaster -- it began as a minor subset and then
went a separate direction altogether.  I don't know the provenance of your
BookMaster DTD, but it could only be valid for creating documents that
could then be translated to BookMaster source, not for parsing any
BookMaster source document.

Truly Donovan
</message>
<message id="<19940922.5386@naggum.no>" date="2989261441">
Newsgroups: comp.text.sgml
Date: 22 Sep 1994 22:24:01 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940922.5386@naggum.no>
References: <35mvgc$l3t@tpd.dsccc.com> <1994Sep22.191606.3632@ast.saic.com>
Subject: Re: syntax

[Jeanne Jordan]

|   I am new to SGML dtd writing.  We have created the following:
|   
|   (admonishment* , note*) it is intended to indicate that you can have an
|   admonishment (repeatable) without a note (repeatable) or a note without
|   admonishment but if you have both, the admonishment(s) MUST come before
|   the note(s).
|   
|   We want to use this in a repeatable OR list:
|   
|   \<!ELEMENT procedure - - ((admonishment* , note*), step, (step | para |
|     (admonishment* , note*) | graphic | term | display | list | reldoc |
|      table | tp100)*)>

[Bob Agnew]

|   In addition to the problems with admonishments and notes, there is a
|   more fundamental problem which is relevant to one we've been
|   discussing.  The dreaded "Ambiguous Model Group" problem.  I just tried
|   to parse the procedure element above after defining all the tags it
|   uses.  The Arbortext parser told me that the step element is ambiguous
|   and cannot be resolved without look-ahead.  This has to be fixed first!

yes, the Arbortext parser has to be fixed first.  quite observant.

(step, (step | other)*) is _not_ ambiguous.  the first "step" is required,
and there's no way that first step can be ambiguous.  once that "step" has
been seen, any other steps will have to be in the repeated |or| group.

what is the probability that detecting ambiguity is harder and more error-
prone than constructing a DFA?  I have seen parsers fail in amazing ways
when they try to detect ambiguous content models, and I thought: hey,
that's stupid.  but, really I should have concluded that the specification
for ambiguity is insufficiently clear, and that it would be much better if
SGML could specify something that programmers could actually manage to
implement right.

now, you don't _have_ to detect ambiguous content models, so don't request
it.  request instead that the parser do the only sensible thing with your
content models.

sorry, Marcy, there goes your course material.

#\<Erik>
-- 
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<35tb1l$jp8@crl.crl.com>" date="2989272565">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 01:29:25 UT
From: Joe English \<jenglish@crl.com>
Organization: Helpless people on subway trains
Message-ID: <35tb1l$jp8@crl.crl.com>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no> <9409221625.AA17721@mercury>
Subject: Re: Parsing EMPTY elements.

[Mary Holstege]

|   Consider this scenario:
|   
|   You have an information server on the far side of a WAN.  This
|   information server gets requests for documents and portions of
|   documents which it feeds back to client viewer programs.  These client
|   viewer programs toss the information up on the screen for people to
|   read.  There may be many *different* DTDs in the information
|   base.
:
|   The program shouldn't have to care, so it must be data driven.  Now,
|   you look at the SGML markup and you say, geez, I can make this client
|   program trees simple because any clown can parse start tags, end tags,
|   and attribute values, provided the server feeds back unminimized SGML.
|   I can avoid having the client include a full-up SGML parser, and have
|   to separately request DTDs, and somehow merge them when it is combining
|   information from multiple documents that have different DTDs, and know
|   about entity management, etc., etc., and just know how to parse a very
|   simple data format and display it.[*]
:
|   [*] Yes, you do have to impose some sort of standards with respect to
|   formatting information.  Separate problem.

Is this really a separate problem?

If the client is completely data-driven and has no hardwired knowledge of
any DTDs, it must at least have access to some kind of mapping from
elements to formatting directives for any document type it receives from
the server.

Given that it has to get this information somehow, can't it use the same
mechanism to retrieve DTDs as well?

If you don't want to include a full SGML parser on the client side, the
server could also convert from the source DTD to a simpler document type
(maybe something like HTML?) that the client does have hardwired.


--Joe English

  jenglish@crl.com
</message>
<message id="<1994Sep23.034204.27496@sqwest.wimsey.bc.ca>" date="2989280524">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 03:42:04 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep23.034204.27496@sqwest.wimsey.bc.ca>
References: <35mvgc$l3t@tpd.dsccc.com> <1994Sep22.191606.3632@ast.saic.com> <19940922.5386@naggum.no>
Subject: Re: syntax

[Erik Naggum]

|   what is the probability that detecting ambiguity is harder and more
|   error- prone than constructing a DFA?  I have seen parsers fail in
|   amazing ways when they try to detect ambiguous content models, and I
|   thought: hey, that's stupid.  but, really I should have concluded that
|   the specification for ambiguity is insufficiently clear, and that it
|   would be much better if SGML could specify something that programmers
|   could actually manage to implement right.
|   
|   now, you don't _have_ to detect ambiguous content models, so don't
|   request it.  request instead that the parser do the only sensible thing
|   with your content models.
|   
|   sorry, Marcy, there goes your course material.

Well, you know, Erik, I would be such a happy camper if ambiguity went away
(or became something sensible) that I think I can live with it.

I've been playing around this week with content model adjustment exercises.
I think they might work just as well from the pedagogical point of view,
which is a good thing.  So now I have no objections, not even trivial ones,
forto excising ambiguity from the standard.  (And there are all kinds of
reasons why I'd be in favour of such a move...)  So tell me again just what
I have to do to get it Out Of There???

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<p.kerr-2309941557300001@130.216.90.127>" date="2989281450">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 03:57:30 UT
From: Peter Kerr \<p.kerr@auckland.ac.nz>
Organization: School of Music University of Auckland
Message-ID: \<p.kerr-2309941557300001@130.216.90.127>
References: <35qub2$pt0@cleese.apana.org.au>
Subject: Re: Musical Notation??

[Chris Stevenson]

|   Please mail me direct if you can help, as I'm not a regular reader of
|   this newsgroup...

Please post here for all of us who do read this group.

-- 
Peter Kerr                             bodger
School of Music                        chandler
University of Auckland                 neo-Luddite
</message>
<message id="<35uq74$83l@korfu.igd.fhg.de>" date="2989320868">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 14:54:28 UT
From: Hans Holger Rath \<rath@igd.fhg.de>
Organization: Zentrum fuer Graphische Datenverarbeitung e.V. (ZGDV), D-64283 Darmstadt
Message-ID: <35uq74$83l@korfu.igd.fhg.de>
Subject: ANNOUNCEMENT: WWW-Version of "The Whirlwind Giude to SGML Tools"

Hi everyone!

You can find a WWW-Version of Steve's

	"The Whirlwind Guide to SGML Tools"

at the URL

	http://zgdv.igd.fhg.de/papers/ed

Enjoy it!

\</HHR>
-- 
---------------------------------------------------------------------
| Hans Holger Rath - Computer Graphics Center - Darmstadt - Germany |
---------------------------------------------------------------------
|          ZGDV e.V.          | EMail: rath@igd.fhg.de              |
|      Wilhelminenstr. 7      | URL  : http://zgdv.igd.fhg.de/~rath |
|      D-64283 Darmstadt      | Tel. : +49 6151/155-152             |
|           Germany           | Fax  : +49 6151/155-199             |
---------------------------------------------------------------------
</message>
<message id="<1994Sep23.153613.9662@ast.saic.com>" date="2989323373">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 15:36:13 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep23.153613.9662@ast.saic.com>
References: <1994Sep22.142102.7385@calspan.com>
Subject: Re: SGML auto-tagging

[Dale Wiles]

|   I'm looking for auto-tagging software for SGML on a Sun platform.
|   
|   This program would go through an SGML file and make links from the
|   table of contents to the chapters, link sentences like "see figure foo"
|   to figure foo, and stuff like that there.  It needs to work with an
|   arbitrary DTD, not just HTML.  An example of what I'm talking about is
|   FastTags.  It can be free, or comercial.  Anyone heard of such a beast?

Look into PowerPaste by Arbortext (sales@arbortext.com)

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep23.162458.17392@ast.saic.com>" date="2989326298">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 16:24:58 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep23.162458.17392@ast.saic.com>
References: <9409221505.AA16389@cambric.com>
Subject: Re: Need Postscript Files and Guidence

[Sherman Schorzman]

|   I am new to SGML and am in the middle of a document conversion project
|   using sgmls.  I want to create postscript files to print so I can check
|   my work against the costumer's original printed documents.  I have
|   several DTD's I want to use.  I am under the assumption I need a file
|   called a FOSI???  If someone with patients could lead me in the correct
|   direction, I would be forever grateful.

What you need is to generate postscript descriptions of printed pages.
There are several ways to do this and the FOSI route is one of them.  A
FOSI though, by itself, will not get you to postscript.  It will then take
a FOSI interpreter either from DataLogics or Arbortext( the only two as of
last month).  The Arbortext FOSI engine, will (deep within its bowels and
unknown to the user) will convert this to TeX which will eventually convert
DVI from TeX to postscript giving you want you want.  Since you are running
free SGMLS, I must warn you those routes will cost you about $10K and for
Arbortext, at least for now, you'll need a Unix platform like a Sun.

I have posted tutorials on FOSIs here several times.  I will try to get
these into the FAQ.  I will mail the FOSI brief to you separately.  Here's
the 25 cent version.

Theoretically, DTDs reflect only the content or logical structure of a
document.  They do not or should not imply style or appearance of printed
or displayed copy.  This is usually done with an additional document.  In
some word and document processors these are called "style sheets".  In the
MIL-STD-28001B CALS world it is called a FOSI which stands for Formatted
Output Specification Instance.  This is an SGML - tagged document which is
a page layout specification.  The DTD for this document is called the
Outspec.dtd and appears in appendix B of 28001B.  The tags in the DTD for
the document being printed are tied to the E-I-C or element-in-context tags
of the FOSI which have attributes strongly couched in typesetter's
vernacular.  Alternatives include the emerging DSSSL which is still in
committee, I think.  I don't know if it's SGML or not.  Anyone?

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<CwLGKv.H6p@news.cis.umn.edu>" date="2989331665">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 17:54:25 UT
From: R A Milowski \<milor001@maroon.tc.umn.edu>
Organization: University of Minnesota
Message-ID: \<CwLGKv.H6p@news.cis.umn.edu>
References: \<m0qmjuU-000C9wC@newman>
Subject: Re: SGML Renewal: DTDs

[Matt Timmermans]

|   Since we're on the subject of SGML renewal, I'd like to bring up the
|   matter of DTDs.
|   
|   Specifically, the DTD is not a metalanguage.  It was not designed as a
|   metalanguage (third-hand quote from Charles), and it doesn't function
|   very well as a metalanguage.

I whole-heartedly agree.

|   The reason it's not a metalanguage is that it doesn't exist on its own.
|   A DTD is not an SGML document, but only part of an SGML document.
|   
|   The content model in a DTD represents a class of element structures.
|   Since there is exactly one DTD for every SGML document (there is no way
|   to declare that a document conforms to some external DTD), and exactly
|   one actual element structure for every SGML document, the content model
|   information in the DTD is redundant.  In effect, the content model
|   gives several examples of what the element structure _could_ be, and is
|   immediately followed by the actual element structure.  Clearly, the
|   content model in the DTD is not required.

I would say that the DTD *is* required.  The DTD defines a "set" of
relationship of elements to which we compare the document instance to.
Thus, a meta-language should operate upon the set defined by the DTD and
the set defined by the document instance.  The DTD is just as much a part
of your specification in SGML as the document instance set.  Thus, a
meta-language must be able to manipulate the relationships you define as
possible as well as the relationships you define in your document instance
set.

For example, we have the following parallels to first-order logic:

Suppose we have:

   F(x): X is blue.

F(x) is the definition of a sentence which is similar to a DTD.

F(Alex Milowski): Alex Milowski is blue.

This is an instance of the sentence.  It is not a factual statement (unless
its Monday).  Thus, it is not conforming.  You can think of F(x) as
defining a set of "things that are blue".  Thus, I would have to belong to
that set for the sentence to be true.

Now, in first-order logic, a meta-language is used to describe things like
argument structure, operation on arguments, relationship of sentences to
each other, etc.  In a sense, the meta-language of first-order logic deals
with set manipulations (not sets of sets--that's second order) of the
first-order.

Thus, in accordance, a meta-language must be able to manipulate and state
useful things about DTDs, document instance sets, and how they relate.  So,
If you get rid of the DTD, you lose the ability to know the "sentence"
structure of your "argument"!

|   The DTD is, in fact, only a typing aid.  It's only purposes in SGML are
|   to provide an easy (?!) way for authors to avoid certain structuring
|   mistakes, to provide the information necessary to support tag
|   ommission, to support short references, to provide default and fixed
|   attribute values, to provide a way to specify an attribute value
|   without naming the attribute, and to provide a convenient place to
|   declare entities and notations.

DTDs are currently being used as a typing aid, but they need not.  They
define a set of relationships and so can be used as such.

|   It must be recognized, however, that people are using the DTD as a
|   metalanguage because they _need_ a metalanguage.  What I'm proposing is
|   that the content model part of the DTD should be removed from the SGML
|   document (and the SGML standard), and replaced with an external
|   meta-language (in a different standard).  There would still be a place
|   in an SGML document to declare entities, notations, and various
|   typing-aid features like default attribute values.

Yes, we need a meta-language to manipulate both DTDs and everything else.
We do not need a meta-language to replace DTDs (since they are not).
Removing the DTD causes the structure to be lost.  Since you have no formal
syntactic way of representing the element relationships you have nothing to
apply a meta-language to!!  DTDs are instantiations of the standard and,
thus, cannot be used as a meta-language.

|   Now, a cool thing happens when an SGML document is not required to
|   include its content model -- The content model can be represented as an
|   SGML document! (If an SGML document was required to include an SGML
|   document, there would be recursion ad infinitum.)  This would allow you
|   to manipulate metadocuments (written in the external metalanguage) with
|   the same tools you use to manipulate documents.

You could do this with a meta-language now.  I have some set theory based
theorems which I will be having a poster session on at SGML '94 which will
cover DTD operations like comparing DTDs to see if one is a proper subset
of the other, etc.

Now, if we could only get that meta-language standardized!  ;)

-- 
R. Alexander Milowski
SGML Operations Manager        milor001@maroon.tc.umn.edu
Microcom Inc.                  +1 612 825 4132
SGML Consulting -- "The SGML Solutions Experts"
</message>
<message id="<1994Sep23.175712.6879@ast.saic.com>" date="2989331832">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 17:57:12 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep23.175712.6879@ast.saic.com>
References: <35sqk3$ghb@bmerha64.bnr.ca>
Subject: Re: syntax

[Sean Harris]

|   Try this out for size.
|   
|   \<!ENTITY % grp "((step | para | graphic | term | display | list | reldoc | 
|                     table | tp100))+, admonishment* , note*)+">
|   
|   \<!ELEMENT p - - (admonishment*, note*, step, (%grp | (note+, %grp) | 
|                   (admonishment+, note*, %grp))?)>

Yes, but now we required the author to insert additional \<grp> tags.  I
guess ambiguities can always be solved by adding tags just like a grammar
may disambiguated by adding more non-terminals.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep23.181738.11354@ast.saic.com>" date="2989333058">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 18:17:38 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep23.181738.11354@ast.saic.com>
References: <19940922.5386@naggum.no>
Subject: Re: syntax

[Erik Naggum]

|   yes, the Arbortext parser has to be fixed first.  quite observant.

Nope, nothing wrong with the Arbortext parser, it was my error.  The single
term element:

\<!ELEMENT procedure - - (admonishment*, step, (step | para |
   admonishment* | graphic | term | display | list | reldoc |
   table | tp100))>

parses correctly.  It's the full statement with all four cases that is
ambiguous:

\<!ELEMENT procedure - - ((admonishment*, step, (step | para |
   admonishment* | graphic | term | display | list | reldoc |
   table | tp100))*, ((admonishment* , note+, step, (step | para |
   note* | graphic | term | display | list | reldoc |
   table | tp100)) | (admonishment*, step, (step | para |
  (admonishment* , note+) | graphic | term | display | list | reldoc |
   table | tp100)))?, (note*, step, (step | para | note* | graphic | term |
   display | list | reldoc | table | tp100))*)>

Sorry.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<35v6iq$qhd@finnegan.iol.ie>" date="2989333530">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 18:25:30 UT
From: Sean Mc Grath \<digitome@iol.ie>
Organization: Digitome Ltd.
Message-ID: <35v6iq$qhd@finnegan.iol.ie>
Subject: Re: Need Postscript Files and Guidence

[Sherman Schorzman]

|   I am new to SGML and am in the middle of a document conversion project
|   using SGMLS.  I want to create postscript files to print so I can check
|   my work against the costumer's original printed documents.

How about converting the SGMLS output to Microsoft RTF (relatively easy to
do in a 3GL or using Perl/Awk etc).  Then import the RTF into Ms Word or
some other WP and print to postscript?

Regards,

-- 
Sean Mc Grath   digitome@iol.ie
Digitome Ltd.
Electronic Publishing
Irish Permenant House,Pearse St.,Ballina,Co. Mayo,Ireland.
Tel : 353 96 72092
</message>
<message id="<35v7c2$f59@crl.crl.com>" date="2989334338">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 18:38:58 UT
From: Joe English \<jenglish@crl.com>
Organization: Helpless people on subway trains
Message-ID: <35v7c2$f59@crl.crl.com>
References: <19940910.4934@naggum.no> <1994Sep16.173802.20269@midway.uchicago.edu> <35klfi$5nv@crl.crl.com> <1994Sep21.203800.20126@sq.sq.com>
Subject: Re: SGML and its enemies

[Joe English]

|   (The basic difference [between SGML and LR parsers] is that you must
|   examine the entire grammar to build an LR or LL parser, whereas each
|   element content model can be parsed with a DFA constructed
|   independently of any other content models.)

[Liam R. E. Quin]

|   Is this true?  I've never written an SGML parser... but...  what about
|   inclusions and exclusions on containing elements?

The SGML parser only processes elements as inclusions if the start-tag
can't be matched at the current point in the current content model; it can
do this after matching start-tags against the declared content model (i.e.,
it doesn't need to build a new model with all the applicable inclusions
inserted at appropriate points).

It can check exclusion exceptions *before* matching against the declared
content model, and trying to infer as many end-tags as are necessary when
it sees an excluded element.  (I think this is what sgmls does, in fact,
though it's hard to tell for sure.)

--Joe English

  jenglish@crl.com
</message>
<message id="<35vbb3$m02@bmerha64.bnr.ca>" date="2989338403">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 19:46:43 UT
From: Sean Harris \<catulus@bmerhc58.bnr.ca>
Organization: Bell-Northern Research, Ottawa, Canada
Message-ID: <35vbb3$m02@bmerha64.bnr.ca>
References: <35sqk3$ghb@bmerha64.bnr.ca> <1994Sep23.175712.6879@ast.saic.com>
Subject: Re: syntax

[Sean Harris]

|   Try this out for size.
|   
|   \<!ENTITY % grp "((step | para | graphic | term | display | list | reldoc | 
|                     table | tp100))+, admonishment* , note*)+">
|   
|   \<!ELEMENT p - - (admonishment*, note*, step, (%grp | (note+, %grp) | 
|                   (admonishment+, note*, %grp))?)>
|    

[Bob Agnew]

|   Yes, but now we required the author to insert additional \<grp> tags.  I
|   guess ambiguities can always be solved by adding tags just like a
|   grammar may disambiguated by adding more non-terminals.

Ok, try this.

\<!ELEMENT procedure  - - (admonishment*, note*, step, (((step | para | graphic 
                         | term | display | list | reldoc | table | tp100)+, 
                         admonishment* , note*)+ | (note+, ((step | para | 
                         graphic | term | display | list | reldoc | table | 
                         tp100)+, admonishment* , note*)*) | (admonishment+, 
                         note*, ((step | para | graphic | term | display | 
                         list | reldoc | table | tp100)+, admonishment* , 
                         note*)*))?)>

Sean.
</message>
<message id="<1994Sep23.203817.7641@kocrsv01.delcoelect.com>" date="2989341497">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 20:38:17 UT
From: Ravi Lakkaraju \<rklakkar@kocrsv01.delcoelect.com>
Organization: Delco Electronics Corp.
Message-ID: <1994Sep23.203817.7641@kocrsv01.delcoelect.com>
Subject: URLs, http and HTML

Dear Netters,

From my limited understanding of the way WWW works am I correct in saying
that:

1.  Mosaic can read documents conforming only to the HTML DTD.
2.  http is the protocol used for accessing packets of information across
    the internet the addresses for which are specified in the form of URLs.
3.  The information transferred between the WEB servers is unrelated to its
    URL.  i.e., the http has nothing to do with the content of the packet
    of information accessed other than specifying where it is located.
4.  Once that information is located and accessed the web server will know
    what to do with it because it assumes it will be in \<html>. By knowing
    what to do would also mean that Mosaic knows how to display it because
    it is familiar with the \<tags>.

Based on these points it seems like if it were possible to exchange
documents/packets of information using the http protocol no matter what DTD
the documents conformed to then all I need is a viewer that can understand
the particular DTD used and know how to format the tags properly (in the
originally intended style).

If I created documents using any arbitrary DTD, what is it in the web
server that restricts the transfer of that file with http?.  (Not taking
into account that the information once transferred is not viewable in
Mosaic).

I would like to know what you think about these issues.  My ultimate
objective being that one should not be concerned with what DTD is being
used for the document as long as the viewer can handle it and knows what to
do with the information...  I would like to here some general discussions
on these issues.

- Ravi
</message>
<message id="<199409232102.QAA02552@uahcs2.cs.uah.edu>" date="2989342979">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 21:02:59 UT
From: Lori Snyder \<lsnyder@cs.uah.edu>
Message-ID: <199409232102.QAA02552@uahcs2.cs.uah.edu>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: SGML Viewers and Formatters

I am afraid I am a bit confused over conceptions of what IADS will/will not
do.  IADS is being packaged on CD with data ranging from technical manual
data to technical data packages.  We prefer to put IADS on the hard drive
and simply access the information on the CD, but we recommend sending IADS
out on the CD so the users have access to the latest version and have a
pristine version in case something happens to the one on the hard drive.

IADS user interface is author-configurable.  Authors can change any/all of
the buttons, hide the menu bar, etc., etc.  All of this can be done in the
SGML-tagged ASCII files and need not be done from the IADS author unless
you want to.

Finally, IADS has no distribution fees attached, therefore you need not
worry about site licenses, royalties, etc.

Just as a point of imformation, a new version of IADS is planned for
December of this year.  We'll send out notices on what the new release
contains and how to obtain it.

Sincerely,

Sue Pape

-- 
IADS Programming Group
U.S Army Missile Command
+1 205 876 4024
spape@redstone-emh1.army.mil
lsnyder@redstone-emh2.army.mil
</message>
<message id="<PCPine_p.3.89.9409231413.D5079-0100000@tag2.cup.cam.ac.uk>" date="2989344696">
Newsgroups: comp.text.sgml
Date: 23 Sep 1994 21:31:36 UT
From: Rod Mulvey \<rmulvey@cup.cam.ac.uk>
Message-ID: \<PCPine_p.3.89.9409231413.D5079-0100000@tag2.cup.cam.ac.uk>
Subject: Cambridge University Press position

The Printing Division at Cambridge University Press has a new post for a
Programmer in Text Processing to specialise in the application of SGML.
The job will involve development, implementation and maintenance of
SGML-based systems for text conversion and typesetting and technical
support of staff who use them.  If you welcome the opportunity to join a
busy production environment please contact me for further details, sending
your curriculum vitae.

The salary will be negotiable in the range 12000 to 16000 pounds, depending
on the experience of the candidate.  A background in programming is
expected and experience with text, for example through typesetting with TeX
or working with SGML, would be an advantage.

Enquiries are invited as soon as possible and not later than 28 September
1994.

Rod Mulvey.

-- 
Rod Mulvey, Cambridge University Press
Technical Aplications Group, Printing Division
Shaftesbury Road, Cambridge CB2 2BS' England

INTERNET: rmulvey@cup.cam.ac.uk   JANET: rmulvey@uk.ac.cam.cup
phone: (44) 0223 325070
</message>
<message id="<CONNOLLY.94Sep23180619@ulua.hal.com>" date="2989350379">
Newsgroups: comp.text.sgml,comp.infosystems.www.providers
Date: 23 Sep 1994 23:06:19 UT
From: Dan Connolly \<connolly@ulua.hal.com>
Organization: HaL Software Systems, Inc.
Message-ID: \<CONNOLLY.94Sep23180619@ulua.hal.com>
Subject: Reorganized HTML 2.0 DTD
Content-Type: multipart/mixed; boundary="cut-here"


See RFC1521 for info on MIME, the format of this message

--cut-here

The HTML 2.0 specification review is heating up again in light of the
upcoming WWW conference in Chicago.

The review process is conducted by the (open) HTML Working group of the
Internet Engineering Task Force. The mailing list is \<html-wg@oclc.org>.

I have revised the public text parts of the HTML 2.0 specification.

The details are enclosed. This is a copy of the document at

	http://www.hal.com/%7Econnolly/html-spec/html-pubtext.html

The latest DTD is also enclosed.

--cut-here
Content-Type: text/html
URI: http://www.hal.com/%7Econnolly/html-spec/html-pubtext.html

\<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN//2.0">
\<head>
\<title>HTML 2.0 Public Text\</title>
\<base href="http://www.hal.com/%7Econnolly/html-spec/html-pubtext.html">
\</head>

\<body>

\<h1>Public Text of the HTML 2.0 Specification\</h1>

\<address>
Daniel W. Connolly\<br>
connolly@hal.com\<br>
$Id: html-pubtext.html,v 1.1 1994/09/23 22:49:26 connolly Exp $
\</address>

\<p> The HTML 2.0 specification includes both machine-readable public
    text -- SGML "code" if you will -- and human-readable text.
    The public text includes the DTD, an SGML declaration, and a version
    of the ISO Added Latin 1 entity set.

\<h2> Changes in this Revision \</h2>

\<ul>

\<li> The DTD is no longer spread across three files. The Level 0 and Level 1
     files contain just feature test entities now. The \<tt>html.dtd\</tt>
     contains all the elements.

\<li> The META tag was promoted from Proposed to standard. The Proposed
     feature test entity has been eliminated. (The nbsp and shy entities
     went away with it: they should come back in 2.1)

\<li> The HTML.Prescriptive feature test entity is now called HTML.Recommended.
     It switches the content of BODY and A between the "free form" style
     generally found on the net and the more structured, recommended form.

\<li> The HTML.Obsolete feature test entity is now called HTML.Deprecated.
     It switches \<tt>XMP\</tt>, \<tt>LISTING\</tt>, and \<tt>PLAINTEXT\</tt>.

\<li> The standard A content model now includes \<tt>H1\</tt>-\<tt>H6\</tt>
     and \<tt>%text\</tt> only -- \<tt>%block\</tt> has been removed.

\<li> The public text owner is now \<tt>-//IETF//\</tt>. If and when the
     IETF becomes a registerd ISO public text owner, the \<tt>-//\</tt>
     should be changed to \<tt>+//\</tt>.

\<li> The reference to the ISO latin 1 entity set now references the
     HTML version of this entity set, and not the public version
     from the SGML standard.

\</ul>

\<h2>The Public Text\</h2>

\<dl>

\<DT>The HTML DTD
\<DD>The text of the SGML
DTD for HTML
\<ul>
\<li>\<A HREF="html.dtd">The full DTD\</A>
\<li>\<A HREF="html-0.dtd">Level 0 features\</A>
\<li>\<A HREF="html-1.dtd">Level 1 features\</A>
\</ul>
\<DT>\<A
NAME="z20" HREF="html.decl">The HTML SMGL Declaration\</A>
\<DD>
The text of the SGML Declaration
for HTML
\<dt>\<a href="ISOlat1.sgml">ISO Added Latin 1\</a>
\<dd>The HTML version of the ISOlat1 entity set
\</dl>

\<h2> Element References \</h2>

\<p> These element references are an aid to reading and understanding
    the DTD.

\<dl>

\<DT>\<A
HREF="L2index.html">Element Reference (Level 2)\</A>
\<DD> Exhaustive alphabetical
listing of elements with syntax descriptions

\<DT>\<A
HREF="L2Pindex.html">Element Reference (Level 2 Recommended)\</A>

\<DD> This listing eliminates deprecated
idioms. This is the reference to consult when generating new
documents.

\<DT>\<A
HREF="L1index.html">Element Reference (Level 1)\</A>
\<DD> Alphabetical
listing of level 1 (no forms) elements with syntax descriptions

\<DT>\<A
HREF="L1Pindex.html">Element Reference (Level 1 Recommended)\</A>

\<DD> This listing eliminates deprecated
idioms. This is the reference to consult when generating new
documents without forms.

\<DT>\<A
HREF="L0index.html">Element Reference (Level 0)\</A>
\<DD> Alphabetical
listing of level 0 (minimally conforming) elements with syntax descriptions

\<DT>\<A
HREF="L0Pindex.html">Element Reference (Level 0 Recommended)\</A>

\<DD> This listing eliminates deprecated
idioms. This is the reference to consult when generating new
documents aimed at minimally conforming implementations.

\</DL>

\<p> For more information about the HTML specification, see \<a
    href="index.html">the HTML 2.0 review materials\</a>.

\</BODY>
\</HTML>

--cut-here
Content-Description: HTML 2.0 DTD

\<!--	html.dtd

        Document Type Definition for the HyperText Markup Language (HTML DTD)

	$Id: html.dtd,v 1.19 1994/09/23 22:46:51 connolly Exp $

	Author: Daniel W. Connolly \<connolly@hal.com>
	See Also: html.decl, html-0.dtd, html-1.dtd
		  http://www.hal.com/%7Econnolly/html-spec/index.html
		  http://info.cern.ch/hypertext/WWW/MarkUp2/MarkUp.html
-->

\<!ENTITY % HTML.Version
	"-//IETF//DTD HTML//EN//2.0"

        -- Typical usage:

            \<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
	    \<html>
	    ...
	    \</html>
	--
	>


\<!--================== Feature Test Entities ==============================-->

\<!ENTITY % HTML.Recommended "IGNORE"
	-- Certain features of the language are necessary for compatibility
	   with widespread usage, but they may compromise the structural
	   integrity of a document. This feature test entity enables
	   a more prescriptive document type definition that eliminates
	   the above features.
	-->

\<![ %HTML.Recommended [
	\<!ENTITY % HTML.Deprecated "IGNORE">
\]]>

\<!ENTITY % HTML.Deprecated "INCLUDE"
	-- Certain features of the language are necessary for compatibility
	   with earlier versions of the specification, but they tend
	   to be used an implemented inconsistently, and their use is
	   deprecated. This feature test entity enables a document type
	   definition that eliminates these features.
	-->

\<!ENTITY % HTML.Highlighting "INCLUDE">
\<!ENTITY % HTML.Forms "INCLUDE">

\<!--================== Imported Names =====================================-->

\<!ENTITY % Content-Type "CDATA"
	-- meaning a MIME content type, as per RFC1521
	-->

\<!ENTITY % HTTP-Method "GET | POST"
	-- as per HTTP specification
	-->

\<!ENTITY % URI "CDATA"
        -- The term URI means a CDATA attribute
           whose value is a Uniform Resource Identifier,
           as defined by 
	"Universal Resource Identifiers" by Tim Berners-Lee
	aka http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html
	aka RFC 1630

	Note that CDATA attributes are limited by the LITLEN
	capacity (1024 in the current version of html.decl),
	so that URIs in HTML have a bounded length.

        -->


\<!-- DTD "macros" -->

\<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

\<!ENTITY % list " UL | OL | DIR | MENU " >


\<!--================ Character mnemonic entities ==========================-->

\<!ENTITY % ISOlat1 PUBLIC
  "-//IETF//ENTITIES Added Latin 1 for HTML//EN">
%ISOlat1;

\<!ENTITY amp CDATA "\&#38;"     -- ampersand          -->
\<!ENTITY gt CDATA "\&#62;"      -- greater than       -->
\<!ENTITY lt CDATA "\&#60;"      -- less than          -->
\<!ENTITY quot CDATA "\&#34;"    -- double quote       -->


\<!--=================== Text Markup =======================================-->

\<![ %HTML.Highlighting [

\<!ENTITY % font " TT | B | I ">

\<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">

\<!ENTITY % text "#PCDATA | A | IMG | BR | %phrase | %font">

\<!ENTITY % pre.content "#PCDATA | A | HR | BR | %font | %phrase">

\<!ELEMENT (%font;|%phrase) - - (%text)+>

\]]>

\<!ENTITY % text "#PCDATA | A | IMG | BR">

\<!ELEMENT BR    - O EMPTY>


\<!--================== Link Markup ========================================-->

\<![ %HTML.Recommended [
	\<!ENTITY % linkName "ID">
\]]>

\<!ENTITY % linkName "CDATA">

\<!ENTITY % linkType "NAME"
	-- a list of these will be specified at a later date -->

\<!ENTITY % linkExtraAttributes
        "REL %linkType #IMPLIED -- forward relationship type --
        REV %linkType #IMPLIED -- reversed relationship type
                              to referent data --
        URN CDATA #IMPLIED -- universal resource number --

        TITLE CDATA #IMPLIED -- advisory only --
        METHODS NAMES #IMPLIED -- supported public methods of the object:
                                        TEXTSEARCH, GET, HEAD, ... --
        ">

\<![ %HTML.Recommended [
	\<!ENTITY % A.content   "(%text)+"
	-- \<H1>\<a name="xxx">Heading\</a>\</H1>
		is preferred to
	   \<a name="xxx">\<H1>Heading\</H1>\</a>
	-->
\]]>

\<!ENTITY % A.content   "(%heading|%text)+">

\<!ELEMENT A     - - %A.content -(A)>

\<!ATTLIST A
	HREF %URI #IMPLIED
	NAME %linkName #IMPLIED
        %linkExtraAttributes;
        >

\<!--=================== Images ============================================-->

\<!ENTITY % img.alt.default "#IMPLIED"
	-- ALT attribute required in Level 0 docs -->

\<!ELEMENT IMG    - O EMPTY --  Embedded image -->
\<!ATTLIST IMG
        SRC %URI;  #REQUIRED     -- URI of document to embed --
	ALT CDATA %img.alt.default;
	ALIGN (top|middle|bottom) #IMPLIED
	ISMAP (ISMAP) #IMPLIED
        >


\<!--=================== Paragraphs=========================================-->

\<!ELEMENT P     - O (%text)+>


\<!--=================== Headings, Titles, Sections ========================-->

\<!ELEMENT HR    - O EMPTY -- horizontal rule -->

\<!ELEMENT ( %heading )  - -  (%text;)+>

\<!ELEMENT TITLE - -  (#PCDATA)
          -- The TITLE element is not considered part of the flow of text.
             It should be displayed, for example as the page header or
             window title.
          -->


\<!--=================== Text Flows ========================================-->

\<![ %HTML.Forms [
	\<!ENTITY % block.forms "| FORM | ISINDEX">
\]]>

\<!ENTITY % block.forms "">

\<![ %HTML.Deprecated [
	\<!ENTITY % preformatted "PRE | XMP | LISTING">
\]]>

\<!ENTITY % preformatted "PRE">

\<!ENTITY % block "P | %list | DL
	| %preformatted
	| BLOCKQUOTE %block.forms">

\<!ENTITY % flow "(%text|%block)*">

\<!ENTITY % pre.content "#PCDATA | A | HR | BR">
\<!ELEMENT PRE - - (%pre.content)+>

\<!ATTLIST PRE
        WIDTH NUMBER #implied
        >

\<![ %HTML.Deprecated [

\<!ENTITY % literal "CDATA"
	-- special non-conforming parsing mode where
	   the only markup signal is the end tag
	   in full
	-->

\<!ELEMENT XMP - -  %literal>
\<!ELEMENT LISTING - -  %literal>
\<!ELEMENT PLAINTEXT - O %literal>

\]]>


\<!--=================== Lists =============================================-->

\<!ELEMENT DL    - -  (DT*, DD?)+>
\<!ATTLIST DL
	COMPACT (COMPACT) #IMPLIED>

\<!ELEMENT DT    - O (%text)+>
\<!ELEMENT DD    - O %flow>

\<!ELEMENT (OL|UL) - -  (LI)+>
\<!ELEMENT (DIR|MENU) - -  (LI)+ -(%block)>
\<!ATTLIST (%list)
	COMPACT (COMPACT) #IMPLIED>

\<!ELEMENT LI    - O %flow>

\<!--=================== Document Body =====================================-->

\<![ %HTML.Recommended [
	\<!ENTITY % body.content "(%heading|%block|HR|ADDRESS)*">
	-- \<h1>Heading\</h1>
	   \<p>Text ...
		is preferred to
	   \<h1>Heading\</h1>
	   Text ...
	-->
\]]>

\<!ENTITY % body.content "(%heading | %text | %block | HR | ADDRESS)*">

\<!ELEMENT BODY O O  %body.content>

\<!ELEMENT BLOCKQUOTE - - %body.content>

\<![ %HTML.Recommended [
	\<!ENTITY % address.content "(%text)*">
\]]>
\<!ENTITY % address.content "(%text|P)*">
\<!ELEMENT ADDRESS - - %address.content>


\<!--================ Forms ===============================================-->

\<![ %HTML.Forms [

\<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
\<!ATTLIST FORM
	ACTION %URI #REQUIRED
	METHOD (%HTTP-Method) GET
	ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
	>

\<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
			RADIO | SUBMIT | RESET |
			IMAGE | HIDDEN )">
\<!ELEMENT INPUT - O EMPTY>
\<!ATTLIST INPUT
	TYPE %InputType TEXT
	NAME CDATA #IMPLIED -- required for all but submit and reset --
	VALUE CDATA #IMPLIED
	SRC %URI #IMPLIED -- for image inputs -- 
	CHECKED (CHECKED) #IMPLIED
	SIZE CDATA #IMPLIED -- like NUMBERS,
				 but delimited with comma, not space --
	MAXLENGTH NUMBER #IMPLIED
	ALIGN (top|middle|bottom) #IMPLIED
	>

\<!ELEMENT SELECT - - (OPTION+)>
\<!ATTLIST SELECT
	NAME CDATA #REQUIRED
	SIZE NUMBER #IMPLIED
	MULTIPLE (MULTIPLE) #IMPLIED
	>

\<!ELEMENT OPTION - O (#PCDATA)>
\<!ATTLIST OPTION
	SELECTED (SELECTED) #IMPLIED
	VALUE CDATA #IMPLIED
	>

\<!ELEMENT TEXTAREA - - (#PCDATA)>
\<!ATTLIST TEXTAREA
	NAME CDATA #REQUIRED
	ROWS NUMBER #REQUIRED
	COLS NUMBER #REQUIRED
	>

\]]>


\<!--================ Document Head ========================================-->

\<!ENTITY % head.link "& LINK*">

\<![ %HTML.Recommended [
	\<!ENTITY % head.nextid "">
\]]>
\<!ENTITY % head.nextid "& NEXTID?">

\<!ENTITY % head.content "TITLE & ISINDEX? & BASE? & META*
			 %head.nextid
			 %head.link">

\<!ELEMENT HEAD O O  (%head.content)>

\<!ELEMENT LINK - O EMPTY>
\<!ATTLIST LINK
	HREF %URI #REQUIRED
        %linkExtraAttributes; >

\<!ELEMENT ISINDEX - O EMPTY>

\<!ELEMENT BASE - O EMPTY>
\<!ATTLIST BASE
        HREF %URI; #REQUIRED
        >

\<!ELEMENT NEXTID - O EMPTY>
\<!ATTLIST NEXTID N %linkName #REQUIRED>

\<!ELEMENT META - O EMPTY    -- Generic Metainformation -->
\<!ATTLIST META
        HTTP-EQUIV  NAME    #IMPLIED  -- HTTP response header name  --
        NAME        NAME    #IMPLIED  -- metainformation name       --
        CONTENT     CDATA   #REQUIRED -- associated information     --
        >


\<!--================ Document Structure ===================================-->

\<![ %HTML.Deprecated [
	\<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
\]]>
\<!ENTITY % html.content "HEAD, BODY">

\<!ELEMENT HTML O O  (%html.content)>
\<!ENTITY % version.attr "VERSION CDATA #FIXED \&#34;%HTML.Version;\&#34;">

\<!ATTLIST HTML
	%version.attr;-- report DTD version to application --
	>



--cut-here--
-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<360c7m$da2@nkosi.well.com>" date="2989372086">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 05:08:06 UT
From: Robert Kohlhase \<rkohlhas@well.sf.ca.us>
Organization: The Whole Earth 'Lectronic Link, Sausalito, CA
Message-ID: <360c7m$da2@nkosi.well.com>
Keywords: database sgml www
Subject: Sources re electronic book retailing/libraries?

I'm looking for resources on future directions in electronic book
publishing and distribution (via the Internet or otherwise) from the
perspectives of book publishers and public libraries.  I'd love to hear
from other people interested in the topic, and would welcome tips on
sources such as mailing lists, professional associations, etc.  I'm
specifically looking for efforts using Oracle or other relational
databases, but will welcome all suggestions.  Thanks!

(That's the essence of this message -- see below for more details.)

Robert Kohlhase, rkohlhas@well.com


Details follow -----------------------------------

SPECIFIC TOPICS I'm interested in include:
- evolution of current print publishing houses to electronic media
- future infrastructure for electronic book retailing
- changing relationships of public libraries and forprofit book publishers
  (e.g. pay-per-use)
- development of rich, user-friendly tools for finding books/resources
  meeting certain criteria across publishers and sources
- magazine and newspaper publishing, reference and text publishing (i.e.,
  not just book publishing)

NOT SO INTERESTING TO ME would be information on low-level work
(e.g. troubleshooting Mosaic installations) -- that I've found already.
Also I'm not looking for "desktop publishing" or small-scale 'zine stuff.
Rather, I'm seeking big-picture, long range, industrial-strength
perspectives.

GOOD SOURCES might include names of:
- e-mail mailing lists for interested folks
- Universities with innovative programs (e.g. library science)
- Traditional publishing houses that moving (not just talking) on the topic
- USENET grous interested in the broader direction-of- the-industry
  questions
- Good magazines, journals, etc. to follow.
- Professional associations (something like "Association of Book Publishing
  Professionals Heavily Into Electronic Media" or whatever)

WHO'S ASKING?: I'm a system development guy with a background in relational
databases (specifically Oracle).  I want to work with book publishing and
library systems because I think society will be terrifically enriched by
the changes coming, and I'd like to be part of them.  I also need to pay
the rent -- I'm looking, ultimately, for contract system development work on
projects in this field.

MY PHONE if needed is +1 415 647 4663.
</message>
<message id="<LENST.94Sep24074105@dell.lysator.liu.se>" date="2989374065">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 05:41:05 UT
From: Lennart Staflin \<lenst@lysator.liu.se>
Organization: Lysator Computer Society, Link|ping University, Sweden
Message-ID: \<LENST.94Sep24074105@dell.lysator.liu.se>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no> <1994Sep21.203139.19909@sq.sq.com> <19940922T202631Z.erik@naggum.no>
Subject: Re: Parsing EMPTY elements

All this discussion about parsing SGML without a DTD has prompted me to
write an Perl5 library to do just that.

The library and two examples are available from
\<http://gopher.lysator.liu.se:70/1/information/SGML/LSDI> or
\<gopher://gopher.lysator.liu.se/11/information/SGML/LSDI>.

The file to load should contain a document instance only, no document type
declaration and should use

   1. no minimization,
   2. no entity references, and
   3. no markup declaration (except for comment declarations).

The library builds an internal data structure from an SGML document
instance.  The datastructure is a tree corresponding to the element
structure in the document instance.  The tree has three kinds of nodes:

	Element		element with gi, attributes and content
	Data		a data node
	Pi		a process instruction node

-- 
Lennart Staflin  \<lenst@lysator.liu.se>
              Will write DTDs for food.
</message>
<message id="<LENST.94Sep24104339@dell.lysator.liu.se>" date="2989385019">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 08:43:39 UT
From: Lennart Staflin \<lenst@lysator.liu.se>
Organization: Lysator Computer Society, Link|ping University, Sweden
Message-ID: \<LENST.94Sep24104339@dell.lysator.liu.se>
References: \<rieger.14.0014229A@colin.muc.de>
Subject: Re: Q: WWW Client DTDs

[Wolfgang Rieger]

|   To get it the other way round, there is one thing I'd like to now.  Are
|   there somewhere DTDs defining the tag set and structure certain clients
|   do accept.  For instance, is there a DTD defining what Mosaic 2.0 a 5
|   for windows does accept?

|   Such DTDs being available, one could at least check a supposed HTML
|   document not only against my-very-own-HTML.dtd release 0.5, but against
|   those for the most widely used WWW clients.

What if those DTDs are contradictory?  Like when Mosaic must have the A
element empty when used as destination anchor and W3 getting confused by
not having proper end tags for A.

-- 
Lennart Staflin  \<lenst@lysator.liu.se>
              Will write DTDs for food.
</message>
<message id="<CONNOLLY.94Sep24094611@austin2.hal.com>" date="2989406771">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 14:46:11 UT
From: Dan Connolly \<connolly@hal.com>
Organization: HaL Computer Systems, Inc.
Message-ID: \<CONNOLLY.94Sep24094611@austin2.hal.com>
References: \<rieger.14.0014229A@colin.muc.de>
Subject: Re: Q: WWW Client DTDs

[Wolfgang Rieger]

|   Concerning the HTML-DTD discussion.  There is a basic HTML DTD and
|   HTML+ and HTML2.0 proposals and some more in different stages of
|   discussion/standardization flying around on the net.

Yes... well, by "basic HTML DTD" perhaps you mean the one released as an
internet draft in 1993 that has been integrated into some projects,
modified for some purposes, etc.  It's not very useful at this point,
since, as you say, it doesn't match what the browsers do.

|   The problem with those (as I see it) is: nobody guarantees that a WWW
|   client will accept a document conforming to one of those DTDs (an
|   example being the DTD distributed with HoTMetaL).

It's very unfortunate that the initial release of HoTMetaL included a DTD
that differed significantly with "current practice."  The SoftQuad folks
thought they were doing folks a favor by getting them to use the HTML+ DTD
(well, a variation of it...).  They have realized their mistake, and will
be distributing the HTML 2.0 DTD in subsequent releases.  The unfortunate
part is that the damage is done -- everybody thinks that HoTMetaL is
"broken" and that HTML is irreconcilable with "real SGML."

|   To get it the other way round, there is one thing I'd like to now.  Are
|   there somewhere DTDs defining the tag set and structure certain clients
|   do accept.  For instance, is there a DTD defining what Mosaic 2.0 a 5
|   for windows does accept?  Or for Cello, or for ...?

Well... the HTML 2.0 DTD is supposed to be something of a compromise.  It
specifies a language that we believe (because of significant experience and
testing) is widely supported by the popular browsers.

The prose of the specification has quite a few admonishments like:

	NOTE: Some implementations consider '>' to be the end of a comment.

that generally point out bugs in Mosaic, or other areas of the spec that
are currently not implemented consistently.

|   Such DTDs being available, one could at least check a supposed HTML
|   document not only against my-very-own-HTML.dtd release 0.5, but against
|   those for the most widely used WWW clients.

I agree that it would be highly desirable for browser implementors to
include a DTD as part of the documentation for their browser.  "The
language supported by this browser is found in cello-html.dtd"

I don't think the browser implementors are familiar enough with SGML for
this to happen.

It is the intention of the HTML 2.0 standardization process to familiarize
the browser implementors with SGML issues, and to give them a starting
point.  The intention is that they can take the HTML 2.0 DTD and use it to
describe their extensions.

If users like yourself would request a DTD from the browser implementors
directly, that would help.

	HINT: The version of Mosaic that will be released at the Chicago
	WWW/Mosaic conference will support tables.  I have seen examples
	of the markup, but I have not seen a DTD.  I don't think they've
	developed one.  Please send mail to:

		mosaic-x@ncsa.uiuc.edu

	and request that they release a DTD with the Mosaic 2.5
	documentation.  Tell them I'm willing to write it for them, if
	they'll give me the details.

Dan
-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<CONNOLLY.94Sep24100440@austin2.hal.com>" date="2989407880">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 15:04:40 UT
From: Dan Connolly \<connolly@hal.com>
Organization: HaL Computer Systems, Inc.
Message-ID: \<CONNOLLY.94Sep24100440@austin2.hal.com>
References: <1994Sep23.204907.8365@kocrsv01.delcoelect.com>
Subject: Re: http, URLs and HTML

[Ravi Lakkaraju]

|   From my limited understanding of the way WWW works am I correct in
|   saying that:
|   1.  Mosaic can read documents conforming only to the HTML dtd.

Or plain text or GIF or XBM format, or lots of other formats by using
external viewers, as per the MIME/mailcap mechanism.
(See:
1343  I    N. Borenstein, "A User Agent Configuration Mechanism For Multimedia 
           Mail Format Information", 06/11/1992. (Pages=10) 
           (Format=.txt, .ps) 

at ftp://ds.internic.net/rfc/rfc1343.ps)

But yes... as far as SGML is concerned, HTML is the only thing Mosaic
groks.

|   2.  http is the protocol used for accessing packets of information
|       across the internet the addresses for which are specified in the
|       form of URLs.

Correct. "Packets" is an unusual term, but...

|   3.  The information transferred between the WEB servers is unrelated to
|       its URL.  i.e., the http has nothing to do with the content of the
|       packet of information accessed other than specifying where it is
|       located.

Not quite.  There is information in the HTTP protocol that specifies the
type of the data in the "packets."  For example, an HTTP server lables an
HTML document thusly:

	Content-Type: text/html

|   4.  Once that information is located and accessed the web server will
|       know what to do with it because it assumes it will be in \<html>.
|       By knowing what to do would also mean that Mosaic knows how to
|       display it because it is familiar with the\<tags>.

Not at all.  HTML just one data format.  It is somewhat ubiquitous because
support for HTML is required of all HTTP clients and servers.  But it is by
no means the only format.  Plain text, postscript, and several graphics
formats are widely supported.

In an HTTP request, the client specifies the data types it accepts ala:

	GET /this/document HTTP/1.0
	Accept: text/html, text/plain, application/postscript


|   Based on these points it seems like if it were possible to exchange
|   documents / packets of information using the htt protocol no matter
|   what dtd the documents conformed to then all I need is a viewer that
|   can understand the particular dtd used and know how to format the tags
|   properly (in the originally intended style).

The guys at SoftQuad are releasing a Mosaic bolt-on to do just this.
Contact yuri@sq.com for details. (I hope they don't mind my giving out this
info... it was part of a "pre-announcement" at Seybold.)

|   If I created documents using any arbitrary dtd, what is it in the web
|   server that restricts the transfer of that file with http?.  (Not
|   taking into account that the information once transferred is not
|   viewable in Mosaic).

Only your imagination and creative talents. :-)

|   I would like to know what you think about these issues.  My ultimate
|   objective being that one should not be concerned with what dtd is being
|   used for the document as long as the viewer can handle it and knows
|   what to do with the information..  I would like to here some general
|   discussions on these issues.

The WWW architecture is very much amenable to this sort of thing.

See:	"About document formats (Design Issues)" by Tim Berners-Lee
	http://info.cern.ch/hypertext/WWW/DesignIssues/Formats.html#4

HaL is developing products along these very lines.

For details, see:
	http://www.hal.com/products/sw/olias/index.html

Dan

-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<3620kg$mmh@agate.berkeley.edu>" date="2989425744">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 20:02:24 UT
From: "Jerome P. McDonough" \<jmcd@prophesy.Berkeley.EDU>
Organization: /etc/organization
Message-ID: <3620kg$mmh@agate.berkeley.edu>
References: <1994Sep23.203817.7641@kocrsv01.delcoelect.com>
Subject: Re: URLs, http and HTML

[Ravi Lakkaraju]

|   I would like to know what you think about these issues.  My ultimate
|   objective being that one should not be concerned with what dtd is being
|   used for the document as long as the viewer can handle it and knows
|   what to do with the information...  I would like to here some general
|   discussions on these issues.

If I get the gist of what you'd like to see, it's some type of
client-server setup for wide area network document retrieval & display that
will handle any-ol'-SGML-document.  I think there are probably quite a few
people working on such projects, using a variety of different protocols.
I'm working on a project that's trying to develop a probabilistic
information retrieval system that will handle any text-centered SGML
document (as opposed to graphics/audio/etc.)  and will communicate using
the Z39.50 protocol.  We're also trying to develop a client that will use
Z39.50 and hopefully do at least a half-way decent job of display SGML
documents.  I know that other library schools are working on similar
projects (Toronto definitely is, and I've heard rumors of some others).
I'm sure other academic disciplines and the commercial world are working on
similar systems.  If I had to guess, I'd say that the next 2-3 years will
see a rash of releases of SGML-document retrieval and display systems
adapted for Wide Area Network use.

We've found that SGML actually lends itself quite well to the server end of
the retrieval process.  While display of any-ol'-SGML on the client end may
be somewhat trickier, we're hoping that the application we're looking at
(text retrieval in a library and academic context) will limit the number of
different styles of work we need to display sufficiently to make it fairly
simple to implement (hey, we can dream, can't we? :).

-- 
Jerry McDonough -- jmcd@info.Berkeley.EDU             |    (......)
UCB Sch. of Lib. & Info. Studies                      |    \\ *  * /
                                                      |    \\  <>  /
"Don't worry.  I know what I'm doing!"                |     \\ -- /  SGNORMPF!!!
         -- From the Famous Last Words file           |      ||||
</message>
<message id="<dan.780442038@handel>" date="2989430838">
Newsgroups: comp.text.sgml
Date: 24 Sep 1994 21:27:18 UT
From: "J. Daniel Smith" \<dan@bristol.com>
Organization: Bristol Technology Inc.
Message-ID: \<dan.780442038@handel>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: Re: SGML Viewers and Formatters

[Mr. Raytrace]

|   I am looking for an SGML viewer that supports hypertext links and that
|   takes care of formatting the documents. We are hoping to publish SGML
:
|   So far, all the tools that I have seen have either lacking features,
|   horrid user interfaces, or cannot be re-distributed and packaged with
:
|   Can anyone point me to some companies/groups that might fit the bill?

We have a few tools that might be of some interest...our HyperHelp product
is an implementation of the Windows WinHelp viewer for X/Motif.  What you
may be more interested in are some of the tools we have, in particular our
own SGML help compiler.  We also have another tool which will turn our .hlp
files into Windows .hlp files.  If you want more details, send email to
info@bristol.com.

   Dan
-- 
====================== message is author's opinion only ======================
J. Daniel Smith         Bristol Technology Inc., Ridgefield, Connecticut (USA)
dan@bristol.com         +1 203 438 6969, 438 5013 (FAX)
                        FTP: ftp.bristol.com     WWW: http://www.bristol.com
</message>
<message id="<19940925T001913Z.erik@naggum.no>" date="2989441153">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 00:19:13 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940925T001913Z.erik@naggum.no>
References: <35s8kb$bu7@ruby.ora.com>
Subject: Re: General Currency Sign

[Terry Allen]

|   Just a question from curiosity: what uses are there for the general
|   currency sign defined in ISOnum
|   
|   \<!ENTITY curren SDATA "[curren]"--=general currency sign-->
|   
|   for use in countries that have a local character for currency?
|   Does it actually get used that way?

the history of this symbol appears to indicate that it was intended to be
used only as a reference point.  it appeared in the obsolete ISO 646 IRV
because some countries didn't want the dollar sign in the "international
version", probably in the misguided belief that it was a symbol of American
imperialism or some such nonsense.  the first three character sets using
ISO 646 were the IRV (ESC 2/8 4/0), BS 4730 (ESC 2/8 4/1, often called UK
ASCII), and ANSI X3.4-1968 (ESC 2/8 4/2, better known as ASCII), which
have, respectively, the currency symbol, the dollar sign, and the pound
sign in position 2/4.  other countries with no need for a currency symbol
have alternated between using the currency symbol and the dollar sign.  in
1991, ISO moved to replace the old ISO 646 IRV with ASCII as the new IRV.

this, in my eyes, makes the currency symbol no more than of historical
interest, which is important enough, but you shouldn't use it in new
documents.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940925T083614Z.erik@naggum.no>" date="2989470974">
Newsgroups: comp.std.internat,comp.std.misc,comp.text.sgml
Followup-To: poster
Date: 25 Sep 1994 08:36:14 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940925T083614Z.erik@naggum.no>
Subject: ISO 8613:1989

I'm seeking to unload my copy of ISO 8613:1989, all parts (1 through 8,
except 3).  this is ODA, the full name of which is: information processing
-- text and office systems -- office document architecture (ODA) and
interchange format.  since this was acquired at significant cost, and there
might be someone out there who would have liked to buy it from ISO had they
wanted to waste a lot of money, I'm willing to help you out by swapping it
with a much smaller amount of money, instead of just recycling it (which I
have been very reluctant to do for about three years).  I have not used it,
and it looks brand new.

if you think you'd have more use for this standard than a small amount of
money, please let me know.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<kimber.84.00071F63@passage.com>" date="2989483638">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 12:07:18 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.84.00071F63@passage.com>
Subject: The Web Vision and SGML

Last week I had the singular pleasure to spend many hours in close
conversation with both Tim Berners-Lee, the principle inventor of the World
Wide Web, and Dave Raggett, the steward of HTML 3.0 and its eventual
progeny.  In that time, I gained an appreciation for both the vision that
Tim and Dave had and have for the Web, as well as an appreciation for the
forces that led to the development of the Web and HTML as we know it today.
What impressed me the most about both Dave and Tim was their commitment to
a vision of much greater breadth and length than is apparent in the current
state of things, as well as a commitment to see that vision through, even
if the road to it is a bit crooked at times.  I have every confidence that
as long as they are involved in the continued development of the Web and
its associated technologies and standards, that things will work out for
the best, to the degree it's in their power to make it happen.

Last week was the 1994 European Conference on Hypertext, sponsored by the
ACM and SIGLINK.  This conference represented a particularly historic event
because it led to the meeting of Doug Engelbart, of Augment fame (you may
remember the video clip of him demonstrating the mouse in conjunction with
an interactive graphics tool shown in the PBS series on computers aired a
few years back) and Tim Berners-Lee.  Mr. Engelbart's vision of a networked
system of communication in which all information in a computer system could
be managed and interlinked as a giant collaborative workspace (to grossly
oversimplify) is in many ways identical to the vision that led Tim
Berners-Lee to develop the web.  Both saw not a technology alone, but a way
to use technology as a chance for humanity to build a new form of society
(perhaps) or, at least, a new way for humans to work together that would be
more productive and effective than the systems we have today (the name
"augment" coming from Doug's vision of a system that would "augment the
collective IQ").  To see these two men in conversation, and to hear them
both discuss their visions and the difficulties they've had seeing them
implemented was inspiring and fascinating.  The Web today can only be
considered a pale shadow of these men's visions, yet it's explosive
popularity has gotten all of us thinking about how we can do more.  I hope
we have, as a society, the wherewithal to fulfill these visions, keeping in
mind Tim's caution, delivered during his closing keynote address, that
there is a tremendous potential danger in developing a new society and that
we should proceed cautiously and carefully, mindful of the lessons of
earlier failures of social experiments.  It's clear that humans as a
society are in the process of evolving completely new and hitherto
impossible social structures made possible by internetworking.  Whatever
happens, both Tim and Doug will be remembered as key players in these
developments.  I hope the result is good.  I know that Doug and Tim want it
to be and are working very hard to make it so.

-- 
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<Cwoz3v.1K6@cogsci.ed.ac.uk>" date="2989495671">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 15:27:51 UT
From: Steve Finch \<steve@cogsci.ed.ac.uk>
Organization: Centre for Cognitive Science, Edinburgh, UK
Message-ID: \<Cwoz3v.1K6@cogsci.ed.ac.uk>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no> <9409221625.AA17721@mercury>
Subject: Re: Parsing EMPTY elements.

[Mary Holstege]

|   No, it's not a matter of pathological aversion to DTDs.

DTDs have a major role to play in data driven parameterization of software
(especially PUBLIC ones), and provide many representation utilities (e.g.,
default values for attributes, entities, and so on).  I once shared the
view that SGML didn't need DTDs to be useful as a general representation
format; now I find that much of the data-driven parameterization of
software I use happens in the DTD, and not in the data itself.

|   Consider this scenario:
|
|   You have an information server on the far side of a WAN.  This
|   information server gets requests for documents and portions of
|   documents which it feeds back to client viewer programs.  These client
|   viewer programs toss the information up on the screen for people to
|   read.  There may be many *different* DTDs in the information base.
|   Today we have CALS, tomorrow OSF, who knows?  I may connect to a
|   different information server tomorrow with information stored using a
|   different DTD.  The program shouldn't have to care, so it must be data
|   driven.  Now, you look at the SGML markup and you say, geez, I can make
|   this client program trees simple because any clown can parse start
|   tags, end tags, and attribute values, provided the server feeds back
|   unminimized SGML.  I can avoid having the client include a full-up SGML
|   parser, and have to separately request DTDs, and somehow merge them
|   when it is combining information from multiple documents that have
|   different DTDs, and know about entity management, etc., etc., and just
|   know how to parse a very simple data format and display it.[*]
|
|   [*] Yes, you do have to impose some sort of standards with respect to
|   formatting information.  Separate problem.

But this representation-static information naturally resides not in the
dynamic data stream, but in the data stream's specification.  In any case,
in data driven processing you're always going to have the data itself, and
information about the data necessary to process it.  So you've always got
to have at least two entities, so why not data and DTD?

I think that empty end tags should be optional (as it would appear from
their DTD declaration, after all), and that a subset of SGML (*) should be
defined which assumes an \<!ELEMENT X - - ANY > declaration for all X (with
each element having any CDATA attributes whatsoever), and having only
entities which are either character entities, or self-expanding entities,
and where "<" is always represented in data as \&lt;, and "&" and \&amp;.
The ESIS representation of an SGML document (or part thereof) then
corresponds to one and precisely one such readily parsable stream.  I
currently use this definition of "normalized" SGML as a carrier format for
information, but have found that even with this format, it still is very
useful to use DTDs for default attribute values and debugging (and
entities, which I expand in attribute values to facilitate better data
driven processing).

One very important concern, however, is what happens when element structure
is manipulated if a DTD with a non trivial content model is in place.  To
insert or delete an element in the presence of a DTD requires the ability
to know that the manipulation preserves conformance, or at least the
checking of conformance after the manipulation has been done.  This is
slow, difficult, and restricts the sorts of manipulations which can be done
more than is the case that if the ESIS isomorphic option is used (I'm not
really saying this is a Bad Thing, just that I don't know how to do it).
Of course, minimization would have to go, but then SGML can't be all things
to all people.  This is my main concern, not DTDs per se at all (which I
think are exceedingly useful).

(*) OK, I know this isn't a subset, but it's a subset in the sense that any
particular document (or any finite set of documents) will have such a DTD.

Steve.
</message>
<message id="<Cwp1Lo.2oI@cogsci.ed.ac.uk>" date="2989498903">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 16:21:43 UT
From: Steve Finch \<steve@cogsci.ed.ac.uk>
Organization: Centre for Cognitive Science, Edinburgh, UK
Message-ID: \<Cwp1Lo.2oI@cogsci.ed.ac.uk>
References: <9409191825.AA20376@mercury> <19940921T002950Z.erik@naggum.no> <1994Sep21.203139.19909@sq.sq.com> <19940922T202631Z.erik@naggum.no>
Subject: Re: Parsing EMPTY elements

[Erik Naggum]

|   instead, I think [sgml processing] requires an interpretative
|   development environment that understands SGML as a native data
|   representation.  lacking any serious alternatives, this indicates that
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Proof by assertion?

|   LISP should be the ideal development environment, and that SGML
|   documents would just be represented externally in a slightly different
|   form than ordinary LISP structures and lists.
|
|   to be concrete in my suggestions: what if LISP's `read' function, or an
|   equivalent, could return something you could actually work on with
|   LISP's enormous set of functions to work on lists?  would you use LISP
|   if it came with such a function and some reasonable set of functions to
|   work on the SGML-specific parts of the structures?

Let's not be programming language specific; this is what's needed (together
with a print equivalent and several search and manipulation functions ---
an API basically), but it can be done in any programming language.  I use
one written in C, for example, and there are very good reasons for using C
and C++, not the least of which is its extensive commercial use.

OK, this is just a post to say that some of us already do this :-)

Steve.

-- 
------------------------------------------------------------------------------
When you steal from one person, it's called Plagiarism-
When you steal from many, it's research.                       - Wilson Misner
------------------------------------------------------------------------------
Steven Finch   Phone: +44 31 650 4656     | University of Edinburgh
					  | Language Technology Group
					  | Human Communication Research Centre
ARPA:  S.Finch%ed.ac.uk@nsfnet-relay.ac.uk| 2 Buccleuch Place
JANET: steve@uk.ac.ed.cogsci              | Edinburgh            EH8 9LW
</message>
<message id="<CONNOLLY.94Sep25121224@austin2.hal.com>" date="2989501944">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 17:12:24 UT
From: Dan Connolly \<connolly@hal.com>
Organization: HaL Computer Systems, Inc.
Message-ID: \<CONNOLLY.94Sep25121224@austin2.hal.com>
References: \<rieger.14.0014229A@colin.muc.de> \<LENST.94Sep24104339@dell.lysator.liu.se>
Subject: Re: Q: WWW Client DTDs

[Lennart Staflin]

|   What if those DTDs are contradictory?  Like when Mosaic must have the A
|   element empty when used as destination anchor and W3 getting confused
|   by not having proper end tags for A.

Please don't confuse the issue any more than it is.  The situation is, for
example, if you have:

	\<ul>
	\<li>\<a href="#l1">one\</a>
	\<li>\<a href="#l2">two\</a>
	\<li>\<a href="#l3">three\</a>
	\</ul>

	\<ul>
	\<li>\<a name="l1">one: parses and works\</a>
	\<li>\<a name="l2">two: doesn't parse, but works
	\<li>\<a name="l3">\</a>three: parses, but doesn't work
	\</ul>

you will find that the fist link target parses by the DTD (all versions
that I've ever seen...) and works as expected -- mosaic will find the
linked text.

The second doesn't parse (\</a> is not omissable) but Mosaic finds the
linked text; i.e., it "works."

The third parses, but due to some strange code in Mosaic, doesn't work.
This is a known bug.

-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<1994Sep25.140203.11493@wvnvms>" date="2989504923">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 18:02:03 UT
From: "Joy J. Zhang" \<un020767@wvnvms.wvnet.edu>
Organization: West Virginia Network for Educational Telecomputing
Message-ID: <1994Sep25.140203.11493@wvnvms>
Subject: wordperfect ->PostScript

Anybody can tell me where can I find a utility software that can convert
RFT, WordPerfact or Word file to PostScript file.

thanks you!!
</message>
<message id="<HANCHE.94Sep25211210@pyanfar.imf.unit.no>" date="2989512730">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 20:12:10 UT
From: Harald Hanche-Olsen \<hanche@imf.unit.no>
Organization: University of Trondheim, Norway
Message-ID: \<HANCHE.94Sep25211210@pyanfar.imf.unit.no>
References: <35uq74$83l@korfu.igd.fhg.de>
Subject: Re: ANNOUNCEMENT: WWW-Version of "The Whirlwind Giude to SGML Tools"

[Hans Holger Rath]

|   Hi everyone!
|   You can find a WWW-Version of Steve's
|
|   	"The Whirlwind Guide to SGML Tools"
|
|   at the URL
|
|   	http://zgdv.igd.fhg.de/papers/ed

Does not work, at least not with the version of mosaic I run, because the
server redirects you to http://archimedes/papers/ed/ rather than to

  http://archimedes.igd.fhg.de/papers/ed/

like it ought to (or should mosaic have been clever enough to figure that
out?)  For now, just get the latter URL.

- Harald
</message>
<message id="<1994Sep25.134647.1@east.pima.edu>" date="2989514807">
Newsgroups: comp.text.sgml
Date: 25 Sep 1994 20:46:47 UT
From: Gloria McMillan \<gmcmillan@east.pima.edu>
Message-ID: <1994Sep25.134647.1@east.pima.edu>
Subject: Read TEI with SGML viewer?

I am news to SGML and TEI (encoding used for classic books, ongoing
international encoding initiative at Oxford and elsewhere.)

I just downloaded a couple of these TEI-encoded classics from Oxford.  It
would seem that some SGML viewing software would be what a scholar would
use to run various searches through these texts.

The lab imagineer over at West Campus of Pima told me that this is what he
thinks.  Are any of you familiar with TEI-encoding system?  What do you use
to analyze those texts.

                               Thanks!!!

			    :*)	Gloria :*)
</message>
<message id="<366t8g$dnr@ams.amsinc.com>" date="2989586128">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 16:35:28 UT
From: Mark Murphy \<mark_murphy@mail.amsinc.com>
Organization: American Management Systems, Inc.
Message-ID: <366t8g$dnr@ams.amsinc.com>
References: <35uq74$83l@korfu.igd.fhg.de> \<HANCHE.94Sep25211210@pyanfar.imf.unit.no>
Subject: Re: ANNOUNCEMENT: "The Whirlwind Giude to SGML Tools"

I can get neither URL to work with WinMosaic.  Is there a non-WWW version
(e.g., PostScript) available for FTP?  I couldn't find one on this site.

Thanks in advance!
</message>
<message id="<9409261828.AA22050@source.asset.com>" date="2989592939">
Newsgroups: comp.text.sgml}
Date: 26 Sep 1994 18:28:59 UT
From: "Claude L. Bullard" \<bullardc@source.asset.com>
Message-ID: <9409261828.AA22050@source.asset.com>
Subject: Metafile for Interactive Documents (MID)

Because I have received a number of inquiries about the effort to design
the MID DTD, I am posting this message to notify concerned parties that the
last MID Design Team meeting was successfully completed on Friday,
September 23, 1994.  Inquiries about the MID should be directed to John
Junod at the David Taylor Model Basin (DTMB - Carderock)
\<junod@oasys.dt.navy.mil> for the time being.

Ongoing MID efforts here at Unisys Huntsville include preparing the Draft
Final Report on the MID design for DTMB.  As the report becomes the
property of the US Navy, copies must be requested through DTMB.  As there
were *major and significant* changes in the design between the Vancouver
meeting that resulted in the recently published SGML TAG article and the
final meeting in Huntsville, concerned parties are encouraged to obtain the
final report for technical details of the MID DTD and its application to
Interactive Electronic Technical Manuals (IETMs).

After official release of the final report, I shall be happy to provide
detailed information about the MID DTD and its applications to the CTS
forum on an informal basis.  This should be possible sometime in late
November or early December, 1994.  I believe that information about the MID
DTD will also be posted to the WWW, but if so, this will be done by DTMB.

Let me take this opportunity to publicly commend the MID Team, the MID
reviewers, the MID Advisory Panel, and the MID sponsors for the excellent
work that has resulted in the MID DTD.  To paraphrase one of the members,
"nothing good is accomplished without objections", but let me also add,
little of significance is achieved without team effort.  IMHO, this is
indeed an example when within a short schedule and using few resources,
something of significance has been achieved.

Len Bullard
MID Team Chairman and Document Editor
Unisys Huntsville
</message>
<message id="<1994Sep26.190826.5680@ast.saic.com>" date="2989595306">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 19:08:26 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep26.190826.5680@ast.saic.com>
References: \<CONNOLLY.94Sep23180619@ulua.hal.com>
Subject: Re: Reorganized HTML 2.0 DTD

[Dan Connolly]

|   The HTML 2.0 specification review is heating up again in light of the
|   upcoming WWW conference in Chicago.

This DTD will not parse with SGMLS or other validating parsers until the
contents are embedded in a \<!doctype HTML [ ...]> statement and prefaced by
an appropriate SGML declaration. I "borrowed" the header for the original
html.dtd that came with the sgmls source. The following file parses
correctly with the command sgmls -p filename where filename contains
everything that follows the "-- Cut Here --" lines.

-- 
"One man's syntax is another man's semantics."

-------------------------------- Cut Here ---------------------------------

\<!SGML  "ISO 8879:1986"
--
        Document Type Definition for the HyperText Markup Language 
        as used by the World Wide Web application (HTML DTD).

        NOTE: This is a definition of HTML with respect to
        SGML, and assumes an understanding of SGML terms.

        If you find bugs in this DTD or find it does not compile
        under some circumstances please mail www-bug@info.cern.ch
--

CHARSET
         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET  0   9   UNUSED
                  9   2   9
                  11  2   UNUSED
                  13  1   13
                  14  18  UNUSED
                  32  95  32
                  127 1   UNUSED
     BASESET   "ISO Registration Number 100//CHARSET
                ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
     DESCSET   128 32 UNUSED
               160 95 32
               255  1 UNUSED


CAPACITY        SGMLREF
                TOTALCAP        150000
                GRPCAP          150000
  
SCOPE    DOCUMENT
SYNTAX   
         SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                           19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET  0 128 0
         FUNCTION RE          13
                  RS          10
                  SPACE       32
                  TAB SEPCHAR  9
         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-"
                  UCNMCHAR ".-"
                  NAMECASE GENERAL YES
                           ENTITY  NO
         DELIM    GENERAL  SGMLREF
                  SHORTREF SGMLREF
         NAMES    SGMLREF
         QUANTITY SGMLREF
                  NAMELEN  32
                  TAGLVL   100
                  LITLEN   1024
                  GRPGTCNT 150
                  GRPCNT   64                   

FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  NO
    RANK     NO
    SHORTTAG NO
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
  APPINFO    NONE
>

\<!DOCTYPE HTML [

\<!--	html.dtd

        Document Type Definition for the HyperText Markup Language (HTML DTD)

	$Id: html.dtd,v 1.19 1994/09/23 22:46:51 connolly Exp $

	Author: Daniel W. Connolly \<connolly@hal.com>
	See Also: html.decl, html-0.dtd, html-1.dtd
		  http://www.hal.com/%7Econnolly/html-spec/index.html
		  http://info.cern.ch/hypertext/WWW/MarkUp2/MarkUp.html
-->

\<!ENTITY % HTML.Version
	"-//IETF//DTD HTML//EN//2.0"

        -- Typical usage:

            \<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
	    \<html>
	    ...
	    \</html>
	--
	>


\<!--================== Feature Test Entities ==============================-->

\<!ENTITY % HTML.Recommended "IGNORE"
	-- Certain features of the language are necessary for compatibility
	   with widespread usage, but they may compromise the structural
	   integrity of a document. This feature test entity enables
	   a more prescriptive document type definition that eliminates
	   the above features.
	-->

\<![ %HTML.Recommended [
	\<!ENTITY % HTML.Deprecated "IGNORE">
\]]>

\<!ENTITY % HTML.Deprecated "INCLUDE"
	-- Certain features of the language are necessary for compatibility
	   with earlier versions of the specification, but they tend
	   to be used an implemented inconsistently, and their use is
	   deprecated. This feature test entity enables a document type
	   definition that eliminates these features.
	-->

\<!ENTITY % HTML.Highlighting "INCLUDE">
\<!ENTITY % HTML.Forms "INCLUDE">

\<!--================== Imported Names =====================================-->

\<!ENTITY % Content-Type "CDATA"
	-- meaning a MIME content type, as per RFC1521
	-->

\<!ENTITY % HTTP-Method "GET | POST"
	-- as per HTTP specification
	-->

\<!ENTITY % URI "CDATA"
        -- The term URI means a CDATA attribute
           whose value is a Uniform Resource Identifier,
           as defined by 
	"Universal Resource Identifiers" by Tim Berners-Lee
	aka http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html
	aka RFC 1630

	Note that CDATA attributes are limited by the LITLEN
	capacity (1024 in the current version of html.decl),
	so that URIs in HTML have a bounded length.

        -->


\<!-- DTD "macros" -->

\<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

\<!ENTITY % list " UL | OL | DIR | MENU " >


\<!--================ Character mnemonic entities ==========================-->

\<!ENTITY % ISOlat1 PUBLIC
  "-//IETF//ENTITIES Added Latin 1 for HTML//EN">
%ISOlat1;

\<!ENTITY amp CDATA "\&#38;"     -- ampersand          -->
\<!ENTITY gt CDATA "\&#62;"      -- greater than       -->
\<!ENTITY lt CDATA "\&#60;"      -- less than          -->
\<!ENTITY quot CDATA "\&#34;"    -- double quote       -->


\<!--=================== Text Markup =======================================-->

\<![ %HTML.Highlighting [

\<!ENTITY % font " TT | B | I ">

\<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">

\<!ENTITY % text "#PCDATA | A | IMG | BR | %phrase | %font">

\<!ENTITY % pre.content "#PCDATA | A | HR | BR | %font | %phrase">

\<!ELEMENT (%font;|%phrase) - - (%text)+>

\]]>

\<!ENTITY % text "#PCDATA | A | IMG | BR">

\<!ELEMENT BR    - O EMPTY>


\<!--================== Link Markup ========================================-->

\<![ %HTML.Recommended [
	\<!ENTITY % linkName "ID">
\]]>

\<!ENTITY % linkName "CDATA">

\<!ENTITY % linkType "NAME"
	-- a list of these will be specified at a later date -->

\<!ENTITY % linkExtraAttributes
        "REL %linkType #IMPLIED -- forward relationship type --
        REV %linkType #IMPLIED -- reversed relationship type
                              to referent data --
        URN CDATA #IMPLIED -- universal resource number --

        TITLE CDATA #IMPLIED -- advisory only --
        METHODS NAMES #IMPLIED -- supported public methods of the object:
                                        TEXTSEARCH, GET, HEAD, ... --
        ">

\<![ %HTML.Recommended [
	\<!ENTITY % A.content   "(%text)+"
	-- \<H1>\<a name="xxx">Heading\</a>\</H1>
		is preferred to
	   \<a name="xxx">\<H1>Heading\</H1>\</a>
	-->
\]]>

\<!ENTITY % A.content   "(%heading|%text)+">

\<!ELEMENT A     - - %A.content -(A)>

\<!ATTLIST A
	HREF %URI #IMPLIED
	NAME %linkName #IMPLIED
        %linkExtraAttributes;
        >

\<!--=================== Images ============================================-->

\<!ENTITY % img.alt.default "#IMPLIED"
	-- ALT attribute required in Level 0 docs -->

\<!ELEMENT IMG    - O EMPTY --  Embedded image -->
\<!ATTLIST IMG
        SRC %URI;  #REQUIRED     -- URI of document to embed --
	ALT CDATA %img.alt.default;
	ALIGN (top|middle|bottom) #IMPLIED
	ISMAP (ISMAP) #IMPLIED
        >


\<!--=================== Paragraphs=========================================-->

\<!ELEMENT P     - O (%text)+>


\<!--=================== Headings, Titles, Sections ========================-->

\<!ELEMENT HR    - O EMPTY -- horizontal rule -->

\<!ELEMENT ( %heading )  - -  (%text;)+>

\<!ELEMENT TITLE - -  (#PCDATA)
          -- The TITLE element is not considered part of the flow of text.
             It should be displayed, for example as the page header or
             window title.
          -->


\<!--=================== Text Flows ========================================-->

\<![ %HTML.Forms [
	\<!ENTITY % block.forms "| FORM | ISINDEX">
\]]>

\<!ENTITY % block.forms "">

\<![ %HTML.Deprecated [
	\<!ENTITY % preformatted "PRE | XMP | LISTING">
\]]>

\<!ENTITY % preformatted "PRE">

\<!ENTITY % block "P | %list | DL
	| %preformatted
	| BLOCKQUOTE %block.forms">

\<!ENTITY % flow "(%text|%block)*">

\<!ENTITY % pre.content "#PCDATA | A | HR | BR">
\<!ELEMENT PRE - - (%pre.content)+>

\<!ATTLIST PRE
        WIDTH NUMBER #implied
        >

\<![ %HTML.Deprecated [

\<!ENTITY % literal "CDATA"
	-- special non-conforming parsing mode where
	   the only markup signal is the end tag
	   in full
	-->

\<!ELEMENT XMP - -  %literal>
\<!ELEMENT LISTING - -  %literal>
\<!ELEMENT PLAINTEXT - O %literal>

\]]>


\<!--=================== Lists =============================================-->

\<!ELEMENT DL    - -  (DT*, DD?)+>
\<!ATTLIST DL
	COMPACT (COMPACT) #IMPLIED>

\<!ELEMENT DT    - O (%text)+>
\<!ELEMENT DD    - O %flow>

\<!ELEMENT (OL|UL) - -  (LI)+>
\<!ELEMENT (DIR|MENU) - -  (LI)+ -(%block)>
\<!ATTLIST (%list)
	COMPACT (COMPACT) #IMPLIED>

\<!ELEMENT LI    - O %flow>

\<!--=================== Document Body =====================================-->

\<![ %HTML.Recommended [
	\<!ENTITY % body.content "(%heading|%block|HR|ADDRESS)*">
	-- \<h1>Heading\</h1>
	   \<p>Text ...
		is preferred to
	   \<h1>Heading\</h1>
	   Text ...
	-->
\]]>

\<!ENTITY % body.content "(%heading | %text | %block | HR | ADDRESS)*">

\<!ELEMENT BODY O O  %body.content>

\<!ELEMENT BLOCKQUOTE - - %body.content>

\<![ %HTML.Recommended [
	\<!ENTITY % address.content "(%text)*">
\]]>
\<!ENTITY % address.content "(%text|P)*">
\<!ELEMENT ADDRESS - - %address.content>


\<!--================ Forms ===============================================-->

\<![ %HTML.Forms [

\<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
\<!ATTLIST FORM
	ACTION %URI #REQUIRED
	METHOD (%HTTP-Method) GET
	ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
	>

\<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
			RADIO | SUBMIT | RESET |
			IMAGE | HIDDEN )">
\<!ELEMENT INPUT - O EMPTY>
\<!ATTLIST INPUT
	TYPE %InputType TEXT
	NAME CDATA #IMPLIED -- required for all but submit and reset --
	VALUE CDATA #IMPLIED
	SRC %URI #IMPLIED -- for image inputs -- 
	CHECKED (CHECKED) #IMPLIED
	SIZE CDATA #IMPLIED -- like NUMBERS,
				 but delimited with comma, not space --
	MAXLENGTH NUMBER #IMPLIED
	ALIGN (top|middle|bottom) #IMPLIED
	>

\<!ELEMENT SELECT - - (OPTION+)>
\<!ATTLIST SELECT
	NAME CDATA #REQUIRED
	SIZE NUMBER #IMPLIED
	MULTIPLE (MULTIPLE) #IMPLIED
	>

\<!ELEMENT OPTION - O (#PCDATA)>
\<!ATTLIST OPTION
	SELECTED (SELECTED) #IMPLIED
	VALUE CDATA #IMPLIED
	>

\<!ELEMENT TEXTAREA - - (#PCDATA)>
\<!ATTLIST TEXTAREA
	NAME CDATA #REQUIRED
	ROWS NUMBER #REQUIRED
	COLS NUMBER #REQUIRED
	>

\]]>


\<!--================ Document Head ========================================-->

\<!ENTITY % head.link "& LINK*">

\<![ %HTML.Recommended [
	\<!ENTITY % head.nextid "">
\]]>
\<!ENTITY % head.nextid "& NEXTID?">

\<!ENTITY % head.content "TITLE & ISINDEX? & BASE? & META*
			 %head.nextid
			 %head.link">

\<!ELEMENT HEAD O O  (%head.content)>

\<!ELEMENT LINK - O EMPTY>
\<!ATTLIST LINK
	HREF %URI #REQUIRED
        %linkExtraAttributes; >

\<!ELEMENT ISINDEX - O EMPTY>

\<!ELEMENT BASE - O EMPTY>
\<!ATTLIST BASE
        HREF %URI; #REQUIRED
        >

\<!ELEMENT NEXTID - O EMPTY>
\<!ATTLIST NEXTID N %linkName #REQUIRED>

\<!ELEMENT META - O EMPTY    -- Generic Metainformation -->
\<!ATTLIST META
        HTTP-EQUIV  NAME    #IMPLIED  -- HTTP response header name  --
        NAME        NAME    #IMPLIED  -- metainformation name       --
        CONTENT     CDATA   #REQUIRED -- associated information     --
        >


\<!--================ Document Structure ===================================-->

\<![ %HTML.Deprecated [
	\<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
\]]>
\<!ENTITY % html.content "HEAD, BODY">

\<!ELEMENT HTML O O  (%html.content)>
\<!ENTITY % version.attr "VERSION CDATA #FIXED \&#34;%HTML.Version;\&#34;">

\<!ATTLIST HTML
	%version.attr;-- report DTD version to application --
	>

]>
</message>
<message id="<1994Sep26.190903.18702@falch.no>" date="2989595343">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 19:09:03 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep26.190903.18702@falch.no>
References: <35uq74$83l@korfu.igd.fhg.de> \<HANCHE.94Sep25211210@pyanfar.imf.unit.no> <366t8g$dnr@ams.amsinc.com>
Subject: Re: ANNOUNCEMENT: "The Whirlwind Giude to SGML Tools"

[Mark Murphy]

|   I can get neither URL to work with WinMosaic.  Is there a non-WWW
|   version (e.g., PostScript) available for FTP?  I couldn't find one on
|   this site.

ftp://ifi.uio.no/pub/SGML/SGML-Tools/
ftp://falch.no/pub/SGML-Tools/

People having trouble reaching Hans Holger's URL can try

http://www.falch.no/~pepper/SGML-Tools.html

Steve
</message>
<message id="<182@c-art-w.wimsey.bc.ca>" date="2989595708">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 19:15:08 UT
From: John Eadie \<jme@c-art.com>
Organization: Computing Art Inc
Message-ID: <182@c-art-w.wimsey.bc.ca>
References: <1994Sep21.011746.14931@rat.csc.calpoly.edu>
Subject: Re: SGML Viewers and Formatters

[Mr. Raytrace]

|   I am looking for an SGML viewer that supports hypertext links and that
|   takes care of formatting the documents.  We are hoping to publish SGML
|   documents on a CD-ROM, using a Windows based viewer.

.. several viewers described ..

You might also consider OLIAS, by HaL Computer Systems -- the browser is
designed to access multiple sources of SGML including the www using the
same interface.  The windows port is available soon.

To show off the OLIAS searching capabilities HaL handed out CD-ROMs at
Seybold the week before last, that contain an SGML Info Library with 100k
abstracts from www documents.  On the CD-ROM you get the infolib, a
browser, plus the Broker that accesses www through your local firewall.
Anybody that would like to try an OLIAS `Index to the World-Wide Web'
CD-ROM (for SPARC) can contact me ..

-jme

Ps: OLIAS Version1.1 features architectural forms dtd-to-dtd conversion, an
incorporated parser, a more complete web-browser, etc, etc.

--
John Eadie  _COMPUTING ART Inc_
  klee wyck Cottage, 120 Keith Road, West Vancouver BC  V7T 1L3
   # jme@c-art.com #  416.287.6811 -or- 604.922.5104  Fax 604.922.5194
 
`The monks who did not buy printing presses are now making wine.  In the
years ahead, some of us will make the same choice'  Steven Cherry, ELSEVIER
</message>
<message id="<Cwr5Cy.AJI@zeno.fit.edu>" date="2989597281">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 19:41:21 UT
From: Priya Asokarathinam \<rcs13472@cs.fit.edu>
Organization: Florida Institute of Technology
Message-ID: \<Cwr5Cy.AJI@zeno.fit.edu>
Subject: cweb to html

I am a student writing a thesis part of which will provide a mosaic
interface to a graphics tutorial.  I'm writing the tutorial in LaTeX and
using LaTeX2HTML to convert a .tex file to a .html file.  I want to include
CWEB program fragments in the mosaic application.  My questions are:

Is there a way to include the latex and html commands into my .web files so
that when I cweave it, it doesn't complain?

The cwebmac.tex file indicates that there is a separate set of macros for
LaTeX documents.  Does anyone know where to find it, or if it exists at
all?

If possible, please mail answers directly to me.

Thank you
Priya Asokarathinam
-- 
e-mail:  rcs13472@cs.fit.edu, rcs13472@zach.fit.edu, priya@ee.fit.edu

http://tuck.cs.fit.edu/~rcs13472
</message>
<message id="<CONNOLLY.94Sep26154911@ulua.hal.com>" date="2989601351">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 20:49:11 UT
From: Dan Connolly \<connolly@ulua.hal.com>
Organization: HaL Software Systems, Inc.
Message-ID: \<CONNOLLY.94Sep26154911@ulua.hal.com>
Subject: Multilingual HTML, SGML documents?

There is a lot of pressure to be able to represent documents in various
languages in HTML.

There are two problems to address:

	Problem 1: How do I use writing system X in HTML?

HTML is specified to include ISO8859-1, so the western European languages
are representable.

But there is no specified way to write a document in Russian, let along
Japanese.  This issue must be addressed _very_ soon.

Would someone care to post a complete SGML document -- declaration,
prologue, and instance -- that uses cyrillic(sp?) data characters?  (i.e.,
SDATA entities don't count.)  I assume the SGML declaration would be
different from the one used for HTML in a way that reflects the different
document character set, no?  Does sgmls support the resulting SGML
declaration?

How about a corresponding example using Japanese characters?  What does an
SGML declaration look like for a document with multi-byte characters?

	Problem 2: How do I use writing systems X, Y, and Z in
	the same HTML document?

It's not clear to me that HTML should be enhance to include this feature.
Certainly we need a document format in which multiple languages are
expressible.  But that feature will not be used widely enough to require
support in all WWW clients, as is the case for HTML.

In any case, someone posted an "acid test" in comp.os.linux.development, of
all places.  The test is: can I quote from the Quran (sp?) and from
Shakespear in the same paragraph?

Would someone care to post a complete SGML document (declaration, prologue,
and instance) giving an example of a solution to this problem?

Dan
-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<1994Sep26.221212.171@exoterica.com>" date="2989606332">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 22:12:12 UT
From: "Eric R. Skinner" \<ers@exoterica.com>
Organization: Exoterica Corporation
Message-ID: <1994Sep26.221212.171@exoterica.com>
References: <1994Sep21.203800.20126@sq.sq.com> <19940922T202939Z.erik@naggum.no>
Subject: Re: SGML and error recovery

[Liam R. E. Quin]

|   But providing good error recovery for OMITTAG is much harder.

[Erik Naggum]

|   good error recovery in SGML seems to be inordinatly difficult.  I have
|   not looked closely at this topic.  does anybody out there have anything
|   to offer that wouldn't compromise company secrecy policies?  research
|   reports?  I know Exoterica has done good work here, but is any of this
|   available in their technical report series?

Sorry for the late reply - I'm a little behind in my reading.

Yes, we do sufficient error recovery to allow single-pass validation with
even the most nasty of problems present in the DTD or instance.  I don't
believe we have any public documents on how our error recovery works, but
I'll check.

Regards,
-- 
Eric R. Skinner                          ers@exoterica.com
Exoterica Corporation                  Tel +1 613 722 1700
Ottawa, Canada                         Fax +1 613 722 5706
Product information:                    info@exoterica.com
</message>
<message id="<1994Sep26.222358.301@exoterica.com>" date="2989607038">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 22:23:58 UT
From: "Eric R. Skinner" \<ers@exoterica.com>
Organization: Exoterica Corporation
Message-ID: <1994Sep26.222358.301@exoterica.com>
References: \<Y095420.940920.S@ozemail.com.au>
Subject: Re: Attempting to aquire DTDs

Kate Steketee]

|   you can get an extensive test suite on CDROM from Software Exoterica.

We have two discs available.

1. The Compleat SGML

   This disc contains an online version of ISO8879 (running under Asymetrix
   Toolbook for Windows) with extensive hypertext.  The body of the
   standard is linked throughout to appropriate test files from the
   Exoterica ISO8879 SGML Conformance Test Suite.  You can view the test
   files, (SGML and RAST) or select a collection of test files that meet
   certain criteria, but you cannot extract the test files for your own
   use.

   Windows only.  Cost is roughly US$95 depending on where you live.

2. The ISO8870 SGML Conformance Test Suite

   This disc contains the test files (SGML and RAST) and a utility in C
   source-code form for selecting subsets of the test files.  Recommended
   for parser writers and SGML product developers.
   
   ISO9660 format, any platform. Cost is in the $2,000 neighborhood.


If you'd like more information, call us or write to info@exoterica.com.  Or
in Europe, info@europe.exoterica.com.

Regards,
-- 
Eric R. Skinner                          ers@exoterica.com
Exoterica Corporation                  Tel +1 613 722 1700
Ottawa, Canada                         Fax +1 613 722 5706
Product information:                    info@exoterica.com
</message>
<message id="<1994Sep26.233841.22334@midway.uchicago.edu>" date="2989611521">
Newsgroups: comp.text.sgml
Date: 26 Sep 1994 23:38:41 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep26.233841.22334@midway.uchicago.edu>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com>
Subject: Re: Multilingual HTML, SGML documents?

[Dan Connolly]

|   But there is no specified way to write a document in Russian, let along
|   Japanese.  This issue must be addressed _very_ soon.

There are Mosaic clients specially made (read: hacked) to autodetect which
of the ISO 8859 charsets a doc is using (one of which includes Cyrillic,
another Greek, another Hebrew, another Arabic, etc.).  This is not a good
solution, but the very fact that they exist testifies to some demand.

|   	Problem 2: How do I use writing systems X, Y, and Z in
|   	the same HTML document?
|   
|   It's not clear to me that HTML should be enhance to include this
|   feature.  Certainly we need a document format in which multiple
|   languages are expressible.  But that feature will not be used widely
|   enough to require support in all WWW clients, as is the case for HTML.

Every country with its own distinct script would, probably, like to post
bilingual HTML menus.  Most can't now.  For an example of one - one you
won't be able to read if your default fonts are 8859-1, check out
http://www.huji.ac.il/WWW_DIR/default.html.

If you would like to benefit from the wisdom of someone who's spend what
seems like a long time (in net terms), check the work being done by Michael
Sperberg-McQueen of the Text Encoding Initiative (try
ftp://sgml1.ex.ac.uk/tei/p3/doc/).  Here's a section from the file
ftp://sgml1.ex.ac.uk/tei/p3/doc/p3front.doc:

--------------------- 
 
   The impetus for the project came from the humanities computing
community, which sought a common encoding scheme for complex textual
structures in order to reduce the diversity of existing encoding practices,
simplify processing by machine, and encourage the sharing of electronic
texts.  It soon became apparent that a sufficiently flexible scheme could
provide solutions for text encoding problems generally.  The scope of the
TEI was therefore broadened to meet the varied encoding requirements of any
discipline or application.  Thus, the TEI became the only systematized
attempt to develop a fully general text encoding model and set of encoding
conventions based upon it, suitable for processing and analysis of any type
of text, in any language, and intended to serve the increasing range of
existing (and potential) applications and use.
 
   What is published here is a major milestone in this effort. It provides
a single, coherent framework for all kinds of text encoding which is
hardware-, software- and application-independent. Within this framework, it
specifies encoding conventions for a number of key text types and
features. The ongoing work of the TEI is to extend the scheme presented
here to cover additional text types and features, as well as to continue to
refine its encoding recommendations on the basis of extensive experience
with their actual application and use.

---------------------

The TEI archive has all kinds of DTDs in it.

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<CONNOLLY.94Sep26214912@austin2.hal.com>" date="2989622952">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 02:49:12 UT
From: Dan Connolly \<connolly@hal.com>
Organization: HaL Computer Systems, Inc.
Message-ID: \<CONNOLLY.94Sep26214912@austin2.hal.com>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com> <1994Sep26.233841.22334@midway.uchicago.edu>
Subject: Re: Multilingual HTML, SGML documents?

[Richard L. Goerwitz]

|   connolly@ulua.hal.com (Dan Connolly) writes:
|   >
|   >But there is no specified way to write a document in Russian, let
|   >along Japanese. This issue must be addressed _very_ soon.
|
|   There are Mosaic clients specially made (read: hacked) to autodetect
|   which of the ISO 8859 charsets a doc is using (one of which includes
|   Cyrillic, another Greek, another Hebrew, another Arabic, etc.).  This
|   is not a good solution, but the very fact that they exist testifies to
|   some demand.

Yes I am familiar with this.  For those who are not, see:

	"Japanese Encoding Methods"
	http://www.ntt.jp/japan/note-on-JP/encoding.html

As you say, their solution is something of a "hack."  In particular, It's
not clear to me that the HTML documents using shift-JIS encoding are
conforming SGML documents.  If anyone could present an argument that shows
that they are or that they are not, I would be greatful.

[Dan Connolly]

|	Problem 2: How do I use writing systems X, Y, and Z in
|	the same HTML document?
|   
|   It's not clear to me that HTML should be enhance to include this
|   feature.  Certainly we need a document format in which multiple
|   languages are expressible.  But that feature will not be used widely
|   enough to require support in all WWW clients, as is the case for HTML.

[Richard L. Goerwitz]

|   Every country with its own distinct script would, probably, like to
|   post bilingual HTML menus.  Most can't now.  For an example of one -
|   one you won't be able to read if your default fonts are 8859-1, check
|   out http://www.huji.ac.il/WWW_DIR/default.html.
|
|   If you would like to benefit from the wisdom of someone who's spend
|   what seems like a long time (in net terms), check the work being done
|   by Michael Sperberg-McQueen of the Text Encoding Initiative (try
|   ftp://sgml1.ex.ac.uk/tei/p3/doc/).  Here's a section from the file
|   ftp://sgml1.ex.ac.uk/tei/p3/doc/p3front.doc:

I am also familiar with TEI.  But from what I read about writing systems, a
software system to deal with them would be quite complex, and it's not
clear that it would have sufficient performance for interative use.

Again, if someone would provide specific examples and experiences, along
with an explanation of how to apply SGML character set mechanisms, I'd be
greatful.

Dan
-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<1994Sep27.142307.2026@midway.uchicago.edu>" date="2989664587">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 14:23:07 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep27.142307.2026@midway.uchicago.edu>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com> <1994Sep26.233841.22334@midway.uchicago.edu> \<CONNOLLY.94Sep26214912@austin2.hal.com>
Subject: Re: Multilingual HTML, SGML documents?

[Dan Connolly]

|   I am also familiar with TEI.  But from what I read about writing
|   systems, a software system to deal with them would be quite complex,
|   and it's not clear that it would have sufficient performance for
|   interative use.

Wish I could be of more help.  Just a quick note, though: Word processors
of this sort have been available for micros for some time now.  This
includes bilingual ones -- Arabic-English, Chinese-English, Hebrew-English,
and so on, as well as multilingual ones, like Nisus for the Mac and MLS for
the Peecee.  Nisus, I'm told, is great.  I use MLS under OS/2 all the time,
and it's fine -- even when dealing with documents carrying Greek, Hebrew,
Arabic, IPA, English, and other fonts/wordwraps/languages all at once.  It
is impossible to write anything technical in my field without such
capabilities.  This is true of many philological, historical, and
linguistic disciplines.

As I've begun to discover, though, you are pretty much familiar with all
these things.  So perhaps my comments are superfluous....

Regarding DTDs, TEI proposals on how to encode multiple languages were
floated as far back as '90, but they didn't look terribly viable to me
then.  Has the situation improved?

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<9409271528.AA20622@source.asset.com>" date="2989668509">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 15:28:29 UT
From: "Claude L. Bullard" \<bullardc@source.asset.com>
Message-ID: <9409271528.AA20622@source.asset.com>
References: <9409261828.AA22050@source.asset.com>
Subject: Re: Metafile for Interactive Documents (MID)

After I posted the notification of the completion of the MID design effort
on the CTS, the MID sponsor requested that I also post a brief overview of
the MID and aspects of its design that changed after the Vancouver meeting.
This information is posted here for parties who are involved in or are
anticipating projects involving the U.S. DoD specifications for Interactive
Electronic Technical Manuals (IETMs), of which the two of most concern at
this time are:

o  MIL-M-87268 - (colloquially, the user interface)

o  MIL-D-87269 - (colloquially, the technical information data base)

The Document Type Definition of the Metafile for Interactive Documents
(MID) bridges the two MIL specifications cited above in that it specifies
the deliverable run time format for IETMs.  Features of this DTD include:

o  Element Types that specify user interface objects (e.g., titlebars,
   panes, message area, menus, popups, etc.)

o  Element Types that use fundamental HyTime location and linking forms as
   models and for which the processing semantic is defined (e.g., goto,
   gosub, etc.)

o  Element types for display primitives (text, graphics, tables, etc) which
   can be *hardened* (included inline) or returned by HyQ query (soft
   links).  Note, the semantic of the term "primitive" is derived from
   MIL-D-87269 concepts and should not be related to other computer science
   semantics for this term.

o  Element Types that define scripts for display of user interface types
   and content primitives, dynamic hyperlinking, user interaction, queries,
   declarations of local and global variables and user-defined functions,
   etc.  Scoping rules are also defined.

The structures of the MID DTD have been organized to enable the various
substructures to evolve independently.  That is, if a different interface
should be required, it will be possible to modify the type definition
without serious modifications to the other structures.  This is, of course,
true of most DTDs wherein modularity is a high priority.  Given the rapid
advance of features in current windowing environments, it is a prerequisite
for the MID.

The TAG article includes references to the MID Hypermedia Scripting
Language (MID-HSL) which enables the user to define a complete
object-oriented environment for hypermedia.  While some of the MID-HSL
scripting features are retained in the final MID DTD, it was decided to
remove the object-oriented class definition features from the DTD in
favor of more specific element types that rigorously define the required
objects as SGML element types.  This was done to simplify the application
of the DTD by the target audience and to remove the requirement for the use
of public texts.  This also reduces the expense of implementing the MID and
aligns it with current industry practices.

The MID DTD is a valid design for the application of HyTime/SGML.  HyTime
location and addressing element types have been retained to enable a
MID-capable presentation system to use information encoded in multiple SGML
DTDs (MIL-D-87269, HTML and others) and/or non-SGML encodings (e.g., CGM).
It is possible to embed queries in the MID instance that access external
data bases, or to *harden* the MID instance and include the same
information directly in the display primitives.  This allows data to be
transported without the deletion of structural or non-display oriented
meta-information as can be incurred by "lossy translation".  A user can,
alternatively, optimize the instance for run-time execution thus allowing
the contracting parties to determine the preferred method by the mission of
the instance.

Translation, interpretation and compilation are all valid process options
for MID instances.

Documenting and testing the MID design has begun.  It is anticipated that a
prototype implementation of the MID will be demonstrated at CALS '94.

Len Bullard
MID Team Chairman and Document Editor
Unisys Corporation
</message>
<message id="<1994Sep27.154444.16794@ast.saic.com>" date="2989669484">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 15:44:44 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep27.154444.16794@ast.saic.com>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com>
Subject: Re: Multilingual HTML, SGML documents?

[Dan Connolly]

|   There is a lot of pressure to be able to represent documents in various
|   languages in HTML.
|   
|   There are two problems to address:
|   
|   	Problem 1: How do I use writing system X in HTML?
|   
|   HTML is specified to include ISO8859-1, so the western European
|   languages are representable.
|   
|   But there is no specified way to write a document in Russian, let along
|   Japanese.  This issue must be addressed _very_ soon.
|   
|   Would someone care to post a complete SGML document -- declaration,
|   prologue, and instance -- that uses cyrillic(sp?) data characters?
|   (i.e., SDATA entities don't count.)  I assume the SGML declaration
|   would be different from the one used for HTML in a way that reflects
|   the different document character set, no?  Does sgmls support the
|   resulting SGML declaration?

This is nonsense!  Why doesn't SDATA count?  ISO-cyr1 and ISO-cry2 are
defined in terms of SDATA and so is ISO-lat1 for that matter.  Just what is
it that you are asking for?  I could certainly post a complete Russian SGML
document instance using these ISO Cyrillic font entities, but somehow I
sense that's not what you want.  Correct?

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<19940927T183458Z.erik@naggum.no>" date="2989679698">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 18:34:58 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940927T183458Z.erik@naggum.no>
Subject: EMPTY proposal

I have thought a bit about the various arguments against EMPTY and the
forbidden end tag, and I think I may have found one or two ways out.

(1) make an end-tag legal.  also make it so that if the start-tag is not
    immediately followed by an end-tag for the same element, its end-tag is
    inferred as with other tag omission.

(2) allow an empty content model `()', which would not allow any content,
    but would otherwise look like an ordinary element.  this would also be
    closed automatically through omitted tag minimization, but with the
    difference that now the end-tag would be required in minimal documents.

neither of these address the CONREF attribute default value, but that is
perhaps something that will never be used in settings where EMPTY is a
problem?  if so, we may have a solution that would treat the omitted end
tag for empty elements (which ESIS says should be supplied by the parser!)
as minimization.

now, to continue to let conforming documents remain conforming, we would
need to make certain that even in minimal SGML documents, it is legal to
omit the end-tag, but not recommended, and validating parsers should warn
(or perhaps given an error) if new documents miss the end-tag, while
non-validating parsers should survive omitted end-tags.  old parsers will
not be able to handle new documents, but this is less of a requirement.

would this be a reasonable compromise?

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<1994Sep27.212411.22933@ast.saic.com>" date="2989689851">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 21:24:11 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep27.212411.22933@ast.saic.com>
Subject: Jobs in Southern Ca.

As of Oct. 7, 1994, I shall become jobless.  If anyone knows of any SGML
DTD/FOSI work in the San Diego area, I would appreciate any leads.

Thanks
-- Bob Agnew

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<CONNOLLY.94Sep27175631@ulua.hal.com>" date="2989695391">
Newsgroups: comp.text.sgml
Date: 27 Sep 1994 22:56:31 UT
From: Dan Connolly \<connolly@ulua.hal.com>
Organization: HaL Software Systems, Inc.
Message-ID: \<CONNOLLY.94Sep27175631@ulua.hal.com>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com> <1994Sep27.154444.16794@ast.saic.com>
Subject: Re: Multilingual HTML, SGML documents?

[Dan Connolly]

|   Would someone care to post a complete SGML document -- declaration,
|   prologue, and instance -- that uses cyrillic(sp?) data characters?
|   (i.e., SDATA entities don't count.)  I assume the SGML declaration
|   would be different from the one used for HTML in a way that reflects
|   the different document character set, no?  Does sgmls support the
|   resulting SGML declaration?

[Bob Agnew]

|   This is nonsense! Why doesn't SDATA count?

Because SDATA, as far as I understand, means "translate as appropriate to
your local system."  So you don't gain any expressive power with SDATA
entities: they reduce to normal text entities, or CDATA entities, or
external data entities -- but on a per-system basis.

Or is this not the case?  Is there any difference, visible to a
structure-controlled application, between the following two documents?

(1)	\<!doctype test [
	\<!element test - - ANY>
	\<!entity abc SDATA "abc">
	]>
	\<test>\&abc;

(2)	\<!doctype test [
	\<!element test - - ANY>
	\<!entity abc CDATA "abc">
	]>
	\<test>\&abc;

If there is a difference, then I suppose it's possible to represent
characters outside the document character set as sequence of data
characters in an SDATA entity.  There are certain performance issues to
deal with, but I guess it's workable.

|   ISO-cyr1 and ISO-cry2 are defined in terms of SDATA and so is ISO-lat1
|   for that matter.

The way I understand it, only the names are specified.  SDATA is a way of
saying "we don't specify what the entities resolve to.  That's your
problem."

|   Just what is it that you are asking for?

I'm asking for an interface between a parser and an application that
supports multilanguage documents, I guess.  I'm not as interested in what
the markup actually looks like as what that markup represents to the
application.

|   I could certainll post a complete Russian SGML document instance using
|   these ISO Cyrillic font entities, but somehow I sense that's not what
|   you want.  Correct?

Well, it's what I asked for, I guess.  If you could post it and explain how
that document is consumed by some conforming application, I'd appreciate
it.  I'd prefer an application that's actually been developed and deployed,
but a hypothetical application is OK, as long as it plays by all the rules.

I'd actually prefer to see an example that didn't involve an entity
reference for every character.  I can't imagine folks would use a system
like that.

Dan
-- 
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project   +1 512 834 9962 x5010
\<connolly@hal.com>                   http://www.hal.com/%7Econnolly/index.html
</message>
<message id="<1994Sep28.003137.18225@sq.sq.com>" date="2989701097">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 00:31:37 UT
From: "Liam R. E. Quin" \<lee@sq.sq.com>
Organization: SoftQuad Inc., Toronto, Canada
Message-ID: <1994Sep28.003137.18225@sq.sq.com>
References: <19940921T002950Z.erik@naggum.no> <9409221625.AA17721@mercury> \<Cwoz3v.1K6@cogsci.ed.ac.uk>
Subject: Re: Parsing EMPTY elements.

Just a brief response to one of Steve's points... which could be read as
implying that SGML editors have to be slow.

[Steve Finch]

|   One very important concern, however, is what happens when element
|   structure is manipulated if a DTD with a non trivial content model is
|   in place.  To insert or delete an element in the presence of a DTD
|   requires the ability to know that the manipulation preserves
|   conformance, or at least the checking of conformance after the
|   manipulation has been done.

yes...

|   This is slow, difficult, and restricts the sorts of manipulations which
|   can be done more than is the case that if the ESIS isomorphic option is
|   used

Not necessarily.  If this was always true, our Author/Editor product would
have a hard time.  Our RulesBuilder product does a lot of the work for us,
building a `rules file' that lets Author/Editor know which elements are
valid at any given point, and a few other useful things.  The trick is not
to have to validate the whole document when only a part of it is changed.
The other side of this is being flexible enough to allow users to get work
done in an intuitive way.

I'm not going to give away the Author/Editor internals :-), but here is an
example that might be illustrative.  I will give a paper at the upcoming
WWW conference on writing an HTML editor that touches on some of these
issues, although I don't know if it will be very interesting to many people
in this group.


if you are creating a new document, you might have to insert several
required elements -- e.g., if a content model uses
    \<!ELement Person - -
	(FirstName & LastName & MiddleName & (ShoeSize|ReligiousOrder))
    >

the editor doesn't know which order you want the names in, and whether you
want ShoeSize or ReligiousOrder; it can't easily insert a template, and
even if it did, you must be able to move the elements around and to change
ShoeSize to ReligiousOrder.

Author/Editor lets you create the element without content, and then helps
you to add the elements inside it.

If you turn Rules Checking off, you can have both ShoeSize and
ReligiousOrder there until you delete the one you don't want (you could
also use Change Element Type, of course, to turn one into the other).  When
you do Validate, you'll be moaned at appropriately.  When Rules Checking is
on, you will be allowed to have `unfinished' elements, but not elements in
places where they will never be allowed.  In normal editing mode, though,
you can delete LastName and maybe paste it elsewhere.

[Peter, if you're listening, by all means correct me if I am not giving a
 clear example!]


What I'm saying is that designing an editor needs more than simply `does
this document conform to the DTD?', and you have to be able to do it
quickly.

But this can be done, as you can see -- Author/Editor is of comparable
speed to some of the commercial word processors we have around here, and
I'd say it's faster than some of them.  Even with Rules Checking on.

Let's not give the world a chance to think SGML won't solve all their
problems really quickly.  It won't solve all their problems, but it needn't
be slow.

:-)

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/
</message>
<message id="<kimber.85.0014A794@passage.com>" date="2989705148">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 01:39:08 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.85.0014A794@passage.com>
References: <19940927T183458Z.erik@naggum.no>
Subject: Re: EMPTY proposal

[Erik Naggum]

|   I have thought a bit about the various arguments against EMPTY and the
|   forbidden end tag, and I think I may have found one or two ways out.
|
|   (1) make an end-tag legal.  also make it so that if the start-tag is
|       not immediately followed by an end-tag for the same element, its
|       end-tag is inferred as with other tag omission.
|
|   (2) allow an empty content model `()', which would not allow any
|       content, but would otherwise look like an ordinary element.  this
|       would also be closed automatically through omitted tag minimiation,
|       but with the difference that now the end-tag would be required in
|       minimal documents.

I prefer solution 1.  While I understand the logic behind the forbidden end
tag, I think it is a case of over completeness that makes things more
difficult than they need to be.  As Erik points out, allowing the end tag
(but not requiring it) would make ad-hoc parsing more reliable for
normalized instances and SGML parsing would still not have a problem for
EMPTY elements since they're declared as EMPTY.

The problem I have with solution 2 is that it is difficult to determine if
"()" is the intent or just a typo, while the EMPTY keyword is completely
unambiguous (this is the same logic behind the rule that omissible start
tags cannot be omitted for elements with empty content).  Remember: SGML is
for humans.

It's also critical that parsers indicate in some way that a given element
instance is in fact empty, whether because its declared content is EMPTY or
because a CONREF attribute was specified.  Otherwise, processors must have
a priori knowledge of the element and attribute declarations in order to
know which elements are EMPTY and which attributes are CONREF.

-- 
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<Y141951.940928.M@ozemail.com.au>" date="2989714799">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 04:19:59 UT
From: Michael Harmer \<mharmer@ozemail.com.au>
Organization: OzEmail Pty Ltd
Message-ID: \<Y141951.940928.M@ozemail.com.au>
Subject: What is "Markup" in German???

Does anyone know the term-of-art for "Markup" in German.  I need it to help
a non-technical translator who is preparing some documents for us.

Also if anyone knows the terms-of-art for "business process automation" or
"business process engineering" in German I would be eternally grateful.

Thanks,
Michael
</message>
<message id="<p.kerr-2809941654060001@130.216.90.127>" date="2989716846">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 04:54:06 UT
From: Peter Kerr \<p.kerr@auckland.ac.nz>
Organization: School of Music University of Auckland
Message-ID: \<p.kerr-2809941654060001@130.216.90.127>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com> <1994Sep26.233841.22334@midway.uchicago.edu> \<CONNOLLY.94Sep26214912@austin2.hal.com>
Subject: Re: Multilingual HTML, SGML documents?

[Dan Connolly]

|   Problem 2: How do I use writing systems X, Y, and Z in the same HTML
|   document?
|   
|   It's not clear to me that HTML should be enhance to include this
|   feature.  Certainly we need a document format in which multiple
|   languages are expressible.  But that feature will not be used widely
|   enough to require support in all WWW clients, as is the case for HTML.

Just remember there's 3 billion people out there who don't speak English.
And many of them are multilingual in more than one of the ISO 8859 groups.
And if we don't get systems ready for them they're gonna pour off those
slip roads and cause one hell of a traffic jam on this InfoBahn.

-- 
Peter Kerr                             bodger
School of Music                        chandler
University of Auckland                 neo-Luddite
</message>
<message id="<1994Sep28.133142.19437@calspan.com>" date="2989747902">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 13:31:42 UT
From: Matthew Stringer \<stringer@calspan.com>
Organization: Calspan Advanced Technology Center
Message-ID: <1994Sep28.133142.19437@calspan.com>
Subject: Attempting to aquire documents

In continuing my evaluations, I need test document instances for the ISO
DTDs, most especially for the book DTD (preferably using tables) and for
the math DTD.  A point in the right direction would be appreciated.

--
*------------------------------------------------------------------------------
*- Matthew S. Stringer   Software Engineer   Calspan Advanced Technology Center
*- Business related email - 	stringer@calspan.com   voice-(716)632-7500x5119
*- Personal email - 		stringer@cs.buffalo.edu       fax-(716)631-6722
*- The opinions stated here are my own and do not reflect on my employer.
</message>
<message id="<26645.9409281333@exua.exeter.ac.uk>" date="2989747981">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 13:33:01 UT
From: Michael G. Popham \<MGPopham@exua.ex.ac.uk>
Message-ID: <26645.9409281333@exua.exeter.ac.uk>
Subject: Changes to anonymous ftp server at Exeter, UK

===============================================================
        IMPORTANT NOTICE TO ALL USERS OF sgml1.ex.ac.uk
===============================================================

The machine sgml1.ex.ac.uk will shortly be REMOVED.

The files in the ftp archive on sgml1.ex.ac.uk have been moved to a new
machine: info.ex.ac.uk [144.173.6.13]

The files are still accessible on sgml1.ex.ac.uk to anonymous ftp
users. They can be found under the directory pub/SGML/

Anonymous ftp users are STRONGLY RECOMMENDED to connect directly to
info.ex.ac.uk:pub/SGML/ in future, as sgml1.ex.ac.uk may be removed at
short notice.

Apologies for any inconvenience this may cause.


Michael Popham
-- 
SGML Project - C.D.O                    Email:M.G.Popham@exeter.ac.uk
IT Services - Laver Building            Phone:+44 392 263946
North Park Road, University of Exeter   Fax:  +44 392 211630
Exeter EX4 4QE, United Kingdom
</message>
<message id="<36bsc7$b3@doc.cs.nyu.edu>" date="2989749063">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 13:51:03 UT
From: Robert Ducharme \<m-rd0107@cs.nyu.edu>
Organization: Courant Institute of Mathematical Sciences
Message-ID: <36bsc7$b3@doc.cs.nyu.edu>
Subject: TEI FAQ?

Is there a FAQ for the Text Encoding Initiative?  Where can I find it?

-- 
                           ___
Thanks,                   /\\  \\
                         | :\\( |  Help stamp out emoticons.
Bob DuCharme              \\__\\/
bobducharme@acm.org     (redirected to m-rd0107@cs.nyu.edu)
CompuServe 72441,3003 (not checked much; best for binaries)
</message>
<message id="<28212.9409281411@exua.exeter.ac.uk>" date="2989750260">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 14:11:00 UT
From: Michael G. Popham \<MGPopham@exua.ex.ac.uk>
Message-ID: <28212.9409281411@exua.exeter.ac.uk>
Subject: SGML Users' Group releases new materials

The following materials have been made available to the SGML Users' Group
for distribution to the community:

* YASP -- the Yorktown SGML Parser (known colloquially as "yet another SGML
  parser").  The YASP distribution includes documentation (in the file
  yasp125p.doc), which states: "The SGML Standard (ISO 8879:1986) assumes
  that a conforming SGML parser processes a document sequentially, while
  providing information about the structure of the document to the
  application.  The basic set of information a parser must provide is known
  as the Element Structure Information Set(ESIS).  The YASP interface has
  been designed to accomplish this main role, furnishing to the application
  NOTIFICATIONS of SGML events conforming to the ESIS.  However the YASP
  interface goes beyond this requirement, and can provide more information
  than that needed to achieve SGML conformance."

* The harmonized CTI/Sema test suite, released to the SGML Users' Group by
  the Graphic Communications Association Research Institute (GCARI).

The files are available via anonymous ftp to info.ex.ac.uk [144.173.6.13]
in pub/SGML/YASP and pub/SGML/CTI-SEMA.

Both sets of materials are accompanied by warranties, and users are
strongly urged to consult these. All materials are supplied on an "as is"
basis, and the SGML Users' Group, GCARI, and Exeter University will NOT
provide any support OR accept any liability for them.

The files on the disks sent to me were packed using pkzip, and the pkunzip
utility has been included in each directory to help with unpacking.  (In
the case of the CTI/Sema test suite, remember to set the -d switch to
pkunzip if you wish to preserve the original directory structure).

Information about the SGML Users' Group can be obtained from:

		Gaynor West
		P.O.Box 361
		Swindon
		Wiltshire SN5 7BF
		United Kingdom

		Phone: +44 793 512 515
		Fax: +44 793 512 516
		Email: dpsl!gew@visionware.co.uk

Michael Popham
-- 
SGML Project - C.D.O                    Email:M.G.Popham@exeter.ac.uk
IT Services - Laver Building            Phone:+44 392 263946
North Park Road, University of Exeter   Fax:  +44 392 211630
Exeter EX4 4QE, United Kingdom
</message>
<message id="<1994Sep28.142556.13592@midway.uchicago.edu>" date="2989751156">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 14:25:56 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep28.142556.13592@midway.uchicago.edu>
References: <1994Sep26.233841.22334@midway.uchicago.edu> \<CONNOLLY.94Sep26214912@austin2.hal.com> \<p.kerr-2809941654060001@130.216.90.127>
Subject: Re: Multilingual HTML, SGML documents?

[Peter Kerr]

|   Just remember there's 3 billion people out there who don't speak
|   English.  And many of them are multilingual in more than one of the ISO
|   8859 groups.  And if we don't get systems ready for them they're gonna
|   pour off those slip roads and cause one hell of a traffic jam on this
|   InfoBahn.

Hit the nail right on the head.  Remember, it's very difficult for
Americans to understand this issue, not because we're stupid, but because
we live in a huge and largely monolingual culture.  Spanish is creeping in
as America "Latinizes" (an interesting process -- most of us don't realize
that rodeos, cowboy hats, spurrs, and so on are derivatives of the
Latin-American culture).  Most Americans, though, just aren't terribly
aware of the multilinguality that characterizes much of the world's
population.

Even Americans, though, can surely appreciate bilingual menus, multilingual
warning labels, quotations from foreign documents where the original
wording is critical, and so on.  Most Americans -- on the whole a religious
bunch -- know that their Bible is written in Hebrew, Greek, and Aramaic,
and some know that the Aramaic parts are interleaved with the Hebrew.
Others surely know about nations like Canada, which are officially
bilingual, where everything has to be done in French and English.  This
sort of problem gets worse in places like India, which uses English for
lots of technical and government business, but switches to local languages
often as well.  And, of course, there are Islamic contries where Arabic is
frequently used, but is not the vernacular.  We also have small countries
like the Netherlands, and countries now covering several tribal and
cultural areas -- where virtually everyone knows at least one other
language....

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<36bvg6$n9e@kaa.heidelbg.ibm.com>" date="2989752262">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 14:44:22 UT
From: Christoph Altenhofen \<caltenhofen@vnet.ibm.com>
Organization: IBM Germany, European Networking Center, Heidelberg
Message-ID: <36bvg6$n9e@kaa.heidelbg.ibm.com>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com> <1994Sep27.154444.16794@ast.saic.com> \<CONNOLLY.94Sep27175631@ulua.hal.com>
Subject: Re: Multilingual HTML, SGML documents?

[Bob Agnew]

|   Just what is it that you are asking for?

[Dan Connolly]

|   I'm asking for an interface between a parser and an application that
|   supports multi-language documents, I guess.  I'm not as interested in
|   what the markup actually looks like as what that markup represents to
|   the application.

Every application should be able to handle the characters of the language,
where the application is sited.  Here in Germany, there are several
characters, called "umlaut", that are beyond the American character set.
But they are included into the ISOLAT1 character set (which is included
into HTML for example).  Now, no author wants to insert the entity refs,
for example "\&auml;" for the character "ae", by hand or by a pull down
menu, because they are usual characters with average use.  He wants to
insert them as all other characters -- by pressing the according key on the
keyboard.  The conversion into the SGML representation, strongly
recommended to be able to exchange the document to another person, should
then to be done automatically.

Unfortunately, up to now, some products don't work this way.  Mostly
because the American developers of such products didn't see the necessity
of such an behavior.  (In English, there are no "unusual" characters that
have to be inserted in dozens of times during authoring a document.)

But one problem of the multi-language document stays valid -- if the
character map of the users system can only handle 255 characters, there
will be less possibility to display all characters of a document with mixed
language characters, say Cyrillic and Japanese, correctly.  Up to now, I'm
only aware of text processing systems, as WordPerfect, that have own
character mapping tables to handle the display of lots of characters.  So,
for example, loading an SGML document into WP Intellitag converts the
entity references of the characters into the WP character representation
that can be displayed afterwards in WP.

[Bob Agnew]

|   I could certainly post a complete Russian SGML document instance using
|   these ISO Cyrillic font entities, but somehow I sense that's not what
|   you want.  Correct?

[Dan Connolly]

|   Well, it's what I asked for, I guess.  If you could post it and explain
|   how that document is consumed by some conforming application, I'd
|   appreciate it.  I'd prefer an application that's actually been
|   developed and deployed, but a hypothetical application is OK, as long
|   as it plays by all the rules.
|   
|   I'd actually prefer to see an example that didn't involve an entity
|   reference for every character.  I can't imagine folks would use a
|   system like that.

As I mentioned above, if the characters belong to the character map you use
for your keyboard, no author should become aware of the conversion of the
characters he inserts into the SGML document by usual keypress, if he uses
an SGML editor.

For all other characters the insertion is awkward in every system.

Christoph
-- 
* Christoph Altenhofen      /    IBM  *  European Networking Center
*                           \\Vangerowstr. 18  *  D-69115 Heidelberg
*  Tel.: +6221 / 59 - 4503   \\_____________                 Germany
*  e-mail : CALTENHOFEN at VNET.IBM.COM    \\______________________/
*           christo@heidelbg.ibm.com
*  X-400  : C=DE;A=IBMX400;P=IBMMAIL;S=ALTENHOFEN;G=ALTENHC
</message>
<message id="<Pine.3.05.9409290046.C13027-c100000@chopin>" date="2989753006">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 14:56:46 UT
From: Rick Jelliffe \<ricko@ee.uts.edu.au>
Message-ID: \<Pine.3.05.9409290046.C13027-c100000@chopin>
Summary: An attempted subset of SGML, easy to understand and implement
Subject: miniSGML: grammar and rationale (1/2)

MiniSGML -- GOALS, METHOD, USES
-------------------------------

There have been recent discussion about the various infelicities of SGML.
I commented previously that I thought the major problem is that it is too
hard to understand (for people wanting to produce SGML code) and it is too
hard to implement (for people wanting to parse the code).

I have made up a simple language called miniSGML. Its goals are:

* subset of SGML;
* single page grammar;
* as full a subset as possible;
* simple to understand and implement.

The last goal was most important.  The basic idea is that, as everyone in
the field will attest, SGML does not represent everything needed: everyone
develops conventions outside of (& on top of) SGML.  For example, the
Interleaf IP we use completely ignores all newlines in data content: if you
want a space between any words you must put them in, even if on the words
already have a newline separator (a good idea, btw IMHO).

METHOD
------

The method I used was is:

* assume the data is valid, normalized, INCLUDEd and general entities fully
  expanded SGML;
* so not use PUBLIC identifiers;
* assume all SGML features are NO, all lengths and quantities allow big
  names, and all checking is off;
* use concrete reference syntax;
* ignore any bits of SGML I have never or rarely used;
* if SGML allows two ways of doing things, only allow one;
* make rules to deal with all other information kept in the DTD (see next
  paragraph);
* adopt conventions to try to overcome any other remaining problems.
	
The other information in the DTD is handled this way:

* no EMPTY elements are allowed;
* all attributes are strings;
* any SGML keywords that don't appear in the miniSGML definition can not be
  used.

USES
----

The miniSGML language is designed to be easy for programmers to implement.
Data produced in this format is valid SGML (i.e., you can readily make up a
DTD that parses the same file and presents the same ESIS-type information
to its application).  SGML data produced according to the same conventions
and caveats will be read in by a miniSGML application correctly.

This is a first effort.  Comments are welcome -- I am sure I've made a
mistake somewhere.

I think there are a lot of people who need a 2 page C program to import
marked-up data, but don't need a 50-page one: this is aimed at them, while
attempting to keep away from gratuitous dissimilarities with SGML.  It may
also be useful for people wanting to mark-up structure elements in order to
detect the structure (i.e., as a precursor to implementing SGML).
Implementing miniSGML in a standard macro-based editor like emacs would be
simple also.

RELATION TO SGML
----------------

There is a definite notional SGML declaration to this.  And anyone
producing data in miniSGML will have at least a notional DTD.  They are
well advised to actually make up a real DTD, and try to make their data
completely valid SGML too.  The application programmer must build these
into the application AS THEY HAVE TO IN SGML TOO!

But miniSGML does not read the DTD or SGML declaration: all the information
required to parse the document is in long-hand in the document.

I have built some very large SGML systems: one produces CD-ROMs, books, and
involves over 200, 000 files along the way.  Everything I do in that system
I could also do in miniSGML, and using the same tools (including Omnimark),
though admittedly not as gracefully sometimes.

If anyone is interested, I would think that something on these lines would
be an appropriate annexe to the SGML standard, or even a new standard.
Leave SGML as it is: a big language for big jobs.  But let's also have a
standard 'little language' for the other jobs.

---------------------------------------------------------------------------

MiniSGML  28/9/94
--------

This is a definition of a language, miniSGML, which attempts to be a useful
subset of SGML, yet small enough to implement easily.  No SGML declaration
or DTD are required by the parser.

The rules specify:

* input (miniSGML)
* output (data available to the application)
* the mapping between the two.

These rules can be inverted--then they specify:

* input (commands from the application)
* output (miniSGML)
* the mapping between the two.

Note also the conventions and caveats section below. These make 
better conformance with SGML.

PRODUCTIONS
-----------
Notes:
* LHS is non-terminal
* RHS after "::=" is the rule for the text form
* RHS after "=>" is the output tuple (MiniESIS I suppose) that should be
  available to the application. This does not specify an output format,
  merely the minimum data that can be available. The line number of the
  start of input should also be available to the application.
* Literals are encased in matching " or '
* any-character means any character, ws means white-space, nl means
  newline.


document 	::= doctype instance

doctype		::= "\<!DOCTYPE" ws+ gi.name ws+ "SYSTEM" ws+ string ws+ 
			"[" ( entity-dec | comment | ws | null-dec )* "]>" ws+
		=> (type="doctype" gi.name string)

string 		::= ( '"' any-character* '"' ) |
		    ( "'" any-character* "'" )

*.name		::= alpha (alpha | number | "." | "-")*

comment		::= nl? "\<!--" any-character* "-->"
		=> (type="comment" any-character*)

null-dec	::= nl? "\<!>"

entity-dec	::= "\<!ENTITY" ws+ entity.name ws+
			( "SYSTEM" ws+ string ws+ notation.name)|
			( "SDATA" ws+ string) ws+ ">"
		=> (type="system entity" entity.name string notation.name)
	or	=> (type="sdata entity" entity.name string)

pi		::= nl? "\<?" any-character* ">"
		=> (type="pi" any-character* )

instance	::= (ws | comment | null-dec)* element (ws | comment | null-dec)* 

element		::= open-tag contents close-tag

open-tag	::= "<" gi.name attribute* ws+ ">" nl?
		=> (type="open-tag" gi.name) 

attribute	::= ws+ attribute.name "=" string
		=> (type="attribute" string)

close-tag	::= nl? "</" gi.name ws+ ">"
		=> (type="close-tag" gi.name)

contents	::= (comment | pi | null-dec | entity-reference | 
			data | element | char-entity )*

data		::= any-character*
		=> (type="data" any-character*)		

char-entity	::= "&#" digit+ ";"?
		=> (type="char-entity" digit+)

entity-reference ::= "&" entity.name ";"?
		=> (type="entity-reference" entity.name)

CONVENTIONS & CAVEATS
---------------------

MiniSGML accepts any data-characters inside content that are not accepted
by any other rule as data, and makes it available to the application.  This
is dissimilar to SGML, which looks at the DTD to decide.

Applications output-ing data should use the following convention,
to keep close to SGML:

* in the instance, only generate newlines when desired, remembering that
  the rules of the grammar will strip some;
* to make pretty printing better, generate a newline then a space inside
  tags wherever whitespace is possible, e.g.,

    \<para 
     type="x" 
     secur="none"
    >

  This also may allow easier editing or review in a text editor, and
  prevent some line-length problems.
* If the start tag belongs to an inclusion element (in the mental DTD),
  then you should be aware that SGML parsers may strip out an immediately
  preceding newline.
* in the instance data, any opening delimiters (i.e. "<" or "&" or "</"
  followed by an alpha, or "<" followed by "?" or "/" or "!")  should have
  a null declaration interposed (e.g. "B\&J" becomes "B&\<!>J") to prevent
  incorrect markup recognition.

Applications input-ing data should remember:

* If the start tag belongs to an inclusion element (in the mental DTD) then
  you should be aware that SGML parsers may strip out any immediately
  preceding newline.

---------------------------------------------------------------------------

-ricko

-- 
Rick Jelliffe

Electrical Engineering
University of Technology, Sydney
ricko@ee.uts.edu.au

Allette System
Sydney, Australia
ricko@allette.com.au (soon)
</message>
<message id="<19940928T170622Z.erik@naggum.no>" date="2989760782">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 17:06:22 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940928T170622Z.erik@naggum.no>
References: <19940927T183458Z.erik@naggum.no> \<kimber.85.0014A794@passage.com>
Subject: Re: EMPTY proposal

[W. Eliot Kimber]

|   The problem I have with solution 2 is that it is difficult to determine
|   if "()" is the intent or just a typo, while the EMPTY keyword is
|   completely unambiguous (this is the same logic behind the rule that
|   omissible start tags cannot be omitted for elements with empty
|   content).  Remember: SGML is for humans.

I don't agree with your argument here.  it shouldn't be harder to determine
whether () is a typo than whether "" is a typo, and there is no special
construct for attributes that you want to be empty string.  I rather like
(2) myself, since it allows us to use a form that uses all the existing
facilities for element content and content models, and would not break old
documents either way.

|   It's also critical that parsers indicate in some way that a given
|   element instance is in fact empty, whether because its declared content
|   is EMPTY or because a CONREF attribute was specified.  Otherwise,
|   processors must have a priori knowledge of the element and attribute
|   declarations in order to know which elements are EMPTY and which
|   attributes are CONREF.

yes!  I agree with this completely.  however, ESIS does not provide this
information (it specifies to provide an end-tag for empty elements), and
many parser builders seem to think that ESIS is useful for the parser/
application interface.  ESIS was intended for conformance testing, and this
seems to be a good idea for conformance testing, which is concerned with
comparing strings as in regression testing.  (now, whether conformance
testing of SGML is a good idea or not is a separate issue.)

what I think we need is a DTD "information set" that we can query or be
served like we can query or be served the instance.  there is information
there that is valuable.  your DTD for DTD's is a cute hack, but does not
quite replace a parser's knowledge of the DTD.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<19940928T170927Z.erik@naggum.no>" date="2989760967">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 17:09:27 UT
From: Erik Naggum \<erik@naggum.no>
Organization: Naggum Software; +47 2295 0313
Message-ID: <19940928T170927Z.erik@naggum.no>
References: <26645.9409281333@exua.exeter.ac.uk>
Subject: Re: Changes to anonymous ftp server at Exeter, UK

[Michael G. Popham]

|   Anonymous ftp users are STRONGLY RECOMMENDED to connect directly to
|   info.ex.ac.uk:pub/SGML/ in future, as sgml1.ex.ac.uk may be removed at
|   short notice.

I have never had any problem talking to sgml1.ex.ac.uk, but now that I
tried to transfer YASP and CTI-SEMA from info.ex.ac.uk, it spuriously died
on me more often than it transferred files.

you will find YASP and CTI-SEMA at ftp.ifi.uio.no (same pathnames) and I
will make an effort to mirror the Exeter SGML archive in the near future.

#\<Erik>
--
Microsoft is not the answer.  Microsoft is the question.  NO is the answer.
</message>
<message id="<9408287807.AA780775286@clink.acad.com>" date="2989766952">
Newsgroups: comp.text.sgml
Path: naggum.no!comp-text-sgml
Approved: erik@naggum.no
Date: 28 Sep 1994 18:49:12 UT
From: Katherine Pagaard \<kpagaard@acad.com>
Message-ID: <9408287807.AA780775286@clink.acad.com>
References: <1994Sep27.212411.22933@ast.saic.com>
Subject: Jobs in Southern Ca.

[Bob Agnew]

|   As of Oct. 7, 1994, I shall become jobless.  If anyone knows of any
|   SGML DTD/FOSI work in the San Diego area, I would appreciate any leads.

The San Diego office of Academic Press may be recruiting for such a
position in November '94.  The salary is certainly not astronomical, but
you are welcome to apply.

Regards,

Katherine M. Pagaard
Academic Press
San Diego, California
kpagaard@acad.com
</message>
<message id="<1994Sep28.185528.8674@sqwest.wimsey.bc.ca>" date="2989767328">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 18:55:28 UT
From: Marcy Thompson \<marcy@sqwest.wimsey.bc.ca>
Organization: SoftQuad Inc., Surrey, B.C. CANADA
Message-ID: <1994Sep28.185528.8674@sqwest.wimsey.bc.ca>
References: <19940927T183458Z.erik@naggum.no> \<kimber.85.0014A794@passage.com>
Subject: Re: EMPTY proposal

[Erik Naggum]

|   (2) allow an empty content model `()', which would not allow any
|       content, but would otherwise look like an ordinary element.  this
|       would also be closed automatically through omitted tag minimiation,
|       but with the difference that now the end-tag would be required in
|       minimal documents.

[W. Eliot Kimber]

|   The problem I have with solution 2 is that it is difficult to determine
|   if "()" is the intent or just a typo, while the EMPTY keyword is
|   completely unambiguous (this is the same logic behind the rule that
|   omissible start tags cannot be omitted for elements with empty
|   content).  Remember: SGML is for humans.

Fascinating argument.  There are things in SGML now that do not conform to
this logic.  For example, if I want to put "" for an attribute.  How can I
tell this is not a typo?

Additionally, people are *always* doing stuff in their DTDs that I wish
were typos.  Consider promiscuous use of inclusion exceptions, or naming
elements things like PA12, PA21 and PA22 where the first is a kind of
paragraph, the second is a kind of footnote and the third is a kind of
title.  (Real example, letters changed to protect the guilty!)

I like this solution myself.  Getting rid of it to protect ourselves
against a possible typo seems too draconian a response to a small problem.

Marcy
-- 
Marcy Thompson		Manager, Education and Training	
  SoftQuad Inc.	  +1 604 585 0079
    marcy@sqwest.wimsey.bc.ca 
</message>
<message id="<36dbio$sj0@news.delphi.com>" date="2989780879">
Newsgroups: comp.text.sgml
Date: 28 Sep 1994 22:41:19 UT
From: Jeffrey McArthur \<j_mcarthur@BIX.com>
Organization: ATLIS Publishing
Message-ID: <36dbio$sj0@news.delphi.com>
References: <19940921T002950Z.erik@naggum.no> <9409221625.AA17721@mercury> \<Cwoz3v.1K6@cogsci.ed.ac.uk> <1994Sep28.003137.18225@sq.sq.com>
Subject: Re: Parsing EMPTY elements.

[Liam R. E. Quin]

|   But this can be done, as you can see -- Author/Editor is of comparable
|   speed to some of the commercial word processors we have around here,
|   and I'd say it's faster than some of them.  Even with Rules Checking
|   on.

That has NOT been our experience.

-- 
    Jeffrey M\\kern-.05em\\raise.5ex\\hbox{\\b c}\\kern-.05emArthur
    a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
    phone: +1 301 210 6655
    fax:   +1 301 210 4999
    home:  +1 410 290 6935

The opinions express are mine.  They do not reflect the opinions
of my employer.  My access to the Internet is not paid for by my
employer.
</message>
<message id="<1994Sep29.131228.14050@midway.uchicago.edu>" date="2989833148">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 13:12:28 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep29.131228.14050@midway.uchicago.edu>
References: <1994Sep27.154444.16794@ast.saic.com> \<CONNOLLY.94Sep27175631@ulua.hal.com> <36bvg6$n9e@kaa.heidelbg.ibm.com>
Subject: Re: Multilingual HTML, SGML documents?

[Dan Connolly]

|   I'm asking for an interface between a parser and an application that
|   supports multilanguage documents, I guess.  I'm not as interested in
|   what the markup actually looks like as what that markup represents to
|   the application.

[Christoph Altenhofen]

|   Every application should be able to handle the characters of the
|   language, where the application is sited.

Dan, I'm sure, knows this.  You are talking about localization issues,
though, and he is talking about multilingual documents.  In other words,
I'm trying to say politely that your response has little to do with the
original question.

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<1994Sep29.190640.4605@ast.saic.com>" date="2989854400">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 19:06:40 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep29.190640.4605@ast.saic.com>
References: \<kimber.85.0014A794@passage.com>
Subject: Re: EMPTY proposal

[Erik Naggum]

|   I have thought a bit about the various arguments against EMPTY and the
|   forbidden end tag, and I think I may have found one or two ways out.

[W. Eliot Kimber]

|   It's also critical that parsers indicate in some way that a given
|   element instance is in fact empty, whether because its declared content
|   is EMPTY or because a CONREF attribute was specified.  Otherwise,
|   processors must have a priori knowledge of the element and attribute
|   declarations in order to know which elements are EMPTY and which
|   attributes are CONREF.

But of course EMPTY elements and CONREF attributes are specified in the
declaration subset which is part of every document instance so the a priori
knowledge is right there in the prolog.  I don't see the problem.  When a
start tag is parsed, look up its GI in the parser's DTD symbol table and
check to see if it's declared EMPTY.  Same thing with CONREF.  What's the
pronlem?
 
-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep29.191324.5978@ast.saic.com>" date="2989854804">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 19:13:24 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep29.191324.5978@ast.saic.com>
References: \<Pine.3.05.9409290046.C13027-c100000@chopin>
Subject: Re: miniSGML: grammar and rationale

[Rick Jelliffe]

|   MiniSGML -- GOALS, METHOD, USES
|   -------------------------------
|   
|   There have been recent discussion about the various infelicities of
|   SGML.  I commented previously that I thought the major problem is that
|   it is too hard to understand (for people wanting to produce SGML code)
|   and it is too hard to implement (for people wanting to parse the code).
|   
|   I have made up a simple language called miniSGML. Its goals are:
|   
|   * subset of SGML;
|   * single page grammar;
|   * as full a subset as possible;
|   * simple to understand and implement.

After reviewing your proposal, I suggest that you rename it to "MicroSGML"
or perhaps "NanoSGML".  In your language the Tags have no meaning or
allowable context.

-- 
"One man's syntax is another man's semantics."
</message>
<message id="<1994Sep29.192017.7424@ast.saic.com>" date="2989855217">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 19:20:17 UT
From: Bob Agnew \<agnew@actd.saic.com>
Organization: Science Applications International Corp.
Message-ID: <1994Sep29.192017.7424@ast.saic.com>
References: \<Pine.3.05.9409290046.C13027-c100000@chopin>
Subject: Re: miniSGML: grammar and rationale

[Rick Jelliffe]

|   MiniSGML -- GOALS, METHOD, USES
|   -------------------------------
:
|   Applications output-ing data should use the following convention,
|   to keep close to SGML:
|   
|   * in the instance, only generate newlines when desired, remembering that
|     the rules of the grammar will strip some;
|   * to make pretty printing better, generate a newline then a space inside
|     tags wherever whitespace is possible, e.g.,
|   
|       \<para 
|        type="x" 
|        secur="none"
|    

If you want the application to print a newline, then why not just declare a
\<newline> or \<nl> element in the DTD and tell the Formatting spec (FOSI or
DSSSL) to do newlines when it encounters this element.  This seems to work
very well for me.

-- 
"Electrical Engineer -- Someone who writes system programs in Fortran."
</message>
<message id="<rieger.18.001394A2@colin.muc.de>" date="2989856082">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 19:34:42 UT
From: Wolfgang Rieger \<rieger@colin.muc.de>
Organization: BSE Buero fuer Software-Entwicklung
Message-ID: \<rieger.18.001394A2@colin.muc.de>
References: \<Y141951.940928.M@ozemail.com.au>
Subject: Re: What is "Markup" in German???

[Michael Harmer]

|   Does anyone know the term-of-art for "Markup" in German.  I need it to
|   help a non-technical translator who is preparing some documents for us.

The terms I used in the book I'm currently writing on SGML is "Markierung",
"Markierungen", "Startmarkierung" for start tag and "Endemarkierung"for end
tag.  Also "markieren" for tagging.  As you see, the german terms are
longer, but you can do with fewer words.

The book is to appear at the end of this year (published by Springer).

Regards

Wolfgang

-- 
Wolfgang Rieger
c/o Buero fuer Software-Entwicklung
Frankfurter Ring 193a
80807 Munich
Germany

Tel.: +89 323 19 93	Fax: +89 323 19 93
</message>
<message id="<36f98s$7i3$1@mhadg.production.compuserve.com>" date="2989860572">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 20:49:32 UT
From: John Rogers <72634.2402@CompuServe.COM>
Message-ID: <36f98s$7i3$1@mhadg.production.compuserve.com>
Subject: Wanted: HTML spec/tutorial/etc

Hi!

I'm interested in HTML.  How would I get a copy of a spec or tutorial on
it?

Thanks in advance...
--JR (John Rogers)
72634.2402@CompuServe.com
</message>
<message id="<CwwtEE.Kx9@undergrad.math.uwaterloo.ca>" date="2989861717">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 21:08:37 UT
From: Warren Baird \<wjbaird@undergrad.math.uwaterloo.ca>
Organization: University of Waterloo
Message-ID: \<CwwtEE.Kx9@undergrad.math.uwaterloo.ca>
References: \<Pine.3.05.9409290046.C13027-c100000@chopin>
Subject: Re: miniSGML: grammar and rationale (1/2)

[Rick Jelliffe]

|   I have made up a simple language called miniSGML.  Its goals are:
|   
|   * subset of SGML;
|   * single page grammar;
|   * as full a subset as possible;
|   * simple to understand and implement.

I'm not sure what you mean by your second goal.  Do you mean that there is
the explicit concept of a "page" in miniSGML?

|   * no EMPTY elements are allowed;

Ouch.  This would make the thing unusable for me.  I'm still an SGML
neophyte, but I've found it very useful in a number of cases to create
empty elements and then use attributes to store the needed information.
I've created elements for things like hyperlink destinations, figures,
references, urls, and index entries like this.

Overall, I think it's a good idea to make something like this standard, but
it's going to need some work, I think...

Warren
</message>
<message id="<p.kerr-3009940953440001@130.216.90.127>" date="2989864424">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 21:53:44 UT
From: Peter Kerr \<p.kerr@auckland.ac.nz>
Organization: School of Music University of Auckland
Message-ID: \<p.kerr-3009940953440001@130.216.90.127>
References: \<CONNOLLY.94Sep26154911@ulua.hal.com> <1994Sep27.154444.16794@ast.saic.com> \<CONNOLLY.94Sep27175631@ulua.hal.com> <36bvg6$n9e@kaa.heidelbg.ibm.com>
Subject: Re: Multilingual HTML, SGML documents?

[Christoph Altenhofen]

|   As I mentioned above, if the characters belong to the character map you
|   use for your keyboard, no author should become aware of the conversion
|   of the characters he inserts into the SGML document by usual keypress,
|   if he uses an SGML editor.

What about the reader of the document?  He/she needs the same character
map.  We cannot assume that the reader of any document will automatically
be using reading software which is aware of the character map used by the
author.  It might be embedded in the document, but that's a little messy,
and assumes that the reading software/hardware can handle whatever
character map it is given.

ISO 10646 compliance could be a systematic way to ensure that
authoring/browsing systems work properly.

-- 
Peter Kerr                             bodger
School of Music                        chandler
University of Auckland                 neo-Luddite
</message>
<message id="<1994Sep29.232903.10546@falch.no>" date="2989870143">
Newsgroups: comp.text.sgml
Date: 29 Sep 1994 23:29:03 UT
From: Steve Pepper \<pepper@falch.no>
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1994Sep29.232903.10546@falch.no>
References: <36bvg6$n9e@kaa.heidelbg.ibm.com>
Subject: SGML editors and accented Latin characters

[Christoph Altenhofen]

|   Here in Germany, there are several characters, called "umlaut", that
|   are beyond the American character set.  But they are included into the
|   ISOLAT1 character set (which is included into HTML for example).  Now,
|   no author wants to insert the entity refs, for example "\&auml;" for the
|   character "ae", by hand or by a pull down menu, because they are usual
|   characters with average use.  He wants to insert them as all other
|   characters -- by pressing the according key on the keyboard.  The
|   conversion into the SGML representation, strongly recommended to be
|   able to exchange the document to another person, should then to be done
|   automatically.
|
|   Unfortunately, up to now, some products don't work this way.  Mostly
|   because the American developers of such products didn't see the
|   necessity of such an behavior.

Christoph has mentioned one of my pet peeves, so I'd like to jump in here
to support him.

Vendors of SGML editors, are you listening?  Do you want to sell your
products in Europe - to customers using languages other than English?  Then
get your act sorted out as far as accented and other "non-English" letters
are concerned.

We Europeans want to be able to type our "funny letters" -- say the
Norwegian/Danish \&oslash;, the French \&eacute; or the German \&szlig; --
directly from the keyboard and have them represented as character entity
references.  We don't want them put into the document as 8-bit character --
or at best as numeric character references -- because then we have to do
conversions between different character sets all the time.

Author/Editor, for example (which is otherwise a fine product), doesn't do
this right.  Upon exporting to SGML I can choose to have my \&oslash;
represented as 0xf8 or \&#248; (if I'm using the Windows or UNIX versions),
or 0xbf or \&#191; (Macintosh).  What I _want_ is the \&oslash; itself -- so
that the character appears correctly _whatever_ version of this (or other)
products I choose to use.

From what I know, Incontext and the Adept products do this the right way
(although I'm not sure how adaptable they are -- the functionality should
not be tied to the entities defined in the ISO public entity sets: one day
I may have a good reason to call my \&oslash; something else, like \&oe;).

We are hoping that psgml will soon be updated to accomplish the same thing.
Other vendors, please follow suit...

Regards,

Steve
-- 
</(pepper)steve>                                   pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 2216 3040                                fax +47 2216 2350
                      "Life begins at 0x28"
</message>
<message id="<1994Sep30.024603.29919@midway.uchicago.edu>" date="2989881963">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 02:46:03 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep30.024603.29919@midway.uchicago.edu>
References: \<CONNOLLY.94Sep27175631@ulua.hal.com> <36bvg6$n9e@kaa.heidelbg.ibm.com> \<p.kerr-3009940953440001@130.216.90.127>
Subject: Re: Multilingual HTML, SGML documents?

[Peter Kerr]

|   It might be embedded in the document, but that's a little messy, and
|   assumes that the reading software/hardware can handle whatever
|   character map it is given.
|   
|   ISO 10646 compliance could be a systematic way to ensure that
|   authoring/browsing systems work properly.

In theory, I think you're right, at least in a sense.

But I don't think ISO 10646 is the be all and end all of encoding
standards.  There will always be simple 8-bit systems, and there will
probably be many many 16-bit, 32-bit, and perhaps still other systems.

In fact, what we'll need - for maximum flexibility - is a language at-
tribute *and* an encoding attribute.  Both will need to be embedded in the
document.  Messier than a stiff one-encoding-standard system, but
considerably more practical and flexible over the long haul.

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
<message id="<kimber.86.00025942@passage.com>" date="2989898454">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 07:20:54 UT
From: "W. Eliot Kimber" \<kimber@passage.com>
Organization: Passage Systems, Inc.
Message-ID: \<kimber.86.00025942@passage.com>
References: <28212.9409281411@exua.exeter.ac.uk>
Subject: Re: SGML Users' Group releases new materials

[Michael G. Popham]

|   The following materials have been made available to the SGML Users'
|   Group for distribution to the community:
|
|   * YASP -- the Yorktown SGML Parser (known colloquially as "yet another
|     SGML parser").

I'm very pleased to see YASP finally get into public hands.  The YASP
parser is tremendously robust--it's the core of the IBM SGML Translator
product and was also used in the WriterStation/PM product from Datalogics
(I believe they use SGMLS in the newer WriterStation products, but I could
be wrong).  People who have been trying to use SGMLS and have found it
lacking should give YASP a close look.  It's also pretty fast.  When I was
at IBM it was frustrating that few folks outside IBM had access to this
technology.  I don't think Dr. Goldfarb or James Clark will dispute the
fact that YASP is much more robust code than ARCSGML and SGMLS are.  YASP
should prove much more amenable to many of the problems that SGMLS is not
well suited to, including multi-threaded and re-entrant applications.

--
\<Address HyTime=bibloc>
W. Eliot Kimber (kimber@passage.com) Systems Analyst and HyTime Consultant
Passage Systems, Inc., 9971 Quail Blvd., Suite 903, Austin TX 78758 +1 512 339 1400
465 Fairchild Dr., Suite 201, Mountain View, CA  94043, +1 415 390 0911
\</Address>
</message>
<message id="<Cwxzu5.39L@tc.fluke.COM>" date="2989916708">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 12:25:08 UT
From: Gary Benson \<inc@tc.fluke.COM>
Organization: Fluke Corporation, Everett, WA
Message-ID: \<Cwxzu5.39L@tc.fluke.COM>
References: <1994Sep27.154444.16794@ast.saic.com> \<CONNOLLY.94Sep27175631@ulua.hal.com> <36bvg6$n9e@kaa.heidelbg.ibm.com>
Subject: Re: Multilingual HTML, SGML documents?

[Christoph Altenhofen]

|   Now, no author wants to insert the entity refs, for example "\&auml;"
|   for the character "ae", by hand or by a pull down menue, because they
|   are usual characters with average use.  He wants to insert them as all
|   other characters -- by pressing the according key on the keyboard. The
|   conversion into the SGML representation, strongly recommended to be
|   able to exchange the document to another person, should then to be done
|   automatically.
|   
|   Unfortunately, up to now, some products don't work this way.  Mostly
|   because the american developers of such products didn't see the
|   necessity of such an behavior.  (In english, there are no "unusual"
|   characters that have to be inserted in dozens of times during authoring
|   a document.)

This is an excellent point, and too often overlooked, even in discussions
(like this one) where the whole point is to try to address multilingual
documentation.

There are many possible ways technology could fix this, but so far, the
only solutions proposed are those that come from minds trained in
America...  minds that are not in the habit of asking end users what kind
of solution would make sense in their own environment.

-- 
Gary Benson-_-_-_-_-_-_-_-_-_-inc@tc.fluke.com_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Inventions reached their limit long ago, and I see no hope for further
development.            -Julius Frontinus, 1st century AD
</message>
<message id="<9409301532.AA18908@mercury>" date="2989927976">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 15:32:56 UT
From: Mary Holstege \<holstege@mercury.kset.com>
Message-ID: <9409301532.AA18908@mercury>
Subject: SGMLS on PCs with Microsoft compiler

Has anyone successfully ported the SGMLS parser to PCs using the Microsoft
C++ compiler (or, if not that, the Microsoft Quick C compiler)?  If you
have I'd appreciate any hints or tips for what I need to do to the config
file and makefile to get it working.

Don't flame me; I don't pick the compilers around here.


                -- Mary
                   Holstege@kset.com
-- 
Mary Holstege, Sr. Member of Technical Staff
KnowledgeSet Corporation
555 Ellis Street                    Tel: +1 415 254 5452
Mountain View, CA 94043             FAX: +1 415 254 5451
</message>
<message id="<36hekvINNp3b@afshub.boulder.ibm.com>" date="2989931615">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 16:33:35 UT
From: "Wayne L. Wohler" \<wohler@vnet.ibm.com>
Organization: IBM Information Development Strategy and Tools
Message-ID: <36hekvINNp3b@afshub.boulder.ibm.com>
References: <19940927T183458Z.erik@naggum.no>
Subject: Re: EMPTY proposal

I've missed much of the preceding discussion so I may not be fully
informed.  Based on the discussion I've seen here, I prefer solution 1
since it requires a smaller change to the standard and meets the
requirement as I understand it.

Whichever solution is adopted, it must also address what may appear between
the start and end tags of an empty element.  It isn't as obvious (to me) as
you might think (with my 'shoot-from-the-hip' suggestion:

* inclusions                 No element or data allowed, so no
* pi's                       Yes, PI's are not data
* re, rs, space and other space function chars  Yes
* comments                   Yes
* ignored marked sections    Yes
* 'empty' entity references  not sure, need to refresh my memory on
                             where entity refs are valid
* ...

The philosophy guiding me in my suggestions was that data and elements
would not be valid in the context of an empty element and that an empty
element's content would be parsed as it if were element content, not mixed
content.

(Personal note: I've been missing in action for 2 months,  I'm back now.)

-- 
Wayne L. Wohler                   Internet:  wohler@vnet.ibm.com
Dept G82/025Z                     IBMMAIL:   USIB29WX@IBMMAIL
Information Development Strategy and Tools   Phone: +1 303 924 5943
IBM Corporation
PO Box 1900
Boulder, Colorado  80301-9191

Disclaimer: This posting represents the poster's views, not those of IBM
</message>
<message id="<36hfm9INNp3b@afshub.boulder.ibm.com>" date="2989932681">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 16:51:21 UT
From: "Wayne L. Wohler" \<wohler@vnet.ibm.com>
Organization: IBM Information Development Strategy and Tools
Message-ID: <36hfm9INNp3b@afshub.boulder.ibm.com>
References: <36bvg6$n9e@kaa.heidelbg.ibm.com> <1994Sep29.232903.10546@falch.no>
Subject: Re: SGML editors and accented Latin characters

When I was working on an SGML editor a few years ago, the behavior I
proposed to address this problem was this:

* The editor should provide a means for an application designer to
  associate a character to an entity so that the editor can do the
  following:
* When a key is struck, if a mapping is defined for that character to an
  entity, use that mapping to store the entity reference in the data being
  created.
* When a character is presented, reverse the process by presenting the
  character entity as a character if possible.
* The information always contains the entities for transmission and
  storage, the author can use the keyboard and see the character as it was
  typed, assuming the hardware/operating system support the character in
  question.

If the keyboard doesn't support entry of a particular character (I can't
enter many French and German characters, to say nothing of Eastern European
and Asian characters), the entity mechanisms are still there.  This could
be enhanced by editor provided keyboard mappings that give direct access to
characters not normally entered on the attached keyboard (like accented
characters in my case, or symbols, etc.)  The editor may wish to warn the
author if it cannot insert entity references because the current context
doesn't support them (declared content or marked section CDATA, for
example).

If the display or operating system doesn't support the particular
character, it may be presented as an entity reference or some other
presentation scheme.  You would want the editor to do whatever 'codepage'
changes necessary to get access to the entire repetoire of characters,
including symbols, etc.

The input and presentation should be independent however since, for
example, I can't enter accented characters on my keyboard but my software
and hardware can certainly display them.

The most important part of all this is that the information you've created
is transportable and 'normalized' with regard to character set usage.

Defining the character entities to the editor is a problem.  For DTDs that
define character entities for their authors to use, the administrator who
builds the editor support for a particular DTD can make this happen.  If
the DTD uses a 'standard' set that is already known to the editor, perhaps
shipped with the editor, that task would be very straightforward.

-- 
Wayne L. Wohler                   Internet:  wohler@vnet.ibm.com
Dept G82/025Z                     IBMMAIL:   USIB29WX@IBMMAIL
Information Development Strategy and Tools   Phone: +1 303 924 5943
IBM Corporation
PO Box 1900
Boulder, Colorado  80301-9191

Disclaimer: This posting represents the poster's views, not those of IBM
</message>
<message id="<9409301652.AA34292@source.asset.com>" date="2989932740">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 16:52:20 UT
From: "Claude L. Bullard" \<bullardc@source.asset.com>
Message-ID: <9409301652.AA34292@source.asset.com>
References: <1994Sep26.233841.22334@midway.uchicago.edu> \<CONNOLLY.94Sep26214912@austin2.hal.com> \<p.kerr-2809941654060001@130.216.90.127> <1994Sep28.142556.13592@midway.uchicago.edu>
Subject: Re: Multilingual HTML, SGML documents?

[Richard L. Goerwitz]

|   most of us don't realize that rodeos, cowboy hats, spurs, and so on are
|   derivatives of the Latin-American culture...

I guess if we didn't get through a fourth grade history class or never
watched a day of American television, that could be true.  Otherwise, "The
Cisco Kid is a friend of mine....".

|   Most Americans, though, just aren't terribly aware of the
|   multilinguality that characterizes much of the world's population.

Awareness and preparedness are not the same thing.  I think the lack of
knowledge of other languages deprives one of appreciating the concepts
other languages describe such as the multitude of Arabic words for love.
In that I contend one's thinking patterns are shaped by the concepts of the
language one uses, to some extent, IMHO, one might be culturally deprived.
On the other hand, some Arabic speakers might not know what a "Jones" is in
a sentence such as, "he got da hoop Jones".  Nor might they care... The
line between what is formal and slang in a living language moves constantly
and what one must know depends greatly on one's requirements to apply a
language, i.e., are environmental.  The last time I looked at the IADS
product, they were building an Arabic interface for it. Money talks outside
of a university and sometimes inside it.

Of course, it helps if one speaks the language of money or needs to.

|   Most Americans -- on the whole a religious bunch -- know that their
|   Bible is written in Hebrew, Greek, and Aramaic, and some know that the
|   Aramaic parts are interleaved with the Hebrew.  Others surely know
|   about nations like Canada, which are officially bilingual, where
|   everything has to be done in French and English.

*My_Bible* was originally written in the languages of the Pacific Rim and
the Indian subcontinent.  While that makes me a minority, it doesn't make
me Un-American, unless the Far Right manages to smother the current active
bursts of free thought of the last two years and declares a national
religion.  Yet, having the usual smattering of training from the Tube, I'm
not *too surprised* that the Judeo-Christian book you refer to was
originally written in the languages of its original authors.  ;-)
Translation is sometimes devastating especially if commissioned by a power
with certain political aims in mind and "olde King James" was not exactly
"a merry olde soule".

As for the current situation in Canada, while it is true that bilingualism
is a cultural fact, it is also obvious that they are currently on the verge
of secessions over the issue as one linguistic culture refuses to accept
the other's preferences.  Multi-lingual cultures are sometimes xenophobic.
I don't believe the Canadians are, but in politics, any difference that
makes a difference to a body is often a means to leverage control by those
who find that who is in control is far more important to them than the
quality of life of those that they control.

A native of the Netherlands often speaks multiple languages.  That is
usually an economic and social necessity when the borders are small.  If
you go to South Florida right now, you will find that a lot of
non-Hispanics speak practical Spanish.  However, when one requires by law
that they do, one discovers that a hefty financial requirement has also
been levied.  On the other hand, a large number of the secondary
educational institutions require foreign language training of some sort, so
the average student can acquire the practical skills they need for this
naturally bilingual, and with the advent of Haitian immigrants, tri-lingual
community.  The question is one of how many languages will we require
by_law, not one of how many are practical in a given community.  English
remains the American language of law and commerce.  It is therefore, a
practical necessity to master it in America.

[Richard L. Goerwitz] <1994Sep19.170533.20089@ast.saic.com>

|   All I am saying is that content-based markup is a philosophy, and is
|   not the exclusive property of SGML.  Its use in government and
|   industry, I might add, has nothing to do with cleanness and
|   tractability of its design.  Do you challenge me on these points?

Bob Agnew might not but I do.

1.  Content based markup is a technique, not a philosophy.  In a narrow
    definition of the first term, I can't conceive of a way in which
    content-based markup has any application to the "pursuit of wisdom".  I
    agree that the technique is not the exclusive property of SGML.
    Standing upright to look across the savannah for predators was not the
    exclusive technique of primates, but it sure was a practical technique
    to master.

2.  The use of content-based markup in government and industry occurs for
    precisely the same reason that one uses multiple languages.  Whereas
    one might legislate English or the CALS DTD as the vehicle for
    communications on law and commerce, one can lose the ability to express
    very specific information local to an environment and culture.

Should one deny to applications of SGML that freedom of expression hard
fought and won for other languages?

I suggest that one who thinks that a single set of element types can do all
the jobs currently required of SGML applications has not done much in the
way of commercial work for multiple customers in multiple environments.  It
is the requirement for multiple DTDs that led to current thinking in the
hypermedia environment in which the view of the data to the presentation
system is different from the view of the document database itself.  How one
assimilates these different views is a matter of application.  Some
translate; others query, and so on.  But a commercial engineer who doesn't
understand and apply one of these techniques, and instead, insists that all
views be expressed by one set of element types will not survive into the
next generation of players.  Each practitioner in a given domain of
knowledge defines its information and concepts in a set that that domain
recognizes and by which it communicates.  That is the nature of the
agreement and all communication in some form is contractual at its base.
An SGML DTD is one practical way to express that contract.  That's not
theory or philosophy.  It's just a way to get the job done to the
satisfaction of the customer.

Len Bullard
</message>
<message id="<Cwynqo.Enu@unx.sas.com>" date="2989947696">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 21:01:36 UT
From: Lee Peterson \<saslmp@cliff.unx.sas.com>
Organization: SAS Institute Inc.
Message-ID: \<Cwynqo.Enu@unx.sas.com>
Keywords: STYLE REGRESSION BATCH
Subject: Anyone regression testing doc styles?

Greetings,

Is anyone doing regression testing of their document style code?  We have
done ours in a batch environment to date.  The method has been, more or
less -- First look at the output for obvious problems; beyond that, a
page-by-page comparison of the regression standard to the newly formatted
document.

We're now looking for a better way to do ours with the notion that doing it
visually is best.  The givens are: SGML markup with styles encoded in
FOSIs.  We'll need to regression test output in (at least) these formats:
PostScript, RTF, and BookMaster/IPF(for Presentation Manager).

Does anyone have some experience they'd like to share?

Thanks,

-- 
     --------------------------------------------------------
   /  /           /  Lee Peterson       saslmp@unx.sas.com  /
  /  /   /> />   /  SAS Institute, Inc.                    /
 /  `---`--`--  /  Publications__Technology__Development  /
 --------------------------------------------------------
</message>
<message id="<36hutp$6b6@sundog.tiac.net>" date="2989948281">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 21:11:21 UT
From: "Keith M. Corbett" \<kmc@specialform.com>
Organization: Special Form Software
Message-ID: <36hutp$6b6@sundog.tiac.net>
References: <36bvg6$n9e@kaa.heidelbg.ibm.com> <1994Sep29.232903.10546@falch.no>
Subject: Re: SGML editors and accented Latin characters

[Steve Pepper]

|   Vendors of SGML editors, are you listening?  Do you want to sell your
|   products in Europe - to customers using languages other than English?
|   Then get your act sorted out as far as accented and other 'non-English'
|   letters are concerned.

I believe Interleaf SGML does this "right".  The Interleaf editor has a
completely configurable keymap and language interface.  The language
specific dictionary is a text attribute (one of the font properties).  I
think they support more European languages than any other publishing
system, certainly they're competitive in this area.

They also support JES-2 encoding in their Japanese version, which includes
a Kanji front end processor.

Maybe other Interleaf SGML users can comment on how well these features are
integrated within the SGML environment.  I believe character entities are
used as needed when documents are output as SGML - I worked briefly on the
internal implementation of entities in I6 - but I've never used this
feature.

 -kmc
</message>
<message id="<1994Sep30.223749.24944@midway.uchicago.edu>" date="2989953469">
Newsgroups: comp.text.sgml
Date: 30 Sep 1994 22:37:49 UT
From: "Richard L. Goerwitz" \<goer@midway.uchicago.edu>
Organization: University of Chicago
Message-ID: <1994Sep30.223749.24944@midway.uchicago.edu>
References: \<p.kerr-2809941654060001@130.216.90.127> <1994Sep28.142556.13592@midway.uchicago.edu> <9409301652.AA34292@source.asset.com>
Subject: Re: Multilingual HTML, SGML documents?

[Claude L. Bullard]

|   1.  Content based markup is a technique, not a philosophy....  [But] I
|       agree that the technique is not the exclusive property of SGML....
|   
|   2.  The use of content-based markup in government and industry occurs
|       for precisely the same reason that one uses multiple languages....

Len, get to the point quicker next time.  I'm sure you're an alright guy.
But your posting seemed more a discourse on the meaning of life than a
response to my last article.

-- 

   -Richard L. Goerwitz              goer%midway@uchicago.bitnet
   goer@midway.uchicago.edu          rutgers!oddjob!ellis!goer
</message>
