Newsgroups: comp.text.sgml
Date: 01 Jun 1992 01:14:05 UT
From: Brian Travis \
Organization: SGML, Inc.
Message-ID: <707361245snx@sgmlinc.com>
References: <1992May29.142718.4971@crd.ge.com>
Subject: Yacc'able sgml grammar

In article <1992May29.142718.4971@crd.ge.com> barnettj@pookie.crd.ge.com writes:
>
> I apologize if this is a FAQ, but I've just started reading this
> group. I'd like to know if there exists a yacc parser built to
> recognize sgml'd text or a subset of sgml. If none exists, is
> there any tool that parses sgml and allows you to insert actions
> when rules are reduced (like yacc)?

The ARC Parser, available from the usual sources (check the FAQ), comes with a REXX interface, from which the programmer can get the Element Structure Information Set (ESIS) information...things like attribute values, element-in-context information, expanded entity information and stuff like that, for use in a REXX program. The source is available, so it would be a trivial task to build whatever interface you need around the ESIS data.

I imagine that parsers based on ARC (SGML-S and others) have the same kind of output accessible, but I haven't had time to investigate these yet.

Brian.
---------------------------------------------------------------------------
<> Brian Travis <> Managing Editor <> brian@sgmlinc.com
<> \: The SGML Newsletter
<> 6360 S. Gibraltar Cir. <> Aurora CO 80016 <>
<> (303) 680-0875 <> Fax: (303) 680-4906

Newsgroups: comp.text.sgml
Date: 01 Jun 1992 18:34:48 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23172C@erik.naggum.no>
Subject: Access to the comp.text.sgml archive

I'm pleased to announce that the comp.text.sgml archive is now available with Gopher and WAIS. The archive is available by date, by message-id, and with keyword search. The keyword index and directory caches are generated every night.

Credits go to Anders Ellefsrud of the operations staff here at the Institute of Informatics at the University of Oslo. Thanks, Anders.

The WAIS information base is updated with the access parameters, and can be accessed immediately.

The Gopher system is available as part of the international Gopher system, but access isn't so easy unless you know a little Norwegian. Find the entry for Norway, go down to "Universitetet i Oslo", then down to "Matematisk-Naturvitenskapelig Fakultet", then "Institutt for Informatikk", then "SGML", and you're there. The available directories include "SGML", "SIGhyper", and "comp.text.sgml". The latter has "by.date", "by.msg-id", and the keyword search facility.

There's a way to specify how to get right there, but I moved this weekend, and can't find the details (on a piece of paper, naturally). If you know Gopher, you can possibly make sense out of these details:

        Host: gopher.ifi.uio.no
        Port: 70
        Dir:  /SGML

This could be put in a file to make access simpler, and I'll get back to you with more information as soon as I can find it, or Anders patiently tells me for the third time...

Let's give Anders a big hand for this!

Best regards,
\
--
Erik Naggum         |  +47-295-0313  |  ISO 8879 SGML     |  Memento,
Naggum Software     |  "fuzzface"    |  ISO 10744 HyTime  |  terrigena.
Boks 1570, Vika     |  \             |  JTC 1/SC 18/WG 8  |  Memento,
0118 OSLO, NORWAY   |  \             |  SGML UG SIGhyper  |  vita brevis.

Newsgroups: comp.text.sgml
Date: 03 Jun 1992 10:35:18 UT
From: jean-luc GUIMPIER \
Organization: IRIT-UPS, Toulouse, France
Message-ID: <2320@irit.irit.fr>
Subject: ODA vs SGML

Hi everybody,

having to evaluate both norms to choose one for future development I would welcome any comments, advice or information.
If comparison work is available somewhere I'm interested.

Thanks for helping.

Newsgroups: comp.text.sgml
Date: 03 Jun 1992 15:14:57 UT
From: Siamak Khoubyari \
Reply-To: khoub-s@cs.buffalo.edu
Organization: SUNY at Buffalo, Computer Science / CEDAR
Message-ID: \
Subject: SGML Parser

Hi all,

I'm looking for an SGML parser or interpreter of some kind.
I would appreciate any information you may be able to mail me regarding what is available, and where it can be obtained.

Thanks,

-- Siamak

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Siamak Khoubyari                         | Internet: khoub-s@cs.buffalo.edu
Department of Computer Science / CEDAR   | BITNET: khoub-s@sunybcs.BITNET
State University of New York at Buffalo  | UUCP: !uunet!cs.buffalo.edu!khoub-s
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.text.sgml
Date: 04 Jun 1992 09:07:00 UT
From: Denis Excoffier \
Organization: Ecole Normale Superieure, PARIS, France
Message-ID: <1992Jun4.090700.10778@ens.fr>
Subject: ESIS -> SGML

A question to all those SGML hackers :

Suppose you have a DTD and the ESIS of some SGML instance you don't have. From these, how do you derive an SGML instance that would, if parsed, give the same ESIS ?

Example :

if DTD is :
\ ]>

if ESIS is :
(DOC
-Yes, I have a number.
)DOC

An SGML instance could be :
\Yes, I have a number.\

Is it so easy in all cases ?

Newsgroups: comp.text.sgml
Date: 04 Jun 1992 09:30:24 UT
From: Luc Dupuy \
Organization: Universite du Quebec a Montreal
Message-ID: <1992Jun4.093024.1834@cari.telecom.uqam.ca>
References: <2320@irit.irit.fr>
Subject: Re: ODA vs SGML

In article <2320@irit.irit.fr> guimpier@irit.irit.fr (jean-luc GUIMPIER) writes:
>Hi everybody,
>
>having to evaluate both norms to choose one for future development I would welcome any comments, advice or information.
>If comparison work is available somewhere I'm interested.
>
>Thanks for helping.

Just one naive question : why choose?

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 11:16:59 UT
From: Mike Popham \
Organization: Computer Unit, Exeter University
Message-ID: <2294@exua.exeter.ac.uk>
Subject: Sample SEMA DTDs available

A new directory (write-it.dtds) has been added to the SGML archive held at The SGML Project (c/o University of Exeter, UK).
A copy of the README file for this directory follows:

------------------------ write-it.dtds/README ---------------------------

SGML Project                                                05 Jun. 1992

write-it.dtds

This directory contains copies of some DTDs kindly donated to the archive by Martin Bryan (SGML Products Manager of SEMA Software Technology). The DTDs were copied straight from an MS-DOS disk to a Sun SPARCstation, then packed using the UNIX tar and compress utilities. These DTDs are offered "as is", and are not supported in any way.

A copy of the README file that accompanied the files on the disk follows:

This disc contains 4 DTDs. Three are stripped down versions of the training DTDs that are used in The WRITE-IT Manual for teaching users how to create marked up memos, letters and reports. Each of these DTDs is preceded by the SGML Declaration used to process the DTD. The fourth DTD in the root directory is the A-W.DTD used to create SGML - An Author's Guide to the Standard Generalized Markup Language by M. Bryan. (The latter has been copyrighted by Addison-Wesley but can be modified for use as the basis of other DTDs.)

Within WRITE-IT extensive use is made of short references, and information on the use of elements is stored in link processing instructions that form part of the DTD. The full version of each DTD, as used by WRITE-IT, can be found in the WRITE-IT subdirectory, which also contains the files needed to configure WRITE-IT to handle each class of DTD (the .cfg files).

NAME                   TYPE   CONTENTS
README                 File   (This file)
write-it.dtds.tar.Z    File   DTDs for letter, memo, report and book(c)
                              donated by SEMA Software Technology.
-------------------------------------------------------------------
Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom
email: sgml@uk.ac.exeter OR M.G.Popham@uk.ac.exeter
Phone: +44 0392 263946    Fax: +44 0392 211630
-------------------------------------------------------------------

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 13:01:45 UT
From: Mike Popham \
Organization: Computer Unit, Exeter University
Message-ID: <2295@exua.exeter.ac.uk>
References: <2294@exua.exeter.ac.uk>
Subject: Re: Sample SEMA DTDs available

In my last posting, I forgot to say HOW to connect to the archive held by The SGML Project.

Using ftp:

        Host:     sgml1.ex.ac.uk [144.173.6.61]
        Login:    anonymous
        Password: \

the files are in directory write-it.dtds.

Many apologies for any inconvenience/confusion caused by my earlier omission!

-------------------------------------------------------------------
Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom
email: sgml@uk.ac.exeter OR M.G.Popham@uk.ac.exeter
Phone: +44 0392 263946    Fax: +44 0392 211630
-------------------------------------------------------------------

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 20:29:46 UT
From: Siamak Khoubyari \
Reply-To: khoub-s@cs.buffalo.edu
Organization: SUNY at Buffalo, Computer Science / CEDAR
Message-ID: \
Subject: SGML Collection / Corpus?

Hi all,

I was wondering if you know of any large on-line collection of SGML-formatted documents. If so, how can I obtain it?

Thanks,

-- Siamak

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Siamak Khoubyari                         | Internet: khoub-s@cs.buffalo.edu
Department of Computer Science / CEDAR   | BITNET: khoub-s@sunybcs.BITNET
State University of New York at Buffalo  | UUCP: !uunet!cs.buffalo.edu!khoub-s
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 20:56:39 UT
Corp. The opinions expressed are those of the user and not necessarily those of CONVEX.
From: Dan Connolly \
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Message-ID: <1992Jun5.205639.21823@news.eng.convex.com>
Subject: using NOTATIONs inline

The WWW group is attempting to define a multimedia interchange format called HTML. It is intended to be an SGML language, but most existing HTML has never been through an SGML parser.

Anyway, the question is: how do you put a bitmap in an HTML document? That is, is it possible to put an arbitrary 8 bit binary stream _inside_ an SGML document? My guess is: no. But if we use CDATA, can we include anything that doesn't contain the closing tag in full? For example:

\ \ ... \@#$@#$@#$@ raw gif data @#$@#$@#\ ... \

Someone made the point that an SGML document is only allowed to include SGML characters as specified by the SGML declaration, and if we're going to use the default SGML declaration, we have to stick to the characters blessed by it.

That's not my understanding. I thought that inside CDATA (or SDATA, I think) you could put _anything_ but the closing tag in full.

What's the scoop? Do we have to use external entities for raw data?

Dan

Newsgroups: comp.text.sgml
Date: 06 Jun 1992 19:55:11 UT
From: Stephen P Spackman \
Organization: University of Chicago CILS
Message-ID: \
References: <1992Jun5.205639.21823@news.eng.convex.com>
Subject: Re: using NOTATIONs inline

Alarm bells.
Are you seriously considering a graphics format that stipulates that "the sequence '\' may not appear anywhere in the encoded form of the image" as a validity constraint *on the original source image*?

Don't do it.

Of course, if I've missed the point, ignore me :-).
----------------------------------------------------------------------
stephen p spackman         Center for Information and Language Studies
stephen@estragon.uchicago.edu                    University of Chicago
----------------------------------------------------------------------
Believe in Strong AI? I don't even believe in Strong I!

Newsgroups: comp.text.sgml
Date: 06 Jun 1992 20:03:27 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23177A@erik.naggum.no>
References: <1992Jun5.205639.21823@news.eng.convex.com>
Subject: Re: using NOTATIONs inline

Dan Connolly \ writes:
|
|   The WWW group is attempting to define a multimedia interchange
|   format called HTML.  . . .

Why not use HyTime?

:
|   That is, is it possible to put an arbitrary 8 bit binary stream
|   _inside_ an SGML document? My guess is: no. But if we use
|   CDATA, can we include anything that doesn't contain the closing
|   tag in full?

If you by "the closing tag in full" mean the entire end-tag, complete with etago, generic identifier, and tagc, as in "\", this is not the way SGML does it. CDATA and SDATA are terminated by an etago "delimiter-in-context", which is an etago (end-tag open, "</").

:
|   Someone made the point that an SGML document is only allowed to
|   include SGML characters as specified by the SGML declaration, and if
|   we're going to use the default SGML declaration, we have to stick to
|   the characters blessed by it.

Blessed and blessed. The SGML declaration is supposed to reflect the reality of the document, not enforce arbitrary limits on it. So you write an SGML declaration which fits the document.

|   That's not my understanding. I thought that inside CDATA (or SDATA,
|   I think) you could put _anything_ but the closing tag in full.

As said above, the etago delimiter-in-context terminates the data, regardless of whether it's a legal end-tag in that context.

You should be aware that the SGML parser will parse the contents of the "binary" content, and ignore record start, and treat record ends differently from other characters. In addition, it's an error for an SGML entity to contain characters with any of the numbers listed in the SHUNCHAR part of the SYNTAX declaration. This is _not_ what you want with binary data.

|   What's the scoop? Do we have to use external entities for raw data?

Yes. An external entity that is not an SGML text entity requires a notation identifier, so you only need to list the entities in the DTD, with notation, and refer to them by name in the document instance.

If this is not satisfactory, you should declare the objects to be CDATA, and use a binary to text-only transformation scheme. There are several such schemes. Among them, base64 is the preferred encoding in my view, since it's available as part of the new Multipurpose Internet Mail Extensions (MIME) RFC-to-be. (The latest draft is available for anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for two weeks from today. Section 5.2 which concerns the base64 encoding is also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.)

Transformation back to the binary form from the text-only form may be done on the fly by the application before sending the data to the notation interpreter. In addition to being much easier to deal with in SGML, this also makes SGML documents containing such content robust with respect to file transfer, etc.

Hope this helps,
\
--
Erik Naggum         |  +47-295-0313  |  ISO 8879 SGML     |  Memento,
Naggum Software     |  "fuzzface"    |  ISO 10744 HyTime  |  terrigena.
Boks 1570, Vika     |  \             |  JTC 1/SC 18/WG 8  |  Memento,
0118 OSLO, NORWAY   |  \             |  SGML UG SIGhyper  |  vita brevis.

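The base64 route suggested in the post above can be sketched in a few lines of Python. This is an illustration only and not part of the original thread: the function names `encode_for_cdata` and `decode_from_cdata` and the 76-column wrap are assumptions made for the example. It shows the property that matters for CDATA content: the base64 alphabet contains no "<" and no control characters, so the encoded text cannot contain "</" (the etago of the reference concrete syntax), whatever the raw bytes are.

```python
import base64


def encode_for_cdata(raw: bytes, width: int = 76) -> str:
    """Encode raw bytes as base64 text, wrapped into short records."""
    encoded = base64.b64encode(raw).decode("ascii")
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


def decode_from_cdata(text: str) -> bytes:
    """Undo encode_for_cdata; record boundaries and blanks are ignored."""
    return base64.b64decode("".join(text.split()))


# Every possible byte value, plus a byte sequence that looks like an end-tag.
raw = bytes(range(256)) + b"</gif>"
text = encode_for_cdata(raw)

assert "</" not in text                 # nothing that could end the CDATA early
assert decode_from_cdata(text) == raw   # the application gets the bytes back
```

The decoding step is what the application would run "on the fly" before handing the data to the notation interpreter.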
Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 07 Jun 1992 04:23:58 UT
Corp. The opinions expressed are those of the user and not necessarily those of CONVEX.
From: Dan Connolly \
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Message-ID: <1992Jun7.042358.29367@news.eng.convex.com>
Subject: MIME for global hypertext

The WAIS, gopher, and world-wide-web projects are all client/server information retrieval systems. All three deliver plain text information quite well, and they each have evolving mechanisms for delivering other forms of information.

The MIME RFC defines a system for processing multi-part, multimedia messages on the internet. I would like to see these systems, along with USENET news and internet mail, interoperate with MIME as the substrate.

The clients for these systems go something like this:

        0 user invokes client (and chooses a starting point)
        1 client displays user's request
        2 user reads page, chooses a reference to more info
        3 user informs client of choice (e.g. "show me item #1," or "search for googoo")
        4 go to step 1

These systems often consist of a hierarchy of menus with text files at the leaf nodes. The system allows the user to interactively navigate the menus and browse leaf nodes. But 1) the format of the menus is particular to the system (USENET newsgroups/articles, unix directories/files, WAIS source/database/document). And 2) once a user is at a leaf node, the system can no longer interactively follow references.

The novel aspect of hypertext is that the distinction between the menu pages and the text pages disappears. In the world-wide-web, text documents have machine-readable links inside them, and all menus are represented as hypertext documents. The WWW format works well, but it would benefit from use of MIME's features.

For a common hypertext document format, I propose we define a subtype of the MIME multipart message: X-HYPERTEXT.

The first part of a multipart/X-HYPERTEXT message is the content of the document, and the remaining parts are multimedia attachments and links to other documents. The content part contains references (by Content-ID) to the attachments and links. The client software allows the user to interactively choose references to display/follow.

The remaining parts may be attached image/audio/video using MIME's various types and transfer encodings (text attachments would work too) or they may be references to information accessible elsewhere using MIME's message/external-body type. The parameters to the external-body content-type provide the same information as WWW's Universal Document Identifier. (MIME only defines ANON-FTP, FTP, TFTP, LOCAL-FILE and AFS. The remaining access-types (WAIS, gopher, etc) would be experimental (X-WAIS, X-GOPHER) until standardized.)

The emerging standard for structured, platform-independent text is SGML. The WWW project defines an SGML document type with traditional elements (title, heading, paragraph, list) and new hypertext elements (anchor). Soon it will have multimedia elements (image, audio).

The current design places external document references (to files, WWW servers, WAIS documents, gophers, etc.) inside the SGML as attributes. There are lexical incompatibilities, and the design is under strain. I suggest that we implement references as SGML entities that identify message/external-body parts by content-id.

Representing document content in SGML allows the same information to be accessed using different user interface paradigms (e.g. dumb terminals vs. curses style vs. x windows point-and-click).

Short of full SGML parsing, we could adopt the MIME text/richtext format, with the addition of a \...\ tag. In fact, any representation that allows the user to interactively indicate one of the attached body parts by content-id will do. For example, plain text with one-line descriptions would do.
The Andrew ez data stream would also work, but only Andrew sites could parse it.

This brings up the issue of format negotiation. No one format is optimal for all information. Clients are likely to be able to process information in several formats, and servers are likely to be able to provide different representations. The various formats can be enclosed in a MIME multipart/alternative message. And rather than including the data for all formats in the message, the data could be in message/external-body parts. The client chooses the type of data it likes and retrieves the corresponding external-body. This (modified) example from the MIME rfc may help explain:

        MIME-Version: 1.0
        Content-Type: multipart/alternative; boundary=42

        --42
        Content-Type: message/external-body;
                name="BodyFormats.ps";
                site="thumper.bellcore.com";
                access-type=ANON-FTP;
                directory="pub";
                mode="image";

        Content-type: application/postscript

        --42
        Content-Type: message/external-body;
                name="/u/nsb/writing/rfcs/RFC-XXXX.ez";
                site="thumper.bellcore.com";
                access-type=AFS;

        Content-type: application/x-ez

        --42
        Content-Type: message/external-body;
                name="BodyFormats.txt";
                site="thumper.bellcore.com";
                access-type=ANON-FTP;
                directory="pub";

        Content-type: text/plain

        --42--

The client can choose between postscript, ez, and plain text, and retrieve the corresponding message body.

The question then becomes: how do these systems interoperate? By making information available as multipart/X-HYPERTEXT MIME messages.

The WWW client interfaces to the other systems by defining "addressing schemes", implementing the various protocols, and translating the data into HTML. Gopher has a similar typing scheme -- one character is reserved to indicate the access type and the data type. WAIS clients have yet another method of resolving types, though they only support one protocol. The NewsGrazer application has its own encapsulation mechanism. This is becoming a mess.

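The multipart/alternative example above can be read with an ordinary MIME parser. What follows is a minimal sketch using Python's standard `email` package, not anything from the original post: `SAMPLE` is a shortened copy of the example (two alternatives, trailing semicolons dropped), and the helper names `list_alternatives` and `choose` are made up for illustration. Choosing by the client's own preference order is also an assumption; the post only says that the client picks the type it likes.

```python
from email import message_from_string

SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=42

--42
Content-Type: message/external-body; name="BodyFormats.ps";
 site="thumper.bellcore.com"; access-type=ANON-FTP;
 directory="pub"; mode="image"

Content-type: application/postscript

--42
Content-Type: message/external-body; name="BodyFormats.txt";
 site="thumper.bellcore.com"; access-type=ANON-FTP;
 directory="pub"

Content-type: text/plain

--42--
"""


def list_alternatives(text):
    """Return one dict per message/external-body alternative."""
    msg = message_from_string(text)
    alternatives = []
    for part in msg.get_payload():
        if part.get_content_type() != "message/external-body":
            continue
        # The body of an external-body part is itself a small message whose
        # headers describe the data that would be fetched.
        inner = part.get_payload(0)
        alternatives.append({
            "type": inner.get_content_type(),
            "access": part.get_param("access-type"),
            "site": part.get_param("site"),
            "directory": part.get_param("directory"),
            "name": part.get_param("name"),
        })
    return alternatives


def choose(alternatives, preferred_types):
    """Pick the first alternative matching the client's preference order."""
    for wanted in preferred_types:
        for alt in alternatives:
            if alt["type"] == wanted:
                return alt
    return None


alts = list_alternatives(SAMPLE)
# A client that cannot display ez falls through to plain text here.
pick = choose(alts, ["application/x-ez", "text/plain"])
print(pick["access"], pick["site"], pick["directory"], pick["name"])
```

A real client would then use the access-type to decide which protocol to speak when fetching the chosen body.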
In the short term, global hypertext viewers will have to support the access-type and content-type of each system with which they interoperate (so we have X-WAIS, X-HTTP, X-GOPHER, X-NNTP, as well as

Some of the access types will become standard, and some will die out. But all the data types should be encapsulated in MIME messages.

Any data that has machine-readable pointers to other data should be made into a multipart/X-HYPERTEXT message. For example, a WAIS question should have attachments for each of the result documents (the content part can stay application/x-wais-question, or it could be converted to a text type, or both), at least in the case where those documents are available by some standard access method. [I wrote a perl script that will change an HTML document into a MIME message with attachments.]

Leaf documents, i.e. documents with no external links, can stay in single part types. E.g., plain text files become MIME messages by simply adding a blank line at the beginning (to separate the headers (none) from the body).

Under this model, a mail message can point to a news article which references a WAIS document which contains several drawings and pointers to several more available by FTP, and a user could just point-and-click between them. The only need for protocols like gopher and HTTP is to encapsulate data that's not already MIME compliant.

This is clearly a pipe dream, but it's the kind of thing we can work towards today.

Dan

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 07 Jun 1992 14:32:41 UT
From: Nathaniel Borenstein \
Organization: Bellcore
Message-ID: <1992Jun7.143241.7491@walter.bellcore.com>
Subject: Re: MIME for global hypertext

I think that Dan's message makes a lot of sense, and I'd had similar thoughts myself.
The one change I'd suggest is that instead of

        multipart/x-hypertext

some group of interested parties should take the time to write a clear RFC describing this content-type, register it with IANA, and use something more like

        multipart/hypertext

In other words, I think this application is important enough to take seriously enough to work towards standardizing it. Certainly, though, we could start with x-hypertext and standardize it after we had some experience with it -- that's a reasonable approach. But ultimately, this could be a very important MIME type.

-- Nathaniel

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 07 Jun 1992 18:36:43 UT
From: Edward Vielmetti \
Organization: Msen, Inc. -- Ann Arbor, Michigan
Message-ID: <10tl09INN20u@nigel.msen.com>
References: <1992Jun7.042358.29367@news.eng.convex.com>
Subject: Re: MIME for global hypertext

connolly@convex.com (Dan Connolly) writes:
:
: The WAIS, gopher, and world-wide-web projects are all client/server
: information retrieval systems. All three deliver plain text information
: quite well, and they each have evolving mechanisms for delivering
: other forms of information.
:
: The MIME RFC defines a system for processing multi-part, multimedia
: messages on the internet. I would like to see these systems, along
: with USENET news and internet mail, interoperate with MIME as the substrate.

There are a couple of servers that already return MIME documents within WAIS: the "mime-samples" source has the Bellcore set of sample documents, and there's a server at Inria, France with album covers etc. Search the directory of servers for MIME and you will get the current list back.

It would make some sense to rebuild a few of the other sources -- I'm thinking of the "uunet", "wuarchive", and "cica-win3" servers running at CICnet -- as MIME formatted collections.
That would let people with proper viewers connect directly to the FTP site that they index rather than cut and paste.

If anyone wants to give this a try drop me some mail and I'll see what I can do to bash something together that would work OK.

Edward Vielmetti, vice president for research, Msen Inc. emv@Msen.com
Msen Inc., 628 Brooks, Ann Arbor MI 48103 +1 313 741 1120

Newsgroups: comp.text.sgml
Date: 08 Jun 1992 19:39:21 UT
From: Anne Brueggemann-Klein \
Organization: Institut fuer Informatik der Universitaet Freiburg, Deutschland
Message-ID: \
Keywords: attributes, CURRENT, defaults
Subject: Defaulting mechanism for CURRENT attributes

Let us assume we have a CURRENT attribute *att* and an ID attribute *ident* defined for element type *elem*. The content model for *elem* is (elem | #PCDATA)*.

Consider now the partial document

\ \ \ att=ZZZ ident=c>\ \ \ \ \.

Now, the attribute *att* of *elem* instance *d* defaults to the most recently specified value. Is this ZZZ (thus considering the left-to-right ordering of value specifications) or is this XXX (thus considering the top-down hierarchical ordering)?

Thank you for your help,
Anne Brueggemann-Klein (brueggemann@informatik.uni-freiburg.de)

Newsgroups: comp.text.sgml
Date: 09 Jun 1992 00:36:55 UT
From: "Steven R. Newcomb" \
Message-ID: <9206090036.AA12682@tti>
Subject: HTML vs. HyTime

Dan Connolly writes:

> The WWW group is attempting to define a multimedia interchange
> format called HTML. It is intended to be an SGML language, but
> most existing HTML has never been through an SGML parser.

Why create another SGML-based multimedia interchange format when HyTime was just approved on May 1 as an International Standard for that very purpose?

Steven R. Newcomb, President
TechnoTeacher, Inc.          Voice: +1 904 422 3574
1810 High Road               Fax: +1 904 386 2562
Tallahassee, FL 32303-4408 USA
Internet: srn@techno.com

Newsgroups: comp.text.sgml
Date: 09 Jun 1992 12:46:14 UT
From: Brian Travis \
Organization: SGML, Inc.
Message-ID: <708093974snx@sgmlinc.com>
References: \
Subject: Defaulting mechanism for CURRENT attributes

In article \ brueggem@informatik.uni-freiburg.de writes:
>
>
> Let us assume we have a CURRENT attribute *att*
> and an ID attribute *ident*
> defined for element type *elem*.
> The content model for *elem* is (elem | #PCDATA)*.
>
> Consider now the partial document
>
> \
> \
> \ att=ZZZ ident=c>\
> \
> \
> \
> \.
>
> Now, the attribute *att* of *elem* instance *d*
> defaults to the most recently specified value.
> Is this ZZZ (thus considering the left-to-right ordering
> of value specifications)
> or is this XXX (thus considering the top-down
> hierarchical ordering?

The standard says a CURRENT element is "The open element whose start-tag most recently occurred (or was omitted through markup minimization)" (Definition 4.68).

This would seem to indicate that the hierarchy does not matter; that the value is inherited from "most recently occurred" start-tag, not the "most recently opened but not closed" start-tag, as might be assumed from a hierarchical reading.

In this case, it is "ZZZ".

I put this question to Exoterica's parser, and got this minimal result (after removing the errant TAGC for ident=c):

\ \ \ \ \ \ \ \

Brian.

Newsgroups: comp.text.sgml
Date: 09 Jun 1992 19:44:45 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23202A@erik.naggum.no>
References: \
Subject: Re: Defaulting mechanism for CURRENT attributes

Anne Brueggemann-Klein \ writes:
|
|   Let us assume we have a CURRENT attribute *att* and an ID attribute
|   *ident* defined for element type *elem*. The content model for
|   *elem* is (elem | #PCDATA)*.

I assume you have something like this in mind:

\ \

|   \
|   \
|   \\
|   \
|   \
|   \
|   \.

|   Now, the attribute *att* of *elem* instance *d* defaults to the most
|   recently specified value. Is this ZZZ (thus considering the
|   left-to-right ordering of value specifications) or is this XXX (thus
|   considering the top-down hierarchical ordering?

I honestly think you would find ISO 8879 and SGML easier to deal with if you didn't attempt to make it fit a mathematical model which wasn't derived from it. On the other hand, if you could formulate a mathematical model consistent with the standard, it would probably be much appreciated by the entire SGML community.

In contrast to the parts of the standard which explicitly discuss the hierarchical aspects of the document instance, the standard text in 7.9.1.1 Omitted Attribute Specification, item b) (Goldfarb [329:9]) talks about how the default value for a current attribute is assigned, and to which elements the default applies. It's evident from this that there is no notion of hierarchy involved. That is, "most recently specified" as used about current attributes is to be understood in the temporal sense, where time is understood as parse time.

This can at least partially be understood by considering the fact that an attribute specification for a current attribute affects _all_ the elements for which the attribute definition applies. E.g. Given

\ \

then

\ \

is identical to

\ \

This would make it very complex for the human user to figure out what exactly the value of an attribute would be. Not that it isn't already non-trivial, but the effect of hierarchical inheritance on attributes that were declared in a group is a little too much just to contemplate.

It's also worth noting that the mechanisms that deal with aspects of the document instance hierarchy also deal with implicit scoping, in that when the element is no longer available, neither is the information pertaining to it. (E.g., short reference sets, as well as the trivial case of sub-element content.) The default value survives the element whose start-tag defined it, and a different mechanism than that of the other hierarchical ones would have to be defined in either case.

Hope this helps.

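The rule described above (parse order rather than hierarchy, with one default shared by every element type associated with the attribute definition list) can be sketched as a toy model in Python. This is not an SGML parser and not taken from the thread: the function `resolve_current`, the tuple layout, and the sample start-tags are all assumptions made for illustration.

```python
def resolve_current(start_tags, definition_list_of):
    """Resolve a CURRENT attribute for start-tags given in parse order.

    start_tags: iterable of (element_type, ident, value); value is None
        when the attribute is omitted from the start-tag.
    definition_list_of: maps each element type to a label for the attribute
        definition list it shares with other element types.
    """
    current = {}    # most recently specified value, per definition list
    resolved = {}   # ident -> effective attribute value
    for element_type, ident, value in start_tags:
        key = definition_list_of[element_type]
        if value is not None:
            current[key] = value  # a specified value becomes the new default
        elif key not in current:
            raise ValueError(f"{ident}: CURRENT attribute has no value yet")
        resolved[ident] = current[key]
    return resolved


# Hypothetical start-tags in parse order. End-tags are left out on purpose,
# because nesting plays no part in the rule.
tags = [
    ("elem", "a", "XXX"),
    ("elem", "b", None),
    ("elem", "c", "ZZZ"),
    ("elem", "d", None),
]
print(resolve_current(tags, {"elem": "elem-attlist"}))
# {'a': 'XXX', 'b': 'XXX', 'c': 'ZZZ', 'd': 'ZZZ'}
```

Because the model never looks at end-tags, element d receives ZZZ whether or not c is still open, which is the temporal reading argued for in this post.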
Best regards,
\
--
Erik Naggum         |  +47-295-0313  |  ISO 8879 SGML     |  Memento,
Naggum Software     |  "fuzzface"    |  ISO 10744 HyTime  |  terrigena.
Boks 1570, Vika     |  \             |  JTC 1/SC 18/WG 8  |  Memento,
0118 OSLO, NORWAY   |  \             |  SGML UG SIGhyper  |  vita brevis.

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 10 Jun 1992 08:17:43 UT
From: "Jungju Kim (CSDept)" \
Organization: KAIST in Seoul Korea
Message-ID: <1992Jun10.081743.19470@kum.kaist.ac.kr>
References: <1992Jun7.042358.29367@news.eng.convex.com>
Subject: Re: MIME for global hypertext

In article <1992Jun7.042358.29367@news.eng.convex.com> connolly@convex.com (Dan Connolly) writes:
>
>The MIME RFC defines a system for processing multi-part, multimedia
>messages on the internet. I would like to see these systems, along
>with USENET news and internet mail, interoperate with MIME as the substrate.
>
>Short of full SGML parsing, we could adopt the MIME text/richtext
>format, with the addition of a \...\ tag.
>In fact, any representation that allows the user to interactively indicate
>one of the attached body parts by content-id will do. For example,
>plain text with one-line descriptions would do. The Andrew ez
>data stream would also work, but only Andrew sites could parse it.
>

IMHO, I don't think it's that easy. There can be various kinds of links in a document. One may possibly want to see the next part or the previous part, or one can have a question on the specific region of the image he/she is watching. Richtext is not designed to deal with the problems stated above, and I don't think it will evolve in that direction.

And if it is mixed up with WAIS, how do you think the other fields of the question would be filled? (related documents, servers,...)

Anyway, could somebody out there kindly send me the list of documents describing WWW & gopher ?

Jungju

Jungju Kim - jjkim@cosmos.kaist.ac.kr
Tel : +82-2-962-8861
Fax : +82-2-969-0239
GUI Consortium Project / System Architecture Lab.
Computer Science Department, KAIST,KOREA

Newsgroups: comp.text.sgml
Date: 10 Jun 1992 13:45:46 UT
From: hehanninen@tnclus.tele.nokia.fi
Organization: Nokia Telecommunications.
Message-ID: <1992Jun10.154546.1@tnclus.tele.nokia.fi>
Subject: ansi vs. iso while naming tags ?

Hi !

We are defining tag-names for our technical manuals. Could somebody
comment on the following issues?

-Should we rely on, e.g., the ANSI/NISO Z39.59-1988 standard when
 specifying common tag names, or on ISO/IEC TR 9573? They offer
 somewhat different tag-name strategies, for example:

ANSI/NISO ISO/IEC \ \ \ \ \

Heading level 1 and title \

Heading level 1 \ Heading level 1 title \
Heading level 2 \
Heading level 2 and title \ Heading level 2 title \ bold text \ bold text

-How to take care of multilingual phrases etc.: with marked sections,
 via attributes, or with specific tags?

Sgml-formed text will be imported to Interleaf (or FrameMaker).

Need some practical advice !

Many thanks !

Heimo Hänninen
internet: HEHANNINEN@tnclus.tele.nokia.fi

Newsgroups: comp.text.sgml
Date: 10 Jun 1992 15:03:57 UT
From: "Wayne Wohler" \
Message-ID: <9206101513.AA14868@ucbvax.Berkeley.EDU>
Subject: Scope of the CURRENT attribute

Regarding the scope of the specification of a "current" attribute, check
7.9.1.1 (329:3). In part "a", it states that "There need be an
attribute specification only for a required attribute, and for a current
attribute on the first occurrence of ANY ELEMENT IN WHOSE ATTRIBUTE
DEFINITION LIST IT APPEARS." (emphasis is mine.) In addition, in part
"b", it states that "the new default affects ALL ELEMENTS associated
with the attribute definition list in which the attribute was defined."
(again, emphasis is mine.) Based on these quotes from the standard it's
pretty clear that Erik has correctly coded his example.

Wayne L. Wohler
IBM Corp
Publishing Systems
Boulder, Colorado

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 10 Jun 1992 16:08:36 UT
From: "dennis.r.vogel" \
Organization: AT&T
Message-ID: <1992Jun10.160836.12078@cbnewsj.cb.att.com>
References: <1992Jun10.081743.19470@kum.kaist.ac.kr>
Subject: Re: MIME for global hypertext

From article <1992Jun10.081743.19470@kum.kaist.ac.kr>, by jjkim@kum.kaist.ac.kr (Jungju Kim (CSDept)):
>
> Anyway, could somebody out there kindly send me the list of documents
> describing WWW & gopher ?
>

Please post the list to this newsgroup. There are other folks who
are interested in this topic.

Dennis R. Vogel
AT&T Bell Laboratories
Middletown, NJ

Newsgroups: comp.text.sgml
Date: 10 Jun 1992 16:31:22 UT
From: Brad Might \
Organization: HaL Computer Systems
Message-ID: \
References: \ <23202A@erik.naggum.no>
Subject: Re: Defaulting mechanism for CURRENT attributes

In article <23202A@erik.naggum.no> erik@naggum.no (Erik Naggum) writes:

Erik, I think we had discussions on this about a year ago, but I cannot
find the mail or postings. Perhaps you have it.

> From: erik@naggum.no (Erik Naggum)
> Date: 9 Jun 92 19:44:45 GMT
>
> E.g.
>
> Given
>
> \
> \ zot NAME #CURRENT
> >
>
> then
>
> \
> \
>
> is identical to
>
> \
> \
>

Not true:

\
\

is an error (if this is the first occurrence of \ in the document).

The following is the only reference I can find to back it up, but I
think there is another (better) statement that I cannot locate at the
moment:

B.5.2.4 Changing Default Values (pg. 38 SGMLh)

  If the default value is specified as "CURRENT", the default will
  automatically become the most recently specified value. This allows
  an attribute value to be "inherited" by default from the previous
  element of the same type.
  ^----------------------^

Overview 4.4.3.1 8

  NOTE -- The start-tag cannot be omitted for the first occurrence of
  an element with a current attribute.

This does not say anything about the current attribute value being
supplied by another element.

--
- standard disclaimers apply -
jbm@hal.com (Brad Might)
HaL Computer Systems - (512)794-2855
\more fun than a barrel of macros

Newsgroups: comp.text.sgml
Date: 10 Jun 1992 23:31:58 UT
Corp. The opinions expressed are those of the user and not necessarily those of CONVEX.
From: Dan Connolly \
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Message-ID: <1992Jun10.233158.29502@news.eng.convex.com>
References: <9206090036.AA12682@tti>
Subject: Re: HTML vs. HyTime

In article <9206090036.AA12682@tti> srn@elvin.UUCP (Steven R. Newcomb) writes:
>Dan Connolly writes:
>
>> The WWW group is attempting to define a multimedia interchage
>> format called HTML. It is intended to be an SGML language, but
>> most existing HTML has never been through an SGML parser.
>
>Why create another SGML-based multimedia interchange format when HyTime
>was just approved on May 1 as an International Standard for that very
>purpose?
>

You got a public implementation I can use? At least for SGML I can
grab the sgmls package and go.

The HyTime standard is all well and good, but 1) I can't even read
it without buying hardcopy, and 2) even if I had a hardcopy, it's
so involved that it would take me years to implement it.

I'm not saying we should ignore it -- I'm just not doing anything
that extensive.

If the HyTime spec intersects or includes the functionality I'm
after, I'd like to know what its strategies are and how hard
they are to implement. But I've seen parts of the standard
and it looks even huger than SGML!

Dan

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 11 Jun 1992 01:47:11 UT
Corp. The opinions expressed are those of the user and not necessarily those of CONVEX.
From: Dan Connolly \
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Message-ID: <1992Jun11.014711.10853@news.eng.convex.com>
References: <1992Jun7.042358.29367@news.eng.convex.com> <1992Jun10.081743.19470@kum.kaist.ac.kr>
Subject: Re: MIME for global hypertext

--cut-here

In article <1992Jun10.081743.19470@kum.kaist.ac.kr> jjkim@kum.kaist.ac.kr (Jungju Kim (CSDept)) writes:
>In article <1992Jun7.042358.29367@news.eng.convex.com> connolly@convex.com (Dan Connolly) writes:
>>
>>The MIME RFC defines a system for processing multi-part, multimedia
>>messages on the internet. I would like to see these systems, along
>>with USENET news and internet mail, interoperate with MIME as the substrate.
>> >>Short of full SGML parsing, we could adopt the MIME text/richtext >>format, with the addition of a \...\ tag. >>In fact, any representation that allows the user to interactively indicate >>one of the attached body parts by content-id will do. For example, >>plain text with one-line descriptions would do. The Andrew ez >>data stream would also work, but only Andrew sites could parse it. >> >IMHO, I don't think it's that easy. There can be various kinds of >links in a document. One may possibly want to see the next part >or the previous part, or one can a have a question on the specific >region of the image he/she is watching. Sure, a client can implement all sorts of fancy semantics on top of a MIME-encapsulated document, especially if the structure is described by an SGML document. I don't see the conflict. > Richtext is not designed to >deal with the problems stated above, and I don't think it will evolve >in that direction. Ok, don't use RichText. You're right: it's more for formatting anyway. But interoperability is always a good thing. And if RichText doesn't cost much to implement, I might support it. > And if it is mixed up with WAIS, how do you think >the other fields of the question be filled ? (related documents, >servers,...) > Like this: SGML entites reference MIME body parts by ID. The body of the body part is the content of the entity. We use the expressive power of MIME, especially the message/external body semantics and the ghost-body area to get from one place to another in the global hypernet. The rest of this message, i.e. the second part of it, is a hypothetical global hypertext. Most of the semantics it demonstrates exist in avalable systems. For example, memtamail can be configured to call gopher (or even a shell script that calls telnet) to get the IMAP RFC. --cut-here Content-Type: multipart/X-HYPERTEXT; separator=attachment --attachment Content-Type: text/SGML \ \ \ \ \ ]> \example global hypertext\ \
\
Here are some comments on the MIME draft: Your client might include it inline, or allow you to click an icon and show it in another window. \ \ \
here's an the IMAP RFC. Your client will probably display it in place of this text if you select this element: \ \
here's an image. Your client might display it in a separate window. If you have xfig, you can ftp the source and edit the graphic: \
nifty graphic\
\
Here's a database of MIME samples. Your client should show a description of the database and allow you toinvoke a wais client and make queries: \ \
--attachment Content-id: att1 Content-Description: "Keith Moor Re: comments on mime draft, esp. richtext" Content-Type: message/external-body; access-type=x-wais Content-Type: message ;; This is the ghost-body. The real body is external, i.e. ;; it's the content of the following WAIS document: (:document-id :score 1000 :document (:document :headline "Keith Moor Re: comments on mime draft, esp. richtext" :doc-id (:doc-id :original-database "/usr/spool/uucppublic/pub/wais-indices/mime-samples" :original-local-id "0 16412 /var/spool/uucppublic/pub/mime/samples/+kdPOk1u0M2Yt8hPuAh" :copyright-disposition 0 ) :source (:source-id (:source :version 3 :ip-name "wais.msen.com" :tcp-port 210 :database-name "mime-samples" ) :number-of-lines 638 :number-of-bytes 16412 :type "MIME" ;; obsolete, if I get my way :content-type "message" ) ) --attachment Content-id: att2 Content-type: message/external-body; access-type=gopher; host=bauhaus.micro.umn.edu; port=70; Content-type: text 0/RFC Reference/RFCs/rfc1064.txt --attachment Content-id: att3 Content-Type: multipart/alternative; boundary=illegal-base-64-string --illegal-base-64-string Content-Type: image/gif Content-transfer-encoding: base64 l3kj45l3k4j5l3k4j5l34kj534l5j34l5kj34l5kj34l5kj34l5kj ... base 64 data inline ... lkj2345lk34j5l34kj5l34kj534lk5j= --illegal-base-64-string Content-description: editable source to the figure Content-Type: message/external-body; access-type=anon-ftp; site="prep.ai.mit.edu"; name=/pub/stuff/thingy.xfig Content-type: application/x-fig --illegal-base-64-string-- --attachment Content-Type: multipart/parallel; boundary=alt --alt Content-type: text Server created with WAIS release 8 b3.1 on Feb 20 04:44:57 1992 by emv@midori.msen.com multimedia documents in internet MIME multimedia mail format. originally from thumper.bellcore.com:/pub/nsb/. type for each of these things is MIME. with a proper viewer this should yield voice, pictures, text, nice pretty formatted text, and smell-o-vision. 
--alt Content-Type: application/x-wais-source (:source :version 3 :ip-name "wais.msen.com" :tcp-port 210 :database-name "mime-samples" :cost 0 :cost-unit :free :maintainer "emv@cic.net" :description "Server created with WAIS release 8 b3.1 on Feb 20 04:44:57 1992 by emv@midori.msen.com multimedia documents in internet MIME multimedia mail format. originally from thumper.bellcore.com:/pub/nsb/. type for each of these things is MIME. with a proper viewer this should yield voice, pictures, text, nice pretty formatted text, and smell-o-vision. ") --alt-- --attachment-- --cut-here-- Newsgroups: comp.text.sgml Date: 11 Jun 1992 09:19:40 UT From: Brad Might \ Organization: HaL Computer Systems Message-ID: \ References: <9206101513.AA14868@ucbvax.Berkeley.EDU> Subject: Re: Scope of the CURRENT attribute The (below) quoted paragraph however begins with If "SHORTTAG YES" or "OMMITTAG YES" is specified on the SGML declaration: So what happens if they are not specified ? Are all standards this difficult and obtuse ? Even with all the cross referencing in the Handbook, it still takes a long time to find all relevant information about almost anything if you can even find it (Current attributes are not cross referenced to this section which is TOTALLY RELEVANT to the question). In article <9206101513.AA14868@ucbvax.Berkeley.EDU> WOHLER@BLDVM1.VNET.IBM.COM ("Wayne Wohler") writes: > Regarding the scope of the specification of a "current" attribute, check > 7.9.1.1 (329:3). In part "a", it states that "There need be an > attribute specification only for a required attribute, and for a current > attribute on the first occurrence of ANY ELEMENT IN WHOSE ATTRIBUTE > DEFINITION LIST IT APPEARS." (emphasis is mine.) In addition, in part > "b", it states that "the new default affects ALL ELEMENTS associated > with the attribute definition list in which the attribute was defined." > (again, emphasis is mine.) 
Based on these quotes from the standard its
> pretty clear that Erik has correctly coded his example.
>
> Wayne L. Wohler
> IBM Corp
> Publishing Systems
> Boulder, Colorado
>

--
- standard disclaimers apply -
jbm@hal.com (Brad Might)
HaL Computer Systems - (512)794-2855
\more fun than a barrel of macros

Newsgroups: comp.text.sgml
Date: 11 Jun 1992 11:26:10 UT
From: Steve Pepper \
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1992Jun11.112610.3834@falch.no>
References: <1992Jun10.154546.1@tnclus.tele.nokia.fi>
Subject: Re: ansi vs. iso while naming tags ?

hehanninen@tnclus.tele.nokia.fi writes:

> -Should we rest on f.exam. ANSI/NISO z39.59-1988 standard while
> specifying common tag names or ISO/IEC TR 9573 ?

Terve Heimo!

Some colleagues and I have recently prepared a Norwegian version
of the 'General' DTD found in TR 9573. (We call it General DTD//NO
or 'GenDok'.) The main purpose of the exercise for us was to try
to establish a standard Norwegian nomenclature for the most common
tag names. In the process we came up against the same questions
you are asking.

In addition to 'General', we looked closely at the usage in
ANSI/NISO Z39.59-1988 (aka the AAP DTDs) and a beta (0.8) version
of the 'Majour' DTD (Modular Application for Journals) being
developed by EWS (European Workgroup on SGML). Here are some of
the conclusions that I personally drew from this work:

1. Tag name length
------------------

The AAP DTDs were originally developed at a time when there were
no SGML-aware editors around and almost all tagging had to be done
manually. This (I believe) is the main reason for the tendency
towards short and cryptic tags in the ANSI DTD. Today the
situation is different: there are many editors that let you (or
even force you to) do your tagging via menus, and that allow you
to hide your tags if you wish. There is therefore no longer any
reason to save single keystrokes (bdy/body, fm/frontm) at the
expense of making the tag names less intuitive - especially with
tags that only occur a handful of times in a particular document.

The corresponding tags in GenDok are \ and \. Majour uses \
("front matter" is inappropriate in journal articles) and \.
(TEI also uses \.)

2. Tags for font styles
-----------------------

One of our main tasks in explaining SGML to authors is getting
across to them the importance of separating content (structure)
from appearance (processing). (Except where the appearance _is_ a
part of the content, as in many TEI applications.) Allowing tags
like \ for "bold" flies in the face of this and should be avoided.
(Can you be sure that you will _always_ want your \ text formatted
using bold face?)

As an article by J. Sperling Martin in the March 1992 issue of
EPSIG News tells us, the AAP team was a little at odds with itself
here, allowing \, \ etc. (for 'emphasised text') in addition to
\, \ etc. In GenDok we use \, \ etc. (for 'uthevet tekst').
Majour has opted for \..\.

3. Headed sections -- \ etc.
-------------------------------

These were actually the GIs that gave us most trouble and the
conclusion that I personally have come to is that the author of
General cheated, the AAP team took the easy way out, and EWS are
attempting to sort out the muddle!

The confusion seems to stem from the fact that authors (especially
those used to older generic markup schemes) think in terms of
headings, subheadings and text rather than sections and sub-
sections comprised of headings, paragraphs etc. When they put a
tag in front of a heading - say \ - they think of it as a code for
the heading itself. But in most SGML applications that is not the
case.

In General \ is a _headed section_ (level 1); the actual heading
(or headed section _title_) is \. What I feel the author of this
DTD did was to choose a GI that means one thing for the DTD
('headed section'), and something else for the author ('heading').
In so doing, he/she established a source of confusion that has
some far-reaching consequences.

The fact that you call \ 'Heading level 1' in your message
testifies to this, but you are not alone. Just take a look at the
ways \ (n=0-4) is translated in 9573: in three of the five
languages, \ is translated to something meaning 'title' or
'heading' level 'n' (German 'Ueberschrift', Swedish 'rubrik',
Danish 'overskrift'). Only the French and Dutch translations
accurately reflect the fact that we are talking about structured
parts or sections (French 'element' (2x \é), Dutch 'onderdeel').

Interestingly, section 5.7 of 9573 does not include translations
for \, \ etc. Why not? What would, say, the Danes have called
these? Perhaps 'overskrift niveau 1 titel' (heading level 1
title)?

What about the ANSI/NISO solution? The AAP team seems to have been
hedging its bets (sometimes you think they have everything except
the kitchen sink in their DTD). Here we find \, \ etc. - but not
as headed sections! The AAP calls them 'Head, level 1', etc., and
they have no direct hierarchical function. They are treated as
'subsection elements' on a level with paragraphs (so there is
nothing to stop you having as many assorted \ (n=1-4) elements as
you want, in whatever order, more-or-less anywhere you can put a
simple paragraph).

In addition to \, \ etc., ANSI/NISO has real sections and
subsections, with corresponding (optional) section titles. They
are called \_{, \ (n=1-3) and \ respectively. I have my
doubts about the wisdom of mixing section titles and 'floating'
heads in the same DTD, but at least the GI names and descriptions
are not deliberately misleading, as they are in General.

The EWS seems to be aware of all this confusion. At any rate they
manage to avoid it - by steering well clear of \, etc. The
proposed Majour 'body' consists of sections called \, \,
\ and \. Each section has an optional number, \, and
title, \. The price they pay is forcing the use of two tags,
where ANSI and ISO, using tag omission, usually get by with one,
that is:

\\Heading/title

instead of

\Heading/title

My personal opinion is that EWS has the best solution (we have
proposed \ etc. for the Norwegian GenDok), and I would
strongly advise you to follow their example.

By the way, the exercise of translating TR 9573's General DTD was
very useful. Perhaps you should get together with the people from
WSOY Information Systems (Jouko Riikonen?) and do a Finnish
version? (If you'd like a copy of our GenDok, send me an email.)

Cheers,

Steve

--
pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 (2) 163040 fax +47 (2) 162350 bbs +47 (2) 162650

Newsgroups: comp.text.sgml
Date: 11 Jun 1992 14:34:05 UT
From: "David J. Fiander" \
Organization: Mortice Kern Systems Inc., Waterloo, Ontario, CANADA
Message-ID: <1992Jun11.143405.27527@mks.com>
Subject: Changing parts of a concrete syntax

I'm using sgmls 0.7, and have found the 8-character limit on
names rather restrictive. The sgmls man page says that "[t]he
upper limit on NAMELEN is 239."

So how do I change it? Do I have to provide an entire SGML
declaration just to change one quantity?

--
David J. Fiander |The manager will be continually amazed that
\ |policies he took for common knowledge are totally
Mortice Kern Systems Inc. |unknown to some member of his team.
Waterloo, Ontario, Canada | - Frederick P. Brooks, Jr.

Newsgroups: comp.text.sgml
Date: 11 Jun 1992 16:11:11 UT
From: jaap \
Reply-To: jaap@alice.UUCP ()
Organization: AT&T, Bell Labs
Message-ID: <23011@alice.att.com>
References: <1992Jun10.154546.1@tnclus.tele.nokia.fi> <1992Jun11.112610.3834@falch.no>
Subject: Re: ansi vs. iso while naming tags ?

In article <1992Jun11.112610.3834@falch.no> pepper@falch.no (Steve Pepper) writes:
>
> Terve Heimo!
>
> Some colleagues and I have recently prepared a Norwegian version
> of the 'General' DTD found in TR 9573. (We call it General DTD//NO
> or 'GenDok'.) The main purpose of the exercise for us was to try
> to establish a standard Norwegian nomenclature for the most common
> tag names...

And later on in section ``1. Tag name length'':

> .... Today the
> situation is different: there are many editors that let you (or
> even force you to) do your tagging via menus, and that allow you
> to hide your tags if you wish. There is therefore no longer any
> reason to save single keystrokes (bdy/body, fm/frontm) at the
> expense of making the tag names less intuitive - especially with
> tags that only occur a handful of times in a particular document.

This really confuses me. If in today's systems the tag names can be
hidden from the user, why would anybody make the effort to translate
these names? Why not translate only the way the tags are
presented to the user? Isn't translating the tags as well just making
things needlessly complicated?

It seems to me that when I have the General DTD in my system and you
send me a document using the Norwegian version of the dtd, I need that
dtd as well although the functionality might be the same as the one I
have. And if this trend continues, I can soon expect French, Spanish,
Finnish, Dutch, South-African and other language dtd's with no
functional differences? I'm afraid I don't get it.

It would be similar to changing ``switch(){}'' into
``schakelaar(){}'' for the C-language. I know, in some French
algol-60 compilers one can use French for begin ... end as well, but
it always seemed to me hardly worth the trouble, and if you give away
your program to a non-French compiler, you have to translate these
things anyway. Or is there something that I am missing? Please enlighten
me.

jaap

Newsgroups: comp.text.sgml
Date: 11 Jun 1992 16:54:02 UT
From: wathu@violet.ccit.arizona.edu
Reply-To: wathu@arizona.edu
Message-ID: <1992Jun11.095402.1@violet.ccit.arizona.edu>
Keywords: SGML, Converters
Subject: SGML Converters

SGML Converters
===============

We are in the process of setting up a WorldWideWeb (WWW) server for computer
center documentation. Our current documents are in many different
word processor formats, such as Ventura, WordPerfect (DOS), MS-Word (Mac),
TeX, LaTeX, PostScript, RTF, etc. We would like to convert them to SGML
so that we can link them to WWW.

We would like recommendations for good products (commercial or free).
If any of you have tried this type of conversion, please comment on
your experiences.

Thank you

Wije Wathugala
wathu@arizona.edu

Newsgroups: comp.text.sgml
Date: 11 Jun 1992 20:40:50 UT
From: C. M. Sperberg-McQueen \
Organization: University of Illinois at Chicago
Message-ID: <92163.154050U35395@uicvm.uic.edu>
References: \ <23202A@erik.naggum.no>
Subject: Re: Defaulting mechanism for CURRENT attributes

> | Now, the attribute att of elem instance d defaults to the most
> | recently specified value. Is this ZZZ (thus considering the
> | left-to-right ordering of value specifications) or is this XXX (thus
> | considering the top-down hierarchical ordering?
>
> I honestly think you would find ISO 8879 and SGML easier to deal with if
> you didn't attempt to make it fit a mathematical model which wasn't
> derived from it. ...

For the record, I would like to say Erik is speaking for himself here,
but not for me; I find Dr. Brueggemann-Klein's work on SGML, and her
question here, extremely useful. I believe the intention of 8879 to be
as described by earlier replies to the query (current value is that most
recently specified in a depth-first left-to-right scan of the entire
tree, not in a direct descent from the root). But the question is not
nearly so uninteresting or obvious as Erik suggests.

The standard says the value is the "most recently specified" value
(clause 4.67, definition of "current attribute", which 11.3.4 says ATT
in Dr. Brueggemann-Klein's example is). But what happened most recently
depends in reality rather dramatically on what order you have been doing
things in. The interpretations offered thus far assume you have been
processing the text in a left-to-right, depth-first scan of the document
tree. But the original question effectively asks whether that is
guaranteed; what would happen if we processed the document in a
breadth-first traversal?

If there is any explicit specification in 8879 that an SGML parser
must process a document through a left-to-right depth-first traversal
of the document tree, I would very much like to know where it is. I
don't think it exists: the standard doesn't even specify explicitly
that the input has to be electronic characters (though it is hard to
understand how to declare or recognize delimiters if it's anything
else), and does explicitly say "This International Standard does not
constrain the physical organization of the document ..." (clause 6.1
note 1). If one wrote an SGML parser that did not process the text left
to right--and let us recall that there are parsing algorithms for
other orders, including Unger's and the Cocke/Younger/Kasami (CYK)
method--would one have the right to specify a different value for a
CURRENT attribute from the value given by a left-to-right depth-first
scan?
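
To make the dependence on processing order concrete, here is a minimal
sketch in Python. It is not taken from the standard or from any
parser: the Element class, the four-element tree and the names a, b,
c, d are all hypothetical, chosen only so that the labels XXX and ZZZ
from the question show up. It assumes a single CURRENT attribute
shared by every element and simply carries the "most recently
specified" value along whatever visiting order it is given.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass
class Element:
    name: str                      # label for this element instance
    att: Optional[str] = None      # explicitly specified value, or None
    children: List["Element"] = field(default_factory=list)


def depth_first(root: Element) -> Iterator[Element]:
    """Visit elements in document order (left-to-right, depth-first)."""
    yield root
    for child in root.children:
        yield from depth_first(child)


def breadth_first(root: Element) -> Iterator[Element]:
    """Visit elements level by level."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


def resolve_current(order: Iterable[Element]) -> Dict[str, Optional[str]]:
    """Give each element the most recently specified value along `order`."""
    current = None
    resolved = {}
    for element in order:
        if element.att is not None:
            current = element.att          # a specification becomes the new default
        resolved[element.name] = current   # an omitted value takes the default
    return resolved


# Hypothetical instance: a (att=XXX) contains b and d; b contains c (att=ZZZ).
tree = Element("a", "XXX", [
    Element("b", None, [Element("c", "ZZZ")]),
    Element("d"),
])

doc_order = resolve_current(depth_first(tree))
level_order = resolve_current(breadth_first(tree))
print(doc_order["d"], level_order["d"])
```

Under these assumptions d receives ZZZ in the depth-first scan and XXX
in the breadth-first one, which is exactly the ambiguity the question
points at.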

In practice, I don't see how one can handle SGML in any way other than
left-to-right scan of the data stream. But then, there's a lot I don't
know. And the standard does say (Annex F) that 8879 does not require
any particular implementation techniques or architecture.

So perhaps we should have a little more patience with questions that
seem to have obvious answers.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago

Newsgroups: comp.text.sgml
Date: 11 Jun 1992 21:17:54 UT
From: C. M. Sperberg-McQueen \
Organization: University of Illinois at Chicago
Message-ID: <92163.161754U35395@uicvm.uic.edu>
Subject: Entity references in attribute values: a new conundrum

Two related questions for the group, only slightly loaded.

1 is a conforming parser supposed to recognize entity references inside
attribute values? (please cite chapter and verse)

2 do existing implementations in fact recognize entity references inside
attribute values?

To anyone who thinks they know the answer to the first question off
hand: good, so did I, but then I found I was unable to back it up from
the standard, and in fact at the passage I found the standard seems to
be saying, quite clearly, the opposite of what I thought was the case,
and think should be the case, and believe was intended by the
drafters of 8879 to be the case. (If I am right, and the text says the
opposite of what was intended, then we really ought to make sure it gets
fixed in the revision!)

In order to allow us all to go to the standard without preconceptions, I
won't say what I thought was the case, at least not now.

I would be very happy to see postings on this question from some of the
implementers who read this list.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 04:03:29 UT
From: David Durand \
Organization: Roberstool Research Labs
Message-ID: \
References: <9206090036.AA12682@tti> <1992Jun10.233158.29502@news.eng.convex.com>
Subject: Re: HTML vs. HyTime

In article <1992Jun10.233158.29502@news.eng.convex.com> connolly@convex.com (Dan Connolly) writes:

In article <9206090036.AA12682@tti> srn@elvin.UUCP (Steven R. Newcomb) writes:
>Why create another SGML-based multimedia interchange format when HyTime
>was just approved on May 1 as an International Standard for that very
>purpose?
>
[stuff deleted]
The HyTime standard is all well and good, but 1) I can't even read
it without buying hardcopy, and 2) even if I had a hardcopy, it's
so involved that it would take me years to implement it.

I'm not saying we should ignore it -- I'm just not doing anything
that extensive.

If the HyTime spec intersects or includes the functionality I'm
after, I'd like to know what it's strategies are and how hard
they are to implement. But I've seen parts of the standard
and it looks even huger than SGML!

Dan

HyTime is a framework that can be added to _any_ DTD, and you could
write an application that uses a private markup language, but by
attaching appropriate #FIXED attributes to your DTD, it could be
accepted by a HyTime engine. Later on, when you get more ambitious, you
might want to implement a HyTime engine of your own (sgmls could be a
base for such...). HyTime is also a modular standard, which covers a
huge scope -- _if_ all modules are included. For most hypertext
applications (those that do not involve sophisticated multi-media
links) only a small portion of the standard is needed. I think (based
on limited scanning of the approved version of the standard) that the
"base", "location address", and "hyperlinks" modules would be needed.

Whatever you might propose should at least fit the mold of HyTime --
It's not that hard to ensure at least _lack of incompatibility_.
Seriously, write any spec for your protocol that you want, but if
you're going to try to set standards, you need to take pre-existing
standards into careful account. HyTime _is_ a bear to get the hang of,
but I think that this forum should be able to help you with specific
problems as they arise.

-- David

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 07:11:08 UT
From: Steve Pepper \
Organization: Falch Hurtigtrykk as, Oslo, Norway
Message-ID: <1992Jun12.071108.7248@falch.no>
References: <1992Jun10.154546.1@tnclus.tele.nokia.fi> <1992Jun11.112610.3834@falch.no> <23011@alice.att.com>
Subject: Re: ansi vs. iso while naming tags ?

jaap@alice.att.com (jaap) writes:

> It seems to me that when I have the General DTD in my system and you
> send me a document using the Norwegian version of the dtd, I need that
> dtd as well although the functionality might be the same as the one I
> have. And if this trent continues, I can soon expect French, Spanish,
> Finnish, Dutch, South-African and other language dtd's with no
> functional differences? I'm afraid I don't get it.

Dear jaap,

Have no fear! I wouldn't dream of sending you a document that used
GenDok (the Norwegian version of General in TR 9573). If I had an
instance of a GenDok document that you needed, and I knew you had
the General DTD, I would convert it (perhaps using an explicit
link) before transmitting to you. Because GenDok is a straight
translation of General (warts and all :-), that would simply
involve a one-to-one mapping of GIs.
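
As a rough illustration of such a one-to-one mapping, the Python
sketch below renames GIs in start-tags and end-tags with a lookup
table. The GI names in GI_MAP are hypothetical placeholders, not the
real GenDok or General names, and the approach is purely textual: it
assumes fully tagged input with no tag omission, short tags or marked
sections, so a real conversion would work from parser output (or use
an explicit link, as mentioned above) rather than from a regular
expression.

```python
import re

# Hypothetical GI mapping; the real GenDok names are not reproduced here.
GI_MAP = {"punkt": "li", "avsn": "p"}

# Matches the opening of a start-tag or end-tag and captures the GI.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)")


def translate_gis(text, mapping):
    """Rename every GI found in `mapping`; leave all other tags untouched."""
    def swap(match):
        slash, gi = match.group(1), match.group(2)
        return "<" + slash + mapping.get(gi.lower(), gi)
    return TAG.sub(swap, text)


sample = "<punkt><avsn>Et punkt</avsn></punkt>"
print(translate_gis(sample, GI_MAP))
```

Because the table is one-to-one, inverting the dictionary gives the
conversion in the other direction.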

Having said that, I doubt whether I'll ever have a GenDok document
to send you anyway. Our aim in doing the translation was not to
end up with a DTD that people will actually use _as is_ (except
perhaps for training and experimentation), but to try to establish
some conventions for Norwegian tag names.

Out in the big wide world (and even some places in Norway!), where
the natural thing is to use English GIs, certain conventions have
emerged. For example, when I see the tag <li> I now automatically
think of 'list item' and in 95% of cases I am right; <p> means
'paragraph' unless otherwise specified; etc. Conventions like this
make it easier for document designers to read other people's DTDs,
and they make life easier for the end user who must switch between
different DTDs.

Of course, things are far from perfect in the English-speaking
world, as J. Sperling Martin points out in the article I mentioned
in my original posting (EPSIG News, March 92):

How many ways do you want to have to remember the correct
tag for a "heading" -- \, \, \, \ and so on.
(I've seen each of these in actual DTDs. The first is the
Z39.59 flavor, and the last is from the TEI.) We should
always keep in mind that in some instances an author writing
an article for, say, the ACS Journal may also be preparing
a manuscript for his latest book for Wiley...

SGML is still in its infancy here in Norway, but things are
starting to move fast and we wanted to try to nip this kind of
chaos in the bud. Our starting point is that Norwegian authors
ought to be able to use Norwegian tag names (unless there is a
very good reason not to), and that they shouldn't be called upon
to learn umpteen different GIs for elements that perform exactly
the same role in different DTDs. We hope that by publishing GenDok
under the auspices of the Norwegian SGML Users' Group, we will
establish a standard practice - e.g. <pkt> ('punkt') and \
('avsnitt') as equivalents for <li> and <p>.

As to your suggestion about translating the way tag names are
presented to the user, I agree - that would be one solution, for
systems that allow it. But those that I have seen generally do
not. They allow you to customise the _description_ of the tag, but
not the actual tag name, nor the way it is presented on the screen
when you choose 'Show Tags' instead of 'Hide Tags'. Thus, taking
the <li> element as an example, Author/Editor would allow me to
add the (Norwegian) description 'Punkt i en nummerert eller
unummerert liste' in the dialog box from which the tag would be
chosen. But when I press Cmd-Spacebar for 'Show Tags', the
screen representation would be [ li > and not (as my Norwegian
users would like) [ pkt > .

Hope this clears up the confusion.

Cheers,

Steve

--
pepper@falch.no
------------------------------------------------------------------
falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway
tel +47 (2) 163040 fax +47 (2) 162350 bbs +47 (2) 162650

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 12:11:52 UT
From: hehanninen@tnclus.tele.nokia.fi
Organization: Nokia Telecommunications.
Message-ID: <1992Jun12.141152.1@tnclus.tele.nokia.fi>
Subject: more about national DTDs

>Message-ID:<23011@alice.att.com>

>In article <1992Jun11.112610.3834@falch.no> pepper@falch.no (Steve Pepper) writes:
> >
> > Terve Heimo!
> >
> > Some colleagues and I have recently prepared a Norwegian version
> > of the 'General' DTD found in TR 9573. (We call it General DTD//NO
> > or 'GenDok'.) The main purpose of the exercise for us was to try
> > to establish a standard Norwegian nomenclature for the most common
> > tag names...
>
>And later on in section ``1. Tag name length'':
>
> > .... Today the
> > situation is different: there are many editors that let you (or
> > even force you to) do your tagging via menus, and that allow you
> > to hide your tags if you wish. There is therefore no longer any
> > reason to save single keystrokes (bdy/body, fm/frontm) at the
> > expense of making the tag names less intuitive - especially with
> > tags that only occur a handful of times in a particular document.
>
>This really confuses me. If in today's systems the tag names can be
>hidden from the user, why would anybody make the effort to translate
>these names? Why not translate only the way the tags are
>presented to the user? Isn't translating the tags as well just making
>things needlessly complicated?
>
>It seems to me that when I have the General DTD in my system and you
>send me a document using the Norwegian version of the dtd, I need that
>dtd as well although the functionality might be the same as the one I
>have. And if this trend continues, I can soon expect French, Spanish,
>Finnish, Dutch, South-African and other language dtd's with no
>functional differences? I'm afraid I don't get it.

I agree with this. We are not going to modify the Finnish version of GEN-DTD
to use Finnish tag names. It would be against our commercial policy, and
people have gotten used to working with English-style names, for example in
W4W and IL. Last but not least, why should we upset foreigners, even if the
text is offered with a Finnish DTD?

Like Steve wrote, we may also supply a description for cryptic tag names
in the editor's 'help field'.

I guess the situation here in Finland is quite similar to Norway. Thus it
would be really useful to offer a Finnish DTD for people who are studying
SGML techniques!

Question: Is it necessary to include structural info in tag name
(in prefix and in suffix) if you can see (and follow)
the structure in DTD ?

f.ex. %p.em.ph or %pharases

Sun tan for everybody , Heimo

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 14:24:50 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23205B@erik.naggum.no>
References: <1992Jun12.141152.1@tnclus.tele.nokia.fi>
Subject: Re: more about national DTDs

The reason I, too, want national DTD's is that the people who keyboard
the documents will want to use their natural language also for the
markup. With conventional markup languages, this has not been possible
without an incredible lot of work, and it is consequently seldom done.

I am concerned about the "magic" involved if tag names are cryptic in a
foreign language. People who want to express their ideas in written
form will have to find SGML useful for their needs, and will have enough
effort poured into the written expression of their ideas even before
they would be forced to learn another bloody form of computer-imposed
magic. I'm perfectly happy with English comments in the DTD, English
parameter entity names, etc, because they are used by document designers
and programmers, who _also_ have better things to do than to learn
another language every time they turn around. Programmers are very
special users, and making any sort of concession to them (and I'm
talking about myself here, too), is likely to scare many users off.

As regards document interchange: If the document language is foreign
to you, the tag names are the least of your problems. As far as SGML
goes, we do have explicit link, and I'm providing an explicit link type
declaration with the Norwegian General Document just for this purpose.
(I notice, with some regret, that attributes can't be "renamed" with
this feature, and will need special treatment.)

That "everybody" knows English is no reason to believe that everybody in
fact knows English well enough to not feel intimidated by English tag
names. SGML even provides a means to rename keywords of the language.
A parser should be able to accommodate such trivial modifications for
national needs.

hehanninen@tnclus.tele.nokia.fi writes:
|
| Question: Is it necessary to include structural info in tag name
| (in prefix and in suffix) if you can see (and follow)
| the structure in DTD ?
|
| f.ex. %p.em.ph or %pharases

Ah, but those are not tag names, those are parameter entities! I find
this prefix and suffix thingy to be very useful for DTD designers, but
users will seldom see, much less know about, them. Again, programmers
and users should not be treated the same.

| Sun tan for everybody , Heimo

(How ironic that you should say that; I'm recovering from a bad case of
sun burn... :-)

Best regards,
\
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 15:10:01 UT
From: "Wayne Wohler" \
Message-ID: <9206121721.AA02395@mammoth.Berkeley.EDU>
Subject: Re: Scope of the CURRENT attribute

Reference: \

-> The (below) quoted paragraph however begins with

->
-> If "SHORTTAG YES" or "OMITTAG YES" is specified
-> on the SGML declaration:
->
-> So what happens if they are not specified ?

If SHORTTAG and OMITTAG are both NO, then attributes may not be omitted
and therefore CURRENT becomes a moot point; there can be no defaulting.
This makes some sense since these features would not be very useful if
all attributes had to be specified. If both are NO, then the document
needs to have all markup fully specified, including all attribute
values.

-> Are all standards this difficult and obtuse ?
-> Even with all the cross referencing in the Handbook, it
-> still takes a long time to find all relevant information
-> about almost anything if you can even find it
-> (Current attributes are not cross referenced to this section
-> which is TOTALLY RELEVANT to the question).

I don't have much experience with international standards outside of the
SGML area. It is pretty clear the standards I have read are not
intended to be user's guides! I understand your frustration with the
standard, it does take some study to learn the terminology and to learn
to navigate the standard. The definitive author's guide and DTD
writer's guide hasn't been written yet.

In general, I've found the handbook's indexing to be quite helpful;
following the "current attribute" entry to page 328 was how I found the
standard citations I used. That particular index entry should also have
included page 329.

Wayne L. Wohler
IBM Corp
Publishing Systems
Boulder, Colorado

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 15:58:53 UT
From: "Wayne Wohler" \
Message-ID: <9206121721.AA02386@mammoth.Berkeley.EDU>
Subject: Re: Defaulting mechanism for CURRENT attributes

Reference: <92163.154050U35395@uicvm.uic.edu>

> > | Now, the attribute att of elem instance d defaults to the most
> > | recently specified value. Is this ZZZ (thus considering the
> > | left-to-right ordering of value specifications) or is this XXX (thus
> > | considering the top-down hierarchical ordering?
> >
> > I honestly think you would find ISO 8879 and SGML easier to deal with if
> > you didn't attempt to make it fit a mathematical model which wasn't
> > derived from it. ...
>
> For the record, I would like to say Erik is speaking for himself here,
> but not for me; I find Dr. Brueggemann-Klein's work on SGML, and her
> question here, extremely useful. I believe the intention of 8879 to be
> as described by earlier replies to the query (current value is that most
> recently specified in a depth-first left-to-right scan of the entire
> tree, not in a direct descent from the root). But the question is not
> nearly so uninteresting or obvious as Erik suggests.

I also found Dr. Brueggemann-Klein's question interesting and after
looking for a bit, did not find any clause in the standard to explicitly
state that "most recently specified" meant "most recently specified in
the linear SGML datastream" or some such. Erik gave some good reasons
to surmise that the standard intends the interpretation he gave. All
the systems on which I have used current attributes take the
linear view of SGML data and current attribute definition. That's no
guarantee but it does mean a fair number of people have come to the same
conclusion.

> The standard says the value is the "most recently specified" value
> (clause 4.67, definition of "current attribute", which 11.3.4 says ATT
> in Dr. Brueggemann-Klein's example is). But what happened most recently
> depends in reality rather dramatically on what order you have been doing
> things in. The interpretations offered thus far assume you have been
> processing the text in a left-to-right, depth-first scan of the document
> tree. But the original question effectively asks whether that is
> guaranteed; what would happen if we processed the document in a
> breadth-first traversal?

I have never thought of an SGML parser processing a document tree ... it
provides the information that allows an application to build one. Like
most people, I have always thought of the parser processing a linear
sequence of characters. The best citation to support this is in 6.2
"Each SGML character is parsed in the order it occurs ...". There are
also several (if not many) occurrences of the words "first", "start",
"order" which all support a linear, sequential view of SGML processing.
Two examples: from 9.6.3 "Delimiter strings ... are recognized IN THE
ORDER THEY OCCUR, with no overlap", from 9.6.4 "If multiple delimiter
strings START with the same character". Delimiter recognition is
particularly important since one cannot build a tree until the
delimiters are recognized and their meaning applied.
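
The linear reading of "most recently specified" can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any particular parser: the event list, the element name 'elem', and the values XXX and ZZZ are placeholders echoing the question, and the sketch tracks one CURRENT attribute per GI, which simplifies the sharing rules for attribute definition lists.

```python
def resolve_current(events):
    """Fill in omitted CURRENT attribute values in document order.

    `events` is a list of (gi, value_or_None) pairs, one per start-tag,
    in the order the tags occur in the character stream. None means the
    attribute was omitted on that tag.
    """
    current = {}   # gi -> most recently specified value (assumed keying)
    resolved = []
    for gi, value in events:
        if value is None:
            if gi not in current:
                # The first occurrence must specify the value explicitly.
                raise ValueError("no earlier value for " + gi)
            value = current[gi]
        else:
            current[gi] = value
        resolved.append((gi, value))
    return resolved

# An outer element specifies XXX, a later one specifies ZZZ, and the
# last tag omits the attribute: in the linear view it receives ZZZ.
events = [("elem", "XXX"), ("elem", "ZZZ"), ("elem", None)]
result = resolve_current(events)
```

Note that nesting plays no part here: the function never needs to know which element is the parent of which, only the order of the start-tags.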

Another interesting case is USEMAP declarations in content (as opposed
to within the DTD). The implementations I have seen change the map at
the point the USEMAP declarations occur; it does not apply to data that
occurred before the declaration in the same element. Reading 11.6, I
don't see anything to support this interpretation. 11.6.3 hints at it
when it talks about the current map being superseded by a short
reference use declaration occurring in an instance of the element.

Wayne L. Wohler
International Business Machines Corporation
Publishing Solutions
Boulder, Colorado

The opinions expressed are my own and do not represent the opinions of
IBM.

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 16:01:59 UT
From: "Wayne Wohler" \
Message-ID: <9206121721.AA02398@mammoth.Berkeley.EDU>
Subject: Entity references in attribute values: a new conundrum

Reference: <92163.161754U35395@uicvm.uic.edu>

> Two related questions for the group, only slightly loaded.
>
> 1 is a conforming parser supposed to recognize entity references inside
> attribute values? (please cite chapter and verse)
Check Figure 3. The LIT and LITA delimiters are recognized in TAG mode
(which is the mode active as element tags are parsed). This sets LIT
mode. You'll find that ERO is recognized in the LIT recognition mode.
Attribute value literals may therefore contain entity references. That
means, by the way, that entity references must occur within literal
delimiters in attribute values like attribute='&value' and not
attribute=&value. You can also check the definition for attribute value
literal in 4.17 which states: "attribute value literal: A delimited
character string that is interpreted as an attribute value by REPLACING
REFERENCES and ..." (emphasis is mine).
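
As a rough illustration of "replacing references" inside a delimited literal, here is a small Python sketch. It is an assumption-laden simplification, not the standard's algorithm: the entity table is a plain dictionary, only general entity references are handled (no character references), and the reference close is treated as optional.

```python
import re

# ERO '&', an entity name, and an optional REFC ';'.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")

def interpret_literal(literal, entities):
    """Strip the LIT or LITA delimiters and replace entity references."""
    if (len(literal) < 2 or literal[0] not in "\"'"
            or literal[-1] != literal[0]):
        raise ValueError("not a delimited attribute value literal")
    body = literal[1:-1]

    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undeclared entity: " + name)
        return entities[name]

    return ENTITY_REF.sub(replace, body)

# Hypothetical entity table for the example.
value = interpret_literal("'&value'", {"value": "42"})
```

An undelimited value such as &value is rejected by the first check, which mirrors the point that references are only recognized inside the literal delimiters.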

> 2 do existing implementations in fact recognize entity references inside
> attribute values?
All the parsers I've tried do.

> To anyone who thinks they know the answer to the first question off
> hand: good, so did I, but then I found I was unable to back it up from
> the standard, and in fact at the passage I found the standard seems to
> be saying, quite clearly, the opposite of what I thought was the case,
> and think should be the case, and believe was intended by the
> drafters of 8879 to be the case. (If I am right, and the text says the
> opposite of what was intended, then we really ought to make sure it gets
> fixed in the revision!)
What passage was that?

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 16:53:55 UT
From: Joachim Schrod \
Organization: TU Darmstadt
Message-ID: <1992Jun12.165355.26433@infoserver.th-darmstadt.de>
References: <2294@exua.exeter.ac.uk>
Subject: Re: Sample SEMA DTDs available

In article <2294@exua.exeter.ac.uk>, MGPopham@exua.exeter.ac.uk (Mike Popham) writes:
> A new directory (write-it.dtds) has been added to the SGML archive
> held at The SGML Project (c/o University of Exeter, UK).

Exeter's connection is sometimes not the best. You can also fetch it
from ftp.th-darmstadt.de [130.83.55.75], directory pub/text/sgml/DTD.
Changes are updated automatically each Sunday night.

Enjoy.

--
Joachim

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Joachim Schrod Email: schrod@iti.informatik.th-darmstadt.de
Computer Science Department
Technical University of Darmstadt, Germany

``How do we persuade new users that spreading fonts across the page
like peanut butter across hot toast is not necessarily the route to
typographic excellence? -- Peter Flynn

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 17:21:40 UT
From: C. M. Sperberg-McQueen \
Organization: University of Illinois at Chicago
Message-ID: <92164.122141U35395@uicvm.uic.edu>
Subject: national language versions of DTDs -- and architectural forms

Those interested in national-language versions of SGML DTDs may be
interested in the following method, developed for the TEI, which
simplifies the task of generating national-language versions (or any
alternate-name versions) of a DTD. The credit for this goes to the
TEI's Metalanguage and Syntax committee, headed by David Barnard of
Queen's University, Ontario.

The method is simple.

1 take the canonical DTD and rewrite it, substituting a suitable
parameter entity reference for every occurrence of every generic
identifier. The parameter entity needs to have a predictable form,
which leads to no ambiguities or name clashes. E.g. prepend 'n.' to the
gi itself, so that

\
\
\

is rewritten

\
\
\

2 for each such parameter entity, supply an appropriate definition:

\
\
\

That's it.

To rename an element, the user need only supply an overriding parameter
entity reference in the DTD subset:

\
\
\
]>

It is also easy to provide, in advance, sets of translation equivalents
which the user can embed if desired:

\
%names.nor;
]>

And of course one can also rename gis if one prefers different names,
for whatever reason, as long as one avoids name conflicts.
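
The override works because the declaration subset is read before the external DTD, and the first declaration of an entity name is the one that binds. A minimal Python sketch of that behaviour follows; the declarations in it are illustrative assumptions (the content model '- O (#PCDATA)' and the replacement name 'pkt' are invented for the example), and the regular expressions handle only this simple quoted form.

```python
import re

PE_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")

def expand(subset, dtd):
    """Expand parameter entity references in `dtd`.

    Declarations in `subset` are read first; the first declaration of a
    name binds, and later declarations of the same name are ignored.
    """
    bindings = {}
    for name, text in PE_DECL.findall(subset + dtd):
        bindings.setdefault(name, text)
    body = PE_DECL.sub("", dtd)          # drop the declarations themselves
    return PE_REF.sub(lambda m: bindings[m.group(1)], body).strip()

# Hypothetical canonical DTD fragment and user override.
DTD = '<!ENTITY % n.item "item">\n<!ELEMENT %n.item; - O (#PCDATA)>'
SUBSET = '<!ENTITY % n.item "pkt">'

default_form = expand("", DTD)
renamed_form = expand(SUBSET, DTD)
```

With an empty subset the element keeps its canonical name; with the override in place the same DTD text declares an element named 'pkt'.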

This may impose a burden on a processor written to be aware of a
particular tag set; how does it know that your 'pkt' is what it knows as
'item'?

The (current) TEI solution to this is to borrow from the HyTime notion
of architectural forms: each TEI element has an attribute of the name
'TEI.form' which provides the 'canonical' form of the element's GI. (At
the moment, the canonical form is expected to be the English-language GI
used in the Guidelines. There is a small but vocal minority of users
who are plumping hard for numeric identifiers instead -- sort of like
MARC field numbers.)

So the TEI definition of ITEM might look like that above, with an
attribute list declaration something like this (simplifying slightly):

\

Since when you redefine the entity 'n.item' you leave the attribute list
untouched, the value of TEI.form is still 'item'. So a TEI-aware
processor can know how to process your \ elements, without much
fuss, simply by checking the value of TEI.form.
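
How a TEI-aware processor might use TEI.form can be sketched like this in Python. The handler table, the function names, and the fallback behaviour are all assumptions made for illustration; nothing here is taken from an actual TEI processor.

```python
def canonical_gi(gi, attributes):
    """Return the canonical GI: the TEI.form value if present, else the GI."""
    return attributes.get("TEI.form", gi).lower()

def handle_item(content):
    # Hypothetical processing for list items.
    return "* " + content

# Handlers are registered under canonical names only.
HANDLERS = {"item": handle_item}

def process(gi, attributes, content):
    handler = HANDLERS.get(canonical_gi(gi, attributes))
    if handler is None:
        return content   # default processing for unknown element types
    return handler(content)

# A renamed element still reaches the 'item' handler via TEI.form.
output = process("pkt", {"TEI.form": "item"}, "first point")
```

The processor never needs a table of national-language names, since the lookup goes through the attribute rather than the GI.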

Attributes can also be renamed legally in TEI-conformant documents, but
there is no way (yet?) for a TEI-aware processor to know what the new
names mean. Suggestions welcome.

We have not taken over all the details of architectural forms, in part
because we are engaged in an ongoing discussion within the TEI as to
just what those details are. So some possible uses of TEI.form
attributes will not be specified in the next draft of the TEI
Guidelines. In particular, it is not clear what the rule should be if
the user modifies an element declaration. If I modify a list to
require a head (title), should I provide TEI.form as an attribute with
the value 'list'? May I? What if I require something else not present
in the original -- say, if I require each item to have a preceding
enumerator element?

One possible rule is: TEI.form gets its old original value only if the
only change made to the element or its attributes is a renaming. If the
content model or attribute list change, TEI.form should not be defined.
This rule effectively says that if I specify 'foo' as the value for
TEI.form on an element called BLORT, it means that I am using the
exact content model and attlist of FOO as defined in the TEI Guidelines.
This will probably be the rule in the TEI Guidelines version 2.

A second rule is like the one I believe HyTime specifies (unless it
changed since Steve Newcomb's article in CACM): I can specify
TEI.form=foo for my element BLORT if, and only if, every instance of
BLORT could be parsed with the TEI content model for FOO, and if no
attributes (? or no required attributes?) are dropped from the attlist
declaration. This rule effectively says that defining
'TEI.form=foo' for BLORT means that all instances of BLORT would be
acceptable as <foo>s under the TEI declarations, hence that BLORT is a
sub-class of the TEI <foo> class. (Technical note: the restriction is
on instances of BLORT, not on the content model of BLORT. I presume
this is to avoid burdening the user with attempting to prove that one
regular expression defines a sublanguage of another regular expression,
which I understand to be hard, if not intractable.)
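
Checking a single instance is much easier than comparing two content models, as the following Python sketch suggests. It is only an illustration under assumptions: the content model for FOO (an optional head followed by one or more items) is invented, and children are given by their canonical GIs.

```python
import re

# Hypothetical content model for FOO, written as a regular expression
# over space-terminated child GIs: (head?, item+)
FOO_MODEL = re.compile(r"(head )?(item )+")

def parses_as(children, model):
    """True if the sequence of child GIs satisfies the content model."""
    return model.fullmatch("".join(gi + " " for gi in children)) is not None

# One BLORT instance that would be acceptable as a FOO, and one that
# would not.
ok = parses_as(["head", "item", "item"], FOO_MODEL)
bad = parses_as(["item", "head"], FOO_MODEL)
```

Each instance is tested on its own, so nothing has to be proved about the content model of BLORT as a whole.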

A third rule is that one can specify TEI.form=foo whenever one likes,
without regard for the changes made. If I specify 'TEI.form=foo' for my
BLORT element, then, I am saying only "to understand what a BLORT is all
about, read the (prose) description of <foo> in the TEI documentation".
If an instance of BLORT turns out not to be parsable as a FOO, well,
that just shows that although all BLORTs are FOOs semantically, they
may not be syntactically a subset of FOOs. A processor is supposed to
make a best-guess attempt to process a BLORT using its routines for FOO;
this may mean ignoring any material it doesn't know what to do with. If
the processor cannot manage this, then it has the legal right to issue a
warning or error message saying 'I cannot process this BLORT as a FOO;
entering default processing mode' and abend or switch to a very simple
processing routine.

Some TEI participants who read this group may wish to refine my
description of these possible rules; others are welcome to offer their
opinions on which of these rules makes most sense to them. If anyone
can tell us clearly what restrictions HyTime places on uses of
architectural forms, I'd be grateful for that too.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 19:23:08 UT
From: Eric Freese \
Organization: Mead Data Central, Dayton OH
Message-ID: <1992Jun12.192308.10569@meaddata.com>
Subject: #CONREF

The international standard is a little vague on the use of #CONREF and the
effects of its use. I know that when the attribute which is declared to have
a #CONREF value is explicitly stated, the element with the ID referenced
becomes the content of the referencing element. Does this include the start
and end tags of the referenced element or just the content? For example:

DTD:

\
\

Instance:

\test data\
\\

When the above example is resolved, is it equivalent to:

\test data\
\\test data\\

or

\test data\
\test data\

I am defining a FOSI for a DTD which has some #CONREF elements and can't determine
what the contexts of the referenced elements would be.

Thanks in advance for any help.

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| Eric Freese | My thoughts are my own | (513) 865-6800 x5311 |
| Mead Data Central | and are not necessarily | Lead Software Engineer |
| P.O. Box 933 | shared by Mead Data | Source Packaging Systems |
! Dayton, Ohio 45401 | Central, Inc. | ericf@meaddata.com |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
! There are no perfect men in this world; just perfect intentions. |
| - Morgan Freeman as "Azeem" in "Robin Hood: Prince of Thieves" |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 21:34:12 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23205E@erik.naggum.no>
References: \ <9206121721.AA02395@mammoth.Berkeley.EDU>
Subject: Re: Scope of the CURRENT attribute

Brad Might \ writes:
|
| Are all standards this difficult and obtuse ?
| Even with all the cross referencing in the Handbook, it still takes
| a long time to find all relevant information about almost anything
| if you can even find it (Current attributes are not cross referenced
| to this section which is TOTALLY RELEVANT to the question).

Wayne Wohler \ writes:
|
| I don't have much experience with international standards outside of
| the SGML area. It is pretty clear the standards I have read are not
| intended to be user's guides! I understand your frustration with
| the standard, it does take some study to learn the terminology and
| to learn to navigate the standard. The definitive author's guide
| and DTD writer's guide hasn't been written yet.

I have worked with standards of various breeds for about 5 years, and
SGML is better than most at several points. It's worse than others at
only one point, and that's the changes to the syntax of the language
according as various features are turned on or off. This is precisely
the point which produces Brad's frustration.

SGML is more consistent than all the other standards of a comparable
level of complexity that I have worked with, but the flip side of this
is that it takes an inordinately high level of concentration and
attention to detail to make use of this consistency. Learning the
terminology where one word carries so much information is costly, but
once you do, you also have much better odds of finding what you're looking
for, because you know what it's called.

Personally, I think Goldfarb's Handbook is nothing less than essential
to understand SGML, but the index doesn't maintain the high quality of
the body of the book. I have several times found items where an index
entry should have pointed, but didn't, and this is frustrating.

I do my best to use SGML's terminology when I write technical comments,
so that others can look things up, and also perhaps make connections
between SGML terms. However, it often takes the most time to translate
a question into this terminology.

On the other hand, coming to SGML with several preconceived notions
about what SGML terms mean is asking for trouble. Take "ambiguity" as
an example, where those who know this term well from language theory
will get confused by the SGML usage of the term unless they're willing
to listen to SGML's definitions and dispense with the previous
definition. Ideally, this shouldn't be difficult at all -- we're used
to calling similar things by various conflicting names and notations in all
areas of human knowledge. It must be our great task, then, to reach
behind the names and notations and grok the real meaning, and turn
around and express it again with suitable names and notations.

Best regards,
\

--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 12 Jun 1992 23:14:09 UT
From: C. M. Sperberg-McQueen \
Organization: University of Illinois at Chicago
Message-ID: <92164.181410U35395@uicvm.uic.edu>
Subject: tree-based inheritance of attribute values

Anne Brueggemann-Klein's posting about #CURRENT attributes raises a
question I didn't address in my earlier posting. If #CURRENT provides
inheritance from the attribute value specified most recently in a
left-right depth-first scan of the tree, and not inheritance from a
parent or ancestor element, one may ask: is there any way to obtain the
second sort of behavior with SGML? One of SGML's great strengths is its
representation of documents as trees; it would be nice if one could use
that tree and its structure in specifying attribute defaults, etc.

The answer, as far as I know, is no. But readers of this group may be
interested in a convention the TEI has adopted to address this problem.

The TEI provides a global attribute LANG, the value for which indicates
the natural language of an element's contents. If a chapter is
in English, by and large the paragraphs of that chapter will also be
in English; if a chapter is in German, its paragraphs are likely also to
be in German. We want to exploit this fact in specifying the default
value of LANG. If an element specifies a value for LANG, that value
is accepted. If no value is specified, the LANG value of the parent
element is to be inherited. (On the document element, a LANG value is
logically required.)

In version 1 of the TEI Guidelines, we simply defined LANG with a
declared value of CDATA and a default of #IMPLIED. Many uses of
#IMPLIED really mean the attribute value is optional; this one really
does mean it is implied: the processor can infer it from the rest of
the document.

On the other hand, it would be nice if the specification of the default
value were able to indicate the specific algorithm used to calculate a
value for the attribute. So in version 2 of the Guidelines, we expect
to stress the specific algorithm involved here by specifying the global
LANG attribute with an attribute definition like

lang CDATA 'INHERITED'

The special (magic?) value 'INHERITED' is to be interpreted by
TEI-aware processors as a signal to take the LANG value from the parent
element. (We considered using '#INHERITED', to make a closer analogy to
#CURRENT and so on, but decided against using the SGML reserved name
indicator, in order not to have to try to explain to SGML novices why
#CURRENT can be used as is in the DTD, while '#INHERITED' requires
quotation marks. It seemed simpler to use a star as a sort of
TEI-magic-word marker.)
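
The inheritance rule for LANG might be implemented by a processor roughly as in this Python sketch. The Element class, the language codes 'en' and 'de', and the element names are assumptions made for the example; only the 'INHERITED' default value comes from the text above.

```python
class Element:
    """A minimal tree node; lang defaults to the declared default value."""
    def __init__(self, gi, lang="INHERITED", parent=None):
        self.gi = gi
        self.lang = lang
        self.parent = parent

def effective_lang(element):
    """Walk toward the root until an explicit LANG value is found."""
    node = element
    while node is not None:
        if node.lang != "INHERITED":
            return node.lang
        node = node.parent
    raise ValueError("the document element must specify LANG")

# Hypothetical document: English text containing one German chapter.
doc = Element("text", lang="en")
chapter = Element("chapter", lang="de", parent=doc)
para_in_chapter = Element("p", parent=chapter)
para_after = Element("p", parent=doc)
```

Here para_after resolves to the language of the document element even though the German chapter was specified more recently in document order, which is exactly where this rule differs from #CURRENT.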

It would be nice to have similar facilities in SGML, though one might
experience some difficulty devising a nice notation to distinguish

- attributes which take their default from a parent or ancestor
of the same GI (or of any GI used in the ATTLIST declaration)
- attributes which take their default from any parent or ancestor
with an attribute of this name (such as the TEI global LANG
attribute).

Of course, one has the same problem with #CURRENT, or would have if
one wanted to specify inheritance from any element which happened
to have an attribute of this name. If ATTLIST declarations could be
repeated with additive effect, of course, this problem would go away.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 00:20:53 UT
From: C. M. Sperberg-McQueen \
Organization: University of Illinois at Chicago
Message-ID: <92164.192053U35395@uicvm.uic.edu>
Subject: entity references in attribute values

Several readers of this group have asked what I found in 8879 about
entity references in attribute values that was so mysterious, or hard to
understand, or confusing. The answer: clause 11.3.3, which says that
if the attribute is defined with declared value of CDATA, then the
attribute value is 'character data' -- which means no markup is
recognized in it, which means no entity references are recognized in it.

Now, I knew that every implementation I had ever seen or heard of did in
fact recognize entity references, including Goldfarb's ARCSGML, so I was
puzzled and thought perhaps this was just a leftover from the days
before character data had been subdivided into parsed character data,
and replaceable character data, and the various other flavors of
character data -- i.e. when they made the definition of 'character data'
more restrictive they forgot to update 11.3.3, so it says the wrong
thing. But I was a bit worried as well: I could not find any way to
avoid the conclusion that what looked like a simple error in editing the
standard had led to a really problematic rule. (I did not see, at the
time, clause 4.17, the definition of 'attribute value literal'.)

I 'knew' that entity references had to be legal, but couldn't prove it.
And we have just had several demonstrations here that sometimes we
'know' things about 8879 that turn out to be false -- like knowing that
the SGML declaration is always in ISO 646 IRV. So I admit it, I was
very worried.

Before you laugh at me, answer me honestly: how many of you know now,
before I tell you and before you look at the text again, why 11.3.3 does
not mean that SGML parsers should not recognize entity references in
attribute value specifications? If there are any of you, then my
hat's off to you.

To make a long story short, Wayne Wohler appears to be right, and entity
references are legal in attribute value specifications (but only with
quotation marks around them). The story of how I persuaded myself of
this fact after posting my query to the net can be read in the appended
exercise in SGML exegesis. I leave to the reader to decide whether any
technical specification, even an ISO standard, should require this kind
of analysis.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago

------
Sic et Non: on Entity References in Attribute Values

(with apologies to Thomas Aquinas)

Question: whether entity references are recognized in attribute values?

I. It would seem the answer is NO, because

1 Either the attribute has a declared value of CDATA, or it has a
declared value of ENTITY, ENTITIES, ID, IDREF, IDREFS, NAME, NAMES,
NMTOKEN, NMTOKENS, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, NOTATION, or a
name group.

2 If the attribute's declared value is not CDATA, it would appear entity
references cannot be recognized, because:

21 The tokens in these types of values are defined as having the lexical
type NAME.

22 Clause 9.3, production 55 defines NAME as a series of name start
characters and name characters, and does not define them as containing
entity references.

23 Therefore, 2 is correct: entity references cannot be recognized in
values of attributes with declared value other than CDATA.

3 If the attribute's declared value is CDATA, it would appear entity
references cannot be recognized, because:

31 Clause 11.3.3 says that if the declared value of an attribute is
CDATA, "the attribute value is character data".

32 Entity references appear not to be recognized in character data.

321 Markup is not recognized in character data.

3211 Clause 9.2 defines character data as a sequence of data characters.
It opposes it, in this way, to 'replaceable character data', defined in
9.1 as containing data characters, character references, general entity
references, and entity end signals.

3212 Goldfarb, in his commentary on 9.2 (p. 344 of Handbook) says "no
markup will be recognized in character data other than the delimiters
that would terminate the character data."

3213 Clause 4.33 defines character data as "Zero or more characters that
occur in a context in which no markup is recognized, other than the
delimiters that end the character data. Such characters are classified
as data characters because they were declared to be so."

3214 Therefore, 321 is correct: Markup is not recognized in character
data.

322 Entity references are markup.

3221 The note to clause 4.183 identifies "references" as being one kind
of markup.

3222 Clause 4.144 defines a general entity reference as a named entity
reference to a general entity.

3223 Clause 4.205 defines a named entity reference as an entity
reference.

3224 Clause 4.124 defines an entity reference as a reference.

3225 Clause 4.256 defines a reference as 'Markup that is replaced by
other text ...'

3226 Therefore, 322 is correct: entity references are markup.

323 Therefore, 32 is correct: entity references are not recognized in
character data.

33 Therefore, 3 is correct: entity references cannot be recognized in
values of attributes with declared value of CDATA.

4 No matter what the declared value of the attribute, entity references
cannot be recognized within its value, because:

41 Clause 7.9.3 (note after production 34) says "Interpretation of
an attribute value literal occurs as though the attribute were character
data, regardless of its actual declared value."

42 Entity references are not recognized within character data (see
statement 32 above).

II. It would seem the answer is YES, because

1 Clause 7.9.3 defines 'attribute value specification' as either
'attribute value' or 'attribute value literal'.

2 If the attribute value is specified as an attribute value literal,
entity references are recognized, because:

21 Clause 7.9.3 says "An attribute value literal is interpreted as
an attribute value by replacing references within it, ignoring Ee and
RS, and replacing an RE or SEPCHAR with a SPACE."

22 Clause 4.17 defines 'attribute value literal' as "A delimited
character string that is interpreted as an attribute value by replacing
references and ignoring or translating function characters."

III About the arguments against the proposition, we may say:

1 All statements made are correct, but apply only to the 'attribute
value' supplied or derived for the attribute.

2 Clause 7.9.3 specifies that attribute values may be specified either
directly, or as attribute value literals. A value specified as an
attribute value literal is processed by the parser into a determinate
attribute value.

3 The specification that attribute values are treated as character data
(rather than replaceable character data) therefore applies only to the
end product of the processing specified in 7.9.3, not to the attribute
value literal possibly provided in the document instance.

IV Conclusions

1 The current formulations of 8879 do not in fact require entity
references to be unrecognized in attribute value specifications.

2 They do however require a Talmudic or Jesuitical process to unravel,
in order to establish that fact.

3 The revision of ISO 8879 should eliminate the misleading use of the
term 'attribute value', either by reformulating all the sections on
declared value specifications and attribute value specifications, or
by introducing a suitably unambiguous term such as 'internal' or
'processed attribute value'.
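
The interpretation rule quoted from clause 7.9.3 (replace references,
ignore RS, turn RE or SEPCHAR into SPACE) can be illustrated with a
short Python sketch. It is an illustration only, under stated
assumptions: the name interpret_literal, the entity table, and the
simplified reference syntax (a name followed by a mandatory ";") are
invented here, and RS, RE and the TAB separator are taken to have the
code points of the reference concrete syntax. The Ee signal is not
modeled, because replacement text is simply spliced in.

```python
import re

# Function characters with the code points of the reference concrete
# syntax (an assumption; an SGML declaration may assign others).
RS = "\n"       # record start
RE = "\r"       # record end
SEPCHAR = "\t"  # separator character (TAB)
SPACE = " "

# Simplified general entity reference: "&name;". Requiring the ";" is a
# simplification made for this sketch.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")

def interpret_literal(literal, entities):
    """Turn the text between the quotation marks into an attribute value."""
    # Step 1: replace references within the literal.
    text = ENTITY_REF.sub(lambda m: entities[m.group(1)], literal)
    # Step 2: ignore RS.
    text = text.replace(RS, "")
    # Step 3: replace an RE or SEPCHAR with a SPACE.
    return text.replace(RE, SPACE).replace(SEPCHAR, SPACE)

# Hypothetical entity table and literal, for illustration only.
entities = {"uic": "University of Illinois at Chicago"}
value = interpret_literal("TEI,\r\n&uic;", entities)
print(value)
```

The result of interpret_literal is the 'attribute value' in the narrow
sense used by clause 11.3.3: by the time it exists, no references are
left in it to be recognized.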

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 00:41:48 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23206A@erik.naggum.no>
References: \ <23202A@erik.naggum.no> <92163.154050U35395@uicvm.uic.edu>
Subject: Re: Defaulting mechanism for CURRENT attributes

C. M. Sperberg-McQueen \ quotes me:
|
| > I honestly think you would find ISO 8879 and SGML easier to deal with if
| > you didn't attempt to make it fit a mathematical model which wasn't
| > derived from it. ...

I feared something like this would happen. Therefore, I did in fact
write the following _second_ sentence of the above quoted paragraph:

|| On the other hand, if you could formulate a mathematical model
|| consistent with the standard, it would probably be much appreciated
|| by the entire SGML community.

The key here is "a mathematical model WHICH WASN'T DERIVED FROM [SGML]",
and the attendant emphasis on what I _would_ find more valuable than
attempting to shoe-horn SGML into something it was never meant to be
even associated with, namely a formal model for what SGML does in fact
say. This discussion talks about the tree which SGML _represents_, not
about SGML's representation of that tree. To me this difference is
truly obvious.

C. M. Sperberg-McQueen \ writes:
|
| For the record, I would like to say Erik is speaking for himself
| here, but not for me; I find Dr. Brueggemann-Klein's work on SGML,
| and her question here, extremely useful. I believe the intention of
| 8879 to be as described by earlier replies to the query (current
| value is that most recently specified in a depth-first left-to-right
| scan of the entire tree, not in a direct descent from the root).
| But the question is not nearly so uninteresting or obvious as Erik
| suggests.

I think we're losing track of the context in which SGML was defined, and
maybe even losing track of what ISO 8879 does define: a language to
represent a document structure, whatever it might be. Again, the
difference between object represented and its representation is
important. Just like a noun is not the thing itself, like a C source
program is not the executing process, like a sequence of printed glyphs
is not the meaning of the sentence they form if read by a human, a
character stream is not the document structure it represents according
to ISO 8879. Why is this a problem?

| The standard says the value is the "most recently specified" value
| (clause 4.67, definition of "current attribute", which 11.3.4 says
| ATT in Dr. Brueggemann-Klein's example is). But what happened most
| recently depends in reality rather dramatically on what order you
| have been doing things in. The interpretations offered thus far
| assume you have been processing the text in a left-to-right,
| depth-first scan of the document tree. But the original question
| effectively asks whether that is guaranteed; what would happen if we
| processed the document in a breadth-first traversal?

But we're not processing a document tree, we're processing a character
stream, which, under a suitable interpretation, can give rise to an
understanding of a document tree in the mind of the reader, or to
another representation of a document tree in a parser or application,
which just formalizes that understanding. What you want to do with that
document tree afterwards does not, and can not, concern SGML. SGML is a
language which can be used to represent such a tree in a character
stream, and certain features are useful in that context only. The
current attribute is among them, as is markup minimization, short
references, indeed all entity references, even the very concept of
markup recognition. There is no _markup_ in the interpreted character
stream, because the document structure (ESIS) has other means to
express what the markup expressed in its linear, sequential form.

| If there is any explicit specification in 8879 that an SGML parser
| must process a document through a left-to-right depth-first
| traversal of the document tree, I would very much like to know where
| it is.

The standard views an SGML document as a sequence of characters, which
are interpreted as markup or data according to the rules of the
standard. See clause 6 for the details. You claim that there is room
for a different interpretation, which I frankly think is pure fantasy.
In particular, the text in sub-clause 6.2 SGML Entities (Goldfarb
[297:1]) could not be clearer on this issue.

| I don't think it exists: the standard doesn't even specify
| explicitly that the input has to be electronic characters (though it
| is hard to understand how to declare or recognize delimiters if it's
| anything else), and does explicitly say "This International Standard
| does not constrain the physical organization of the document ..."
| (clause 6.1 note 1).

You have to rip that sentence pretty far out of its context if you think
it refers to the character-streamness of the entities _as_ _SGML_ _sees_
_them_. The physical organization of the document is indeed irrelevant
to SGML, as long as the parser will see the contents of each entity,
character by character until it is exhausted (at which point the Entity
end occurs), all clearly sequential. See production [4] SGML text
entity for the most eloquent example of this.
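
The sequential view described here can be sketched in a few lines of
Python. This is a minimal illustration, not parser code: the generator
name entity_characters and the EE sentinel are invented for the sketch,
and the "physical organization" is modeled simply as a list of chunks
of arbitrary size.

```python
EE = object()  # stand-in for the entity end signal (Ee)

def entity_characters(chunks):
    """Yield an entity's characters one by one, then the entity end."""
    for chunk in chunks:      # how the storage is split up does not matter
        for ch in chunk:
            yield ch
    yield EE                  # the entity is exhausted

# Two different physical organizations of the same entity text.
one_piece = list(entity_characters(["<p>text</p>"]))
three_pieces = list(entity_characters(["<p>", "te", "xt</p>"]))
print(one_piece == three_pieces)
```

Whatever the chunking, the consumer sees the same character sequence
followed by a single entity end.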

| If one wrote an SGML parser that did not process the text left to
| right--and let us recall that there are parsing algorithms for
| other orders, including Unger's and the Cocke/Younger/Kasami (CYK)
| method--would one have the right to specify a different value for a
| CURRENT attribute from the value given by a left-to-right
| depth-first scan?

We don't _have_ a "left-to-right depth-first scan", dammit! We have a
character stream which, if _interpreted_ _by_ _a_ _parser_ (according to
ISO 8879) can be _regarded_ _as_ a left-to-right depth-first view of a
tree, but only _after_ the markup is recognized, all attributes are
specified, all elements have their start and end established, etc.
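
To make the point concrete, here is a minimal Python sketch of how
"most recently specified" falls out of stream order. The data model is
an assumption made for illustration: elements are (name, attributes,
children) tuples, every element is taken to share one attribute
definition, and the names start_tags and resolve_current are invented.
An occurrence with nothing earlier to inherit from is simply left
without a value.

```python
def start_tags(node):
    """Yield (name, specified attributes) in the order the start-tags
    occur in the character stream, i.e. depth-first, left to right."""
    name, attrs, children = node
    yield name, attrs
    for child in children:
        yield from start_tags(child)

def resolve_current(tags, current_attrs):
    """Fill in unspecified CURRENT attributes from the most recent
    specification seen so far in the stream."""
    latest = {}
    resolved = []
    for name, attrs in tags:
        effective = dict(attrs)
        for attr in current_attrs:
            if attr in attrs:
                latest[attr] = attrs[attr]      # explicitly specified
            elif attr in latest:
                effective[attr] = latest[attr]  # defaulted from stream
        resolved.append((name, effective))
    return resolved

# The last <sec> is a child of the element with att="1", but the value
# most recently specified in the stream is "2".
tree = ("sec", {"att": "1"}, [
    ("sec", {"att": "2"}, [("sec", {}, [])]),
    ("sec", {}, []),
])
for name, attrs in resolve_current(start_tags(tree), ["att"]):
    print(name, attrs)
```

In this sketch the fourth element gets "2" rather than the "1" of its
parent, which is the difference between stream order and direct descent
from the root.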

I think, honestly, that if SGML had been intended to be parsed in any
way but the obvious sequential one, the standard would have said so. One of
the main things I learned when I first started to read the peculiar
prose that standards make, is that the standard says what the standard
says, nothing more and nothing less, that any interpretation must be
confined to the context in which the standard _does_ say something, that
any inference from what the standard does _not_ say is, ipso facto,
invalid, and that it's always the reader's responsibility to understand
the context in which the specifications and requirements of the standard
have meaning. As a corollary to the last: that context can without harm
be treated as always differing between any two standards. The context
can, however, often be established by reference to related standards.
This process is naturally recursive, but does converge. My point is
that this is _identical_ to any other field of human endeavor, and we
should hone our abilities to enter a new terminology-universe without
too much baggage. As a result, we have to be even more careful about
using prior knowledge from other contexts in new standards, because we
have to check that the knowledge applies in the new contexts at every
single point. Yes, this makes reading standards incredibly hard work.
It was incredibly hard writing it, too. (Sorry, I just had to.)

| In practice, I don't see how one can handle SGML in any way other than
| left-to-right scan of the data stream. But then, there's a lot I don't
| know. And the standard does say (Annex F) that 8879 does not require
| any particular implementation techniques or architecture.

Of course it doesn't require any such thing, and you're free to parse an
SGML document's characters in any random order, as long as the result is
according to ISO 8879, which does specify that things happen in a
certain order. See 5.2 Ordering and Selection Symbols for the crucial
evidence about the syntax of the productions.

I regard your "but then, there's a lot I don't know", as an instance of
the argumentum ad ignorantiam fallacy in rhetoric, and challenge you to
show me what you _do_ _know_ which allows you to interpret the standard
contrary to what I say about this particular issue.

| So perhaps we should have a little more patience with questions that
| seem to have obvious answers.

I fully agree that we should not dismiss such questions immediately or
without consideration, and looking behind the obvious can oftentimes
result in very valuable discoveries. Fundamental parts of philosophy,
such as epistemology, indeed consist of taking a careful look at what
people would regard as obvious. This doesn't mean that questioning the
obvious will always result in valuable discoveries, and most often, the
obvious is the obvious, and nothing more. I prefer to cut through the
noise if I can establish beyond doubt that that is what it is.

I really wonder why this has become a topic of discussion.
Brueggemann-Klein's original question was relevant, but we've digressed
far beyond the point of relevance if we start discussing alternative
ways to parse formal languages when we're dealing with a markup language
for text documents, or question basic truisms.

If I sound unduly pissed, it's probably because my sun burn is itchy.

Best regards,
\
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 01:05:29 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23206B@erik.naggum.no>
References: <92163.161754U35395@uicvm.uic.edu>
Subject: Re: Entity references in attribute values: a new conundrum

C. M. Sperberg-McQueen \ writes:
|
| 1 is a conforming parser supposed to recognize entity references inside
| attribute values? (please cite chapter and verse)

I assume you really mean "attribute value specification", and under that
assumption the answer is clearly "Yes".

Let's walk down the syntax hierarchy (page reference to Goldfarb's
Handbook in brackets):

Clause 7.4, production 14, start-tag, references attribute specification
list. [314]

Clause 7.9, production 31, attribute specification list, references
attribute specification. [327]

Clause 7.9, production 32, attribute specification, references attribute
value specification. [328]

Clause 7.9.3, production 33, attribute value specification, references
attribute value literal. [331]

Clause 7.9.3, production 34, attribute value literal, references
replaceable character data. [331]

Clause 9.1, production 46, replaceable character data, references
general entity reference. [343]
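
The walk down the hierarchy can be checked mechanically. The Python
sketch below records only the references named in this thread (each
production refers to more than is listed), and the table and function
names are invented for illustration. The entry for "attribute value"
reflects only the CDATA case of clause 11.3.3 and is a simplification.

```python
# Which construct references which, restricted to the links cited in
# this discussion. Not a complete rendering of the productions.
GRAMMAR = {
    "start-tag": ["attribute specification list"],
    "attribute specification list": ["attribute specification"],
    "attribute specification": ["attribute value specification"],
    "attribute value specification": ["attribute value",
                                      "attribute value literal"],
    "attribute value literal": ["replaceable character data"],
    "replaceable character data": ["general entity reference"],
    "attribute value": ["character data"],  # CDATA case only
}

def derivation_path(start, target, grammar, seen=None):
    """Depth-first search for a chain of references from start to target."""
    if seen is None:
        seen = set()
    if start == target:
        return [start]
    seen.add(start)
    for nxt in grammar.get(start, []):
        if nxt not in seen:
            rest = derivation_path(nxt, target, grammar, seen)
            if rest:
                return [start] + rest
    return None

print(derivation_path("start-tag", "general entity reference", GRAMMAR))
print(derivation_path("attribute value", "general entity reference", GRAMMAR))
```

The first search finds the chain through attribute value literal; the
second finds none, which matches the two answers given here.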

If you really meant "attribute value", the answer is equally clearly
"No".

See also Goldfarb, 7.9.3 Attribute Value Specification [330], in
particular the first paragraph on page 331. It's interesting to note
that Goldfarb says that the distinction between attribute value and
attribute value literal "is a very important and sometimes misunderstood
distinction."

| To anyone who thinks they know the answer to the first question off
| hand: good, so did I, but then I found I was unable to back it up from
| the standard, and in fact at the passage I found the standard seems to
| be saying, quite clearly, the opposite of what I thought was the case,
| and think should be the case, and believe was intended by the
| drafters of 8879 to be the case. (If I am right, and the text says the
| opposite of what was intended, then we really ought to make sure it gets
| fixed in the revision!)

I don't know what you're talking about. It took me about 65 ms to
realize that the answer to your question would be in clause 7.9.3, but I
have no idea whatsoever what passage you're talking about. It couldn't
be 7.9.4, could it?

| I would be very happy to see postings on this question from some of
| the implementers who read this list.

I hope you're happy, then. :-)

Best regards,
\
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 01:25:51 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23206C@erik.naggum.no>
References: <92164.122141U35395@uicvm.uic.edu>
Subject: Re: national language versions of DTDs -- and architectural forms

C. M. Sperberg-McQueen \ writes:
|
| Those interested in national-language versions of SGML DTDs may be
| interested in the following method, developed for the TEI, which
| simplifies the task of generating national-language versions (or any
| alternate-name versions) of a DTD. The credit for this goes to the
| TEI's Metalanguage and Syntax committee, headed by David Barnard of
| Queen's University, Ontario.
|
| The method is simple.

Indeed, and I like it. Thank you for sharing this with us.

| So the TEI definition of ITEM might look like that above, with an
| attribute list declaration something like this (simplifying slightly):
|
| \
|
| Since when you redefine the entity 'n.item' you leave the attribute list
| untouched, the value of TEI.form is still 'item'. So a TEI-aware
| processor can know how to process your \ elements, without much
| fuss, simply by checking the value of TEI.form.
|
| Attributes can also be renamed legally in TEI-conformant documents, but
| there is no way (yet?) for a TEI-aware processor to know what the new
| names mean. Suggestions welcome.

How about a fixed attribute which contains a list of pairs of names, the
first being the canonical attribute name or attribute value, and the
second the name or value specified in the attribute definition list?
E.g.,

\
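
A processor could decode such a name-pair attribute roughly as in the
following Python sketch. Everything specific in it is hypothetical: the
function names, the sample attribute names ("type", "art", "level",
"niveau"), and the assumption that the value is a plain
space-separated list of canonical/local pairs.

```python
def parse_name_pairs(value):
    """Split 'canonical local canonical local ...' into a mapping
    from canonical name to the name used in this DTD."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of names")
    return dict(zip(tokens[0::2], tokens[1::2]))

def to_canonical(attrs, pairs):
    """Rename the attributes of one element back to canonical names."""
    local_to_canonical = {local: canon for canon, local in pairs.items()}
    return {local_to_canonical.get(name, name): value
            for name, value in attrs.items()}

# Hypothetical value of the fixed attribute and one element's attributes.
pairs = parse_name_pairs("type art level niveau")
print(to_canonical({"art": "bulleted", "id": "x1"}, pairs))
```

Names without an entry in the list pass through unchanged, so the
attribute only needs to mention what was actually renamed.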

I note, with some regret, that it's not possible to parameterize the
attribute value completely with a parameter entity, since the attribute
specification will only interpret general entity references, whereas the
markup declaration will interpret parameter entity references.
Something like this will be needed:

\
\

\

This may be unduly messy. Occam's razor may apply: Entities should not
be multiplied.

Best regards,
\
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 02:06:14 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23206D@erik.naggum.no>
References: <92164.192053U35395@uicvm.uic.edu>
Subject: Re: entity references in attribute values

C. M. Sperberg-McQueen \ writes:
|
| Several readers of this group have asked what I found in 8879 about
| entity references in attribute values that was so mysterious, or
| hard to understand, or confusing. The answer: clause 11.3.3, which
| says that if the attribute is defined with declared value of CDATA,
| then the attribute value is 'character data' -- which means no
| markup is recognized in it, which means no entity references are
| recognized in it.

Hmmm. This is an example of why I really love Goldfarb's Handbook. At
[423:18], we get a pointer to "attribute value" at [333:1], so we go
look that up, and also look it up in the index, and find "used in
production, 331:2", then read all of 7.9.3, and voila! there is no
problem. Since I already knew that, I may be somewhat biased, though.

| Before you laugh at me, answer me honestly: how many of you know now,
| before I tell you and before you look at the text again, why 11.3.3 does
| not mean that SGML parsers should not recognize entity references in
| attribute value specifications? If there are any of you, then my
| hat's off to you.

I take pride in the fact that I knew the answer. The reason is that I
spent a fair amount of time trying to figure out the note starting at
[331:14] some time ago. I think this note should be removed.

| To make a long story short, Wayne Wohler appears to be right, and entity
| references are legal in attribute value specifications (but only with
| quotation marks around them). The story of how I persuaded myself of
| this fact after posting my query to the net can be read in the appended
| exercise in SGML exegesis. I leave to the reader to decide whether any
| technical specification, even an ISO standard, should require this kind
| of analysis.

But your analysis is much too convoluted. It rests on an equivocation
over what "attribute value" means, where you have interpreted it to be
whatever is at the right of the value indicator (vi) delimiter,
quotation marks or not. This is understandable, and the difference
between the attribute value and attribute value literal is subtle.
However, even if you didn't realize this, you should have triggered on
Goldfarb's comments at the end of his annotations in 7.9.4: "... is the
result of interpreting the attribute value literal to produce the
attribute value."

| ------
| Sic et Non: on Entity References in Attribute Values
|
| (with apologies to Thomas Aquinas)
|
| Question: whether entity references are recognized in attribute values?
:
| 41 Clause 7.9.3 (note after production 34) says "Interpretation of
| an attribute value literal occurs as though the attribute were character
| data, regardless of its actual declared value."

Ah, here it is! The way I interpret this, after much struggling, is
that there is a uniform interpretation of attribute value literals
independent of attribute type, and that the only attribute type for
which there is no additional interpretation is the character data
attribute type. So, to emphasize that the attribute value literal
interpretation occurs in two stages, this note was included to pave the
way for a later interpretation if and only if the attribute value is not
character data. Yes, I did find this very hard to figure out.

| IV Conclusions
|
| 1 The current formulations of 8879 do not in fact require entity
| references to be unrecognized in attribute value specifications.

Careful with that double negation. You're saying a tiny little bit more
than the standard says...

| 2 They do however require a Talmudic or Jesuitical process to unravel,
| in order to establish that fact.

I'm sympathetic to your problems (because I've spent an incredible
amount of time doing the same thing), but once I stopped thinking that
"it had to be there", and instead went looking for what it _did_ say,
things got much, much simpler.

| 3 The revision of ISO 8879 should eliminate the misleading use of the
| term 'attribute value', either by reformulating all the sections on
| declared value specifications and attribute value specifications, or
| by introducing a suitably unambiguous term such as 'internal' or
| 'processed attribute value'.

I think we should perhaps add a little note right under the heading for
7.9.4, saying something like:

NOTE -- An attribute value may be specified in an attribute
value specification as an attribute value literal.

Do you think this would help?

Best regards,
\
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 02:44:46 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23206E@erik.naggum.no>
References: <1992Jun12.192308.10569@meaddata.com>
Subject: Re: #CONREF

Eric Freese \ writes:
|
| The international standard is a little vague on the use of #CONREF
| and the effects of its use.

Not really vague, just a little brief. I hope you have Goldfarb's The
SGML Handbook [1], as it goes into more detail on this issue. (See the
overview section 4.4.3.3, on page 159, in particular.)

| I know that when the attribute which is declared to have a #CONREF
| value is explicitly stated, the element with the ID referenced
| becomes the content of the referencing element.

CONREF doesn't do this, but I see where you were misled. CONREF means
that the attribute value somehow refers to something which the
application could use to generate content, or, put more simply, "the
element has either content or the attribute", and it isn't restricted to
IDREF attributes or any real "references" at all. You can have any kind
of attribute have a CONREF default value, and the effect is simply to
make the element empty if the attribute is present (specified).
(Standardese note: I didn't say anything about what happens if the
attribute is not specified, this is taken care of elsewhere.)
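
Reduced to a rule, the effect described above is small enough to state
in a few lines of Python. This is a sketch of the rule as explained in
this posting, not of any parser's code; the function name and the
sample attribute names ("refid", "id") are hypothetical.

```python
def is_empty_by_conref(conref_attributes, specified):
    """True if any attribute declared with a #CONREF default is
    explicitly specified on this occurrence of the element."""
    return any(name in specified for name in conref_attributes)

# Hypothetical element type with one content reference attribute.
conref_attributes = {"refid"}

print(is_empty_by_conref(conref_attributes, {"refid": "fig1"}))
print(is_empty_by_conref(conref_attributes, {"id": "r7"}))
```

Note that the declared value of the attribute plays no part in the
test; only whether the attribute was specified does.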

CONREF can be used to make external references, where the ID/IDREF
mechanism cannot conveniently reach. E.g. a figure reference that
either contains a descriptive reference, or relies on the application to
provide such a reference automatically based on the ID of the figure.

It could also be used if you want to provide an option to use an entity
to hold the contents of an element, and don't want to use an entity
reference in the content:

\
\
\
\
\
:
\
inline content
\
:
\

| Thanks in advance for any help.

Hope this helps, even though I didn't address your question directly.

Best regards,
\

-------
References:
[1] Charles F Goldfarb: The SGML Handbook. Oxford University Press,
1991. ISBN 0-19-853737-9.

--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 13 Jun 1992 03:06:19 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23206F@erik.naggum.no>
References: <1992Jun11.143405.27527@mks.com>
Subject: Re: Changing parts of a concrete syntax

David J. Fiander \ writes:
|
| I'm using sgmls 0.7, and have found the 8-character limit on
| names rather restrictive. The sgmls man page says that "[t]he
| upper limit on NAMELEN is 239."
|
| So how do I change it? Do I have to provide an entire SGML
| declaration just to change one quantity?

Yes. In fact, you have to specify the entire syntax. If you think
that's a little excessive, I agree with you, and I hope we can get
something like a "declaration subset" in the next revision for both the
SGML declaration and the syntax declaration.

Note that sgmls 0.8 was released May 11.

Say you want to use the reference concrete syntax (revised to refer to
ISO 646:1991 properly), you need to provide something like this SGML
declaration:

\

Whew!

Best regards,
\
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 14 Jun 1992 05:01:22 UT
From: Robin Cover \
Organization: UT Arlington
Message-ID: \
Subject: SGML and Online Journal of Current Clinical Trials

I ran across an interesting article on the use of SGML in the
electronic \Online Journal of Current Clinical Trials (CCT)</>,
developed by OCLC. The network address for the article is given
at the end of this posting for anyone who wishes to see the full text.
Two excerpts and one reference follow here.

\<quote>
1.0 Introduction

The Online Journal of Current Clinical Trials (CCT) is a peer-
reviewed, interactive electronic journal. The primary form of
publication is electronic--no paper version of the journal is
planned. In addition to the full text of articles, CCT includes
tables, equations, and graphics. . .
\</quote>

\<quote>
6.0 Database Construction

Articles are peer reviewed using a bulletin board system at AAAS,
to which all the editors and reviewers have dial-up access. One
of the goals of AAAS is to reduce the time taken to publish
articles as much as possible without sacrificing the rigor of the
peer-review process.

[. . .] After an article is accepted, AAAS
sends to OCLC (via the bulletin board system) an SGML version of
the article and the original graphics (if they are not machine
readable, they may have to be physically mailed). OCLC then
completes the SGML markup--in particular, OCLC completes the
tagging of tables and equations as well as a number of other
details. Currently, this tagging is done manually.
After the SGML tagging of the article is completed and
validated, the figures are scanned and the article is typeset.
We are using TeX for this, so the SGML file is run through a
program to convert it into TeX and format it. The resulting
output is reviewed. After the output looks acceptable, it is
faxed to both AAAS and the author for review, any needed changes
are incorporated, and the database is built.

Although we realize that this is ambitious, our goal is to
have articles available within 24 hours of their acceptance. To
accomplish this, we need to be able to finish the SGML coding and
formatting within six hours, and to have the formatting reviewed
by AAAS and the author within two hours. The article will then
be loaded into the database overnight. Even if this schedule is
not met, we will have the information available to users within
days of acceptance rather than the weeks or months that paper
journals require.
\</quote>

\<references>
References and Notes
. . .

2. Thomas B. Hickey, "Using SGML and TeX for an Interactive
Chemical Encyclopedia," in Proceedings of the 1989 National
Online Meeting (Medford, NJ: Learned Information, 1989), 187-195.
\</references>

\<biblio.for.current.article>
Hickey, Thomas B., and Terry Noreault. "The Development of a
Graphical User Interface for The Online Journal of Current
Clinical Trials." The Public-Access Computer Systems Review 3,
no. 2 (1992): 4-12. (To retrieve this article, send an e-mail
message that says "GET HICKEY PRV3N2 F=MAIL" to LISTSERV@UHUPVM1
or LISTSERV@UHUPVM1.UH.EDU.)
</>

Submitted by Robin Cover

-------------------------------------------------------------------------
Robin Cover BITNET: zrcc1001@smuvm1 ("one-zero-zero-one")
6634 Sarah Drive Internet: robin@utafll.uta.edu ("uta-ef-el-el")
Dallas, TX 75236 USA Internet: zrcc1001@vm.cis.smu.edu
Tel: (1 214) 296-1783 Internet: robin@ling.uta.edu
FAX: (1 214) 709-3387 Internet: robin@txsil.sil.org
=========================================================================
</message>
<message id="<92167.112039U35395@uicvm.uic.edu>" date="2917614039">
Newsgroups: comp.text.sgml
Date: 15 Jun 1992 16:20:39 UT
From: C. M. Sperberg-McQueen \<U35395@uicvm.uic.edu>
Organization: University of Illinois at Chicago
Message-ID: <92167.112039U35395@uicvm.uic.edu>
References: \<brueggem.708031762@fidji> <23202A@erik.naggum.no> <92163.154050U35395@uicvm.uic.edu> <23206A@erik.naggum.no>
Subject: Re: Defaulting mechanism for CURRENT attributes

Last week I wrote:

> If there is any explicit specification in 8879 that an SGML parser
> must process a document through a left-to-right depth-first traversal
> of the document tree, I would very much like to know where it is. I
> don't think it exists: ...

Erik Naggum's long posting does point to the specification I thought
did not exist (among other things which however do not constitute such a
specification). He did not quote it, however. For the benefit of
those without copies of 8879 or Goldfarb to hand, it is clause 6.2
"SGML Entities", second paragraph after production 4:

Each SGML character is parsed in the order it occurs, in the
following manner: ...

This explicit requirement that characters be parsed in their order of
occurrence (i.e. in what I've been calling 'left-right order') provides
the restriction which makes the definition of current attributes
unambiguous. (Given SGML's grammar, parsing characters in this order
means the document will be parsed in the same order as a left-right
depth-first scan of the document tree; this gives unambiguous sense to
the phrase 'most recently specified' used in the definition of 'current
attribute'.) It also requires SGML parsers to use left-right scanning
algorithms, rather than the right-left or non-directional algorithms one
sometimes reads about. Thanks to Erik for pointing out this passage.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
</message>
<message id="<1992Jun15.172209.22762@ens.fr>" date="2917617729">
Newsgroups: comp.text.sgml
Date: 15 Jun 1992 17:22:09 UT
From: Denis Excoffier \<scof@dmi.ens.fr>
Organization: Ecole Normale Superieure, PARIS, France
Message-ID: <1992Jun15.172209.22762@ens.fr>
Keywords: #CURRENT
Summary: a final answer ?
Subject: #CURRENT attributes

I think I missed some essential postings in the discussion
about #CURRENT attributes.

Could someone please give me the final answer to the following question :

-----------------------------------
Is this document a correct SGML document ?
\<!DOCTYPE doc [
\<!ELEMENT doc - - (foo|bar)>
\<!ELEMENT (foo|bar) - O EMPTY>
\<!ATTLIST (foo|bar)
zot CDATA #CURRENT
>
]>
\<doc>
\<foo zot="yuyu">
\<bar>
\</doc>
-----------------------------------

Thanks,

Scof.
</message>
<message id="<2344@irit.irit.fr>" date="2917617758">
Newsgroups: comp.text.sgml
Date: 15 Jun 1992 17:22:38 UT
From: jean-luc GUIMPIER \<guimpier@irit.irit.fr>
Organization: IRIT-UPS, Toulouse, France
Message-ID: <2344@irit.irit.fr>
Subject: re: re:ODA vs SGML (naive answer included...)

hi all,

I seemed ignorant when I posted my request for help comparing ODA and SGML, and in fact I was. I knew nothing about either of them. Now, thanks to some of you, I know a few things about SGML, but I still have trouble finding useful information about ODA. The problem is the same as before: can someone help me position ODA and SGML relative to one another?

Thanks again.

P.S. : naive answer to naive question : because I was asked to do so ...!
</message>
<message id="<92167.133032U35395@uicvm.uic.edu>" date="2917621831">
Newsgroups: comp.text.sgml
Date: 15 Jun 1992 18:30:31 UT
From: C. M. Sperberg-McQueen \<U35395@uicvm.uic.edu>
Organization: University of Illinois at Chicago
Message-ID: <92167.133032U35395@uicvm.uic.edu>
Subject: Re: Defaulting mechanism for CURRENT attributes

Last week I wrote:

> If there is any explicit specification in 8879 that an SGML parser
> must process a document through a left-to-right depth-first traversal
> of the document tree, I would very much like to know where it is. I
> don't think it exists: ...

Both Wayne Wohler and Erik Naggum have pointed out some passages in
8879 which they think may imply or be the specification of left-right
recognition which I was asking for.

Several of these do not seem to me to require left-to-right scanning at
all; many of them talk about the order in which SGML constructs occur in
the document, but this does not constitute a requirement that constructs
be recognized in that order. (Reverse Polish notation, for example, can
be parsed right-to-left, so the operators are recognized before the
expressions they follow, but this does not mean we cannot speak of the
expressions coming before the operator.)
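
The Reverse Polish point can be illustrated with a short sketch. It is
written in Python purely for illustration; the names
parse_rpn_right_to_left and evaluate, and the tuple-based tree, are
hypothetical and come from no parser mentioned in this thread. The tokens
are scanned from last to first, so each operator is recognized before the
operands that precede it in the text, yet the resulting tree still records
which operand "comes first":

```python
def parse_rpn_right_to_left(tokens):
    """Build an expression tree from RPN tokens, scanning from the right end."""
    binary_ops = {"+", "-", "*", "/"}
    pos = len(tokens) - 1  # index of the next token to recognize

    def parse():
        nonlocal pos
        if pos < 0:
            raise ValueError("ran out of tokens")
        tok = tokens[pos]
        pos -= 1
        if tok in binary_ops:
            # The operator is seen first; its right operand lies nearest to it.
            right = parse()
            left = parse()
            return (tok, left, right)
        return float(tok)

    tree = parse()
    if pos != -1:
        raise ValueError("unused tokens remain on the left")
    return tree


def evaluate(tree):
    """Evaluate a tree produced by parse_rpn_right_to_left."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


tree = parse_rpn_right_to_left("3 4 + 2 *".split())
print(tree, evaluate(tree))
```

Order of occurrence in the text and order of recognition by the parser are
independent properties here, which is the distinction drawn above.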

However, two of the passages mentioned do explicitly require left-right
recognition of SGML constructs, and with it a left-right, depth-first traversal
of the document tree. Both WW and EN mention clause 6.2, on "SGML
Entities", which says in the second paragraph after production 4: "Each
SGML character is parsed in the order it occurs, in the following
manner: ..."

And clause 9.6.3, mentioned by WW, explicitly says "Delimiter strings
... are recognized in the order they occur, with no overlap."

These two passages do explicitly require that characters be parsed and
language constructs be recognized in their order of occurrence (i.e. in
what I've been calling 'left-right order') and thus restrict the order
of recognition enough to make the definition of current attributes
unambiguous. Given SGML's grammar, parsing characters in this order
means the document will be parsed in the same order as a left-right
depth-first scan of the document tree; this gives unambiguous sense to
the phrase 'most recently specified' used in the definition of 'current
attribute'. They also require SGML parsers to use left-right scanning
algorithms, rather than the right-left or non-directional algorithms one
sometimes reads about. Thanks to Wayne Wohler and Erik Naggum for
pointing out these passages.
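
The correspondence between character order and tree order can be checked
with a small sketch. It is Python, for illustration only; the (gi,
children) tuples, the function names, and the toy tag syntax with every
tag written out in full are all assumptions, not real SGML parsing. The
start-tags found by reading the text left to right come out in the same
order as a left-right depth-first (pre-order) walk of the tree:

```python
import re


def serialize(node):
    """Write a (gi, children) tree as a flat string of start- and end-tags."""
    gi, children = node
    return f"<{gi}>" + "".join(serialize(c) for c in children) + f"</{gi}>"


def preorder(node):
    """Yield GIs in left-right depth-first order."""
    gi, children = node
    yield gi
    for child in children:
        yield from preorder(child)


def start_tags_in_text_order(text):
    """Collect start-tag GIs in the order their characters occur."""
    return re.findall(r"<([A-Za-z][A-Za-z0-9.-]*)>", text)


tree = ("doc", [("sec", [("p", []), ("p", [])]), ("sec", [("note", [])])])
text = serialize(tree)
print(start_tags_in_text_order(text) == list(preorder(tree)))
```

With this ordering fixed, "most recently specified" always refers to the
nearest earlier start-tag in the text.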

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
</message>
<message id="<23211A@erik.naggum.no>" date="2917643666">
Newsgroups: comp.text.sgml
Date: 16 Jun 1992 00:34:26 UT
From: Erik Naggum \<erik@naggum.no>
Reply-To: enag@ifi.uio.no
Message-ID: <23211A@erik.naggum.no>
References: <1992Jun15.172209.22762@ens.fr>
Subject: Re: #CURRENT attributes

Denis Excoffier \<scof@dmi.ens.fr> writes:
|
| Could someone please give me the final answer to the following question :
|
| Is this document a correct SGML document ?
| -----------------------------------
| \<!DOCTYPE doc [
| \<!ELEMENT doc - - (foo|bar)>
| \<!ELEMENT (foo|bar) - O EMPTY>
| \<!ATTLIST (foo|bar)
| zot CDATA #CURRENT
| >
| ]>
| \<doc>
| \<foo zot="yuyu">
| \<bar>
| \</doc>
| -----------------------------------

Yes. (Provided SHORTTAG or OMITTAG are active features.)

See 7.9.1.1 Omitted Attribute Specification in ISO 8879:1986, or in
Goldfarb's Handbook (page 329), included below for your reference:

7.9.1.1 Omitted Attribute Specification

If "SHORTTAG YES" or "OMITTAG YES" is specified
on the SGML declaration:

a) There need be an attribute specification only
for a required attribute, and for a current
attribute on the first occurrence of any element
in whose attribute definition list it appears.
Other attributes will be treated as though
specified with an attribute value equal to the
declared default value.

b) If there is an attribute value specification for a
current attribute, the specified attribute value
will become the default value. The new default
affects all elements associated with the
attribute definition list in which the attribute
was defined.

Best regards,
\</Erik>

--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \<erik@naggum.no> | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \<enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis.
</message>
<message id="<1992Jun17.081114.13439@ens.fr>" date="2917757474">
Newsgroups: comp.text.sgml
Date: 17 Jun 1992 08:11:14 UT
From: Denis Excoffier \<scof@dmi.ens.fr>
Organization: Ecole Normale Superieure, PARIS, France
Message-ID: <1992Jun17.081114.13439@ens.fr>
References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no>
Keywords: #CURRENT DTD
Summary: strange implicit implication of standard
Subject: Re: #CURRENT attributes

In article <23211A@erik.naggum.no> enag@ifi.uio.no writes:
>|
>| Is this document a correct SGML document ?
>| -----------------------------------
>| \<!DOCTYPE doc [
>| \<!ELEMENT doc - - (foo|bar)>
>| \<!ELEMENT (foo|bar) - O EMPTY>
>| \<!ATTLIST (foo|bar)
>| zot CDATA #CURRENT
>| >
>| ]>
>| \<doc>
>| \<foo zot="yuyu">
>| \<bar>
>| \</doc>
>| -----------------------------------
>
>Yes. (Provided SHORTTAG or OMITTAG are active features.)
>
So I have to understand that the two DTDs :

\<!DOCTYPE doc [
\<!ELEMENT doc - - (foo|bar)>
\<!ELEMENT (foo|bar) - O EMPTY>
\<!ATTLIST (foo|bar) zot CDATA #CURRENT>
]>

\<!DOCTYPE doc [
\<!ELEMENT doc - - (foo|bar)>
\<!ELEMENT (foo|bar) - O EMPTY>
\<!ATTLIST (foo) zot CDATA #CURRENT>
\<!ATTLIST (bar) zot CDATA #CURRENT>
]>

are NOT the same DTDs (i.e. there exist some document(s) e.g.
\<doc>\<foo zot="yuyu">\<bar zot="toto">\<foo>\</doc> which don't
show the same ESIS when parsed with those 2 DTDs)

Am I right ?
</message>
<message id="<23212A@erik.naggum.no>" date="2917777889">
Newsgroups: comp.text.sgml
Date: 17 Jun 1992 13:51:29 UT
From: Erik Naggum \<erik@naggum.no>
Reply-To: enag@ifi.uio.no
Message-ID: <23212A@erik.naggum.no>
References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> <1992Jun17.081114.13439@ens.fr>
Subject: Re: #CURRENT attributes

Denis Excoffier \<scof@dmi.ens.fr> writes:
|
| So I have to understand that the two DTDs :
|
| \<!DOCTYPE doc [
| \<!ELEMENT doc - - (foo|bar)>
| \<!ELEMENT (foo|bar) - O EMPTY>
| \<!ATTLIST (foo|bar) zot CDATA #CURRENT>
| ]>
|
| \<!DOCTYPE doc [
| \<!ELEMENT doc - - (foo|bar)>
| \<!ELEMENT (foo|bar) - O EMPTY>
| \<!ATTLIST (foo) zot CDATA #CURRENT>
| \<!ATTLIST (bar) zot CDATA #CURRENT>
| ]>
|
| are NOT the same DTDs (i.e. there exist some document(s) e.g.
| \<doc>\<foo zot="yuyu">\<bar zot="toto">\<foo>\</doc> which don't
| show the same ESIS when parsed with those 2 DTDs)

Good observation. This is an important difference from programming
languages where such lists can be rolled out, e.g., int a,b; is
identical to int a; int b;.

| Am I right ?

Yes.
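
The difference can be made concrete with a small simulation. This is a
sketch in Python, not a parser: the function resolve_current and its data
layout are hypothetical, content models are ignored, and the defaulting
rule is the reading of 7.9.1.1 discussed in this thread, namely that one
"current" value is kept per attribute definition list rather than per
element type.

```python
def resolve_current(tags, attlists, attr="zot"):
    """Return (gi, value) pairs after applying CURRENT defaulting.

    tags     -- start-tags in document order, as (gi, value_or_None)
    attlists -- attribute definition lists, each a tuple of the GIs it covers
    """
    # Map each GI to the index of the definition list it belongs to.
    list_of = {gi: i for i, gis in enumerate(attlists) for gi in gis}
    current = {}  # definition-list index -> most recently specified value
    resolved = []
    for gi, value in tags:
        key = list_of[gi]
        if value is not None:
            current[key] = value  # a specified value becomes the new default
        elif key not in current:
            raise ValueError(
                f"no value yet for CURRENT attribute {attr!r} on <{gi}>"
            )
        resolved.append((gi, current[key]))
    return resolved


# <doc><foo zot="yuyu"><bar zot="toto"><foo></doc>
doc = [("foo", "yuyu"), ("bar", "toto"), ("foo", None)]
shared = resolve_current(doc, [("foo", "bar")])      # one ATTLIST (foo|bar)
separate = resolve_current(doc, [("foo",), ("bar",)])  # two ATTLISTs
print(shared)
print(separate)
```

Under these assumptions the last foo gets zot="toto" with the shared list
and zot="yuyu" with the separate lists. The same function rejects
\<foo zot="yuyu">\<bar> when the lists are separate, because the first
bar then has no value to inherit.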

Best regards,
\</Erik>
--
Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento,
Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena.
Boks 1570, Vika | \<erik@naggum.no> | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \<enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis.
</message>
<message id="<1992Jun17.140923.14795@exu.ericsson.se>" date="2917778963">
Newsgroups: comp.text.sgml
Date: 17 Jun 1992 14:09:23 UT
From: "Tom Boudreau,cs,x0364" \<exutgb@exu.ericsson.se>
Reply-To: exutgb@exu.ericsson.se
Organization: Ericsson Network Systems, Inc.
Message-ID: <1992Jun17.140923.14795@exu.ericsson.se>
Keywords: AECMA 1000D, database publishing, SGML
Subject: AECMA 1000D

Can anyone give me more information on AECMA 1000D? Is it released, where
can I get a copy, and is it worth reading?

I'm also interested in any other information on database publishing using SGML.

Thanks,

Tom Boudreau
Ericsson Network Systems

</message>
<message id="<92169.192113U35395@uicvm.uic.edu>" date="2917815672">
Newsgroups: comp.text.sgml
Date: 18 Jun 1992 00:21:12 UT
From: "C. M. Sperberg-McQueen" \<U35395@uicvm.uic.edu>
Organization: University of Illinois at Chicago
Message-ID: <92169.192113U35395@uicvm.uic.edu>
References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> <1992Jun17.081114.13439@ens.fr>
Subject: Re: #CURRENT attributes

Denis Excoffier asks:
> So I have to understand that the two DTDs :
[identical DTDs except for:]
> \<!ATTLIST (foo|bar) zot CDATA #CURRENT>
>
> \<!ATTLIST (foo) zot CDATA #CURRENT>
> \<!ATTLIST (bar) zot CDATA #CURRENT>
>
> are NOT the same DTDs (i.e. there exist some document(s) e.g.
> \<doc>\<foo zot="yuyu">\<bar zot="toto">\<foo>\</doc> which don't
> show the same ESIS when parsed with those 2 DTDs)
>
> Am I right ?

That's certainly my understanding of 8879. The ability to affect the
defaulting behavior may be useful in some cases, though in our experience
with the TEI DTDs the cases of elements with exactly identical attribute
definition lists are few and far between (and when they occur, they
often don't involve CURRENT attributes), so the issue doesn't arise as
often as I expected at first. The TEI DTDs always specify 'attribute
definition list declarations' for one GI at a time, in part to make it
easier to find stuff in them (vain hope). This allows us to avoid
having to decide when to merge and when not to merge attribute
definition list declarations. We also avoid having to explain to our
users how this feature of CURRENT works, which may be as well. If this
construct has an 'astonishment factor' for you, you can avoid it by
simply not using the construct. Simplicity can have its rewards.

-C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago
</message>
<message id="<1992Jun19.104220.1399@ens.fr>" date="2917939340">
Newsgroups: comp.text.sgml
Date: 19 Jun 1992 10:42:20 UT
From: Denis Excoffier \<scof@dmi.ens.fr>
Organization: Ecole Normale Superieure, PARIS, France
Message-ID: <1992Jun19.104220.1399@ens.fr>
References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> <1992Jun17.081114.13439@ens.fr> <92169.192113U35395@uicvm.uic.edu>
Keywords: #CURRENT conventions
Subject: Re: #CURRENT attributes

In article <92169.192113U35395@uicvm.uic.edu> U35395@uicvm.uic.edu (C. M. Sperberg-McQueen) writes:
>We also avoid having to explain to our
>users how this feature of CURRENT works, which may be as well. If this
>construct has an 'astonishment factor' for you, you can avoid it by
>simply not using the construct. Simplicity can have its rewards.
>
I agree. It's easier to explain that those ``attribute definition
lists'' are simply a means to factorize information. I hope that this
``common-current attributes'' feature will be pointed out as deprecated
in the standard in the next revision.
</message>
<message id="<1992Jun22.055448.19261@sserve.cc.adfa.oz.au>" date="2918181288">
Newsgroups: comp.text.sgml
Date: 22 Jun 1992 05:54:48 UT
From: Tom Worthington \<tomw@ccadfa.cc.adfa.oz.au>
Organization: Australian Defence Force Academy, Canberra, Australia
Message-ID: <1992Jun22.055448.19261@sserve.cc.adfa.oz.au>
Subject: SGML for legal documents?

A lawyer asked me about standards for transferring legal documents
from one word processor to another. Sounds like an application for
SGML. Has any work been done in this area? Any existing standards?
</message>
<message id="<7430@hemuli.tik.vtt.fi>" date="2918275939">
Newsgroups: comp.sys.next.misc,comp.sys.next.software,comp.text.sgml
Followup-To: comp.sys.next.software
Date: 23 Jun 1992 08:12:19 UT
From: Timo Vendelin \<timo.vendelin@vtt.fi>
Reply-To: timo.vendelin@vtt.fi (Timo Vendelin)
Message-ID: <7430@hemuli.tik.vtt.fi>
Keywords: sgml
Subject: SGML based editors for NeXT

Hi,

I am looking for SGML-based editors for the NeXT workstation.
I do not know of any yet, so any help will be valuable to me.
Thank you.

I would also like to know whether any such editor is syntax-directed and
whether I could also create a DTD with it.

Thanks for any info,

Timo Vendelin

--

UNIX-email: Timo.Vendelin@vtt.fi
X.400-email: C=FI,ADMD=MAILNET,PRMD=VTT,PN=Timo Vendelin
name: Timo Vendelin
office_phone: +358 (9)0 456 4505
fax: +358 (9)0 455 2839
</message>
<message id="<1992Jun23.140002.27827@bradford.ac.uk>" date="2918296802">
Newsgroups: comp.text.sgml
Date: 23 Jun 1992 14:00:02 UT
From: "NWR.AYRES" \<N.W.R.Ayres@bradford.ac.uk>
Organization: University of Bradford, UK
Message-ID: <1992Jun23.140002.27827@bradford.ac.uk>
References: <23205B@erik.naggum.no>
Subject: Re: more about national DTDs

erik@naggum.no (Erik Naggum) writes:
: The reason I, too, want national DTD's is that the people who keyboard
: the documents will want to use their natural language also for the
: markup. With conventional markup languages, this has not been possible
: without an incredible lot of work, and it is consequently seldom done.
:
: I am concerned about the "magic" involved if tag names are cryptic in a
: foreign language. People who want to express their ideas in written
: form will have to find SGML useful for their needs, and will have enough
: effort poured into the written expression of their ideas even before
: they would be forced to learn another bloody form of computer-imposed
: magic. I'm perfectly happy with English comments in the DTD, English
: parameter entity names, etc, because they are used by document designers
: and programmers, who _also_ have better things to do than to learn
: another language every time they turn around. Programmers are very
: special users, and making any sort of concession to them (and I'm
: talking about myself here, too), is likely to scare many users off.
:
: As regards document interchange: If the document language is foreign
: to you, the tag names are the least of your problems. As far as SGML
: goes, we do have explicit link, and I'm providing an explicit link type
: declaration with the Norwegian General Document just for this purpose.
: (I notice, with some regret, that attributes can't be "renamed" with
: this feature, and will need special treatment.)
:
: That "everybody" knows English is no reason to believe that everybody in
: fact knows English well enough to not feel intimidated by English tag
: names. SGML even provides a means to rename keywords of the language.
: A parser should be able to accommodate such trivial modifications for
: national needs.

All of this has to do with the user interface. The average computer user
doesn't want to be a guru; they just want a machine and application that are
as easy to use as possible. That surely means native languages, with tag
names presented to them in a form that means something obvious. It is
probably better not to show tags at all, but to give the user some other
means of demarcating content. Thus if \<p> is _paragraph_, then surely it is
better for the interface to show the word _paragraph_ in the user's own
language than the obscure \<p> (maybe via menus, or a line at the bottom of
the screen showing the type of object the cursor is in, or... or... rather
than tags included in the text).
It is the same thing with command line versus direct manipulation
interfaces. The latter are becoming more common because they are easier to
understand than having to remember large numbers of obscure commands.
Application designers and programmers will have to play around with guru
information, but preferably not application users.
</message>
<message id="<1992Jun23.142647.29166@bradford.ac.uk>" date="2918298407">
Newsgroups: comp.text.sgml
Date: 23 Jun 1992 14:26:47 UT
From: "NWR.AYRES" \<N.W.R.Ayres@bradford.ac.uk>
Organization: University of Bradford, UK
Message-ID: <1992Jun23.142647.29166@bradford.ac.uk>
References: <2344@irit.irit.fr>
Subject: Re: ODA vs SGML (naive answer included...)

guimpier@irit.irit.fr (jean-luc GUIMPIER) writes:
: hi all,
:
: I seemed ignorant when I posted my request for help comparing ODA and SGML, and in fact I was. I knew nothing about either of them. Now, thanks to some of you, I know a few things about SGML, but I still have trouble finding useful information about ODA. The problem is the same as before: can someone help me position ODA and SGML relative to one another?
:
: Thanks again.

ODA is designed as an interchange format for Open Systems, not really for
modelling documents. It can model the logical structure of a document but
is clumsier than SGML in doing so. It has the advantage that human-readable
names can be placed in object descriptions, so that more people than just
gurus can use it if they wish. :-) The idea is that it will be used to
transfer office documents between people such that the sender doesn't need
to care what document preparation system the receiver has. (It has
standardised the content formats on a number of international standards to
ease the filtering of content formats into the ones that the receiver will
use. This is different from SGML, where you need private agreements, for
example that you will use PostScript.)
It also models the layout of a document if you wish to do so. This is not
up to publishers' standards but is perfectly good for most uses.

Its main advantage is this ability to do _blind interchange_ of documents
so that they are editable by the receiver.

For a good technical description, read _Document Architecture in Open
Systems: The ODA Standard_ by Appelt, W., Springer-Verlag, 1991,
ISBN 0387545395.

Alternatively:
Brown, H., "Introduction to the Office Document Architecture", in
_Multi-Media Document Translation: ODA and the EXPRES Project_ (Rosenberg, J.;
Springer-Verlag). Shorter and easier.

If you would like a list of about 100 references on ODA, send me an email
and I will mail it to you.

All the best

Nick
</message>
<message id="<14100@umd5.umd.edu>" date="2918304365">
Newsgroups: comp.text.sgml
Date: 23 Jun 1992 16:06:05 UT
From: e2cs005@fre.fsu.umd.edu
Reply-To: e2cs005@fre.fsu.umd.edu
Organization: Frostburg State University, Frostburg, MD
Message-ID: <14100@umd5.umd.edu>
Subject: SGML Parser

I would like any information as to where I can find the common SGML
parser written in C (the source code would be nice too) for the IBM
PC.

Any additional information on finding documentation about the specs
for HyTime and/or SGML would also be helpful. I am involved in a
project developing ETMs (Electronic Tech Manuals) and would like to
incorporate SGML / HyTime. Any response is greatly appreciated.

- Robert S. Sommerville
Program Analyst/ ARC
8201 Corporate Drive
Landover, MD 20785, USA
1-301-731-2266
E2CS005@fsu.umd.edu
</message>
<message id="<3446.9206231406@infsc1.hatfield.ac.uk>" date="2918311576">
Newsgroups: comp.text.sgml
Date: 23 Jun 1992 18:06:16 UT
From: British-library \<comrbl@hatfield.ac.uk>
Message-ID: <3446.9206231406@infsc1.hatfield.ac.uk>
Subject: HyTime & Scripts

Hello everyone!

I am posting an email message I received from Steven Newcomb
in reply to a question I sent him concerning script languages
and HyTime.

I hope it is of general interest.

------------------------------------------------------------------------------

> HyTime provides no control mechanisms for user
> interaction.The only way to express control is
> through application script languages embedded in
> SGML elements. These scripts may be executed by
> the application to implement some user initiated
> action, such as starting a video sequence on
> traversal of a hyperlink.

That's not strictly true. It depends on the kind of interaction you're
talking about. For example, HyTime does very explicitly provide quite
elaborate traversal rules in hyperlinks where traversal rules are
relevant. It also provides for explicit designation of ``endterms'' --
cues to users that traversal (e.g.) is possible. HyTime also provides
means (finite coordinate spaces) whereby on-screen real estate can be
allocated to a variety of hypertext and other purposes.

> If HyTime is dependent on application script
> languages how can it be of use as an application
> independent hyper-document interchange format ?

It's true that much of the functionality we find in script languages is
not covered by unadorned HyTime, much the same as the semantics of
typesetting are not covered by unadorned SGML. Perhaps the best way to
understand what HyTime does, and the reasons for its existence, is to
consider the implications of basing hyperdocument interchange on a
script language. A script language is procedural; the structure that it
imposes on the information can be inferred only with difficulty, with
doubtful accuracy, and probably not automatically. By contrast, in a
HyTime hyperdocument, the structure of the information is quite
explicit. It is not necessary for a user to run the script(s) in order
to query the document, or to use it in other ways not originally planned
by the author(s). HyTime's design is informed by the notion that
hyperdocuments are, first and foremost, information. The precise
procedures intended by the author(s) to be applied to the information
are less important than the information itself, in the same way that
procedural markup (typesetting/formatting instructions) is less
important than generic markup.

> Are there plans for a standardised script
> language for HyTime ? If so is it related to the
> AVIS (audiovisual interactive scriptware) standard ?

I myself am very interested to see how AVIS and other script languages
fare as they bump heads in the HyTime arena. HyTime will provide an
arena in which many of the design assumptions of various script
languages will become glaringly apparent.

> Or alternatively could the syntax of the script
> language being used be communicated using a HyTime
> lexical model ? In which case how are the semantics
> of the language communicated ?

As always in SGML, the degree to which the SGML and HyTime markup
penetrate into the detailed semantics of a class of documents is
arbitrary, and is usually decided by the same person(s) who design the
DTD(s). To the extent that a given DTD uses HyTime constructs, to that
same degree HyTime semantics are present and are therefore communicated
by the HyTime standard itself. Additional semantics are normally
communicated by means of documentation accompanying the DTD, especially
via interlinear comments in the DTD itself.

Additional semantics will be formally expressible in DSSSL when DSSSL
becomes an international standard.

> Erik's answer was basically that he thought the application
> and application scripts were out of the scope of the interchange
> mechanism. This is fair enough, but it does seem to limit
> the usefulness of HyTime in real world problems. A standard
> script language would appear to me to be required. Do you know
> of one ?

No. As I said, it will be interesting to see how it all works out.
Because of its potential to become an international standard, I think
AVIS is certainly noteworthy, and evidently there are good people
working on it, but, as a work in progress, it cannot be fairly evaluated
yet.

***

I will not object if you choose to post this reply to comp.text.sgml.

Best regards,

Steven R. Newcomb, Chairman, SGML SIGhyper (International SGML Users'
Group Special Interest Group on Hypertext and Multimedia)
c/o TechnoTeacher, Inc. Voice: +1 904 422 3574
1810 High Road Fax: +1 904 386 2562
Tallahassee, FL 32303-4408 USA Internet: srn@techno.com
NOTE NEW INTERNET ADDRESS: ^^^^^^^^^^^^^^

------------------------------------------------------------------------------

Stephen Baird Tel: 0707-279166
Research Associate Fax: 0707-279185
Hatfield Polytechnic email: comrbl@infsc1.hatfield.ac.uk
Hatfield, Herts
AL10 8BW
United Kingdom
</message>
<message id="<1992Jun24.042804.10035@utagraph.uta.edu>" date="2918351975">
Newsgroups: comp.text.sgml
Date: 24 Jun 1992 05:19:35 UT
From: Robin Cover \<robin@utafll.uta.edu>
Organization: UT Arlington
Message-ID: <1992Jun24.042804.10035@utagraph.uta.edu>
Subject: AACR2 in SGML

Readers familiar with the AACR2 standard for descriptive cataloging
may be interested to learn of plans to commit the volume to SGML
format. I think this conversion may be part of a larger effort
to create a CDROM with several online resources for librarians
(including LC Rule Interpretations and other commentary).

\<quote>
AACR2 TO BE MADE AVAILABLE IN MACHINE-READABLE FORM

The publishers and copyright holders of the Anglo-American
Cataloguing Rules, 2nd edition, revised (AACR2), have agreed to prepare a
machine-readable and searchable version of AACR2. The American Library
Association (ALA), one of the three publishers of AACR2, expects to
release the Anglo-American Cataloguing Rules, electronic edition (AACRE)
in early 1993.
The ALA has retained two consultants to work with the authors and
publishers of AACR2 to develop the document type definition and
text-tagging scheme required to create a Standard Generalized Markup
Language (SGML) version of AACR2.
John Duke, assistant director for network and technical services at
Virginia Commonwealth University, will be the librarian/cataloger
consultant responsible for working with the technical consultant to plan
file coding and structure that will support effective data retrieval and
manipulation by various text retrieval software systems available to
library catalogers.
George Alexander, president of MindMeetings, a software company which
does custom file conversions for publishers, will design a file structure
to be usable with various text retrieval software systems,
develop the most effective method to apply the SGML convention to the
structure of AACR2, and develop the file-conversion programs necessary to
produce AACRE from existing text files.
ALA, which holds the copyright to AACR2 along with the Library
Association and the Canadian Library Association, will initially grant
limited permission for use of the copyrighted AACRE text files for
purposes of experimentation and research on their possible uses. Requests
for experimental use may be submitted to David Epstein, General Manager,
ALA Books, or Karen Muller, Executive Director, Association for Library
Collections & Technical Services.
\</quote>
\<source>
*
File "AN2 V3_NO31" ISSN: 1056-6694

ALCTS NETWORK NEWS

An electronic publication of the
Association for Library Collections & Technical Services

Volume 3, Number 31
June 23, 1992

In this issue

AACR2 TO BE MADE AVAILABLE IN MACHINE-READABLE FORM
CIC PUBLISHES REPORT ON MASS DEACIDIFICATION
OPEN LETTER TO COUNCIL
\</source>

Submitted by RCC

Robin Cover BITNET: zrcc1001@smuvm1 ("one-zero-zero-one")
6634 Sarah Drive Internet: robin@utafll.uta.edu ("uta-ef-el-el")
Dallas, TX 75236 USA Internet: zrcc1001@vm.cis.smu.edu
Tel: (1 214) 296-1783 Internet: robin@ling.uta.edu
FAX: (1 214) 709-3387 Internet: robin@txsil.sil.org
=========================================================================

</message>
<message id="<23221B@erik.naggum.no>" date="2918403234">
Newsgroups: comp.text.sgml,comp.text
Followup-To: comp.text
Date: 24 Jun 1992 19:33:54 UT
From: Erik Naggum \<erik@naggum.no>
Message-ID: <23221B@erik.naggum.no>
References: <2344@irit.irit.fr> <1992Jun23.142647.29166@bradford.ac.uk>
Subject: Re: ODA vs SGML (naive answer included...)

NWR.AYRES \<N.W.R.Ayres@bradford.ac.uk> writes:
|
| ... Its idea is that it will be used to transfer office documents
| between people such that the sender doesn't need to care what
| document preparation system the receiver has (it has standardised
| the content formats to a number of international standards to ease
| the filtering of content formats to the ones that the receiver will
| use - this is different from SGML where you need private agreements,
| for example that you will use PostScript).

Really? PostScript?

| It also models the layout of a document if you wish to do so. This
| is not up to publishers' standards but is perfectly good for most
| uses.

Really? How come the standard has already reached 70 amendments, most
of them very elaborate? Something tells me that ODA is far from being
useful as it is, and that this is a view held by its developers.

I have been less and less impressed with ODA the more I've read about
it. Two and a half years ago, I set out to find out what both ODA and
SGML were all about. Two years ago, I decided that ODA was not
something on which I would spend my time. Today, as I received another
batch of ISO documents full of additions and corrections to ISO 8613, I
think fewer people should spend their time on ODA, and instead more on
PostScript and SPDL for the page description part, and on SGML for the
structure description part. ODA has outlived itself, as is becoming
clear to more and more of the parties to its creation.

I think we would benefit from a description of the features of ODA as it
exists, at its current revision level, instead of marketing descriptions
of how happy I will be with the results if and when I get them. In my mind,
however, ODA is the single biggest case of "vaporware", and I would not
be inclined to believe anything I don't see.

Add to this the impenetrability of the specifications (I own a copy of,
and have read ISO 8613, all parts, and I continue to read the documents
I get from ISO about amendments and views about ODA at SC 18 level), and
you get a standard which is kept alive for its own sake, by the people
who have spent too much time on it to be able to throw it away without
taking their prestige with it.

I think ODA is intended to be only a transfer format for documents which
are developed in one context and format, on the way to another context
and format, in which they will be printed, or occasionally edited.
Considering the cost overhead of "going ODA", I think PostScript or a
(Standard) Page Description Language which the printer understands is
the best solution, and that for editability, nothing beats clear text
encoding.

Others have indicated that comp.text.sgml should not be used for
discussions about ODA, so I ask that follow-ups be directed to
comp.text, to which I've set Followup-To, or comp.protocols.iso, as the
OSI-ness and the protocol-ness of the standard are both irrelevant to
the issue of text processing.

Regards,
\</Erik>

--
Erik Naggum | ISO 8879 SGML |
| ISO 10744 HyTime |
+47-295-0313 | ISO 9899 C | Memento, terrigena.
\<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis.
</message>
<message id="<1992Jun26.144804.1@tnclus.tele.nokia.fi>" date="2918551684">
Newsgroups: comp.text.sgml
Date: 26 Jun 1992 12:48:04 UT
From: hehanninen@tnclus.tele.nokia.fi
Organization: Nokia Telecommunications.
Message-ID: <1992Jun26.144804.1@tnclus.tele.nokia.fi>
Subject: How to take care of sections ?

Hi Everybody !

Sorry I missed possible answers to my last question because of
my vacation.

So here it comes again, and some more...

Questions: Is it necessary to include structural info in the tag name
(in prefix and in suffix) if you can see (and follow)
the structure in the DTD ?

f.ex. %p.em.ph or %pharases

What is a flexible way to take care of multilingual and
application sections?
Sections might contain any kind of text elements: words,
clauses, chapters; in application version sections,
also figures and tables.

-with specified tag for each language, f.ex.
...
\<eng>
\<h1>
\<h1t> 1 Heading title
\<p> What is flexible way to take care of multilingual sections:
\</eng>
...
\<fin>
\<p>Nyt hieman suomen kielta sekaan.
\</fin>

-with language-tag with specified attribute, f.ex.
...
\<lan type=eng>
\<h1>
\<h1t> 1 Heading title
\<p> What is flexible way to take care of multilingual sections:
\</lan>
\<lan type=fin>
\<p>Nyt hieman suomen kielta sekaan.
\</lan>

-or with Marked Sections

?? any proposal ??

yours Heimo
</message>
<message id="<1992Jun26.155541.25290@infoserver.th-darmstadt.de>" date="2918562941">
Newsgroups: comp.text.sgml
Date: 26 Jun 1992 15:55:41 UT
From: Christine Detig \<detig@hp13.iti.informatik.th-darmstadt.de>
Organization: TU Darmstadt
Message-ID: <1992Jun26.155541.25290@infoserver.th-darmstadt.de>
Subject: Novice user question

Hello,

I'm quite new to SGML and thought it would be good to get acquainted
with the tools available. So I took the sgmls-0.8 parser and tried to
apply it to the demo DTDs I could scratch from everywhere, e.g.
the write-it.dtds, a-w.dtd (used for the goldfarb book, as far as I know) etc.
But sgmls didn't process any of them without errors
(neither on Sun3 nor on HP400).

Instead, I got error messages like

sgmls: SGML error at memo.dtd, line 1 in declaration parameter 1:
Amendment 1 requires "ISO 8879:1986" instead of "ISO 8879-1986"

sgmls: SGML error at memo.dtd, line 21 in declaration parameter 50:
Non-significant shunned character number 0 not declared UNUSED
[...]
sgmls: SGML error at memo.dtd, line 23 in declaration parameter 58:
Character number 13 was described using an unknown base set

etc (this being taken from write-it/memo.dtd) which starts:

\<!SGML "ISO 8879-1986"
--
MEMO PROCESSING SGML DECLARATION

This SGML Declaration contains the rules needed to process the
memo handling DTD supplied with WRITE-IT, Sema Group's
SGML-based syntax directed editor.

--
CHARSET
BASESET "ISO 646-1983//CHARSET International Reference Version
(IRV)//ESC 2/8 4/0"
DESCSET 0 128 0

CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN"
SCOPE DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 127

BASESET "ISO 646 IRV"
DESCSET 0 128 0

could a kind soul point out to me what I'm missing?
Something I have to install? Wrong tool for the wrong thing?
Usage error?

Thanks a lot, Christine
</message>
<message id="<23223B@erik.naggum.no>" date="2918584728">
Newsgroups: comp.text.sgml
Date: 26 Jun 1992 21:58:48 UT
From: Erik Naggum \<erik@naggum.no>
Message-ID: <23223B@erik.naggum.no>
References: <1992Jun26.144804.1@tnclus.tele.nokia.fi>
Subject: Re: How to take care of sections ?

hehanninen@tnclus.tele.nokia.fi writes:
|
| Hi Everybody !
|
| Sorry I missed possible answers to my last question because of
| my vacation.

You could check out the archive at ifi.uio.no with Gopher
(gopher.ifi.uio.no, port 70) or WAIS (comp.text.sgml.src).

| Questions: Is it necessary to include structural info in the tag name
| (in prefix and in suffix) if you can see (and follow)
| the structure in the DTD ?
|
| f.ex. %p.em.ph or %pharases

Brief answer: Those are not tag names, they're parameter entities used
in the DTD to simplify (and make possible) modifications in the document
type declaration subset.

Regards,
\</Erik>

--
Erik Naggum | ISO 8879 SGML |
| ISO 10744 HyTime |
+47-295-0313 | ISO 9899 C | Memento, terrigena.
\<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis.
</message>
<message id="<23224A@erik.naggum.no>" date="2918592332">
Newsgroups: comp.text.sgml
Date: 27 Jun 1992 00:05:32 UT
From: Erik Naggum \<erik@naggum.no>
Message-ID: <23224A@erik.naggum.no>
References: <1992Jun26.155541.25290@infoserver.th-darmstadt.de>
Subject: Re: Novice user question

Christine Detig \<detig@hp13.iti.informatik.th-darmstadt.de> writes:
|
| I'm quite new to SGML and thought it would be good to get
| acquainted with the tools available. So I took the sgmls-0.8
| parser and tried to apply it to the demo DTDs I could scratch from
| everywhere, e.g. the write-it.dtds, a-w.dtd (used for the goldfarb
| book, as far as I know) etc. But sgmls didn't process any of them
| without errors (neither on Sun3 nor on HP400).

The a-w.dtd (for Addison-Wesley) was not used by Goldfarb, but by Martin
Bryan in his "Author's Guide" to SGML. Martin Bryan is also the source
of the write-it.dtd. He's also the source of some major confusions
about SGML, and I positively hate to say that the strong negative view
of his work that made me decide against putting the above-mentioned DTDs
in the public SGML archive even before looking at them proved prudent.
It would have been much more interesting if I had had an occasion to
update my opinion of the source of this material.

| sgmls: SGML error at memo.dtd, line 1 in declaration parameter 1:
| Amendment 1 requires "ISO 8879:1986" instead of "ISO 8879-1986"

That's right. Those who still think that ISO uses "-" between standard
number and publication year need to update their software. This change
occurred sometime before 1988, and it's really about time we get this
right. sgmls is quite right in pointing to Amendment 1 to ISO 8879, and
it's purely the fault of the writer of the SGML declaration to not abide
by the new syntax rules.

The old syntax was

ISO[/xxx] nnnn[/p]-yyyy

where
[x] means x is optional,
xxx is a co-sponsoring organization,
nnnn is the number of the standard,
p is the part number, and
yyyy is the publication year

The new syntax is

ISO[/xxx] nnnn[-p]:yyyy

This is relevant for ISO owner identifiers in public identifiers, such
as "ISO 646:1983".
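
A minimal sketch of that rewrite in Python, for anyone who wants to
patch old declarations mechanically. The function name and the regular
expression are illustrative assumptions, not anything defined by ISO 8879
or by sgmls, and the co-sponsor pattern (uppercase letters only) is a
guess at what appears in practice:

```python
import re

# Old form: ISO[/xxx] nnnn[/p]-yyyy     New form: ISO[/xxx] nnnn[-p]:yyyy
# The co-sponsor pattern is an assumption; adjust it if yours differs.
OLD_ISO = re.compile(
    r"^ISO(?P<cosponsor>/[A-Z]+)? "
    r"(?P<number>\d+)(?:/(?P<part>\d+))?-(?P<year>\d{4})$"
)

def modernize_iso_id(old):
    """Return the new-style identifier, or None if `old` is not old-style."""
    m = OLD_ISO.match(old)
    if m is None:
        return None
    cosponsor = m.group("cosponsor") or ""
    part = "-" + m.group("part") if m.group("part") else ""
    return "ISO{} {}{}:{}".format(
        cosponsor, m.group("number"), part, m.group("year")
    )

if __name__ == "__main__":
    for text in ("ISO 8879-1986", "ISO 646-1983", "ISO 8879:1986"):
        print(text, "->", modernize_iso_id(text))
```

An identifier that is already in the new form does not match the old
pattern, so the function returns None for it rather than guessing.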

| sgmls: SGML error at memo.dtd, line 21 in declaration parameter 50:
| Non-significant shunned character number 0 not declared UNUSED

\<shot type=cheap>
This in an SGML declaration from the guy who thinks he's an expert on
SGML declarations.
\</shot>

See below for a detailed answer.

| sgmls: SGML error at memo.dtd, line 23 in declaration parameter 58:
| Character number 13 was described using an unknown base set

\<shot type=cheap>
This in an SGML declaration...
\</shot>

| etc (this being taken from write-it/memo.dtd) which starts:

| \<!SGML "ISO 8879-1986"
| --
| MEMO PROCESSING SGML DECLARATION
|
| This SGML Declaration contains the rules needed to process the
| memo handling DTD supplied with WRITE-IT, Sema Group's
| SGML-based syntax directed editor.
|
| --
| CHARSET
| BASESET "ISO 646-1983//CHARSET International Reference Version
| (IRV)//ESC 2/8 4/0"

ESC 2/8 4/0 is technically better than ESC 2/5 4/0 which the standard
mandates as the sequence to use for the entire ISO 646 character set.
However, ESC 2/8 4/0 designates a G0 character set, and does not include
any control characters. It's therefore wrong, unless supplemented by a
C0 character set with designator ESC 2/1 4/0.

| DESCSET 0 128 0

Quite apart from the fact that positions 0 through 31 and 127 have no
character assigned to them in the character set designated by ESC 2/8
4/0, the syntax (below) has specifically declared all these codes to be
unused, and then they must either be truly unused, or be function
characters. One could argue that they are unused in the designated
character set because the positions are unused, but then there are no
codes for RS or RE. Something is clearly losing big time, here.

This declaration should have been something like this:

DESCSET
0 9 UNUSED
9 2 9
11 2 UNUSED
13 1 13
14 18 UNUSED
32 95 32
127 1 UNUSED
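
The arithmetic of those ranges can be checked with a short Python
sketch. It assumes the entries are listed in ascending order and that
the described set has 128 positions; the data structure and function
names are made up for illustration and are not part of any SGML tool:

```python
# Each entry: (first described number, count, base-set number or "UNUSED"),
# copied from the DESCSET suggested above.
DESCSET = [
    (0, 9, "UNUSED"),
    (9, 2, 9),
    (11, 2, "UNUSED"),
    (13, 1, 13),
    (14, 18, "UNUSED"),
    (32, 95, 32),
    (127, 1, "UNUSED"),
]

def check_descset(entries, size=128):
    """Return a list of problems; empty if the ranges tile 0..size-1.

    Assumes entries are in ascending order (an assumption of this sketch).
    """
    problems = []
    expected = 0
    for start, count, _base in entries:
        if start != expected:
            problems.append(
                "expected a range starting at {}, got {}".format(expected, start)
            )
        expected = start + count
    if expected != size:
        problems.append("ranges end at {}, expected {}".format(expected, size))
    return problems

def used_numbers(entries):
    """Return the sorted character numbers that are not declared UNUSED."""
    return sorted(
        n
        for start, count, base in entries
        if base != "UNUSED"
        for n in range(start, start + count)
    )

if __name__ == "__main__":
    print(check_descset(DESCSET))
    print(used_numbers(DESCSET)[:3])
```

The ranges cover 9 + 2 + 2 + 1 + 18 + 95 + 1 = 128 positions, and the
only control positions left in use are 9, 10, and 13.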

| CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN"
| SCOPE DOCUMENT
| SYNTAX
| SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
| 21 22 23 24 25 26 27 28 29 30 31 127

This declaration specifies that a character with this number should not
occur in the document, unless specifically identified as a function
character (there are three such normally identified, 9, 10, and 13).
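
As a rough Python sketch of what that rule means for a document, the
following scans a byte string for shunned characters that are not among
the three function characters named above. The names are hypothetical,
and a real parser takes both sets from the SGML declaration rather than
hard-coding them:

```python
# SHUNCHAR CONTROLS 0..31 127, as in the declaration quoted above.
SHUNCHAR = set(range(32)) | {127}

# The three function characters normally identified: 9, 10, and 13.
FUNCTION_CHARS = {9, 10, 13}

def shunned_in(data):
    """Return (offset, number) pairs for shunned, non-function bytes."""
    return [
        (offset, number)
        for offset, number in enumerate(data)
        if number in SHUNCHAR and number not in FUNCTION_CHARS
    ]

if __name__ == "__main__":
    print(shunned_in(b"ok\tline\r\n"))
    print(shunned_in(b"a\x00b\x7f"))
```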

| BASESET "ISO 646 IRV"

This is a syntax violation for a public identifier. The public
identifier should be a formal public identifier, and sgmls is quite
right in complaining that significant SGML characters can't be found.
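
A deliberately loose Python check for the overall shape of a formal
public identifier (owner identifier, "//", public text class and
description, "//", language or designating sequence). The regular
expression and function name are assumptions made for this sketch. It
does not validate the owner identifier, so it still accepts the
old-style "ISO 646-1983", and it ignores optional parts such as the
unavailable text indicator:

```python
import re

# Public text class keywords of ISO 8879 formal public identifiers.
PUBLIC_TEXT_CLASSES = (
    "CAPACITY", "CHARSET", "DOCUMENT", "DTD", "ELEMENTS", "ENTITIES",
    "LPD", "NONSGML", "NOTATION", "SHORTREF", "SUBDOC", "SYNTAX", "TEXT",
)

# owner // CLASS description // language-or-designating-sequence
FPI_SHAPE = re.compile(
    r"^(?P<owner>.+?)//(?P<cls>[A-Z]+) (?P<desc>.+?)//(?P<tail>.+)$"
)

def looks_like_fpi(public_id):
    """Return True if `public_id` has the rough shape of a formal public id."""
    text = " ".join(public_id.split())  # fold line breaks and runs of spaces
    m = FPI_SHAPE.match(text)
    return m is not None and m.group("cls") in PUBLIC_TEXT_CLASSES

if __name__ == "__main__":
    print(looks_like_fpi("ISO 646 IRV"))
    print(looks_like_fpi("ISO 8879-1986//CAPACITY Reference//EN"))
```

"ISO 646 IRV" contains no "//" separators at all, so it fails even this
loose test.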

| DESCSET 0 128 0

This is at least legal, per se, but DESCSETs cannot be viewed apart from
their BASESETs.

| could a kind soul point out to me what I'm missing?
| Something I have to install? Wrong tool for the wrong thing?
| Usage error?

Christine, you're not missing anything. You've just come across a badly
designed and syntactically invalid SGML declaration, and you've used an
accurate tool which has rejected it, as any quality tool should.

Best regards,
\</Erik>

--
Erik Naggum | ISO 8879 SGML |
| ISO 10744 HyTime |
+47-295-0313 | ISO 9899 C | Memento, terrigena.
\<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis.
</message>
<message id="<23224F@erik.naggum.no>" date="2918669824">
Newsgroups: comp.text.sgml
Date: 27 Jun 1992 21:37:04 UT
From: Erik Naggum \<erik@naggum.no>
Message-ID: <23224F@erik.naggum.no>
Subject: [alt.hypertext] Re: SGML information

------------------------------------------------------------------------
From: neilp@cs.hw.ac.uk (Neil Postlethwaite)
Newsgroups: alt.hypertext
Subject: Re: SGML information
Message-ID: <1992Jun23.182812.2951@cs.hw.ac.uk>
Date: 23 Jun 1992 18:28:12 UT
References: <1992Jun15.161342.23132@uxa.ecn.bgu.edu>
Sender: news@cs.hw.ac.uk (News Administrator)
Organization: Dept of Computer Science, Heriot-Watt University, Scotland
Lines: 9

In article <1992Jun15.161342.23132@uxa.ecn.bgu.edu> cfkfb@uxa.ecn.bgu.edu (Karl Bridges) writes:
>
> I'm looking for information about SGML. Could anyone suggest some
>references, either articles or books on the subject?

Try the current (June '92) issue of Byte. In the 'State of the Art' section
on InfoGlut.

Neil
------------------------------------------------------------------------

--
Erik Naggum | ISO 8879 SGML |
| ISO 10744 HyTime |
+47-295-0313 | ISO 9899 C | Memento, terrigena.
\<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis.
</message>
<message id="<709791160snx@sgmlinc.com>" date="2918784966">
Newsgroups: comp.text.sgml
Date: 29 Jun 1992 05:36:06 UT
From: Brian Travis \<brian@sgmlinc.com>
Organization: SGML Associates, Inc.
Message-ID: <709791160snx@sgmlinc.com>
References: <23224F@erik.naggum.no>
Subject: SGML information

In article <1992Jun15.161342.23132@uxa.ecn.bgu.edu> cfkfb@uxa.ecn.bgu.edu (Karl Bridges) writes:
>
> I'm looking for information about SGML. Could anyone suggest some
>references, either articles or books on the subject?

There is a monthly SGML newsletter available. \<TAG> is the
only regular publication devoted to SGML. Each monthly issue
is "packed with the most current news, features, reviews,
and reports".

Information about \<TAG> is available from the Graphic
Communications Association at +1 703-519-8157. Their fax is
+1 703-548-2867. GCA is the predominant industry
organization that covers SGML. They hold four or five
conferences each year devoted to the subject. This is a good
group to know about.

I hope this helps!
Brian.

-------------------------------------------------------------
<> Brian Travis <> brian@sgmlinc.com <>
<> SGML Architect <> InfoDesign Corp. <>
<> Managing Editor \<TAG>: The SGML Newsletter <>
<> 6360 S. Gibraltar Cir. <> Aurora CO 80016 USA <>
<> Tele: +1 303 680-0875 <> Fax: +1 303 680-4906 </>
</message>
<message id="<CDUPREE.92Jun29110830@hqsun2.oracle.com>" date="2918833710">
Newsgroups: comp.text.sgml
Date: 29 Jun 1992 19:08:30 UT
From: Chuck Dupree \<cdupree@oracle.com>
Organization: Oracle Corporation, Belmont, Ca.
Message-ID: \<CDUPREE.92Jun29110830@hqsun2.oracle.com>
Subject: Question: SGML parsers

Can someone please mail or post the location of FTP archives that
contain public-domain SGML parsers? I know this has been posted
before, and thought I'd saved a posting with this information; but I
can't find it locally.

Since my company is a bit paranoid about network security, I need to
make a formal request to our data center folks to do the FTP'ing.
Thus, I would appreciate getting as much information on the full path
names of files and directories as possible.

Also, I've heard that there is a public-domain parser generator. Is
this true? Is the source code available as well as the executable
form?

Thanks for the assistance!

- Chuck Dupree
Oracle Corp.
Redwood Shores, CA
</message>}