Newsgroups: comp.text.sgml
Date: 01 Jun 1992 01:14:05 UT
From: Brian Travis \
Organization: SGML, Inc.
Message-ID: <707361245snx@sgmlinc.com>
References: <1992May29.142718.4971@crd.ge.com>
Subject: Yacc'able sgml grammar

In article <1992May29.142718.4971@crd.ge.com> barnettj@pookie.crd.ge.com writes:
>
> I apologize if this is a FAQ, but I've just started reading this
> group. I'd like to know if there exists a yacc parser built to
> recognize sgml'd text or a subset of sgml. If none exists, is
> there any tool that parses sgml and allows you to insert actions
> when rules are reduced (like yacc)?

The ARC Parser, available from the usual sources (check the FAQ), comes
with a REXX interface, from which the programmer can get the Element
Structure Information Set (ESIS) information...things like attribute
values, element-in-context information, expanded entity information and
stuff like that, for use in a REXX program. The source is available, so
it would be a trivial task to build whatever interface you need around
the ESIS data.

I imagine that parsers based on ARC (SGML-S and others) have the same
kind of output accessible, but I haven't had time to investigate these
yet.

Brian.
---------------------------------------------------------------------------
<> Brian Travis <> Managing Editor <> brian@sgmlinc.com
<> \: The SGML Newsletter <> 6360 S. Gibraltar Cir. <> Aurora CO 80016
<> <> (303) 680-0875 <> Fax: (303) 680-4906

Newsgroups: comp.text.sgml
Date: 01 Jun 1992 18:34:48 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23172C@erik.naggum.no>
Subject: Access to the comp.text.sgml archive

I'm pleased to announce that the comp.text.sgml archive is now available
with Gopher and WAIS. The archive is available by date, by message-id,
and with keyword search. The keyword index and directory caches are
generated every night.

Credits go to Anders Ellefsrud of the operations staff here at the
Institute of Informatics at the University of Oslo. Thanks, Anders.
The WAIS information base is updated with the access parameters, and can
be accessed immediately.

The Gopher system is available as part of the international Gopher
system, but access isn't so easy unless you know a little Norwegian.
Find the entry for Norway, go down to "Universitetet i Oslo", then down
to "Matematisk-Naturvitenskapelig Fakultet", then "Institutt for
Informatikk", then "SGML", and you're there. The available directories
include "SGML", "SIGhyper", and "comp.text.sgml". The latter has
"by.date", "by.msg-id", and the keyword search facility.

There's a way to specify how to get right there, but I moved this
weekend, and can't find the details (on a piece of paper, naturally).
If you know Gopher, you can possibly make sense out of these details:

    Host: gopher.ifi.uio.no
    Port: 70
    Dir: /SGML

This could be put in a file to make access simpler, and I'll get back to
you with more information as soon as I can find it, or Anders patiently
tells me for the third time...

Let's give Anders a big hand for this!

Best regards,
\
--
Erik Naggum       | +47-295-0313 | ISO 8879 SGML    | Memento,
Naggum Software   | "fuzzface"   | ISO 10744 HyTime | terrigena.
Boks 1570, Vika   | \            | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \            | SGML UG SIGhyper | vita brevis.

Newsgroups: comp.text.sgml
Date: 03 Jun 1992 10:35:18 UT
From: jean-luc GUIMPIER \
Organization: IRIT-UPS, Toulouse, France
Message-ID: <2320@irit.irit.fr>
Subject: ODA vs SGML

Hi everybody,

having to evaluate both norms to choose one for future development, I
would welcome any comments, advice or information.
If comparison work is available somewhere I'm interested.

Thanks for helping.

Newsgroups: comp.text.sgml
Date: 03 Jun 1992 15:14:57 UT
From: Siamak Khoubyari \
Reply-To: khoub-s@cs.buffalo.edu
Organization: SUNY at Buffalo, Computer Science / CEDAR
Message-ID: \
Subject: SGML Parser

Hi all,

I'm looking for an SGML parser or interpreter of some kind.
I would appreciate any information you may be able to mail me regarding
what is available, and where it can be obtained.

Thanks,

-- Siamak

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Siamak Khoubyari                        | Internet: khoub-s@cs.buffalo.edu
Department of Computer Science / CEDAR  | BITNET: khoub-s@sunybcs.BITNET
State University of New York at Buffalo | UUCP: !uunet!cs.buffalo.edu!khoub-s
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.text.sgml
Date: 04 Jun 1992 09:07:00 UT
From: Denis Excoffier \
Organization: Ecole Normale Superieure, PARIS, France
Message-ID: <1992Jun4.090700.10778@ens.fr>
Subject: ESIS -> SGML

A question to all those SGML hackers :

Suppose you have a DTD and the ESIS of some SGML instance you don't have.
From these, how do you derive an SGML instance that would, if parsed,
give the same ESIS ?

Example :
if DTD is :
    \ ]>
if ESIS is :
    (DOC
    -Yes, I have a number.
    )DOC
An SGML instance could be :
    \Yes, I have a number.\

Is it so easy in all cases ?

Newsgroups: comp.text.sgml
Date: 04 Jun 1992 09:30:24 UT
From: Luc Dupuy \
Organization: Universite du Quebec a Montreal
Message-ID: <1992Jun4.093024.1834@cari.telecom.uqam.ca>
References: <2320@irit.irit.fr>
Subject: Re: ODA vs SGML

In article <2320@irit.irit.fr> guimpier@irit.irit.fr (jean-luc GUIMPIER) writes:
>Hi everybody,
>
>having to evaluate both norms to choose one for future development, I
>would welcome any comments, advice or information.
>If comparison work is available somewhere I'm interested.
>
>Thanks for helping.

Just one naive question : why choose?

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 11:16:59 UT
From: Mike Popham \
Organization: Computer Unit, Exeter University
Message-ID: <2294@exua.exeter.ac.uk>
Subject: Sample SEMA DTDs available

A new directory (write-it.dtds) has been added to the SGML archive held
at The SGML Project (c/o University of Exeter, UK).
A copy of the README file for this directory follows:

------------------------ write-it.dtds/README ---------------------------

SGML Project                                                 05 Jun. 1992

write-it.dtds

This directory contains copies of some DTDs kindly donated to the archive
by Martin Bryan (SGML Products Manager of SEMA Software Technology).

The DTDs were copied straight from an MS-DOS disk to a Sun SPARCstation,
then packed using the UNIX tar and compress utilities. These DTDs are
offered "as is", and are not supported in any way.

A copy of the README file that accompanied the files on the disk follows:

    This disc contains 4 DTDs. Three are stripped down versions of the
    training DTDs that are used in The WRITE-IT Manual for teaching users
    how to create marked up memos, letters and reports. Each of these
    DTDs is preceded by the SGML Declaration used to process the DTD.

    The fourth DTD in the root directory is the A-W.DTD used to create
    SGML - An Author's Guide to the Standard Generalized Markup Language
    by M. Bryan. (The latter has been copyrighted by Addison-Wesley but
    can be modified for use as the basis of other DTDs.)

    Within WRITE-IT extensive use is made of short references, and
    information on the use of elements is stored in link processing
    instructions that form part of the DTD. The full version of each DTD,
    as used by WRITE-IT, can be found in the WRITE-IT subdirectory, which
    also contains the files needed to configure WRITE-IT to handle each
    class of DTD (the .cfg files).

NAME                  TYPE  CONTENTS
README                File  (This file)
write-it.dtds.tar.Z   File  DTDs for letter, memo, report and book(c)
                            donated by SEMA Software Technology.
-------------------------------------------------------------------
Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom

email: sgml@uk.ac.exeter OR M.G.Popham@uk.ac.exeter
Phone: +44 0392 263946    Fax: +44 0392 211630
-------------------------------------------------------------------

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 13:01:45 UT
From: Mike Popham \
Organization: Computer Unit, Exeter University
Message-ID: <2295@exua.exeter.ac.uk>
References: <2294@exua.exeter.ac.uk>
Subject: Re: Sample SEMA DTDs available

In my last posting, I forgot to say HOW to connect to the archive held by
The SGML Project.

Using ftp:

    Host: sgml1.ex.ac.uk [144.173.6.61]
    Login: anonymous
    Password: \

the files are in directory write-it.dtds.

Many apologies for any inconvenience/confusion caused by my earlier
omission!

-------------------------------------------------------------------
Michael Popham
SGML Project - Computing Development Officer
Computer Unit - Laver Building
North Park Road, University of Exeter
Exeter EX4 4QE, United Kingdom

email: sgml@uk.ac.exeter OR M.G.Popham@uk.ac.exeter
Phone: +44 0392 263946    Fax: +44 0392 211630
-------------------------------------------------------------------

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 20:29:46 UT
From: Siamak Khoubyari \
Reply-To: khoub-s@cs.buffalo.edu
Organization: SUNY at Buffalo, Computer Science / CEDAR
Message-ID: \
Subject: SGML Collection / Corpus?

Hi all,

I was wondering if you know of any large on-line collection of
SGML-formatted documents. If so, how can I obtain it?
Thanks,

-- Siamak

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Siamak Khoubyari                        | Internet: khoub-s@cs.buffalo.edu
Department of Computer Science / CEDAR  | BITNET: khoub-s@sunybcs.BITNET
State University of New York at Buffalo | UUCP: !uunet!cs.buffalo.edu!khoub-s
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Newsgroups: comp.text.sgml
Date: 05 Jun 1992 20:56:39 UT
Corp. The opinions expressed are those of the user and not necessarily those of CONVEX.
From: Dan Connolly \
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Message-ID: <1992Jun5.205639.21823@news.eng.convex.com>
Subject: using NOTATIONs inline

The WWW group is attempting to define a multimedia interchange format
called HTML. It is intended to be an SGML language, but most existing
HTML has never been through an SGML parser.

Anyway, the question is: how do you put a bitmap in an HTML document?

That is, is it possible to put an arbitrary 8 bit binary stream _inside_
an SGML document? My guess is: no. But if we use CDATA, can we include
anything that doesn't contain the closing tag in full? For example:

    \
    \
    ...
    \@#$@#$@#$@ raw gif data @#$@#$@#\
    ...
    \

Someone made the point that an SGML document is only allowed to include
SGML characters as specified by the SGML declaration, and if we're going
to use the default SGML declaration, we have to stick to the characters
blessed by it.

That's not my understanding. I thought that inside CDATA (or SDATA, I
think) you could put _anything_ but the closing tag in full.

What's the scoop? Do we have to use external entities for raw data?

Dan

Newsgroups: comp.text.sgml
Date: 06 Jun 1992 19:55:11 UT
From: Stephen P Spackman \
Organization: University of Chicago CILS
Message-ID: \
References: <1992Jun5.205639.21823@news.eng.convex.com>
Subject: Re: using NOTATIONs inline

Alarm bells.
Are you seriously considering a graphics format that stipulates that "the
sequence '\' may not appear anywhere in the encoded form of the image" as
a validity constraint *on the original source image*?

Don't do it.

Of course, if I've missed the point, ignore me :-).

----------------------------------------------------------------------
stephen p spackman             Center for Information and Language Studies
stephen@estragon.uchicago.edu  University of Chicago
----------------------------------------------------------------------
Believe in Strong AI? I don't even believe in Strong I!

Newsgroups: comp.text.sgml
Date: 06 Jun 1992 20:03:27 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23177A@erik.naggum.no>
References: <1992Jun5.205639.21823@news.eng.convex.com>
Subject: Re: using NOTATIONs inline

Dan Connolly \ writes:
|
| The WWW group is attempting to define a multimedia interchange
| format called HTML. . . .

Why not use HyTime?

:
| That is, is it possible to put an arbitrary 8 bit binary stream
| _inside_ an SGML document? My guess is: no. But if we use
| CDATA, can we include anything that doesn't contain the closing
| tag in full?

If you by "the closing tag in full" mean the entire end-tag, complete
with etago, generic identifier, and tagc, as in "\", this is not the way
SGML does it. CDATA and SDATA are terminated by an etago
"delimiter-in-context", which is an etago (end-tag open, "</") followed
by a name start character.

:
| Someone made the point that an SGML document is only allowed to
| include SGML characters as specified by the SGML declaration, and if
| we're going to use the default SGML declaration, we have to stick to
| the characters blessed by it.

Blessed and blessed. The SGML declaration is supposed to reflect the
reality of the document, not enforce arbitrary limits on it. So you
write an SGML declaration which fits the document.

| That's not my understanding. I thought that inside CDATA (or SDATA,
| I think) you could put _anything_ but the closing tag in full.
As said above, the etago delimiter-in-context terminates the data,
regardless of whether it's a legal end-tag in that context.

You should be aware that the SGML parser will parse the contents of the
"binary" content, and ignore record start, and treat record ends
differently from other characters. In addition, it's an error for an
SGML entity to contain characters with any of the numbers listed in the
SHUNCHAR part of the SYNTAX declaration. This is _not_ what you want
with binary data.

| What's the scoop? Do we have to use external entities for raw data?

Yes. An external entity that is not an SGML text entity requires a
notation identifier, so you only need to list the entities in the DTD,
with notation, and refer to them by name in the document instance.

If this is not satisfactory, you should declare the objects to be CDATA,
and use a binary to text-only transformation scheme. There are several
such schemes. Among them, base64 is the preferred encoding in my view,
since it's available as part of the new Multipurpose Internet Mail
Extensions (MIME) RFC-to-be. (The latest draft is available for
anonymous FTP as ftp.ifi.uio.no:/pub/SGML/MIME.6.ps and MIME.6.txt for
two weeks from today. Section 5.2 which concerns the base64 encoding is
also available as ftp.ifi.uio.no:/pub/SGML/base64.txt.)

Transformation back to the binary form from the text-only form may be
done on the fly by the application before sending the data to the
notation interpreter. In addition to being much easier to deal with in
SGML, this also makes SGML documents containing such content robust with
respect to file transfer, etc.

Hope this helps,
\
--
Erik Naggum       | +47-295-0313 | ISO 8879 SGML    | Memento,
Naggum Software   | "fuzzface"   | ISO 10744 HyTime | terrigena.
Boks 1570, Vika   | \            | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \            | SGML UG SIGhyper | vita brevis.
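
[Archive note] The binary-to-text route described in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the thread: the function names `encode_for_cdata` and `decode_from_cdata` and the 64-character line width are hypothetical choices. The point it shows is that base64 output never contains "<", so an etago ("</") cannot occur inside the encoded data.

```python
import base64


def encode_for_cdata(raw: bytes, width: int = 64) -> str:
    """Return base64 text for `raw`, wrapped into lines of at most `width` characters.

    The base64 alphabet is A-Z, a-z, 0-9, '+', '/' and '=' padding, so the
    result never contains '<' and therefore never contains an etago ("</").
    """
    text = base64.b64encode(raw).decode("ascii")
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


def decode_from_cdata(text: str) -> bytes:
    """Undo encode_for_cdata: drop all whitespace, then decode the base64 text."""
    return base64.b64decode("".join(text.split()))


if __name__ == "__main__":
    # Hypothetical payload that would end CDATA content early if embedded raw.
    raw = b"GIF89a</IMG>\x00\xff raw bytes"
    encoded = encode_for_cdata(raw)
    assert "<" not in encoded
    assert decode_from_cdata(encoded) == raw
    print(encoded)
```

An application would call something like `decode_from_cdata` on the element content before handing the bytes to the notation interpreter, which is the on-the-fly transformation the message describes.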
Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 07 Jun 1992 04:23:58 UT
Corp. The opinions expressed are those of the user and not necessarily those of CONVEX.
From: Dan Connolly \
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
Message-ID: <1992Jun7.042358.29367@news.eng.convex.com>
Subject: MIME for global hypertext

The WAIS, gopher, and world-wide-web projects are all client/server
information retrieval systems. All three deliver plain text information
quite well, and they each have evolving mechanisms for delivering
other forms of information.

The MIME RFC defines a system for processing multi-part, multimedia
messages on the internet. I would like to see these systems, along
with USENET news and internet mail, interoperate with MIME as the
substrate.

The clients for these systems go something like this:

    0 user invokes client (and chooses a starting point)
    1 client displays user's request
    2 user reads page, chooses a reference to more info
    3 user informs client of choice
      (e.g. "show me item #1," or "search for googoo")
    4 go to step 1

These systems often consist of a hierarchy of menus with text files at
the leaf nodes. The system allows the user to interactively navigate the
menus and browse leaf nodes. But 1) the format of the menus is
particular to the system (USENET newsgroups/articles, unix
directories/files, WAIS source/database/document). And 2) once a user is
at a leaf node, the system can no longer interactively follow
references.

The novel aspect of hypertext is that the distinction between the menu
pages and the text pages disappears. In the world-wide-web, text
documents have machine-readable links inside them, and all menus are
represented as hypertext documents.

The WWW format works well, but it would benefit from use of MIME's
features. For a common hypertext document format, I propose we define a
subtype of the MIME multipart message: X-HYPERTEXT.
The first part of a multipart/X-HYPERTEXT message is the content of the
document, and the remaining parts are multimedia attachments and links
to other documents.

The content part contains references (by Content-ID) to the attachments
and links. The client software allows the user to interactively choose
references to display/follow.

The remaining parts may be attached image/audio/video using MIME's
various types and transfer encodings (text attachments would work too)
or they may be references to information accessible elsewhere using
MIME's message/external-body type. The parameters to the external-body
content-type provide the same information as WWW's Universal Document
Identifier. (MIME only defines ANON-FTP, FTP, TFTP, LOCAL-FILE and AFS.
The remaining access-types (WAIS, gopher, etc) would be experimental
(X-WAIS, X-GOPHER) until standardized.)

The emerging standard for structured, platform-independent text is SGML.
The WWW project defines an SGML document type with traditional elements
(title, heading, paragraph, list) and new hypertext elements (anchor).
Soon it will have multimedia elements (image, audio).

The current design places external document references (to files, WWW
servers, WAIS documents, gophers, etc.) inside the SGML as attributes.
There are lexical incompatibilities, and the design is under strain. I
suggest that we implement references as SGML entities that identify
message/external-body parts by content-id.

Representing document content in SGML allows the same information to be
accessed using different user interface paradigms (e.g. dumb terminals
vs. curses style vs. x windows point-and-click).

Short of full SGML parsing, we could adopt the MIME text/richtext
format, with the addition of a \...\ tag.
In fact, any representation that allows the user to interactively
indicate one of the attached body parts by content-id will do. For
example, plain text with one-line descriptions would do.
The Andrew ez data stream would also work, but only Andrew sites could
parse it.

This brings up the issue of format negotiation. No one format is optimal
for all information. Clients are likely to be able to process
information in several formats, and servers are likely to be able to
provide different representations.

The various formats can be enclosed in a MIME multipart/alternative
message. And rather than including the data for all formats in the
message, the data could be in message/external-body parts. The client
chooses the type of data it likes and retrieves the corresponding
external-body. This (modified) example from the MIME rfc may help
explain:

    MIME-Version: 1.0
    Content-Type: multipart/alternative; boundary=42

    --42
    Content-Type: message/external-body;
        name="BodyFormats.ps";
        site="thumper.bellcore.com";
        access-type=ANON-FTP;
        directory="pub";
        mode="image";

    Content-type: application/postscript

    --42
    Content-Type: message/external-body;
        name="/u/nsb/writing/rfcs/RFC-XXXX.ez";
        site="thumper.bellcore.com";
        access-type=AFS;

    Content-type: application/x-ez

    --42
    Content-Type: message/external-body;
        name="BodyFormats.txt";
        site="thumper.bellcore.com";
        access-type=ANON-FTP;
        directory="pub";

    Content-type: text/plain

    --42--

The client can choose between postscript, ez, and plain text, and
retrieve the corresponding message body.

The question then becomes: how do these systems interoperate? By making
information available as multipart/X-HYPERTEXT MIME messages.

The WWW client interfaces to the other systems by defining "addressing
schemes" and implementing the various protocols and translating the data
into HTML. Gopher has a similar typing scheme -- one character is
reserved to indicate the access type and the data type. WAIS clients
have yet another method of resolving types, though they only support one
protocol. The NewsGrazer application has its own encapsulation
mechanism. This is becoming a mess.
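
[Archive note] The format negotiation step described above (the client picks the type it likes from a multipart/alternative of message/external-body parts) can be sketched with Python's standard `email` package. This is a minimal sketch under stated assumptions, not software from the thread: `choose_alternative` and the shortened `SAMPLE` message are hypothetical, and selection by a client-side preference list is a simplification chosen for illustration.

```python
from email import message_from_string

# A shortened version of the example message, with two alternatives.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    "Content-Type: multipart/alternative; boundary=42",
    "",
    "--42",
    'Content-Type: message/external-body; name="BodyFormats.ps";',
    ' site="thumper.bellcore.com"; access-type=ANON-FTP; directory="pub"',
    "",
    "Content-type: application/postscript",
    "",
    "--42",
    'Content-Type: message/external-body; name="BodyFormats.txt";',
    ' site="thumper.bellcore.com"; access-type=ANON-FTP; directory="pub"',
    "",
    "Content-type: text/plain",
    "",
    "--42--",
    "",
])


def choose_alternative(message, preferred_types):
    """Return (content_type, access_params) for the first preferred type on offer.

    Each external-body part carries the access parameters on its own
    Content-Type header and the type of the referenced data on the header
    of the message nested inside it. Returns None when nothing matches.
    """
    offers = {}
    for part in message.get_payload():
        if part.get_content_type() != "message/external-body":
            continue
        inner = part.get_payload(0)  # nested message holding the real type
        offers[inner.get_content_type()] = {
            "access-type": part.get_param("access-type"),
            "site": part.get_param("site"),
            "directory": part.get_param("directory"),
            "name": part.get_param("name"),
        }
    for wanted in preferred_types:
        if wanted in offers:
            return wanted, offers[wanted]
    return None


if __name__ == "__main__":
    msg = message_from_string(SAMPLE)
    print(choose_alternative(msg, ["text/plain", "application/postscript"]))
```

The returned parameters are what a client would need in order to fetch the chosen body, for instance over anonymous FTP; the retrieval itself is left out of the sketch.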
In the short term, global hypertext viewers will have to support the
access-type and content-type of each system with which they interoperate
(so we have X-WAIS, X-HTTP, X-GOPHER, X-NNTP, as well as

Some of the access types will become standard, and some will die out.
But all the data types should be encapsulated in MIME messages.

Any data that has machine-readable pointers to other data should be made
into a multipart/X-HYPERTEXT message. For example, a WAIS question
should have attachments for each of the result documents (the content
part can stay application/x-wais-question, or it could be converted to a
text type, or both), at least in the case where those documents are
available by some standard access method. [I wrote a perl script that
will change an HTML document into a MIME message with attachments.]

Leaf documents, i.e. documents with no external links, can stay in
single part types. E.g. plain text files become MIME messages by simply
adding a blank line at the beginning (to separate the headers (none)
from the body).

Under this model, a mail message can point to a news article which
references a WAIS document which contains several drawings and pointers
to several more available by FTP, and a user could just point-and-click
between them.

The only need for protocols like gopher and HTTP is to encapsulate data
that's not already MIME compliant.

This is clearly a pipe dream, but it's the kind of thing we can work
towards today.

Dan

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 07 Jun 1992 14:32:41 UT
From: Nathaniel Borenstein \
Organization: Bellcore
Message-ID: <1992Jun7.143241.7491@walter.bellcore.com>
Subject: Re: MIME for global hypertext

I think that Dan's message makes a lot of sense, and I'd had similar
thoughts myself.
The one change I'd suggest is that instead of

    multipart/x-hypertext

Some group of interested parties should take the time to write a clear
RFC describing this content-type, register it with IANA, and use
something more like

    multipart/hypertext

In other words, I think this application is important enough to take
seriously enough to work towards standardizing it.

Certainly, though, we could start with x-hypertext and standardize it
after we had some experience with it -- that's a reasonable approach.
But ultimately, this could be a very important MIME type.

-- Nathaniel

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 07 Jun 1992 18:36:43 UT
From: Edward Vielmetti \
Organization: Msen, Inc. -- Ann Arbor, Michigan
Message-ID: <10tl09INN20u@nigel.msen.com>
References: <1992Jun7.042358.29367@news.eng.convex.com>
Subject: Re: MIME for global hypertext

connolly@convex.com (Dan Connolly) writes:
:
: The WAIS, gopher, and world-wide-web projects are all client/server
: information retrieval systems. All three deliver plain text information
: quite well, and they each have evolving mechanisms for delivering
: other forms of information.
:
: The MIME RFC defines a system for processing multi-part, multimedia
: messages on the internet. I would like to see these systems, along
: with USENET news and internet mail, interoperate with MIME as the substrate.

There are a couple of servers that already return MIME documents within
WAIS: the "mime-samples" source has the Bellcore set of sample
documents, and there's a server at Inria, France with album covers etc.
Search the directory of servers for MIME and you will get the current
list back.

It would make some sense to rebuild a few of the other sources -- I'm
thinking of the "uunet", "wuarchive", and "cica-win3" servers running at
CICnet -- as MIME formatted collections.
That would let people with proper viewers connect directly to the FTP
site that they index rather than cut and paste.

If anyone wants to give this a try drop me some mail and I'll see what I
can do to bash something together that would work OK.

Edward Vielmetti, vice president for research, Msen Inc. emv@Msen.com
Msen Inc., 628 Brooks, Ann Arbor MI 48103 +1 313 741 1120

Newsgroups: comp.text.sgml
Date: 08 Jun 1992 19:39:21 UT
From: Anne Brueggemann-Klein \
Organization: Institut fuer Informatik der Universitaet Freiburg, Deutschland
Message-ID: \
Keywords: attributes, CURRENT, defaults
Subject: Defaulting mechanism for CURRENT attributes

Let us assume we have a CURRENT attribute *att*
and an ID attribute *ident*
defined for element type *elem*.
The content model for *elem* is (elem | #PCDATA)*.

Consider now the partial document

    \
    \
    \ att=ZZZ ident=c>\
    \
    \
    \
    \.

Now, the attribute *att* of *elem* instance *d*
defaults to the most recently specified value.
Is this ZZZ (thus considering the left-to-right ordering
of value specifications)
or is this XXX (thus considering the top-down
hierarchical ordering)?

Thank you for your help,
Anne Brueggemann-Klein (brueggemann@informatik.uni-freiburg.de)

Newsgroups: comp.text.sgml
Date: 09 Jun 1992 00:36:55 UT
From: "Steven R. Newcomb" \
Message-ID: <9206090036.AA12682@tti>
Subject: HTML vs. HyTime

Dan Connolly writes:

> The WWW group is attempting to define a multimedia interchange
> format called HTML. It is intended to be an SGML language, but
> most existing HTML has never been through an SGML parser.

Why create another SGML-based multimedia interchange format when HyTime
was just approved on May 1 as an International Standard for that very
purpose?

Steven R. Newcomb, President
TechnoTeacher, Inc.         Voice: +1 904 422 3574
1810 High Road              Fax: +1 904 386 2562
Tallahassee, FL 32303-4408 USA
Internet: srn@techno.com

Newsgroups: comp.text.sgml
Date: 09 Jun 1992 12:46:14 UT
From: Brian Travis \
Organization: SGML, Inc.
Message-ID: <708093974snx@sgmlinc.com>
References: \
Subject: Defaulting mechanism for CURRENT attributes

In article \ brueggem@informatik.uni-freiburg.de writes:
>
>
> Let us assume we have a CURRENT attribute *att*
> and an ID attribute *ident*
> defined for element type *elem*.
> The content model for *elem* is (elem | #PCDATA)*.
>
> Consider now the partial document
>
> \
> \
> \ att=ZZZ ident=c>\
> \
> \
> \
> \.
>
> Now, the attribute *att* of *elem* instance *d*
> defaults to the most recently specified value.
> Is this ZZZ (thus considering the left-to-right ordering
> of value specifications)
> or is this XXX (thus considering the top-down
> hierarchical ordering)?

The standard says a CURRENT element is "The open element whose start-tag
most recently occurred (or was omitted through markup minimization)"
(Definition 4.68).

This would seem to indicate that the hierarchy does not matter; that the
value is inherited from the "most recently occurred" start-tag, not the
"most recently opened but not closed" start-tag, as might be assumed
from a hierarchical reading. In this case, it is "ZZZ".

I put this question to Exoterica's parser, and got this minimal result
(after removing the errant TAGC for ident=c):

    \ \ \ \ \ \ \ \

Brian.

Newsgroups: comp.text.sgml
Date: 09 Jun 1992 19:44:45 UT
From: Erik Naggum \
Reply-To: enag@ifi.uio.no
Message-ID: <23202A@erik.naggum.no>
References: \
Subject: Re: Defaulting mechanism for CURRENT attributes

Anne Brueggemann-Klein \ writes:
|
| Let us assume we have a CURRENT attribute *att* and an ID attribute
| *ident* defined for element type *elem*. The content model for
| *elem* is (elem | #PCDATA)*.

I assume you have something like this in mind:

    \ \

| \
| \
| \\
| \
| \
| \
| \.

| Now, the attribute *att* of *elem* instance *d* defaults to the most
| recently specified value. Is this ZZZ (thus considering the
| left-to-right ordering of value specifications) or is this XXX (thus
| considering the top-down hierarchical ordering)?
I honestly think you would find ISO 8879 and SGML easier to deal with if
you didn't attempt to make it fit a mathematical model which wasn't
derived from it. On the other hand, if you could formulate a
mathematical model consistent with the standard, it would probably be
much appreciated by the entire SGML community.

In contrast to the parts of the standard which explicitly discuss the
hierarchical aspects of the document instance, the standard text in
7.9.1.1 Omitted Attribute Specification, item b) (Goldfarb [329:9])
talks about how the default value for a current attribute is assigned,
and to which elements the default applies. It's evident from this that
there is no notion of hierarchy involved. That is, "most recently
specified" as used about current attributes is to be understood in the
temporal sense, where time is understood as parse time.

This can at least partially be understood by considering the fact that
an attribute specification for a current attribute affects _all_ the
elements for which the attribute definition applies. E.g. Given

    \ \

then

    \ \

is identical to

    \ \

A hierarchical rule would make it very complex for the human user to
figure out what exactly the value of an attribute would be. Not that it
isn't already non-trivial, but the effect of hierarchical inheritance on
attributes that were declared in a group is a little too much just to
contemplate.

It's also worth noting that the mechanisms that deal with aspects of the
document instance hierarchy also deal with implicit scoping, in that
when the element is no longer available, neither is the information
pertaining to it. (E.g., short reference sets, as well as the trivial
case of sub-element content.) The default value survives the element
whose start-tag defined it, and a different mechanism than that of the
other hierarchical ones would have to be defined in either case.

Hope this helps.
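
[Archive note] The parse-time reading of CURRENT defaults argued for above can be illustrated with a small Python sketch. It is a hypothetical model, not part of any SGML parser: `resolve_current` and the sample tag list are assumptions made for illustration, and because the markup in the archived example did not survive, the list below only assumes that *a* specifies XXX, *c* specifies ZZZ, and *b* and *d* omit the attribute. End-tags do not appear in the input at all, which is the point: only the order of start-tags matters.

```python
def resolve_current(start_tags):
    """Resolve a CURRENT attribute over start-tags given in parse order.

    `start_tags` is a list of (ident, att) pairs, with att set to None when
    the attribute specification is omitted. A single running default is
    kept for the whole attribute definition list: every specification
    replaces it, and every omission picks it up, regardless of nesting.
    """
    current = None
    resolved = []
    for ident, att in start_tags:
        if att is not None:
            current = att  # a specification becomes the new default
        elif current is None:
            # Nothing specified yet: the first occurrence must carry a value.
            raise ValueError(f"{ident}: CURRENT attribute not yet specified")
        resolved.append((ident, current))
    return resolved


if __name__ == "__main__":
    # Assumed shape of the example: a=XXX, b omitted, c=ZZZ, d omitted.
    tags = [("a", "XXX"), ("b", None), ("c", "ZZZ"), ("d", None)]
    for ident, value in resolve_current(tags):
        print(ident, value)
```

Under these assumptions *d* resolves to ZZZ, the value most recently specified in parse order, which matches the answers given in the thread.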
Best regards,
\
--
Erik Naggum       | +47-295-0313 | ISO 8879 SGML    | Memento,
Naggum Software   | "fuzzface"   | ISO 10744 HyTime | terrigena.
Boks 1570, Vika   | \            | JTC 1/SC 18/WG 8 | Memento,
0118 OSLO, NORWAY | \            | SGML UG SIGhyper | vita brevis.

Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer
Date: 10 Jun 1992 08:17:43 UT
From: "Jungju Kim (CSDept)" \
Organization: KAIST in Seoul Korea
Message-ID: <1992Jun10.081743.19470@kum.kaist.ac.kr>
References: <1992Jun7.042358.29367@news.eng.convex.com>
Subject: Re: MIME for global hypertext

In article <1992Jun7.042358.29367@news.eng.convex.com> connolly@convex.com (Dan Connolly) writes:
>
>The MIME RFC defines a system for processing multi-part, multimedia
>messages on the internet. I would like to see these systems, along
>with USENET news and internet mail, interoperate with MIME as the substrate.
>
>Short of full SGML parsing, we could adopt the MIME text/richtext
>format, with the addition of a \...\ tag.
>In fact, any representation that allows the user to interactively indicate
>one of the attached body parts by content-id will do. For example,
>plain text with one-line descriptions would do. The Andrew ez
>data stream would also work, but only Andrew sites could parse it.
>

IMHO, I don't think it's that easy. There can be various kinds of links
in a document. One may possibly want to see the next part or the
previous part, or one can have a question on the specific region of the
image he/she is watching. Richtext is not designed to deal with the
problems stated above, and I don't think it will evolve in that
direction. And if it is mixed up with WAIS, how do you think the other
fields of the question should be filled? (related documents,
servers,...)

Anyway, could somebody out there kindly send me the list of documents
describing WWW & gopher ?
Jungju __ _ _ _ | | / / |_| | Jungju Kim - jjkim@cosmos.kaist.ac.kr | | / / _ _________ | | |/ / | | | _ _ | | Tel : +82-2-962-8861 | |\\ \\ | | | | | | | | | Fax : +82-2-969-0239 | | \\ \\ | | | | | | | | | GUI Consortium Project / System Architecture Lab. |_| \\_\\ |_| |_| |_| |_| | Computer Science Department, KAIST,KOREA Newsgroups: comp.text.sgml Date: 10 Jun 1992 13:45:46 UT From: hehanninen@tnclus.tele.nokia.fi Organization: Nokia Telecommunications. Message-ID: <1992Jun10.154546.1@tnclus.tele.nokia.fi> Subject: ansi vs. iso while naming tags ? Hi ! We are defining tag-names for our technical manuals. Could somebody comment following issues ?: -Should we rest on f.exam. ANSI/NISO z39.59-1988 standard while specifying common tag names or ISO/IEC TR 9573 ? Those offer a bit different decision for tag-name strategy, f.exam. : ANSI/NISO ISO/IEC \ \ \ \ \

Heading level 1 and title \

Heading level 1 \ Heading level 1 title \

Heading level 2 \

Heading level 2 and title \ Heading level 2 title \ bold text \ bold text -How to take care of multilingual phrases etc. With Marked section or via atributes or with specified tags. Sgml-formed text will be imported to Interleaf (or FrameMaker). Need some piece of practical advice ! Many thanks ! Heimo H{nninen internet: HEHANNINEN@tnclus.tele.nokia.fi Newsgroups: comp.text.sgml Date: 10 Jun 1992 15:03:57 UT From: ""Wayne Wohler"" \ Message-ID: <9206101513.AA14868@ucbvax.Berkeley.EDU> Subject: Scope of the CURRENT attribute Regarding the scope of the specification of a "current" attribute, check 7.9.1.1 (329:3). In part "a", it states that "There need be an attribute specification only for a required attribute, and for a current attribute on the first occurrence of ANY ELEMENT IN WHOSE ATTRIBUTE DEFINITION LIST IT APPEARS." (emphasis is mine.) In addition, in part "b", it states that "the new default affects ALL ELEMENTS associated with the attribute definition list in which the attribute was defined." (again, emphasis is mine.) Based on these quotes from the standard its pretty clear that Erik has correctly coded his example. Wayne L. Wohler IBM Corp Publishing Systems Boulder, Colorado Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer Date: 10 Jun 1992 16:08:36 UT From: "dennis.r.vogel" \ Organization: AT\&T Message-ID: <1992Jun10.160836.12078@cbnewsj.cb.att.com> References: <1992Jun10.081743.19470@kum.kaist.ac.kr> Subject: Re: MIME for global hypertext From article <1992Jun10.081743.19470@kum.kaist.ac.kr>, by jjkim@kum.kaist.ac.kr (Jungju Kim (CSDept)): > > Anyway, could somebody out there kindly send me the list of documents > describing WWW & gopher ? > Please post the list to this newsgroup. There are other folks who are interested in this topic. Dennis R. 
Vogel AT\&T Bell Laboratories Middletown, NJ Newsgroups: comp.text.sgml Date: 10 Jun 1992 16:31:22 UT From: Brad Might \ Organization: HaL Computer Systems Message-ID: \ References: \ <23202A@erik.naggum.no> Subject: Re: Defaulting mechanism for CURRENT attributes In article <23202A@erik.naggum.no> erik@naggum.no (Erik Naggum) writes: Erik, I think we had discussions on this about a year ago, but I cannot find the mail or postings. Perhaps you have it. > From: erik@naggum.no (Erik Naggum) > Date: 9 Jun 92 19:44:45 GMT > > E.g. > > Given > > \ > \ zot NAME #CURRENT > > > > then > > \ > \ > > is identical to > > \ > \ > Not true: \ \ is an error (if this is the first occurrence of \ in the document. The following is the only reference I can find to back it up, but I think there is another (better) statement that I cannot locate at the moment: B.5.2.4 Changing Default Values (pg. 38 SGMLh) If the default value is specified as "CURRENT", the default will automatically become the most recently specified value. This allows an attribute value to be "inherited" by default from the previous element of the same type. ^----------------------^ Overview 4.4.3.1 8 NOTE -- The start-tag cannot be omitted for the first occurrence of an element with a current attribute. This does not say anything about the current attribute value being supplied by another element. -- - standard disclaimers apply - jbm@hal.com (Brad Might) HaL Computer Systems - (512)794-2855 \more fun than a barrel of macros Newsgroups: comp.text.sgml Date: 10 Jun 1992 23:31:58 UT Corp. The opinions expressed are those of the user and not necessarily those of CONVEX. From: Dan Connolly \ Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA Message-ID: <1992Jun10.233158.29502@news.eng.convex.com> References: <9206090036.AA12682@tti> Subject: Re: HTML vs. HyTime In article <9206090036.AA12682@tti> srn@elvin.UUCP (Steven R. 
Newcomb) writes: >Dan Connolly writes: > >> The WWW group is attempting to define a multimedia interchage >> format called HTML. It is intended to be an SGML language, but >> most existing HTML has never been through an SGML parser. > >Why create another SGML-based multimedia interchange format when HyTime >was just approved on May 1 as an International Standard for that very >purpose? > You got a public implementation I can use? At least for SGML I can grab the smgls package and go. The HyTime standard is all well and good, but 1) I can't even read it without buying hardcopy, and 2) even if I had a hardcopy, it's so involved that it would take me years to implement it. I'm not saying we should ignore it -- I'm just not doing anything that extensive. If the HyTime spec intersects or includes the functionality I'm after, I'd like to know what it's strategies are and how hard they are to implement. But I've seen parts of the standard and it looks even huger than SGML! Dan Newsgroups: alt.gopher,comp.infosystems.wais,comp.text.sgml,comp.mail.multi-media,comp.sys.next.programmer Date: 11 Jun 1992 01:47:11 UT Corp. The opinions expressed are those of the user and not necessarily those of CONVEX. From: Dan Connolly \ Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA Message-ID: <1992Jun11.014711.10853@news.eng.convex.com> References: <1992Jun7.042358.29367@news.eng.convex.com> <1992Jun10.081743.19470@kum.kaist.ac.kr> Subject: Re: MIME for global hypertext --cut-here In article <1992Jun10.081743.19470@kum.kaist.ac.kr> jjkim@kum.kaist.ac.kr (Jungju Kim (CSDept)) writes: >In article <1992Jun7.042358.29367@news.eng.convex.com> connolly@convex.com (Dan Connolly) writes: >> >>The MIME RFC defines a system for processing multi-part, multimedia >>messages on the internet. I would like to see these systems, along >>with USENET news and internet mail, interoperate with MIME as the substrate. 
>>
>>Short of full SGML parsing, we could adopt the MIME text/richtext
>>format, with the addition of a \...\ tag.
>>In fact, any representation that allows the user to interactively indicate
>>one of the attached body parts by content-id will do. For example,
>>plain text with one-line descriptions would do. The Andrew ez
>>data stream would also work, but only Andrew sites could parse it.
>>
>IMHO, I don't think it's that easy. There can be various kinds of
>links in a document. One may possibly want to see the next part
>or the previous part, or one can a have a question on the specific
>region of the image he/she is watching.

Sure, a client can implement all sorts of fancy semantics on top of a MIME-encapsulated document, especially if the structure is described by an SGML document. I don't see the conflict.

> Richtext is not designed to
>deal with the problems stated above, and I don't think it will evolve
>in that direction.

Ok, don't use RichText. You're right: it's more for formatting anyway. But interoperability is always a good thing. And if RichText doesn't cost much to implement, I might support it.

> And if it is mixed up with WAIS, how do you think
>the other fields of the question be filled ? (related documents,
>servers,...)
>

Like this: SGML entities reference MIME body parts by ID. The body of the body part is the content of the entity. We use the expressive power of MIME, especially the message/external-body semantics and the ghost-body area, to get from one place to another in the global hypernet.

The rest of this message, i.e. the second part of it, is a hypothetical global hypertext. Most of the semantics it demonstrates exist in available systems. For example, metamail can be configured to call gopher (or even a shell script that calls telnet) to get the IMAP RFC.

--cut-here
Content-Type: multipart/X-HYPERTEXT; separator=attachment

--attachment
Content-Type: text/SGML

\ \ \ \ \ ]> \example global hypertext\ \
    \
  • Here are some comments on the MIME draft: Your client might include it inline, or allow you to click an icon and show it in another window. \ \ \
  • Here's the IMAP RFC. Your client will probably display it in place of this text if you select this element: \ \
  • here's an image. Your client might display it in a separate window. If you have xfig, you can ftp the source and edit the graphic: \
    nifty graphic\
    \
  • Here's a database of MIME samples. Your client should show a description of the database and allow you to invoke a WAIS client and make queries: \ \
--attachment Content-id: att1 Content-Description: "Keith Moor Re: comments on mime draft, esp. richtext" Content-Type: message/external-body; access-type=x-wais Content-Type: message ;; This is the ghost-body. The real body is external, i.e. ;; it's the content of the following WAIS document: (:document-id :score 1000 :document (:document :headline "Keith Moor Re: comments on mime draft, esp. richtext" :doc-id (:doc-id :original-database "/usr/spool/uucppublic/pub/wais-indices/mime-samples" :original-local-id "0 16412 /var/spool/uucppublic/pub/mime/samples/+kdPOk1u0M2Yt8hPuAh" :copyright-disposition 0 ) :source (:source-id (:source :version 3 :ip-name "wais.msen.com" :tcp-port 210 :database-name "mime-samples" ) :number-of-lines 638 :number-of-bytes 16412 :type "MIME" ;; obsolete, if I get my way :content-type "message" ) ) --attachment Content-id: att2 Content-type: message/external-body; access-type=gopher; host=bauhaus.micro.umn.edu; port=70; Content-type: text 0/RFC Reference/RFCs/rfc1064.txt --attachment Content-id: att3 Content-Type: multipart/alternative; boundary=illegal-base-64-string --illegal-base-64-string Content-Type: image/gif Content-transfer-encoding: base64 l3kj45l3k4j5l3k4j5l34kj534l5j34l5kj34l5kj34l5kj34l5kj ... base 64 data inline ... lkj2345lk34j5l34kj5l34kj534lk5j= --illegal-base-64-string Content-description: editable source to the figure Content-Type: message/external-body; access-type=anon-ftp; site="prep.ai.mit.edu"; name=/pub/stuff/thingy.xfig Content-type: application/x-fig --illegal-base-64-string-- --attachment Content-Type: multipart/parallel; boundary=alt --alt Content-type: text Server created with WAIS release 8 b3.1 on Feb 20 04:44:57 1992 by emv@midori.msen.com multimedia documents in internet MIME multimedia mail format. originally from thumper.bellcore.com:/pub/nsb/. type for each of these things is MIME. with a proper viewer this should yield voice, pictures, text, nice pretty formatted text, and smell-o-vision. 
--alt Content-Type: application/x-wais-source (:source :version 3 :ip-name "wais.msen.com" :tcp-port 210 :database-name "mime-samples" :cost 0 :cost-unit :free :maintainer "emv@cic.net" :description "Server created with WAIS release 8 b3.1 on Feb 20 04:44:57 1992 by emv@midori.msen.com multimedia documents in internet MIME multimedia mail format. originally from thumper.bellcore.com:/pub/nsb/. type for each of these things is MIME. with a proper viewer this should yield voice, pictures, text, nice pretty formatted text, and smell-o-vision. ") --alt-- --attachment-- --cut-here--
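
The core idea of the message above, that an entity named in the text part is resolved to the attached body part carrying the same Content-ID, can be sketched with Python's standard email package. This is a rough illustration under several assumptions: it uses a plain multipart/mixed container with a standard boundary parameter instead of the experimental multipart/X-HYPERTEXT type and separator parameter proposed in the post, the sample bodies are placeholders, and the function names are invented here. Fetching external bodies (WAIS, gopher, ftp) is not attempted.

```python
from email import message_from_string

# A cut-down stand-in for the hypothetical hypertext message: one text part
# followed by two attachments identified by Content-ID (att1, att2).
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="attachment"

--attachment
Content-Type: text/plain

This text refers to entities att1 and att2.
--attachment
Content-ID: att1
Content-Type: text/plain

Comments on the MIME draft.
--attachment
Content-ID: att2
Content-Type: text/plain

Text of an RFC.
--attachment--
"""


def index_by_content_id(message):
    """Map each Content-ID found in the message to its body part."""
    parts = {}
    for part in message.walk():
        content_id = part.get("Content-ID")
        if content_id is not None:
            # Accept both the bare form used in the post and the <...> form.
            parts[content_id.strip("<> ")] = part
    return parts


def resolve_entity(parts, entity_name):
    """Return the body of the part whose Content-ID matches the entity name."""
    return parts[entity_name].get_payload()


if __name__ == "__main__":
    message = message_from_string(RAW)
    parts = index_by_content_id(message)
    for name in sorted(parts):
        print(name, "->", resolve_entity(parts, name).strip())
```

A real client would look at each part's Content-Type before using the payload; for a message/external-body part the payload is only the "ghost body", and the access parameters tell the client where the real content lives.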
Newsgroups: comp.text.sgml Date: 11 Jun 1992 09:19:40 UT From: Brad Might \ Organization: HaL Computer Systems Message-ID: \ References: <9206101513.AA14868@ucbvax.Berkeley.EDU> Subject: Re: Scope of the CURRENT attribute The (below) quoted paragraph however begins with If "SHORTTAG YES" or "OMMITTAG YES" is specified on the SGML declaration: So what happens if they are not specified ? Are all standards this difficult and obtuse ? Even with all the cross referencing in the Handbook, it still takes a long time to find all relevant information about almost anything if you can even find it (Current attributes are not cross referenced to this section which is TOTALLY RELEVANT to the question). In article <9206101513.AA14868@ucbvax.Berkeley.EDU> WOHLER@BLDVM1.VNET.IBM.COM ("Wayne Wohler") writes: > Regarding the scope of the specification of a "current" attribute, check > 7.9.1.1 (329:3). In part "a", it states that "There need be an > attribute specification only for a required attribute, and for a current > attribute on the first occurrence of ANY ELEMENT IN WHOSE ATTRIBUTE > DEFINITION LIST IT APPEARS." (emphasis is mine.) In addition, in part > "b", it states that "the new default affects ALL ELEMENTS associated > with the attribute definition list in which the attribute was defined." > (again, emphasis is mine.) Based on these quotes from the standard its > pretty clear that Erik has correctly coded his example. > > Wayne L. Wohler > IBM Corp > Publishing Systems > Boulder, Colorado > -- - standard disclaimers apply - jbm@hal.com (Brad Might) HaL Computer Systems - (512)794-2855 \more fun than a barrel of macros Newsgroups: comp.text.sgml Date: 11 Jun 1992 11:26:10 UT From: Steve Pepper \ Organization: Falch Hurtigtrykk as, Oslo, Norway Message-ID: <1992Jun11.112610.3834@falch.no> References: <1992Jun10.154546.1@tnclus.tele.nokia.fi> Subject: Re: ansi vs. iso while naming tags ? hehanninen@tnclus.tele.nokia.fi writes: > -Should we rest on f.exam. 
ANSI/NISO z39.59-1988 standard while > specifying common tag names or ISO/IEC TR 9573 ? Terve Heimo! Some colleagues and I have recently prepared a Norwegian version of the 'General' DTD found in TR 9573. (We call it General DTD//NO or 'GenDok'.) The main purpose of the exercise for us was to try to establish a standard Norwegian nomenclature for the most common tag names. In the process we came up against the same questions you are asking. In addition to 'General', we looked closely at the usage in ANSI/NISO Z39.59-1988 (aka the AAP DTDs) and a beta (0.8) version of the 'Majour' DTD (Modular Application for Journals) being developed by EWS (European Workgroup on SGML). Here are some of the conclusions that I personally drew from this work: 1. Tag name length ------------------ The AAP DTDs were originally developed at a time when there were no SGML-aware editors around and almost all tagging had to be done manually. This (I believe) is the main reason for the tendency towards short and cryptic tags in the ANSI DTD. Today the situation is different: there are many editors that let you (or even force you to) do your tagging via menus, and that allow you to hide your tags if you wish. There is therefore no longer any reason to save single keystrokes (bdy/body, fm/frontm) at the expense of making the tag names less intuitive - especially with tags that only occur a handful of times in a particular document. The corresponding tags in GenDok are \ and \. Majour uses \
("front matter" is inappropriate in journal articles) and \. (TEI also uses \.) 2. Tags for font styles ----------------------- One of our main tasks in explaining SGML to authors is getting across to them the importance of separating content (structure) from appearance (processing). (Except where the appearance _is_ a part of the content, as in many TEI applications.) Allowing tags like \ for "bold" flies in the face of this and should be avoided. (Can you be sure that you will _always_ want your \ text formatted using bold face?) As an article by J. Sperling Martin in the March 1992 issue of EPSIG News tells us, the AAP team was a little at odds with itself here, allowing \, \ etc. (for 'emphasised text') in addition to \, \ etc. In GenDok we use \, \ etc. (for 'uthevet tekst'). Majour has opted for \..\. 3. Headed sections -- \

etc. ------------------------------- These were actually the GIs that gave us most trouble and the conclusion that I personally have come to is that the author of General cheated, the AAP team took the easy way out, and EWS are attempting to sort out the muddle! The confusion seems to stem from the fact that authors (especially those used to older generic markup schemes) think in terms of headings, subheadings and text rather than sections and sub- sections comprised of headings, paragraphs etc. When they put a tag in front of a heading - say \

- they think of it as a code for the heading itself. But in most SGML applications that is not the case. In General \

is a _headed section_ (level 1); the actual heading (or headed section _title_) is \. What I feel the author of this DTD did was to choose a GI that means one thing for the DTD ('headed section'), and something else for the author ('heading'). In so doing, he/she established a source of confusion that has some far-reaching consequences. The fact that you call \

'Heading level 1' in your message testifies to this, but you are not alone. Just take a look at the ways \ (n=0-4) is translated in 9573: in three of the five languages, \ is translated to something meaning 'title' or 'heading' level 'n' (German 'Ueberschrift', Swedish 'rubrik', Danish 'overskrift'). Only the French and Dutch translations accurately reflect the fact that we are talking about structured parts or sections (French 'element' (2x \é), Dutch 'onderdeel'). Interestingly, section 5.7 of 9573 does not include translations for \, \ etc. Why not? What would, say, the Danes have called these? Perhaps 'overskrift niveau 1 titel' (heading level 1 title)? What about the ANSI/NISO solution? The AAP team seems to have been hedging its bets (sometimes you think they have everything except the kitchen sink in their DTD). Here we find \

, \

etc. - but not as headed sections! The AAP calls them 'Head, level 1', etc., and they have no direct hierarchical function. They are treated as 'subsection elements' on a level with paragraphs (so there is nothing to stop you having as many assorted \ (n=1-4) elements as you want, in whatever order, more-or-less anywhere you can put a simple paragraph). In addition to \

, \

etc., ANSI/NISO has real sections and subsections, with corresponding (optional) section titles. They are called \, \ (n=1-3) and \ respectively. I have my doubts about the wisdom of mixing section titles and 'floating' heads in the same DTD, but at least the GI names and descriptions are not deliberately misleading, as they are in General. The EWS seems to be aware of all this confusion. At any rate they manage avoid it - by steering well clear of \

, etc. The proposed Majour 'body' consists of sections called \, \, \ and \. Each section has an optional number, \, and title, \. The price they pay is forcing the use of two tags, where ANSI and ISO, using tag omission, usually get by with one, that is: \\Heading/title instead of \

Heading/title My personal opinion is that EWS has the best solution (we have proposed \ etc. for the Norwegian GenDok), and I would strongly advise you to follow their example. By the way, the exercise of translating TR 9573's General DTD was very useful. Perhaps you should get together with the people from WSOY Information Systems (Jouko Riikonen?) and do a Finnish version? (If you'd like a copy of our GenDok, send me an email.) Cheers, Steve -- pepper@falch.no ------------------------------------------------------------------ falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway tel +47 (2) 163040 fax +47 (2) 162350 bbs +47 (2) 162650 Newsgroups: comp.text.sgml Date: 11 Jun 1992 14:34:05 UT From: "David J. Fiander" \ Organization: Mortice Kern Systems Inc., Waterloo, Ontario, CANADA Message-ID: <1992Jun11.143405.27527@mks.com> Subject: Changing parts of a concrete syntax I'm using sgmls 0.7, and have found the 8-character limit on names rather restrictive. The sgmls man page says that "[t]he upper limit on NAMELEN is 239." So how do I change it? Do I have to provide an entire SGML declaration just to change one quantity? -- David J. Fiander |The manager will be continually amazed that \ |policies he took for common knowledge are totally Mortice Kern Systems Inc. |unknown to some member of his team. Waterloo, Ontario, Canada | - Fredrick P. Brooks, Jr. Newsgroups: comp.text.sgml Date: 11 Jun 1992 16:11:11 UT From: jaap \ Reply-To: jaap@alice.UUCP () Organization: AT\&T, Bell Labs Message-ID: <23011@alice.att.com> References: <1992Jun10.154546.1@tnclus.tele.nokia.fi> <1992Jun11.112610.3834@falch.no> Subject: Re: ansi vs. iso while naming tags ? In article <1992Jun11.112610.3834@falch.no> pepper@falch.no (Steve Pepper) writes: > > Terve Heimo! > > Some colleagues and I have recently prepared a Norwegian version > of the 'General' DTD found in TR 9573. (We call it General DTD//NO > or 'GenDok'.) 
The main purpose of the exercise for us was to try > to establish a standard Norwegian nomenclature for the most common > tag names... And later on in section ``1. Tag name length'': > .... Today the > situation is different: there are many editors that let you (or > even force you to) do your tagging via menus, and that allow you > to hide your tags if you wish. There is therefore no longer any > reason to save single keystrokes (bdy/body, fm/frontm) at the > expense of making the tag names less intuitive - especially with > tags that only occur a handful of times in a particular document.

This really confuses me. If in today's systems the tag names can be hidden from the user, why would anybody make the effort to translate the names themselves? Why not translate only the way the tags are presented to the user? Isn't translating the tags as well just making things needlessly complicated?

It seems to me that when I have the General DTD in my system and you send me a document using the Norwegian version of the DTD, I need that DTD as well, although its functionality might be the same as that of the one I have. And if this trend continues, can I soon expect French, Spanish, Finnish, Dutch, South-African and other language DTDs with no functional differences? I'm afraid I don't get it.

It would be similar to changing ``switch(){}'' into ``schakelaar(){}'' for the C language. I know that in some French Algol 60 compilers one can use French for begin ... end as well, but it always seemed to me hardly worth the trouble, and if you gave your program to someone with a non-French compiler, you had to translate these things anyway.

Or is there something that I'm missing? Please enlighten me.
jaap Newsgroups: comp.text.sgml Date: 11 Jun 1992 16:54:02 UT From: wathu@violet.ccit.arizona.edu Reply-To: wathu@arizona.edu Message-ID: <1992Jun11.095402.1@violet.ccit.arizona.edu> Keywords: SGML, Converterss Subject: SGML Converters SGML Converters =============== We are in the process of setting up a WorldWideWeb (WWW) server for computer center documentation. Our current documents are in many different word processor formats, such as Ventura, WordPerfect (DOS), MS-Word (Mac) TeX, LaTeX, PostScript, RTF and etc. We would like to convert them to SGML so that we can link them to WWW. We would like recommendations for good products (commercial or free). If any of you have tried this type of conversions, please comment on your experiences. Thank you Wije Wathugala wathu@arizona.edu Newsgroups: comp.text.sgml Date: 11 Jun 1992 20:40:50 UT From: C. M. Sperberg-McQueen \ Organization: University of Illinois at Chicago Message-ID: <92163.154050U35395@uicvm.uic.edu> References: \ <23202A@erik.naggum.no> Subject: Re: Defaulting mechanism for CURRENT attributes > | Now, the attribute *att* of *elem* instance *d* defaults to the most > | recently specified value. Is this ZZZ (thus considering the > | left-to-right ordering of value specifications) or is this XXX (thus > | considering the top-down hierarchical ordering? > > I honestly think you would find ISO 8879 and SGML easier to deal with if > you didn't attempt to make it fit a mathematical model which wasn't > derived from it. ... For the record, I would like to say Erik is speaking for himself here, but not for me; I find Dr. Brueggemann-Klein's work on SGML, and her question here, extremely useful. I believe the intention of 8879 to be as described by earlier replies to the query (current value is that most recently specified in a depth-first left-to-right scan of the entire tree, not in a direct descent from the root). But the question is not nearly so uninteresting or obvious as Erik suggests. 
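
To make the two readings concrete, here is a small Python sketch that computes the default for an omitted attribute both ways. The tree is only a guess at the shape of the original question (a parent specifying XXX, an earlier sibling specifying ZZZ, and an element d that omits the attribute); the class and function names are invented for illustration and do not come from the standard or from any parser.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Elem:
    name: str
    att: Optional[str] = None                     # explicitly specified value, if any
    children: list = field(default_factory=list)  # child elements in document order


def document_order_default(root, target):
    """Most recently specified value in a depth-first, left-to-right scan."""
    current = None
    stack = [root]
    while stack:
        node = stack.pop()
        if node.att is not None:
            current = node.att
        if node is target:
            return current
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    raise ValueError("target is not in the tree")


def ancestor_default(root, target):
    """Value specified on the target itself or, failing that, its nearest ancestor."""

    def search(node, inherited):
        value = node.att if node.att is not None else inherited
        if node is target:
            return True, value
        for child in node.children:
            found, result = search(child, value)
            if found:
                return True, result
        return False, None

    found, result = search(root, None)
    if not found:
        raise ValueError("target is not in the tree")
    return result


if __name__ == "__main__":
    d = Elem("d")                    # omits the attribute
    b = Elem("b", att="ZZZ")         # earlier sibling of d
    a = Elem("a", att="XXX", children=[b, d])
    print("document order:", document_order_default(a, d))  # ZZZ
    print("ancestor based:", ancestor_default(a, d))        # XXX
```

The first function corresponds to the interpretation given in the earlier replies; the second is the hierarchical alternative the question raised.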
The standard says the value is the "most recently specified" value (clause 4.67, definition of "current attribute", which 11.3.4 says ATT in Dr. Brueggemann-Klein's example is). But what happened most recently depends in reality rather dramatically on what order you have been doing things in. The interpretations offered thus far assume you have been processing the text in a left-to-right, depth-first scan of the document tree. But the original question effectively asks whether that is guaranteed; what would happen if we processed the document in a breadth-first traversal? If there is any explicit specification in 8879 that an SGML parser *must* process a document through a left-to-right depth-first traversal of the document tree, I would very much like to know where it is. I don't think it exists: the standard doesn't even specify explicitly that the input has to be electronic characters (though it is hard to understand how to declare or recognize delimiters if it's anything else), and does explicitly say "This International Standard does not constrain the physical organization of the document ..." (clause 6.1 note 1). If one wrote an SGML parser that did not process the text left to right--and let us recall that there *are* parsing algorithms for other orders, including Unger's and the Cocke/Younger/Kasami (CYK) method--would one have the right to specify a different value for a CURRENT attribute from the value given by a left-to-right depth-first scan? In practice, I don't see how one can handle SGML in any way other than left-to-right scan of the data stream. But then, there's a lot I don't know. And the standard does say (Annex F) that 8879 does not require any particular implementation techniques or architecture. So perhaps we should have a little more patience with questions that seem to have obvious answers. -C. M. 
Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago Newsgroups: comp.text.sgml Date: 11 Jun 1992 21:17:54 UT From: C. M. Sperberg-McQueen \ Organization: University of Illinois at Chicago Message-ID: <92163.161754U35395@uicvm.uic.edu> Subject: Entity references in attribute values: a new conundrum Two related questions for the group, only slightly loaded. 1 is a conforming parser supposed to recognize entity references inside attribute values? (please cite chapter and verse) 2 do existing implementation in fact recognize entity references inside attribute values? To anyone who thinks they know the answer to the first question off hand: good, so did I, but then I found I was unable to back it up from the standard, and in fact at the passage I found the standard seems to be saying, quite clearly, the opposite of what I thought was the case, and think *should* be the case, and believe was *intended* by the drafters of 8879 to be the case. (If I am right, and the text says the opposite of what was intended, then we really ought to make sure it gets fixed in the revision!) In order to allow us all to go to the standard without preconceptions, I won't say what I thought was the case, at least not now. I would be very happy to see postings on this question from some of the implementers who read this list. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago Newsgroups: comp.text.sgml Date: 12 Jun 1992 04:03:29 UT From: David Durand \ Organization: Roberstool Research Labs Message-ID: \ References: <9206090036.AA12682@tti> <1992Jun10.233158.29502@news.eng.convex.com> Subject: Re: HTML vs. HyTime In article <1992Jun10.233158.29502@news.eng.convex.com> connolly@convex.com (Dan Connolly) writes: In article <9206090036.AA12682@tti> srn@elvin.UUCP (Steven R. 
Newcomb) writes: >Why create another SGML-based multimedia interchange format when HyTime >was just approved on May 1 as an International Standard for that very >purpose? > [stuff deleted] The HyTime standard is all well and good, but 1) I can't even read it without buying hardcopy, and 2) even if I had a hardcopy, it's so involved that it would take me years to implement it. I'm not saying we should ignore it -- I'm just not doing anything that extensive. If the HyTime spec intersects or includes the functionality I'm after, I'd like to know what it's strategies are and how hard they are to implement. But I've seen parts of the standard and it looks even huger than SGML! Dan

HyTime is a framework that can be added to _any_ DTD: you could write an application that uses a private markup language and, by attaching appropriate #FIXED attributes to your DTD, have its documents accepted by a HyTime engine. Later on, when you get more ambitious, you might want to implement a HyTime engine of your own (sgmls could be a base for such...).

HyTime is also a modular standard, which covers a huge scope -- _if_ all modules are included. For most hypertext applications (those that do not involve sophisticated multi-media links) only a small portion of the standard is needed. I think (based on limited scanning of the approved version of the standard) that the "base", "location address", and "hyperlinks" modules would be needed.

Whatever you might propose should at least fit the mold of HyTime -- it's not that hard to ensure at least _lack of incompatibility_. Seriously, write any spec for your protocol that you want, but if you're going to try to set standards, you need to take pre-existing standards into careful account.

HyTime _is_ a bear to get the hang of, but I think that this forum should be able to help you with specific problems as they arise.
-- David Newsgroups: comp.text.sgml Date: 12 Jun 1992 07:11:08 UT From: Steve Pepper \ Organization: Falch Hurtigtrykk as, Oslo, Norway Message-ID: <1992Jun12.071108.7248@falch.no> References: <1992Jun10.154546.1@tnclus.tele.nokia.fi> <1992Jun11.112610.3834@falch.no> <23011@alice.att.com> Subject: Re: ansi vs. iso while naming tags ? jaap@alice.att.com (jaap) writes: > It seems to me that when I have the General DTD in my system and you > send me a document using the Norwegian version of the dtd, I need that > dtd as well although the functionality might be the same as the one I > have. And if this trent continues, I can soon expect French, Spanish, > Finnish, Dutch, South-African and other language dtd's with no > functional differences? I'm afraid I don't get it. Dear jaap, Have no fear! I wouldn't dream of sending you a document that used GenDok (the Norwegian version of General in TR 9573). If I had an instance of a GenDok document that you needed, and I knew you had the General DTD, I would convert it (perhaps using an explicit link) before transmitting to you. Because GenDok is a straight translation of General (warts and all :-), that would simply involve a one-to-one mapping of GIs. Having said that, I doubt whether I'll ever have a GenDok document to send you anyway. Our aim in doing the translation was not to end up with a DTD that people will actually use _as is_ (except perhaps for training and experimentation), but to try to establish some conventions for Norwegian tag names. Out in the big wide world (and even some places in Norway!), where the natural thing is to use English GIs, certain conventions have emerged. For example, when I see the tag \
<li> I now automatically think of 'list item' and in 95% of cases I am right; <p>
    means 'paragraph' unless otherwise specified; etc. Conventions like this make it easier for document designers to read other people's DTDs, and they make life easier for the end user who must switch between different DTDs. Of course, things are far from perfect in the English-speaking world, as J. Sperling Martin points out in the article I mentioned in my original posting (EPSIG News, March 92): How many ways do you want to have to remember the correct tag for a "heading" -- \, \, \, \ and so on. (I've seen each of these in actual DTDs. The first is the Z39.59 flavor, and the last is from the TEI.) We should always keep in mind that in some instances an author writing an article for, say, the ACS Journal may also be preparing a manuscript for his latest book for Wiley... SGML is still in its infancy here in Norway, but things are starting to move fast and we wanted to try to nip this kind of chaos in the bud. Our starting point is that Norwegian authors ought to be able to use Norwegian tag names (unless there is a very good reason not to), and that they shouldn't be called upon to learn umpteen different GIs for elements that perform exactly the same role in different DTDs. We hope that by publishing GenDok under the auspices of the Norwegian SGML Users' Group, we will establish a standard practice - e.g. \ ('punkt') and \ ('avsnitt') as equivalents for \

<li> and <p>
    . As to your suggestion about translating the way tag names are presented to the user, I agree - that would be one solution, for systems that allow it. But those that I have seen generally do not. They allow you to customise the _description_ of the tag, but not the actual tag name, nor the way it is presented on the screen when you choose 'Show Tags' instead of 'Hide Tags'. Thus, taking the \

<li> element as an example, Author/Editor would allow me to add the (Norwegian) description 'Punkt i en nummerert eller unummerert liste' in the dialog box from which the tag would be chosen. But when I press Cmd-Spacebar for 'Show Tags', the screen representation would be [ li > and not (as my Norwegian users would like) [ pkt > . Hope this clears up the confusion. Cheers, Steve -- pepper@falch.no ------------------------------------------------------------------ falch hurtigtrykk a.s, postboks 130 kalbakken, n-0902 oslo, norway tel +47 (2) 163040 fax +47 (2) 162350 bbs +47 (2) 162650 Newsgroups: comp.text.sgml Date: 12 Jun 1992 12:11:52 UT From: hehanninen@tnclus.tele.nokia.fi Organization: Nokia Telecommunications. Message-ID: <1992Jun12.141152.1@tnclus.tele.nokia.fi> Subject: more about national DTDs >Message-ID:<23011@alice.att.com> >In article <1992Jun11.112610.3834@falch.no> pepper@falch.no (Steve Pepper) writes: > > > > Terve Heimo! > > > > Some colleagues and I have recently prepared a Norwegian version > > of the 'General' DTD found in TR 9573. (We call it General DTD//NO > > or 'GenDok'.) The main purpose of the exercise for us was to try > > to establish a standard Norwegian nomenclature for the most common > > tag names... > >And later on in section ``1. Tag name length'': > > > .... Today the > > situation is different: there are many editors that let you (or > > even force you to) do your tagging via menus, and that allow you > > to hide your tags if you wish. There is therefore no longer any > > reason to save single keystrokes (bdy/body, fm/frontm) at the > > expense of making the tag names less intuitive - especially with > > tags that only occur a handful of times in a particular document. > >This really confuses me. If in today's systems the tag names can be >hidden from the user, why would anybody do the effort to translate >these names of them? Why not translate only the way the tags are >presented to the user?
Isn't translating the tags as well just making >thing needlessly complicated? > >It seems to me that when I have the General DTD in my system and you >send me a document using the Norwegian version of the dtd, I need that >dtd as well although the functionality might be the same as the one I >have. And if this trent continues, I can soon expect French, Spanish, >Finnish, Dutch, South-African and other language dtd's with no >functional differences? I'm afraid I don't get it. I agree with this. We are not going to modify the finnish version of GEN-DTD with finnish tag names. Let's say, it's against our commercial policy and people have gotten used to play with english style names f. exam. in W4W and IL. Last but not least, why should we upset foreigners, although text would be offered with finnish DTD. Like Steve wrote, we may also supply a description for cryptic tag names in editors 'help field'. I guess the situation here in Finland is quite similar to Norway. Thus it would be really useful to offer finnish DTD for people who are studying SGML technique !! Question: Is it necessary to include structural info in tag name (in prefix an in suffix) if you can see (and follow) the structure in DTD ? f.ex. %p.em.ph or %pharases Sun tan for everybody , Heimo Newsgroups: comp.text.sgml Date: 12 Jun 1992 14:24:50 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23205B@erik.naggum.no> References: <1992Jun12.141152.1@tnclus.tele.nokia.fi> Subject: Re: more about national DTDs The reason I, too, want national DTD's is that the people who keyboard the documents will want to use their natural language also for the markup. With conventional markup languages, this has not been possible without an incredible lot of work, and it is consequently seldom done. I am concerned about the "magic" involved if tag names are cryptic in a foreign language. 
People who want to express their ideas in written form will have to find SGML useful for their needs, and will have enough effort poured into the written expression of their ideas even before they would be forced to learn another bloody form of computer-imposed magic. I'm perfectly happy with English comments in the DTD, English parameter entity names, etc., because they are used by document designers and programmers, who _also_ have better things to do than to learn another language every time they turn around. Programmers are very special users, and making any sort of concession to them (and I'm talking about myself here, too) is likely to scare many users off. As regards document interchange: If the document language is foreign to you, the tag names are the least of your problems. As far as SGML goes, we do have explicit link, and I'm providing an explicit link type declaration with the Norwegian General Document just for this purpose. (I notice, with some regret, that attributes can't be "renamed" with this feature, and will need special treatment.) That "everybody" knows English is no reason to believe that everybody in fact knows English well enough to not feel intimidated by English tag names. SGML even provides a means to rename keywords of the language. A parser should be able to accommodate such trivial modifications for national needs. hehanninen@tnclus.tele.nokia.fi writes: | | Question: Is it necessary to include structural info in tag name | (in prefix an in suffix) if you can see (and follow) | the structure in DTD ? | | f.ex. %p.em.ph or %pharases Ah, but those are not tag names, those are parameter entities! I find this prefix and suffix thingy to be very useful for DTD designers, but users will seldom see, much less know about, them. Again, programmers and users should not be treated the same. | Sun tan for everybody , Heimo (How ironic that you should say that; I'm recovering from a bad case of sun burn...
:-) Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 12 Jun 1992 15:10:01 UT From: "Wayne Wohler" \ Message-ID: <9206121721.AA02395@mammoth.Berkeley.EDU> Subject: Re: Scope of the CURRENT attribute Reference: \ -> The (below) quoted paragraph however begins with -> -> If "SHORTTAG YES" or "OMITTAG YES" is specified -> on the SGML declaration: -> -> So what happens if they are not specified ? If SHORTTAG and OMITTAG are both NO, then attributes may not be omitted and therefore CURRENT becomes a moot point; there can be no defaulting. This makes some sense since these features would not be very useful if all attributes had to be specified. If both are NO, then the document needs to have all markup fully specified, including all attribute values. -> Are all standards this difficult and obtuse ? -> Even with all the cross referencing in the Handbook, it -> still takes a long time to find all relevant information -> about almost anything if you can even find it -> (Current attributes are not cross referenced to this section -> which is TOTALLY RELEVANT to the question). I don't have much experience with international standards outside of the SGML area. It is pretty clear the standards I have read are not intended to be user's guides! I understand your frustration with the standard, it does take some study to learn the terminology and to learn to navigate the standard. The definitive author's guide and DTD writer's guide hasn't been written yet. In general, I've found the handbook's indexing to be quite helpful; following the "current attribute" entry to page 328 was how I found the standard citations I used. That particular index entry should also have included page 329. Wayne L.
Wohler IBM Corp Publishing Systems Boulder, Colorado Newsgroups: comp.text.sgml Date: 12 Jun 1992 15:58:53 UT From: "Wayne Wohler" \ Message-ID: <9206121721.AA02386@mammoth.Berkeley.EDU> Subject: Re: Defaulting mechanism for CURRENT attributes Reference: <92163.154050U35395@uicvm.uic.edu> > > | Now, the attribute *att* of *elem* instance *d* defaults to the most > > | recently specified value. Is this ZZZ (thus considering the > > | left-to-right ordering of value specifications) or is this XXX (thus > > | considering the top-down hierarchical ordering? > > > > I honestly think you would find ISO 8879 and SGML easier to deal with if > > you didn't attempt to make it fit a mathematical model which wasn't > > derived from it. ... > > For the record, I would like to say Erik is speaking for himself here, > but not for me; I find Dr. Brueggemann-Klein's work on SGML, and her > question here, extremely useful. I believe the intention of 8879 to be > as described by earlier replies to the query (current value is that most > recently specified in a depth-first left-to-right scan of the entire > tree, not in a direct descent from the root). But the question is not > nearly so uninteresting or obvious as Erik suggests. I also found Dr. Brueggemann-Klein's question interesting and, after looking for a bit, did not find any clause in the standard to explicitly state that "most recently specified" meant "most recently specified in the linear SGML datastream" or some such. Erik gave some good reasons to surmise that the standard intends the interpretation he gave. All the systems on which I have used current attributes take the linear view of SGML data and current attribute definition. That's no guarantee, but it does mean a fair number of people have come to the same conclusion. > The standard says the value is the "most recently specified" value > (clause 4.67, definition of "current attribute", which 11.3.4 says ATT > in Dr. Brueggemann-Klein's example is).
But what happened most recently > depends in reality rather dramatically on what order you have been doing > things in. The interpretations offered thus far assume you have been > processing the text in a left-to-right, depth-first scan of the document > tree. But the original question effectively asks whether that is > guaranteed; what would happen if we processed the document in a > breadth-first traversal? I have never thought of an SGML parser processing a document tree ... it provides the information that allows an application to build one. Like most people, I have always thought of the parser processing a linear sequence of characters. The best citation to support this is in 6.2 "Each SGML character is parsed in the order it occurs ...". There are also several (if not many) occurrences of the words "first", "start", "order" which all support a linear, sequential view of SGML processing. Two examples: from 9.6.3 "Delimiter strings ... are recognized IN THE ORDER THEY OCCUR, with no overlap", from 9.6.4 "If multiple delimiter strings START with the same character". Delimiter recognition is particularly important since one cannot build a tree until the delimiters are recognized and their meaning applied. Another interesting case is USEMAP declarations in content (as opposed to within the DTD). The implementations I have seen change the map at the point the USEMAP declaration occurs; the new map does not apply to data that occurred before the declaration in the same element. Reading 11.6, I don't see anything that explicitly supports this interpretation, though 11.6.3 hints at it when it talks about the current map being superseded by a short reference use declaration occurring in an instance of the element. Wayne L. Wohler International Business Machines Corporation Publishing Solutions Boulder, Colorado The opinions expressed are my own and do not represent the opinions of IBM.
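The linear reading of "most recently specified" discussed in the message above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name resolve_current, the list-of-start-tags input, and the three-element instance are invented here and come neither from ISO 8879 nor from any parser. It also assumes a single element type, so it does not model attribute definition lists shared between element types.

```python
# Sketch: resolving a #CURRENT attribute in linear document order.
# Assumption: start-tags arrive as (gi, attrs) pairs in the order they occur
# in the character stream, which is also a depth-first, left-to-right walk.

def resolve_current(start_tags, attr):
    """Return (gi, effective value of attr) for each start-tag, in order."""
    current = None              # most recently specified value seen so far
    resolved = []
    for gi, attrs in start_tags:
        if attr in attrs:
            current = attrs[attr]   # an explicit specification becomes current
        elif current is None:
            # Nothing has been specified yet, so there is nothing to default to.
            raise ValueError(f"no current value of {attr!r} for <{gi}>")
        resolved.append((gi, current))
    return resolved


# Hypothetical instance: an outer elem with att=XXX, a first child with
# att=ZZZ, and a second child "d" that omits att.
events = [
    ("elem", {"att": "XXX"}),   # outer element
    ("elem", {"att": "ZZZ"}),   # first child
    ("elem", {}),               # second child "d", value omitted
]
print(resolve_current(events, "att"))
```

Under this reading the last element takes ZZZ, the value specified most recently in the data stream. A parent-based reading would instead give it XXX, the value on its enclosing element.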
Newsgroups: comp.text.sgml Date: 12 Jun 1992 16:01:59 UT From: "Wayne Wohler" \ Message-ID: <9206121721.AA02398@mammoth.Berkeley.EDU> Subject: Entity references in attribute values: a new conundrum Reference: <92163.161754U35395@uicvm.uic.edu> > Two related questions for the group, only slightly loaded. > > 1 is a conforming parser supposed to recognize entity references inside > attribute values? (please cite chapter and verse) Check Figure 3. The LIT and LITA delimiters are recognized in TAG mode (which is the mode active as element tags are parsed). This sets LIT mode. You'll find that ERO is recognized in the LIT recognition mode. Attribute value literals may therefore contain entity references. That means, by the way, that entity references must occur within literal delimiters in attribute values like attribute='&value' and not attribute=&value. You can also check the definition for attribute value literal in 4.17, which states: "attribute value literal: A delimited character string that is interpreted as an attribute value by REPLACING REFERENCES and ..." (emphasis is mine). > 2 do existing implementations in fact recognize entity references inside > attribute values? All the parsers I've tried do. > To anyone who thinks they know the answer to the first question off > hand: good, so did I, but then I found I was unable to back it up from > the standard, and in fact at the passage I found the standard seems to > be saying, quite clearly, the opposite of what I thought was the case, > and think *should* be the case, and believe was *intended* by the > drafters of 8879 to be the case. (If I am right, and the text says the > opposite of what was intended, then we really ought to make sure it gets > fixed in the revision!) What passage was that?
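The distinction drawn above, that references are replaced inside a delimited attribute value literal but not in an undelimited value, can be sketched in Python roughly as follows. This is a simplified illustration, not a conforming implementation: it assumes the reference concrete syntax delimiters (ERO "&", REFC ";", LIT and LITA as the double and single quote), requires the closing ";" on every reference, ignores character references, and uses a plain dictionary as a hypothetical stand-in for the declared general entities. The function name is invented here.

```python
import re

# Assumption: reference concrete syntax, with REFC (";") always present.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def interpret_attribute_spec(spec, entities):
    """Interpret one attribute value specification taken from a start-tag."""
    quoted = len(spec) >= 2 and spec[0] in "\"'" and spec[-1] == spec[0]
    if quoted:
        literal = spec[1:-1]
        # Inside LIT/LITA delimiters, entity references are recognized and
        # replaced; an undeclared name raises KeyError in this sketch.
        return ENTITY_REF.sub(lambda m: entities[m.group(1)], literal)
    # An undelimited value is a bare token; no references are recognized.
    # (A real parser would reject an "&" here; this sketch returns it as is.)
    return spec


entities = {"value": "replacement text"}    # hypothetical declarations
print(interpret_attribute_spec("'&value;'", entities))
print(interpret_attribute_spec("plain", entities))
```

The first call yields the entity's replacement text; the second returns the bare token unchanged, because no literal delimiters were present to switch reference recognition on.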
Newsgroups: comp.text.sgml Date: 12 Jun 1992 16:53:55 UT From: Joachim Schrod \ Organization: TU Darmstadt Message-ID: <1992Jun12.165355.26433@infoserver.th-darmstadt.de> References: <2294@exua.exeter.ac.uk> Subject: Re: Sample SEMA DTDs available In article <2294@exua.exeter.ac.uk>, MGPopham@exua.exeter.ac.uk (Mike Popham) writes: > A new directory (write-it.dtds) has been added to the SGML archive > held at The SGML Project (c/o University of Exeter, UK). Exeter's connection is sometimes not the best. You can also fetch it from ftp.th-darmstadt.de [130.83.55.75], directory pub/text/sgml/DTD. Changes are updated automatically each Sunday night. Enjoy. -- Joachim =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Joachim Schrod Email: schrod@iti.informatik.th-darmstadt.de Computer Science Department Technical University of Darmstadt, Germany ``How do we persuade new users that spreading fonts across the page like peanut butter across hot toast is not necessarily the route to typographic excellence? -- Peter Flynn Newsgroups: comp.text.sgml Date: 12 Jun 1992 17:21:40 UT From: C. M. Sperberg-McQueen \ Organization: University of Illinois at Chicago Message-ID: <92164.122141U35395@uicvm.uic.edu> Subject: national language versions of DTDs -- and architectural forms Those interested in national-language versions of SGML DTDs may be interested in the following method, developed for the TEI, which simplifies the task of generating national-language versions (or any alternate-name versions) of a DTD. The credit for this goes to the TEI's Metalanguage and Syntax committee, headed by David Barnard of Queen's University, Ontario. The method is simple. 1 take the canonical DTD and rewrite it, substituting a suitable parameter entity reference for every occurrence of every generic identifier. The parameter entity needs to have a predictable form, which leads to no ambiguities or name clashes. E.g. prepend 'n.' 
to the gi itself, so that \ \ \ is rewritten \ \ \ 2 for each such parameter entity, supply an appropriate definition: \ \ \ That's it. To rename an element, the user need only supply an overriding parameter entity reference in the DTD subset: \ \ \ ]> It is also easy to provide, in advance, sets of translation equivalents which the user can embed if desired: \ %names.nor; ]> And of course one can also rename gis if one prefers different names, for whatever reason, as long as one avoids name conflicts. This may impose a burden on a processor written to be aware of a particular tag set; how does it know that your 'pkt' is what it knows as 'item'? The (current) TEI solution to this is to borrow from the HyTime notion of architectural forms: each TEI element has an attribute of the name 'TEI.form' which provides the 'canonical' form of the element's GI. (At the moment, the canonical form is expected to be the English-language GI used in the Guidelines. There is a small but vocal minority of users who are plumping hard for numeric identifiers instead -- sort of like MARC field numbers.) So the TEI definition of ITEM might look like that above, with an attribute list declaration something like this (simplifying slightly): \ Since when you redefine the entity 'n.item' you leave the attribute list untouched, the value of TEI.form is still 'item'. So a TEI-aware processor can know how to process your \ elements, without much fuss, simply by checking the value of TEI.form. Attributes can also be renamed legally in TEI-conformant documents, but there is no way (yet?) for a TEI-aware processor to know what the new names mean. Suggestions welcome. We have not taken over all the details of architectural forms, in part because we are engaged in an ongoing discussion within the TEI as to just what those details are. So some possible uses of TEI.form attributes will not be specified in the next draft of the TEI Guidelines. 
In particular, it is not clear what the rule should be if the user *modifies* an element declaration. If I modify a list to *require* a head (title), should I provide TEI.form as an attribute with the value 'list'? May I? What if I require something else not present in the original -- say, if I require each item to have a preceding enumerator element? One possible rule is: TEI.form gets its old original value only if the only change made to the element or its attributes is a renaming. If the content model or attribute list change, TEI.form should not be defined. This rule effectively says that if I specify 'foo' as the value for TEI.form on an element called BLORT, it means that means I am using the exact content model and attlist of FOO as defined in the TEI Guidelines. This will probably be the rule in the TEI Guidelines version 2. A second rule is like the one I believe HyTime specifies (unless it changed since Steve Newcomb's article in CACM): I can specify TEI.form=foo for my element BLORT if, and only if, every instance of BLORT could be parsed with the TEI content model for FOO, and if no attributes (? or no *required* attributes?) are dropped from the attlist declaration. This content model effectively says that defining 'TEI.form=foo' for BLORT means that all instances of BLORT would be acceptable as \s under the TEI declarations, hence that BLORT is a sub-class of the TEI \ class. (Technical note: the restriction is on instances of BLORT, not on the content model of BLORT. I presume this is to avoid burdening the user with attempting to prove that one regular expression defines a sublanguage of another regular expression, which I understand to be hard, if not intractable.) A third rule is that one can specify TEI.form=foo whenever one likes, without regard for the changes made. If I specify 'TEI.form=foo' for my BLORT element, then, I am saying only "to understand what a BLORT is all about, read the (prose) description of \ in the TEI documentation". 
If an instance of BLORT turns out not to be parsable as a FOO, well, that just shows that although all BLORTs are FOOs *semantically*, they may not be *syntactically* a subset of FOOs. A processor is supposed to make a best-guess attempt to process a BLORT using its routines for FOO; this may mean ignoring any material it doesn't know what to do with. If the processor cannot manage this, then it has the legal right to issue a warning or error message saying 'I cannot process this BLORT as a FOO; entering default processing mode' and abend or switch to a very simple processing routine. Some TEI participants who read this group may wish to refine my description of these possible rules; others are welcome to offer their opinions on which of these rules makes most sense to them. If anyone can tell us clearly what restrictions HyTime places on uses of architectural forms, I'd be grateful for that too. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago Newsgroups: comp.text.sgml Date: 12 Jun 1992 19:23:08 UT From: Eric Freese \ Organization: Mead Data Central, Dayton OH Message-ID: <1992Jun12.192308.10569@meaddata.com> Subject: #CONREF The international standard is a little vague on the use of #CONREF and the effects of its use. I know that when the attribute which is declared to have a #CONREF value is explicitly stated, the element with the ID referenced becomes the content of the referencing element. Does this include the start and end tags of the referenced element or just the content? For example: DTD: \ \ Instance: \test data\ \\ When the above example is resolved, is it equivalent to: \test data\ \\test data\\ or \test data\ \test data\ I am defining a FOSI for a DTD which has some #CONREF elements and can't determine what the contexts of the referenced elements would be. Thanks in advance for any help.
-- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | Eric Freese | My thoughts are my own | (513) 865-6800 x5311 | | Mead Data Central | and are not necessarily | Lead Software Engineer | | P.O. Box 933 | shared by Mead Data | Source Packaging Systems | ! Dayton, Ohio 45401 | Central, Inc. | ericf@meaddata.com | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ! There are no perfect men in this world; just perfect intentions. | | - Morgan Freeman as "Azeem" in "Robin Hood: Prince of Thieves" | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Newsgroups: comp.text.sgml Date: 12 Jun 1992 21:34:12 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23205E@erik.naggum.no> References: \ <9206121721.AA02395@mammoth.Berkeley.EDU> Subject: Re: Scope of the CURRENT attribute Brad Might \ writes: | | Are all standards this difficult and obtuse ? | Even with all the cross referencing in the Handbook, it still takes | a long time to find all relevant information about almost anything | if you can even find it (Current attributes are not cross referenced | to this section which is TOTALLY RELEVANT to the question). Wayne Wohler \ writes: | | I don't have much experience with international standards outside of | the SGML area. It is pretty clear the standards I have read are not | intended to be user's guides! I understand your frustration with | the standard, it does take some study to learn the terminology and | to learn to navigate the standard. The definitive author's guide | and DTD writer's guide hasn't been written yet. I have worked with standards of various breeds for about 5 years, and SGML is better than most at several points. It's worse than others at only one point, and that's the changes to the syntax of the language according as various features are turned on or off. This is precisely the point which produces Brad's frustration. 
SGML is more consistent than all the other standards of a comparable level of complexity that I have worked with, but the flip side of this is that it takes an inordinately high level of concentration and attention to detail to make use of this consistency. Learning the terminology where one word carries so much information is costly, but once you do, you also have much better odds to find what you're looking for, because you know what it's called. Personally, I think Goldfarb's Handbook is nothing less than essential to understand SGML, but the index doesn't maintain the high quality of the body of the book. I have several times found items where an index entry should have pointed, but didn't, and this is frustrating. I do my best to use SGML's terminology when I write technical comments, so that others can look things up, and also perhaps make connections between SGML terms. However, it often takes the most time to translate a question into this terminology. On the other hand, coming to SGML with several preconceived notions about what SGML terms mean is asking for trouble. Take "ambiguity" as an example, where those who know this term well from language theory will get confused by the SGML usage of the term unless they're willing to listen to SGML's definitions and dispense with the previous definition. Ideally, this shouldn't be difficult at all -- we're used to call similar things by various conflicting names and notations in all areas of human knowledge. It must be our great task, then, to reach behind the names and notations and grok the real meaning, and turn around and express it again with suitable names and notations. Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 12 Jun 1992 23:14:09 UT From: C. M. 
Sperberg-McQueen \ Organization: University of Illinois at Chicago Message-ID: <92164.181410U35395@uicvm.uic.edu> Subject: tree-based inheritance of attribute values Anne Brueggemann-Klein's posting about #CURRENT attributes raises a question I didn't address in my earlier posting. If #CURRENT provides inheritance from the attribute value specified most recently in a left-right depth-first scan of the tree, and not inheritance from a parent or ancestor element, one may ask: is there any way to obtain the second sort of behavior with SGML? One of SGML's great strengths is its representation of documents as trees; it would be nice if one could use that tree and its structure in specifying attribute defaults, etc. The answer, as far as I know, is no. But readers of this group may be interested in a convention the TEI has adopted to address this problem. The TEI provides a global attribute LANG, the value for which indicates the natural language of an element's contents. If a chapter is in English, by and large the paragraphs of that chapter will also be in English; if a chapter is in German, its paragraphs are likely also to be in German. We want to exploit this fact in specifying the default value of LANG. If an element specifies a value for LANG, that value is accepted. If no value is specified, the LANG value of the parent element is to be inherited. (On the document element, a LANG value is logically required.) In version 1 of the TEI Guidelines, we simply defined LANG with a declared value of CDATA and a default of #IMPLIED. Many uses of #IMPLIED really mean the attribute value is optional; this one really does mean it is implied: the processor can infer it from the rest of the document. On the other hand, it would be nice if the specification of the default value were able to indicate the specific algorithm used to calculate a value for the attribute. 
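The parent-based defaulting rule just described (an explicit LANG value is accepted, an omitted one is taken from the parent, and the document element must supply one) can be sketched in Python as below. This is only an illustration under stated assumptions: the Element class, the function name effective_langs, and the sample document are invented here, and None stands in for an unspecified (#IMPLIED) attribute value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    gi: str
    lang: Optional[str] = None      # None models an unspecified LANG value
    children: List["Element"] = field(default_factory=list)


def effective_langs(root):
    """Return (gi, effective LANG) pairs in document order."""
    if root.lang is None:
        raise ValueError("the document element must specify LANG")
    result = []

    def walk(elem, inherited):
        # An explicit value wins; otherwise take the parent's effective value.
        lang = elem.lang if elem.lang is not None else inherited
        result.append((elem.gi, lang))
        for child in elem.children:
            walk(child, lang)

    walk(root, root.lang)
    return result


# Hypothetical document: the second chapter omits LANG.
doc = Element("text", "en", [
    Element("chapter", "de", [Element("p"), Element("p", "la")]),
    Element("chapter", None, [Element("p")]),
])
print(effective_langs(doc))
```

In this sample the second chapter and its paragraph come out as "en", inherited from the document element. A #CURRENT-style linear default would instead have handed them "la", the value most recently specified in the data stream, which is exactly the behavior the convention is meant to avoid.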
So in version 2 of the Guidelines, we expect to stress the specific algorithm involved here by specifying the global LANG attribute with an attribute definition like: lang CDATA '*INHERITED'. The special (magic?) value '*INHERITED' is to be interpreted by TEI-aware processors as a signal to take the LANG value from the parent element. (We considered using '#INHERITED', to make a closer analogy to #CURRENT and so on, but decided against using the SGML reserved name indicator, in order not to have to try to explain to SGML novices why #CURRENT can be used as is in the DTD, while '#INHERITED' requires quotation marks. It seemed simpler to use a star as a sort of TEI-magic-word marker.) It would be nice to have similar facilities in SGML, though one might experience some difficulty devising a nice notation to distinguish - attributes which take their default from a parent or ancestor of the same GI (or of any GI used in the ATTLIST declaration) - attributes which take their default from any parent or ancestor with an attribute of this name (such as the TEI global LANG attribute). Of course, one has the same problem with #CURRENT, or would have if one wanted to specify inheritance from any element which happened to have an attribute of this name. If ATTLIST declarations could be repeated with additive effect, of course, this problem would go away. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago Newsgroups: comp.text.sgml Date: 13 Jun 1992 00:20:53 UT From: C. M. Sperberg-McQueen \ Organization: University of Illinois at Chicago Message-ID: <92164.192053U35395@uicvm.uic.edu> Subject: entity references in attribute values Several readers of this group have asked what I found in 8879 about entity references in attribute values that was so mysterious, or hard to understand, or confusing.
The answer: clause 11.3.3, which says that if the attribute is defined with declared value of CDATA, then the attribute value is 'character data' -- which means no markup is recognized in it, which means no entity references are recognized in it. Now, I knew that every implementation I had ever seen or heard of did in fact recognize entity references, including Goldfarb's ARCSGML, so I was puzzled and thought perhaps this was just a leftover from the days before character data had been subdivided into parsed character data, and replaceable character data, and the various other flavors of character data -- i.e. when they made the definition of 'character data' more restrictive they forgot to update 11.3.3, so it says the wrong thing. But I was a bit worried as well: I could not find any way to avoid the conclusion that what looked like a simple error in editing the standard had led to a really problematic rule. (I did not see, at the time, clause 4.17, the definition of 'attribute value literal'.) I 'knew' that entity references had to be legal, but couldn't prove it. And we have just had several demonstrations here that sometimes we 'know' things about 8879 that turn out to be false -- like knowing that the SGML declaration is always in ISO 646 IRV. So I admit it, I was very worried. Before you laugh at me, answer me honestly: how many of you know *now*, before I tell you and before you look at the text again, why 11.3.3 does *not* mean that SGML parsers should not recognize entity references in attribute value specifications? If there are *any* of you, then my hat's off to you. To make a long story short, Wayne Wohler appears to be right, and entity references are legal in attribute value specifications (but only with quotation marks around them). The story of how I persuaded myself of this fact after posting my query to the net can be read in the appended exercise in SGML exegesis. 
I leave to the reader to decide whether any technical specification, even an ISO standard, should require this kind of analysis. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago ------ Sic et Non: on Entity References in Attribute Values (with apologies to Thomas Aquinas) Question: whether entity references are recognized in attribute values? I. It would seem the answer is NO, because 1 Either the attribute has a declared value of CDATA, or it has a declared value of ENTITY, ENTITIES, ID, IDREF, IDREFS, NAME, NAMES, NMTOKEN, NMTOKENS, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, NOTATION, or a name group. 2 If the attribute's declared value is not CDATA, it would appear entity references cannot be recognized, because: 21 The tokens in these types of values are defined as having the lexical type NAME. 22 Clause 9.3, production 55 defines NAME as a series of name start characters and name characters, and does not define them as containing entity references. 23 Therefore, 2 is correct: entity references cannot be recognized in values of attributes with declared value other than CDATA. 3 If the attribute's declared value is CDATA, it would appear entity references cannot be recognized, because: 31 Clause 11.3.3 says that if the declared value of an attribute is CDATA, "the attribute value is character data". 32 Entity references appear not to be recognized in character data. 321 Markup is not recognized in character data. 3211 Clause 9.2 defines character data as a sequence of data characters. It opposes it, in this way, to 'replaceable character data', defined in 9.1 as containing data characters, character references, general entity references, and entity end signals. 3212 Goldfarb, in his commentary on 9.2 (p. 344 of Handbook) says "no markup will be recognized in character data other than the delimiters that would terminate the character data." 
3213 Clause 4.33 defines character data as "Zero or more characters that occur in a context in which no markup is recognized, other than the delimiters that end the character data. Such characters are classified as data characters because they were declared to be so." 3214 Therefore, 321 is correct: Markup is not recognized in character data. 322 Entity references are markup. 3221 The note to clause 4.183 identifies "references" as being one kind of markup. 3222 Clause 4.144 defines a general entity reference as a named entity reference to a general entity. 3223 Clause 4.205 defines a named entity reference as an entity reference. 3224 Clause 4.124 defines an entity reference as a reference. 3225 Clause 4.256 defines a reference as 'Markup that is replaced by other text ...' 3226 Therefore, 322 is correct: entity references are markup. 323 Therefore, 32 is correct: entity references are not recognized in character data. 33 Therefore, 3 is correct: entity references cannot be recognized in values of attributes with declared value of CDATA. 4 No matter what the declared value of the attribute, entity references cannot be recognized within its value, because: 41 Clause 7.9.3 (note after production 34) says "Interpretation of an attribute value literal occurs as though the attribute were character data, regardless of its actual declared value." 42 Entity references are not recognized within character data (see statement 32 above). II. It would seem the answer is YES, because 1 Clause 7.9.3 defines 'attribute value specification' as either 'attribute value' or 'attribute value literal'. 2 If the attribute value is specified as an attribute value literal, entity references are recognized, because: 21 Clause 7.9.3 says "An attribute value literal is interpreted as an attribute value by replacing references within it, ignoring Ee and RS, and replacing an RE or SEPCHAR with a SPACE." 
22 Clause 4.17 defines 'attribute value literal' as "A delimited character string that is interpreted as an attribute value by replacing references and ignoring or translating function characters." III About the arguments against the proposition, we may say: 1 All statements made are correct, but apply only to the 'attribute value' supplied or derived for the attribute. 2 Clause 7.9.3 specifies that attribute values may be specified either directly, or as attribute value literals. A value specified as an attribute value literal is processed by the parser into a determinate attribute value. 3 The specification that attribute values are treated as character data (rather than replaceable character data) therefore applies only to the end product of the processing specified in 7.9.3, not to the attribute value literal possibly provided in the document instance. IV Conclusions 1 The current formulations of 8879 do not in fact require entity references to be unrecognized in attribute value specifications. 2 They do however require a Talmudic or Jesuitical process to unravel, in order to establish that fact. 3 The revision of ISO 8879 should eliminate the misleading use of the term 'attribute value', either by reformulating all the sections on declared value specifications and attribute value specifications, or by introducing a suitably unambiguous term such as 'internal' or 'processed attribute value'. Newsgroups: comp.text.sgml Date: 13 Jun 1992 00:41:48 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23206A@erik.naggum.no> References: \ <23202A@erik.naggum.no> <92163.154050U35395@uicvm.uic.edu> Subject: Re: Defaulting mechanism for CURRENT attributes C. M. Sperberg-McQueen \ quotes me: | | > I honestly think you would find ISO 8879 and SGML easier to deal with if | > you didn't attempt to make it fit a mathematical model which wasn't | > derived from it. ... I feared something like this would happen. 
Therefore, I did in fact write the following _second_ sentence of the above quoted paragraph: || On the other hand, if you could formulate a mathematical model || consistent with the standard, it would probably be much appreciated || by the entire SGML community. The key here is "a mathematical model WHICH WASN'T DERIVED FROM [SGML]", and the attendant emphasis on what I _would_ find more valuable than attempting to shoe-horn SGML into something it was never meant to be even associated with, namely a formal model for what SGML does in fact say. This discussion talks about the tree which SGML _represents_, not about SGML's representation of that tree. To me this difference is truly obvious. C. M. Sperberg-McQueen \ writes: | | For the record, I would like to say Erik is speaking for himself | here, but not for me; I find Dr. Brueggemann-Klein's work on SGML, | and her question here, extremely useful. I believe the intention of | 8879 to be as described by earlier replies to the query (current | value is that most recently specified in a depth-first left-to-right | scan of the entire tree, not in a direct descent from the root). | But the question is not nearly so uninteresting or obvious as Erik | suggests. I think we're losing track of the context in which SGML was defined, and maybe even losing track of what ISO 8879 does define: a language to represent a document structure, whatever it might be. Again, the difference between object represented and its representation is important. Just like a noun is not the thing itself, like a C source program is not the executing process, like a sequence of printed glyphs is not the meaning of the sentence they form if read by a human, a character stream is not the document structure it represents according to ISO 8879. Why is this a problem? | The standard says the value is the "most recently specified" value | (clause 4.67, definition of "current attribute", which 11.3.4 says | ATT in Dr. Brueggemann-Klein's example is). 
But what happened most | recently depends in reality rather dramatically on what order you | have been doing things in. The interpretations offered thus far | assume you have been processing the text in a left-to-right, | depth-first scan of the document tree. But the original question | effectively asks whether that is guaranteed; what would happen if we | processed the document in a breadth-first traversal? But we're not processing a document tree, we're processing a character stream, which, under a suitable interpretation, can give rise to an understanding of a document tree in the mind of the reader, or to another representation of a document tree in a parser or application, which just formalizes that understanding. What you want to do with that document tree afterwards does not, and cannot, concern SGML. SGML is a language which can be used to represent such a tree in a character stream, and certain features are useful in that context only. The current attribute is among them, as are markup minimization, short references, indeed all entity references, even the very concept of markup recognition. There is no _markup_ in the interpreted character stream, because the document structure (ESIS) has other means to express what the markup expressed in its linear, sequential form. | If there is any explicit specification in 8879 that an SGML parser | *must* process a document through a left-to-right depth-first | traversal of the document tree, I would very much like to know where | it is. The standard views an SGML document as a sequence of characters, which are interpreted as markup or data according to the rules of the standard. See clause 6 for the details. You claim that there is room for a different interpretation, which I frankly think is pure fantasy. In particular, the text in sub-clause 6.2 SGML Entities (Goldfarb [297:1]) could not be clearer on this issue.
| I don't think it exists: the standard doesn't even specify | explicitly that the input has to be electronic characters (though it | is hard to understand how to declare or recognize delimiters if it's | anything else), and does explicitly say "This International Standard | does not constrain the physical organization of the document ..." | (clause 6.1 note 1). You have to rip that sentence pretty far out of its context if you think it refers to the character-streamness of the entities _as_ _SGML_ _sees_ _them_. The physical organization of the document is indeed irrelevant to SGML, as long as the parser will see the contents of each entity, character by character until it is exhausted (at which point the Entity end occurs), all clearly sequential. See production [4] SGML text entity for the most eloquent example of this. | If one wrote an SGML parser that did not process the text left to | right--and let us recall that there *are* parsing algorithms for | other orders, including Unger's and the Cocke/Younger/Kasami (CYK) | method--would one have the right to specify a different value for a | CURRENT attribute from the value given by a left-to-right | depth-first scan? We don't _have_ a "left-to-right depth-first scan", dammit! We have a character stream which, if _interpreted_ _by_ _a_ _parser_ (according to ISO 8879) can be _regarded_ _as_ a left-to-right depth-first view of a tree, but only _after_ the markup is recognized, all attributes are specified, all elements have their start and end established, etc. I think, honestly, that if SGML had been intended to be parsed in any way but the obvious sequential, the standard would have said so. 
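Editorial sketch, not part of the original posting: read as a stream of start-tags in the order they occur, the #CURRENT rule discussed in this thread ("most recently specified") amounts to remembering the last value seen as the tags go by. The Python below is a minimal illustration under stated assumptions. The function name, the event format, and the sample element names are all hypothetical. It keeps one remembered value per element type and attribute name, and deliberately leaves aside whether element types that share an ATTLIST also share the remembered value. It also assumes that a missing value with nothing yet remembered is an error.

```python
def resolve_current(start_tags, current_attrs):
    """Fill in #CURRENT attribute values in document (stream) order.

    start_tags    -- list of (gi, specified_attrs) in the order the
                     start-tags occur in the character stream
    current_attrs -- dict mapping a GI to the names of its attributes
                     declared #CURRENT (hypothetical input format)
    """
    last = {}        # (gi, attribute name) -> most recently specified value
    resolved = []
    for gi, attrs in start_tags:
        out = dict(attrs)
        for name in current_attrs.get(gi, ()):
            key = (gi, name)
            if name in attrs:
                last[key] = attrs[name]      # a specified value becomes current
            elif key in last:
                out[name] = last[key]        # default to the remembered value
            else:
                # Assumption: nothing has been specified yet, so no default exists.
                raise ValueError(f"{gi}: no value specified yet for {name}")
        resolved.append((gi, out))
    return resolved


tags = [("doc", {}), ("p", {"att": "x"}), ("note", {}),
        ("p", {}), ("p", {"att": "y"}), ("p", {})]
for gi, attrs in resolve_current(tags, {"p": ["att"]}):
    print(gi, attrs)
```

Nothing in the sketch consults the tree: where a start-tag sits among its ancestors plays no part, only its position in the stream.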
One of the main things I learned when I first started to read the peculiar prose that standards make, is that the standard says what the standard says, nothing more and nothing less, that any interpretation must be confined to the context in which the standard _does_ say something, that any inference from what the standard does _not_ say is, ipso facto, invalid, and that it's always the reader's responsibility to understand the context in which the specifications and requirements of the standard have meaning. As a corollary to the last: that context can without harm be treated as always differing between any two standards. The context can, however, often be established by reference to related standards. This process is naturally recursive, but does converge. My point is that this is _identical_ to any other field of human endeavor, and we should hone our abilities to enter a new terminology-universe without too much baggage. As a result, we have to be even more careful about using prior knowledge from other contexts in new standards, because we have to check that the knowledge applies in the new contexts at every single point. Yes, this makes reading standards incredibly hard work. It was incredibly hard writing it, too. (Sorry, I just had to.) | In practice, I don't see how one can handle SGML in any way other than | left-to-right scan of the data stream. But then, there's a lot I don't | know. And the standard does say (Annex F) that 8879 does not require | any particular implementation techniques or architecture. Of course it doesn't require any such thing, and you're free to parse an SGML document's characters in any random order, as long as the result is according to ISO 8879, which does specify that things happen in a certain order. See 5.2 Ordering and Selection Symbols for the crucial evidence about the syntax of the productions.
I regard your "but then, there's a lot I don't know", as an instance of the argumentum ad ignorantiam fallacy in rhetoric, and challenge you to show me what you _do_ _know_ which allows you to interpret the standard contrary to what I say about this particular issue. | So perhaps we should have a little more patience with questions that | seem to have obvious answers. I fully agree that we should not dismiss such questions immediately or without consideration, and looking behind the obvious can oftentimes result in very valuable discoveries. Fundamental parts of philosophy, such as epistemology, indeed consist of taking a careful look at what people would regard as obvious. This doesn't mean that questioning the obvious will always result in valuable discoveries, and most often, the obvious is the obvious, and nothing more. I prefer to cut through the noise if I can establish beyond doubt that that is what it is. I really wonder why this has become a topic of discussion. Brueggemann- Klein's original question was relevant, but we've far digressed beyond the point of relevance if we start discussing alternative ways to parse formal languages when we're dealing with a markup language for text documents, or question basic truisms. If I sound unduly pissed, it's probably because my sun burn is itchy. Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 13 Jun 1992 01:05:29 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23206B@erik.naggum.no> References: <92163.161754U35395@uicvm.uic.edu> Subject: Re: Entity references in attribute values: a new conundrum C. M. Sperberg-McQueen \ writes: | | 1 is a conforming parser supposed to recognize entity references inside | attribute values? 
(please cite chapter and verse) I assume you really mean "attribute value specification", and under that assumption the answer is clearly "Yes". Let's walk down the syntax hierarchy (page reference to Goldfarb's Handbook in brackets): Clause 7.4, production 14, start-tag, references attribute specification list. [314] Clause 7.9, production 31, attribute specification list, references attribute specification. [327] Clause 7.9, production 32, attribute specification, references attribute value specification. [328] Clause 7.9.3, production 33, attribute value specification, references attribute value literal. [331] Clause 7.9.3, production 34, attribute value literal, references replaceable character data. [331] Clause 9.1, production 46, replaceable character data, references general entity reference. [343] If you really meant "attribute value", the answer is equally clearly "No". See also Goldfarb, 7.9.3 Attribute Value Specification [330], in particular the first paragraph on page 331. It's interesting to note that Goldfarb says that the distinction between attribute value and attribute value literal "is a very important and sometimes misunderstood distinction." | To anyone who thinks they know the answer to the first question off | hand: good, so did I, but then I found I was unable to back it up from | the standard, and in fact at the passage I found the standard seems to | be saying, quite clearly, the opposite of what I thought was the case, | and think *should* be the case, and believe was *intended* by the | drafters of 8879 to be the case. (If I am right, and the text says the | opposite of what was intended, then we really ought to make sure it gets | fixed in the revision!) I don't know what you're talking about. It took me about 65 ms to realize that the answer to your question would be in clause 7.9.3, but I have no idea whatsoever what passage you're talking about. It couldn't be 7.9.4, could it? 
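Editorial sketch, not part of the original posting: the distinction drawn above between an attribute value literal (quoted, references replaced) and a bare attribute value (no markup recognized) can be illustrated in a few lines of Python. This is a simplification under stated assumptions, not a conforming implementation. The function names and the entity table are hypothetical. The reference concrete syntax is assumed (RS is character 10, RE is character 13, TAB is the only SEPCHAR handled). Only general entity references of the form &name; are handled, replacement text is not rescanned, and character references are ignored. The function-character rule follows the sentence of 7.9.3 quoted elsewhere in this thread.

```python
import re

RS, RE, TAB = "\n", "\r", "\t"     # reference concrete syntax assumed

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")
_NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+")


def interpret_literal(literal, entities):
    """Interpret the text between the quotes as an attribute value."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undeclared entity: {name}")
        return entities[name]

    text = _ENTITY_REF.sub(replace, literal)   # replace references
    text = text.replace(RS, "")                # ignore RS
    return text.replace(RE, " ").replace(TAB, " ")   # RE, SEPCHAR -> SPACE


def interpret_spec(spec, entities):
    """Interpret an attribute value specification, quoted or not."""
    if len(spec) >= 2 and spec[0] == spec[-1] and spec[0] in "\"'":
        return interpret_literal(spec[1:-1], entities)
    # Unquoted: no references are recognized, name characters only.
    if not _NAME_CHARS.fullmatch(spec):
        raise ValueError(f"unquoted value needs quotation marks: {spec!r}")
    return spec


entities = {"acme": "Acme Corp."}              # hypothetical declarations
print(interpret_spec('"made by &acme;"', entities))
print(interpret_spec("draft", entities))
```

Any further interpretation that depends on the declared value (splitting NAMES into tokens, for instance) would come after this step and is left out here.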
| I would be very happy to see postings on this question from some of | the implementers who read this list. I hope you're happy, then. :-) Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 13 Jun 1992 01:25:51 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23206C@erik.naggum.no> References: <92164.122141U35395@uicvm.uic.edu> Subject: Re: national language versions of DTDs -- and architectural forms C. M. Sperberg-McQueen \ writes: | | Those interested in national-language versions of SGML DTDs may be | interested in the following method, developed for the TEI, which | simplifies the task of generating national-language versions (or any | alternate-name versions) of a DTD. The credit for this goes to the | TEI's Metalanguage and Syntax committee, headed by David Barnard of | Queen's University, Ontario. | | The method is simple. Indeed, and I like it. Thank you for sharing this with us. | So the TEI definition of ITEM might look like that above, with an | attribute list declaration something like this (simplifying slightly): | | \ | | Since when you redefine the entity 'n.item' you leave the attribute list | untouched, the value of TEI.form is still 'item'. So a TEI-aware | processor can know how to process your \ elements, without much | fuss, simply by checking the value of TEI.form. | | Attributes can also be renamed legally in TEI-conformant documents, but | there is no way (yet?) for a TEI-aware processor to know what the new | names mean. Suggestions welcome. How about a fixed attribute which contains a list of pairs of names, the first being the canonical attribute name or attribute value, and the second the name or value specified in the attribute definition list? 
E.g., \ I note, with some regret, that it's not possible to parameterize the attribute value completely with a parameter entity, since the attribute specification will only interpret general entity references, whereas the markup declaration will interpret parameter entity references. Something like this will be needed: \ \ \ This may be unduly messy. Occam's razor may apply: Entities should not be multiplied. Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 13 Jun 1992 02:06:14 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23206D@erik.naggum.no> References: <92164.192053U35395@uicvm.uic.edu> Subject: Re: entity references in attribute values C. M. Sperberg-McQueen \ writes: | | Several readers of this group have asked what I found in 8879 about | entity references in attribute values that was so mysterious, or | hard to understand, or confusing. The answer: clause 11.3.3, which | says that if the attribute is defined with declared value of CDATA, | then the attribute value is 'character data' -- which means no | markup is recognized in it, which means no entity references are | recognized in it. Hmmm. This is an example of why I really love Goldfarb's Handbook. At [423:18], we get a pointer to "attribute value" at [333:1], so we go look that up, and also look it up in the index, and find "used in production, 331:2", then read all of 7.9.3, and voila! there is no problem. Since I already knew that, I may be somewhat biased, though. | Before you laugh at me, answer me honestly: how many of you know *now*, | before I tell you and before you look at the text again, why 11.3.3 does | *not* mean that SGML parsers should not recognize entity references in | attribute value specifications? 
If there are *any* of you, then my | hat's off to you. I take pride in the fact that I knew the answer. The reason is that I spent a fair amount of time trying to figure out the note starting at [331:14] some time ago. I think this note should be removed. | To make a long story short, Wayne Wohler appears to be right, and entity | references are legal in attribute value specifications (but only with | quotation marks around them). The story of how I persuaded myself of | this fact after posting my query to the net can be read in the appended | exercise in SGML exegesis. I leave to the reader to decide whether any | technical specification, even an ISO standard, should require this kind | of analysis. But your analysis is much too convoluted. It rests on an equivocation over what "attribute value" means, where you have interpreted it to be whatever is at the right of the value indicator (vi) delimiter, quotation marks or not. This is understandable, and the difference between the attribute value and attribute value literal is subtle. However, even if you didn't realize this, you should have triggered on Goldfarb's comments at the end of his annotations in 7.9.4: "... is the result of interpreting the attribute value literal to produce the attribute value." | ------ | Sic et Non: on Entity References in Attribute Values | | (with apologies to Thomas Aquinas) | | Question: whether entity references are recognized in attribute values? : | 41 Clause 7.9.3 (note after production 34) says "Interpretation of | an attribute value literal occurs as though the attribute were character | data, regardless of its actual declared value." Ah, here it is! The way I interpret this, after much struggling, is that there is a uniform interpretation of attribute value literals independent of attribute type, and that the only attribute type for which there is no additional interpretation is the character data attribute type. 
So, to emphasize that the attribute value literal interpretation occurs in two stages, this note was included to pave the way for a later interpretation if and only if the attribute value is not character data. Yes, I did find this very hard to figure out. | IV Conclusions | | 1 The current formulations of 8879 do not in fact require entity | references to be unrecognized in attribute value specifications. Careful with that double negation. You're saying a tiny little bit more than the standard says... | 2 They do however require a Talmudic or Jesuitical process to unravel, | in order to establish that fact. I'm sympathetic to your problems (because I've spent an incredible amount of time doing the same thing), but once I stopped thinking that "it had to be there", and instead go looking for what it _did_ say, things got much, much simpler. | 3 The revision of ISO 8879 should eliminate the misleading use of the | term 'attribute value', either by reformulating all the sections on | declared value specifications and attribute value specifications, or | by introducing a suitably unambiguous term such as 'internal' or | 'processed attribute value'. I think we should perhaps add a little note right under the heading for 7.9.4, saying something like: NOTE -- An attribute value may be specified in an attribute value specification as an attribute value literal. Do you think this would help? Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 13 Jun 1992 02:44:46 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23206E@erik.naggum.no> References: <1992Jun12.192308.10569@meaddata.com> Subject: Re: #CONREF Eric Freese \ writes: | | The international standard is a little vague on the use of #CONREF | and the effects of its use. 
Not really vague, just a little brief. I hope you have Goldfarb's The SGML Handbook [1], as it goes into more detail on this issue. (See the overview section 4.4.3.3, on page 159, in particular.) | I know that when the attribute which is declared to have a #CONREF | value is explicitly stated, the element with the ID referenced | becomes the content of the referencing element. CONREF doesn't do this, but I see where you were misled. CONREF means that the attribute value somehow refers to something which the application could use to generate content, or put as simply as "the element either has content or attribute", and it isn't restricted to IDREF attributes or any real "references" at all. You can have any kind of attribute have a CONREF default value, and the effect is simply to make the element empty if the attribute is present (specified). (Standardese note: I didn't say anything about what happens if the attribute is not specified, this is taken care of elsewhere.) CONREF can be used to make external references, where the ID/IDREF mechanism cannot conveniently reach. E.g. a figure reference that either contains a descriptive reference, or relies on the application to provide such reference automatically based on the ID of the figure. It could also be used if you want to provide an option to use an entity to hold the contents of an element, and don't want to use an entity reference in the content: \ \ \ \ \ : \ inline content \ : \ | Thanks in advance for any help. Hope this helps, even though I didn't address your question directly. Best regards, \ ------- References: [1] Charles F Goldfarb: The SGML Handbook. Oxford University Press, 1991. ISBN 0-19-853737-9. -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. 
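Editorial sketch, not part of the original posting: the #CONREF behaviour described in the message above ("the element either has content or attribute") can be pictured from the application's side as follows. All names here are hypothetical, including the generate callback that stands in for whatever the application does with the attribute value. The sketch assumes the parser has already told the application which attributes were specified and whether the element had content.

```python
def element_content(conref_attrs, specified, inline, generate):
    """Return the content an application would use for one element.

    conref_attrs -- names of the attributes declared with #CONREF
    specified    -- dict of attributes actually specified on the start-tag
    inline       -- content found in the document, or None if the
                    element was empty
    generate     -- application callback (hypothetical) that produces
                    content from an attribute name and value
    """
    used = [name for name in conref_attrs if name in specified]
    if used:
        # A specified #CONREF attribute makes the element empty.
        if inline is not None:
            raise ValueError("element with a specified #CONREF "
                             "attribute must be empty")
        return generate(used[0], specified[used[0]])
    return inline


def from_entity(name, value):
    # Stand-in for fetching or generating content in the application.
    return f"[content from {value}]"


print(element_content(["file"], {"file": "chap1"}, None, from_entity))
print(element_content(["file"], {}, "inline content", from_entity))
```

The attribute in the sketch is not an IDREF, which matches the point made above that #CONREF is not restricted to real references.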
Newsgroups: comp.text.sgml Date: 13 Jun 1992 03:06:19 UT From: Erik Naggum \ Reply-To: enag@ifi.uio.no Message-ID: <23206F@erik.naggum.no> References: <1992Jun11.143405.27527@mks.com> Subject: Re: Changing parts of a concrete syntax David J. Fiander \ writes: | | I'm using sgmls 0.7, and have found the 8-character limit on | names rather restrictive. The sgmls man page says that "[t]he | upper limit on NAMELEN is 239." | | So how do I change it? Do I have to provide an entire SGML | declaration just to change one quantity? Yes. In fact, you have to specify the entire syntax. If you think that's a little excessive, I agree with you, and I hope we can get something like a "declaration subset" in the next revision for both the SGML declaration and the syntax declaration. Note that sgmls 0.8 was released May 11. Say you want to use the reference concrete syntax (revised to refer to ISO 646:1991 properly), you need to provide something like SGML declaration: \ Whew! Best regards, \ -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \ | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \ | SGML UG SIGhyper | vita brevis. Newsgroups: comp.text.sgml Date: 14 Jun 1992 05:01:22 UT From: Robin Cover \ Organization: UT Arlington Message-ID: \ Subject: SGML and Online Journal of Current Clinical Trials I ran across an interesting article on the use of SGML in the electronic \Online Journal of Current Clinical Trials (CCT)</>, developed by OCLC. The network address for the article is given at the end of this poster for anyone who wishes to see the full text. Two excerpts and one reference follow here. \<quote> 1.0 Introduction The Online Journal of Current Clinical Trials (CCT) is a peer- reviewed, interactive electronic journal. The primary form of publication is electronic--no paper version of the journal is planned. 
In addition to the full text of articles, CCT includes tables, equations, and graphics. . . \</quote> \<quote> 6.0 Database Construction Articles are peer reviewed using a bulletin board system at AAAS, to which all the editors and reviewers have dial-up access. One of the goals of AAAS is to reduce the time taken to publish articles as much as possible without sacrificing the rigor of the peer-review process. [. . . After an article is accepted, AAAS sends to OCLC (via the bulletin board system) an SGML version of the article and the original graphics (if they are not machine readable, they may have to be physically mailed). OCLC then completes the SGML markup--in particular, OCLC completes the tagging of tables and equations as well as a number of other details. Currently, this tagging is done manually. After the SGML tagging of the article is completed and validated, the figures are scanned and the article is typeset. We are using TeX for this, so the SGML file is run through a program to convert it into TeX and format it. The resulting output is reviewed. After the output looks acceptable, it is faxed to both AAAS and the author for review, any needed changes are incorporated, and the database is built. Although we realize that this is ambitious, our goal is to have articles available within 24 hours of their acceptance. To accomplish this, we need to be able to finish the SGML coding and formatting within six hours, and to have the formatting reviewed by AAAS and the author within two hours. The article will then be loaded into the database overnight. Even if this schedule is not met, we will have the information available to users within days of acceptance rather than the weeks or months that paper journals require. \</quote> \<references> References and Notes . . . 2. Thomas B. Hickey, "Using SGML and TeX for an Interactive Chemical Encyclopedia," in Proceedings of the 1989 National Online Meeting (Medford, NJ: Learned Information, 1989), 187-195. 
\</references> \<biblio.for.current.article> Hickey, Thomas B., and Terry Noreault. "The Development of a Graphical User Interface for The Online Journal of Current Clinical Trials." The Public-Access Computer Systems Review 3, no. 2 (1992): 4-12. (To retrieve this article, send an e-mail message that says "GET HICKEY PRV3N2 F=MAIL" to LISTSERV@UHUPVM1 or LISTSERV@UHUPVM1.UH.EDU.) </> Submitted by Robin Cover ------------------------------------------------------------------------- Robin Cover BITNET: zrcc1001@smuvm1 ("one-zero-zero-one") 6634 Sarah Drive Internet: robin@utafll.uta.edu ("uta-ef-el-el") Dallas, TX 75236 USA Internet: zrcc1001@vm.cis.smu.edu Tel: (1 214) 296-1783 Internet: robin@ling.uta.edu FAX: (1 214) 709-3387 Internet: robin@txsil.sil.org ========================================================================= </message> <message id="<92167.112039U35395@uicvm.uic.edu>" date="2917614039"> Newsgroups: comp.text.sgml Date: 15 Jun 1992 16:20:39 UT From: C. M. Sperberg-McQueen \<U35395@uicvm.uic.edu> Organization: University of Illinois at Chicago Message-ID: <92167.112039U35395@uicvm.uic.edu> References: \<brueggem.708031762@fidji> <23202A@erik.naggum.no> <92163.154050U35395@uicvm.uic.edu> <23206A@erik.naggum.no> Subject: Re: Defaulting mechanism for CURRENT attributes Last week I wrote: > If there is any explicit specification in 8879 that an SGML parser > *must* process a document through a left-to-right depth-first traversal > of the document tree, I would very much like to know where it is. I > don't think it exists: ... Erik Naggum's long posting does point to the specification I thought did not exist (among other things which however do not constitute such a specification). He did not quote it, however. For the benefit of those without copies of 8879 or Goldfarb to hand, it is clause 6.2 "SGML Entities", second paragraph after production 4: Each SGML character is parsed in the order it occurs, in the following manner: ... 
This explicit requirement that characters be parsed in their order of occurrence (i.e. in what I've been calling 'left-right order') provides the restriction which makes the definition of current attributes unambiguous. (Given SGML's grammar, parsing characters in this order means the document will be parsed in the same order as a left-right depth-first scan of the document tree; this gives unambiguous sense to the phrase 'most recently specified' used in the definition of 'current attribute'.) It also requires SGML parsers to use left-right scanning algorithms, rather than the right-left or non-directional algorithms one sometimes reads about. Thanks to Erik for pointing out this passage. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago </message> <message id="<1992Jun15.172209.22762@ens.fr>" date="2917617729"> Newsgroups: comp.text.sgml Date: 15 Jun 1992 17:22:09 UT From: Denis Excoffier \<scof@dmi.ens.fr> Organization: Ecole Normale Superieure, PARIS, France Message-ID: <1992Jun15.172209.22762@ens.fr> Keywords: #CURRENT Summary: a final answer ? Subject: #CURRENT attributes I think I missed some essential postings in the discussion about #CURRENT attributes. Could someone please give me the final answer to the following question : ----------------------------------- Is this document a correct SGML document ? \<!DOCTYPE doc [ \<!ELEMENT doc - - (foo|bar)*> \<!ELEMENT (foo|bar) - O EMPTY> \<!ATTLIST (foo|bar) zot CDATA #CURRENT > ]> \<doc> \<foo zot="yuyu"> \<bar> \</doc> ----------------------------------- Thanks, Scof. </message> <message id="<2344@irit.irit.fr>" date="2917617758"> Newsgroups: comp.text.sgml Date: 15 Jun 1992 17:22:38 UT From: jean-luc GUIMPIER \<guimpier@irit.irit.fr> Organization: IRIT-UPS, Toulouse, France Message-ID: <2344@irit.irit.fr> Subject: re: re:ODA vs SGML (naive answer included...) 
hi all, I seemed ignorant while posting my request for help concerning a comparison between ODA and SGML and in fact I was. I knew nothing about them both, now, thanks to some of you, I know a few things about SGML but I still have problems to find interesting information about ODA. The problem is the same as before, can someone help me position ODA and SGML from one another. Thanks again. P.S. : naive answer to naive question : because I was asked to do so ...! </message> <message id="<92167.133032U35395@uicvm.uic.edu>" date="2917621831"> Newsgroups: comp.text.sgml Date: 15 Jun 1992 18:30:31 UT From: C. M. Sperberg-McQueen \<U35395@uicvm.uic.edu> Organization: University of Illinois at Chicago Message-ID: <92167.133032U35395@uicvm.uic.edu> Subject: Re: Defaulting mechanism for CURRENT attributes Last week I wrote: > If there is any explicit specification in 8879 that an SGML parser > *must* process a document through a left-to-right depth-first traversal > of the document tree, I would very much like to know where it is. I > don't think it exists: ... Both Wayne Wohler and Erik Naggum have pointed out some passages in 8879 which they think may imply or be the specification of left-right recognition which I was asking for. Several of these do not seem to me to require left-to-right scanning at all; many of them talk about the order in which SGML constructs occur in the document, but this does not constitute a requirement that constructs be recognized in that order. (Reverse Polish notation, for example, can be parsed right-to-left, so the operators are recognized before the expressions they follow, but this does not mean we cannot speak of the expressions coming *before* the operator.) However, two of the passages mentioned do explicitly require left-right recognition of SGML constructs, and with it a LR, depth-first traversal of the document tree. 
Both WW and EN mention clause 6.2, on "SGML Entities", which says in the second paragraph after production 4: "Each SGML character is parsed in the order it occurs, in the following manner: ..." And clause 9.6.3, mentioned by WW, explicitly says "Delimiter strings ... are recognized in the order they occur, with no overlap." These two passages do explicitly require that characters be parsed and language constructs be recognized in their order of occurrence (i.e. in what I've been calling 'left-right order') and thus restrict the order of recognition enough to make the definition of current attributes unambiguous. Given SGML's grammar, parsing characters in this order means the document will be parsed in the same order as a left-right depth-first scan of the document tree; this gives unambiguous sense to the phrase 'most recently specified' used in the definition of 'current attribute'. They also require SGML parsers to use left-right scanning algorithms, rather than the right-left or non-directional algorithms one sometimes reads about. Thanks to Wayne Wohler and Erik Naggum for pointing out these passages. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago </message> <message id="<23211A@erik.naggum.no>" date="2917643666"> Newsgroups: comp.text.sgml Date: 16 Jun 1992 00:34:26 UT From: Erik Naggum \<erik@naggum.no> Reply-To: enag@ifi.uio.no Message-ID: <23211A@erik.naggum.no> References: <1992Jun15.172209.22762@ens.fr> Subject: Re: #CURRENT attributes Denis Excoffier \<scof@dmi.ens.fr> writes: | | Could someone please give me the final answer to the following question : | | Is this document a correct SGML document ? | ----------------------------------- | \<!DOCTYPE doc [ | \<!ELEMENT doc - - (foo|bar)*> | \<!ELEMENT (foo|bar) - O EMPTY> | \<!ATTLIST (foo|bar) | zot CDATA #CURRENT | > | ]> | \<doc> | \<foo zot="yuyu"> | \<bar> | \</doc> | ----------------------------------- Yes. 
(Provided SHORTTAG or OMITTAG are active features.) See 7.9.1.1 Omitted Attribute Specification in ISO 8879:1986, or in Goldfarb's Handbook (page 329), included below for your reference: 7.9.1.1 Omitted Attribute Specification If "SHORTTAG YES" or "OMITTAG YES" is specified on the SGML declaration: a) There need be an attribute specification only for a required attribute, and for a current attribute on the first occurrence of any element in whose attribute definition list it appears. Other attributes will be treated as though specified with an attribute value equal to the declared default value. b) If there is an attribute value specification for a current attribute, the specified attribute value will become the default value. The new default affects all elements associated with the attribute definition list in which the attribute was defined. Best regards, \</Erik> -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \<erik@naggum.no> | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \<enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis. </message> <message id="<1992Jun17.081114.13439@ens.fr>" date="2917757474"> Newsgroups: comp.text.sgml Date: 17 Jun 1992 08:11:14 UT From: Denis Excoffier \<scof@dmi.ens.fr> Organization: Ecole Normale Superieure, PARIS, France Message-ID: <1992Jun17.081114.13439@ens.fr> References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> Keywords: #CURRENT DTD Summary: strange implicit implication of standard Subject: Re: #CURRENT attributes In article <23211A@erik.naggum.no> enag@ifi.uio.no writes: >| >| Is this document a correct SGML document ? >| ----------------------------------- >| \<!DOCTYPE doc [ >| \<!ELEMENT doc - - (foo|bar)*> >| \<!ELEMENT (foo|bar) - O EMPTY> >| \<!ATTLIST (foo|bar) >| zot CDATA #CURRENT >| > >| ]> >| \<doc> >| \<foo zot="yuyu"> >| \<bar> >| \</doc> >| ----------------------------------- > >Yes. 
(Provided SHORTTAG or OMITTAG are active features.) > So I have to understand that the two DTDs : \<!DOCTYPE doc [ \<!ELEMENT doc - - (foo|bar)*> \<!ELEMENT (foo|bar) - O EMPTY> \<!ATTLIST (foo|bar) zot CDATA #CURRENT> ]> \<!DOCTYPE doc [ \<!ELEMENT doc - - (foo|bar)*> \<!ELEMENT (foo|bar) - O EMPTY> \<!ATTLIST (foo) zot CDATA #CURRENT> \<!ATTLIST (bar) zot CDATA #CURRENT> ]> are NOT the same DTDs (i.e. there exist some document(s) e.g. \<doc>\<foo zot="yuyu">\<bar zot="toto">\<foo>\</doc> which don't show the same ESIS when parsed with those 2 DTDs) Am I right ? </message> <message id="<23212A@erik.naggum.no>" date="2917777889"> Newsgroups: comp.text.sgml Date: 17 Jun 1992 13:51:29 UT From: Erik Naggum \<erik@naggum.no> Reply-To: enag@ifi.uio.no Message-ID: <23212A@erik.naggum.no> References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> <1992Jun17.081114.13439@ens.fr> Subject: Re: #CURRENT attributes Denis Excoffier \<scof@dmi.ens.fr> writes: | | So I have to understand that the two DTDs : | | \<!DOCTYPE doc [ | \<!ELEMENT doc - - (foo|bar)*> | \<!ELEMENT (foo|bar) - O EMPTY> | \<!ATTLIST (foo|bar) zot CDATA #CURRENT> | ]> | | \<!DOCTYPE doc [ | \<!ELEMENT doc - - (foo|bar)*> | \<!ELEMENT (foo|bar) - O EMPTY> | \<!ATTLIST (foo) zot CDATA #CURRENT> | \<!ATTLIST (bar) zot CDATA #CURRENT> | ]> | | are NOT the same DTDs (i.e. there exist some document(s) e.g. | \<doc>\<foo zot="yuyu">\<bar zot="toto">\<foo>\</doc> which don't | show the same ESIS when parsed with those 2 DTDs) Good observation. This is an important difference from programming languages where such lists can be rolled out, e.g., int a,b; is identical to int a; int b;. | Am I right ? Yes. Best regards, \</Erik> -- Erik Naggum | +47-295-0313 | ISO 8879 SGML | Memento, Naggum Software | "fuzzface" | ISO 10744 HyTime | terrigena. Boks 1570, Vika | \<erik@naggum.no> | JTC 1/SC 18/WG 8 | Memento, 0118 OSLO, NORWAY | \<enag@ifi.uio.no> | SGML UG SIGhyper | vita brevis. 
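The difference between the two DTDs can be sketched in a few lines of Python. This is only an illustration of the defaulting rule quoted from 7.9.1.1 (a new value for a current attribute becomes the default for every element associated with the same attribute definition list); it is not taken from ISO 8879 or from any real parser, and the function name `resolve_current` and its data layout are hypothetical.

```python
def resolve_current(attlists, tags):
    """Resolve one #CURRENT attribute over a sequence of start-tags.

    attlists: one tuple of element names per ATTLIST declaration
              (hypothetical layout, chosen for this sketch only).
    tags:     (element_name, specified_value_or_None) in document order.
    """
    # Map each element name to the ATTLIST declaration that covers it.
    owner = {name: i for i, names in enumerate(attlists) for name in names}
    current = {}   # ATTLIST index -> most recently specified value
    resolved = []
    for name, value in tags:
        key = owner[name]
        if value is not None:
            current[key] = value          # rule (b): becomes the new default
        elif key not in current:
            # rule (a): a value is required before any default exists
            raise ValueError(f"no value specified yet for <{name}>")
        resolved.append((name, current[key]))
    return resolved


# <doc><foo zot="yuyu"><bar zot="toto"><foo></doc>
tags = [("foo", "yuyu"), ("bar", "toto"), ("foo", None)]

shared = resolve_current([("foo", "bar")], tags)       # <!ATTLIST (foo|bar) ...>
separate = resolve_current([("foo",), ("bar",)], tags)  # two ATTLIST declarations

print(shared)    # last <foo> inherits "toto"
print(separate)  # last <foo> inherits "yuyu"
```

Under these assumptions the last foo element ends up with zot="toto" when foo and bar share one attribute definition list, and with zot="yuyu" when each has its own, which is the ESIS difference described above.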
</message> <message id="<1992Jun17.140923.14795@exu.ericsson.se>" date="2917778963"> Newsgroups: comp.text.sgml Date: 17 Jun 1992 14:09:23 UT From: "Tom Boudreau,cs,x0364" \<exutgb@exu.ericsson.se> Reply-To: exutgb@exu.ericsson.se Organization: Ericsson Network Systems, Inc. Message-ID: <1992Jun17.140923.14795@exu.ericsson.se> Keywords: AECMA 1000D, database publishing, SGML Subject: ACEMA 1000D Can anyone give me more information on AECMA 1000D. Is it released, where can I get a copy, is it worth reading ?????? I'm also interested in any other information on database publishing using SGML. Thanks, Tom Boudreau Ericsson Network Systems </message> <message id="<92169.192113U35395@uicvm.uic.edu>" date="2917815672"> Newsgroups: comp.text.sgml Date: 18 Jun 1992 00:21:12 UT From: "C. M. Sperberg-McQueen" \<U35395@uicvm.uic.edu> Organization: University of Illinois at Chicago Message-ID: <92169.192113U35395@uicvm.uic.edu> References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> <1992Jun17.081114.13439@ens.fr> Subject: Re: #CURRENT attributes Denis Excoffier asks: > So I have to understand that the two DTDs : [identical DTDs except for:] > \<!ATTLIST (foo|bar) zot CDATA #CURRENT> > > \<!ATTLIST (foo) zot CDATA #CURRENT> > \<!ATTLIST (bar) zot CDATA #CURRENT> > > are NOT the same DTDs (i.e. there exist some document(s) e.g. > \<doc>\<foo zot="yuyu">\<bar zot="toto">\<foo>\</doc> which don't > show the same ESIS when parsed with those 2 DTDs) > > Am I right ? That's certainly my understanding of 8879. The ability to affect the defaulting behavior may be useful in some cases, though in our experience with the TEI DTDs the cases of elements with exactly identical attribute definition lists are few and far between (and when they occur, they often don't involve CURRENT attributes), so the issue doesn't arise as often as I expected at first. 
The TEI DTDs always specify 'attribute definition list declarations' for one GI at a time, in part to make it easier to find stuff in them (vain hope). This allows us to avoid having to decide when to merge and when not to merge attribute definition list declarations. We also avoid having to explain to our users how this feature of CURRENT works, which may be just as well. If this construct has an 'astonishment factor' for you, you can avoid it by simply not using the construct. Simplicity can have its rewards. -C. M. Sperberg-McQueen ACH / ACL / ALLC Text Encoding Initiative University of Illinois at Chicago </message> <message id="<1992Jun19.104220.1399@ens.fr>" date="2917939340"> Newsgroups: comp.text.sgml Date: 19 Jun 1992 10:42:20 UT From: Denis Excoffier \<scof@dmi.ens.fr> Organization: Ecole Normale Superieure, PARIS, France Message-ID: <1992Jun19.104220.1399@ens.fr> References: <1992Jun15.172209.22762@ens.fr> <23211A@erik.naggum.no> <1992Jun17.081114.13439@ens.fr> <92169.192113U35395@uicvm.uic.edu> Keywords: #CURRENT conventions Subject: Re: #CURRENT attributes In article <92169.192113U35395@uicvm.uic.edu> U35395@uicvm.uic.edu (C. M. Sperberg-McQueen) writes: >We also avoid having to explain to our >users how this feature of CURRENT works, which may be just as well. If this >construct has an 'astonishment factor' for you, you can avoid it by >simply not using the construct. Simplicity can have its rewards. > I agree. It's easier to explain that those ``attribute definition lists'' are simply a means of factoring out shared information. I hope that this ``common current attributes'' feature will be marked as deprecated in the next revision of the standard.
</message> <message id="<1992Jun22.055448.19261@sserve.cc.adfa.oz.au>" date="2918181288"> Newsgroups: comp.text.sgml Date: 22 Jun 1992 05:54:48 UT From: Tom Worthington \<tomw@ccadfa.cc.adfa.oz.au> Organization: Australian Defence Force Academy, Canberra, Australia Message-ID: <1992Jun22.055448.19261@sserve.cc.adfa.oz.au> Subject: SGML for legal documents? A lawyer asked me about standards for transferring legal documents from one word processor to another. Sounds like an application for SGML. Has any work been done in this area? Any existing standards? </message> <message id="<7430@hemuli.tik.vtt.fi>" date="2918275939"> Newsgroups: comp.sys.next.misc,comp.sys.next.software,comp.text.sgml Followup-To: comp.sys.next.software Date: 23 Jun 1992 08:12:19 UT From: Timo Vendelin \<timo.vendelin@vtt.fi> Reply-To: timo.vendelin@vtt.fi (Timo Vendelin) Message-ID: <7430@hemuli.tik.vtt.fi> Keywords: sgml Subject: SGML based editors for NeXT Hi, I am looking for SGML based editors for the NeXT workstation. I do not know of any yet, so any help will be valuable to me. Thank you. I would also like to know whether any such editor is a syntax-directed SGML editor, and whether I could also create DTDs with it. Thanks for any info, Timo Vendelin -- UNIX-email: Timo.Vendelin@vtt.fi X.400-email: C=FI,ADMD=MAILNET,PRMD=VTT,PN=Timo Vendelin name: Timo Vendelin office_phone: +358 (9)0 456 4505 fax: +358 (9)0 455 2839 </message> <message id="<1992Jun23.140002.27827@bradford.ac.uk>" date="2918296802"> Newsgroups: comp.text.sgml Date: 23 Jun 1992 14:00:02 UT From: "NWR.AYRES" \<N.W.R.Ayres@bradford.ac.uk> Organization: University of Bradford, UK Message-ID: <1992Jun23.140002.27827@bradford.ac.uk> References: <23205B@erik.naggum.no> Subject: Re: more about national DTDs erik@naggum.no (Erik Naggum) writes: : The reason I, too, want national DTD's is that the people who keyboard : the documents will want to use their natural language also for the : markup.
With conventional markup languages, this has not been possible : without an incredible lot of work, and it is consequently seldom done. : : I am concerned about the "magic" involved if tag names are cryptic in a : foreign language. People who want to express their ideas in written : form will have to find SGML useful for their needs, and will have enough : effort poured into the written expression of their ideas even before : they would be forced to learn another bloody form of computer-imposed : magic. I'm perfectly happy with English comments in the DTD, English : parameter entity names, etc, because they are used by document designers : and programmers, who _also_ have better things to do than to learn : another language every time they turn around. Programmers are very : special users, and making any sort of concession to them (and I'm : talking about myself here, too), is likely to scare many users off. : : As regards document interchange: If the document language is foreign : to you, the tag names are the least of your problems. As far as SGML : goes, we do have explicit link, and I'm providing an explicit link type : declaration with the Norwegian General Document just for this purpose. : (I notice, with some regret, that attributes can't be "renamed" with : this feature, and will need special treatment.) : : That "everybody" knows English is no reason to believe that everybody in : fact knows English well enough to not feel intimidated by English tag : names. SGML even provides a means to rename keywords of the language. : A parser should be able to accommodate such trivial modifications for : national needs. All of this has to do with user interface. The average user of a computer doesn't want to be a guru; they just want a machine and an application that are as easy to use as possible. This surely means native languages, and tag names that, as presented to them, mean something obvious.
It is probably best not to show tags at all, and to give the user some other means of demarcating content. Thus if \<p> is _paragraph_, then surely it is better for the interface to show the word _paragraph_ in the user's own language than the obscure \<p> (maybe via menus, or a line at the bottom of the screen showing the type of object the cursor is in, or... rather than tags included in the text). It is the same with command-line versus direct-manipulation interfaces. The latter are becoming more common because they are easier to understand than having to remember large numbers of obscure commands. Application designers/programmers will have to play around with guru information, but preferably not application users. </message> <message id="<1992Jun23.142647.29166@bradford.ac.uk>" date="2918298407"> Newsgroups: comp.text.sgml Date: 23 Jun 1992 14:26:47 UT From: "NWR.AYRES" \<N.W.R.Ayres@bradford.ac.uk> Organization: University of Bradford, UK Message-ID: <1992Jun23.142647.29166@bradford.ac.uk> References: <2344@irit.irit.fr> Subject: Re: ODA vs SGML (naive answer included...) guimpier@irit.irit.fr (jean-luc GUIMPIER) writes: : hi all, : : I seemed ignorant while posting my request for help concerning a comparison between ODA and SGML and in fact I was. I knew nothing about them both, now, thanks to some of you, I know a few things about SGML but I still have problems to find interesting information about ODA. The problem is the same as before, can someone help me position ODA and SGML from one another. : : Thanks again. ODA is designed as an interchange format for Open Systems, not really for modelling documents. It has the ability to model the logical structure of a document but is clumsier than SGML in doing so.
It has the advantage that human-readable names can be placed in object descriptions, so that more people than gurus can use it if they wish. :-) Its idea is that it will be used to transfer office documents between people such that the sender doesn't need to care what document preparation system the receiver has (it has standardised the content formats to a number of international standards to ease the filtering of content formats to the ones that the receiver will use - this is different from SGML where you need private agreements, for example that you will use PostScript). It also models the layout of a document if you wish to do so. This is not up to publishers' standards but is perfectly good for most uses. Its main advantage is this ability to do _blind interchange_ of documents so that they are editable by the receiver. For a good technical description read _Document Architecture in Open Systems: The ODA Standard_ by Appelt, W., Springer-Verlag 1991, 0387545395. Alternatively, Brown, H., Introduction to the Office Document Architecture, in Multi-Media Document Translation: ODA and the EXPRES Project (Rosenburg, J; Springer-Verlag). Shorter and easier. If you would like a list of about 100 references on ODA, send me an email and I will mail it to you. All the best Nick </message> <message id="<14100@umd5.umd.edu>" date="2918304365"> Newsgroups: comp.text.sgml Date: 23 Jun 1992 16:06:05 UT From: e2cs005@fre.fsu.umd.edu Reply-To: e2cs005@fre.fsu.umd.edu Organization: Frostburg State University, Frostburg, MD Message-ID: <14100@umd5.umd.edu> Subject: SGML Parser I would like any information as to where I can find the common SGML parser written in C (the source code would be nice too) for the IBM PC. Any additional information on finding documentation about the specs for HyTime and/or SGML would also be helpful. I am involved in a project developing ETMs (Electronic Tech Manuals) and would like to incorporate SGML / HyTime. Any response is greatly appreciated. - Robert S.
Sommerville Program Analyst/ ARC 8201 Corporate Drive Landover, MD 20785, USA 1-301-731-2266 E2CS005@fsu.umd.edu </message> <message id="<3446.9206231406@infsc1.hatfield.ac.uk>" date="2918311576"> Newsgroups: comp.text.sgml Date: 23 Jun 1992 18:06:16 UT From: British-library \<comrbl@hatfield.ac.uk> Message-ID: <3446.9206231406@infsc1.hatfield.ac.uk> Subject: HyTime & Scripts Hello everyone! I am posting an email message I received from Steven Newcomb in reply to a question I sent him concerning script languages and HyTime. I hope it is of general interest. ------------------------------------------------------------------------------ > HyTime provides no control mechanisms for user > interaction.The only way to express control is > through application script languages embedded in > SGML elements. These scripts may be executed by > the application to implement some user initiated > action, such as starting a video sequence on > traversal of a hyperlink. That's not strictly true. It depends on the kind of interaction you're talking about. For example, HyTime does very explicitly provide quite elaborate traversal rules in hyperlinks where traversal rules are relevant. It also provides for explicit designation of ``endterms'' -- cues to users that traversal (e.g.) is possible. HyTime also provides means (finite coordinate spaces) whereby on-screen real estate can be allocated to a variety of hypertext and other purposes. > If HyTime is dependent on application script > languages how can it be of use as an application > independent hyper-document interchange format ? It's true that much of the functionality we find in script languages is not covered by unadorned HyTime, much the same as the semantics of typesetting are not covered by unadorned SGML. Perhaps the best way to understand what HyTime does, and the reasons for its existence, is to consider the implications of basing hyperdocument interchange on a script language. 
A script language is procedural; the structure that it imposes on the information can be inferred only with difficulty, with doubtful accuracy, and probably not automatically. By contrast, in a HyTime hyperdocument, the structure of the information is quite explicit. It is not necessary for a user to run the script(s) in order to query the document, or to use it in other ways not originally planned by the author(s). HyTime's design is informed by the notion that hyperdocuments are, first and foremost, information. The precise procedures intended by the author(s) to be applied to the information are less important than the information itself, in the same way that procedural markup (typesetting/formatting instructions) is less important than generic markup. > Are there plans for a standardised script > language for HyTime ? If so is it related to the > AVIS (audiovisual interactive scriptware) standard ? I myself am very interested to see how AVIS and other script languages fare as they bump heads in the HyTime arena. HyTime will provide an arena in which many of the design assumptions of various script languages will become glaringly apparent. > Or alternatively could the syntax of the script > language being used be communicated using a HyTime > lexical model ? In which case how are the semantics > of the language communicated ? As always in SGML, the degree to which the SGML and HyTime markup penetrate into the detailed semantics of a class of documents is arbitrary, and is usually decided by the same person(s) who design the DTD(s). To the extent that a given DTD uses HyTime constructs, to that same degree HyTime semantics are present and are therefore communicated by the HyTime standard itself. Additional semantics are normally communicated by means of documentation accompanying the DTD, especially via interlinear comments in the DTD itself. Additional semantics will be formally expressible in DSSSL when DSSSL becomes an international standard.
> Erik's answer was basically that he thought the application > and application scripts were out of the scope of the interchange > mechanism. This is fair enough, but it does seem to limit > the usefulness of HyTime in real world problems. A standard > script language would appear to me to be required. Do you know > of one ? No. As I said, it will be interesting to see how it all works out. Because of its potential to become an international standard, I think AVIS is certainly noteworthy, and evidently there are good people working on it, but, as a work in progress, it cannot be fairly evaluated yet. *** I will not object if you choose to post this reply to comp.text.sgml. Best regards, Steven R. Newcomb, Chairman, SGML SIGhyper (International SGML Users' Group Special Interest Group on Hypertext and Multimedia) c/o TechnoTeacher, Inc. Voice: +1 904 422 3574 1810 High Road Fax: +1 904 386 2562 Tallahassee, FL 32303-4408 USA Internet: srn@techno.com NOTE NEW INTERNET ADDRESS: ^^^^^^^^^^^^^^ ------------------------------------------------------------------------------ Stephen Baird Tel: 0707-279166 Research Associate Fax: 0707-279185 Hatfield Polytechnic email: comrbl@infsc1.hatfield.ac.uk Hatfield, Herts AL10 8BW United Kingdom </message> <message id="<1992Jun24.042804.10035@utagraph.uta.edu>" date="2918351975"> Newsgroups: comp.text.sgml Date: 24 Jun 1992 05:19:35 UT From: Robin Cover \<robin@utafll.uta.edu> Organization: UT Arlington Message-ID: <1992Jun24.042804.10035@utagraph.uta.edu> Subject: AACR2 in SGML Readers familiar with the AACR2 standard for descriptive cataloging may be interested to learn of plans to commit the volume to SGML format. I think this conversion may be part of a larger effort to create a CDROM with several online resources for librarians (including LC Rule Interpretations and other commentary). 
\<quote> AACR2 TO BE MADE AVAILABLE IN MACHINE-READABLE FORM The publishers and copyright holders of the Anglo-American Cataloguing Rules, 2nd edition, revised (AACR2), have agreed to prepare a machine-readable and searchable version of AACR2. The American Library Association (ALA), one of the three publishers of AACR2, expects to release the Anglo-American Cataloguing Rules, electronic edition (AACRE) in early 1993. The ALA has retained two consultants to work with the authors and publishers of AACR2 to develop the document type definition and text-tagging scheme required to create a Standard Generalized Markup Language (SGML) version of AACR2. John Duke, assistant director for network and technical services at Virginia Commonwealth University, will be the librarian/cataloger consultant responsible for working with the technical consultant to plan file coding and structure that will support effective data retrieval and manipulation by various text retrieval software systems available to library catalogers. George Alexander, president of MindMeetings, a software company which does custom file conversions for publishers, will design a file structure to be usable with various text retrieval software systems, develop the most effective method to apply the SGML convention to the structure of AACR2, and develop the file-conversion programs necessary to produce AACRE from existing text files. ALA, which holds the copyright to AACR2 along with the Library Association and the Canadian Library Association, will initially grant limited permission for use of the copyrighted AACRE text files for purposes of experimentation and research on their possible uses. Requests for experimental use may be submitted to David Epstein, General Manager, ALA Books, or Karen Muller, Executive Director, Association for Library Collections & Technical Services.
\</quote> \<source> * File "AN2 V3_NO31" ISSN: 1056-6694 ALCTS NETWORK NEWS An electronic publication of the Association for Library Collections & Technical Services Volume 3, Number 31 June 23, 1992 In this issue AACR2 TO BE MADE AVAILABLE IN MACHINE-READABLE FORM CIC PUBLISHES REPORT ON MASS DEACIDIFICATION OPEN LETTER TO COUNCIL \</source> Submitted by RCC Robin Cover BITNET: zrcc1001@smuvm1 ("one-zero-zero-one") 6634 Sarah Drive Internet: robin@utafll.uta.edu ("uta-ef-el-el") Dallas, TX 75236 USA Internet: zrcc1001@vm.cis.smu.edu Tel: (1 214) 296-1783 Internet: robin@ling.uta.edu FAX: (1 214) 709-3387 Internet: robin@txsil.sil.org ========================================================================= </message> <message id="<23221B@erik.naggum.no>" date="2918403234"> Newsgroups: comp.text.sgml,comp.text Followup-To: comp.text Date: 24 Jun 1992 19:33:54 UT From: Erik Naggum \<erik@naggum.no> Message-ID: <23221B@erik.naggum.no> References: <2344@irit.irit.fr> <1992Jun23.142647.29166@bradford.ac.uk> Subject: Re: ODA vs SGML (naive answer included...) NWR.AYRES \<N.W.R.Ayres@bradford.ac.uk> writes: | | ... Its idea is that it will be used to transfer office documents | between people such that the sender doesn't need to care what | document preparation system the receiver has (it has standardised | the content formats to a number of international standards to ease | the filtering of content formats to the ones that the receiver will | use - this is different from SGML where you need private agreements, | for example that you will use PostScript). Really? PostScript? | It also models the layout of a document if you wish to do so. This | is not up to publishers' standards but is perfectly good for most | uses. Really? How come the standard has already reached 70 amendments, most of them very elaborate? Something tells me that ODA is far from being useful as it is, and that this is a view held by its developers.
I have been less and less impressed with ODA the more I've read about it. Two and a half years ago, I set out to find out what both ODA and SGML were all about. Two years ago, I decided that ODA was not something on which I would spend my time. Today, having received another batch of ISO documents full of additions and corrections to ISO 8613, I think fewer people should spend their time on ODA, and instead more on PostScript and SPDL for the page description part, and on SGML for the structure description part. ODA has outlived itself, as is becoming clear to more and more of the parties to its creation. I think we would benefit from a description of the features of ODA as it exists, at its current revision level, instead of marketing descriptions of how happy I will be with the results if and when I get them. In my mind, however, ODA is the single biggest case of "vaporware", and I would not be inclined to believe anything I don't see. Add to this the impenetrability of the specifications (I own a copy of, and have read, ISO 8613, all parts, and I continue to read the documents I get from ISO about amendments and views about ODA at SC 18 level), and you get a standard which is kept alive for its own sake, by the people who have spent too much time on it to be able to throw it away without taking their prestige with it. I think ODA is intended to be only a transfer format for documents which are developed in one context and format, on the way to another context and format, in which they will be printed, or occasionally edited. Considering the cost overhead of "going ODA", I think PostScript or a (Standard) Page Description Language which the printer understands is the best solution, and that for editability, nothing beats clear text encoding.
Others have indicated that comp.text.sgml should not be used for discussions about ODA, so I ask that follow-ups be directed to comp.text, to which I've set Followup-To, or comp.protocols.iso, as the OSI-ness and the protocol-ness of the standard are both irrelevant to the issue of text processing. Regards, \</Erik> -- Erik Naggum | ISO 8879 SGML | | ISO 10744 HyTime | +47-295-0313 | ISO 9899 C | Memento, terrigena. \<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis. </message> <message id="<1992Jun26.144804.1@tnclus.tele.nokia.fi>" date="2918551684"> Newsgroups: comp.text.sgml Date: 26 Jun 1992 12:48:04 UT From: hehanninen@tnclus.tele.nokia.fi Organization: Nokia Telecommunications. Message-ID: <1992Jun26.144804.1@tnclus.tele.nokia.fi> Subject: How to take care of sections ? Hi Everybody ! Sorry I missed possible answers for my last question because of my vacation. So here it comes again, and some more... Questions: Is it necessary to include structural info in the tag name (in prefix and in suffix) if you can see (and follow) the structure in the DTD ? e.g. %p.em.ph or %pharases What is a flexible way to take care of multilingual and application sections? Sections might contain any kind of text-elements; words, clauses, chapters, in application version sections also figures and tables. -with specified tag for each language, e.g. ... \<eng> \<h1> \<h1t> 1 Heading title \<p> What is flexible way to take care of multilingual sections: \</eng> ... \<fin> \<p>Nyt hieman suomen kielta sekaan. \</fin> -with language-tag with specified attribute, e.g. ... \<lan type=eng> \<h1> \<h1t> 1 Heading title \<p> What is flexible way to take care of multilingual sections: \</lan> \<lan type=fin> \<p>Nyt hieman suomen kielta sekaan. \</lan> -or with Marked Sections ?? any proposal ?? 
yours Heimo </message> <message id="<1992Jun26.155541.25290@infoserver.th-darmstadt.de>" date="2918562941"> Newsgroups: comp.text.sgml Date: 26 Jun 1992 15:55:41 UT From: Christine Detig \<detig@hp13.iti.informatik.th-darmstadt.de> Organization: TU Darmstadt Message-ID: <1992Jun26.155541.25290@infoserver.th-darmstadt.de> Subject: Novice user question Hello, I'm quite new to SGML and thought it would be good to get acquainted with the tools available. So I took the sgmls-0.8 parser and tried to apply it to the demo DTDs I could scratch from everywhere, e.g. the write-it.dtds, a-w.dtd (used for the Goldfarb book, as far as I know) etc. But sgmls didn't process any of them without errors (neither on Sun3 nor on HP400). Instead, I got error messages like sgmls: SGML error at memo.dtd, line 1 in declaration parameter 1: Amendment 1 requires "ISO 8879:1986" instead of "ISO 8879-1986" sgmls: SGML error at memo.dtd, line 21 in declaration parameter 50: Non-significant shunned character number 0 not declared UNUSED [...] sgmls: SGML error at memo.dtd, line 23 in declaration parameter 58: Character number 13 was described using an unknown base set etc (this being taken from write-it/memo.dtd) which starts: \<!SGML "ISO 8879-1986" -- MEMO PROCESSING SGML DECLARATION This SGML Declaration contains the rules needed to process the memo handling DTD supplied with WRITE-IT, Sema Group's SGML-based syntax directed editor. -- CHARSET BASESET "ISO 646-1983//CHARSET International Reference Version (IRV)//ESC 2/8 4/0" DESCSET 0 128 0 CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN" SCOPE DOCUMENT SYNTAX SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 BASESET "ISO 646 IRV" DESCSET 0 128 0 could a kind soul point out to me what I'm missing? Something I have to install? Wrong tool for the wrong thing? Usage error? 
Thanks a lot, Christine </message> <message id="<23223B@erik.naggum.no>" date="2918584728"> Newsgroups: comp.text.sgml Date: 26 Jun 1992 21:58:48 UT From: Erik Naggum \<erik@naggum.no> Message-ID: <23223B@erik.naggum.no> References: <1992Jun26.144804.1@tnclus.tele.nokia.fi> Subject: Re: How to take care of sections ? hehanninen@tnclus.tele.nokia.fi writes: | | Hi Everybody ! | | Sorry I missed possible answers for my last question because of | my vacation. You could check out the archive at ifi.uio.no with Gopher (gopher.ifi.uio.no, port 70) or WAIS (comp.text.sgml.src). | Questions: Is it necessary to include structural info in the tag name | (in prefix and in suffix) if you can see (and follow) | the structure in the DTD ? | | e.g. %p.em.ph or %pharases Brief answer: Those are not tag names, they're parameter entities used in the DTD to simplify (and make possible) modifications in the document type declaration subset. Regards, \</Erik> -- Erik Naggum | ISO 8879 SGML | | ISO 10744 HyTime | +47-295-0313 | ISO 9899 C | Memento, terrigena. \<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis. </message> <message id="<23224A@erik.naggum.no>" date="2918592332"> Newsgroups: comp.text.sgml Date: 27 Jun 1992 00:05:32 UT From: Erik Naggum \<erik@naggum.no> Message-ID: <23224A@erik.naggum.no> References: <1992Jun26.155541.25290@infoserver.th-darmstadt.de> Subject: Re: Novice user question Christine Detig \<detig@hp13.iti.informatik.th-darmstadt.de> writes: | | I'm quite new to SGML and thought it would be good to get | acquainted with the tools available. So I took the sgmls-0.8 | parser and tried to apply it to the demo DTDs I could scratch from | everywhere, e.g. the write-it.dtds, a-w.dtd (used for the Goldfarb | book, as far as I know) etc. But sgmls didn't process any of them | without errors (neither on Sun3 nor on HP400). The a-w.dtd (for Addison-Wesley) was not used by Goldfarb, but by Martin Bryan in his "Author's Guide" to SGML. 
Martin Bryan is also the source of the write-it.dtd. He's also the source of some major confusions about SGML, and I positively hate to say that the strong negative view of his work that made me decide against putting the above-mentioned DTDs in the public SGML archive even before looking at them proved prudent. It would have been much more interesting if I had had an occasion to update my opinion of the source of this material. | sgmls: SGML error at memo.dtd, line 1 in declaration parameter 1: | Amendment 1 requires "ISO 8879:1986" instead of "ISO 8879-1986" That's right. Those who still think that ISO uses "-" between standard number and publication year need to update their software. This change occurred sometime before 1988, and it's really about time we get this right. sgmls is quite right in pointing to Amendment 1 to ISO 8879, and it's purely the fault of the writer of the SGML declaration to not abide by the new syntax rules. The old syntax was ISO[/xxx] nnnn[/p]-yyyy where [x] means x is optional, xxx is a co-sponsoring organization, nnnn is the number of the standard, p is the part number, and yyyy is the publication year. The new syntax is ISO[/xxx] nnnn[-p]:yyyy This is relevant for ISO owner identifiers in public identifiers, such as "ISO 646:1983". | sgmls: SGML error at memo.dtd, line 21 in declaration parameter 50: | Non-significant shunned character number 0 not declared UNUSED \<shot type=cheap> This in an SGML declaration from the guy who thinks he's an expert on SGML declarations. \</shot> See below for a detailed answer. | sgmls: SGML error at memo.dtd, line 23 in declaration parameter 58: | Character number 13 was described using an unknown base set \<shot type=cheap> This in an SGML declaration... 
\</shot> | etc (this being taken from write-it/memo.dtd) which starts: | \<!SGML "ISO 8879-1986" | -- | MEMO PROCESSING SGML DECLARATION | | This SGML Declaration contains the rules needed to process the | memo handling DTD supplied with WRITE-IT, Sema Group's | SGML-based syntax directed editor. | | -- | CHARSET | BASESET "ISO 646-1983//CHARSET International Reference Version | (IRV)//ESC 2/8 4/0" ESC 2/8 4/0 is technically better than ESC 2/5 4/0 which the standard mandates as the sequence to use for the entire ISO 646 character set. However, ESC 2/8 4/0 designates a G0 character set, and does not include any control characters. It's therefore wrong, unless supplemented by a C0 character set with designator ESC 2/1 4/0. | DESCSET 0 128 0 Quite apart from the fact that positions 0 through 31 and 127 have no character assigned to them in the character set designated by ESC 2/8 4/0, the syntax (below) has specifically declared all these codes to be unused, and then they must either be truly unused, or be function characters. One could argue that they are unused in the designated character set because the positions are unused, but then there are no codes for RS or RE. Something is clearly losing big time, here. This declaration should have been something like this: DESCSET 0 9 UNUSED 9 2 9 11 2 UNUSED 13 1 13 14 18 UNUSED 32 95 32 127 1 UNUSED | CAPACITY PUBLIC "ISO 8879-1986//CAPACITY Reference//EN" | SCOPE DOCUMENT | SYNTAX | SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | 21 22 23 24 25 26 27 28 29 30 31 127 This declaration specifies that a character with this number should not occur in the document, unless specifically identified as a function character (there are three such normally identified, 9, 10, and 13). | BASESET "ISO 646 IRV" This is a syntax violation for a public identifier. The public identifier should be a formal public identifier, and sgmls is quite right in complaining that significant SGML characters can't be found. 
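The DESCSET suggested above can be checked mechanically. What follows is only a minimal sketch in Python, not how sgmls does it: the function names `expand_descset` and `shunned_problems` are made up for illustration, each DESCSET entry is modelled as a (described number, count, base number or UNUSED) triple, and the function characters are assumed to be 9, 10 and 13 as stated above. It expands both the original "DESCSET 0 128 0" and the suggested replacement, then lists the shunned characters that are neither UNUSED nor function characters.

```python
# Sketch only: a simplified model of the shunned-character check, not sgmls itself.
UNUSED = "UNUSED"


def expand_descset(triples, size=128):
    """Map each described character number to its base-set number or UNUSED."""
    table = {}
    for start, count, base in triples:
        for offset in range(count):
            number = start + offset
            if number in table:
                raise ValueError(f"character {number} described twice")
            table[number] = UNUSED if base == UNUSED else base + offset
    missing = [n for n in range(size) if n not in table]
    if missing:
        raise ValueError(f"characters not described: {missing}")
    return table


def shunned_problems(table, shunned, function_chars):
    """Return shunned characters that are neither UNUSED nor function characters."""
    return [n for n in shunned
            if n not in function_chars and table[n] != UNUSED]


# SHUNCHAR CONTROLS 0..31 127, as in the quoted declaration.
SHUNNED = list(range(32)) + [127]
# Assumption taken from the text above: 9, 10 and 13 are the function characters.
FUNCTION_CHARS = {9, 10, 13}

# The declaration as shipped: DESCSET 0 128 0
ORIGINAL = [(0, 128, 0)]
# The replacement suggested in the text above.
SUGGESTED = [
    (0, 9, UNUSED), (9, 2, 9), (11, 2, UNUSED), (13, 1, 13),
    (14, 18, UNUSED), (32, 95, 32), (127, 1, UNUSED),
]

original_problems = shunned_problems(
    expand_descset(ORIGINAL), SHUNNED, FUNCTION_CHARS)
suggested_problems = shunned_problems(
    expand_descset(SUGGESTED), SHUNNED, FUNCTION_CHARS)

print("original: ", original_problems)
print("suggested:", suggested_problems)
```

Under these assumptions the first offender in the original declaration is character number 0, which is the one sgmls names in its error message, and the suggested DESCSET leaves no offenders at all.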
| DESCSET 0 128 0 This is at least legal, per se, but DESCSETs cannot be viewed apart from their BASESETs. | could a kind soul point out to me what I'm missing? | Something I have to install? Wrong tool for the wrong thing? | Usage error? Christine, you're not missing anything. You've just come across a badly designed and syntactically invalid SGML declaration, and you've used an accurate tool which has rejected it, as any quality tool should. Best regards, \</Erik> -- Erik Naggum | ISO 8879 SGML | | ISO 10744 HyTime | +47-295-0313 | ISO 9899 C | Memento, terrigena. \<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis. </message> <message id="<23224F@erik.naggum.no>" date="2918669824"> Newsgroups: comp.text.sgml Date: 27 Jun 1992 21:37:04 UT From: Erik Naggum \<erik@naggum.no> Message-ID: <23224F@erik.naggum.no> Subject: [alt.hypertext] Re: SGML information ------------------------------------------------------------------------ From: neilp@cs.hw.ac.uk (Neil Postlethwaite) Newsgroups: alt.hypertext Subject: Re: SGML information Message-ID: <1992Jun23.182812.2951@cs.hw.ac.uk> Date: 23 Jun 1992 18:28:12 UT References: <1992Jun15.161342.23132@uxa.ecn.bgu.edu> Sender: news@cs.hw.ac.uk (News Administrator) Organization: Dept of Computer Science, Heriot-Watt University, Scotland Lines: 9 In article <1992Jun15.161342.23132@uxa.ecn.bgu.edu> cfkfb@uxa.ecn.bgu.edu (Karl Bridges) writes: > > I'm looking for information about SGML. Could anyone suggest some >references, either articles or books on the subject? Try the current (June '92) issue of Byte. In the 'State of the Art' section on InfoGlut. Neil ------------------------------------------------------------------------ -- Erik Naggum | ISO 8879 SGML | | ISO 10744 HyTime | +47-295-0313 | ISO 9899 C | Memento, terrigena. \<erik@naggum.no> | ISO 9945 POSIX | Memento, vita brevis. 
</message> <message id="<709791160snx@sgmlinc.com>" date="2918784966"> Newsgroups: comp.text.sgml Date: 29 Jun 1992 05:36:06 UT From: Brian Travis \<brian@sgmlinc.com> Organization: SGML Associates, Inc. Message-ID: <709791160snx@sgmlinc.com> References: <23224F@erik.naggum.no> Subject: SGML information In article <1992Jun15.161342.23132@uxa.ecn.bgu.edu> cfkfb@uxa.ecn.bgu.edu (Karl Bridges) writes: > > I'm looking for information about SGML. Could anyone suggest some >references, either articles or books on the subject? There is a monthly SGML newsletter available. \<TAG> is the only regular publication devoted to SGML. Each monthly issue is "packed with the most current news, features, reviews, and reports". Information about \<TAG> is available from the Graphic Communications Association at +1 703-519-8157. Their fax is +1 703-548-2867. GCA is the predominant industry organization that covers SGML. They hold four or five conferences each year devoted to the subject. This is a good group to know about. I hope this helps! Brian. ------------------------------------------------------------- <> Brian Travis <> brian@sgmlinc.com <> <> SGML Architect <> InfoDesign Corp. <> <> Managing Editor \<TAG>: The SGML Newsletter <> <> 6360 S. Gibraltar Cir. <> Aurora CO 80016 USA <> <> Tele: +1 303 680-0875 <> Fax: +1 303 680-4906 </> </message> <message id="<CDUPREE.92Jun29110830@hqsun2.oracle.com>" date="2918833710"> Newsgroups: comp.text.sgml Date: 29 Jun 1992 19:08:30 UT at Oracle Corporation. The opinions expressed are those of the user and not necessarily those of Oracle. From: Chuck Dupree \<cdupree@oracle.com> Organization: Oracle Corporation, Belmont, Ca. Message-ID: \<CDUPREE.92Jun29110830@hqsun2.oracle.com> Subject: Question: SGML parsers Can someone please mail or post the location of FTP archives that contain public-domain SGML parsers? I know this has been posted before, and thought I'd saved a posting with this information; but I can't find it locally. 
Since my company is a bit paranoid about network security, I need to make a formal request to our data center folks to do the FTP'ing. Thus, I would appreciate getting as much information on the full path names of files and directories as possible. Also, I've heard that there is a public-domain parser generator. Is this true? Is the source code available as well as the executable form? Thanks for the assistance! - Chuck Dupree Oracle Corp. Redwood Shores, CA </message>