EASYDTD.DOC
VERSION: 2.3
DATE: 03/01/95

BACK UP THIS DISK BEFORE USING!!!


INTRODUCTION

One of the most complex parts of SGML is the DTD. Writing a DTD is a tedious task at best. I have always believed that there must be a simpler way to express the information that an SGML application must know to process a document. My answer is EasyDTD. EasyDTD uses a simple outline form of data elements to generate a DTD. For example:

	document
	  fm
	    title
	    author+
	    date
	  body
	    chapter+
	      section*
	        subsection*
	          p(|
	          list)*
	            li	
	            li+
	  bm?
	    index
generates a skeleton DTD. This is the simplest input. The DTD generated from this file may need editing but the work of getting the syntax correct is already done. General Entities and Attributes are also handled.

Two good uses for EasyDTD are DTD prototyping and generating entity references. A DTD can literally be created from scratch in just a few minutes with EasyDTD. This allows you to try different combinations until the optimal DTD is created. This approach both saves time and improves quality because it encourages testing different approaches. The second use is generating entity reference files for custom applications. For example, if you manage HTML hypertext links in SGML using Entity References, EasyDTD can generate the entity reference file with a few simple lines.

EasyDTD is a great time saver for those who write DTDs only occasionally, because it takes care of generating the correct DTD syntax from a simple outline. You no longer have to spend time looking up things you forget from one DTD writing session to the next.


MODIFICATION HISTORY

ENHANCEMENTS IN V2.3

There are several important enhancements in V2.3. These include several new command line options in addition to significantly enhanced attribute handling:

ENHANCEMENTS IN V2.2

After making the V2.1.1 changes, I wrote down the list of over two dozen enhancements that I want to make to EasyDTD. This release is a mixture of processing improvements and adding new types of output from that list. The new options added to V2.2 are:

The option flags need to be revisited. I originally set out to make each flag the first letter of the option's purpose, e.g. -q stood for quiet mode. There are now so many options that this no longer works.

Additional motivation for V2.2 came in the form of a paper at the OmniMark User's Group Meeting in 11/94. Although the paper was not about EasyDTD, I thought it might be a nice touch to incorporate something related to OmniMark. The new option is -k to generate a skeleton .XOM file that writes the input file back out with tags expanded.

The other changes add the next level of functionality to EasyDTD toward being able to generate complete DTDs. The biggest single area remaining to be addressed is Attributes.

ENHANCEMENTS IN V2.1.1

EasyDTD V2.1.1 was brought about by Warren Baird, who sent me a patch file to V2.1 for compiling EasyDTD under Unix. There are no functional changes in V2.1.1 over V2.1. Warren's changes and cleaning up a few compile errors are the only changes in the code. There is also some minor updating of the documentation.

ENHANCEMENTS IN V2.1

The initial versions of EasyDTD almost always required some editing, especially of content models. Each release has added features that generate more and more complete DTDs. Now, complete content models can be specified, some entity statements are generated automatically, and generating relatively complex DTDs that parse without modification is possible. The biggest potential area that may require modification now is attributes.

Version 2.1 was brought about by a bug I ran across when generating a very large DTD (250-300 elements). The output DTD contained garbage at seemingly random locations. To make a long story short, the C stack and Heap bumped into each other. After I fixed the problem, I decided to go ahead and add a few enhancements. The enhancements added to EasyDTD V2.0 to make V2.1 include:


WHAT'S INCLUDED IN THE DISTRIBUTION

Files included in this package are:

EASYDTD.EXE     EasyDTD DOS executable file
EASYDTD.DOC     This documentation file
EASYDTD.HTM     This documentation file in HTML, viewable via Mosaic
EASYDTD.C       C source code
EASYDTD.LIC     License
MERGENT.EXE     Merge ENTITY.ENT file after <!DOCTYPE
DATADICT.EZY    DTD Data Dictionary EasyDTD input file.
DATADICT.DTD    Production Data Dictionary DTD.
DATADICT.SGM    The Data Dictionary for DATADICT.DTD.
DD2HTML.AWK     Convert the Data Dictionary SGML file to HTML for viewing
DEMO.BAT        Runs EasyDTD with different options.
SAMPLE.IN       Sample EasyDTD input file. Used in DEMO.BAT
SAMPLE.DTD      Output from EasyDTD and SAMPLE.IN
REPORT.EZY      A more complex example than SAMPLE.IN
REPORT.DTD      Output DTD created from REPORT.EZY
MOSAIC.EZY      A version of the HTML DTD in outline form
MOSAIC.DTD      A version of the HTML DTD
HISTORY.TXT     Detailed change history
TORTURE.EZY     A torture test input file for EasyDTD

I would like to remain the clearing house for modifications to this software. Please forward any modifications you make to this software so they can be folded into a future official release.


COMPILING EasyDTD

I have compiled EasyDTD only with the Borland C++ compilers. It has also been compiled with GCC. The only Borland-specific code is the inclusion of <dos.h> and the setting of _stklen. Both are conditionally compiled only for the Borland compilers.

EasyDTD is pretty generic and is entirely contained in a single C source file, so compiling should be very straightforward. Under Unix you should only have to type:

     $ cc easydtd.c
then rename a.out to easydtd.

EASYDTD.C is getting a little too large and will probably be broken into modules in the next release.


EasyDTD REFERENCE

This section is a quick and dirty description of how to use EasyDTD. The steps are simple:

Generating a new DTD is so quick and EASY that you can experiment with various versions until you get the document structure set up just right. An excellent use of EasyDTD is working on content models for small segments of the DTD during document analysis or DTD development.

The input file has the following rules:

ENTITIES

The format for <ENTITIES> is:

The following example shows both general and parameter entities:

	<ENTITIES>
	SGML "Standard Generalized Markup Language"
	%fm  "title,author,date"

Entities and the -W Option

A very good reason for using EasyDTD to define General Entities is the -W option to generate an AWK script to perform entity substitution in document instances. Simply define your general entities:

----------Cut Here, EasyDTD input------------------
<ENTITIES>
SGML  "Standard Generalized Markup Language"
DSSSL "Document Something Something Something Language"
----------Cut Here, End of Example-----------------

Run the file through EasyDTD with the -W flag to create the AWK program:

easydtd -w test.ezy test.awk
-----------Cut here, test.awk program Created by EasyDTD-------
# FILE: 
# This skeleton .AWK file was created by EASYDTD.
# It is a starting point with all of the 
# entity references to be resolved. 
# 
BEGIN{
#	print "This AWK program generated by EASYDTD"; 
}
{
	gsub(/\&SGML;/,"Standard Generalized Markup Language",$0);
	gsub(/\&DSSSL;/,"Document Something Something Something Language",$0);
	print;
}
-----------Cut here, End of AWK program---------------------------
Then run awk on your document instance:
awk -f test.awk document.sgm >doc.out
and the general entity references will be resolved:
------------------------Cut here, document.sgm---------------------------
This is a test document to show EasyDTD generated
AWK programs to do Entity Substitution. First, &SGML;
and then &DSSSL; are handled.
------------------------Cut here, End of document.sgm--------------------
And finally the output:
------------------------Cut here, doc.out--------------------------------
This is a test document to show EasyDTD generated
AWK programs to do Entity Substitution. First, Standard Generalized Markup Language
and then Document Something Something Something Language are handled.
------------------------Cut here, End of doc.out-------------------------
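
As a rough illustration of what the generated AWK program does, the same substitution can be sketched in Python. This is not EasyDTD's own code. The function names parse_entities and resolve are hypothetical, and the sketch assumes each entity line has the form name "replacement text", with names starting with % treated as parameter entities and skipped.

```python
import re


def parse_entities(lines):
    """Collect general entities from the lines after an <ENTITIES> marker.

    Assumption: each line is `name "replacement text"`. Names that start
    with % are parameter entities and are skipped here.
    """
    entities = {}
    for line in lines:
        match = re.match(r'\s*(%?[\w.-]+)\s+"([^"]*)"\s*$', line)
        if match and not match.group(1).startswith("%"):
            entities[match.group(1)] = match.group(2)
    return entities


def resolve(text, entities):
    """Replace each &name; reference with its text, as the gsub calls do."""
    for name, value in entities.items():
        text = text.replace("&" + name + ";", value)
    return text


ezy = [
    'SGML  "Standard Generalized Markup Language"',
    'DSSSL "Document Something Something Something Language"',
    '%fm  "title,author,date"',
]
entities = parse_entities(ezy)
print(resolve("First, &SGML; and then &DSSSL; are handled.", entities))
```

Like the AWK program, this is a plain textual search and replace. It does not parse the document, so references inside comments or marked sections would be replaced as well.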

A second AWK option, -z, was added in V2.3 to generate substitutions for replacements that are not entity references. The main difference is that the & and ; are left out of the search string portion of the gsub function calls. An example where this would be useful is mapping characters back to ISO entity references automatically, e.g. & to &amp;.
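
The character mapping case has one trap worth showing: the order of the replacements matters. The Python sketch below is an illustration only, with a hypothetical REPLACEMENTS table and substitute function. It assumes the mapping is applied one pair at a time, in order, the way a sequence of gsub calls would be.

```python
# Hypothetical mapping table; a real one would come from the EasyDTD input.
REPLACEMENTS = [
    ("&", "&amp;"),  # must come first, or the & in &lt; would be escaped again
    ("<", "&lt;"),
    (">", "&gt;"),
]


def substitute(text, replacements=REPLACEMENTS):
    """Apply each search/replace pair in order, with no & ; wrapping."""
    for search, replace in replacements:
        text = text.replace(search, replace)
    return text


print(substitute("AT&T <b>"))
```

In AWK itself, an unescaped & in the replacement string of gsub stands for the matched text, so a replacement such as &amp; needs its ampersand escaped if the program is edited by hand.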

The AWK option is very useful for managing hyperlinks on a Web Server with AWK and SGMLS instead of a commercial translating parser such as OmniMark.

The steps for automating hyperlink maintenance with EasyDTD, AWK, and SGMLS are:

As if by magic, SGML is used to manage hyperlink resolution with a little help from EasyDTD and AWK. As long as you keep the .EZY file up to date and recreate the AWK and entity files when links change, your hyperlinks are automatically generated. When a document file is moved, simply reparse the documents that link to it with the SGMLS/AWK combination.

ELEMENTS

The format for <ELEMENTS> is:

	tag(|
	tagtwo)*
or
	(tag|
	tagtwo)*
generates (tag|tagtwo)*.

#PCDATA, CDATA, RCDATA, and EMPTY are special tag names that are used only for content model creation and do not generate <!ELEMENT statements.
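
The two equivalent group spellings can be illustrated with a small Python sketch. It is not taken from EASYDTD.C. The function join_group is hypothetical, and it assumes each line of a group ends with its connector or with the closing parenthesis and an optional occurrence indicator.

```python
def join_group(lines):
    """Join outline lines that spell a group into one model group string.

    Assumption: the "(" may appear before or after the first name, and
    each line ends with its connector ("|" or ",") or with ")" plus an
    optional occurrence indicator.
    """
    parts = [line.strip() for line in lines]
    first = parts[0]
    if not first.startswith("("):
        # Move the "(" to the front: "tag(|" becomes "(tag|".
        name, _, rest = first.partition("(")
        first = "(" + name + rest
    return first + "".join(parts[1:])


print(join_group(["tag(|", "tagtwo)*"]))
print(join_group(["(tag|", "tagtwo)*"]))
```

Both calls print (tag|tagtwo)*.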

ATTRIBUTES

The format for <ATTRIBUTES> is:

The following code is an example attribute specified in EasyDTD:

	<ATTRIBUTES>
	title id type font
This causes an <!ATTLIST entry for title with attributes of id, type, and font.

More complex attributes can be specified by putting each attribute on a line by itself following the element name. The attribute name must be indented by at least one blank. This form also allows a type and a default value to be specified for each attribute. The type defaults to CDATA if none is specified, and the default value defaults to #IMPLIED if none is given. For example:

	<ATTRIBUTES>
	title
	 id ID #REQUIRED
	 type RCDATA
	 font
	 note (A|B|C) "C"
will generate:
	<!ATTLIST title
			 id	ID	#REQUIRED
			 type	RCDATA	#IMPLIED
			 font	CDATA	#IMPLIED
			 note	(A|B|C)	"C"
	>
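
The defaulting rule for the one-attribute-per-line form can be sketched in Python as follows. This is an illustration under stated assumptions, not EasyDTD's implementation. The names parse_attributes and format_attlist are hypothetical, and the sketch assumes fields are separated by blanks, with anything after the second field taken as the default value.

```python
def parse_attributes(attr_lines):
    """Return (name, type, default) for each indented attribute line.

    A missing type becomes CDATA and a missing default becomes #IMPLIED,
    following the rule described in the text.
    """
    rows = []
    for line in attr_lines:
        fields = line.strip().split(None, 2)
        name = fields[0]
        declared = fields[1] if len(fields) > 1 else "CDATA"
        default = fields[2] if len(fields) > 2 else "#IMPLIED"
        rows.append((name, declared, default))
    return rows


def format_attlist(element, rows):
    """Lay the rows out as one <!ATTLIST declaration (column widths are arbitrary)."""
    body = "\n".join("    %-8s %-10s %s" % row for row in rows)
    return "<!ATTLIST %s\n%s\n>" % (element, body)


lines = [" id ID #REQUIRED", " type RCDATA", " font", ' note (A|B|C) "C"']
print(format_attlist("title", parse_attributes(lines)))
```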

SAMPLE RUNS

A simple EasyDTD input file:

	document
	  fm
	    title
	    author+
	    date
	  body
	    chapter+
	      section*
	        subsection*
	          (p|
	          list)*
	            li
	            li+
	  bm?
	    index
will generate the following DTD:
	<!DOCTYPE document  [
	<!-- 
	     Skeleton DTD created by EASYDTD. 
	     CAUTION: This DTD probably needs editing before use!!!
	-->
	                <!-- ENTITIES    -->
	<!ENTITY % doctype "document"   -- Document type GI -->
	
	                <!-- ELEMENTS    -->
	<!--      ELEMENTS       MIN  CONTENT -->
	<!ELEMENT %doctype;      - - (fm,body,bm?) >
	<!ELEMENT fm             - - (title,author+,date) >
	<!ELEMENT title          - - (#PCDATA) >
	<!ELEMENT author         - - (#PCDATA) >
	<!ELEMENT date           - - (#PCDATA) >
	<!ELEMENT body           - - (chapter+) >
	<!ELEMENT chapter        - - (section*) >
	<!ELEMENT section        - - (subsection*) >
	<!ELEMENT subsection     - - ((p|list)*) >
	<!ELEMENT p              - - (#PCDATA) >
	<!ELEMENT list           - - (li,li+) >
	<!ELEMENT li             - - (#PCDATA) >
	<!ELEMENT bm             - - (index) >
	<!ELEMENT index          - - (#PCDATA) >

	                <!-- ATTRIBUTES  -->
	<!--      ELEMENT        NAME         VALUE    DEFAULT -->
	]>
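
The core idea of turning an indented outline into element declarations can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm in EASYDTD.C. The function outline_to_elements is hypothetical. It assumes that indentation alone decides nesting, that a leaf gets (#PCDATA), and that a repeated name is declared once from its first occurrence. It ignores groups, entities, attributes, and the %doctype; entity.

```python
def outline_to_elements(text):
    """Return (name, content_model) pairs in outline order."""
    entries = []  # (name, children) in the order the names appear
    stack = []    # (indent, children) for each open ancestor
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        token = raw.strip()
        name = token.rstrip("+*?")
        # Close every outline level that is not an ancestor of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            # The parent's model keeps the occurrence indicator.
            stack[-1][1].append(token)
        children = []
        entries.append((name, children))
        stack.append((indent, children))
    seen = set()
    result = []
    for name, children in entries:
        if name in seen:
            continue
        seen.add(name)
        model = ",".join(children) if children else "#PCDATA"
        result.append((name, "(%s)" % model))
    return result


OUTLINE = """
document
  fm
    title
    author+
    date
  body
    chapter+
  bm?
    index
"""

for name, model in outline_to_elements(OUTLINE):
    print("<!ELEMENT %-15s- - %s >" % (name, model))
```

The sketch uses the default ',' separator throughout; a fuller version would take the separator and default occurrence indicator as parameters.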

RUNNING EasyDTD

There are several command line options available:


	Usage:   easydtd [-flags] file_in [file_out]

        
Where:   
  flags are:
	-A     Generate S-Engine Skeleton .APP file
	-C     Generate a default SGML declaration
	-E     Show Elements
	-G     Generate content model entity references 
	-K     Generate OmniMark .XOM to process elements
	-L     Flag Long element names
	-M     End Tag Minimization
	-N     Generate elements that are entities
	-O'cc' Set default Occurs to cc 
	-P     Include <p> short ref
	-Q     Quiet mode; suppress informational messages
	-S'cc' Set default Separator to cc 
	-T'nn' Set max Tag length to nn
	-V     Verbose mode.
	-W     Generate AWK program to perform general entity substitution
	-X     Generate only the entity portion of the DTD
	-Y     Generate a skeleton Data Dictionary for the DTD
	-Z     Generate AWK program for non-entity substitution
	file_in  is the input filename
	file_out is the output filename

This is a good spot to discuss filename handling. EasyDTD requires an input filename. A file extension of .EZY is assumed if no extension is given. Thus, file_in is treated as file_in.ezy. The output filename is generated from the input filename when none is specified. The base name used is the base name of the input file and the extension is selected based on the options specified. The generated output file extensions are:

Thus, the command line arguments of:


	easydtd -k its_easy
reads its_easy.ezy and writes a skeleton OmniMark program to its_easy.xom.
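
The filename rules can be sketched in Python as follows. This is an illustration only. The function resolve_names and the EXTENSIONS table are hypothetical. Only the .xom and .app extensions are named in this document; the .dtd default is an assumption based on the sample file names, and the lowercase spelling of the extensions is also assumed.

```python
import os

# Only these two extensions are named in the text. The ".dtd" default
# used below is an assumption based on the sample file names.
EXTENSIONS = {"k": ".xom", "a": ".app"}


def resolve_names(flags, file_in, file_out=None):
    """Apply the default .ezy extension and derive the output filename."""
    base, ext = os.path.splitext(file_in)
    if not ext:
        file_in = base + ".ezy"
    if file_out is None:
        out_ext = ".dtd"
        for flag in flags.lower():
            if flag in EXTENSIONS:
                out_ext = EXTENSIONS[flag]
        file_out = base + out_ext
    return file_in, file_out


print(resolve_names("k", "its_easy"))
```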

Run EasyDTD with verbose mode, max tag length of 32, and <p> short ref with the following line:

	EASYDTD -VT'32'P FILE.IN
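
One way to read a bundled flag argument like the one above is sketched below in Python. How EasyDTD actually parses its command line is not described here, so this is an assumption: the hypothetical parse_flags function treats O, S, and T as the flags that take a quoted value and every other letter as a simple switch.

```python
import re

VALUE_FLAGS = "OST"  # flags documented as taking a quoted 'value'


def parse_flags(arg):
    """Split a bundled argument such as -VT'32'P into (flag, value) pairs."""
    options = []
    body = arg.lstrip("-")
    pos = 0
    while pos < len(body):
        flag = body[pos].upper()
        pos += 1
        value = None
        if flag in VALUE_FLAGS:
            match = re.match(r"'([^']*)'", body[pos:])
            if match:
                value = match.group(1)
                pos += match.end()
        options.append((flag, value))
    return options


print(parse_flags("-VT'32'P"))
```

Note that a Unix shell strips the single quotes before the program sees them, so this form of quoting is specific to the DOS command line.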

A couple of the flags need a little additional explanation.

-a
EasyDTD generates a skeleton .APP file for S-Engine instead of a DTD. This is a great time saver if you are using S-Engine.
-c
Include a default SGML declaration in the output DTD
-e
Show Elements
-g
Generate content model entity references. This option causes the content model for elements that are parameter entities to be written to the file ENTITY.ENT. The entities can be merged into the DTD via the MERGENT.EXE program included with this distribution.
-k
Generate OmniMark .XOM to process elements
-l
Flag long element names. When the -q option is specified, adding -l causes informational messages identifying element names longer than the current maximum tag length to print on the screen.
-m
Causes a minimization picture of '- o' to be inserted in elements instead of the default '- -'.
-n
Generate elements that are entities. Elements that are parameter entities will have element statements created and written to the output DTD.
-o'cc'
Sets the default content model occurrence indicator to cc. The default is none.
-p
Causes the short ref code for <p> to be included, so that a blank line will be interpreted as <p>.
-q
Suppress informational messages. Informational messages such as "Duplicate element ..." are not written to the screen. Error messages are.
-s'cc'
Sets the default content model separator to cc. The normal default is ','.
-t'nn'
Sets the maximum length for tag names to nn. The default is 8 per the SGML standard. Tag names longer than the maximum are reported in error messages, but the tag is processed as is. The max length is 31.
-v
Causes a series of informational messages to print while processing takes place.
-w
Generate an AWK program to perform General Entity substitution on an SGML document instance.
-x
Generate only the entity portion of the DTD.
-y
Generate a skeleton Data Dictionary for the DTD described by the EasyDTD input file. The Data Dictionary is described later in the document.
-z
Create an AWK search and replace program for non-entity replacements. The other AWK option is specifically for entity substitution.

This example generates a more complex DTD than the simple sample given previously. The Generate Content Model Entity References option is used to generate ENTITY.ENT, which must be merged back into the DTD file. You will need the following files for this example:

The final output file will be REPORT.DTD. The DTD is meant to illustrate EasyDTD features and does not represent a complete DTD.

The EasyDTD input file, REPORT.EZY, is:


;
;This is an example DTD specification for EasyDTD.
;
;
<ELEMENTS>
report
  fm?               -- Front Matter is ignored for this exercise --
  body
    section+ +%empha;
      titleblk?
        title?,
        author*
      (footnote|
      verse|
      qp|           -- Quoted Paragraph --
        %para       -- Paragraph objects --
          (ref|     -- Reference callout in the text --
          %empha|   -- Emphasis --
            emph|
            sup|    -- Superscript --
            sub     -- Subscript --
          #PCDATA)*
      p|            -- Paragraph --
        %para
      fig|          -- Figure --
      tab|          -- Table --
      refer|        -- References --
      eqn|          -- Equation --
      ol|           -- Ordered Lists , Attributes control type --
        %list
          li
          (li|
          p)+
      ul|         -- Unordered Lists --
        %list
      section)*   -- Recursive sections --
  bm?             -- Back Matter --
<ATTRIBUTES>
section number 
ref idref
refer id

The listing for REPORT.DTD after the ENTITY.ENT has been merged in is:

<!DOCTYPE report  [
<!-- 
     Skeleton DTD created by EASYDTD. 

     CAUTION: This DTD probably needs editing before use!!!
-->

                <!-- ENTITIES    -->
<!ENTITY % doctype "report"	-- Document type GI -->
<!ENTITY % para "((ref|%empha;|#PCDATA)*)" >
<!ENTITY % empha "emph|sup|sub" >
<!ENTITY % list "(li,(li|p)+)" >


                <!-- ELEMENTS    -->
<!--      ELEMENTS       MIN  CONTENT -->
<!ELEMENT %doctype;      - - (fm?,body,bm?)  >
<!ELEMENT fm             - - (#PCDATA)  -- Front Matter is ignored for this exercise -- >
<!ELEMENT body           - - (section+)  >
<!ELEMENT section        - - (titleblk?,(footnote|verse|qp|p|fig|tab|refer|eqn|ol|ul|section)*) +%empha; >
<!ELEMENT titleblk       - - (title?,author*)  >
<!ELEMENT title          - - (#PCDATA)  >
<!ELEMENT author         - - (#PCDATA)  >
<!ELEMENT footnote       - - (#PCDATA)  >
<!ELEMENT verse          - - (#PCDATA)  >
<!ELEMENT qp             - - (%para;)      -- Quoted Paragraph -- >
<!ELEMENT ref            - - (#PCDATA)     -- Reference callout in the text -- >
<!ELEMENT emph           - - (#PCDATA)  >
<!ELEMENT sup            - - (#PCDATA)     -- Superscript -- >
<!ELEMENT sub            - - (#PCDATA)     -- Subscript -- >
<!ELEMENT p              - - (%para;)      -- Paragraph -- >
<!ELEMENT fig            - - (#PCDATA)     -- Figure -- >
<!ELEMENT tab            - - (#PCDATA)     -- Table -- >
<!ELEMENT refer          - - (#PCDATA)     -- References -- >
<!ELEMENT eqn            - - (#PCDATA)     -- Equation -- >
<!ELEMENT ol             - - (%list;)      -- Ordered Lists , Attributes control type -- >
<!ELEMENT li             - - (#PCDATA)  >
<!ELEMENT ul             - - (%list;)      -- Unordered Lists -- >
<!ELEMENT bm             - - (#PCDATA)     -- Back Matter -- >

                <!-- ATTRIBUTES  -->
<!--      ELEMENT        NAME         VALUE    DEFAULT -->
<!ATTLIST section
                         number       CDATA    #IMPLIED
>
<!ATTLIST ref
                         idref        IDREF     #IMPLIED>
<!ATTLIST refer
                         id           ID        #IMPLIED>
]>

Note: The order of the inserted entities is probably wrong and will have to be edited before the DTD will parse.
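
Reordering the entities by hand amounts to making sure each parameter entity is declared before another entity refers to it. In the listing above, %para refers to %empha; but is declared first. The Python sketch below shows one way to compute a workable order. It is not part of EasyDTD or MERGENT. The function order_entities is hypothetical, and the sketch does not detect circular references; it only avoids looping on them.

```python
import re


def order_entities(declarations):
    """Order parameter entities so each one follows the entities it uses.

    `declarations` maps an entity name to its replacement text.
    """
    ordered = []
    done = set()

    def visit(name):
        if name in done or name not in declarations:
            return
        done.add(name)  # marked before recursing, so a cycle cannot loop forever
        for ref in re.findall(r"%([\w.-]+);", declarations[name]):
            visit(ref)
        ordered.append(name)

    for name in declarations:
        visit(name)
    return ordered


entities = {
    "para": "((ref|%empha;|#PCDATA)*)",
    "empha": "emph|sup|sub",
    "list": "(li,(li|p)+)",
}
for name in order_entities(entities):
    print('<!ENTITY %% %s "%s" >' % (name, entities[name]))
```

For the three entities shown, this prints empha first, then para, then list.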


CAPACITIES

After the stack overflow bug, I decided it would be a good idea to warn users about buffer size limits. I have tested EasyDTD with an input file of approximately 1,000 elements, so capacity should not be a problem for most applications.


KNOWN BUGS

The only bug I know of is that DOS interprets a '|' on the command line as a pipe. I never figured out how to quote it, and I never provided an alternate character.


PLANNED ENHANCEMENTS

Planned enhancements include:

My list seems to get longer every time I get into the code. I have no particular order in mind. What usually happens is that I need a particular change, then do a couple of easy ones to make it worthwhile to put out a new version.


SUMMARY

EasyDTD makes it very easy to create DTDs. It does not remove the requirement that the user have a knowledge of SGML and content models. EasyDTD provides a shorthand notation for the DTD.

I hope you get a lot of use from EasyDTD. I have enjoyed writing and using it!

EasyDTD is written by:

Norman E. Smith
CompuServe:     72745,1566
Internet:       72745.1566@compuserve.com
                smithn@orvb.saic.com

EasyDTD is available via anonymous FTP from FTP.IFI.UIO.NO in /pub/SGML/Demo/easy2_3.zip or /pub/SGML/Demo/easy2_3.tar.z.