gnu.iou
Class sgmlp

java.lang.Object
  |
  +--gnu.iou.sgmlp

public abstract class sgmlp
extends java.lang.Object

SGML lexer that splits a document text string into lines and columns delimited by tags.

Static function

The `parse' function returns a two-dimensional array. The first dimension is lines; the second is sometimes called "columns", although each line can have an irregular number of columns (zero or more second-dimension elements per line).
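The irregular ("jagged") shape of the result can be illustrated with a small sketch. The array literal below is a hypothetical stand-in for a parse result, not output produced by sgmlp, and the class and method names are assumptions made for illustration only.

```java
public class JaggedLinesSketch {

    // Counts the second-dimension elements across all lines.
    // A line may legitimately have zero columns.
    static int countColumns(String[][] lines) {
        int total = 0;
        for (String[] line : lines) {
            total += line.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a parse result (assumption, not real sgmlp output).
        String[][] lines = {
            {"<p>", "Hello", "</p>"},
            {},
            {"plain text"}
        };
        for (int i = 0; i < lines.length; i++) {
            System.out.println("line " + i + ": " + lines[i].length + " column(s)");
        }
        System.out.println("total: " + countColumns(lines));
    }
}
```

Because each line carries its own length, callers should index the second dimension with lines[i].length rather than assuming a fixed column count.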

This is less than an SGML tokenizer: it is a pre-tokenizer that preserves lines and quoted attributes, and creates arrays within lines for tags and non-tags.

Each element of a line is checked for starting or ending with a less-than ("<") or greater-than (">") character to see if it is a tag, or part of a tag. Tags can span multiple lines, so the opening less-than (LT) character may not be closed by a closing greater-than (GT) character until the next line.
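The check described above might be sketched as follows. This is a minimal illustration under stated assumptions: the helper names and the four labels are hypothetical, and the sketch looks at a single element in isolation, so it cannot tell whether a "text" element sits in the middle of a tag that spans several lines.

```java
public class TagCheckSketch {

    // True when the element begins with the LT character.
    static boolean opensTag(String element) {
        return element.startsWith("<");
    }

    // True when the element ends with the GT character.
    static boolean closesTag(String element) {
        return element.endsWith(">");
    }

    // Labels are hypothetical; they are not part of the sgmlp API.
    static String classify(String element) {
        boolean open = opensTag(element);
        boolean close = closesTag(element);
        if (open && close) return "tag";
        if (open) return "tag-start";   // tag continues on a later element or line
        if (close) return "tag-end";    // tag began on an earlier element or line
        return "text";
    }

    public static void main(String[] args) {
        String[] elements = {"<a href=\"x\">", "<img", "src=\"a.png\">", "hello"};
        for (String e : elements) {
            System.out.println(classify(e) + ": " + e);
        }
    }
}
```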

The `parseValidate' functions (parseValidateServer and parseValidateClient) return a non-null result only when the source text contains a valid "< - >" document possessing at least one "< - >" tag.

Author:
John Pritchard (john@syntelos.org)

Field Summary
static java.lang.String EQ
          Interned "="
 
Constructor Summary
sgmlp()
           
 
Method Summary
static void main(java.lang.String[] argv)
           
static java.lang.String[][] parse(java.io.InputStream in)
          Parse text from an input stream into lines and columns.
static java.lang.String[][] parse(java.io.InputStream in, boolean validate, boolean blindquotes)
           
static java.lang.String[][] parse(java.lang.String text)
          Parse text string into lines and columns.
static java.lang.String[][] parse(java.lang.String text, boolean validate, boolean blindquotes)
           
static java.lang.String[][] parseValidateClient(java.io.InputStream in)
          Parse as a template, returning null if there are no template objects in the text.
static java.lang.String[][] parseValidateClient(java.lang.String text)
          Parse as a template, returning null if there are no template objects in the text.
static java.lang.String[][] parseValidateServer(java.io.InputStream in)
          Parse as a template, returning null if there are no template objects in the text.
static java.lang.String[][] parseValidateServer(java.lang.String text)
          Parse as a template, returning null if there are no template objects in the text.
static java.lang.String[] tag_tokenizer(java.lang.String tag)
          Split a tag into tokens according to tag syntax, preserving quoted attribute values, stripping leading SGML tag "start" and "end" ('<', '>') characters.
static java.lang.String trim_value(java.lang.String att_value)
          Normalize an SGML tag attribute value, stripping symmetric quotes, returning null for empty strings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EQ

public static final java.lang.String EQ
Interned "="
Constructor Detail

sgmlp

public sgmlp()
Method Detail

parseValidateServer

public static final java.lang.String[][] parseValidateServer(java.lang.String text)
Parse as a template, returning null if there are no template objects in the text.

"Server-side" parse validation also looks inside quotes.


parseValidateClient

public static final java.lang.String[][] parseValidateClient(java.lang.String text)
Parse as a template, returning null if there are no template objects in the text.

"Client-side" parse validation ignores everything in quotes.


parse

public static final java.lang.String[][] parse(java.lang.String text)
Parse text string into lines and columns. One column per line unless there are "%...%" tokens in the line, in which case the "%...%" tokens are separated.
Parameters:
text - Text string

parseValidateServer

public static final java.lang.String[][] parseValidateServer(java.io.InputStream in)
Parse as a template, returning null if there are no template objects in the text.

"Server-side" parse validation also looks inside quotes.


parseValidateClient

public static final java.lang.String[][] parseValidateClient(java.io.InputStream in)
Parse as a template, returning null if there are no template objects in the text.

"Client-side" parse validation ignores everything in quotes.


parse

public static final java.lang.String[][] parse(java.io.InputStream in)
Parse text from an input stream into lines and columns. One column per line unless there are "%...%" tokens in the line, in which case the "%...%" tokens are separated.
Parameters:
in - Source

parse

public static final java.lang.String[][] parse(java.lang.String text,
                                               boolean validate,
                                               boolean blindquotes)
Parameters:
text - Source with CRLF or LF newlines
validate - If true, return null when the source doesn't contain <sgml tokens>.
blindquotes - If true, anything in quotes is blindly ignored.

parse

public static final java.lang.String[][] parse(java.io.InputStream in,
                                               boolean validate,
                                               boolean blindquotes)
Parameters:
in - Source
validate - If true, return null when the source doesn't contain <sgml tokens>.
blindquotes - If true, anything in quotes is blindly ignored.
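
The interaction of the validate and blindquotes flags can be pictured with the sketch below. It is not the sgmlp implementation: the containsTag helper is hypothetical, and the quoting rules it applies (single or double quotes, no escapes) are assumptions. It only shows how ignoring quoted text can change whether a source counts as containing a tag.

```java
public class ValidateSketch {

    // Hypothetical helper: reports whether the text holds a '<' later closed by '>'.
    // With blindquotes set, characters inside quotes are skipped entirely.
    static boolean containsTag(String text, boolean blindquotes) {
        char quote = 0;        // active quote character, or 0 when outside quotes
        boolean open = false;  // true once an unclosed '<' has been seen
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (blindquotes) {
                if (quote != 0) {
                    if (c == quote) quote = 0;  // closing quote ends the skipped region
                    continue;
                }
                if (c == '"' || c == '\'') {
                    quote = c;                  // opening quote starts a skipped region
                    continue;
                }
            }
            if (c == '<') {
                open = true;
            } else if (c == '>' && open) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String quotedOnly = "say \"<b>\" here";
        // Quotes ignored: the only tag is inside quotes, so none is found.
        System.out.println(containsTag(quotedOnly, true));
        // Quotes examined: the quoted tag is found.
        System.out.println(containsTag(quotedOnly, false));
    }
}
```

Under these assumptions, a validating parse would return null for the first case and a result array for the second.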

tag_tokenizer

public static final java.lang.String[] tag_tokenizer(java.lang.String tag)
Split a tag into tokens according to tag syntax, preserving quoted attribute values, stripping leading SGML tag "start" and "end" ('<', '>') characters. Does not require any particular elements of a tag, does not require a start or end character. Stops at an SGML tag end character that is not within a symmetrically quoted string.

Tag attributes are guaranteed as three tokens, as available: name string, equals-character string, value string. The equals string is interned, so that when processing the result each element can be compared by reference to a similarly interned string (the EQ field, or "=".intern()) using the Java reference-equality ("==") operator.

All returned tokens are trimmed: no leading or trailing whitespace.

Parameters:
tag - All or part of an SGML tag. Anything before the tag open character ('<') is ignored.
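
A caller might use the interned equals token as sketched below. The token array is a hypothetical stand-in for a tag_tokenizer result, the local EQ constant stands in for sgmlp.EQ, and the assumption that value tokens keep their quotes follows from the "preserving quoted attribute values" wording rather than from tested behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InternedEqSketch {

    // Local stand-in for sgmlp.EQ, documented as an interned "=".
    static final String EQ = "=".intern();

    // Collects name/value pairs around each interned equals token.
    // The reference comparison (==) is intentional and relies on interning.
    static Map<String, String> attributes(String[] tokens) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 1; i + 1 < tokens.length; i++) {
            if (tokens[i] == EQ) {
                attrs.put(tokens[i - 1], tokens[i + 1]);
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Hypothetical token list for: <a href="index.html" target=_top>
        String[] tokens = {"a", "href", EQ, "\"index.html\"", "target", EQ, "_top"};
        System.out.println(attributes(tokens));

        // A non-interned copy is equal by value but not by reference.
        String copy = new String("=");
        System.out.println(copy == EQ);          // false
        System.out.println(copy.intern() == EQ); // true
    }
}
```

The reference comparison is only safe because both sides are interned; for strings of unknown origin, equals() remains the correct test.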

trim_value

public static final java.lang.String trim_value(java.lang.String att_value)
Normalize an SGML tag attribute value, stripping symmetric quotes, returning null for empty strings.
Parameters:
att_value - A bare tag attribute value; must not include leading or trailing whitespace.
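
The normalization described for trim_value might look like the following sketch. It is a hypothetical reimplementation written from the description alone, not the sgmlp source; in particular, treating both single and double quotes as "symmetric quotes" and returning null for a null argument are assumptions.

```java
public class TrimValueSketch {

    // Hypothetical equivalent of trim_value: strips one pair of matching
    // quotes and maps empty results to null.
    static String trimValue(String attValue) {
        if (attValue == null || attValue.isEmpty()) {
            return null;
        }
        int last = attValue.length() - 1;
        char first = attValue.charAt(0);
        boolean quoted = last > 0
                && (first == '"' || first == '\'')
                && attValue.charAt(last) == first;
        if (quoted) {
            attValue = attValue.substring(1, last);
        }
        return attValue.isEmpty() ? null : attValue;
    }

    public static void main(String[] args) {
        System.out.println(trimValue("\"index.html\"")); // quotes stripped
        System.out.println(trimValue("'x'"));            // quotes stripped
        System.out.println(trimValue("\"\""));           // empty after stripping
        System.out.println(trimValue("\"x'"));           // asymmetric quotes left alone
    }
}
```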

main

public static void main(java.lang.String[] argv)