NAME
    Apache::HeavyCGI - Framework to run complex CGI tasks on an Apache
    server

SYNOPSIS
     use Apache::HeavyCGI; # see eg/ directory of the distribution
                           # for a complete example/template

WARNING UNSUPPORTED ALPHA CODE RELEASED FOR DEMO ONLY
    The release of this software is only for evaluation purposes to people
    who are actively writing code that deals with Web Application
    Frameworks. This package is probably just another Web Application
    Framework and may be worth using or may not be worth using. As of this
    writing (July 1999) it is by no means clear if this software will be
    developed further in the future. The author has written it over many
    years and is deploying it in several places, e.g.
    http://www.stadtplandienst.de, http://netlexikon.akademie.de and really
    soon on http://pause.perl.org too. It has turned out to be useful for
    him. YMMV.

    There is no official support for this software. If you find it useful or
    even if you find it useless, please mail the author directly.

    But please make sure you remember: THE RELEASE IS FOR DEMONSTRATION
    PURPOSES ONLY.

DESCRIPTION
    The Apache::HeavyCGI framework is intended to provide a couple of simple
    tricks that make it easier to write complex CGI solutions. It has been
    developed on a site that runs all requests through a single mod_perl
    handler that in turn uses CGI.pm or Apache::Request as the query
    interface. So Apache::HeavyCGI is -- as the name implies -- not merely
    for multi-page CGI scripts (for which there are other solutions), but it
    is for the integration of many different pages into a single solution.
    The many different pages can then conveniently share common tasks.

    The approach taken by Apache::HeavyCGI is a components-driven one with
    all components being pure perl. So if you're not looking for yet another
    embedded perl solution, and aren't intimidated by perl, please read on.

  Stacked handlers suck

    If you have had a look at stacked handlers, you might have noticed that
    the model for stacking handlers often is too primitive. The model
    supposes that the final form of a document can be found by running
    several passes over a single entity, each pass refining the entity,
    manipulating some headers, maybe even passing some notes to the next
    handler, and in the most advanced form passing pnotes between handlers.
    A lot of Web pages may fit into that model, even complex ones, but it
    doesn't scale well for pages that result out of a structure that's more
    complicated than adjacent items. The more complexity you add to a page,
    the more overhead is generated by the model, because for every handler
    you push onto the stack, the whole document has to be parsed and
    recomposed again and headers have to be re-examined and possibly
    changed.

  Why not subclass Apache

    Inheritance provokes namespace conflicts. Besides this, I see little
    reason why one should favor inheritance over a using relationship. The
    current implementation of Apache::HeavyCGI is very closely coupled with
    the Apache class anyway, so we could do inheritance too. No big deal I
    suppose. The downside of the current way of doing it is that we have to
    write

        my $r = $obj->{R};

    very often, but that's about it. The upside is that we know which
    manpage to read for the different methods provided by "$obj->{R}",
    "$obj->{CGI}", and "$obj" itself.

  Composing applications

    Apache::HeavyCGI takes an approach that is more ambitious for handling
    complex tasks. The underlying model for the production of a document is
    that of a puzzle. An HTML (or XML or SGML or whatever) page is regarded
    as a sequence of static and dynamic parts, each of which has some
    influence on the final output. Typically, in today's Webpages, the
    dynamic parts are filled into table cells, i.e. contents between some
    "<TD></TD>" tokens. But this is not necessarily so. The static parts in
    between typically are some HTML markup, but this also isn't forced by
    the model. The model simply expects a sequence of static and dynamic
    parts. Static and dynamic parts can appear in any order. In the
    extreme case of a picture you would only have one part, either static or
    dynamic. HeavyCGI could handle this, but I don't see a particular
    advantage of HeavyCGI over a simple single handler.

    In addition to the task of generating the contents of the page, there is
    the other task of producing correct headers. Header composition is an
    often neglected task in the CGI world. Because pages are generated
    dynamically, people believe that pages without a Last-Modified header
    are fine, and that an If-Modified-Since header in the browser's request
    can go by unnoticed. This laissez-faire principle gets in the way when
    you try to establish a server that is entirely driven by dynamic
    components and the number of hits is significant.

  Header Composition, Parameter Processing, and Content Creation

    The three big tasks a CGI script has to master are Headers, Parameters
    and the Content. In general one can say, content creation SHOULD not
    start before all parameters are processed. In complex scenarios you MUST
    expect that the whole layout may depend on one parameter. Additionally,
    some header-related data SHOULD be processed very early because it
    might result in a shortcut that saves us a lot of processing.

    Consequently, Apache::HeavyCGI divides the tasks to be done for a
    request into four phases and distributes the four phases among an
    arbitrary number of modules. Which modules are participating in the
    creation of a page is the design decision of the programmer.

    The perl model that maps (at least IMHO) ideally to this task
    description is an object oriented approach that identifies a couple of
    phases by method names and a couple of components by class names. To
    create an application with Apache::HeavyCGI, the programmer specifies
    the names of all classes that are involved. All classes are singleton
    classes, i.e. they have no identity of their own but can be used to do
    something useful by working on an object that is passed to them.
    Singletons have an @ISA relation to Class::Singleton, which can be
    found on CPAN. As such, the classes can only have a single instance,
    which can be found by calling the "CLASS->instance" method. Following
    the mod_perl convention, we'll call these objects *handlers*.

    Every request maps to exactly one Apache::HeavyCGI object. The
    programmer uses the methods of this object by subclassing. The HeavyCGI
    constructor creates objects of the AVHV type (pseudo-hashes). If the
    inheriting class needs its own constructor, this needs to be an AVHV
    compatible constructor. A description of AVHV can be found in the fields
    manpage. An Apache::HeavyCGI object usually is constructed with the
    "new" method and after that the programmer calls the "dispatch" method
    on this object. HeavyCGI will then perform various initializations and
    then ask all nominated handlers in turn to perform the *header* method
    and in a second round to perform the *parameter* method. In most cases
    the availability of a method can be determined at compile time of the
    handler. If so, it is possible to create an execution plan at compile
    time that determines the sequence of calls, so that no runtime is lost
    checking method availability. Such an execution plan can be created
    with the Apache::HeavyCGI::ExePlan module. All of the called methods
    get the HeavyCGI request object passed as the second parameter.
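
    The two rounds of handler calls and the idea of a precomputed plan can
    be sketched in plain Perl. This is only an illustration, not the code
    of Apache::HeavyCGI::ExePlan: the packages My::Singleton and
    My::Handler::*, the functions make_plan() and run_plan(), and the "log"
    key are all hypothetical names, and a tiny stand-in replaces
    Class::Singleton so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Class::Singleton: one instance per class.
package My::Singleton;
my %instance;
sub instance {
    my $class = shift;
    return $instance{$class} ||= bless {}, $class;
}

# A handler that only takes part in the header round.
package My::Handler::Date;
our @ISA = ('My::Singleton');
sub header {
    my ($self, $mgr) = @_;
    push @{ $mgr->{log} }, 'Date::header';
}

# A handler that takes part in both rounds.
package My::Handler::Search;
our @ISA = ('My::Singleton');
sub header    { my ($self, $mgr) = @_; push @{ $mgr->{log} }, 'Search::header' }
sub parameter { my ($self, $mgr) = @_; push @{ $mgr->{log} }, 'Search::parameter' }

package main;

# Build the plan once: a flat list of [instance, code ref] pairs, all
# header calls first, then all parameter calls. Handlers that lack a
# method are skipped here, so nothing has to be checked per request.
sub make_plan {
    my @classes = @_;
    my @plan;
    for my $method (qw(header parameter)) {
        for my $class (@classes) {
            my $code = $class->can($method) or next;
            push @plan, [ $class->instance, $code ];
        }
    }
    return \@plan;
}

# Run the plan for one request; $mgr plays the role of the request
# object and arrives in each handler method as the second parameter.
sub run_plan {
    my ($plan, $mgr) = @_;
    for my $step (@$plan) {
        my ($handler, $code) = @$step;
        $handler->$code($mgr);
    }
    return $mgr;
}

my $plan = make_plan('My::Handler::Date', 'My::Handler::Search');
my $mgr  = run_plan($plan, { log => [] });
print join(', ', @{ $mgr->{log} }), "\n";
```

    Building the plan at server startup and reusing it for every request
    is what keeps the per-request cost down to the method calls themselves.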

    There are no fixed rules as to what has to happen within the "header"
    and "parameter" methods. As a rule of thumb it is recommended to
    determine and set the object attributes LAST_MODIFIED and EXPIRES (see
    below) within the header() method. It is also recommended to inject the
    Apache::HeavyCGI::IfModified module as the last header handler, so that
    the application can abort early with a Not Modified response. I would
    recommend that in the header phase you do as little parameter
    processing as possible, except for those parameters that are related to
    the last modification date of the generated page.

  Terminating the handler calls or triggering errors.

    Sometimes you want to stop calling the handlers, because you think that
    processing the request is already done. In that case you can do a

     die Apache::HeavyCGI::Exception->new(HTTP_STATUS => $status);

    at any point within prepare() and the specified status will be returned
    to the Apache handler. This is useful for example for the
    Apache::HeavyCGI::IfModified module which sends the response headers and
    then dies with HTTP_STATUS set to Apache::Constants::DONE. Redirectors
    presumably would set up their headers and set it to
    Apache::Constants::HTTP_MOVED_TEMPORARILY.

    Another use of Perl exceptions is error reporting. In case of an error
    within the prepare loop, all you need to do is

     die Apache::HeavyCGI::Exception->new(ERROR => [@error_messages]);

    The error is caught at the end of the prepare loop and the anonymous
    array that is being passed to $@ will then be appended to
    "@{$self->{ERROR}}". You should check for $self->{ERROR} within your
    layout method to return an appropriate response to the client.
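
    The mechanism described above can be sketched without Apache. The
    class My::Exception and the function prepare() below are hypothetical
    stand-ins, not the real Apache::HeavyCGI code, and the numeric status
    304 is used only for illustration. The sketch assumes that one eval
    wraps the whole loop, that a status exception ends the loop and hands
    the status back, and that an error exception has its messages appended
    to the ERROR attribute.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for Apache::HeavyCGI::Exception: a blessed hash
# that carries either an HTTP_STATUS or an ERROR array reference.
package My::Exception;
sub new {
    my ($class, %arg) = @_;
    return bless {%arg}, $class;
}

package main;

# Call every step with the request object. Returns a status if a step
# died with HTTP_STATUS, otherwise returns nothing; error messages are
# collected in $self->{ERROR}.
sub prepare {
    my ($self, @steps) = @_;
    my $ok = eval { $_->($self) for @steps; 1 };
    return if $ok;
    my $err = $@;
    # Anything that is not one of our exceptions is rethrown unchanged.
    die $err unless blessed($err) && $err->isa('My::Exception');
    return $err->{HTTP_STATUS} if exists $err->{HTTP_STATUS};
    push @{ $self->{ERROR} }, @{ $err->{ERROR} || [] };
    return;
}

# An error stops the remaining steps and is recorded on the request.
my $req = { ERROR => [] };
prepare($req,
    sub { die My::Exception->new(ERROR => ['missing parameter "q"']) },
    sub { die "never reached" },
);
print "errors: @{ $req->{ERROR} }\n";

# A status exception is handed back to the caller.
my $done = prepare({ ERROR => [] },
    sub { die My::Exception->new(HTTP_STATUS => 304) });
print "status: $done\n";
```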

  Layout and Text Composition

    After the header and the parameter phase, the application should have
    set up the object that is able to characterize the complete application
    and its status. No changes to the object should happen from now on.

    In the next phase Apache::HeavyCGI will ask this object to perform the
    "layout" method that has the duty to generate an
    Apache::HeavyCGI::Layout (or compatible) object. Please read more about
    this object in the Apache::HeavyCGI::Layout manpage. For our HeavyCGI
    object it is only relevant that this Layout object can compose itself as
    a string in the as_string() method. As a layout object can be composed
    as an abstraction of a layout and independent of request-specific
    contents, it is recommended to cache the most important layouts. This is
    the responsibility of the programmer.

    In the next step HeavyCGI stores a string representation of the current
    request by calling the as_string() method on the layout object and
    passing itself to it as the first argument. By passing itself to the
    Layout object all the request-specific data get married to the
    layout-specific data, and we reach the stage where stacked handlers
    usually start: we arrive at composed content that is ready for shipping.
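
    To make the marriage of layout and request data concrete, here is a
    minimal sketch. My::Layout is a hypothetical class written for this
    example and does not claim to match the Apache::HeavyCGI::Layout API.
    It assumes that static parts are plain strings, that dynamic parts are
    code references, and that the request object is an ordinary hash with
    the made-up keys "title" and "today".

```perl
use strict;
use warnings;

# Hypothetical minimal layout class, for illustration only.
package My::Layout;

# Static parts are plain strings, dynamic parts are code references.
sub new {
    my ($class, @parts) = @_;
    return bless { parts => [@parts] }, $class;
}

# Compose the page for one request: static parts are copied verbatim,
# dynamic parts are called with the request object.
sub as_string {
    my ($self, $mgr) = @_;
    return join '',
        map { ref($_) eq 'CODE' ? $_->($mgr) : $_ } @{ $self->{parts} };
}

package main;

# Built once, independent of any request, so it can be cached.
my $layout = My::Layout->new(
    "<html><body><h1>",
    sub { $_[0]{title} },
    "</h1><p>Today is ",
    sub { $_[0]{today} },
    "</p></body></html>\n",
);

# Two different "requests" reuse the same layout object.
print $layout->as_string({ title => 'Search',  today => '1999-07-01' });
print $layout->as_string({ title => 'Results', today => '1999-07-02' });
```

    Because the layout object holds no request data, one instance can
    serve every request that shares the same page structure.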

    The last phase deals with setting up the yet unfinished headers,
    possibly compressing, recoding and measuring the content, and
    delivering the response to the browser. The two methods finish() and
    deliver() are responsible for that phase. The default deliver() method
    is pretty generic: it calls finish(), then sends the headers, and sends
    the content only if the request method wasn't HEAD. It then returns
    Apache's constant DONE to the caller, so that Apache won't do anything
    except logging on this request. The finish() method is more likely to
    be overridden. The default finish() method sets the content type to
    text/html, compresses the content if the browser understands compressed
    data and Compress::Zlib is available, and sets the headers Vary,
    Expires, Last-Modified, and Content-Length. You will most probably want
    to override the finish() method.
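
    As a rough idea of what an overriding finish() has to compute, the
    following sketch derives a few response headers from the attributes
    described in this document. It is not the module's implementation:
    http_date() and finish_headers() are hypothetical helpers, the time
    attributes are assumed to hold plain epoch seconds rather than
    Apache::HeavyCGI::Date objects, and compression and the Vary header
    are left out.

```perl
use strict;
use warnings;

# Format epoch seconds as an HTTP date, always in GMT. Day and month
# names are spelled out here so the result does not depend on the locale.
sub http_date {
    my ($epoch) = @_;
    my @day = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @mon = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    my ($s, $m, $h, $mday, $mo, $y, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $day[$wday], $mday, $mon[$mo], $y + 1900, $h, $m, $s;
}

# Hypothetical finish() stand-in: returns the headers as a hash ref.
sub finish_headers {
    my ($self) = @_;
    my $type = 'text/html';
    $type .= "; charset=$self->{CHARSET}" if defined $self->{CHARSET};
    # Fall back to the request time when no LAST_MODIFIED was set.
    my $modified = defined $self->{LAST_MODIFIED}
        ? $self->{LAST_MODIFIED}
        : $self->{TIME};
    my %h = (
        'Content-Type'   => $type,
        # length() counts bytes only as long as CONTENT is a byte string.
        'Content-Length' => length($self->{CONTENT}),
        'Last-Modified'  => http_date($modified),
    );
    $h{Expires} = http_date($self->{EXPIRES}) if defined $self->{EXPIRES};
    return \%h;
}

my $h = finish_headers({
    CONTENT => "<p>hello</p>\n",
    CHARSET => 'ISO-8859-1',
    TIME    => 0,
    EXPIRES => 86400,
});
print "$_: $h->{$_}\n" for sort keys %$h;
```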

  Summing up

                                            +-------------------+
                                            | sub handler {...} |
     +--------------------+                 | (sub init {...})  |
     |Your::Class         |---defines------>|                   |
     |ISA Apache::HeavyCGI|                 | sub layout {...}  |
     +--------------------+                 | sub finish {...}  |
                                            +-------------------+

                                            +-------------------+
                                            | sub new {...}     |
     +--------------------+                 | sub dispatch {...}|
     |Apache::HeavyCGI    |---defines------>| sub prepare {...} |
     +--------------------+                 | sub deliver {...} |
                                            +-------------------+

     +----------------------+               +--------------------+
     |Handler_1 .. Handler_N|               | sub header {...}   |
     |ISA Class::Singleton  |---define----->| sub parameter {...}|
     +----------------------+               +--------------------+

                                                                           +----+
                                                                           |Your|
                                                                           |Duty|
     +----------------------------+----------------------------------------+----+
     |Apache                      | calls Your::Class::handler()           |    |
     +----------------------------+----------------------------------------+----+
     |                            | nominates the handlers,                |    |
     |Your::Class::handler()      | constructs $self,                      | ** |
     |                            | and calls $self->dispatch              |    |
     +----------------------------+----------------------------------------+----+
     |                            |        $self->init     (does nothing)  | ?? |
     |                            |        $self->prepare  (see below)     |    |
     |Apache::HeavyCGI::dispatch()| calls  $self->layout   (sets up layout)| ** |
     |                            |        $self->finish   (headers and    | ** |
     |                            |                         gross content) |    |
     |                            |        $self->deliver  (delivers)      | ?? |
     +----------------------------+----------------------------------------+----+
     |Apache::HeavyCGI::prepare() | calls HANDLER->instance->header($self) | ** |
     |                            | and HANDLER->instance->parameter($self)| ** |
     |                            | on all of your nominated handlers      |    |
     +----------------------------+----------------------------------------+----+

Object Attributes
    As already mentioned, the HeavyCGI object is a pseudo-hash, i.e. can be
    treated like a HASH, but all attributes that are being used must be
    predeclared at compile time with a "use fields" clause.

    The convention regarding attributes is as simple as it can be: uppercase
    attributes are reserved for the Apache::HeavyCGI class, all other
    attribute names are at your disposition if you write a subclass.
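
    The effect of predeclared attributes can be tried out with the fields
    pragma alone. The sketch below does not use Apache::HeavyCGI; the
    package My::App and the attribute "session_id" are made up for the
    example. Note that Perl 5.10 and later no longer have pseudo-hashes:
    there fields::new() returns a restricted hash, which still rejects
    undeclared keys.

```perl
use strict;
use warnings;

package My::App;
# Declare every attribute up front. The upper-case names mimic those
# reserved for the framework; the lower-case name is the subclass's own.
use fields qw(R CGI ERROR session_id);

sub new {
    my ($class, %arg) = @_;
    # The typed lexical lets perl check constant keys at compile time.
    my My::App $self = fields::new($class);
    $self->{ERROR} = [];
    $self->{$_} = $arg{$_} for keys %arg;
    return $self;
}

package main;

my $app = My::App->new(session_id => 'abc123');
print "session: $app->{session_id}\n";

# A misspelled attribute is rejected instead of silently created.
my $ok = eval { $app->{sesion_id} = 'oops'; 1 };
print "typo rejected\n" unless $ok;
```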

    The following attributes are currently defined. The module author's
    production environment has a couple of attributes more that seem to work
    well but most probably need more thought to be implemented in a generic
    way.

    CAN_GZIP
        Set by the can_gzip method. True if client is able to handle gzipped
        data.

    CAN_PNG
        Set by the can_png method. True if client is able to handle PNG.

    CAN_UTF8
        Set by the can_utf8 method. True if client is able to handle UTF8
        encoded data.

    CGI An object that handles GET and POST parameters and offers the
        methods param() and upload() in a manner compatible with
        Apache::Request. Needs to be constructed and set by the user,
        typically in the constructor.

    CHARSET
        Optional attribute to denote the charset in which the outgoing data
        are being encoded. Only used within the finish method. If it is set,
        the finish() method will set the content type to text/html with this
        charset.

    CONTENT
        Scalar that contains the content that should be sent to the user
        uncompressed. During the finish() method the content may become
        compressed.

    DOCUMENT_ROOT
        Unused.

    ERROR
        Anonymous array that accumulates error messages. HeavyCGI doesn't
        handle the error though. It is left to the user to set up a proper
        response to the user.

    EXECUTION_PLAN
        Object of type Apache::HeavyCGI::ExePlan. It is recommended to
        compute the object at startup time and always pass the same
        execution plan into the constructor.

    EXPIRES
        Optional attribute set by the expires() method. If set, HeavyCGI
        will send an Expires header. The EXPIRES attribute needs to contain
        an Apache::HeavyCGI::Date object.

    HANDLER
        If there is an EXECUTION_PLAN, this attribute is ignored. Without an
        EXECUTION_PLAN, it must be an array of package names. HeavyCGI
        treats the packages as Class::Singleton classes. During the
        prepare() method HeavyCGI calls HANDLER->instance->header($self) and
        HANDLER->instance->parameter($self) on all of your nominated
        handlers.

    LAST_MODIFIED
        Optional attribute set by the last_modified() method. If set,
        HeavyCGI will send a Last-Modified header of the specified time,
        otherwise it sends a Last-Modified header of the current time. The
        attribute needs to contain an Apache::HeavyCGI::Date object.

    MYURL
        The URL of the running request set by the myurl() method. Contains
        an URI::URL object.

    R   The Apache Request object for the running request. Needs to be set
        up in the constructor by the user.

    REFERER
        Unused.

    SERVERROOT_URL
        The URL of the running request's server-root set by the
        serverroot_url() method. Contains an URI::URL object.

    SERVER_ADMIN
        Unused.

    TIME
        The time when this request started, set by the time() method. Please
        note that the time() system call is considerably faster than the
        method call to Apache::HeavyCGI::time. The advantage of using the
        TIME attribute is that it is self-consistent (remains the same
        during a request).

    TODAY
        Today's date in the format 9999-99-99 set by the today() method,
        based on the time() method.

  Performance

    Don't expect Apache::HeavyCGI to serve 10 million page impressions a
    day. The server I have developed it for is a dual-processor 233 MHz
    machine, and each request is handled by about 30 different handlers: a
    few trigonometric, database, formatting, and recoding routines. With
    this overhead each request takes about a tenth of a second, which in
    many environments will be regarded as slow. On the other hand, the
    server is well respected for its excellent response times. YMMV.

BUGS
    The fields pragma doesn't mix very well with Apache::StatINC. When
    working with HeavyCGI you have to restart your server quite often when
    you change your main class. I believe this could be fixed in fields.pm,
    but I haven't tried. A workaround is to avoid changing the main class,
    e.g. by delegating the layout() method to a different class.

AUTHOR
    Andreas Koenig <andreas.koenig@anima.de>. Thanks to Jochen Wiedmann for
    heavy debates about the code and crucial performance enhancement
    suggestions. The development of this code was sponsored by
    www.speed-link.de.