NAME
Apache::HeavyCGI - Framework to run complex CGI tasks on an Apache
server
SYNOPSIS
use Apache::HeavyCGI; # see eg/ directory of the distribution
# for a complete example/template
WARNING UNSUPPORTED ALPHA CODE RELEASED FOR DEMO ONLY
The release of this software is only for evaluation purposes to people
who are actively writing code that deals with Web Application
Frameworks. This package is probably just another Web Application
Framework and may be worth using or may not be worth using. As of this
writing (July 1999) it is by no means clear if this software will be
developed further in the future. The author has written it over many
years and is deploying it in several places, e.g.
http://www.stadtplandienst.de, http://netlexikon.akademie.de and really
soon on http://pause.perl.org too. It has turned out to be useful for
him. YMMV.
There is no official support for this software. If you find it useful or
even if you find it useless, please mail the author directly.
But please make sure you remember: THE RELEASE IS FOR DEMONSTRATION
PURPOSES ONLY.
DESCRIPTION
The Apache::HeavyCGI framework is intended to provide a couple of simple
tricks that make it easier to write complex CGI solutions. It has been
developed on a site that runs all requests through a single mod_perl
handler that in turn uses CGI.pm or Apache::Request as the query
interface. So Apache::HeavyCGI is -- as the name implies -- not merely
for multi-page CGI scripts (for which there are other solutions), but it
is for the integration of many different pages into a single solution.
The many different pages can then conveniently share common tasks.
The approach taken by Apache::HeavyCGI is a components-driven one with
all components being pure perl. So if you're not looking for yet another
embedded perl solution, and aren't intimidated by perl, please read on.
Stacked handlers suck
If you have had a look at stacked handlers, you might have noticed that
the model for stacking handlers often is too primitive. The model
supposes that the final form of a document can be found by running
several passes over a single entity, each pass refining the entity,
manipulating some headers, maybe even passing some notes to the next
handler, and in the most advanced form passing pnotes between handlers.
A lot of Web pages may fit into that model, even complex ones, but it
doesn't scale well for pages that result from a structure that's more
complicated than adjacent items. The more complexity you add to a page,
the more overhead is generated by the model, because for every handler
you push onto the stack, the whole document has to be parsed and
recomposed again and headers have to be re-examined and possibly
changed.
Why not subclass Apache
Inheritance provokes namespace conflicts. Besides this, I see little
reason why one should favor inheritance over a using relationship. The
current implementation of Apache::HeavyCGI is very closely coupled with
the Apache class anyway, so we could do inheritance too. No big deal I
suppose. The downside of the current way of doing it is that we have to
write
my $r = $obj->{R};
very often, but that's about it. The upside is that we know which
manpage to read for the different methods provided by "$obj->{R}",
"$obj->{CGI}", and "$obj" itself.
Composing applications
Apache::HeavyCGI takes an approach that is more ambitious for handling
complex tasks. The underlying model for the production of a document is
that of a puzzle. An HTML (or XML or SGML or whatever) page is regarded
as a sequence of static and dynamic parts, each of which has some
influence on the final output. Typically, in today's Webpages, the
dynamic parts are filled into table cells, i.e. contents between some
"<TD></TD>" tokens. But this is not necessarily so. The static parts in
between typically are some HTML markup, but this also isn't forced by
the model. The model simply expects a sequence of static and dynamic
parts. Static and dynamic parts can appear in any order. In the
extreme case of a picture you would only have one part, either static or
dynamic. HeavyCGI could handle this, but I don't see a particular
advantage of HeavyCGI over a simple single handler.
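The puzzle model can be illustrated with a minimal sketch in plain Perl. It does not use Apache::HeavyCGI or its Layout class; the names @parts and compose(), and the keys of the request hash, are assumptions made up for this illustration. Static parts are plain strings, dynamic parts are code references that receive the request object.

```perl
use strict;
use warnings;

# Hypothetical page description: plain strings are static parts,
# code references are dynamic parts that receive the request object.
my @parts = (
    "<html><body><table><tr><td>",
    sub { my ($req) = @_; "Hello, $req->{user}" },
    "</td><td>",
    sub { my ($req) = @_; "You asked for $req->{query}" },
    "</td></tr></table></body></html>",
);

# Compose the page by walking the sequence once, in order.
sub compose {
    my ($req, @sequence) = @_;
    return join '', map { ref $_ eq 'CODE' ? $_->($req) : $_ } @sequence;
}

my $page = compose({ user => 'guest', query => 'maps' }, @parts);
print "$page\n";
```

The sequence is walked exactly once, which is the contrast to the stacked handler model where every pass re-parses the whole document.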
In addition to the task of generating the contents of the page, there is
the other task of producing correct headers. Header composition is an
often neglected task in the CGI world. Because pages are generated
dynamically, people believe that pages without a Last-Modified header
are fine, and that an If-Modified-Since header in the browser's request
can go by unnoticed. This laissez-faire principle gets in the way when
you try to establish a server that is entirely driven by dynamic
components and the number of hits is significant.
Header Composition, Parameter Processing, and Content Creation
The three big tasks a CGI script has to master are Headers, Parameters
and the Content. As a general rule, content creation SHOULD NOT start
before all parameters are processed. In complex scenarios you MUST
expect that the whole layout may depend on one parameter. Additionally
we can say that some header related data SHOULD be processed very early
because they might result in a shortcut that saves us a lot of
processing.
Consequently, Apache::HeavyCGI divides the tasks to be done for a
request into four phases and distributes the four phases among an
arbitrary number of modules. Which modules are participating in the
creation of a page is the design decision of the programmer.
The perl model that maps (at least IMHO) ideally to this task
description is an object oriented approach that identifies a couple of
phases by method names and a couple of components by class names. To
create an application with Apache::HeavyCGI, the programmer specifies
the names of all classes that are involved. All classes are singleton
classes, i.e. they have no identity of their own but can be used to do
something useful by working on an object that is passed to them.
Singletons inherit (via @ISA) from Class::Singleton, which can be found
on CPAN. As such, the classes can only have a single instance, which can
be found by calling the "CLASS->instance" method. Following the mod_perl
convention, we'll call these objects *handlers*.
Every request maps to exactly one Apache::HeavyCGI object. The
programmer uses the methods of this object by subclassing. The HeavyCGI
constructor creates objects of the AVHV type (pseudo-hashes). If the
inheriting class needs its own constructor, this needs to be an AVHV
compatible constructor. A description of AVHV can be found in the fields
manpage. An Apache::HeavyCGI object usually is constructed with the
"new" method and after that the programmer calls the "dispatch" method
on this object. HeavyCGI will then perform various initializations and
then ask all nominated handlers in turn to perform the *header* method
and in a second round to perform the *parameter* method. In most cases
the availability of a method can be determined at compile time of the
handler. If so, it is possible to create an execution plan at compile
time that determines the sequence of calls, so that no runtime is lost
checking method availability. Such an execution plan can be created
with the Apache::HeavyCGI::ExePlan module. All of the called methods
will get the HeavyCGI request object passed as the second parameter
(the first is the handler instance).
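The two rounds and the idea of an execution plan can be sketched in plain Perl as follows. This is not the code of Apache::HeavyCGI::ExePlan; the packages My::Singleton, My::Handler::Date and My::Handler::Search and the functions make_plan() and prepare() are hypothetical, and My::Singleton is only a stand-in for Class::Singleton so that the sketch has no CPAN dependencies.

```perl
use strict;
use warnings;

# Stand-in for Class::Singleton (an assumption made to keep the sketch
# self-contained): instance() returns one shared object per class.
package My::Singleton;
my %instance;
sub instance { my $class = shift; $instance{$class} ||= bless {}, $class }

# Two hypothetical handlers; the second one has no header method.
package My::Handler::Date;
our @ISA = ('My::Singleton');
sub header    { my ($self, $req) = @_; push @{ $req->{log} }, 'Date/header' }
sub parameter { my ($self, $req) = @_; push @{ $req->{log} }, 'Date/parameter' }

package My::Handler::Search;
our @ISA = ('My::Singleton');
sub parameter { my ($self, $req) = @_; push @{ $req->{log} }, 'Search/parameter' }

package main;

# Build the plan once: all header calls first, then all parameter calls,
# keeping only the methods a handler really implements.
sub make_plan {
    my @classes = @_;
    my @plan;
    for my $method (qw(header parameter)) {
        for my $class (@classes) {
            my $code = $class->can($method) or next;
            push @plan, [ $class->instance, $code ];
        }
    }
    return \@plan;
}

# Run the plan for one request; no can() lookups happen here.
sub prepare {
    my ($req, $plan) = @_;
    for my $step (@$plan) {
        my ($handler, $code) = @$step;
        $handler->$code($req);    # request object is the second parameter
    }
    return $req;
}

my $plan = make_plan('My::Handler::Date', 'My::Handler::Search');
my $req  = prepare({ log => [] }, $plan);
print join(', ', @{ $req->{log} }), "\n";
```

Computing the plan once at server startup and reusing it for every request is what saves the per-request method lookups.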
There are no fixed rules as to what has to happen within the "header"
and "parameter" method. As a rule of thumb it is recommended to
determine and set the object attributes LAST_MODIFIED and EXPIRES (see
below) within the header() method. It is also recommended to inject the
Apache::HeavyCGI::IfModified module as the last header handler, so that
the application can abort early with a Not Modified response. I would
recommend that in the header phase you do as little parameter
processing as possible, except for those parameters that are related to
the last modification date of the generated page.
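The shortcut that a Not Modified response offers can be sketched independently of the framework. The following is not the implementation of Apache::HeavyCGI::IfModified; the functions http_date(), parse_http_date() and status_for() are hypothetical helpers, and only the common RFC 1123 date format is parsed.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %MON; @MON{@MON} = 0 .. 11;

# Format an epoch time as an HTTP date (always GMT).
sub http_date {
    my ($epoch) = @_;
    my ($s, $m, $h, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $h, $m, $s;
}

# Parse an RFC 1123 date; returns epoch seconds or undef.
sub parse_http_date {
    my ($str) = @_;
    my ($mday, $mon, $year, $h, $m, $s) =
        $str =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MON{$mon};
    return timegm($s, $m, $h, $mday, $MON{$mon}, $year);
}

# Decide the status for a request, given the page's last modification
# time (epoch seconds) and the raw If-Modified-Since header, if any.
sub status_for {
    my ($last_modified, $if_modified_since) = @_;
    if (defined $if_modified_since) {
        my $since = parse_http_date($if_modified_since);
        return 304 if defined $since && $last_modified <= $since;
    }
    return 200;
}

my $last_modified = timegm(0, 0, 12, 1, 6, 1999);    # 1999-07-01 12:00:00 GMT
print http_date($last_modified), "\n";
print status_for($last_modified, http_date($last_modified)), "\n";
print status_for($last_modified + 60, http_date($last_modified)), "\n";
```

If the decision is 304, none of the parameter, layout or content work needs to run, which is why this check belongs at the end of the header phase.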
Terminating the handler calls or triggering errors.
Sometimes you want to stop calling the handlers, because you think that
processing the request is already done. In that case you can do a
die Apache::HeavyCGI::Exception->new(HTTP_STATUS => status);
at any point within prepare() and the specified status will be returned
to the Apache handler. This is useful for example for the
Apache::HeavyCGI::IfModified module which sends the response headers and
then dies with HTTP_STATUS set to Apache::Constants::DONE. Redirectors
presumably would set up their headers and set it to
Apache::Constants::HTTP_MOVED_TEMPORARILY.
Another use for Perl exceptions is error handling. In case of an error
within the prepare loop, all you need to do is
die Apache::HeavyCGI::Exception->new(ERROR=>[array_of_error_messages]);
The error is caught at the end of the prepare loop and the anonymous
array that is being passed to $@ will then be appended to
"@{$self->{ERROR}}". You should check for $self->{ERROR} within your
layout method to return an appropriate response to the client.
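How such exception objects travel through die and eval can be shown with a small stand-alone sketch. My::Exception and run_prepare() are hypothetical stand-ins, not the real Apache::HeavyCGI::Exception class or the real prepare() method; only the two keys named above (HTTP_STATUS and ERROR) are modelled.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an exception class.
package My::Exception;
sub new { my ($class, %arg) = @_; return bless { %arg }, $class }

package main;
use Scalar::Util qw(blessed);

# Run the handlers; return a status if one asked to stop, else undef.
sub run_prepare {
    my ($req, @handlers) = @_;
    eval { $_->($req) for @handlers; 1 } and return;
    my $err = $@;
    # Anything that is not one of our exceptions is rethrown.
    die $err unless blessed($err) && $err->isa('My::Exception');
    # Append the error messages carried by the exception, if any.
    push @{ $req->{ERROR} }, @{ $err->{ERROR} } if $err->{ERROR};
    return $err->{HTTP_STATUS};
}

my $req = { ERROR => [] };
my $status = run_prepare(
    $req,
    sub { die My::Exception->new(ERROR => ['missing parameter "q"']) },
    sub { $_[0]{reached} = 1 },    # not called: the loop ended above
);
print "errors: @{ $req->{ERROR} }\n";

my $redirect = run_prepare({}, sub { die My::Exception->new(HTTP_STATUS => 302) });
print "status: $redirect\n";
```

In this sketch the first exception ends the loop; the accumulated messages stay on the request object so that a layout method can report them.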
Layout and Text Composition
After the header and the parameter phase, the application should have
set up the object that is able to characterize the complete application
and its status. No changes to the object should happen from now on.
In the next phase Apache::HeavyCGI will ask this object to perform the
"layout" method that has the duty to generate an
Apache::HeavyCGI::Layout (or compatible) object. Please read more about
this object in the Apache::HeavyCGI::Layout manpage. For our HeavyCGI
object it is only relevant that this Layout object can compose itself as
a string in the as_string() method. As a layout object can be composed
as an abstraction of a layout and independent of request-specific
contents, it is recommended to cache the most important layouts. This is
part of the responsibility of the programmer.
In the next step HeavyCGI stores a string representation of the current
request by calling the as_string() method on the layout object and
passing itself to it as the first argument. By passing itself to the
Layout object, all the request-specific data get married to the
layout-specific data, and we reach the stage where stacked handlers
usually start: we have composed content that is ready for shipping.
The last phase deals with setting up the yet unfinished headers,
possibly compressing, recoding and measuring the content, and
delivering the response to the browser. The two methods finish() and
deliver() are responsible for that phase. The default deliver() method
is pretty generic: it calls finish(), then sends the headers, and sends
the content only if the request method wasn't HEAD. It then returns
Apache's constant DONE to the caller, so that Apache won't do anything
except logging on this request. The finish() method is more likely to be
overridden. The default finish() method sets the content type to
text/html, compresses the content if the browser understands compressed
data and Compress::Zlib is available, and sets the headers Vary,
Expires, Last-Modified, and Content-Length. You most probably will want
to override the finish method.
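The kind of work a finish step does can be sketched without Apache. The function finish_sketch() below is an illustration under stated assumptions, not the module's finish() method: it takes a plain hash whose keys CONTENT, CHARSET and CAN_GZIP mirror the attribute names used in this document and returns the headers as a hash reference instead of sending them. Expires and Last-Modified are left out.

```perl
use strict;
use warnings;

# Return (headers, body) for a hypothetical request hash.
sub finish_sketch {
    my ($req) = @_;
    my $body = $req->{CONTENT};
    my %header = (
        'Content-Type' => $req->{CHARSET}
            ? "text/html; charset=$req->{CHARSET}"
            : 'text/html',
        'Vary' => 'Accept-Encoding',
    );
    # Compress only if the client can cope and Compress::Zlib loads.
    if ($req->{CAN_GZIP} && eval { require Compress::Zlib; 1 }) {
        my $gz = Compress::Zlib::memGzip($body);
        if (defined $gz) {
            $body = $gz;
            $header{'Content-Encoding'} = 'gzip';
        }
    }
    $header{'Content-Length'} = length $body;    # measured after compression
    return (\%header, $body);
}

my ($header, $body) = finish_sketch({
    CONTENT => '<p>hello</p>', CHARSET => 'ISO-8859-1', CAN_GZIP => 0,
});
print "$_: $header->{$_}\n" for sort keys %$header;
```

The content length is measured last because compression changes the number of bytes that are actually shipped.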
Summing up
                                       +-------------------+
                                       | sub handler {...} |
+--------------------+                 | (sub init {...})  |
|Your::Class         |---defines------>|                   |
|ISA Apache::HeavyCGI|                 | sub layout {...}  |
+--------------------+                 | sub finish {...}  |
                                       +-------------------+

                                       +-------------------+
                                       | sub new {...}     |
+--------------------+                 | sub dispatch {...}|
|Apache::HeavyCGI    |---defines------>| sub prepare {...} |
+--------------------+                 | sub deliver {...} |
                                       +-------------------+

+----------------------+               +--------------------+
|Handler_1 .. Handler_N|               | sub header {...}   |
|ISA Class::Singleton  |---define----->| sub parameter {...}|
+----------------------+               +--------------------+

                                                                      +----+
                                                                      |Your|
                                                                      |Duty|
+----------------------------+----------------------------------------+----+
|Apache                      | calls Your::Class::handler()           |    |
+----------------------------+----------------------------------------+----+
|                            | nominates the handlers,                |    |
|Your::Class::handler()      | constructs $self,                      | ** |
|                            | and calls $self->dispatch              |    |
+----------------------------+----------------------------------------+----+
|                            |       $self->init     (does nothing)   | ?? |
|                            |       $self->prepare  (see below)      |    |
|Apache::HeavyCGI::dispatch()| calls $self->layout   (sets up layout) | ** |
|                            |       $self->finish   (headers and     | ** |
|                            |                        gross content)  |    |
|                            |       $self->deliver  (delivers)       | ?? |
+----------------------------+----------------------------------------+----+
|Apache::HeavyCGI::prepare() | calls HANDLER->instance->header($self) | ** |
|                            | and HANDLER->instance->parameter($self)| ** |
|                            | on all of your nominated handlers      |    |
+----------------------------+----------------------------------------+----+
Object Attributes
As already mentioned, the HeavyCGI object is a pseudo-hash, i.e. can be
treated like a HASH, but all attributes that are being used must be
predeclared at compile time with a "use fields" clause.
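A minimal sketch of predeclared attributes follows. The class My::App and its attributes are hypothetical. Note that on Perl 5.10 and later the fields pragma is implemented with restricted hashes rather than pseudo-hashes, so the sketch shows the behaviour of a current perl, not necessarily that of the perl this module was written for.

```perl
use strict;
use warnings;

package My::App;
use fields qw(colour size);    # attributes must be declared at compile time

sub new {
    my My::App $self = shift;
    $self = fields::new($self) unless ref $self;
    $self->{colour} = 'blue';
    return $self;
}

package main;

my $app = My::App->new;
print "colour: $app->{colour}\n";

# Storing into an attribute that was never declared fails at run time.
my $ok = eval { $app->{color} = 'red'; 1 };
print "undeclared attribute rejected\n" unless $ok;
```

The lowercase names follow the convention described next: they are the kind of attribute a subclass may add for itself.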
The convention regarding attributes is as simple as it can be: uppercase
attributes are reserved for the Apache::HeavyCGI class, all other
attribute names are at your disposal if you write a subclass.
The following attributes are currently defined. The module author's
production environment has a couple of attributes more that seem to work
well but most probably need more thought to be implemented in a generic
way.
CAN_GZIP
Set by the can_gzip method. True if client is able to handle gzipped
data.
CAN_PNG
Set by the can_png method. True if client is able to handle PNG.
CAN_UTF8
Set by the can_utf8 method. True if client is able to handle UTF8
encoded data.
CGI An object that handles GET and POST parameters and offers the methods
param() and upload() in a manner compatible with Apache::Request.
Needs to be constructed and set by the user, typically in the
constructor.
CHARSET
Optional attribute to denote the charset in which the outgoing data
are being encoded. Only used within the finish method. If it is set,
the finish() method will set the content type to text/html with this
charset.
CONTENT
Scalar that contains the content that should be sent to the user
uncompressed. During the finish() method the content may become
compressed.
DOCUMENT_ROOT
Unused.
ERROR
Anonymous array that accumulates error messages. HeavyCGI doesn't
handle the error though. It is left to the user to set up a proper
response to the user.
EXECUTION_PLAN
Object of type Apache::HeavyCGI::ExePlan. It is
recommended to compute the object at startup time and always pass
the same execution plan into the constructor.
EXPIRES
Optional Attribute set by the expires() method. If set, HeavyCGI
will send an Expires header. The EXPIRES attribute needs to contain
an Apache::HeavyCGI::Date object.
HANDLER
If there is an EXECUTION_PLAN, this attribute is ignored. Without an
EXECUTION_PLAN, it must be an array of package names. HeavyCGI
treats the packages as Class::Singleton classes. During the
prepare() method HeavyCGI calls HANDLER->instance->header($self) and
HANDLER->instance->parameter($self) on all of your nominated
handlers.
LAST_MODIFIED
Optional Attribute set by the last_modified() method. If set,
HeavyCGI will send a Last-Modified header of the specified time,
otherwise it sends a Last-Modified header of the current time. The
attribute needs to contain an Apache::HeavyCGI::Date object.
MYURL
The URL of the running request set by the myurl() method. Contains
a URI::URL object.
R The Apache Request object for the running request. Needs to be set
up in the constructor by the user.
REFERER
Unused.
SERVERROOT_URL
The URL of the running request's server-root set by the
serverroot_url() method. Contains a URI::URL object.
SERVER_ADMIN
Unused.
TIME
The time when this request started, set by the time() method. Please
note that the time() system call is considerably faster than the
method call to Apache::HeavyCGI::time. The advantage of using the
TIME attribute is that it is self-consistent (remains the same
during a request).
TODAY
Today's date in the format 9999-99-99 set by the today() method,
based on the time() method.
Performance
Don't expect Apache::HeavyCGI to serve 10 million page impressions a
day. The server I have developed it for is a dual-processor 233 MHz
machine, and each request is handled by about 30 different
handlers: a few trigonometric, database, formatting, and recoding
routines. With this overhead each request takes about a tenth of a
second which in many environments will be regarded as slow. On the other
hand, the server is well respected for its excellent response times.
YMMV.
BUGS
The fields pragma doesn't mix very well with Apache::StatINC. When
working with HeavyCGI you have to restart your server quite often when
you change your main class. I believe this could be fixed in fields.pm,
but I haven't tried. A workaround is to avoid changing the main class,
e.g. by delegating the layout() method to a different class.
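The delegation workaround might look roughly like the following sketch. The classes My::App and My::App::Layouts and the method layout_for() are hypothetical; the point is only that the main class forwards the call, so the code that changes often lives in a class that can be reloaded without touching the main class.

```perl
use strict;
use warnings;

# Hypothetical helper class that holds the frequently edited code.
package My::App::Layouts;
sub layout_for { my ($class, $app) = @_; return "layout for $app->{page}" }

# Main class: layout() only forwards, so it rarely needs to change.
package My::App;
sub new    { my ($class, %arg) = @_; return bless { %arg }, $class }
sub layout { my $self = shift; return My::App::Layouts->layout_for($self) }

package main;

my $result = My::App->new(page => 'index')->layout;
print "$result\n";
```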
AUTHOR
Andreas Koenig. Thanks to Jochen Wiedmann for
heavy debates about the code and crucial performance enhancement
suggestions. The development of this code was sponsored by
www.speed-link.de.