lib/HTML/Mason subdirectory of the distribution. After that, you can edit it by hand, following the comments inside.
``make install'' copies Config.pm to your Perl library directory (e.g. /usr/lib/perl5/site_perl/HTML/Mason) along with the other module files. This allows Mason internally to grab the configuration data with ``use HTML::Mason::Config''.
Currently this file controls:
o Whether or not certain optional modules, such as Time::HiRes, should be loaded for enhanced features.
o The type of DBM and the serialization method used for Mason's data caching. If you plan to use data caching, make sure that the DBM package is a good-quality one (DB_File or GDBM_File).
    DocumentRoot /usr/local/www/htdocs
    PerlRequire /usr/local/mason/handler.pl
    DefaultType text/html
    <Location />
        SetHandler perl-script
        PerlHandler HTML::Mason
    </Location>
handler.pl creates three Mason objects: the Parser, Interpreter, and ApacheHandler. The Parser compiles components into Perl subroutines; the Interpreter executes those compiled components; and the ApacheHandler routes mod_perl requests to Mason. These objects are created once in the parent httpd and then copied to each child process.
These objects have a fair number of possible parameters. Only two of them are required, comp_root and data_dir; these are discussed in the next two subsections. The various parameters are documented in the individual reference manuals for each object: HTML::Mason::Parser, HTML::Mason::Interp, and HTML::Mason::ApacheHandler.
The advantage of embedding these parameters in objects is that advanced
configurations can create more than one set of objects, choosing which set
to use at request time. For example, suppose you have a staging site and a
production site running on the same web server, distinguishing between them
with a configuration variable called version:
    # Create Mason objects for staging site
    my $parser1 = new HTML::Mason::Parser;
    my $interp1 = new HTML::Mason::Interp (parser=>$parser1, ...);
    my $ah1 = new HTML::Mason::ApacheHandler (interp=>$interp1);

    # Create Mason objects for production site
    my $parser2 = new HTML::Mason::Parser;
    my $interp2 = new HTML::Mason::Interp (parser=>$parser2, ...);
    my $ah2 = new HTML::Mason::ApacheHandler (interp=>$interp2);

    sub handler {
        ...

        # Choose the right ApacheHandler
        if ($r->dir_config('version') eq 'staging') {
            $ah1->handle_request($r);
        } else {
            $ah2->handle_request($r);
        }
    }
When Mason handles a request, the request filename ($r->filename) must be underneath your component root -- that way Mason has a legitimate component to start with. If the filename is not under the component root, Mason will place a warning in the error logs and return a 404.
Unfortunately, if your component root or document root goes through a soft (symbolic) link, Mason will have trouble comparing the paths and will return a 404. To fix this, set your document root to the true path.
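One way to find the true path is to resolve it once and copy the result into your configuration. The following is a minimal sketch, assuming the core Cwd module is available; the helper names true_path and is_under_root are hypothetical and are not part of Mason.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Hypothetical helper: return the physical path with symbolic links
# resolved. abs_path() returns undef if the path cannot be resolved.
sub true_path {
    my ($path) = @_;
    return abs_path($path);
}

# Hypothetical helper: true if $file lies at or below $root, comparing
# resolved paths so that a symbolic link does not cause a mismatch.
sub is_under_root {
    my ($file, $root) = @_;
    my $f = true_path($file);
    my $r = true_path($root);
    return 0 unless defined $f && defined $r;
    return 1 if $r eq '/';
    return ($f eq $r || index($f, "$r/") == 0) ? 1 : 0;
}
```

Printing true_path() for the directory you use as document root shows the value to put in httpd.conf, and is_under_root() mimics the kind of comparison that fails when only one of the two paths goes through a link.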
    cache:    data cache files
    debug:    debug files
    etc:      miscellaneous files
    obj:      compiled components
    preview:  preview settings files
These directories will be discussed in appropriate sections throughout this manual.
    {
        package HTML::Mason::Commands;
        use CGI ':standard';
        use LWP::UserAgent;
        ...
    }
In any case, for optimal memory utilization, make sure all Perl modules are used in the parent process, and not in components. Otherwise, each child allocates its own copy and you lose the benefit of shared memory between parent processes and their children. See Vivek Khera's mod_perl tuning FAQ (perl.apache.org/tuning) for details.
To work around this conflict, Mason remembers all directories and files created at startup, returning them in response to $interp->files_written. This list can be fed to a chown() at the end of the startup code in handler.pl:
chown (scalar(getpwnam "nobody"), scalar(getgrnam "nobody"), $interp->files_written);
    my %session;
    my $cookie = $r->header_in('Cookie');
    $cookie =~ s/SESSION_ID=(\w*)/$1/;
    tie %session, 'Apache::Session::File', $cookie, {'Directory' => '/tmp/session'};
    $r->header_out("Set-Cookie" => "SESSION_ID=$session{_session_id};") if ( !$cookie );
    local *HTML::Mason::Commands::session = \%session;
This code is customizable; you can change the user ID location (e.g. URL instead of cookie), the user data storage mechanism (e.g. DBI database), and the name of the global hash.
my) variables in components, there is very little need for globals at all. That said, there are times when it is very useful to make a value available to all Mason components. One example is the Apache request object; this is automatically made available as the global $r. Other examples are a DBI database handle or a hash of user session information. Usually you initialize the global in your handler.pl, either outside the handler() subroutine (if you only need to set it once) or inside (if you need to set it every request).
Mason by default parses components in strict mode, so you can't simply start referring to a new global or you'll get a fatal error. The solution is to invoke use vars inside the package that components execute in, by default HTML::Mason::Commands:
{ package HTML::Mason::Commands; use vars qw($dbh %session); }
Alternatively you can use the allow_globals parameter or method:
    my $parser = new HTML::Mason::Parser (..., allow_globals => [qw($dbh %session)]);
    $parser->allow_globals(qw($foo @bar));
The only advantage of allow_globals is that it will do the right thing if you've chosen a different package for components to run in (via the in_package Parser parameter).
Similarly, to initialize the variable in handler.pl, you need to set it in the component package:
$HTML::Mason::Commands::dbh = DBI->connect(...);
Alternatively you can use the set_global interpreter method:
$interp->set_global(dbh => DBI->connect(...));
Again, set_global will do the right thing if you've chosen a different package for components.
Now when referring to these globals inside components, you can use the plain variable name:
$dbh->prepare...
The most important task is selecting a good DBM package. Most standard DBM packages (SDBM, ODBM, NDBM) are unsuitable for data caching due to significant limitations on the size of keys and values. Perl only comes with SDBM, so you'll need to obtain a good-quality package if you haven't already. At this time the best options are Berkeley DB (DB_File) version 2.x, available at www.sleepycat.com, and GNU's gdbm (GDBM), available at GNU mirror sites everywhere. Stay away from Berkeley DB version 1.x on Linux which has a serious memory leak (and is unfortunately pre-installed on many distributions).
As for the serialization methods, all of them should work fine. Data::Dumper is probably simplest: it comes with the latest versions of Perl, is required by Mason anyway, and produces readable output (possibly useful for debugging cache files). On the other hand, Storable is significantly faster than the other options according to the MLDBM documentation.
data_dir/cache, replacing slashes in the component path with ``::''. For example, the cache file for component /foo/bar is data_dir/cache/foo::bar.
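The naming rule can be sketched as a small function. This is an illustration only, not Mason's own code: the helper name cache_file_for and the data directory /usr/local/mason/data are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the cache file name for a component path,
# following the rule described above (slashes become "::").
sub cache_file_for {
    my ($data_dir, $comp_path) = @_;
    (my $name = $comp_path) =~ s{^/+}{};   # drop the leading slash
    $name =~ s{/+}{::}g;                   # remaining slashes become "::"
    return "$data_dir/cache/$name";
}

# The data directory below is an assumed example value.
print cache_file_for('/usr/local/mason/data', '/foo/bar'), "\n";
# prints /usr/local/mason/data/cache/foo::bar
```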
Currently Mason never deletes cache files, not even when the associated component file is modified. (This may change in the near future.) Thus cache files hang around and grow indefinitely. You may want to use a cron job or similar mechanism to delete cache files that get too large or too old. For example:
    # Shoot cache files more than 30 days old
    foreach (<data_dir/cache/*>) {    # path to cache directory
        unlink $_ if (-M $_ >= 30);
    }
In general you can feel free to delete cache files periodically and without warning, because the data cache mechanism is explicitly not guaranteed -- developers are warned that cached data may disappear anytime and components must still function.
If for some reason you want to disable data caching, specify use_data_cache=>0 to the Interp object. This will cause all mc_cache calls to return undef without doing anything.
$r) and calls the same PerlHandler that Apache called. Debug files are created under data_dir/debug/<username> for authenticated users; otherwise they are placed in data_dir/debug/anon. Several ApacheHandler parameters are required to activate and configure debug files:
/usr/bin/perl. This is used in the Unix ``shebang'' line at the top of each debug file.
handler.pl script. Debug files invoke handler.pl just as Apache does at startup, to load needed modules and create Mason objects.
handler.pl. This routine is called with the saved Apache request object.
ApacheHandler constructor with all debug options:
    my $ah = new HTML::Mason::ApacheHandler (interp=>$interp,
                 debug_mode=>'all',
                 debug_perl_binary=>'/usr/local/bin/perl',
                 debug_handler_script=>'/usr/local/mason/eg/handler.pl',
                 debug_handler_proc=>'HTML::Mason::handler');
When replaying a request through a debug file, the global variable $HTML::Mason::IN_DEBUG_FILE will be set to 1. This is useful if you want to omit certain flags (like preloading) in handler.pl when running under debug. For example:
    my %extra_flags = ($HTML::Mason::IN_DEBUG_FILE) ? () : (preloads=>[...]);
    my $interp = new HTML::Mason::Interp (..., %extra_flags);
Follow these steps to activate the Previewer:
    Listen your.site.ip.address:3001
    ...
    Listen your.site.ip.address:3005
You'll also probably want to restrict access to these ports in your access.conf. If you have multiple site developers, it is helpful to use username/password access control, since the previewer will use the username to keep configurations separate.
handler.pl, add the line
use HTML::Mason::Preview;
somewhere underneath ``use HTML::Mason''. Then add code to your handler routine to intercept Previewer requests on the ports defined above. Your handler should end up looking like this:
    sub handler {
        my ($r) = @_;

        # Compute port number from Host header
        my $host = $r->header_in('Host');
        my ($port) = ($host =~ /:([0-9]+)$/);
        $port = 80 if (!defined($port));

        # Handle previewer request on special ports
        if ($port >= 3001 && $port <= 3005) {
            my $parser = new HTML::Mason::Parser(...);
            my $interp = new HTML::Mason::Interp(...);
            my $ah = new HTML::Mason::ApacheHandler (...);
            return HTML::Mason::Preview::handle_preview_request($r,$ah);
        } else {
            $ah->handle_request($r);    # else, normal request handler
        }
    }
The three ``new'' lines inside the if block should look exactly the same as the lines at the top of handler.pl. Note that these separate Mason objects are created for a single request and discarded. The reason is that the previewer may alter the objects' settings, so it is safer to create new ones every time.
The format of the system log was designed to be easy to parse by programs, although it is not unduly hard to read for humans. Every event is logged on one line. Each line consists of multiple fields delimited by a common separator, by default ctrl-A. The first three fields are always the same: time, the name of the event, and the current pid ($$). These are followed by one or more fields specific to the event.
The events are:
    EVENT NAME    DESCRIPTION                      EXTRA FIELDS

    REQ_START     start of HTTP request            request number, URL + query string
    REQ_END       end of HTTP request              request number, error flag
                                                   (1 if error occurred, 0 otherwise)
    CACHE_READ    attempt to read from             component path, cache key, success
                  data cache (mc_cache)            flag (1 if item found, 0 otherwise)
    CACHE_STORE   store to data cache              component path, cache key
    COMP_LOAD     component loaded into memory     component path
                  for first time
The request number is an incremental value that uniquely identifies each request for a given child process. Use it to match up REQ_START/REQ_END pairs.
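The matching can be sketched as follows. This is a minimal illustration, assuming log lines in the format described above; the function name request_times is hypothetical. Because request numbers are unique only within one child process, the sketch keys each request on the pid together with the request number.

```perl
use strict;
use warnings;

# Pair REQ_START and REQ_END lines and return one record per completed
# request, holding the URL and the elapsed time between the two events.
sub request_times {
    my ($sep, @lines) = @_;
    my (%start, @done);
    foreach my $line (@lines) {
        chomp $line;
        my ($time, $event, $pid, @fields) = split /\Q$sep\E/, $line;
        next unless defined $pid && defined $fields[0];
        my $key = "$pid:$fields[0]";          # pid + request number
        if ($event eq 'REQ_START') {
            $start{$key} = { time => $time, url => $fields[1] };
        } elsif ($event eq 'REQ_END') {
            my $s = delete $start{$key} or next;   # END without START
            push @done, { url => $s->{url}, elapsed => $time - $s->{time} };
        }
    }
    return @done;
}
```

A caller would pass the separator (by default "\cA") followed by the lines of the log, then sort or average the returned records by URL.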
To turn on logging, specify a string value to system_log_events containing one or more event names separated by '|'. In addition to individual event names, the following names can be used to specify multiple events:
    REQUEST = REQ_START | REQ_END
    CACHE   = CACHE_READ | CACHE_STORE
    ALL     = All events
For example, to log REQ_START, REQ_END, and COMP_LOAD events, you could use

    system_log_events => "REQUEST|COMP_LOAD"

Note that this is a string, not a set of constants or'd together.
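To make the meaning of the string concrete, here is a sketch of how such a value could be expanded into individual event names. It is not Mason's implementation; the helper name expand_events is hypothetical, and the group table simply restates the list given above.

```perl
use strict;
use warnings;

# Event names and group names as listed in this section.
my @ALL_EVENTS = qw(REQ_START REQ_END CACHE_READ CACHE_STORE COMP_LOAD);
my %GROUPS = (
    REQUEST => [qw(REQ_START REQ_END)],
    CACHE   => [qw(CACHE_READ CACHE_STORE)],
    ALL     => [@ALL_EVENTS],
);

# Hypothetical helper: expand a value such as "REQUEST|COMP_LOAD" into a
# sorted list of individual event names, dying on an unknown name.
sub expand_events {
    my ($spec) = @_;
    my %valid = map { ($_ => 1) } @ALL_EVENTS;
    my %seen;
    foreach my $name (split /\|/, $spec) {
        $name =~ s/^\s+|\s+$//g;               # tolerate stray spaces
        my @events = $GROUPS{$name} ? @{ $GROUPS{$name} } : ($name);
        foreach my $e (@events) {
            die "unknown event name: $e\n" unless $valid{$e};
            $seen{$e} = 1;
        }
    }
    return sort keys %seen;
}

print join(',', expand_events('REQUEST|COMP_LOAD')), "\n";
# prints COMP_LOAD,REQ_END,REQ_START
```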
Configuration Options
By default, the system log will be placed in data_dir/etc/system.log. You can change this with system_log_file.
The default field separator is ctrl-A. The advantage of this separator is that it is very unlikely to appear in any of the fields, making it easy to split() the line. The disadvantage is that it will not always display, e.g. from a Unix shell, making the log harder to read casually. You can change the separator to any sequence of characters with system_log_separator.
The time on each log line will be of the form ``seconds.microseconds'' if you are using Time::HiRes, and simply ``seconds'' otherwise. See the Config.pm section.
Sample Log Parser
Here is a code skeleton for parsing the various events in a log. You can also find this in eg/parselog.pl in the Mason distribution.
    open(LOG,"mason.log") or die "cannot open mason.log: $!";
    while (<LOG>) {
        chomp;
        my (@fields) = split("\cA");
        my ($time,$event,$pid) = splice(@fields,0,3);
        if ($event eq 'REQ_START') {
            my ($reqnum,$url) = @fields;
            ...
        } elsif ($event eq 'REQ_END') {
            my ($reqnum,$errflag) = @fields;
            ...
        } elsif ($event eq 'CACHE_READ') {
            my ($comp,$key,$hitflag) = @fields;
            ...
        } elsif ($event eq 'CACHE_STORE') {
            my ($comp,$key) = @fields;
            ...
        } elsif ($event eq 'COMP_LOAD') {
            my ($comp) = @fields;
            ...
        } else {
            warn "unrecognized event type: $event\n";
        }
    }
    close(LOG);
Suggested Uses
Performance: REQUEST events are useful for analyzing the performance of all Mason requests occurring on your site, and identifying the slowest requests. (You cannot measure this with standard Apache logs since they only record the end time of the request.) eg/perflog.pl in the Mason distribution is a log parser that outputs the average compute time of each unique URL, in order from slowest to quickest.
Server activity: REQUEST events are useful for determining what your web server children are working on, especially when you have a runaway. For a given process, simply tail the log and find the last REQ_START event with that process id. (You can also use the Apache status page for this, of course.)
Cache efficiency: CACHE events are useful for monitoring cache ``hit rates'' (number of successful reads over total number of reads) over all components that use a data cache. Because stores to a cache are more expensive than reads, a high hit rate is essential for the cache to have a beneficial effect. If a particular cache hit rate is too low, you may want to consider changing how frequently it is expired or whether to use it at all.
Load frequency: COMP_LOAD events are useful for determining which components are loaded most often and therefore good candidates for preloading.
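As an illustration of the last two uses, the sketch below tallies a hit rate per component from CACHE_READ events and a load count per component from COMP_LOAD events. It assumes the log format described earlier; the function name summarize_log is hypothetical and the script is not part of the Mason distribution.

```perl
use strict;
use warnings;

# Return two hash references keyed by component path: the cache hit rate
# (successful reads over total reads) and the number of COMP_LOAD events.
sub summarize_log {
    my ($sep, @lines) = @_;
    my (%reads, %hits, %loads);
    foreach my $line (@lines) {
        chomp $line;
        my ($time, $event, $pid, @fields) = split /\Q$sep\E/, $line;
        next unless defined $event && defined $fields[0];
        if ($event eq 'CACHE_READ') {
            my ($comp, $key, $hit) = @fields;
            $reads{$comp}++;
            $hits{$comp}++ if $hit;
        } elsif ($event eq 'COMP_LOAD') {
            $loads{ $fields[0] }++;
        }
    }
    my %rate = map { ($_ => ($hits{$_} || 0) / $reads{$_}) } keys %reads;
    return (\%rate, \%loads);
}
```

Components with a low rate are candidates for a different expiration policy, and components with a high load count are candidates for preloading.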
data_dir/obj/component-path. Future server processes can eval the object file and save time on parsing. Besides improving performance, object files are essential for debugging and interpretation of compilation errors. However, if you don't want Mason to create object files (e.g. if disk space is scarce), you can turn them off by passing use_object_files=>0 to the Interp object.
    % my $name = "Jon";
    Hello <% $name %>, how are you?
translates to something like:
    my $name = "Jon";
    print("Hello ");
    print($name);
    print(", how are you?");
The amount of memory taken up by a compiled component is therefore at least as large as the combined size of its HTML blocks. If a component has 50K of HTML, that means 50K of storage for each child process that loads the component. Multiply that by ten processes and twenty such components and you've got roughly 10 MB of memory overhead.
To reduce this overhead Mason generates, in certain cases, code that reads from the source file at runtime. For example, the following component:
    <& top &>
    ... 20K of HTML ...
    <& center &>
    ... 30K of HTML ...
translates to something like:
    my $_srctext = mc_file('/usr/local/www/htdocs/foo/bar');
    mc_comp('top');
    print(substr($_srctext,18,20498));
    mc_comp('center');
    print(substr($_srctext,20520,30720));
The resulting code is a bit slower but more memory efficient. Mason decides whether to use these ``source references'' by first measuring both the total size and the amount of HTML in a component. Those values are then examined by a customizable ``source_refer_predicate'' which makes a determination based on local policy, say ``more than 50% HTML'', or ``more than 20K of HTML''.
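A local policy of this kind might be written as a small subroutine. The sketch below is an assumption-laden illustration: it supposes the predicate receives the total component size and the HTML size, in that order and in bytes, and it combines the two example policies mentioned above. Check the Parser reference manual for the actual calling convention before relying on it.

```perl
use strict;
use warnings;

# Hypothetical policy: use source references when a component carries
# more than 20K of HTML or is more than 50% HTML.
# Assumed arguments: total size, then HTML size, both in bytes.
my $source_refer_predicate = sub {
    my ($total_size, $html_size) = @_;
    return 0 unless $total_size > 0;          # nothing to refer to
    return 1 if $html_size > 20 * 1024;       # "more than 20K of HTML"
    return ($html_size / $total_size > 0.5) ? 1 : 0;   # "more than 50% HTML"
};
```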
This feature requires no administration; I mention it simply so that you are not surprised to see zero size object files.
To remedy this, Mason has an accelerated mode that changes its behavior in two ways:
1. Does not check component source files at all, relying solely on object files. This means the developer or an automated system is responsible for recompiling any components that change and recreating object files, using the Parser parse method.
2. Rather than continuously checking whether object files have changed, Mason monitors a ``reload file'' containing an ever-growing list of components that have changed. Whenever a component changes, the developer or an automated system is responsible for appending the component path to the reload file. The reload file is kept in data_dir/etc/reload.lst.
You can activate this mode with the use_reload_file Interp method.
The advantage of using this mode is that Mason stats one file per request instead of ten or twenty. The disadvantage is an increase in maintenance costs, as the object and reload files have to be kept up-to-date. Automated editorial tools, and cron jobs that periodically scan the component hierarchy for changes, are two possible solutions. The Mason content management system automatically handles this task.
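The appending step of such an automated tool could look like the sketch below. The helper name note_changed_components is hypothetical, and writing one component path per line is an assumption of this example; confirm the expected file format against your Mason version. Recompiling the changed components into object files is a separate step and is not shown.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append changed component paths to the reload file.
# Assumes one component path per line. Returns the number of paths written.
sub note_changed_components {
    my ($reload_file, @comp_paths) = @_;
    open(my $fh, '>>', $reload_file)
        or die "cannot append to $reload_file: $!";
    # Take an exclusive lock so concurrent writers do not interleave lines.
    flock($fh, LOCK_EX) or die "cannot lock $reload_file: $!";
    print {$fh} "$_\n" foreach @comp_paths;
    close($fh) or die "cannot close $reload_file: $!";
    return scalar @comp_paths;
}
```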
The priorities for the staging site are rapid development and easy debugging, while the main priority for the production site is performance. This section describes various ways to adapt Mason for each case.
Which is better, batch or stream? It depends on the context.
For production web servers, stream mode is better because it gets data to the browser more quickly. A browser can only process and display data at a certain rate--streaming the data allows the browser to start working in parallel with the server, while waiting until the end serializes the task (first the server does all its work, then the browser does all its work). From a user perspective the initial bytes are especially important: until the browser receives some data, it simply displays a ``waiting'' message. Serving a computationally intense page in batch mode makes the server look unresponsive and tempts users to hit Stop, whereas in stream mode the browser at least acknowledges an answer and draws a background.
For development or staging web servers, batch mode has the advantage of better error handling. Suppose an error occurs in the middle of a page. In stream mode, the error message interrupts existing output, often appearing in an awkward HTML context such as the middle of a table which never gets closed. The user may see a partial page and have to ``View source'' to see the error message. In batch mode, the error message is output neatly and alone.
You control output mode by setting ah->output_mode to ``batch'' or ``stream''.
When configuring Mason to serve multiple virtual hosts, Mason's comp_root must be separated from the DocumentRoot (since DocumentRoot changes per virtual server). In this case you'll want to collect all of your DocumentRoots inside a single component space:
    # httpd.conf
    PerlRequire /usr/local/mason/handler.pl

    # Web site #1
    <VirtualHost www.site1.com>
        DocumentRoot /usr/local/www/htdocs/site1
        <Location />
            SetHandler perl-script
            PerlHandler HTML::Mason
        </Location>
    </VirtualHost>

    # Web site #2
    <VirtualHost www.site2.com>
        DocumentRoot /usr/local/www/htdocs/site2
        <Location />
            SetHandler perl-script
            PerlHandler HTML::Mason
        </Location>
    </VirtualHost>
In contrast to these big changes to httpd.conf, the Mason bootstrap in handler.pl stays the same:
    my $interp = new HTML::Mason::Interp (parser=>$parser,
                     comp_root=>'/usr/local/www/htdocs',
                     data_dir=>'/usr/local/mason/');
The <Location> directives in this example now route all requests through Mason--every page is dynamic. The directory structure for this scenario might look like this:
    /usr/local/www/htdocs/    # component root
        +- shared/            # shared components
        +- site1/             # DocumentRoot for first site
        +- site2/             # DocumentRoot for second site
Incoming URLs for each site can only request components in their respective DocumentRoots, while components internally can call other components anywhere in the component space. The shared/ directory, then, is a private directory for use by components, inaccessible from the Web.
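To see how a request filename relates to the shared component space, consider the sketch below. It is an illustration only, not Mason's own path handling; the helper name comp_path_for is hypothetical, and the paths are the example values used in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a request filename into a component path
# relative to the component root, or return nothing if the file lies
# outside the component root.
sub comp_path_for {
    my ($filename, $comp_root) = @_;
    $comp_root =~ s{/+$}{};                        # ignore a trailing slash
    return unless index($filename, "$comp_root/") == 0;
    return substr($filename, length $comp_root);
}

print comp_path_for('/usr/local/www/htdocs/site1/index.html',
                    '/usr/local/www/htdocs'), "\n";
# prints /site1/index.html
```

A request to the first site therefore always resolves to a component path beginning with /site1/, which is why /shared/ cannot be reached from the Web even though components can call into it.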