HTML::Mason::Admin - Mason Administrator's Guide
This guide is intended for the sysadmin/webmaster in charge of installing, configuring, or tuning a Mason system.
There are three ways to configure a Mason site:
The next three sections discuss these methods in detail. We recommend that you start with the simplest method and work your way forward as the need for flexibility arises.
It is important to note that you cannot mix httpd.conf configuration directives with a handler script. Depending on how you declare your PerlHandler, one or the other will always take precedence and the other will be ignored.
Mason is very flexible, and you can replace parts of it by creating your own classes. This documentation assumes that you are simply using the classes provided in the Mason distribution. Customizing and subclassing are covered in the Subclassing document.
The absolutely most minimal configuration looks like this:
PerlModule HTML::Mason::ApacheHandler
<FilesMatch "\.html$">
    SetHandler perl-script
    PerlHandler HTML::Mason::ApacheHandler
</FilesMatch>
This configuration tells Apache to serve all .html files under your document root through Mason. The PerlModule line tells mod_perl to load Mason once at startup time, saving time and memory.
Mason's configuration parameters are set via mod_perl's PerlSetVar and PerlAddVar directives (the latter is only available in mod_perl version 1.24 and greater). Though these parameters are all strings in your httpd.conf file, Mason treats them as if they were several different types:
Code: the string is eval'ed. This is used for parameters that expect subroutine references. For example, an anonymous subroutine might look like:
PerlSetVar MasonOutMethod "sub { ... }"
A reference to a named subroutine would look like this:
PerlSetVar MasonOutMethod "\&handle_output"
List: to set a list parameter, use PerlAddVar for the values, like this:
PerlAddVar MasonPreloads /foo/bar/baz.comp
PerlAddVar MasonPreloads /foo/bar/quux.comp
As noted above, PerlAddVar is only available in mod_perl 1.24 and up. This means that it is only possible to assign a single value (using PerlSetVar) to list parameters if you are using a mod_perl older than 1.24.
See HTML::Mason::Params for a full list of parameters.
For maximum flexibility, you may choose to write a custom script to create your Mason objects and handle requests. In our documentation and examples we call this script handler.pl and place it in the Apache conf/ subdirectory, though you may name it and place it wherever you like.
The handler.pl file is responsible for creating the ApacheHandler object and supplying the many parameters that control how your components are parsed and executed. It also provides the opportunity to execute arbitrary code at three junctures: the server initialization, the beginning of a request, and the end of a request.
Here is a simple handler.pl, also provided in the eg/ directory:
#!/usr/bin/perl
#
# A basic, functional Mason handler.pl.
#
package MyMason::MyApp;

# Bring in Mason with Apache support.
use HTML::Mason::ApacheHandler;
use strict;

# List of modules that you want to use within components.
{
    package HTML::Mason::Commands;
    use Data::Dumper;
}

# Create ApacheHandler object at startup.
my $ah = HTML::Mason::ApacheHandler->new();

sub handler {
    my ($r) = @_;

    my $status = $ah->handle_request($r);
    return $status;
}

1;
Copy this file into your Apache conf/ subdirectory, and place the following into your httpd.conf:
PerlRequire conf/handler.pl
<FilesMatch "\.html$">
    SetHandler perl-script
    # notice - no ::ApacheHandler!
    PerlHandler MyMason::MyApp
</FilesMatch>
replacing MyMason::MyApp with a package name of your choosing.
At this point, your configuration should act identically to a minimal httpd configuration. You can now configure your server by passing parameters to the constructor:
HTML::Mason::ApacheHandler->new( ... );
The component root marks the top of your component hierarchy. When running Mason with the CGIHandler or ApacheHandler modules, this defaults to your document root.
The component root defines how component paths are translated into real file paths. If your component root is /usr/local/httpd/docs, a component path of /products/index.html translates to the file /usr/local/httpd/docs/products/index.html.
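The translation described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: comp_path_to_file is a hypothetical helper, not part of Mason's API, and it assumes a single component root and Unix-style paths.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of Mason): map a component path onto a
# file path under a single component root.
sub comp_path_to_file {
    my ($comp_root, $comp_path) = @_;

    # Component paths are absolute and use forward slashes.
    return unless defined $comp_path && $comp_path =~ m{^/};

    # Refuse any path that tries to climb out of the component root.
    my @parts = grep { length } split m{/}, $comp_path;
    return if grep { $_ eq '..' } @parts;

    return File::Spec->catfile($comp_root, @parts);
}

print comp_path_to_file('/usr/local/httpd/docs', '/products/index.html'), "\n";
```

The helper returns nothing for a relative path or one containing "..", which mirrors the rule that a component outside the component root cannot be called.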
One cannot call a component outside the component root. If Apache passes a file through Mason that is outside the component root (say, as the result of an Alias) you will get a 404 and a warning in the logs.
You may also specify multiple component roots in the spirit of Perl's @INC. Each root is assigned a key that identifies the root mnemonically to a component developer. For example, in httpd.conf:
PerlAddVar MasonCompRoot "private => /usr/home/joe/comps"
PerlAddVar MasonCompRoot "main => /usr/local/www/htdocs"
or in handler.pl:
comp_root => [
    [ private => '/usr/home/joe/comps' ],
    [ main    => '/usr/local/www/htdocs' ],
]
This specifies two component roots, a main component tree and a private tree which overrides certain components. The order is respected ala @INC, so private is searched first and main second. (We chose the => notation because it looks cleaner, but note that this is a list of lists, not a hash.)
Keys must be unique in a case-insensitive comparison.
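To make the search order concrete, here is a minimal sketch of an @INC-style lookup over a list of roots. The resolve_component helper is hypothetical and is not how Mason itself is implemented; it only assumes that the first root containing the file wins.

```perl
use strict;
use warnings;

# Hypothetical helper (not Mason's own resolver): return the key and file
# path of the first root that contains the requested component.
sub resolve_component {
    my ($comp_roots, $comp_path) = @_;
    for my $root (@$comp_roots) {
        my ($key, $dir) = @$root;
        my $file = $dir . $comp_path;
        return ($key, $file) if -f $file;
    }
    return;    # not found in any root
}

# A list of lists, in search order, as in the handler.pl example.
my @comp_roots = (
    [ private => '/usr/home/joe/comps' ],
    [ main    => '/usr/local/www/htdocs' ],
);

if (my ($key, $file) = resolve_component(\@comp_roots, '/products/index.html')) {
    print "found in $key root: $file\n";
}
else {
    print "not found in any root\n";
}
```

Because the roots are kept in an array rather than a hash, the search order is exactly the order in which they were declared.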
The data directory is a writable directory that Mason uses for various features and optimizations. By default, it is a directory called ``mason'' under your Apache server root.
Mason will create the directory on startup, if necessary, and set its permissions according to the web server User/Group.
Components will often need access to external Perl modules. There are three basic ways to bring them in.
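1. With the PerlModule directive in your httpd.conf:
PerlModule CGI
PerlModule LWP
2. In the <%once> section of the component(s) that use the module:
<%once>
use CGI ':standard';
use LWP;
</%once>
3. In your handler.pl, inside the HTML::Mason::Commands package:
{
    package HTML::Mason::Commands;
    use CGI ':standard';
    use LWP;
    ...
}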
Each method has its own trade-offs:
The second and third methods import symbols into the package that components run in (HTML::Mason::Commands), meaning that exported method names and other symbols will be usable from components. The first method, in contrast, will import symbols into the main package. The significance of this depends on whether the modules export symbols and whether you want to use them from components.
Mason should be prevented from serving images, tarballs, and other binary files as regular components. Performance will suffer, and such a file may inadvertently contain a Mason character sequence such as ``<%''.
There are several ways to restrict which file types are handled by Mason.
One way is to specify a filename pattern in the Apache configuration, e.g.:
<FilesMatch "(\.html|\.txt|^[^\.]+)$">
    SetHandler perl-script
    PerlHandler HTML::Mason::ApacheHandler
</FilesMatch>
This directs Mason to handle only files with .html or .txt extension, as well as those files with no extension.
Another way, if you are using a handler.pl script, is to include a line like the following at the top of your handler() subroutine:
return -1 if $r->content_type && $r->content_type !~ m|^text/|i;
This line handles requests for text/* MIME types, such as text/html and text/plain, and declines others.
Users may exploit a server-side scripting environment by invoking scripts with malicious or unintended arguments. Mason administrators need to be particularly wary of this because of the tendency to break out ``subroutines'' into individually accessible file components.
For example, a Mason developer might create a helpful shared component for performing SQL queries:
$m->comp('sql_select', table=>'employee', where=>'id=315');
This is a perfectly reasonable component to create and call internally, but clearly presents a security risk if accessible via URL:
http://www.foo.com/sql_select?table=credit_cards&where=*
Of course a web user would have to obtain the name of this component through guesswork or other means, but obscurity alone does not properly secure a system. Rather, you should choose a site-wide policy for distinguishing top-level components from private components, and make sure your developers stick to this policy. You can then prevent private components from being served.
One solution is to place all private components inside a directory, say /private, that lies under the component root but outside the document root.
Another solution is to decide on a naming convention, for example, that all private components begin with ``_'', or that all top-level components must end in ``.html''. Then turn all private requests away with a 404 NOT_FOUND (rather than, say, a 403 FORBIDDEN which would provide more information than necessary). Use either an Apache directive
PerlModule Apache::Constants
<FilesMatch "^_">
    SetHandler perl-script
    PerlInitHandler Apache::Constants::NOT_FOUND
</FilesMatch>
or a handler.pl directive:
return 404 if $r->filename =~ m{/_[^/]+$};
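The underscore naming convention can also be expressed as a small predicate on the file name alone. This is a sketch only: is_private_component is a hypothetical name, and it assumes the convention that private components are exactly those whose base name begins with ``_''.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical predicate: true when the last path segment of the resolved
# file name begins with an underscore.
sub is_private_component {
    my ($filename) = @_;
    return basename($filename) =~ /^_/ ? 1 : 0;
}

for my $file ('/usr/local/www/htdocs/index.html',
              '/usr/local/www/htdocs/_sql_select') {
    printf "%s is %s\n", $file,
        is_private_component($file) ? 'private' : 'top-level';
}
```

Testing the base name rather than the whole path matters because the file name Apache resolves is a full path, which never begins with an underscore.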
Even after you've safely protected internal components, top-level components that process arguments (such as form handlers) still present a risk. Users can invoke such a component with arbitrary argument values via a handcrafted query string. Always check incoming arguments for validity and never place argument values directly into SQL, shell commands, etc.
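As an illustration of checking arguments before use, the sketch below validates a numeric id. The valid_employee_id function and the employee table are hypothetical; the DBI call in the comment shows the general placeholder style and is not taken from this guide.

```perl
use strict;
use warnings;

# Hypothetical validator for a form handler that expects an employee id:
# accept only a short string of digits, reject everything else.
sub valid_employee_id {
    my ($id) = @_;
    return (defined($id) && $id =~ /\A[0-9]{1,9}\z/) ? 1 : 0;
}

for my $arg ('315', '315 OR 1=1', '') {
    printf "%-12s %s\n", "'$arg'", valid_employee_id($arg) ? 'accepted' : 'rejected';
}

# Once validated, pass the value as a bound parameter instead of
# interpolating it into the SQL string, for example with DBI:
#   $dbh->selectrow_hashref('SELECT * FROM employee WHERE id = ?', undef, $id);
```

Validation and placeholders complement each other: the first rejects nonsense early, the second keeps even an accepted value from being interpreted as SQL.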
By default Mason will decline requests for directories, leaving Apache to serve up a directory index or a FORBIDDEN as appropriate. Unfortunately this rule applies even if there is a dhandler in the directory: /foo/bar/dhandler does not get a chance to handle a request for /foo/bar/.
If you would like Mason to handle directory requests, do the following:
1. Set the decline_dirs parameter to 0.
2. If you are using a handler.pl and it contains a ``return -1'' line to decline non-text requests (as given in the previous section), add a clause allowing directory types:
return -1 if $r->content_type
          && $r->content_type !~ m|^text/|i
          && $r->content_type !~ m|directory$|i;
The dhandler that catches a directory request is responsible for setting a reasonable content type.
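The content-type test can be read more easily as a standalone predicate. The sketch below is hypothetical (should_decline is not a Mason or mod_perl function) and assumes that Apache reports directories with the type httpd/unix-directory.

```perl
use strict;
use warnings;

# Hypothetical predicate mirroring the ``return -1'' line: decide from the
# content type alone whether a handler should decline the request.
sub should_decline {
    my ($content_type, $allow_dirs) = @_;
    return 0 unless $content_type;                # unknown type: let Mason try
    return 0 if $content_type =~ m|^text/|i;      # text/html, text/plain, ...
    return 0 if $allow_dirs && $content_type =~ m|directory$|i;
    return 1;                                     # images, tarballs, etc.
}

for my $type ('text/html', 'image/png', 'httpd/unix-directory') {
    printf "%-22s %s\n", $type, should_decline($type, 1) ? 'declined' : 'handled';
}
```

In a real handler() the result would translate into returning -1 (declined) or continuing on to handle the request.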
Global variables can make programs harder to read, maintain, and debug, and this is no less true for Mason components. Due to the persistent mod_perl environment, globals require extra initialization and cleanup care.
That said, there are times when it is very useful to make a value available to all Mason components: a DBI database handle, a hash of user session information, the server root for forming absolute URLs.
If you are using a handler.pl script you can initialize the global there, either outside the handler() subroutine (if you only need to set it once) or inside (if you need to set it every request). Because Mason by default parses components in strict mode, you'll need to declare the global with use vars to avoid a fatal error about an undeclared variable.
{
    package HTML::Mason::Commands;
    use vars qw($server_root);
}
...
$HTML::Mason::Commands::server_root = "http://www.mysite.com/";
Alternatively, you can initialize the global in the <%once> or <%init> section of a top-level autohandler:
<%once>
use vars qw($server_root);
$server_root = "http://www.mysite.com/";
</%once>
Mason does not have a built-in session mechanism. However, with a page or so of code in your handler.pl, you can integrate Jeffrey Baker's Apache::Session into your application and make a tied global session variable available to all components.
The Mason Sessions How-To, at ..., is the best source of information about this surprisingly tricky subject.
Data caching is implemented with DeWitt Clinton's Cache::Cache module. For full understanding of this section you should read the documentation for Cache::Cache as well as for relevant subclasses (e.g. Cache::FileCache).
Cache::FileCache is the subclass used for data caching, although this may be overridden by the developer. Cache::FileCache creates a separate subdirectory for every component that uses caching, and one file some number of levels underneath that subdirectory for each cached item. The root of the cache tree is data_dir/cache. The name of the cache subdirectory for a component is determined by the function HTML::Mason::Utils::data_cache_namespace.
When $m->cache is called, Mason passes to the cache constructor the namespace, username, and cache_root options, along with any other options given in the $m->cache method.
You may specify other default constructor options with the data_cache_defaults parameter. For example,
data_cache_defaults => {
    cache_class        => 'SizeAwareFileCache',
    cache_depth        => 2,
    default_expires_in => '1 hour',
}
Any options passed to individual $m->cache calls override these defaults.
data_cache_defaults => {cache_class => 'NullCache'}
This subclass faithfully implements the cache API but never stores data.
This section explains Mason's various performance enhancements and how to administer them.
When Mason loads a component, it places it in a memory cache.
The maximum size of the cache is specified with the Interp's code_cache_max_size parameter; default is 10MB. When the cache fills up, Mason frees up space by discarding a number of components. The discard algorithm is least frequently used (LFU), with a periodic decay to gradually eliminate old frequency information. In a nutshell, the components called most often in recent history should remain in the cache. Very large components (over 20% of the maximum cache size) never get cached, on the theory that they would force out too many other components.
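The discard policy can be pictured with a toy cache. The LFUCache class below is purely illustrative and is not Mason's implementation: the class name, the decay factor, and the choice to evict only as many entries as needed are all assumptions made for the sketch.

```perl
use strict;
use warnings;

package LFUCache;

sub new {
    my ($class, %args) = @_;
    return bless {
        max_size => $args{max_size},                           # total size budget
        decay    => defined $args{decay} ? $args{decay} : 0.75,
        size     => 0,                                         # size currently used
        entries  => {},                                        # key => { value, size, freq }
    }, $class;
}

# Look up an entry and count the hit.
sub get {
    my ($self, $key) = @_;
    my $e = $self->{entries}{$key} or return;
    $e->{freq}++;
    return $e->{value};
}

# Store an entry, discarding the least frequently used ones if needed.
sub set {
    my ($self, $key, $value, $size) = @_;
    # Items over 20% of the budget are never cached.
    return 0 if $size * 5 > $self->{max_size};
    $self->remove($key);
    while ($self->{size} + $size > $self->{max_size}) {
        my ($victim) = sort {
            $self->{entries}{$a}{freq} <=> $self->{entries}{$b}{freq} || $a cmp $b
        } keys %{ $self->{entries} };
        $self->remove($victim);
    }
    $self->{entries}{$key} = { value => $value, size => $size, freq => 1 };
    $self->{size} += $size;
    return 1;
}

sub remove {
    my ($self, $key) = @_;
    my $e = delete $self->{entries}{$key} or return;
    $self->{size} -= $e->{size};
    return;
}

# Called periodically so that old popularity gradually fades.
sub decay {
    my ($self) = @_;
    $_->{freq} *= $self->{decay} for values %{ $self->{entries} };
    return;
}

package main;

my $cache = LFUCache->new(max_size => 100);
$cache->set('/index.html', 'compiled component', 20);
$cache->get('/index.html');
$cache->decay;
```

The decay step is what keeps a component that was popular long ago from occupying the cache forever: its count shrinks on each pass unless it keeps being requested.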
Note that the ``size'' of a component in memory cannot literally be measured. It is estimated by the length of the source text plus some overhead. Your process growth will not match the code cache size exactly.
You can prepopulate the cache with components that you know will be accessed often; see Preloading. Note that preloaded components possess no special status in the cache and can be discarded like any others.
Naturally, a cache entry is invalidated if the corresponding component source file changes.
To turn off code caching completely, set Interp's code_cache_max_size to 0.
The in-memory code cache is only useful on a per-process basis. Each process must build and maintain its own cache. Shared memory caches are conceivable in the future, but even those will not survive between web server restarts.
As a secondary, longer-term cache mechanism, Mason stores a compiled form of each component in an object file under data_dir/obj/component-path. Any server process can eval the object file and save time on parsing the component source file. The object file is recreated whenever the source file changes.
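One plausible way to decide when an object file needs recreating is to compare modification times. The sketch below is an assumption for illustration, not a description of Mason's internals, and object_file_is_stale is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical check: an object file is stale when it is missing or older
# than the component source it was compiled from.
sub object_file_is_stale {
    my ($source_file, $object_file) = @_;

    my $src_mtime = (stat $source_file)[9];
    die "cannot stat source file $source_file: $!" unless defined $src_mtime;

    my $obj_mtime = (stat $object_file)[9];
    return 1 unless defined $obj_mtime;    # never compiled
    return $src_mtime > $obj_mtime ? 1 : 0;
}
```

A check like this costs one or two stat calls per component call, which is the overhead that static_source mode is designed to avoid.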
Besides improving performance, object files can be useful for debugging. If you feel the need to see what your source has been translated into, you can peek inside an object file to see exactly how Mason converted a given component to a Perl object. This is crucial for pre-1.10 Mason, in which error line numbers are based on the object file rather than the source file.
If you change any Compiler or Lexer parameters, you must remove object files previously created under that compiler or lexer for the changes to take effect.
If for some reason you don't want Mason to create object files, set the Interp's use_object_files parameter to 0.
You can tell Mason to preload a set of components in the parent process, rather than loading them on demand, using the Interp's preloads parameter. Each child server will start with those components loaded in the memory cache. The trade-offs are:
Try to preload components that are used frequently and do not change often. (If a preloaded component changes, all the children will have to reload it from scratch.)
As described above, Mason checks the timestamp of a component source file every time that component is called. This can add up to a lot of file stats.
If you have a live site with infrequent and well-controlled updates, you may choose to use static_source mode. In this mode Mason will not check source timestamps when it uses an in-memory cache or object file. The disadvantage is that you must remove object files and restart the server whenever you change component source; however this process can be easily automated.
When an error occurs, Mason can respond by:
The first behavior is ideal for development, where you want immediate feedback on the error. The second behavior is usually desired for production so that users are not exposed to messy error messages. You choose the behavior by setting error_mode to ``output'' or ``fatal'' respectively.
These examples extend the single site configurations given so far.
If you want to share some components between your sites, arrange your httpd.conf so that all DocumentRoots live under a single component space:
# Web site #1
<VirtualHost www.site1.com>
    DocumentRoot /usr/local/www/htdocs/site1
    <Location />
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
    </Location>
</VirtualHost>

# Web site #2
<VirtualHost www.site2.com>
    DocumentRoot /usr/local/www/htdocs/site2
    <Location />
        SetHandler perl-script
        PerlHandler HTML::Mason::ApacheHandler
    </Location>
</VirtualHost>

# Mason configuration
PerlSetVar MasonCompRoot "/usr/local/www/htdocs"
PerlSetVar MasonDataDir "/usr/local/mason"
PerlModule HTML::Mason::ApacheHandler
The directory structure for this scenario might look like:
/usr/local/www/htdocs/    # component root
    +- shared/            # shared components
    +- site1/             # DocumentRoot for first site
    +- site2/             # DocumentRoot for second site
Incoming URLs for each site can only request components in their respective DocumentRoots, while components internally can call other components anywhere in the component space. The shared/ directory is a private directory for use by components, inaccessible from the Web.
Sometimes your sites need to have completely distinct component hierarchies, e.g. if you are providing Mason ISP services for multiple users. In this case the component root must change depending on the site requested. Since you can't change an interpreter's component root dynamically, you need to maintain separate ApacheHandler objects for each site in your handler.pl:
my %ah;
foreach my $site (qw(site1 site2 site3)) {
    $ah{$site} = HTML::Mason::ApacheHandler->new(
        comp_root => "/usr/local/www/$site",
        data_dir  => "/usr/local/mason/$site",
    );
}
...
sub handler {
    my ($r) = @_;
    my $site = $r->dir_config('site');
    $ah{$site}->handle_request($r);
}
We assume each virtual server configuration section has a
PerlSetVar site <site_name>
Above we pre-create all Mason objects in the parent. Another scheme is to create objects on demand in the child:
my %ah;
...
sub handler {
    my ($r) = @_;
    my $site = $r->dir_config('site');
    unless (exists $ah{$site}) {
        # get comp_root from PerlSetVar as well
        my $comp_root = $r->dir_config('comp_root');
        $ah{$site} = HTML::Mason::ApacheHandler->new(comp_root => $comp_root, ...);
    }
    $ah{$site}->handle_request($r);
}
The advantage of the second scheme is that you don't have to hardcode as much information in the handler.pl. The disadvantage is a slight memory and performance impact. On development servers this shouldn't matter; on production servers you may wish to profile the two schemes.
Although Mason is most commonly used in conjunction with mod_perl, the APIs are flexible enough to use in any environment. Below we describe the two most common alternative environments, CGI and standalone scripts.
The easiest way to use Mason via a CGI script is with the CGIHandler module.
Here is a skeleton CGI script that calls a component and sends the output to the browser.
#!/usr/bin/perl
use HTML::Mason::CGIHandler;

my $h = HTML::Mason::CGIHandler->new(
    data_dir => '/home/jethro/code/mason_data',
);

$h->handle_request;
The relevant portions of the httpd.conf file look like:
DocumentRoot /path/to/comp/root
ScriptAlias /cgi-bin/ /path/to/cgi-bin/

Action html-mason /cgi-bin/mason_handler.cgi
<FilesMatch "\.html$">
    SetHandler html-mason
</FilesMatch>
This simply causes Apache to call the mason_handler.cgi script every time an .html file under the component root is requested. This script uses the CGIHandler class to do most of the heavy lifting. See that class's documentation for more details.
Mason can be used as a pure text templating solution -- like Text::Template and its brethren, but with more power (and of course more complexity).
Here is a bare-bones script that calls a component file and sends the result to standard output:
my $interp = HTML::Mason::Interp->new();
$interp->exec(<absolute-file-path>, <args>...);
Because no component root was specified, the root is set to '/' and any file on the system may be used as a component. If you have a well defined and contained component tree, you'll probably want to specify a component root.
Because no data directory was specified, object files will not be created and data caching will not work in the default manner. If performance is an issue, you will want to specify a data directory.
Here's a slightly fuller script that specifies a component root and data directory, and captures the result in a variable rather than sending to standard output:
my $outbuf;
my $interp = HTML::Mason::Interp->new(
    comp_root  => '/path/to/comp_root',
    data_dir   => '/path/to/data_dir',
    out_method => \$outbuf,
);
$interp->exec(<component-path>, <args>...);
Jonathan Swartz <swartz@pobox.com>, Dave Rolsky <autarch@urth.org>, Ken Williams <ken@mathforum.org>
HTML::Mason, HTML::Mason::Interp, HTML::Mason::ApacheHandler, HTML::Mason::Lexer, HTML::Mason::Compiler