Simon Burns
The intention is to augment or replace the use of so-called `censorware' in certain environments, given that such systems of censorship are inevitably unreliable and imprecise.
Squidalyser is an interactive, web-based tool to help with the scrutiny and analysis of squid access logs. Scrutiny because you may be interested in what your users are looking at on the web, and squidalyser makes this very easy. Analysis because you may be interested in patterns of usage for squid, or which Internet sites are being accessed the most often -- squidalyser makes this easy too. The program is designed primarily for use in schools, although it should be flexible enough to be used in many other environments.
This document and squidalyser itself are works in progress, and may be inaccurate, incomplete or harmful -- or all three. As with information of any kind, they come with no warranty and you use them at your own risk. Feedback and bug fixes are welcome -- please send them to `squidalyser@ababa.org'.
As with all good software (which squidalyser aspires to be) this program and its associated documentation is released under the GPL:
http://www.gnu.org/licenses/gpl.txt
This means you can modify, redistribute and even sell the software, provided you adhere to the provisions of the licence -- not least that you grant the same rights to any recipients of this software.
The program consists of a number of perl scripts. The first, squidparse.pl, takes the information in your squid logfile and inserts it into a MySQL database. The second script, squidalyser.pl, runs through a web browser interface, and allows you to perform very specific or general queries on the database, to track web usage. A couple of auxiliary scripts make it easier to search for occurrences of lists of words in your logfile, and to track groups of users rather than individuals.
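To give a flavour of what squidparse.pl does (the real script may differ in detail), here is a minimal Perl sketch that reads lines in squid's native access.log format and inserts them into the logfile table described later. The connection details and log path are placeholders to be replaced with your own, and the field mapping is an assumption, not squidalyser's actual code:

#!/usr/bin/perl
# Illustrative only -- the real squidparse.pl may parse the log differently.
# Assumes squid's native access.log format:
#   time elapsed remotehost code/status bytes method URL rfc931 hierarchy type
use strict;
use warnings;
use DBI;

# Placeholder connection details -- substitute the values you chose above.
my $dbh = DBI->connect('DBI:mysql:database=squid;host=localhost',
                       'squidalyser', 'password', { RaiseError => 1 });

my $sth = $dbh->prepare(
    'INSERT INTO logfile (remotehost, rfc931, authuser, request, status, bytes, time)
     VALUES (?, ?, ?, ?, ?, ?, ?)'
);

open my $log, '<', '/var/log/squid/access.log' or die "access.log: $!";
while (my $line = <$log>) {
    my ($time, $elapsed, $host, $codestatus, $bytes, $method, $url, $ident)
        = split ' ', $line;
    next unless defined $ident;                 # skip malformed lines
    my ($code, $status) = split '/', $codestatus;
    $sth->execute($host, $ident, $ident, "$method $url", $status, $bytes, int $time);
}
close $log;
$dbh->disconnect;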
For example, you can list every URL requested by a particular user, restrict a search to a chosen time period, match requests against a word, sub-string or word list, and report on whole groups of users rather than individuals.
You should download the latest version of the squidalyser program, currently
http://ababa.org/dist/squidalyser-0.2.53.tar.gz
Enter the following command to unpack the files:
tar -zxvf squidalyser-0.2.53.tar.gz
This will create a `squidalyser-0.2.53' directory, into which the files will be unpacked.
This document boldly assumes MySQL is running on your system: if not, download the appropriate MySQL distribution from www.mysql.com and follow the installation instructions. To set up the database required by squidalyser, you need to:
Access mysql using this command:
mysql -u root -p
`root' is the name of the user who has sufficient privileges to set up users and databases, and may be different on your system (ask your systems administrator if you are unsure). The `-p' option specifies that you need a password to access mysql: if you haven't set a root password, consult the MySQL documentation to find out how to do so.
Create the database by typing:
create database squid;
Note the semi-colon at the end of the line!
Then grant permissions to the squidalyser user:
grant all privileges on squid.* to squidalyser@localhost identified by 'password';
You will need to devise your own password -- make sure you wrap it in quotation marks when you type the command. Type `exit' or Ctrl-D to quit mysql.
The database tables are created from the `squidalyser.sql' file in the `sql' subdirectory of the `squidalyser-0.2.53' directory; from within that subdirectory, run:
mysql squid -u squidalyser -p < squidalyser.sql
This also inserts a few rows of data in the tables, so you can test that everything works.
Use these commands to test the squidalyser database:
mysql squid -u squidalyser -p
To gain access to the database.
show tables;
This should tell you that there are tables called `groups', `logfile', `members' and `wordlist'.
desc logfile;
There are eight fields in the table: id, remotehost, rfc931, authuser, request, status, bytes and time.
select count(*) from logfile;
There are 102 rows in the database, each representing an access to one web URL (ie a page, graphic, etc).
select sum(bytes) from logfile;
The resources downloaded total 918,531 bytes.
select max(bytes) from logfile;
The largest single item downloaded was 143,854 bytes.
select rfc931, max(bytes) from logfile group by rfc931;
This should show you the maximum file-size downloaded for each user in the database.
If you saw any error messages, you probably didn't follow the instructions to the letter, so go back and try again. If you need to go right back to the start, you can erase the database by typing
drop database squid;
(Note: you only need to do this if you need or want to start again!)
If everything worked, you should clear the test data from the database:
delete from logfile;
Then type `exit' or press Ctrl-D to quit mysql.
Perl modules extend the functionality of perl. The squidalyser scripts require the modules listed in section 2.1 above. You will need to be the `root' user on your system to install them -- if you are not, contact your systems administrator and ask for them to be installed.
This is not a tutorial on installing these modules. However, you can download them from www.cpan.org, or install them using
perl -MCPAN -e shell
If you are unsure about any of this, consult the CPAN FAQ at
http://www.cpan.org/misc/cpan-faq.html
paying particular attention to the sections entitled `How do I install Perl modules?' and `Where do I find Perl DBI/DBD/database documentation?'
The squidparse.pl script takes data from the squid logfile and inserts it into the MySQL database. It probably needs to be run as `root', since it must read the squid logfile, which ordinary users cannot normally do. Copy the script and its configuration file to the appropriate location on your computer:
mkdir /usr/local/squidparse
cd squidalyser-0.2.53
cp squidparse/* /usr/local/squidparse
Next create a crontab entry to run the squidparse.pl script each morning:
crontab -e
This will invoke your editor. Type this line at the end of that file:
00 03 * * * /usr/local/squidparse/squidparse.pl
Then save the file, and the squidparse.pl script will be run each morning at 3am. You may decide you want to run it more frequently for a busy site -- consult the cron documentation for information about how to do this.
Since the script needs access to the database you set up, you need to edit squidalyser.conf to tell it the database username and password, etc. Use an editor to amend the information in the configuration file -- there are comments in the file to explain the usage of each item. Blank lines and those starting with # are ignored.
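For illustration only, here is a tiny Perl sketch of how such a name/value configuration file can be read, skipping blank lines and comments. The key names shown are hypothetical, and the real squidparse.pl may well read its configuration differently:

use strict;
use warnings;

my %conf;
open my $fh, '<', '/usr/local/squidparse/squidalyser.conf' or die "squidalyser.conf: $!";
while (my $line = <$fh>) {
    next if $line =~ /^\s*(#|$)/;      # ignore blank lines and comments
    chomp $line;
    my ($key, $value) = split ' ', $line, 2;
    $conf{$key} = $value;
}
close $fh;
# e.g. $conf{dbuser} and $conf{dbpass} would then hold the database credentials
# (hypothetical key names -- see the comments in the file itself).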
squidalyser.pl is the web-based program which does all the work for you, allowing you to retrieve meaningful information from the database. Copy it to a CGI directory on your web-server. On Linux, this could be located at `/home/httpd/cgi-bin' or `/var/www/cgi-bin' -- check with your systems administrator if unsure. To copy the files to the appropriate location:
cp cgi-bin/* /var/www/cgi-bin/
Then set the permissions and ownership:
chown apache: /var/www/cgi-bin/*.pl
chmod 755 /var/www/cgi-bin/*.pl
Your web-server may run under a different username, with `web', `httpd' and `nobody' being likely alternatives on a Linux system. Look in your httpd.conf for the `User' directive if you are unsure.
Finally, copy the icons from the `icons' subdirectory to your webserver's `icons' directory:
cp icons/* /var/www/icons
There is little point running squidalyser if your database contains no data! You can run the squidparse.pl script `by hand' if you wish, although it can take a while if your logfile is large. To do this, type:
cd /usr/local/squidparse
./squidparse.pl
Then wait, possibly for a few minutes, for the information to be inserted into MySQL.
You should find that using the program is easy. Access it from:
http://localhost/cgi-bin/squidalyser.pl
(or alter the hostname to suit your setup).
When it is invoked, the script extracts all usernames from the database and places them in a list. You can select multiple items from this list, or `All' to see information relating to all users. However, the `All' option can cause browser overload, and so is not recommended. If you select any other item, it will take priority even if you have the `All' option highlighted (on the assumption that you selected it, or failed to deselect it, by accident).
The `Sub-string match' field should be a useful feature: it returns only those items whose URL contains the sub-string you enter. Here are some ideas about how you might use it:
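For instance, entering a domain name shows you who has been visiting that site, and entering a file extension such as `.mp3' shows you who has been downloading that type of file. As a rough sketch of the kind of query this amounts to (assuming the URL lives in the `request' column; squidalyser.pl builds its own query and may differ):

use strict;
use warnings;
use DBI;

# Connection details are placeholders -- use the values in squidalyser.conf.
my $dbh = DBI->connect('DBI:mysql:database=squid', 'squidalyser', 'password',
                       { RaiseError => 1 });

my $substring = '.mp3';
my $rows = $dbh->selectall_arrayref(
    'SELECT rfc931, bytes, request FROM logfile WHERE request LIKE ?',
    undef, "%$substring%",
);
printf "%-12s %8d  %s\n", @$_ for @$rows;
$dbh->disconnect;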
To speed up searches, and reduce the quantity of information returned, it is recommended that you enter start and end times for the searches. Using Perl's excellent Time::ParseDate module means you can enter dates and times in free-form. For example:
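The strings below are illustrative of what the module's parsedate() function accepts (absolute dates, times of day and relative expressions); any that a given version of the module cannot parse simply come back undefined:

use strict;
use warnings;
use Time::ParseDate;

# parsedate() returns seconds since the epoch, or undef if it cannot parse the string.
for my $when ('now', 'last Monday', '25/12/2001', '9:00am', '+2 hours') {
    my $secs = parsedate($when);
    printf "%-12s => %s\n", $when,
           defined $secs ? scalar localtime($secs) : '(not parsed)';
}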
This field allows you to specify an IP address for the system used to access the web resources -- ie the one the user was sitting at, not your proxy server.
The word list feature allows you to search against a list of words, rather than entering them one at a time in the `Sub-string match' field. Click on the word list tab at the top of the screen, and enter the words in the first field -- either one word, or a list separated by commas or commas and spaces. Click on `Add' to add them to the word list, which is stored in the database so it will still be there next time you use the program. To remove words from the list, select them from the list-box and click on `Remove'.
When using squidalyser to query the database, do not enter words in the `Sub-string match' field; instead, click on the `Check against word list' option and submit the search. All URLs matching any word in the word list will be returned in the results.
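One way to picture such a search is as a set of OR'd LIKE clauses, one per stored word. A sketch under assumed column names (the `word' column in the wordlist table is an assumption, and squidalyser.pl constructs its own query):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=squid', 'squidalyser', 'password',
                       { RaiseError => 1 });

# The column name 'word' is assumed for illustration.
my $words = $dbh->selectcol_arrayref('SELECT word FROM wordlist');
die "The word list is empty\n" unless @$words;

my $where = join ' OR ', ('request LIKE ?') x @$words;
my $rows  = $dbh->selectall_arrayref(
    "SELECT rfc931, request FROM logfile WHERE $where",
    undef, map { "%$_%" } @$words,
);
print "$_->[0]\t$_->[1]\n" for @$rows;
$dbh->disconnect;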
To save you selecting and deselecting usernames from the list on the squidalyser main-screen, you can define lists of users using the `Group manager'. When you select a group name from the `Groups' drop-down menu, and submit the search, all users in that group will be included in the database search.
To create a group, click on the `Group manager' tab at the top of the screen, enter the name of the group in the first field, and click on the `Create' button. Other fields and buttons will appear on-screen, to allow you to add users to the group, or remove them from the group.
You can create more than one group, and switch between them (or delete them) using the `Edit or delete group' menu.
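Conceptually, selecting a group just restricts the search to the usernames recorded for that group. A sketch of what that might look like, where the `members' table column names (groupname, username) and the group name itself are assumptions for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=squid', 'squidalyser', 'password',
                       { RaiseError => 1 });

# Column names and the group name are hypothetical.
my $rows = $dbh->selectall_arrayref(
    'SELECT l.rfc931, l.bytes, l.request
       FROM logfile l
       JOIN members m ON m.username = l.rfc931
      WHERE m.groupname = ?',
    undef, 'year9',
);
printf "%-12s %8d  %s\n", @$_ for @$rows;
$dbh->disconnect;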
That's all folks! Check the web-page for new releases, which will also be announced on Freshmeat, COLA, etc. Fan-mail is appreciated, as are cash donations *8) and constructive criticism. Bug reports and discussion are also invited -- email squidalyser@ababa.org. I hope the program proves to be useful, reliable and effective; let me know if I'm wrong.
I plan to look at these issues for a future release of the program. Items completed since earlier releases are shown in italics.
Other squid logfile analysis programs can be found at
http://www.squidcache.org/Scripts
Sarg is recommended :-)
Thanks to those who have contacted me about squidalyser. If you find yourself using the program on a regular basis, and if it proves useful, please let me know how you are using it. Please indicate if I may publish this information on the web-page (fully anonymised).
Apart from encouraging use of squidalyser, this information will help me to understand how the program is being used in real life. This will in turn feed into future development and, I hope, lead to a better program.