HTML-Parser
-----------
This is a collection of modules that parse and extract information
from HTML documents. Bug reports and discussions about these modules
can be sent to the mailing list. See also the HTML-Tree package,
which builds HTML syntax trees and extracts information from them.
The modules present in this collection are:
HTML::Parser - The parser base class. It receives arbitrarily sized
chunks of the HTML text, recognizes markup elements, and
separates them from the plain text. As different kinds of markup
and text are recognized, the corresponding event handlers are
invoked.
HTML::Entities - Provides functions to encode and decode text with
embedded HTML entities.
HTML::HeadParser - A lightweight HTML::Parser subclass that extracts
information from the <HEAD> section of an HTML document.
HTML::LinkExtor - An HTML::Parser subclass that extracts links from
an HTML document.
HTML::TokeParser - An alternative interface to the basic parser
that does not require event driven programming. Most simple
parsing needs are probably best attacked with this module.
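As a rough illustration of the non-event-driven style, the following
sketch uses HTML::TokeParser to collect the links in a small document
and HTML::Entities to encode and decode a string. The sample markup,
the example.com URL, and the variable names are assumptions made up
for this example; see each module's own documentation for the full
interface.

```perl
use strict;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities decode_entities);

# Hypothetical input document, used only for illustration.
my $html = '<p>See <a href="http://www.example.com/">the example</a>.</p>';

# Passing a reference to a scalar parses the string directly.
my $p = HTML::TokeParser->new(\$html) or die "Can't create parser";

my @links;
# get_tag("a") skips ahead to the next <a> start tag, or returns
# undef when the document is exhausted.
while (my $tag = $p->get_tag("a")) {
    my $href = $tag->[1]{href};    # attributes are in a hash ref
    next unless defined $href;
    my $text = $p->get_trimmed_text("/a");   # text up to </a>
    push @links, [$href, $text];
}
print "$_->[0]\t$_->[1]\n" for @links;

# Encoding and decoding entities in plain strings.
my $encoded = encode_entities("a < b & c");
my $decoded = decode_entities("&lt;b&gt;");
print "$encoded\n$decoded\n";
```

The same result could be had from HTML::LinkExtor or from handlers on
HTML::Parser itself; the pull-style loop is shown here only because it
tends to be the shortest for simple tasks.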
Deprecated modules:
HTML::Filter - An HTML::Parser subclass that filters HTML text. You
will need to make a subclass if you want it to do more than
cat(1).
PREREQUISITES
To install and use this package you will need Perl version 5.004 or
later. If you intend to use HTML::HeadParser, you also need to have
libwww-perl installed.
INSTALLATION
Just follow the usual procedure:
   perl Makefile.PL
   make
   make test
   make install
COPYRIGHT
© 1995-2000 Gisle Aas. All rights reserved.
© 1999-2000 Michael A. Chase. All rights reserved.
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.