WWW::Search and AutoSearch
==========================
WHAT IS NEW WITH WWW::Search 1.025? (1999-06-30)
------------------------------------------------
overview:
* Bug fixes for some backends (as usual)
* New backend for AltaVista Intranet!
* New backend for Excite News!
* New test mechanism
See the file ChangeLog for details.
WHAT IS WWW::Search?
--------------------
WWW::Search is a collection of Perl modules which provide an API to
WWW (and similar) search engines. Currently WWW::Search includes
back-ends for variations of AltaVista, Dejanews, Excite, HotBot,
Infoseek, Lycos, Magellan, WebCrawler, and Yahoo, among others. We
include two applications built from this library: AutoSearch (a
program to automate tracking of search results over time), and
WebSearch, a small demonstration program to drive the library.
Because WWW::Search depends on parsing the HTML output of web search
engines, it will fail if the search engine operators change their
format (an unfortunately frequent occurrence). WWW::Search includes a
test suite for most back-ends which verifies that they are functioning
correctly. As of the day of the release the current back-end status
is:
    AltaVista            working
    Crawler              partially working
    Dejanews             working
    Excite               working
    ExciteForWebServers  not working
    Fireball             not working
    FolioViews           working
    Gopher               not working? (not in test suite)
    HotBot               working
    HotFiles             not working? (not in test suite)
    Infoseek             working
    Livelink             not working? (not in test suite)
    Lycos                working
    Magellan             working
    Metapedia            not working? (not in test suite)
    MSIndexServer        not working
    NorthernLight        partially working (doesn't handle multi-page returns)
    Null                 working
    PLweb                working
    Profusion            working
    Search97             working
    SFgate               working
    Simple               not working? (not in test suite)
    Snap                 working
    Verity               not working (not in test suite)
    WebCrawler           working
    Yahoo                working
    ZDNet                not working? (not in test suite)
``Partially working'' indicates that some tests passed and some failed.
WHAT IS AutoSearch?
-------------------
WWW::Search's primary client is AutoSearch. AutoSearch performs a
web-based search and puts the results set in a web page. It
periodically updates this web page, indicating how the search changes
over time. Sample output from AutoSearch can be found at
. Output format is
configurable.
See the AutoSearch man page for details, or the Demonstration section
below for quick-start instructions.
REQUIREMENTS
------------
WWW::Search requires Perl5 and libwww-perl. For information on Perl5,
see . For libwww-perl, see
. Both are also available from the
Comprehensive Perl Archive Network (CPAN). Visit
to find a CPAN site near you.
At this time the primary WWW::Search development and testing is under
perl version 5.005_05 on Sun Sparc Solaris 2.5.1 and 7.
AVAILABILITY
------------
The latest version of WWW::Search should always be available on CPAN.
Feedback about WWW::Search is encouraged. If you're using it for a
neat application, please let us know. If you'd like to implement (or
have implemented) a new back-end for WWW::Search, let us know so we
don't duplicate work.
INSTALLATION
------------
To use this package you will need Perl version 5.002 or better.
Install WWW::Search as you would any Perl module library, by running
these commands:
    perl Makefile.PL
    make
    make test
    make install
See below for a description of what "make test" does.
If you want to install a private copy of WWW::Search in your home
directory, generate the initial Makefile with a command like this:
    perl Makefile.PL PREFIX=/my/perl/lib
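
With a private prefix, Perl will not search the installed tree by
default. The following is a minimal sketch for locating the directory
to put on PERL5LIB. The helper name wwwsearch_libdir is hypothetical,
and the layout below the prefix depends on your Perl version, which is
why the sketch searches for WWW/Search.pm rather than assuming a path.

```shell
# Print the directory that must be on PERL5LIB for a private install.
# Hypothetical helper; assumes the module was installed as
# <dir>/WWW/Search.pm somewhere below the given prefix.
wwwsearch_libdir() {
    prefix=$1
    module=$(find "$prefix" -type f -path '*/WWW/Search.pm' 2>/dev/null | head -n 1)
    [ -n "$module" ] || return 1
    # Strip the trailing /WWW/Search.pm to get the include directory.
    printf '%s\n' "${module%/WWW/Search.pm}"
}

# Usage (not run here):
#   PERL5LIB=$(wwwsearch_libdir /my/perl/lib) && export PERL5LIB
#   perl -MWWW::Search -e 'print "$WWW::Search::VERSION\n"'
```

If the helper prints nothing and returns a non-zero status, the module
was not found below the prefix you gave.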
TESTING
-------
The "make test" command compares expected output from WWW::Search with
actual output. It detects two kinds of errors:
- internal parsing:
    First, it checks that your system computes the same results as
    my system based on some saved Web queries. This test should
    always pass for working back-ends; if it doesn't, send me mail.
- external queries:
    Second, it makes real queries against the search engines
    and compares them with some saved results.
External queries can fail for several reasons:
- new pages have been added which match the test queries
  (not a bad thing)
- changes in the web search engine output which break WWW::Search's
  parsers (a bad thing)
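
Conceptually, each test compares a saved list of results with a
freshly generated one. The shell sketch below illustrates that idea
with sort and diff. It is not the actual test harness (that ships with
the distribution), and the function name compare_results is
hypothetical.

```shell
# compare_results EXPECTED ACTUAL
# Returns 0 when both files contain the same lines (order ignored);
# otherwise prints the differences between the sorted lists and returns 1.
compare_results() {
    expected_sorted=$(mktemp) || return 2
    actual_sorted=$(mktemp) || return 2
    sort "$1" > "$expected_sorted"
    sort "$2" > "$actual_sorted"
    if diff "$expected_sorted" "$actual_sorted"; then
        status=0
    else
        status=1
    fi
    rm -f "$expected_sorted" "$actual_sorted"
    return "$status"
}
```

In the diff output, lines marked ">" appear only in the actual results
and lines marked "<" appear only in the saved results.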
If the external tests fail, please either investigate the error or
send a description of the problem and the output of "make test" to the
maintainer of the back-end for the search engine that fails. You can
find out who maintains the back-end by looking at the documentation
(e.g. perldoc WWW::Search::Infoseek).
DISCUSSION, BUG REPORTS, AND IMPROVEMENTS
-----------------------------------------
A mailing list for WWW::Search discussion exists. To subscribe, send
"subscribe info-www-search" as the body of a message to
.
Back-end-related bug reports ("search engine ABC doesn't work") should
be sent to the author of the back-end (back-end authors are identified
in the corresponding man page and the output of ``make test'').
General bugs should be reported to .
When submitting a bug report, please remember to include:
- your version of perl
- your version of WWW::Search
- sample output showing the error
- the output of "make test"
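
The items above can be gathered in one step. This is a minimal sketch,
assuming you run it from the unpacked WWW::Search source directory so
that "make test" is available. The helper name collect_bug_report and
the default report file name are hypothetical.

```shell
# Hypothetical helper: write the details requested above to a report file.
# Each step falls back to a short note if the command fails.
collect_bug_report() {
    report=${1:-wwwsearch-bug-report.txt}
    {
        echo "== perl version =="
        perl -v 2>&1 || echo "perl not found"
        echo "== WWW::Search version =="
        perl -MWWW::Search -e 'print "$WWW::Search::VERSION\n"' 2>&1 \
            || echo "WWW::Search could not be loaded"
        echo "== make test output =="
        make test 2>&1 || echo "make test exited with an error"
    } > "$report"
    echo "Wrote $report; add sample output showing the error before sending."
}

# Usage (not run here):
#   collect_bug_report my-report.txt
```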
DEMONSTRATION
-------------
After installing the client programs, try
    WebSearch '"Your Name Here"'
to see who's talking about you on the web.
Then (in your web page directory), try
    AutoSearch -n 'me on the web' -s '"Your Name Here"' me
and the web page me/index.html will be created summarizing
this information.
Then add
    0 3 * * 1 AutoSearch /path/to/your/web/pages/me
to your crontab(1) to update this search once a week.
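
The crontab fields in that entry are minute 0, hour 3, any day of the
month, any month, and day of week 1 (Monday), so the search runs at
03:00 every Monday. The sketch below appends the entry without
discarding existing ones. Both function names are hypothetical, and
piping to "crontab -" is an assumption that holds on most, but not
all, cron implementations.

```shell
# Build the crontab entry: 03:00 every Monday for the given directory.
autosearch_cron_entry() {
    printf '0 3 * * 1 AutoSearch %s\n' "$1"
}

# Append the entry to the current user's crontab, keeping existing entries.
# Hypothetical helper; not invoked here because it changes your crontab.
install_autosearch_cron() {
    { crontab -l 2>/dev/null; autosearch_cron_entry "$1"; } | crontab -
}

# Usage (not run here):
#   install_autosearch_cron /path/to/your/web/pages/me
```

cron typically runs jobs with a minimal PATH, so you may need to give
the full path to AutoSearch in the entry.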
DOCUMENTATION
-------------
See `perldoc WWW::Search` for an overview of the library.
POD-style documentation is also included in all modules
and scripts.
FUTURE PLANS
------------
Some ideas:
- application-level proxy support (I'm looking for a contribution
here from someone who uses/needs proxy support)
- more widespread use of new results tags across all back-ends
- a freeze/restore interface to suspend and resume in-progress queries
- more back-ends
Contributions from others are always welcome. Send me e-mail if you
plan a new back-end or want to discuss architectural changes (to avoid
duplicating work).
SUPPORT AND CREDITS
-------------------
The WWW::Search architecture is by John Heidemann with feedback from
the other contributors. NOTE: This list is not updated; consult the
on-line documentation to find out who is currently maintaining each
component.
PLATFORM SUPPORT:
    Unix                 John Heidemann
    Windows              Jim Smyser
                         (see )
APPLICATIONS:
    WebSearch            John Heidemann
    AutoSearch           William Scheding
BACK-ENDS:
    AltaVista            John Heidemann
    Dejanews             Cesare Feroldi de Rosa
                         and Martin Thurn
    Crawler              Andreas Borchert
    Excite               GLen Pringle
                         and Martin Thurn
    ExciteForWebServers  Paul Lindner
    Fireball             Andreas Borchert
    FolioViews           Paul Lindner
    Gopher               Paul Lindner
    HotBot               William Scheding and Martin Thurn
    HotFiles             Jim Smyser
    Infoseek             Cesare Feroldi de Rosa and Martin Thurn
    Livelink             Paul Lindner
    Lycos                William Scheding and John Heidemann,
                         Martin Thurn
    Magellan             Martin Thurn
    MSIndexServer        Paul Lindner
    NorthernLight        Jim Smyser
    Null                 Paul Lindner
    OpenDirectory        Jim Smyser
    PLWeb                Paul Lindner
    Profusion            Jim Smyser
    Search97             Paul Lindner
    SFgate               Paul Lindner
    Simple               Paul Lindner
    Snap                 Jim Smyser
    Verity               Paul Lindner
    WebCrawler           Martin Thurn
    Yahoo                William Scheding and Martin Thurn
    ZDNet                Jim Smyser
AutoSearch is based on an earlier implementation by Kedar Jog
with advice from Joe Touch.
Bugs and extensions (to the software and documentation) have been
identified by William Scheding, T. V. Raman (proxy support),
C. Feroldi, Larry Virden, Paul Lindner, Guy Decoux,
R Chandrasekar (Mickey), Martin Thurn, Chris Nandor,
Martin Valldeby, Jim Smyser, Darren Stalder, Neil Bowers,
Ave Wrigley, and Andreas Borchert.
Bugs have been reported by Joseph McDonald, Juan Jose Amor,
Bowen Dwelle, Vassilis Papadimos, Vidyut Luther, and
Chris P. Acantilado.
Feedback, bug reports and fixes, and new back-ends should be sent to
Martin Thurn. When sending e-mail, please put [WWW::Search] at the
beginning of the subject line (or risk me losing the message in the
pile).
COPYRIGHT
---------
Copyright (c) 1996 University of Southern California.
All rights reserved.
Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation, advertising
materials, and other materials related to such distribution and use
acknowledge that the software was developed by the University of
Southern California, Information Sciences Institute. The name of the
University may not be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Portions of this README are derived from the README for libwww-perl.