WWW::Search and AutoSearch
==========================

WHAT IS NEW WITH WWW::Search 1.020? (12-Aug-98)
-----------------------------------------------

overview: lots of bug fixes and new back-ends

- bug fix: maximum_to_retrieve now works for very small values.
  (Problem identified by Vidyut Luther.)
- new back-ends: ExciteForWebServers, FolioViews, Livelink,
  MSIndexServer, Null, Search97, all from Paul Lindner (thanks!)
- bug fix: Gopher, PLweb, SFgate, Simple, Verity from Paul Lindner
- bug fix: Lycos from John Heidemann
- new test suites: PLweb, FolioViews, Null, MSIndexServer, Search97,
  SFgate, ExciteForWebServers from Paul Lindner
- bug fix: HotBot repair from Martin Thurn
- known bug: When installing on Windows the makefile will break on
  lines beginning with a colon (:). Delete them and the install should
  work. Improvements to Windows installation are welcome. (I don't
  run Windows and so can't write or test this code.)
- known bug: WWW::Search doesn't work on MacPerl because of
  end-of-line differences. A fix for this problem is in progress.
  (Problem identified and fix suggested by Chris Nandor.)

Note: WWW::Search may have problems with older libwww's (5.08).
If "make test" dies with an error in RobotUA, upgrade libwww.
(Tested with libwww-5.30.)

WHAT IS WWW::Search?
--------------------

WWW::Search is a collection of Perl modules which provide an API to
WWW search engines. Currently WWW::Search includes back-ends for
variations of AltaVista, Dejanews, Excite, HotBot, Infoseek, Lycos,
Magellan, PLweb, SFgate, Verity, WebCrawler, and Yahoo. We include
two applications built from this library: AutoSearch (a program to
automate tracking of search results over time), and WebSearch, a small
demonstration program to drive the library. Back-ends for other
search engines and more sophisticated clients are currently under
development.
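As a rough illustration of the API described above, the sketch below
runs one query and prints the URL of each hit. It assumes WWW::Search
is installed and that the chosen back-end is working on the day you
try it. The back-end name, the query string, and the limit of 10 are
example values only; check the WWW::Search man page of your installed
version for the exact interface.

```perl
#!/usr/bin/perl -w
use strict;
use WWW::Search;

# Back-end name, query, and limit are example values (assumptions).
my $search = WWW::Search->new('AltaVista');
$search->maximum_to_retrieve(10);
$search->native_query(WWW::Search::escape_query('"Your Name Here"'));

# next_result() returns one result object per hit, then undef.
while (my $result = $search->next_result()) {
    print $result->url, "\n";
}

# response() exposes the last HTTP response for error reporting.
my $response = $search->response();
if (defined $response && !$response->is_success) {
    warn "search failed: ", $response->status_line, "\n";
}
```

Swapping the back-end name should be the only change needed to query a
different search engine, provided that back-end is currently working.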
Because WWW::Search depends on parsing the HTML output of web search
engines, it will fail if the search engine operators change their
format (an unfortunately frequent occurrence). WWW::Search includes
a test suite for most back-ends which verifies that it's functioning
correctly. As of the day of the release the current back-end status
is:

    AltaVista            working       (in test suite)
    Dejanews             not working?  not in test suite
    Excite               working       (in test suite)
    ExciteForWebServers  working       (in test suite)
    FolioViews           working       (in test suite)
    Gopher               not working?  not in test suite
    HotBot               working       (in test suite)
    Infoseek             not working   (in test suite)
    Livelink             not working?  not in test suite
    Lycos                working       (in test suite)
    Magellan             working?      (in test suite)
    MSIndexServer        working       (in test suite)
    Null                 working       (in test suite)
    PLweb                working       (in test suite)
    Search97             working       (in test suite)
    SFgate               working       (in test suite)
    Simple               not working?  not in test suite
    Verity               not working   not in test suite
    WebCrawler           not working   (in test suite)
    Yahoo                working?      (in test suite)

Magellan and Yahoo's test suites are sometimes flaky.

(Other back-ends are currently under development; see the contributors
list below for details.)

WHAT IS AutoSearch?
-------------------

WWW::Search's primary client is AutoSearch. AutoSearch performs a
web-based search and puts the result set in a web page. It
periodically updates this web page, indicating how the search changes
over time.

Sample output from WWW::Search can be found at .

Output format is configurable. See the AutoSearch man page for
details, or the DEMONSTRATION section below for quick-start
instructions.

REQUIREMENTS
------------

WWW::Search requires Perl5 and libwww-perl.

For information on Perl5, see . For libwww-perl, see . Both are also
available from the Comprehensive Perl Archive Network (CPAN). Visit
to find a CPAN site near you.

At this time WWW::Search is tested under Perl version 5.004_04.
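Before installing, it can help to confirm that both prerequisites are
present. The following is a minimal sketch, not part of the
distribution; the function name prerequisite_report is made up for
this example. It reports the running Perl version and tries to load
LWP, the module that libwww-perl provides.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return one line per prerequisite found.
sub prerequisite_report {
    # $] holds the version of the running Perl as a number.
    my @report = ("perl version: $]");

    # Load LWP inside eval so a missing module is reported, not fatal.
    if (eval { require LWP; 1 }) {
        push @report, 'libwww-perl version: ' . LWP->VERSION;
    } else {
        push @report, 'libwww-perl not found';
    }
    return @report;
}

print "$_\n" for prerequisite_report();
```

If the second line says that libwww-perl was not found, install it
from CPAN before running "perl Makefile.PL".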
AVAILABILITY
------------

The latest version of WWW::Search should always be available from .
Alpha releases are only available here (not at CPAN).

WWW::Search is also available as part of CPAN. Visit to find a CPAN
site near you.

Feedback about WWW::Search is encouraged. If you're using it for a
neat application, please let us know. If you'd like to implement (or
have implemented) a new back-end for WWW::Search, let us know so we
don't duplicate work.

INSTALLATION
------------

In order to use this package you will need Perl version 5.002 or
better. You install WWW::Search, as you would install any perl module
library, by running these commands:

    perl Makefile.PL
    make
    make test
    make install

See below for a description of what "make test" does.

If you want to install a private copy of WWW::Search in your home
directory, then you should try to produce the initial Makefile with
something like this command:

    perl Makefile.PL PREFIX=~/perl

TESTING
-------

The "make test" command compares expected output from WWW::Search with
actual output. It detects two kinds of errors:

- internal parsing: First it checks to make sure that your system
  computes the same results as my system based on some saved Web
  queries. This test should always pass; if it doesn't, send me mail.

- external queries: Second, it makes real queries against the search
  engines and compares them with some saved results. External queries
  can fail for several reasons:

    - new pages have been added which match my test queries (not bad)
    - changes in the web search engine output which break
      WWW::Search's parsers (very bad)

If the external tests fail, please either investigate the error or
send a description of the problem and the output of "make test" to the
maintainer of the back-end for the search engine that fails. You can
find out who maintains the back-end by looking at the man page or code
for the back-end in the lib/WWW/Search directory.
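When investigating a failed external test, the two failure causes
above can often be told apart by comparing the saved URLs with the
fresh ones. The sketch below is a hypothetical triage helper, not the
logic used by the shipped test scripts; the function name and the
sample URLs are invented for the example. It treats an empty fresh
result as a likely parser problem and anything else as a change in
what the engine has indexed.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: compare saved and fresh URL lists (array refs).
sub classify_results {
    my ($saved, $fresh) = @_;

    # No hits at all usually means the page format changed.
    return 'no hits: the parser may be broken' unless @$fresh;

    my %in_saved = map { $_ => 1 } @$saved;
    my %in_fresh = map { $_ => 1 } @$fresh;
    my @new  = grep { !$in_saved{$_} } @$fresh;   # hits not seen before
    my @gone = grep { !$in_fresh{$_} } @$saved;   # saved hits now absent

    return 'identical' unless @new || @gone;
    return sprintf 'index changed: %d new, %d missing',
        scalar @new, scalar @gone;
}

# Example data only; real lists would come from the saved test output.
print classify_results(
    ['http://a.example/', 'http://b.example/'],
    ['http://a.example/', 'http://c.example/'],
), "\n";
```

A report of a few new or missing hits is usually harmless, while no
hits at all is worth reporting to the back-end maintainer.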
DISCUSSION, BUG REPORTS, AND IMPROVEMENTS
-----------------------------------------

A mailing list for WWW::Search discussion exists. To subscribe, send
"subscribe info-www-search" as the body of a message to .

Back-end-related bug reports (search engine whatever doesn't work)
should be sent to the author of the back-end (back-end authors are
identified in the corresponding man page and the output of
``make test'').

General bugs should be reported to .

When submitting a bug report, please remember to include:

- your version of perl
- your version of WWW::Search
- sample output showing the error
- the output of "make test"

DEMONSTRATION
-------------

After installing the client programs, try

    WebSearch '"Your Name Here"'

to see who's talking about you on the web.

Then (in your web page directory), try

    AutoSearch -n 'me on the web' -s '"Your Name Here"' me

and the web page me/index.html will be created summarizing this
information. Then add

    0 3 * * 1 AutoSearch /path/to/your/web/pages/me

to your crontab(1) to update this search once a week.

DOCUMENTATION
-------------

See WWW/Search.pm for an overview of the library. POD-style
documentation is included in all modules and scripts. These are
normally converted to manual pages and installed as part of the
"make install" process. You should also be able to use the 'perldoc'
utility to extract documentation from the module files directly.

FUTURE PLANS
------------

Some ideas:

- application-level proxy support (I'm looking for a contribution
  here from someone who uses/needs proxy support)
- more widespread use of new results tags across all back-ends
- a freeze/restore interface to suspend and resume in-progress queries
- more back-ends

Now that the test suite is done I don't plan to add major new
features, but contributions from others are always welcome. Send me
e-mail if you plan a new back-end or want to discuss architectural
changes (to avoid duplicating work).
RELEASE HISTORY
---------------

1.002: (11 October 1996)
- First public release.

1.004: (31 October 1996)
- new: AutoSearch, a client application (see below for details)
- new: WWW::Search is now in CPAN
  (see GETTING WWW::Search for details)
- bug fix: installation problem (no rule to make CLIENTS/search) fixed

1.005: (12 November 1996)
- new: back-ends for HotBot, Lycos, and several AltaVista variants
- new: application support for search-engine selection
- new: application and library support for search-engine options

1.006: (25 November 1996)
- private beta release, see 1.007 for list of new features

1.007: (17 December 1996)
- new: back-ends for Dejanews (from Cesare Feroldi de Rosa),
  Infoseek (also from Cesare Feroldi de Rosa),
  and Excite (from GLen Pringle)
- new: more fields in SearchResult (score, dates, etc., see the man
  page) (problem found by Cesare Feroldi de Rosa)
- new: better error handling on network failures
  (AutoSearch should report errors on its pages,
  $search->response() provides an API for error reporting)
- new (internal): user_agent handling has changed
- new: proxy support added to WWW::Search (still needed in
  applications) (problem and fix suggested by T. V. Raman)
- bug-fix: numerous documentation updates
  (problems found by Larry Virden)
- bug-fix: AltaVista web search was occasionally dropping hits
  (problem found by Larry Virden, fixed by Bill Scheding)
- bug-fix: all non-alphanumeric characters are now escaped
  (problem found by Larry Virden)

1.008: (8 January 1997)
- private alpha release, see 1.009 for list of new features

1.009: (14 January 1997)
overview: 1.009 is primarily a maintenance release to accommodate
changes to LWP and some search engines.
- change: search application renamed WebSearch (a more specific name)
- bug-fix: the WWW::Search error in formatting is fixed
  (problem found by Larry Virden, fix by him and johnh)
- bug-fix: RobotUA handling updated for new LWP in Search.pm
- bug-fix: update for Infoseek (page format changed about 1 Jan 97)
  (problem found by Joseph McDonald, fix by Cesare Feroldi de Rosa)
- bug-fix: update for Excite (page format changed about 9 Jan 97)
  (problem found by Juan Jose Amor, fix by GLen Pringle)

1.010: (20 August 1997)
overview: an interim release to fix AltaVista
- new: normalized_score, a back-end independent score
  (from Paul Lindner)
- new: generic options are supported by several back-ends
  (specify search engine URL, debugging, etc.)
- new: AltaVista back-end now sets SearchResult::raw
- bug-fix: update for AltaVista (page format changed Jul 97)
  (some information wrt fix provided by Guy Decoux)

1.011: (8 October 1997)
- internal alpha release, see 1.012 for list of new features

1.012: (3 November 1997)
overview: an alpha release for test-suite testing
- new: for testing, HTTP results can be saved to disk and played back
- new: test scripts (try "make test")
- bug-fix: Lycos works again and is now maintained by John Heidemann
- bug-fix: AltaVista advanced and news searches have been repaired
- bug-fix: some uninitialized value warnings suppressed
  (fix suggested by R.
Chandrasekar (Mickey))
- new: new back-end: PLweb
- new: documentation for PLweb (contributed by Paul Lindner)
- new: new back-ends: Gopher, Simple (contributed by Paul Lindner)
- new: WWW::Search mailing list: to subscribe, send
  "subscribe info-www-search" as the body of a message to

1.013, (19 February 1998)
overview: this is an alpha release to include Martin's new back-ends
- bug fix: HotBot back-end updated by Martin Thurn
- new: Yahoo back-end now works, by Martin Thurn
- problem: several back-ends don't work (Lycos)
- problem: several back-ends don't have test suites and so may or
  may not work (DejaNews, Excite, HotBot, Infoseek, PLweb, SFgate,
  Verity, Yahoo)
- reminder: WWW::Search mailing list: to subscribe, send
  "subscribe info-www-search" as the body of a message to

1.014, (24 March 1998)
overview: this is an alpha release to fix the AltaVista/Lycos back-ends
- bug fix: AltaVista/Lycos back-ends
  (problem reported by Bilal Siddiqui)
- known problem: some back-end test suites give intermittent results
  (AltaVista::News)
- problem: several back-ends don't have test suites and so may or
  may not work (DejaNews, Excite, HotBot, Infoseek, PLweb, SFgate,
  Verity, Yahoo)

1.015, (2-Apr-98)
overview: this is an alpha release with several new back-ends
- new: back-ends: Magellan, WebCrawler (thanks to Martin Thurn)
- bug fix: Yahoo/HotBot/Excite back-ends, with test suites.
  Many thanks to Martin Thurn.
- bug fix: AltaVista news test suites have been relaxed. Even though
  the code worked before, the test suites used to report false
  negatives.
- bug fix: AltaVista is now more careful to detect the end of a hit's
  raw HTML
- new: the test suite has been enhanced to be less sensitive to
  changes in what's indexed
- problem: several back-ends don't have test suites and so may or
  may not work (DejaNews, Infoseek, PLweb, SFgate, Verity)
- reminder: WWW::Search mailing list: to subscribe, send
  "subscribe info-www-search" as the body of a message to

1.016, 21-May-98
overview: this is an alpha to fix HotBot/Infoseek
- bug fix: Infoseek/HotBot back-ends now work again.
  (HotBot problem reported by Alan McCoy, both back-ends fixed by
  Martin Thurn)
- addition: Infoseek test suite
- addition: test output now includes the version number

1.017, 27-May-98
overview: this is the public release since 1.012
- bug fix: Lycos bug fix

1.018, 31-May-98
overview: back-end updates
- bug fix: Excite and WebCrawler (by Martin Thurn), AltaVista (by
  John Heidemann) updated 30-May-98
- known bugs: WWW::Search doesn't work on MacPerl because of
  end-of-line differences. A fix for this problem is in progress.
  (Problem identified and fix suggested by Chris Nandor.)

1.019, 25-Jun-98
overview: back-end updates
- bug fix: test suite bugs were causing false negatives on Yahoo,
  Excite, Magellan, WebCrawler (reported by Martin Thurn, fixed by
  John Heidemann)
- new feature: the test suite is now run daily (automatically).
  Output can be found at .
- new feature: verbose mode of WebSearch is more verbose
- bug fix: AltaVista was recording the RealName URL on some queries
  (bug reported by Vassilis Papadimos)
- bug fix: AltaVista wasn't correctly reporting change_time/size
  (bug and fix from Martin Valldeby)

SUPPORT AND CREDITS
-------------------

The WWW::Search architecture is by John Heidemann with feedback from
the other contributors.
Components of WWW::Search have been written by several people:

APPLICATIONS:
    WebSearch            John Heidemann
    AutoSearch           William Scheding

BACK-ENDS:
    AltaVista            John Heidemann
    Dejanews             Cesare Feroldi de Rosa
    Excite               GLen Pringle and Martin Thurn
    ExciteForWebServers  Paul Lindner (under development)
    FolioViews           Paul Lindner
    Gopher               Paul Lindner
    HotBot               William Scheding and Martin Thurn
    Infoseek             Cesare Feroldi de Rosa and Martin Thurn
    Livelink             Paul Lindner
    Lycos                William Scheding and John Heidemann
    Magellan             Martin Thurn
    MSIndexServer        Paul Lindner
    Null                 Paul Lindner
    PLWeb                Paul Lindner
    Search97             Paul Lindner
    SFgate               Paul Lindner
    Simple               Paul Lindner
    Verity               Paul Lindner
    WebCrawler           Martin Thurn
    Yahoo                William Scheding and Martin Thurn

AutoSearch is based on an earlier implementation by Kedar Jog with
advice from Joe Touch.

Bugs and extensions (to the software and documentation) have been
identified by William Scheding, T. V. Raman (proxy support),
C. Feroldi, Larry Virden, Paul Lindner, Guy Decoux,
R Chandrasekar (Mickey), Martin Thurn, Chris Nandor, Martin Valldeby.

Bugs have been reported by Joseph McDonald, Juan Jose Amor,
Bowen Dwelle, Vassilis Papadimos, Vidyut Luther.

Feedback, bug reports and fixes, and new back-ends should be sent to
John Heidemann. Before submitting a bug report, please check for any
announcements about known bugs. When sending e-mail, please put
[WWW::Search] at the beginning of the subject line (or risk me losing
the message in the pile).

COPYRIGHT
---------

Copyright (c) 1996 University of Southern California.
All rights reserved.

Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation, advertising
materials, and other materials related to such distribution and use
acknowledge that the software was developed by the University of
Southern California, Information Sciences Institute.
The name of the University may not be used to endorse or promote
products derived from this software without specific prior written
permission.

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Portions of this README are derived from the README for libwww-perl.

ISPELL
------

LocalWords: AltaVista Lycos Hotbot WebCrawler libwww perl com sn CPAN isi PL
LocalWords: lsam pl pm perldoc README LocalWords AutoSearch Search's html usr
LocalWords: crontab HotBot autosearch Scheding Kedar Dejanews Infoseek lib de
LocalWords: SearchResult LCI wls Cesare Feroldi GLen Pringle pringle monash
LocalWords: au Raman raman Virden lvirden cas org LWP WebSearch RobotUA Amor
LocalWords: joe smartlink jjamor infor es Yahoo Thurn InfoSeek libwww's PLweb
LocalWords: SFgate Lindner Jul wrt Decoux Chandrasekar Linder Martin's mthurn
LocalWords: tasc DejaNews Bilal Siddiqui bilal siddiqui mankato msus Apr larc
LocalWords: mccoy nasa gov paul lindner itu int decoux moulon inra
LocalWords: fr mickeyc linc cis upenn Dwelle hotwired