Circa


Presentation

Circa is a search engine for your Web site, or for a list of sites. It builds a full-text index, as AltaVista does. It can fetch a page, parse it, and add every URL it finds there, provided the linked page is on the same server.
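The same-server rule can be illustrated with a short sketch. This is not Circa's code (Circa is written in Perl and lists HTML::LinkExtor among its requirements); it is a minimal Python illustration, and the names LinkCollector and same_server_links are hypothetical. It assumes that "same server" means the same host part of the URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_server_links(page_url, html):
    """Return absolute URLs found in html that are on page_url's host.

    Assumption: "same server" is decided by comparing the host part
    (netloc) of each URL; fragments (#...) are dropped.
    """
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page, then drop the fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found
```

Links to other hosts are simply ignored here; a crawler following this rule never leaves the server it started on.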

Circa is free software, released under the GNU license.

Try it!

Search AlianWebServer:

Or try the advanced search.

Features

  • Full-text indexing
  • Different weights for the title, keywords, description, and the rest of the HTML page can be set in the configuration
  • Boolean query language support: or (default), and ("+"), not ("-"). Ex: perl +faq -cgi returns documents that contain faq, possibly perl, and not cgi.
  • Supports the HTTP and FTP protocols
  • Builds its index in MySQL
  • Reads HTML and plain text
  • Can index a filesystem directly, without going through the Web server
  • Can browse a site by directory / category.
  • Several kinds of indexing: full, incremental, or restricted to a particular server. Documents that have not been updated are not reindexed. Every request for a file is preceded by an HTTP HEAD request to obtain information such as validity, last update, size, etc.
  • The size of documents read can be restricted (Ex: skip all documents > 5 MB). Useful with low-bandwidth connections, or on computers that do not have much memory.
  • The HTML template can easily be customized for your needs.
  • Search by different criteria: news, last-modified date, language, URL / site.
  • Admin functions are available through a browser interface or the command line.
  • Full support for the robots exclusion standard (robots.txt). The robot identifies itself as CircaIndexer/0.1, mail alian@alianwebserver.com.
  • Requests to the same server are delayed by one minute. "It's not a bug, it's a feature!" This is a basic rule for limiting HTTP server load.
  • Indexes the different links found in a CGI (everything after name_of_file?)
  • Supports HTTP proxies
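The boolean query language in the feature list can be sketched as follows. This is an illustration in Python, not Circa's Perl implementation, and the function names parse_query and matches are hypothetical. It assumes that a "+" or "-" applies to the word that follows it (whether attached or separated by a space), that every "+" word must be present, that no "-" word may be present, and that plain words are only mandatory (at least one of them) when the query has no "+" word.

```python
def parse_query(query):
    """Split a query into (optional, required, excluded) word lists."""
    optional, required, excluded = [], [], []
    pending = None  # operator written as its own token, e.g. "+ faq"
    for token in query.lower().split():
        op = pending
        pending = None
        if token in ("+", "-"):
            pending = token  # applies to the next word
            continue
        if op is None and token[0] in "+-":
            op, token = token[0], token[1:]
        target = {"+": required, "-": excluded}.get(op, optional)
        target.append(token)
    return optional, required, excluded


def matches(document_words, query):
    """Tell whether a document (given as a list of words) satisfies query."""
    optional, required, excluded = parse_query(query)
    words = set(w.lower() for w in document_words)
    if any(w in words for w in excluded):
        return False  # a "-" word is present
    if required:
        return all(w in words for w in required)
    return any(w in words for w in optional)  # plain "or" query
```

With the example query from the list, a document containing only faq matches, a document containing only perl does not, and any document containing cgi is rejected. In a real engine the optional words would also raise the rank of the documents that contain them.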

To do

  • Support NNTP
  • Support for different character sets
  • Support for other databases
  • Add a PHP search client

Requirements

  • MySQL
  • Perl
  • Perl modules: DBI, DBD::mysql, LWP::RobotUA, HTML::LinkExtor

Install

  • Download one of the archive files and uncompress it.
  • Update search.cgi and search.pl (search scripts) and admin.cgi and admin.pl (admin scripts) with your MySQL parameters: user, password, database, and IP address if different from 'localhost'.
  • Run admin.cgi (CGI interface) or admin.pl (command line) to add your URL, drop or create tables, ...
  • Run search.cgi. You can use the default form in your own page. Only the 'words' field is required.
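Since only the 'words' field is required, a search can in principle be triggered by a plain link. The sketch below builds such a link in Python; it assumes that search.cgi accepts the field through a GET request, and the host and path in the usage line are placeholders, not a real installation. The helper name search_url is hypothetical.

```python
from urllib.parse import urlencode


def search_url(base_url, words):
    """Build a GET URL for search.cgi from the 'words' field alone.

    Assumption: the script reads 'words' from the query string.
    """
    return base_url + "?" + urlencode({"words": words})


# Placeholder location of the script; replace with your own server.
print(search_url("http://www.example.com/cgi-bin/search.cgi", "perl +faq -cgi"))
```

Note that urlencode turns the spaces into "+" and the literal "+" operator into "%2B", so the operator is not mistaken for a space when the query string is decoded.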

Documentation

POD documentation is available; run pod2html name_of_file.pm > name_of_file.html to read it.

Version

28/09/2000 : 1.5 for Circa::Indexer, 1.4 for Circa::Search
26/09/2000 : 1.4 for Circa::Indexer, 1.3 for Circa::Search
23/09/2000 : 1.3 for Circa::Indexer, 1.2 for Circa::Search
21/09/2000 : 1.2 for Circa::Indexer
08/09/2000 : 1.00

Download

If you have root privileges and can install Perl modules, you can install these two modules: Circa::Search and Circa::Indexer. See the demo directory for how to use them. Install Circa::Indexer first.

Otherwise, you can use this distribution:

ZIP format or tar.gz format

Author

Alain BARBET alian@alianwebserver.com

Reference

Rules and security:

http://info.webcrawler.com/mak/projects/robots/robots.html

Features:

http://search.mnogo.ru/features.html

Why?

I had read about this need, I needed a search engine for AlianWebServer, and I think other people need one too.

 
   
 
 