Presentation
Circa is a search engine for your Web site, or for a list of sites. It
indexes pages the way AltaVista does. It can read, add, and parse all URLs
found in a page, if the page is on the same server.
Circa is free software, released under the GNU license.
Try it!
Run a search on AlianWebServer:
Features
- Full text indexing
- Different weights for the title, keywords, description, and the rest of
each HTML page can be set in the configuration
- Boolean query language support: or (default), and ("+"), not
("-"). Ex: perl +faq -cgi : documents with faq, possibly
perl, and not cgi.
- Supports the HTTP and FTP protocols
- Stores the index in MySQL
- Reads HTML and plain text
- Can index the filesystem directly, without going through the Web server
- Can browse a site by directory / category.
- Several kinds of indexing: full, incremental, or restricted to a
particular server. Documents that have not been updated are not reindexed.
Every request for a file starts with an HTTP HEAD request, to get
information such as validity, last update, size, etc.
- The size of documents read can be restricted (Ex: skip all documents
> 5 MB). Useful with low-bandwidth connections, or on computers that
do not have much memory.
- The HTML template can be easily customized to your needs.
- Search by different criteria: news, last modified date, language,
URL / site.
- Admin functions are available through a browser interface or the command line.
- Full support for the robots exclusion standard (robots.txt). The robot
identifies itself as CircaIndexer/0.1, mail alian@alianwebserver.com.
- Requests to the same server are delayed by one minute. "It's not a
bug, it's a feature!" This is a basic rule for limiting HTTP server load.
- Indexes the different links found for a CGI (everything after name_of_file?)
- Supports HTTP proxies
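
As an illustration of the query language and field weights listed above, here is a minimal Python sketch. It is not Circa's code (Circa is written in Perl and queries MySQL). The function names parse_query, matches and score, and the weight values, are hypothetical and chosen only for this example.

```python
# Hypothetical weights; Circa reads its own values from its configuration.
WEIGHTS = {"title": 10, "keywords": 5, "description": 3, "body": 1}


def parse_query(query):
    """Split a query into optional, required ("+") and excluded ("-") terms."""
    optional, required, excluded = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        elif token not in ("+", "-"):
            optional.append(token)
    return optional, required, excluded


def matches(query, words):
    """Return True if a document, given as a collection of words, satisfies the query."""
    optional, required, excluded = parse_query(query)
    words = {w.lower() for w in words}
    if any(term in words for term in excluded):
        return False
    if required:
        return all(term in words for term in required)
    # With no required term, plain terms are OR-ed: at least one must be present.
    return any(term in words for term in optional)


def score(query, fields, weights=WEIGHTS):
    """Add the weight of a field each time a wanted query term occurs in it."""
    optional, required, _ = parse_query(query)
    all_words = {w for text in fields.values() for w in text.lower().split()}
    if not matches(query, all_words):
        return 0
    total = 0
    for name, text in fields.items():
        field_words = set(text.lower().split())
        for term in optional + required:
            if term in field_words:
                total += weights.get(name, 1)
    return total
```

With the query perl +faq -cgi, a document containing only faq matches, a document containing only perl does not, and any document containing cgi is rejected. Under these assumed weights, a term found in the title counts ten times as much as the same term found in the body.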
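
The incremental indexing, size limit, and one-minute delay listed above can be sketched in the same spirit. This is an illustrative Python sketch under stated assumptions, not Circa's implementation: the function names, the constants, and the shape of the head dictionary (header values taken from an HTTP HEAD response) are all hypothetical.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit

MAX_SIZE = 5 * 1024 * 1024  # assumed limit, matching the "5 MB" example
DELAY_SECONDS = 60          # one minute between requests to the same server


def should_fetch(head, last_indexed=None, max_size=MAX_SIZE):
    """Decide from HEAD response headers whether a full GET is worthwhile.

    head: dict of response headers (Content-Length, Last-Modified).
    last_indexed: time of the previous indexing in epoch seconds, or None.
    """
    size = head.get("Content-Length")
    if size is not None and int(size) > max_size:
        return False  # too large for this configuration
    modified = head.get("Last-Modified")
    if last_indexed is not None and modified is not None:
        if parsedate_to_datetime(modified).timestamp() <= last_indexed:
            return False  # not updated since the last run
    return True


def seconds_to_wait(url, last_request, now, delay=DELAY_SECONDS):
    """Return how long to wait before contacting the host of url again.

    last_request: dict mapping a host name to the time, in epoch seconds,
    of the previous request sent to that host.
    """
    host = urlsplit(url).netloc
    previous = last_request.get(host)
    if previous is None:
        return 0
    return max(0, delay - (now - previous))
```

A crawler loop would call seconds_to_wait before each request, sleep for the returned time, send the HEAD request, and only issue the full GET when should_fetch returns True.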
To do
- Support for NNTP
- Support for different character sets
- Support for other databases
- Add a PHP search client
Requirements
- MySQL
- Perl
- Modules DBI, DBD::mysql, LWP::RobotUA, HTML::LinkExtor
Install
- Download one of the archive files and uncompress it.
- Update search.cgi and search.pl (search scripts) and admin.cgi and
admin.pl (admin scripts) with your MySQL parameters: user, password,
database, and IP address if different from 'localhost'.
- Run admin.cgi (CGI interface) or admin.pl (command line) to add your
URL, drop or create tables, ...
- Run search.cgi. You can reuse the default form in your own page.
Only the 'words' field is required.
Documentation
POD documentation is available. To read it, run
pod2html name_of_file.pm > name_of_file.html
Version
28/09/2000 : 1.5 for Circa::Indexer, 1.4 for Circa::Search
26/09/2000 : 1.4 for Circa::Indexer, 1.3 for Circa::Search
23/09/2000 : 1.3 for Circa::Indexer, 1.2 for Circa::Search
21/09/2000 : 1.2 for Circa::Indexer
08/09/2000 : 1.00
Download
If you have root privileges and can install Perl modules, you can install
these two modules: Circa::Search
and Circa::Indexer. See the demo
directory for how to use these modules. Install Circa::Indexer first.
Otherwise, you can use this distribution:
Format ZIP
or Format tar.gz
Author
Alain BARBET alian@alianwebserver.com
Reference
Rules and security:
http://info.webcrawler.com/mak/projects/robots/robots.html
Features:
http://search.mnogo.ru/features.html
Why?
I read about this need,
I needed one for AlianWebServer, and I think other people need it too.