Documentation for the Combine
Overview
Subsections
Introduction
Open Source distribution, Installation
Installation
Installation from source for the impatient
Porting to unsupported operating systems - dependencies
Automated Debian/Ubuntu installation
Manual installation
Installation test
Getting started
Detailed documentation
Use scenarios
General crawling without restrictions
Focused crawling - domain restrictions
Focused crawling - topic specific
Configuration
Configuration files
Crawler operation
URL selection criteria
Document parsing
URL filtering
Link selection/scheduling policy
Built-in topic filter - automated subject classification
Topic definition
Topic definition (term triplets) BNF grammar
Term triplet examples
Algorithm 1: plain matching
Algorithm 2: position weighted matching
Topic filter Plug-In API
Analysis
URL recycling
Complete application - SearchEngine in a Box
Evaluation of automated subject classification
Performance
System components
combineINIT
combineCtrl
combineUtil
combineExport
Internal executables and Library modules
Library
root 2006-11-08