Advanced profile

This page describes the more advanced configuration options that are not covered on the New profile page.

Activated profile
If active, the IntraSeek engine will try to mount the data base at startup, and the profile will also be scheduled for automatic crawler launches.

Automated update of the data bases
Defines the intervals at which the data bases are automatically updated. Use this function if you want the crawler for a profile to be launched automatically at certain intervals to keep the index data bases up to date. The selections should be self-explanatory. If you select Never! in the first selection, the two that follow, days and time, are ignored and crawlers are not launched automatically. For status reports on scheduled crawlers, see the logs page.

Crawler log detail level
Selects how much information the crawler should write in the log.
Level           Fatal errors  Errors, warnings, reports  Scheduler info  Rejects, accepts
Full (default)  Yes           Yes                        Yes             Yes
Medium          Yes           Yes                        Yes             No
Short           Yes           Yes                        No              No
None            Yes           No                         No              No
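
The table above can be read as a lookup from detail level to message category. A minimal Python sketch of that lookup follows; the names CATEGORIES, LEVELS and is_logged are hypothetical and are not part of IntraSeek.

```python
# Hypothetical names; the values mirror the detail level table above.
CATEGORIES = (
    "fatal errors",
    "errors, warnings, reports",
    "scheduler info",
    "rejects, accepts",
)

LEVELS = {
    "Full":   (True, True, True, True),    # default
    "Medium": (True, True, True, False),
    "Short":  (True, True, False, False),
    "None":   (True, False, False, False),
}


def is_logged(level, category):
    """Return True if messages of this category are written at this level."""
    return LEVELS[level][CATEGORIES.index(category)]
```

Note that fatal errors are written at every level, including None.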

Crawler walk pause
The crawler is quite fast. If you index pages outside your own network, slow it down by setting the Crawler walk pause, which is the number of seconds the crawler waits before fetching the next document. This keeps it from putting a heavy load on other people's web servers.
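
The effect of the walk pause can be illustrated with a small Python sketch. It is not the IntraSeek crawler; crawl, fetch and walk_pause are hypothetical names, and the sleep function is passed in so the pause can be observed without waiting.

```python
import time


def crawl(urls, fetch, walk_pause=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing walk_pause seconds between downloads."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(walk_pause)  # pause before fetching the next document
        results.append(fetch(url))
    return results
```

With three URLs and a pause of two seconds, the crawl takes at least four seconds longer than it would without a pause.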

Crawler nice increment
Sets the Unix nice level of the crawler process.
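
On Unix, a positive nice increment lowers the scheduling priority of a process, so other programs on the machine get the CPU first. The Python sketch below shows the underlying system call; that IntraSeek applies the setting in exactly this way is an assumption, and lower_priority is a hypothetical name.

```python
import os


def lower_priority(increment):
    """Add increment to this process's nice value and return the new value."""
    return os.nice(increment)
```

An increment of 0 leaves the priority unchanged and simply returns the current nice value.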

Stop lists
Specifies lists of words that are so common that they should not be indexed. Several stop lists can be specified. Select none, one or more, depending on which languages are used on your site's pages. The stop lists are stored as ordinary text files in the directory ENGINE_HOME/resource/.

Additional stop words
Specifies extra stop words that should be filtered out. If Yoyodyne Productions appears on every one of your pages, it may be a good idea to specify yoyodyne and productions here. The disadvantage is that it will no longer be possible to search for the words yoyodyne or productions. The advantage is that the data base files will be smaller and searches faster. For further technical details on this function, check the memory usage section.
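
A minimal Python sketch of how stop lists and additional stop words combine is shown here. It assumes one word per line in a stop list and case-insensitive matching, neither of which is stated above; load_stop_words and indexable_words are hypothetical names.

```python
def load_stop_words(lines, additional=()):
    """Build a stop word set from stop list lines plus extra words.

    Assumes one word per line; blank lines are skipped.
    """
    words = {line.strip().lower() for line in lines if line.strip()}
    words.update(word.lower() for word in additional)
    return words


def indexable_words(text, stop_words):
    """Return the words of text that would still be indexed."""
    return [word for word in text.lower().split() if word not in stop_words]
```

Any word in the resulting set is dropped before indexing, which is why it can no longer be found by a search.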

Query Logs active
If enabled, queries are logged to disk for top 100 statistics and similar reports.

Safety save
This value sets how many pages a crawler processes before it automatically saves and reorganizes its data base. For further technical details on this function, check the memory usage section.

Max documents to download
Specifies the maximum number of pages the crawler will index. It is a good idea to set a maximum here: if something goes wrong, you avoid filling the entire partition with a huge data base. What usually goes wrong is that the robot gets lost on the Internet because of incorrectly written accept and avoid patterns.
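
The interaction of Safety save and Max documents to download can be sketched as a simple loop. This is an illustration under assumptions, not IntraSeek's code; run_crawler, index_page and save are hypothetical names.

```python
def run_crawler(pages, index_page, save, safety_save=100, max_documents=1000):
    """Index pages until max_documents is reached, saving periodically."""
    count = 0
    for page in pages:
        if count >= max_documents:
            break  # the cap protects the partition from a runaway crawl
        index_page(page)
        count += 1
        if count % safety_save == 0:
            save()  # periodic safety save
    save()  # final save when the crawl ends
    return count
```

A lower safety save value means less work is lost if the crawler is interrupted, at the cost of more frequent saves.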

Crawler page fetch Timeout
Defines how many seconds may pass before the download of a page is aborted. For example, if a crawler can connect to a page but gets nothing back from the web server at the other end, it would wait patiently for information forever if it were not for this setting. Enter how many seconds you allow the fetcher for its work.
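
The idea of a fetch timeout can be sketched with the Python standard library. This is not how the IntraSeek fetcher is implemented; fetch_page is a hypothetical name, and in this sketch the timeout applies to each blocking network operation rather than to the download as a whole.

```python
import urllib.request


def fetch_page(url, timeout_seconds=30):
    """Return the page body, or None if the download fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except OSError:  # covers URLError and socket timeouts
        return None
```

A server that accepts the connection but never answers makes the call return None after the timeout instead of blocking forever.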

Site structure logging
Creates logs of web site errors and warnings. If active, site structure logs are generated for this profile. See the logs chapter for information on site structure logs. If you are not interested in the site structure log, you can turn it off here. The crawler will then use less time and memory, and no site structure logs will take up space on disk. The operations that control the log are disabled as well.

Max size of query logs
The maximum size is specified in bytes. When a query log exceeds this size, it is moved to a .bak file, and the old .bak file is removed.
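
The rotation described here can be sketched in a few lines of Python. The sketch assumes the backup is named by appending .bak to the log file name, which the text above does not specify; rotate_if_too_large is a hypothetical name.

```python
import os


def rotate_if_too_large(log_path, max_bytes):
    """Move log_path to log_path + '.bak' once it exceeds max_bytes."""
    if os.path.exists(log_path) and os.path.getsize(log_path) > max_bytes:
        os.replace(log_path, log_path + ".bak")  # overwrites any old .bak
        return True
    return False
```

Because only one .bak file is kept, at most about twice the configured size is used per query log.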

Number of max quick links displayed
If a search produces several hundred hits, a list of links to the following result pages is displayed below the list of summaries. This setting is the maximum number of such quick links to show.

Number of documents summaries
Defines how many search summaries are displayed on each page.
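
How the number of summaries per page and the quick link limit relate can be shown with a little arithmetic. The Python sketch below simply links to the first result pages; which pages IntraSeek actually links to is not specified here, and quick_links is a hypothetical name.

```python
def quick_links(total_hits, summaries_per_page, max_quick_links):
    """Return the 1-based result page numbers to show as quick links."""
    pages = -(-total_hits // summaries_per_page)  # ceiling division
    return list(range(1, min(pages, max_quick_links) + 1))
```

For example, 250 hits at 10 summaries per page give 25 result pages, of which only the configured maximum number is linked.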

Quoted search enabled
If set to Yes, the users of the search engine can use quotation marks to search for a phrase. For example, a search for "John Carl Smith" finds only pages that contain this exact name. Without quotes, the search would return every page that uses any of those common names.

Note that an extra data base is used to store the extra information if this setting is enabled. With the current implementation of full text searches we cannot guarantee good performance for data bases covering more than 1000 documents. If you have more documents, turn this option off; otherwise IntraSeek can sometimes get stuck in heavy calculations for several seconds.
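
The difference between a quoted phrase and loose words can be illustrated by how a query might be split. This Python sketch is an assumption about the general technique, not IntraSeek's parser; parse_query is a hypothetical name.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases and the remaining single words."""
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]*"', " ", query)  # drop the quoted parts
    return phrases, rest.split()
```

A phrase must match as a whole, while each loose word can match on its own.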

Wildcards enabled
If enabled, the users of the search engine can use question marks (?) and asterisks (*) to broaden searches. A search for net* might match netscape, nethack, network and so on. A search for int??net matches intranet as well as internet.

Note that IntraSeek requires that the user specifies at least three characters in front of the * notation, and that there is no distinction made between lower- and uppercase searches. Also note that an extra data base will be used to store the extra information, if this setting is enabled.
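
The wildcard rules above can be sketched in Python by translating a pattern into a regular expression. This illustrates the matching rules only and is not how IntraSeek stores or searches its wildcard data base; wildcard_to_regex and matches are hypothetical names.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? and * wildcards into a case-insensitive regex."""
    star = pattern.find("*")
    if 0 <= star < 3:
        raise ValueError("need at least three characters before *")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any run of characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

Here net* matches netscape and Network alike, and int??net matches both intranet and internet.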

Summary text length
Sets the length of the summaries displayed with the search results, alongside the link and the hit percentage. If the web page has a Meta description, it is used as the summary; otherwise the first part of the document is used.
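
The choice of summary source can be sketched as follows. The sketch assumes the length is counted in characters and that whitespace is collapsed, neither of which is stated above; make_summary is a hypothetical name.

```python
def make_summary(body_text, meta_description=None, max_length=200):
    """Use the Meta description if present, else the start of the document."""
    source = meta_description if meta_description else body_text
    return " ".join(source.split())[:max_length]
```

A page without a Meta description is summarized by the first max_length characters of its text.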