Advanced profile
This page describes the more advanced configuration options that are
not covered on the New profile page.
- Activated profile
-
If active, the IntraSeek
engine will try to mount the data base at startup, and the profile will
also be scheduled for automatic crawler launches.
- Automated update of the data bases
-
Defines the intervals at which
the data bases are automatically updated. Use this function if you want
the crawler for a profile to be launched automatically at certain
intervals to keep the index data bases up to date. The selections
should be self-explanatory. If you select Never! in the first
selection, the two that follow, days and time, are ignored and
crawlers are not launched automatically. For status reports on
scheduled crawlers, see the logs page.
- Crawler log detail level
-
Selects how much
information the crawler should write in the log.
Level, Fatal errors, Errors, Warnings / Reports / Scheduler info, Rejects / Accepts
Full (Default), Yes, Yes, Yes, Yes
Medium, Yes, Yes, Yes, No
Short, Yes, Yes, No, No
None, Yes, No, No, No
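The table of log detail levels can be read as a simple lookup. The following is a minimal sketch, not IntraSeek's own code: the category names and the function `should_log` are hypothetical, and only the Yes/No values are taken from the table.

```python
# Hypothetical category names; only the Yes/No pattern comes from the table.
LOG_CATEGORIES = {
    "Full": {"fatal", "error", "info", "accept_reject"},
    "Medium": {"fatal", "error", "info"},
    "Short": {"fatal", "error"},
    "None": {"fatal"},
}


def should_log(level: str, category: str) -> bool:
    """Return True if messages of `category` are written at `level`."""
    return category in LOG_CATEGORIES[level]
```

Note that fatal errors are written at every level, including None.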
- Crawler walk pause
-
The crawler is quite fast. If you
index pages outside your own network, you should slow it down somewhat
by raising the Crawler walk pause, which is the number of seconds
the crawler waits after each document download before
fetching the next document. This keeps it from putting a heavy load on
other people's web servers.
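The effect of the walk pause can be sketched as a fetch loop that sleeps between downloads. This is only an illustration under assumed names: `crawl_politely`, `fetch` and `walk_pause` are hypothetical and do not describe IntraSeek's internals.

```python
import time
from typing import Callable, Iterable, List


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    walk_pause: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing `walk_pause` seconds between downloads."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(walk_pause)  # give the remote web server a rest
        pages.append(fetch(url))
    return pages
```

The `sleep` parameter exists only so the loop can be tried out without actually waiting.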
- Crawler nice increment
-
Sets the increment added to the Unix nice level of the
crawler process. A higher value gives the crawler a lower scheduling
priority.
- Stop lists
-
Specifies lists of words that are so
common that they should not be indexed. Several stop lists can be
specified. Select none, one or more, depending on which languages are
used on your site's pages. The stop lists are stored as ordinary text
files in the directory ENGINE_HOME/resource/.
- Additional stop words
-
Indicates extra stop words that
should be filtered. If Yoyodyne Productions is present on every
one of your pages, it may be a good idea to specify yoyodyne
and productions here. The disadvantage is that it will then no longer
be possible to search for the words yoyodyne or productions. The
advantage is that the data base files will be smaller and searches
faster. For further technical details on this function, check the memory usage section.
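How stop lists and additional stop words interact can be sketched as follows. This is a rough illustration only: the one-word-per-line file layout, the tokenizer and the function names are assumptions, not documented IntraSeek behaviour.

```python
import re


def load_stop_words(lines, additional=()):
    """Merge a stop list (assumed: one word per line) with extra stop words."""
    words = {line.strip().lower() for line in lines if line.strip()}
    words.update(w.lower() for w in additional)
    return words


def indexable_terms(text, stop_words):
    """Return the lowercased words of `text` that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stop_words]
```

With yoyodyne and productions added, neither word reaches the index, which is why they can no longer be searched for.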
- Query Logs active
-
If enabled, queries are logged
to disk for top 100 statistics and similar reports.
- Safety save
-
Sets how many pages a crawler
processes before it automatically saves and reorganizes its data
base. For further technical details on this function, check the memory usage section.
- Max documents to download
-
Specifies the maximum
number of pages the crawler will index. It is a good idea to specify a
maximum here, so that if something goes wrong you avoid having the
entire partition filled by a huge data base. Going wrong usually means
that the robot has become lost on the Internet because of incorrectly
written accept and avoid patterns.
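The page limit, together with a periodic safety save, can be pictured as a loop like the one below. It is a sketch under assumptions: `crawl_with_limits`, `index_page` and `save` are hypothetical names, and the final save at the end of the run is an assumption rather than documented behaviour.

```python
def crawl_with_limits(pages, index_page, save, safety_save, max_documents):
    """Index pages until `max_documents` is reached, saving every `safety_save` pages."""
    indexed = 0
    for page in pages:
        if indexed >= max_documents:
            break  # hard stop: protects the partition from a runaway crawl
        index_page(page)
        indexed += 1
        if indexed % safety_save == 0:
            save()  # periodic safety save
    if indexed % safety_save != 0:
        save()  # assumed: save whatever is left when the crawl ends
    return indexed
```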
- Crawler page fetch Timeout
-
Defines how many seconds
may pass before the download of a page is aborted. For example,
if a crawler can connect to a web server but receives nothing from it,
it would wait patiently for data forever, were it not for this
setting. Enter the number of seconds you allow the fetcher for each
page.
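The situation described above, a connection that opens but never delivers data, can be reproduced with a plain socket. This is a generic sketch of a read timeout using Python's standard `socket` module; `read_with_timeout` is a hypothetical helper and says nothing about how IntraSeek's fetcher is written.

```python
import socket


def read_with_timeout(sock, timeout_seconds, max_bytes=4096):
    """Return received bytes, or None if nothing arrives within the timeout."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(max_bytes)
    except socket.timeout:
        return None  # the peer stayed silent: give up instead of waiting forever
```

Without the `settimeout` call, `recv` would block until the other end sent something or closed the connection.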
- Site structure logging
-
Creates logs of web site
errors and warnings. If active, site structure logs will be generated
for this profile. See the logs chapter for
information on site structure logs. If you are not interested in the
site structure log, you can turn it off here to save crawler time,
memory and disk space.
The operations controlling the log will be disabled as well.
- Max size of query logs
-
Specified in bytes. When a
query log exceeds this size, it is moved to a .bak
file. Any previous .bak file is removed.
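The rotation rule can be sketched in a few lines. The function name and the choice of appending `.bak` to the log's file name are assumptions made for illustration; the text only states that a .bak file is used.

```python
import os


def rotate_if_too_large(log_path, max_bytes):
    """Move the log to '<log_path>.bak' if it exceeds `max_bytes`; return True if rotated."""
    if not os.path.exists(log_path) or os.path.getsize(log_path) <= max_bytes:
        return False
    # os.replace overwrites an existing destination, so the old .bak is dropped.
    os.replace(log_path, log_path + ".bak")
    return True
```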
- Number of max quick links displayed
-
If a search returns
several hundred pages, a list of links to the following pages of
results is displayed below the list of summaries. This setting is the
maximum number of such quick links to show.
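As a rough illustration of how such a cap might work, the sketch below computes which result pages get a quick link. The function and its parameters are hypothetical; in particular, `per_page` stands for the number of summaries shown on each result page.

```python
import math


def quick_links(total_hits, per_page, current_page, max_links):
    """Return the page numbers (1-based) linked to after `current_page`."""
    total_pages = math.ceil(total_hits / per_page)
    first = current_page + 1
    last = min(total_pages, current_page + max_links)
    return list(range(first, last + 1))
```

For 350 hits at 10 per page, a user on page 1 with a limit of 5 would see links to pages 2 through 6 rather than all 34 remaining pages.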
- Number of documents summaries
-
Defines how many search
summaries to display on each page.
- Quoted search enabled
-
If set to Yes, the users
of the search engine can use quotation marks to search for a phrase.
For example, a search for "John Carl Smith" will find only pages that
contain this exact name. Without quotes, the search would return any
page that uses any of those common names.
Note that an extra data base is used to store
the additional information when this setting is enabled. With the
current implementation of full text searches we cannot guarantee good
performance for data bases covering more than 1000 documents. If you
have more documents, turn this option off; otherwise IntraSeek can
sometimes get stuck with heavy calculations for several seconds.
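Phrase searching needs word positions as well as the words themselves, which is one plausible reason for the extra data base. The sketch below uses a generic positional index to show the idea; it is a textbook technique under assumed names, not a description of IntraSeek's actual storage.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return the ids of documents containing the words of `phrase` in sequence."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # Every following word must sit at the next consecutive position.
            if all(start + i in index[w].get(doc_id, ()) for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits
```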
- Wildcards enabled
-
If enabled, the users of the search
engine can use question marks (?) and asterisks (*) to broaden
searches. A search for net* might match netscape, nethack,
network and so on. A search for int??net matches intranet
as well as internet.
Note that IntraSeek requires that the user specifies at least three
characters in front of the * notation, and that there is no
distinction made between lower- and uppercase searches. Also note that
an extra data base will be used to store the extra information, if
this setting is enabled.
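The wildcard rules above can be approximated by translating a pattern into a regular expression. This is a sketch under assumptions: `wildcard_matches` is a hypothetical function, and counting every character before the first * (including ?) towards the three-character minimum is an interpretation of the rule, not confirmed behaviour.

```python
import re


def wildcard_matches(pattern, vocabulary):
    """Return the words in `vocabulary` matching `pattern` (? = one char, * = any run)."""
    star = pattern.find("*")
    if star != -1 and star < 3:
        raise ValueError("at least three characters are required before '*'")
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c)
        for c in pattern.lower()
    )
    compiled = re.compile(regex)
    # Both sides are lowercased, so matching ignores case.
    return [w for w in vocabulary if compiled.fullmatch(w.lower())]
```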
- Summary text length
-
Sets the length of the summaries
displayed with the search results, alongside the link and the hit
percentage. If the web page has a Meta description, it is used as the
summary; otherwise the first part of the document becomes the summary.
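The choice between Meta description and document text can be sketched as below. The function name is hypothetical, and measuring the length in characters is an assumption, since the unit is not stated.

```python
def make_summary(meta_description, body_text, max_length):
    """Prefer the Meta description; fall back to the start of the document body."""
    if meta_description and meta_description.strip():
        source = meta_description.strip()
    else:
        source = body_text.strip()
    return source[:max_length]  # assumed: length counted in characters
```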