Introduction
  Installing
  Handling
  Virtual servers
  Modules
  Filesystems
  RXML tags
  Graphics
  Proxy
  Miscellaneous modules
  Security considerations
  Scripting
  Databases
  LDAP
  SiteBuilder
  Access Control
  IntraSeek
    Directories
    Configuring
    Creating new profile
    Indexing
    Languages
    Logs
    Advanced profile
    Technical document
  LogView
  FrontPage
  Upgrading
  Third party extensions
  Portability
  Reporting bugs
  Appendix
 
Creating new profile

Profiles define how, and which, web pages and servers are to be indexed by the crawler. To create a new profile, select the New profile wizard and follow the online instructions.

Below, the basic configuration variables for a profile are described; the more advanced variables are described later, on the Advanced profile configuration page.

Profile id
A unique identification for the profile. It should be a short identifying text and must not contain any spaces. For example, the id could be my_profile.

Profile name
The profile name is shown on the selection tab of the search page presented to the outside world. For example, this could be My test search.

Activated
Should be left at yes for now.

Storage directory
A path in your file system, ending with a "/". IntraSeek has automatically created a special directory for storage of the databases, but you can change this to any path in the file system.

Working directory
A path in your file system, ending with a "/". This is where data gathered by the crawlers will be stored. Due to the nature of the database, it is advantageous for this directory to be on a fast disk; this can speed up the process by several hundred per cent.

Startpages
A set of pages for the crawler to start at. It is usually sufficient to state the URL of the main page of the site you are about to index, since an IntraSeek crawler will follow all links it finds. Separate multiple URLs by putting them on separate lines. For example: http://my.server.com/~sysadm/
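The follow-all-links behaviour described above amounts to a breadth-first traversal starting from the start pages. The Python sketch below illustrates the idea over a toy in-memory link graph; the URLs and the crawl helper are made up for illustration and are not IntraSeek's implementation:

```python
from collections import deque

# A toy link graph standing in for real pages: each URL maps to the
# URLs linked from that page. (Hypothetical data, for illustration.)
links = {
    "http://my.server.com/~sysadm/": [
        "http://my.server.com/~sysadm/a.html",
        "http://my.server.com/~sysadm/b.html",
    ],
    "http://my.server.com/~sysadm/a.html": ["http://my.server.com/~sysadm/"],
    "http://my.server.com/~sysadm/b.html": [],
}

def crawl(startpages):
    """Visit every page reachable from the start pages, breadth-first."""
    seen, queue = set(), deque(startpages)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue          # already visited; avoids loops
        seen.add(url)
        queue.extend(links.get(url, []))
    return seen

print(len(crawl(["http://my.server.com/~sysadm/"])))  # → 3
```

Note the seen set: it is what keeps a crawler from revisiting pages forever when pages link back to each other.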

Accept pattern
Specifies which pages are to be accepted by the crawler. There are some very important things to consider here:

    1. Always limit the crawler to stay within your site. If you don't, it will, without any warning, crawl out onto the World Wide Web.

    2. Since the accept and avoid patterns really are regexps, they should read ^http://www.foo.com/* instead of www.foo.com/* if you want to make sure not to index http://gazonk.www.foo.com/.

    3. Separate the various accept patterns by putting them on separate lines. For example, this could be my.server.com/~webmaster/*.
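The anchoring point in item 2 can be verified with ordinary regular expressions. The Python sketch below uses tightened-up equivalents of the manual's patterns (dots escaped, the trailing * written as .*); the URLs are hypothetical:

```python
import re

# Roughly the manual's www.foo.com/* pattern: unanchored.
loose = re.compile(r"www\.foo\.com/.*")
# Roughly ^http://www.foo.com/*: anchored at the start of the URL.
strict = re.compile(r"^http://www\.foo\.com/.*")

subdomain = "http://gazonk.www.foo.com/page.html"

print(bool(loose.search(subdomain)))   # → True: the subdomain slips through
print(bool(strict.search(subdomain)))  # → False: the anchor rejects it
print(bool(strict.search("http://www.foo.com/index.html")))  # → True
```

Because the unanchored pattern matches anywhere in the URL, it also matches the subdomain http://gazonk.www.foo.com/; only the ^-anchored pattern restricts the crawler to the intended server.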

Avoid pattern
Specifies which pages the crawler will avoid. File types containing information the crawler cannot index are already listed; if this is inappropriate, remove them to have the crawler index those file types after all.

For example, if you specify */~webmaster/non-public/ here, the crawler will avoid ~webmaster/non-public/ on all servers. If you specify *my.server.com/~root/*, /~root/ will not be indexed on the server my.server.com.

Remember to check arguments to CGI scripts and the like. For instance, directory listings can sometimes enter infinite loops. If any such scripts are present, it is recommended that *?* be added here.

While the crawler is running, keep an eye on its log file to make sure it has not entered a loop, run amok, etc.

Finally, on the last page of the New profile wizard, press OK to save the new profile. Technical note: all profiles are saved in the text file ENGINE_HOME/profiles.txt. If no id is specified, a new unique id is generated.