These functions parse a Robots Exclusion Standard file (robots.txt) into a data structure for easy access.
◆ wget_robots_parse()
int wget_robots_parse(wget_robots **_robots, const char *data, const char *client)
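- Parameters
-
[out] | _robots | Pointer to a wget_robots pointer that receives the parsed structure |
[in] | data | Memory with robots.txt content (with trailing 0-byte) |
[in] | client | Name of the client / user-agent |
- Returns
- Returns an int status code. On success, the allocated wget_robots structure is stored in *_robots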
The function parses the robots.txt data and builds a wget_robots structure that holds a list of the disallowed paths and a list of the sitemap files.
The wget_robots structure has to be freed by calling wget_robots_free().
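To make the description above concrete, here is a minimal, self-contained sketch of the kind of work such a parser does. It is not libwget's implementation: the names sketch_robots, sketch_parse, sketch_free, dup_value, push and SKETCH_MAX are hypothetical, and the matching rules are simplified assumptions (Disallow lines from "*" groups and from groups naming the client are merged, Sitemap lines are collected for every client, comments and Allow lines are ignored). It relies on the POSIX header strings.h for case-insensitive comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strcasecmp, strncasecmp (POSIX) */

#define SKETCH_MAX 32 /* arbitrary cap to keep the sketch short */

/* Hypothetical stand-in for the opaque wget_robots structure. */
typedef struct {
    char *paths[SKETCH_MAX];    /* Disallow values that apply to the client */
    int npaths;
    char *sitemaps[SKETCH_MAX]; /* Sitemap URLs (kept for every client) */
    int nsitemaps;
} sketch_robots;

/* Duplicate the text in [p, eol), trimmed of blanks and a trailing CR. */
static char *dup_value(const char *p, const char *eol)
{
    while (p < eol && (*p == ' ' || *p == '\t'))
        p++;
    while (eol > p && (eol[-1] == ' ' || eol[-1] == '\t' || eol[-1] == '\r'))
        eol--;
    size_t n = (size_t)(eol - p);
    char *s = malloc(n + 1);
    if (s) {
        memcpy(s, p, n);
        s[n] = '\0';
    }
    return s;
}

/* Append v to list if it is non-empty and there is room; otherwise free it. */
static void push(char **list, int *count, char *v)
{
    if (v && *v && *count < SKETCH_MAX)
        list[(*count)++] = v;
    else
        free(v);
}

/* Parse NUL-terminated robots.txt content for the given client name. */
static sketch_robots *sketch_parse(const char *data, const char *client)
{
    assert(data != NULL && client != NULL);

    sketch_robots *r = calloc(1, sizeof *r);
    if (!r)
        return NULL;

    int applies = 0;     /* does the current User-agent group address us? */
    int prev_was_ua = 0; /* consecutive User-agent lines share one group */

    for (const char *line = data; *line;) {
        const char *eol = line + strcspn(line, "\n");
        int is_ua = !strncasecmp(line, "User-agent:", 11);

        if (is_ua) {
            char *ua = dup_value(line + 11, eol);
            int match = ua && (!strcmp(ua, "*") || !strcasecmp(ua, client));
            applies = prev_was_ua ? (applies || match) : match;
            free(ua);
        } else if (applies && !strncasecmp(line, "Disallow:", 9)) {
            push(r->paths, &r->npaths, dup_value(line + 9, eol));
        } else if (!strncasecmp(line, "Sitemap:", 8)) {
            push(r->sitemaps, &r->nsitemaps, dup_value(line + 8, eol));
        }

        prev_was_ua = is_ua;
        line = *eol ? eol + 1 : eol; /* step past the newline, if any */
    }
    return r;
}

/* Free the structure and reset the caller's pointer to NULL. */
static void sketch_free(sketch_robots **r)
{
    if (!r || !*r)
        return;
    for (int i = 0; i < (*r)->npaths; i++)
        free((*r)->paths[i]);
    for (int i = 0; i < (*r)->nsitemaps; i++)
        free((*r)->sitemaps[i]);
    free(*r);
    *r = NULL;
}
```

The free function takes a pointer to the pointer so that it can clear the caller's handle after releasing it. That is one common reason for a signature of this shape, and is assumed here rather than taken from the library.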
◆ wget_robots_free()
- Parameters
-
[in,out] | robots | Pointer to pointer to wget_robots structure |
wget_robots_free() frees the previously allocated wget_robots structure.
◆ wget_robots_get_path_count()
- Parameters
-
robots | Pointer to instance of wget_robots |
- Returns
- Returns the number of paths listed in robots
◆ wget_robots_get_path()
- Parameters
-
robots | Pointer to instance of wget_robots |
index | Index of the wanted path |
- Returns
- Returns the path at index, or NULL
◆ wget_robots_get_sitemap_count()
int wget_robots_get_sitemap_count(wget_robots *robots)
- Parameters
-
robots | Pointer to instance of wget_robots |
- Returns
- Returns the number of sitemaps listed in robots
◆ wget_robots_get_sitemap()
const char *wget_robots_get_sitemap(wget_robots *robots, int index)
- Parameters
-
robots | Pointer to instance of wget_robots |
index | Index of the wanted sitemap URL |
- Returns
- Returns the sitemap URL at index, or NULL
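The count and get functions are naturally used as a pair: ask for the count, then fetch each entry by index. The sketch below illustrates that pattern with a hypothetical stand-in type, demo_robots, and hypothetical demo_* functions, because the real wget_robots structure is opaque. It assumes that NULL is returned for a NULL handle or an out-of-range index, which the reference above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for wget_robots; the real structure is opaque. */
typedef struct {
    const char **sitemaps;
    int nsitemaps;
} demo_robots;

/* Number of stored sitemap URLs; 0 for a NULL handle (an assumption). */
static int demo_get_sitemap_count(const demo_robots *robots)
{
    return robots ? robots->nsitemaps : 0;
}

/* URL at index, or NULL when the handle is NULL or index is out of range. */
static const char *demo_get_sitemap(const demo_robots *robots, int index)
{
    if (!robots || index < 0 || index >= robots->nsitemaps)
        return NULL;
    return robots->sitemaps[index];
}

/* Print every sitemap URL and return how many were printed. */
static int demo_print_sitemaps(const demo_robots *robots)
{
    int n = demo_get_sitemap_count(robots);
    for (int i = 0; i < n; i++) {
        const char *url = demo_get_sitemap(robots, i);
        assert(url != NULL); /* i is within range, so a URL is expected */
        printf("sitemap[%d] = %s\n", i, url);
    }
    return n;
}
```

Bounding the loop by the count keeps the NULL return as a guard against bad indexes, so the loop never has to rely on it as a terminator.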