Robot Exclusions

By default, SiteSucker honors robots.txt exclusions, the Robots META tag, and the X-Robots-Tag HTTP header.

The robots.txt file allows Web site administrators to define which parts of a site are off-limits to robots like SiteSucker. For example, an administrator can disallow access to private and temporary directories because they do not want pages in those areas downloaded.
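
For example, a robots.txt file with entries like the following (the directory names here are only illustrative) asks all robots to stay out of the site's private and temporary directories:

    User-agent: *
    Disallow: /private/
    Disallow: /temp/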

In addition, the Robots META tag and the X-Robots-Tag HTTP header can be used to request that links on specific pages not be followed by robots.
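A page can make this request with a Robots META tag in its HTML, or the server can send the equivalent X-Robots-Tag HTTP header. The snippets below are generic examples of each:

    <meta name="robots" content="nofollow">

    X-Robots-Tag: nofollow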

SiteSucker also honors the Crawl-delay directive in robots.txt, which specifies the number of seconds to wait between successive requests to the same server. If this directive is present in the robots.txt file, SiteSucker waits the specified number of seconds between requests.
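
For example, a robots.txt file containing the following entry (the ten-second value is only illustrative) asks robots to pause ten seconds between requests:

    User-agent: *
    Crawl-delay: 10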

For the most part, this behavior can be overridden with the Ignore Robot Exclusions setting in the General settings. However, the application always honors robots.txt directives aimed specifically at SiteSucker.
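
For example, a robots.txt record addressed to SiteSucker by name, like the hypothetical entry below (assuming the site uses SiteSucker as the user-agent token), is respected even when Ignore Robot Exclusions is turned on:

    User-agent: SiteSucker
    Disallow: /private/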

Warning: Ignoring robot exclusions is not recommended. Robot exclusions are usually put in place for a good reason and should be obeyed.