Robot Exclusions

By default, SiteSucker honors robots.txt exclusions, the Robots META tag, and the X-Robots-Tag HTTP header.

The robots.txt file lets website administrators define which parts of a site are off-limits to robots like SiteSucker. For example, administrators can disallow access to private and temporary directories because they do not want pages in those areas downloaded.
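A robots.txt record that blocks all robots from hypothetical /private/ and /tmp/ directories might look like this:

    User-agent: *
    Disallow: /private/
    Disallow: /tmp/

The asterisk means the record applies to every robot, and each Disallow line names a path prefix that robots should not request.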

In addition, the Robots META tag and the X-Robots-Tag HTTP header can be used to ask robots not to follow the links on specific pages.
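For example, a page can include the following META tag in its HTML head, or a server can send the equivalent HTTP response header, to make that request:

    <meta name="robots" content="nofollow">
    X-Robots-Tag: nofollow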

SiteSucker also honors the Crawl-delay directive in robots.txt, which specifies the number of seconds to wait between successive requests to the same server. If this directive is present, SiteSucker pauses for the specified amount of time between requests.
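For example, the following entry asks robots to wait ten seconds between requests:

    User-agent: *
    Crawl-delay: 10

With this in place, SiteSucker would wait ten seconds after finishing one request to that server before starting the next.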

For the most part, this behavior can be overridden with the Ignore Robot Exclusions option under the General settings. However, robots.txt directives aimed specifically at SiteSucker are always honored.
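For example, a record like the following, which names SiteSucker as its user-agent token (this example assumes the site wants to exclude SiteSucker from the entire site), is obeyed even when Ignore Robot Exclusions is turned on:

    User-agent: SiteSucker
    Disallow: /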

Warning: Ignoring robot exclusions is not recommended. Robot exclusions are usually put in place for a good reason and should be obeyed.