Request
The Request settings allow you to customize the HTTP requests that SiteSucker sends to the server.
The Request pane of the Settings dialog provides the following controls:
Identity
Use this control to customize the user agent string that SiteSucker sends to identify itself when making HTTP requests. Some sites only serve browsers they recognize. You can use this feature to “fool” such a site into treating SiteSucker as an approved browser.
To change SiteSucker’s identity, simply click on this control and select one of the web browsers listed. If you choose Web View, the web view’s default user agent string is included in all requests. This is the same web view that’s used in the Download Using Web Views option under the Webpage settings.
You can add identities to the standard list of web browsers by clicking the Identities button in the Preferences.
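As a rough illustration of what changing the identity means at the HTTP level, the following Python sketch attaches a chosen user agent string to a request. It is not SiteSucker's implementation: the `IDENTITIES` table, its placeholder strings, and the `build_request` helper are all hypothetical.

```python
from urllib.request import Request

# Hypothetical identity table. The strings are placeholders,
# not the user agent strings SiteSucker actually sends.
IDENTITIES = {
    "SiteSucker": "SiteSucker/1.0",
    "Example Browser": "Mozilla/5.0 (ExampleOS) ExampleBrowser/1.0",
}


def build_request(url: str, identity: str) -> Request:
    """Return a request whose User-Agent header matches the chosen identity."""
    return Request(url, headers={"User-Agent": IDENTITIES[identity]})


req = build_request("https://example.com/", "Example Browser")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The server sees only the header value, so selecting a different identity changes how the request is classified without changing what is requested.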
Attempts
Use this control to specify the number of times SiteSucker should attempt to download a file. SiteSucker retries a download only after one of the following errors: a timeout, a network connection error, too many requests (429), bad gateway (502), or gateway timeout (504). However, you can also use the Patterns setting under the Webpage settings to retry downloading an HTML file if a pattern is found or not found in the file.
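The retry rule described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not SiteSucker's code: `download_with_attempts`, `TransientNetworkError`, and the `fetch` callable are hypothetical names, and the Patterns-based retry is left out.

```python
from typing import Callable, Optional

# Status codes that the text lists as worth retrying.
RETRYABLE_STATUS = {429, 502, 504}


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a timeout or network connection error."""


def download_with_attempts(fetch: Callable[[], int], attempts: int) -> Optional[int]:
    """Call fetch up to `attempts` times.

    Returns the last status code received, or None if the final
    attempt raised a transient error.
    """
    status: Optional[int] = None
    for _ in range(attempts):
        try:
            status = fetch()
        except TransientNetworkError:
            status = None
            continue  # timeout or connection error: try again
        if status not in RETRYABLE_STATUS:
            break  # success or a non-retryable error: stop here
    return status
```

With Attempts set to 3, a file that answers 502 and then 200 is fetched twice, while a file that answers 404 is fetched once because 404 is not in the retryable set.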
Timeout
Use this control to select the length of time that SiteSucker should wait for a response from the server.
Delay
Use this control to specify the minimum length of time that SiteSucker should delay between HTTP requests to the same host. This feature allows SiteSucker to use less bandwidth and avoid anti-mining safeguards employed by some sites.
If a Crawl-delay is already being imposed by the site’s robots.txt file, the longer of the two delays will be used. See Robot Exclusions for more information about the robots.txt file.
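The "longer of the two delays" rule can be illustrated with Python's standard `urllib.robotparser` module. This is only a sketch of the rule as stated, not SiteSucker's implementation; `effective_delay` and its parameters are hypothetical, and the robots.txt content is passed in as text so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def effective_delay(delay_setting: float, robots_txt: str, user_agent: str = "*") -> float:
    """Return the longer of the Delay setting and the site's Crawl-delay, in seconds."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no Crawl-delay applies to this user agent.
    crawl_delay = parser.crawl_delay(user_agent)
    if crawl_delay is None:
        return delay_setting
    return max(delay_setting, float(crawl_delay))


robots = "User-agent: *\nCrawl-delay: 10\n"
print(effective_delay(2, robots))   # Crawl-delay is longer
print(effective_delay(30, robots))  # Delay setting is longer
```

When the robots.txt file has no Crawl-delay entry, the function falls back to the Delay setting on its own.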