The General screen provides the following settings:
Ignore Robot Exclusions
Switch on this control to have SiteSucker ignore robots.txt directives, the Robots META tag, and the X-Robots-Tag HTTP header. See Robot Exclusions for more information about robots.txt, the Robots META tag, and the X-Robots-Tag HTTP header.
Note: SiteSucker always honors robots.txt directives aimed specifically at SiteSucker.
Warning: Ignoring robot exclusions is not recommended. Robot exclusions are usually put in place for a good reason and should be obeyed.
Switch on this control to have SiteSucker ignore the rel="nofollow" attribute. When a link in an HTML document carries rel="nofollow", a robot like SiteSucker should not follow that link. By default, SiteSucker does not download nofollow links. However, if this control is on, SiteSucker will download links that have the rel="nofollow" attribute.
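The nofollow behavior described above can be illustrated with a short Python sketch. It is not SiteSucker's implementation: the `LinkCollector` class, the `collect_links` function, and the `ignore_nofollow` flag are hypothetical names, and the sketch assumes that rel may hold several space-separated tokens, of which nofollow is one.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, optionally skipping nofollow links."""

    def __init__(self, ignore_nofollow=False):
        super().__init__()
        self.ignore_nofollow = ignore_nofollow  # hypothetical stand-in for the setting
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # Assumption: rel is treated as a list of space-separated tokens.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens and not self.ignore_nofollow:
            return  # default behavior: do not follow the link
        self.links.append(href)


def collect_links(html, ignore_nofollow=False):
    parser = LinkCollector(ignore_nofollow)
    parser.feed(html)
    parser.close()
    return parser.links
```

With the flag off, a link such as `<a rel="nofollow" href="/b">` is skipped; with the flag on, it is collected like any other link.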
Ignore Filename in Headers
Switch on this control to have SiteSucker ignore the filename directive in all HTTP Content-Disposition headers. See File Names for more information about how SiteSucker names downloaded files.
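As a rough illustration of what ignoring the filename directive means, the Python sketch below picks a file name either from a Content-Disposition header or from the URL. The function name `choose_name` and its parameters are hypothetical, and the fallback to the last URL path segment is an assumption; SiteSucker's actual naming rules are described under File Names.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def choose_name(url, content_disposition, ignore_header_filename):
    """Return a file name from the header's filename directive or from the URL."""
    # Assumption: the URL-based name is simply the last path segment.
    url_name = posixpath.basename(urlsplit(url).path)
    if ignore_header_filename or not content_disposition:
        return url_name
    # Use the standard library's header parameter parsing.
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    header_name = msg.get_filename()
    if not header_name:
        return url_name
    # Drop any directory components a server might include.
    return posixpath.basename(header_name) or url_name
```

For a download from `http://www.example.com/download.php` whose header is `attachment; filename="report.pdf"`, the sketch yields `report.pdf` with the flag off and `download.php` with it on.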
Treat Ambiguous URLs as Folders
Switch on this control to have SiteSucker treat ambiguous URLs as folders. If a URL does not end with a '/' or a file extension, SiteSucker considers it to be ambiguous. For example, if this option is on and SiteSucker downloads a webpage from http://www.example.com/directory, the webpage will be saved at www.example.com/directory/index.html in the destination folder. If this option is off, the webpage will be saved at www.example.com/directory.html in the destination folder. See File Names for more information about how SiteSucker names downloaded files.
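The mapping in the example above can be sketched in Python as follows. Only the two ambiguous-URL results come from this page; the function name `local_path` and the handling of URLs that end in '/' or already have an extension are assumptions made to keep the sketch complete.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url, ambiguous_as_folder):
    """Return a path, relative to the destination folder, for a downloaded webpage."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: a URL that ends with '/' is saved as index.html in that folder.
    if path == "" or path.endswith("/"):
        return parts.netloc + (path or "/") + "index.html"
    # Assumption: a URL with a file extension keeps its own name.
    _, ext = posixpath.splitext(path)
    if ext:
        return parts.netloc + path
    # Ambiguous URL: no trailing '/' and no file extension.
    if ambiguous_as_folder:
        return parts.netloc + path + "/index.html"
    return parts.netloc + path + ".html"
```

Calling `local_path("http://www.example.com/directory", True)` gives `www.example.com/directory/index.html`, and passing `False` gives `www.example.com/directory.html`, matching the two cases described above.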
Always Download HTML and CSS
Switch on this control to have SiteSucker always download HTML and CSS files despite the File Replacement setting. Use this control to force SiteSucker to download fresh copies of HTML and CSS files.
Download Error Pages
Switch on this control to have SiteSucker download an error page, if available, when an error occurs while downloading a file. If a downloaded error page already exists, SiteSucker will always try to download the file again despite the File Replacement setting. Error pages are never scanned for links to other files.
If this control is off, nothing is downloaded when an error occurs.
Use this control to specify when SiteSucker should display the login dialog for basic HTTP authentication. For more information on authentication and the login dialog, see Password-protected Sites. You can choose from the following options:
- Never Display - SiteSucker never displays the login dialog. If valid login credentials were recently entered or were found in the Keychain, SiteSucker will use them; otherwise, files that require authentication will be skipped. This option also suppresses display of the Certificate Trust Panel, which is shown when there is a problem with a server's certificate. If the certificate for a server is invalid and this option is selected, SiteSucker will not display the panel and will not download content from that server.
- Always Display - SiteSucker always displays the login dialog.
- Display When Necessary - SiteSucker displays the login dialog unless valid login credentials were recently entered or a single relevant Keychain item was found.
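The three options above amount to a small decision rule, sketched here in Python. The enum `LoginDialog`, the function `should_show_dialog`, and its parameters are hypothetical names used only for illustration; the sketch covers the login dialog only and leaves out the Certificate Trust Panel.

```python
from enum import Enum


class LoginDialog(Enum):
    NEVER = "Never Display"
    ALWAYS = "Always Display"
    WHEN_NECESSARY = "Display When Necessary"


def should_show_dialog(mode, have_recent_credentials, keychain_matches):
    """Decide whether to show the login dialog for a protected resource.

    have_recent_credentials: valid credentials were recently entered.
    keychain_matches: number of relevant Keychain items found.
    """
    if mode is LoginDialog.NEVER:
        return False
    if mode is LoginDialog.ALWAYS:
        return True
    # Display When Necessary: skip the dialog only when credentials are
    # already available or exactly one relevant Keychain item was found.
    return not (have_recent_credentials or keychain_matches == 1)
```

Note that under Display When Necessary, two or more matching Keychain items still lead to the dialog, since only a single relevant item suppresses it.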
File Replacement
Use this control to specify when SiteSucker should replace existing files. You can choose from the following options:
- Never - SiteSucker never replaces your local files and only downloads those files that haven't already been downloaded.
- Always - SiteSucker always deletes your local files and replaces them with files downloaded from the Internet.
- With Newer - SiteSucker only replaces existing files if a newer copy is found on the Internet.
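A minimal Python sketch of the three replacement options follows. The names `Replace` and `should_download` are hypothetical, and comparing modification timestamps is an assumption: this page does not say how SiteSucker determines that a copy on the Internet is newer.

```python
from datetime import datetime, timezone
from enum import Enum


class Replace(Enum):
    NEVER = "Never"
    ALWAYS = "Always"
    WITH_NEWER = "With Newer"


def should_download(policy, local_mtime, remote_mtime):
    """Decide whether to download a file.

    local_mtime: datetime of the local copy, or None if no local copy exists.
    remote_mtime: datetime reported for the remote copy, or None if unknown.
    """
    if local_mtime is None:
        return True  # nothing to replace, so the file is downloaded
    if policy is Replace.NEVER:
        return False
    if policy is Replace.ALWAYS:
        return True
    # With Newer. Assumption: an unknown remote timestamp counts as "not newer".
    return remote_mtime is not None and remote_mtime > local_mtime
```

In this sketch, a file with no local copy is downloaded under every option, which matches the Never description: only files that have not already been downloaded are fetched.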
Path Constraint
Use this control to limit downloaded files to those at a specific site, those within a specific directory, or those containing a specific path. This option works in conjunction with the Path settings and the Include Supporting Files setting under the Webpage settings. SiteSucker provides the following path constraints:
- None - SiteSucker downloads the file specified in the URL text box, every file that it links to, every file that those files link to, and so on. Be aware that this option can result in a huge download if it is left to run unchecked.
- Host - SiteSucker limits the download to those files on the host of the original file being downloaded. For example, if the URL is
http://www.example.com/directory/home.html, this setting limits the download to those URLs beginning with
- Host + 1 - SiteSucker limits the download to those files on the host of the original file being downloaded (just like the Host option), plus one level of files from other domains linked to the original host.
- Subdomains - SiteSucker limits the download to those files within the second-level domain and all subdomains of the original file being downloaded. Extending the previous example, this setting will download URLs beginning with
- Directory - SiteSucker only downloads those files that are within the directory of the original file being downloaded. Extending the previous example, this setting limits the download to those URLs beginning with
- Path Settings - SiteSucker only downloads the file specified in the URL text box and any files that have paths allowed by the Path settings.
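To make the None, Host, Subdomains, and Directory constraints more concrete, here is a rough Python sketch of how such a check might look. It is not SiteSucker's logic: the function names are hypothetical, only host names and paths are compared (the scheme is ignored), the second-level domain is naively taken as the last two labels of the host (which is wrong for suffixes such as co.uk), and Host + 1 and Path Settings are left out because they depend on link depth and on the Path settings.

```python
import posixpath
from urllib.parse import urlsplit


def second_level_domain(host):
    # Naive simplification: keep the last two labels of the host name.
    return ".".join(host.lower().split(".")[-2:])


def allowed(start_url, candidate_url, constraint):
    """Return True if candidate_url passes the constraint relative to start_url."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    s_host = (start.hostname or "").lower()
    c_host = (cand.hostname or "").lower()
    if constraint == "None":
        return True
    if constraint == "Host":
        return c_host == s_host
    if constraint == "Subdomains":
        sld = second_level_domain(s_host)
        return c_host == sld or c_host.endswith("." + sld)
    if constraint == "Directory":
        # Directory of the original file, with a trailing '/'.
        base = posixpath.dirname(start.path)
        if not base.endswith("/"):
            base += "/"
        return c_host == s_host and cand.path.startswith(base)
    raise ValueError(f"unsupported constraint: {constraint}")
```

Starting from `http://www.example.com/directory/home.html`, a hypothetical URL on `support.example.com` fails the Host check but passes the Subdomains check in this sketch, and only URLs whose path begins with `/directory/` on the same host pass the Directory check.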