General

The General screen provides the following settings:

Ignore Robot Exclusions

Switch on this control to have SiteSucker ignore robots.txt directives, the Robots META tag, and the X-Robots-Tag HTTP header. See Robot Exclusions for more information about these three mechanisms.

Note: SiteSucker always honors robots.txt directives aimed specifically at SiteSucker.

Warning: Ignoring robot exclusions is not recommended. Robot exclusions are usually put in place for a good reason and should be obeyed.
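
To show what obeying a robots.txt directive involves, here is a minimal Python sketch built on the standard library's urllib.robotparser. It is not SiteSucker's own code. The robots.txt content, the "ExampleBot" user agent, and the is_allowed helper are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for page in ("/private/page.html", "/public/page.html"):
        url = "http://www.example.com" + page
        print(page, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A downloader that honors robot exclusions would skip any URL for which a check like this returns False.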

Ignore rel="nofollow"

Switch on this control to have SiteSucker ignore the rel="nofollow" attribute. When the rel attribute of an HTML tag equals "nofollow", a robot like SiteSucker should not follow that link. By default, SiteSucker does not download nofollow links. When this control is on, SiteSucker downloads links that have the rel="nofollow" attribute.
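
The effect of this setting can be sketched in Python with the standard library's html.parser. This is an illustration only, not SiteSucker's implementation. The LinkCollector class and collect_links function are hypothetical names, and treating rel as a space-separated token list is an assumption of the sketch.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, optionally skipping nofollow links."""

    def __init__(self, ignore_nofollow: bool = False):
        super().__init__()
        self.ignore_nofollow = ignore_nofollow
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Assumption: rel may hold several tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens and not self.ignore_nofollow:
            return  # default behavior: do not follow this link
        self.links.append(href)


def collect_links(html: str, ignore_nofollow: bool = False) -> list:
    collector = LinkCollector(ignore_nofollow)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    page = '<a href="/a">A</a> <a rel="nofollow" href="/b">B</a>'
    print(collect_links(page))
    print(collect_links(page, ignore_nofollow=True))
```

With ignore_nofollow left at False the second link is skipped; passing True corresponds to switching this setting on.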

Ignore Filename in Headers

Switch on this control to have SiteSucker ignore the filename directive in all HTTP Content-Disposition headers. See File Names for more information about how SiteSucker names downloaded files.
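
For readers unfamiliar with the Content-Disposition header, the following Python sketch shows one way a downloader might choose between the header's filename and a name taken from the URL. It does not describe SiteSucker's actual naming rules, which are covered under File Names. The functions header_filename and choose_filename, and the fallback to the last URL path segment, are assumptions for illustration.

```python
from email.message import Message
from posixpath import basename
from urllib.parse import urlsplit


def header_filename(content_disposition: str):
    """Extract the filename parameter from a Content-Disposition value, if any."""
    message = Message()
    message["Content-Disposition"] = content_disposition
    return message.get_filename()


def choose_filename(url: str, content_disposition=None, ignore_header=False) -> str:
    """Pick a file name: the header's filename unless it is absent or ignored."""
    if content_disposition and not ignore_header:
        name = header_filename(content_disposition)
        if name:
            return name
    # Assumed fallback: the last segment of the URL path.
    return basename(urlsplit(url).path)


if __name__ == "__main__":
    url = "http://www.example.com/download?id=7"
    header = 'attachment; filename="report.pdf"'
    print(choose_filename(url, header))
    print(choose_filename(url, header, ignore_header=True))
```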

Treat Ambiguous URLs as Folders

Switch on this control to have SiteSucker treat ambiguous URLs as folders. If a URL does not end with a '/' or a file extension, SiteSucker considers it to be ambiguous. For example, if this option is on and SiteSucker downloads a webpage from http://www.example.com/directory, the webpage will be saved at www.example.com/directory/index.html in the destination folder. If this option is off, the webpage will be saved at www.example.com/directory.html in the destination folder. See File Names for more information about how SiteSucker names downloaded files.
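
The rule above can be sketched in Python as follows. This is a minimal illustration that reproduces the example in the text, not SiteSucker's code. The function names are hypothetical, the sketch looks only at the path part of the URL, and it leaves non-ambiguous URLs unchanged.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def is_ambiguous(url: str) -> bool:
    """True if the URL path ends with neither '/' nor a file extension."""
    path = urlsplit(url).path
    # Assumption: an empty path (site root) is not treated as ambiguous.
    if path == "" or path.endswith("/"):
        return False
    last_segment = path.rsplit("/", 1)[-1]
    return splitext(last_segment)[1] == ""


def local_path(url: str, ambiguous_as_folders: bool) -> str:
    """Return the save path, relative to the destination folder."""
    parts = urlsplit(url)
    path = parts.path
    if is_ambiguous(url):
        if ambiguous_as_folders:
            path += "/index.html"  # treat the URL as a folder
        else:
            path += ".html"  # treat the URL as a file
    return parts.netloc + path


if __name__ == "__main__":
    url = "http://www.example.com/directory"
    print(local_path(url, ambiguous_as_folders=True))
    print(local_path(url, ambiguous_as_folders=False))
```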

Always Download HTML and CSS

Switch on this control to have SiteSucker always download HTML and CSS files despite the File Replacement setting. Use this control to force SiteSucker to download fresh copies of HTML and CSS files.

Login Dialog

Use this control to specify when SiteSucker should display the login dialog for basic HTTP authentication. For more information on authentication and the login dialog, see Password-protected Sites. You can choose from the following options:

  • Never Display - SiteSucker never displays the login dialog. If valid login credentials were recently entered or were found in the Keychain, SiteSucker will use them; otherwise, files that require authentication will be skipped. This option also suppresses display of the Certificate Trust Panel, which is shown when there is a problem with a server's certificate. If the certificate for a server is invalid and this option is selected, SiteSucker will not display the panel and will not download content from that server.
  • Always Display - SiteSucker always displays the login dialog.
  • Display When Necessary - SiteSucker displays the login dialog unless valid login credentials were recently entered or a single relevant Keychain item was found.

File Replacement

Use this control to specify when SiteSucker should replace existing files. You can choose from the following options:

  • Never - SiteSucker never replaces your local files and only downloads those files that haven't already been downloaded.
  • Always - SiteSucker always deletes your local files and replaces them with files downloaded from the Internet.
  • With Newer - SiteSucker only replaces existing files if a newer copy is found on the Internet.
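
The three options amount to a small decision rule, sketched below in Python. This is an illustration under stated assumptions, not SiteSucker's implementation: the Replacement enum and should_download function are hypothetical, "newer" is modeled as a comparison of modification timestamps, and the sketch keeps the local copy when the remote timestamp is unknown.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Replacement(Enum):
    NEVER = "Never"
    ALWAYS = "Always"
    WITH_NEWER = "With Newer"


def should_download(
    policy: Replacement,
    local_modified: Optional[datetime],
    remote_modified: Optional[datetime],
) -> bool:
    """Decide whether to download a file; local_modified is None if no local copy exists."""
    if local_modified is None:
        return True  # nothing to replace, so the file is downloaded
    if policy is Replacement.NEVER:
        return False
    if policy is Replacement.ALWAYS:
        return True
    # WITH_NEWER. Assumption: an unknown remote timestamp keeps the local copy.
    return remote_modified is not None and remote_modified > local_modified


if __name__ == "__main__":
    old = datetime(2024, 1, 1, tzinfo=timezone.utc)
    new = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for policy in Replacement:
        print(policy.value, should_download(policy, old, new))
```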

Path Constraint

Use this control to limit downloaded files to those at a specific site, those within a specific directory, or those containing a specific path. This option works in conjunction with the Path settings and the Include Supporting Files setting under the Webpage settings. SiteSucker provides the following path constraints:

  • None - SiteSucker downloads the file specified in the URL text box, every file that it links to, every file that those files link to on any site, and so on. Be aware that this option can result in a huge download if it is allowed to run unchecked.
  • Host - SiteSucker limits the download to those files on the host of the original file being downloaded. For example, if the URL is http://www.example.com/directory/home.html, this setting limits the download to those URLs beginning with http://www.example.com or https://www.example.com.
  • Subdomains - SiteSucker limits the download to those files within the second-level domain and all subdomains of the original file being downloaded. Extending the previous example, this setting allows URLs beginning with http://www.example.com, https://images.example.com, http://guide.example.com, or https://example.com.
  • Directory - SiteSucker only downloads those files that are within the directory of the original file being downloaded. Extending the previous example, this setting limits the download to those URLs beginning with http://www.example.com/directory/ or https://www.example.com/directory/.
  • Path Settings - SiteSucker only downloads the file specified in the URL text box and any files that have paths allowed by the Path settings.
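
The None, Host, Subdomains, and Directory constraints can be sketched as a URL filter in Python. This is a rough illustration, not SiteSucker's logic. The is_allowed function and its lowercase constraint names are hypothetical, the second-level domain is naively taken to be the last two host labels (which is wrong for hosts such as example.co.uk), and the Path Settings option and the Include Supporting Files setting are left out.

```python
from urllib.parse import urlsplit


def is_allowed(start_url: str, candidate_url: str, constraint: str) -> bool:
    """Check candidate_url against a path constraint derived from start_url."""
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)
    start_host = (start.hostname or "").lower()
    candidate_host = (candidate.hostname or "").lower()

    if constraint == "none":
        return True
    if constraint == "host":
        # The scheme is not compared, so http and https both match.
        return candidate_host == start_host
    if constraint == "subdomains":
        # Naive assumption: the second-level domain is the last two labels.
        domain = ".".join(start_host.split(".")[-2:])
        return candidate_host == domain or candidate_host.endswith("." + domain)
    if constraint == "directory":
        directory = start.path.rsplit("/", 1)[0] + "/"
        return candidate_host == start_host and candidate.path.startswith(directory)
    raise ValueError("unsupported constraint: " + constraint)


if __name__ == "__main__":
    start = "http://www.example.com/directory/home.html"
    for url in (
        "https://www.example.com/other/page.html",
        "https://images.example.com/logo.png",
        "https://www.example.com/directory/sub/page.html",
    ):
        results = {c: is_allowed(start, url, c) for c in ("host", "subdomains", "directory")}
        print(url, results)
```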