The URL settings offer options that affect URLs and provide a way to specify which URLs should be included in or excluded from the download.
The URL screen provides the following controls:
Check All Links
Switch this on to have SiteSucker check all links in all downloaded HTML files — including links to files that you are not downloading — and log any errors that occur. With this option turned on, SiteSucker may report many errors that you normally wouldn’t see. This setting is intended as a debugging tool for web designers who want to see if their own sites have any bad links.
To minimize the time it takes to check all links, set the Filter setting under the File Type settings to Allow Specified File Types without turning on any options so that only HTML and CSS are downloaded.
Scan Comments for URLs
Treat Ambiguous URLs as Folders
Switch on this control to have SiteSucker treat ambiguous URLs as folders. SiteSucker considers a URL ambiguous if it does not end with a ‘/’ or a file extension. For example, if this option is on and SiteSucker downloads a webpage from http://www.example.com/directory, the webpage will be saved at www.example.com/directory/index.html in the destination folder. If this option is off, the webpage will be saved at www.example.com/directory.html in the destination folder. See File Names for more information about how SiteSucker names downloaded files.
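The naming rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not SiteSucker's actual code: the function names `is_ambiguous` and `saved_path` are hypothetical, and the sketch deliberately covers only ambiguous URLs, since those are the only case described here.

```python
import posixpath
from urllib.parse import urlsplit


def is_ambiguous(url: str) -> bool:
    """True if the URL path ends with neither '/' nor a file extension."""
    path = urlsplit(url).path
    if path.endswith("/"):
        return False
    # splitext returns ("name", ".ext"); an empty extension means ambiguous.
    _, extension = posixpath.splitext(posixpath.basename(path))
    return extension == ""


def saved_path(url: str, treat_ambiguous_as_folders: bool) -> str:
    """Path inside the destination folder for an ambiguous URL (hypothetical helper)."""
    if not is_ambiguous(url):
        raise ValueError("this sketch only covers ambiguous URLs")
    parts = urlsplit(url)
    base = parts.netloc + parts.path
    if treat_ambiguous_as_folders:
        return base + "/index.html"  # the URL is treated as a folder
    return base + ".html"            # the URL is treated as a page


print(saved_path("http://www.example.com/directory", True))
print(saved_path("http://www.example.com/directory", False))
```

The two calls reproduce the two save locations given in the example above.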
URL Constraint
Use this control to limit downloaded files to those at a specific site, those within a specific directory, or those having a specific URL. This option works in conjunction with the Include and Exclude URL settings and the General settings. SiteSucker provides the following URL constraints:
- None - SiteSucker downloads the file specified in the URL text box, every file that it links to, every site that those files link to, and so on. Be aware that this option could result in a HUGE download if allowed to continue indefinitely.
- Host - SiteSucker limits the download to those files on the host of the original file being downloaded. For example, if the URL is http://www.example.com/directory/home.html, this setting limits the download to those URLs beginning with
- Host + 1 - SiteSucker limits the download to those files on the host of the original file being downloaded (just like the Host option), plus one level of files from other domains linked to the original host.
- Subdomains - SiteSucker limits the download to those files within the second-level domain and all subdomains of the original file being downloaded. Extending the previous example, this setting will download URLs beginning with
- Directory - SiteSucker only downloads those files that are within the directory of the original file being downloaded. For example, if you are downloading https://www.example.com/directory/ using this setting, SiteSucker will only download files in the directory directory. But if you are downloading https://www.example.com/directory, SiteSucker will download all files from www.example.com unless the Treat Ambiguous URLs as Folders setting is on, in which case SiteSucker will only download files in the
- URL Settings - SiteSucker only downloads the file specified in the URL text box and any files that have URLs allowed by the Include and Exclude URL settings.
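As a rough illustration of how the Host, Subdomains, and Directory constraints differ, the following Python sketch tests a candidate URL against the original URL. Everything here is an assumption made for illustration: the function names and the constraint strings are hypothetical, the second-level domain is approximated as the last two labels of the host name (which is wrong for suffixes such as co.uk), and Host + 1, URL Settings, and the Treat Ambiguous URLs as Folders interaction are left out.

```python
from urllib.parse import urlsplit


def directory_of(url: str) -> str:
    """Directory part of a URL path, up to and including the last '/'."""
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def meets_constraint(url: str, original_url: str, constraint: str) -> bool:
    """Check an absolute URL against a (hypothetically named) constraint."""
    candidate, original = urlsplit(url), urlsplit(original_url)
    if constraint == "none":
        return True
    if constraint == "host":
        return candidate.hostname == original.hostname
    if constraint == "subdomains":
        # Naive assumption: the second-level domain is the last two labels.
        base = ".".join(original.hostname.split(".")[-2:])
        return candidate.hostname == base or candidate.hostname.endswith("." + base)
    if constraint == "directory":
        same_host = candidate.hostname == original.hostname
        return same_host and (candidate.path or "/").startswith(directory_of(original_url))
    raise ValueError(f"unknown constraint: {constraint}")


start = "https://www.example.com/directory/"
print(meets_constraint("https://www.example.com/directory/a.html", start, "directory"))
print(meets_constraint("https://www.example.com/other/a.html", start, "directory"))
```

Note how the trailing slash matters for the Directory case: with `https://www.example.com/directory` as the original URL, `directory_of` returns `/`, so every path on the host qualifies, which mirrors the behavior described for that constraint.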
Include and Exclude URL Settings
The Include and Exclude URL settings work in conjunction with the URL Constraint setting and the General settings according to the following rules:
- If this is the original URL (that is, the URL specified in the URL text box), then the file is downloaded.
- Otherwise, if the URL begins with one of the strings or matches one of the regular expressions in the Exclude list, then the file is not downloaded.
- Otherwise, if the URL meets the requirements of the current URL Constraint setting, then the file is allowed to download.
- Otherwise, if the URL begins with one of the strings or matches one of the regular expressions in the Include list, then the file is allowed to download.
- Otherwise, if the Always Download HTML and CSS option in the General settings is on and the URL references an HTML or CSS file type, then the file is allowed to download.
- Otherwise, if the Include Supporting Files option in the General settings is on and the URL references a non-HTML file type, then the file is allowed to download.
- Otherwise, the file is not downloaded.
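The rules above form a first-match-wins cascade, which can be sketched as follows. This is a minimal illustration, not SiteSucker's implementation: the names `entry_matches` and `should_download` and all parameters are hypothetical, list entries are modeled as `(text, is_regex)` pairs, the URL Constraint check is passed in as a precomputed boolean, and Python's `re` module stands in for the ICU regular expressions that SiteSucker uses.

```python
import re
from typing import List, Tuple

Entry = Tuple[str, bool]  # (text, is_regex): hypothetical model of a list entry


def entry_matches(url: str, entries: List[Entry]) -> bool:
    """True if the URL begins with a string entry or fully matches a regex entry."""
    for text, is_regex in entries:
        if is_regex:
            if re.fullmatch(text, url):
                return True
        elif url.startswith(text):
            return True
    return False


def should_download(
    url: str,
    *,
    original_url: str,
    exclude: List[Entry],
    include: List[Entry],
    meets_constraint: bool,          # result of the URL Constraint check
    file_kind: str,                  # "html", "css", or "other" (assumed labels)
    always_download_html_css: bool,  # General settings option
    include_supporting_files: bool,  # General settings option
) -> bool:
    if url == original_url:
        return True                  # the original URL is always downloaded
    if entry_matches(url, exclude):
        return False                 # Exclude beats every later rule
    if meets_constraint:
        return True
    if entry_matches(url, include):
        return True
    if always_download_html_css and file_kind in ("html", "css"):
        return True
    if include_supporting_files and file_kind != "html":
        return True
    return False
```

Because the checks run in order, an Exclude entry overrides both the URL Constraint and the Include list, while the original URL is downloaded even if an Exclude entry would match it.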
In these lists, enter absolute URLs (that is, URLs beginning with https://) or regular expression patterns. URLs should be entered as they appear in the Safari address and search field, i.e., without encoding except for characters from the ISO-8859-1 extended character set and spaces (which are encoded as
When using regular expressions, the pattern must match the entire URL. For example, to match any URL that contains “logout”, enter the “.*logout.*” regular expression. The pattern syntax currently supported is that specified by ICU, which is described at Regular Expressions - ICU Documentation.
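The whole-URL matching requirement is easy to trip over, so here is a small Python demonstration. It uses Python's `re.fullmatch` as a stand-in for ICU matching; the two engines differ in some syntax details, but both handle this simple pattern the same way. The helper name `pattern_matches` and the sample URL are made up for the example.

```python
import re


def pattern_matches(pattern: str, url: str) -> bool:
    """True only if the pattern matches the entire URL, not just part of it."""
    return re.fullmatch(pattern, url) is not None


url = "https://www.example.com/account/logout?next=home"

# A bare "logout" fails because it does not cover the whole URL.
print(pattern_matches("logout", url))

# Wrapping it in ".*" on both sides lets it match the whole URL.
print(pattern_matches(".*logout.*", url))
```

The first call prints `False` and the second prints `True`, which is why the wildcard wrappers are needed when you want a "contains" match.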
If you tap the Edit button in the Include or Exclude screen, SiteSucker displays a toolbar with the following buttons:
Deletes the selected URLs.
Allows you to edit the selected URL.
Allows you to add a new URL. Turn on the Regular Expression button in the editor when adding a regular expression.