The Webpage pane of the Settings dialog provides the following controls:

Text Encoding

Use this control to specify the text encoding for webpages. SiteSucker will read and save all webpages using the specified text encoding. If it is set to Default, SiteSucker will try to detect the webpage's text encoding. This setting is ignored when reading or saving webpages that were previously downloaded.

Check All Links

Check this box to have SiteSucker check all links in all downloaded HTML files — including links to files that you are not downloading — and log any errors that occur. With this option turned on, SiteSucker may report many errors that you normally wouldn't see. This setting is intended as a debugging tool for web designers who want to see if their own sites have any bad links.

To minimize the time it takes to check all links, set the Filter setting under the File Type settings to Allow Specified File Types with no file types checked, so that only HTML and CSS files are downloaded. Then set the File Modification setting under the General settings to Delete After Analysis, which deletes HTML and CSS files after they have been downloaded and analyzed.

Scan Comments for URLs

Check this box to have SiteSucker scan comments in all downloaded HTML files for URLs. Normally, SiteSucker ignores comments. This option is useful when a page places tags inside comments so that they can be used by Internet Explorer (as in conditional comments) or by JavaScript.
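To make the idea concrete, here is a minimal TypeScript sketch of what scanning comments adds: it pulls href and src values out of an Internet Explorer conditional comment while ignoring markup outside comments. It is a regex stand-in, not SiteSucker's actual parser, and the function name urlsInComments is hypothetical.

```typescript
// Sketch only: a regex-based scan, not SiteSucker's HTML parser.
// Returns href/src attribute values that appear inside <!-- ... --> comments.
function urlsInComments(html: string): string[] {
  const urls: string[] = [];
  // [\s\S]*? spans line breaks and stops at the first "-->"
  const commentRe = /<!--([\s\S]*?)-->/g;
  let c: RegExpExecArray | null;
  while ((c = commentRe.exec(html)) !== null) {
    const attrRe = /\b(?:href|src)\s*=\s*["']([^"']+)["']/gi;
    let a: RegExpExecArray | null;
    while ((a = attrRe.exec(c[1])) !== null) {
      urls.push(a[1]);
    }
  }
  return urls;
}

const page = `<!--[if lt IE 9]>
  <script src="js/html5shiv.js"></script>
  <link rel="stylesheet" href="css/ie.css">
<![endif]-->
<img src="logo.png">`;

console.log(urlsInComments(page)); // [ 'js/html5shiv.js', 'css/ie.css' ]
```

With the option off, only logo.png would be seen; with it on, the two files referenced inside the conditional comment are found as well.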

Include Supporting Files

Check this box to have SiteSucker include all supporting files in the download. When this option is on, SiteSucker downloads non-HTML files (such as style sheets and images) even if the current Path settings do not allow them or they exceed the Maximum Number of Levels under the Limit settings. This setting is useful when downloading sites that link to style sheets, images, or other supporting files on separate hosts or subdomains.

Download Using Web Views

Check this box to have SiteSucker download HTML using hidden web views. When this option is on, SiteSucker loads each webpage into a hidden web view and then extracts the HTML from the web view after the page has loaded. This can be useful when webpages are built using JavaScript, or when a webpage is an XML file that the web view can convert into HTML.

Save Delay

Use this control to specify how long SiteSucker waits before saving a webpage after the web view reports that the page has finished loading. Some webpages take longer to load because their content is generated by JavaScript, and this setting gives them extra time to finish before they are saved. This control is enabled only when the Download Using Web Views setting is on.


Settings under this tab allow you to specify custom data attributes that SiteSucker should scan for URLs. Introduced in HTML5, custom data attributes store extra information, usually for the page's JavaScript, in standard HTML tags. Data attribute names begin with data- and do not contain uppercase characters.
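As an illustration, a page might keep an image's real address in a data-src attribute (<img src="placeholder.gif" data-src="photos/beach.jpg">). The TypeScript sketch below shows which values would be picked up once data-src and data-background are listed in the table. It assumes a simple regex scan rather than SiteSucker's actual parser, and all names in it are illustrative.

```typescript
// Sketch only: returns the values of the listed custom data attributes so
// they can be treated as URLs. A regex stands in for a real HTML parser.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function dataAttributeValues(html: string, attributeNames: string[]): string[] {
  const values: string[] = [];
  for (const attr of attributeNames) {
    // Require whitespace before the name and "=" after it, so that
    // "data-src" does not also match "data-srcset".
    const re = new RegExp(
      `\\s${escapeRegExp(attr)}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`,
      "gi"
    );
    let m: RegExpExecArray | null;
    while ((m = re.exec(html)) !== null) {
      // Exactly one of the three alternatives (double, single, unquoted) matched
      const value = m[1] ?? m[2] ?? m[3];
      if (value) values.push(value);
    }
  }
  return values;
}

const html = `<img src="placeholder.gif" data-src="photos/beach.jpg">
<div data-background='images/hero.png' data-count="3"></div>`;

console.log(dataAttributeValues(html, ["data-src", "data-background"]));
// [ 'photos/beach.jpg', 'images/hero.png' ]
```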


To add a custom data attribute, click the Plus button, enter the name of the attribute, and press ↩.

To remove custom data attributes, select them in the table and click the Minus button.

To modify a custom data attribute, double-click on its name in the table, enter a new name, and press ↩. All names in the table must be unique.
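The naming rules for the table can be summarized as a small check. This is a hypothetical TypeScript sketch, not SiteSucker code; the requirement of at least one character after "data-" comes from the HTML specification rather than from this page.

```typescript
// Hypothetical validator for the entries in the custom data attribute table.
// Returns a human-readable problem for each rule an entry breaks.
function validateDataAttributeNames(entries: string[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    if (!entry.startsWith("data-") || entry.length === "data-".length) {
      problems.push(`${entry}: must start with "data-" followed by at least one character`);
    }
    if (/[A-Z]/.test(entry)) {
      problems.push(`${entry}: must not contain uppercase letters`);
    }
    if (seen.has(entry)) {
      problems.push(`${entry}: duplicate entry`);
    }
    seen.add(entry);
  }
  return problems;
}

console.log(validateDataAttributeNames(["data-src", "data-Zoom", "data-src"]));
```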


Settings under this tab allow you to use regular expressions to replace text in HTML files or extract URLs from HTML text.


To replace text in HTML files, set the Template Type to Substitution and enter a search pattern and a substitution template for the text you would like to replace. Each match of the search pattern is replaced according to the substitution template. In the template, the back-reference $0 represents the entire matched text, $1 represents the contents of the first capture group, and so on. To delete the matched text, leave the template blank.
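The following TypeScript sketch shows how a substitution template is expanded for each match. It is an approximation under stated assumptions: JavaScript's RegExp stands in for ICU (the two syntaxes overlap for common patterns but are not identical), only single-digit back-references $0 through $9 are handled, and the function names are hypothetical.

```typescript
// Expand an ICU-style template: $0 = whole match, $1..$9 = capture groups,
// and "\" escapes the next character (so "\$" is a literal dollar sign).
function expandTemplate(template: string, groups: (string | undefined)[]): string {
  return template.replace(/\\(.)|\$(\d)/g, (_whole: string, escaped?: string, digit?: string) =>
    escaped !== undefined ? escaped : (groups[Number(digit)] ?? "")
  );
}

// Replace every match of pattern in html; an empty template deletes each match.
function applySubstitution(html: string, pattern: string, template: string): string {
  const re = new RegExp(pattern, "g");
  let result = "";
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    result += html.slice(last, m.index) + expandTemplate(template, Array.from(m));
    last = m.index + m[0].length;
    if (m[0] === "") re.lastIndex++; // avoid looping forever on empty matches
  }
  return result + html.slice(last);
}

console.log(applySubstitution("Call 555-1234", "\\d{3}-\\d{4}", "<b>$0</b>"));
// Call <b>555-1234</b>
console.log(applySubstitution('<img src="thumbs/a.jpg">', 'thumbs/([^"]+)', "full/$1"));
// <img src="full/a.jpg">
console.log(applySubstitution("<p>Hi<!-- x --></p>", "<!--[\\s\\S]*?-->", ""));
// <p>Hi</p>
```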

To extract URLs from HTML text, set the Template Type to URL and enter a search pattern and a URL template that specifies a URL that SiteSucker should download. The URL template is ignored if it produces a blank URL or a URL that is identical to the template.
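A similar hypothetical sketch for URL templates: it expands the template once per match and applies the two skip rules described above. The zoom() markup and the /images/large/ path are invented for illustration, JavaScript's RegExp again stands in for ICU, and this sketch leaves the HTML itself unchanged.

```typescript
// Expand $0..$9 in a URL template (single-digit back-references only).
function expandTemplate(template: string, groups: (string | undefined)[]): string {
  return template.replace(/\$(\d)/g, (_whole: string, digit: string) => groups[Number(digit)] ?? "");
}

// Collect the URLs a URL-type pattern yields, skipping results that are
// blank or identical to the template.
function extractUrls(html: string, pattern: string, urlTemplate: string): string[] {
  const re = new RegExp(pattern, "g");
  const urls: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const url = expandTemplate(urlTemplate, Array.from(m));
    if (url.trim() !== "" && url !== urlTemplate) urls.push(url);
    if (m[0] === "") re.lastIndex++; // avoid looping forever on empty matches
  }
  return urls;
}

// Hypothetical markup: thumbnails call zoom() with a numeric image ID
const gallery = `<img src="t/12.jpg" onclick="zoom(12)"><img src="t/7.jpg" onclick="zoom(7)">`;
console.log(extractUrls(gallery, "zoom\\((\\d+)\\)", "/images/large/$1.jpg"));
// [ '/images/large/12.jpg', '/images/large/7.jpg' ]
```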

These search patterns are applied after any pre-analysis script runs but before SiteSucker scans HTML files for URLs. Patterns are applied in the order in which they appear in the list; to change the order, drag them within the list. SiteSucker supports the ICU regular expression syntax, which is described in Regular Expressions - ICU Documentation.

As an example, in the image shown above, SiteSucker is instructed to do the following:

  1. extract a URL from the first argument of the javascript:openWin() function and then
  2. replace URLs that have a certain query string with the same URL without the query string.
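The image is not reproduced here, so the two rows below are hypothetical patterns written to fit that description; the actual patterns may differ. The TypeScript sketch assumes the query string is ?from=menu, runs the rows in table order, and (as an assumption of this sketch) lets URL rows only collect URLs while Substitution rows rewrite the HTML.

```typescript
// Hypothetical rows matching the two steps above; JavaScript's RegExp stands in for ICU.
type PatternRow = { type: "URL" | "Substitution"; search: string; template: string };

const rows: PatternRow[] = [
  // 1. URL row: first quoted argument of javascript:openWin(...)
  { type: "URL", search: "javascript:openWin\\('([^']+)'", template: "$1" },
  // 2. Substitution row: drop an assumed "?from=menu" query string
  { type: "Substitution", search: "(\\.html)\\?from=menu", template: "$1" },
];

function expandTemplate(template: string, groups: (string | undefined)[]): string {
  return template.replace(/\$(\d)/g, (_whole: string, digit: string) => groups[Number(digit)] ?? "");
}

// Apply rows top to bottom; each row sees the HTML produced by the rows above it.
function runPatterns(html: string, table: PatternRow[]): { html: string; urls: string[] } {
  const urls: string[] = [];
  for (const row of table) {
    const re = new RegExp(row.search, "g");
    let out = "";
    let last = 0;
    let m: RegExpExecArray | null;
    while ((m = re.exec(html)) !== null) {
      const expanded = expandTemplate(row.template, Array.from(m));
      if (row.type === "URL") {
        if (expanded.trim() !== "" && expanded !== row.template) urls.push(expanded);
      } else {
        out += html.slice(last, m.index) + expanded;
        last = m.index + m[0].length;
      }
      if (m[0] === "") re.lastIndex++; // avoid looping forever on empty matches
    }
    if (row.type === "Substitution") html = out + html.slice(last);
  }
  return { html, urls };
}

const page = `<a href="javascript:openWin('help/index.html', 600)">Help</a>
<a href="about.html?from=menu">About</a>`;
const result = runPatterns(page, rows);
console.log(result.urls); // [ 'help/index.html' ]
console.log(result.html);
```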

To add a row to the table, click the Plus button, set the Template Type, enter the Search Pattern and Template, and press ↩.

To remove rows from the table, select them in the table and click the Minus button.

To modify a row, double-click on a string in the table, enter a new string, and press ↩.


Settings under this tab allow you to inject JavaScript into hidden web views after the page finishes loading but before other sub-resources finish loading.


This feature can be used to perform any number of tasks before SiteSucker saves a webpage. For example, it can be used to click on buttons that modify a webpage before SiteSucker saves it; click on links that download attachments; extract obscure URLs from a webpage; or rename files. SiteSucker also provides a number of message handlers that can be used to pass information from JavaScript back to the application.

In the image shown above, the script calls moreImagesFunction() after the window has loaded and calls it again whenever the webpage changes. Each time moreImagesFunction() runs, it clicks the element with the "trending" ID and sends a message back to SiteSucker requesting a five-second delay before the webpage is saved. This JavaScript makes it possible to load all the images on a particular webpage before it is saved.
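Because the script itself is not reproduced in text, here is a hypothetical reconstruction in TypeScript. Only the function name, the "trending" ID, and the five-second delay come from the description; using a MutationObserver to detect webpage changes is an assumption. The globalThis guards make it a no-op outside a web view. Strip the type annotations (or compile it) before pasting it into this tab, which takes plain JavaScript.

```typescript
// Hypothetical reconstruction; inside a web view, globalThis is the window.
const g = globalThis as any;

function moreImagesFunction(): boolean {
  const button = g.document?.getElementById("trending");
  if (!button) return false; // nothing to click
  button.click();
  // Ask SiteSucker to wait 5 seconds before saving this page.
  g.webkit?.messageHandlers?.delay?.postMessage(5);
  return true;
}

if (g.window && g.document) {
  g.window.addEventListener("load", () => {
    moreImagesFunction();
    // Run again whenever nodes are added to or removed from the page.
    new g.MutationObserver(() => moreImagesFunction())
      .observe(g.document.body, { childList: true, subtree: true });
  });
}
```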

The window.webkit.messageHandlers.delay.postMessage() function can be used to delay saving a webpage. The function argument should be an integer or a floating-point number that specifies the delay in seconds. In effect, this function replaces the Save Delay setting with the value passed in, without modifying the setting in the SiteSucker document.

The window.webkit.messageHandlers.url.postMessage() function can be used to pass a URL back to SiteSucker. The function argument should be a string that specifies an absolute or relative URL. SiteSucker will then try to download the URL, but it will not localize the URL on the webpage. If you want the URL localized, you must do it yourself, either with JavaScript or with the Patterns setting.
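A hypothetical use of the url handler: suppose a page keeps its full-size image addresses in a JSON block (<script type="application/json" id="gallery-data">) that SiteSucker would not recognize as links. The element ID and the JSON shape are assumptions; the TypeScript sketch parses the list defensively and posts each URL.

```typescript
// Hypothetical markup:
// <script type="application/json" id="gallery-data">{"images": ["photos/1.jpg", ...]}</script>
const g = globalThis as any;

function urlsFromGalleryJson(jsonText: string): string[] {
  try {
    const data = JSON.parse(jsonText);
    const images: unknown[] = Array.isArray(data?.images) ? data.images : [];
    return images.filter((u): u is string => typeof u === "string" && u.trim() !== "");
  } catch {
    return []; // malformed JSON: report nothing
  }
}

// Hand each absolute or relative URL to SiteSucker. It will be downloaded,
// but references to it on the page are not localized.
function sendUrls(urls: string[]): number {
  const handler = g.webkit?.messageHandlers?.url;
  if (!handler) return 0; // not running inside SiteSucker
  for (const u of urls) handler.postMessage(u);
  return urls.length;
}

const dataScript = g.document?.getElementById("gallery-data");
if (dataScript) sendUrls(urlsFromGalleryJson(dataScript.textContent ?? ""));
```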

The window.webkit.messageHandlers.rename.postMessage() function can be used to rename files. The function argument should be an array containing two strings: a search pattern followed by a substitution template. These strings are added temporarily to the Replace table under the Path settings and are removed after the document stops downloading. As an example, you could use this feature to rename files using the text content of a link or anchor.
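A hypothetical TypeScript sketch of renaming downloaded PDFs after their link text. It assumes the pattern only needs to match the original file name (this page does not say exactly which path string the Replace table is applied to). It escapes the file name so it matches literally, and escapes $ and \ in the template because they are special in ICU substitution templates. The function names and the a[href$=".pdf"] selector are illustrative.

```typescript
// Hypothetical: files/doc123.pdf linked as "Annual Report" -> "Annual Report.pdf"
const g = globalThis as any;

// Escape regex metacharacters so the file name is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "$" and "\" are special in ICU substitution templates; escape them.
function escapeTemplate(s: string): string {
  return s.replace(/[$\\]/g, "\\$&");
}

// Build the [search pattern, substitution template] pair for one link.
function renamePairFor(href: string, linkText: string): [string, string] | null {
  const fileName = href.split(/[?#]/)[0].split("/").pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  const title = linkText.trim().replace(/[\/:]/g, "-"); // keep the new name path-safe
  if (dot <= 0 || title === "") return null;
  return [escapeRegExp(fileName), escapeTemplate(title + fileName.slice(dot))];
}

const renameHandler = g.webkit?.messageHandlers?.rename;
if (renameHandler && g.document) {
  for (const link of Array.from(g.document.querySelectorAll('a[href$=".pdf"]')) as any[]) {
    const pair = renamePairFor(link.getAttribute("href") ?? "", link.textContent ?? "");
    if (pair) renameHandler.postMessage(pair);
  }
}
```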

When a JavaScript string is specified and the Save Delay setting in the SiteSucker document is less than 2 seconds, a Save Delay of 2 seconds is used automatically.
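The rule in the previous paragraph amounts to a floor on the Save Delay. A minimal TypeScript sketch with hypothetical names; it does not model the delay message handler described earlier.

```typescript
// Hypothetical names; the 2-second minimum comes from the text above.
const MIN_SAVE_DELAY_WITH_SCRIPT = 2; // seconds

function effectiveSaveDelay(saveDelaySeconds: number, hasJavaScript: boolean): number {
  return hasJavaScript
    ? Math.max(saveDelaySeconds, MIN_SAVE_DELAY_WITH_SCRIPT)
    : saveDelaySeconds;
}

console.log(effectiveSaveDelay(0.5, true));  // 2
console.log(effectiveSaveDelay(0.5, false)); // 0.5
console.log(effectiveSaveDelay(5, true));    // 5
```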