Frequently Asked Questions

If you’re having trouble using SiteSucker, look below for a solution.

Can I change the number of simultaneous connections?

The Preferences window lets you set the number of simultaneous Internet connections for new SiteSucker documents.

Can I download images without downloading HTML files?

To have SiteSucker download only images, set the Filter setting under the File Type settings to Allow Specified File Types and then check the Images button. Even with this setting, SiteSucker still needs to download HTML files because it uses their hypertext links to find all the images. However, you can have SiteSucker delete HTML files after they are downloaded and analyzed by selecting Delete After Analysis in the File Modification pop-up under the General settings.
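The reason HTML must still be fetched can be sketched in a few lines of Python. This is not SiteSucker's code, only an illustration under stated assumptions: the link sorter below (a hypothetical `LinkSorter` class with a made-up list of image extensions) reads one page and separates links to images, which would be kept, from links to other pages, which must still be downloaded so that their own image links can be found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Extensions treated as images in this sketch (an assumption; SiteSucker's
# Images category is defined by the app, not by this list).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


class LinkSorter(HTMLParser):
    """Sorts the links on one page into images (kept) and pages (fetched only
    so that their links can be followed)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            if url.lower().endswith(IMAGE_EXTENSIONS):
                self.images.append(url)
            else:
                self.pages.append(url)


def sort_links(html, base_url):
    """Return (pages, images) found on a single HTML page."""
    sorter = LinkSorter(base_url)
    sorter.feed(html)
    sorter.close()
    return sorter.pages, sorter.images


page = ('<a href="gallery.html">Gallery</a>'
        '<img src="/img/logo.png"><a href="photos/cat.jpg">Cat</a>')
pages, images = sort_links(page, "https://example.com/index.html")
print(pages)   # ['https://example.com/gallery.html']
print(images)  # ['https://example.com/img/logo.png', 'https://example.com/photos/cat.jpg']
```

Here gallery.html contains no images itself, yet it has to be fetched and analyzed because it may link to more images; that is the role Delete After Analysis plays once the analysis is done.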

Can I have SiteSucker download a site periodically?

The best way to schedule site downloads is with an Automator Calendar Alarm, which runs from the Calendar app. Create the Calendar Alarm as follows:

  1. In SiteSucker, save a document with the desired URL and settings.
  2. In Automator, create a Calendar Alarm.
  3. Select Internet from the library pane.
  4. Select and drag the Download Sites action into the workflow area.
    (You may need to click Third Party Actions and Allow before you can use the Download Sites action.)
  5. Select Other in the SiteSucker Document pop-up menu and then choose the document from step 1.
  6. Save the newly created Calendar Alarm.

As soon as the workflow is saved, Calendar is opened and the new event is automatically created. Just adjust the date and time and set the repeat interval for the event. You can set up a SiteSucker document and Calendar Alarm for each site that you want to download periodically.

Can I use SiteSucker with a web proxy?

If your computer is protected from the Internet by a firewall, you may need to use a web proxy to access websites. The web proxy is set up in the Network preferences. (See Mac Help for more information.) If a web proxy has been configured, SiteSucker will automatically direct your requests to the specified proxy server.
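If you are not sure whether a proxy is configured, one quick check is to ask another program what it sees. The sketch below uses Python's standard `urllib.request.getproxies()`, which on macOS falls back to the proxies set in the Network settings when no `*_proxy` environment variables are defined. This shows Python's view of those settings, not SiteSucker's, so treat it only as a hint.

```python
import urllib.request


def describe_proxies():
    """Return one line per configured proxy, as seen by Python's urllib."""
    proxies = urllib.request.getproxies()  # e.g. {"http": "http://proxy:8080"}
    if not proxies:
        return ["No proxy configured"]
    return [f"{scheme}: {server}" for scheme, server in sorted(proxies.items())]


for line in describe_proxies():
    print(line)
```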

Does SiteSucker have an auto-save feature?

SiteSucker does not have an auto-save feature, but you can use an AppleScript to periodically save a SiteSucker document while it is downloading. Download the Sample Scripts from the SiteSucker website and find the Autosave.scpt script. This script will periodically save the document you’re using to download a site. Just follow the instructions in the comments at the top of the script. The script is set up to save the document every minute, but you can change the interval in the script.

How can I browse a site offline?

Here is the preferred way to download a site so that you can view it offline.

Under the General settings, make sure that the File Modification option is set to Localize. (This is the default setting.) With this setting, SiteSucker modifies the downloaded HTML documents by replacing every link to a file on a web server with the corresponding link to the local file. This provides the best results when browsing files offline.
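To picture what localizing does, here is a minimal Python sketch. It assumes (for illustration only) that a file is saved at host/path under the destination folder and that a URL ending in "/" is saved as index.html; SiteSucker's actual naming rules may differ. The hypothetical `localize` function turns a link on a saved page into a relative link to the saved copy of the target.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to host/path under the destination folder (assumed layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumption: directory URLs saved as index.html
    return parts.netloc + path


def localize(link_url, page_url):
    """Relative link from the saved page to the saved copy of link_url."""
    target = local_path(link_url)
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(target, page_dir)


print(localize("https://example.com/about/team.html", "https://example.com/index.html"))
# about/team.html
print(localize("https://example.com/", "https://example.com/blog/post.html"))
# ../index.html
```

Relative links like these keep working when the downloaded folder is moved, which is why a localized copy browses well offline.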

After SiteSucker has downloaded your site, click the File button in the SiteSucker toolbar to display the downloaded site in your default web browser (Safari, for example).

How can I download a site that has a login page?

See the Password-protected Sites page for instructions on how to download a site that has a login page.

How do I download a site created with Squarespace?

In general, to download Squarespace sites, choose Factory Defaults in the Settings menu, then turn on the Include Supporting Files option in the General settings and the Download Using Web Views option in the Webpage settings. Enabling the Ignore Robot Exclusions option in the General settings may be required to download all content from some sites. To reduce clutter in the destination folder, I recommend turning on Delete robots.txt in the SiteSucker preferences.

How do I download a site from archive.org?

The Internet Archive (archive.org) Wayback Machine is a service that allows people to visit archived versions of websites. You can use SiteSucker to download these archived websites from archive.org.

For instance, to download the archived website example.com from archive.org, enter a URL similar to https://web.archive.org/web/20220812155547/https://www.example.com/ in the URL text box, where the 20220812155547 component specifies when the page was captured. Set the URL Constraint option to URL Settings and add the regular expression https://web\.archive\.org/web/[^/]+/https?://(www\.)?example\.com.* to the Include table in the URL settings. With these settings, only archived URLs from example.com (with or without www, over http or https) will be downloaded.
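You can try an Include expression before starting a download. The Python sketch below checks a few sample URLs against the expression written with https?, so that archived pages whose original address used either http or https are accepted. It uses `re.fullmatch`; whether SiteSucker anchors the match the same way is an assumption, although for these URLs a search anywhere in the URL gives the same answers.

```python
import re

# Include expression, accepting http and https originals.
INCLUDE = r"https://web\.archive\.org/web/[^/]+/https?://(www\.)?example\.com.*"


def is_included(url):
    # fullmatch: the pattern must describe the entire URL
    return re.fullmatch(INCLUDE, url) is not None


urls = [
    "https://web.archive.org/web/20220812155547/https://www.example.com/",
    "https://web.archive.org/web/20220812155547/http://example.com/about.html",
    "https://web.archive.org/web/20220812155547/https://www.iana.org/domains/example",
]
for url in urls:
    print(is_included(url), url)
# True, True, False
```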

To remove the Wayback Machine toolbar from downloaded pages, add the following item to the Patterns table in the Webpage settings:

Search Pattern: <script [^>]+ src="[^"]+bundle-playback.js[^>]+></script>
Template:
Template Type: Early Substitution
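The effect of that entry can be previewed with Python's `re.sub`, using the same search pattern and an empty template. The HTML below is a made-up sample shaped like the toolbar's script tag, not a copy of what the Wayback Machine actually inserts.

```python
import re

SEARCH = r'<script [^>]+ src="[^"]+bundle-playback.js[^>]+></script>'
TEMPLATE = ""  # empty template: the matching script tag is removed

# Hypothetical page head containing a toolbar-style script tag.
html = (
    '<head><script type="text/javascript" '
    'src="https://web-static.archive.org/_static/js/bundle-playback.js?v=1" '
    'charset="utf-8"></script><title>Example</title></head>'
)
print(re.sub(SEARCH, TEMPLATE, html))
# <head><title>Example</title></head>
```

Note that the pattern requires at least one attribute before src, which is how the sample tag is written.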

You can use the Replace table under the Path settings to save downloaded files into a www.example.com folder in the destination folder. Specifically, add the following item:

File Path Pattern: web\.archive\.org/web/[^/]+/https?﹕/(www\.)?example\.com(﹕80)?/(.+)
Substitution Template: www.example.com/$3
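Here is the same replacement tried in Python, where SiteSucker's $3 is written \3. The sample paths are hypothetical and use "﹕" (U+FE55, a small colon) in place of ":" exactly as the pattern does, presumably because a plain colon cannot appear in a file name. The pattern in the sketch accepts both http and https captures (https?).

```python
import re

PATTERN = r"web\.archive\.org/web/[^/]+/https?﹕/(www\.)?example\.com(﹕80)?/(.+)"
TEMPLATE = r"www.example.com/\3"  # SiteSucker's $3 becomes \3 in Python

paths = [
    "web.archive.org/web/20220812155547/https﹕/www.example.com/about/index.html",
    "web.archive.org/web/20220812155547/http﹕/example.com﹕80/index.html",
]
for path in paths:
    print(re.sub(PATTERN, TEMPLATE, path))
# www.example.com/about/index.html
# www.example.com/index.html
```

Group 3 holds everything after the host (and optional port), so every capture of the site ends up under one www.example.com folder.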

How do I download embedded videos?

Only SiteSucker Pro can download videos, including embedded YouTube, Vimeo, and Wistia videos.

Embedded YouTube, Vimeo, and Wistia videos are found on servers external to the site being downloaded. For this reason, you should turn on the Include Supporting Files option under the General settings. This will force SiteSucker to download supporting files, such as videos, no matter where they are located.

SiteSucker Pro can only download Wistia videos that are embedded using an iframe. To download a video embedded using another method, add a regular expression under the Patterns tab in the Webpage settings that replaces the existing text with an iframe. See Embedding Media on Your Website for information on embedded Wistia videos.

Furthermore, the Preferred Resolution option in the Video settings can be used to specify the preferred resolution of downloaded videos.

I have run out of disk space while downloading a large site. What can I do?

You can move a download in progress to a new destination folder as follows:

  1. Pause the download if it is not already paused.
  2. Move or copy all the items you have already downloaded to another folder on a larger hard drive.
  3. In the document that you are using, change the Destination Folder in the General settings to the new folder on the larger hard drive.
  4. Resume downloading.
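Step 2 is usually easiest in the Finder, but for a very large download a script can help. The Python sketch below copies the existing download into the new folder with `shutil.copytree`. The function name and paths are hypothetical; it only copies, leaving the old folder for you to delete once you have checked the copy.

```python
import shutil
from pathlib import Path


def copy_download(old_destination, new_destination):
    """Copy everything already downloaded into the new destination folder.
    Run only while the download is paused. Requires Python 3.8+."""
    shutil.copytree(Path(old_destination), Path(new_destination), dirs_exist_ok=True)


# Hypothetical paths; adjust to your own folders and volumes:
# copy_download("/Users/me/Downloads/SiteSucker", "/Volumes/BigDrive/SiteSucker")
```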

Why doesn’t anything happen when I try to download a site?

There could be a number of reasons why SiteSucker fails to download a site. First, check the log file for any errors. If there are no errors or the errors don’t account for the problem, turn on the Log Warnings option under the Log settings and try to download the site again. The warnings will probably explain why the download failed.

In many cases, files aren’t downloaded because they are located on another server (try turning on the Include Supporting Files option in the General settings to fix this) or they are subject to robot exclusions (try turning on the Ignore Robot Exclusions option in the General settings).
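To see whether robot exclusions explain a missing file, you can check the site's robots.txt rules yourself. The Python sketch below uses the standard `urllib.robotparser` module on a made-up robots.txt; with the real site, you would paste in the contents of its robots.txt instead.

```python
from urllib import robotparser

# Sample robots.txt contents (made up for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/index.html", "https://example.com/private/report.pdf"):
    allowed = parser.can_fetch("*", url)
    print("allowed" if allowed else "excluded by robots.txt", url)
```

A file reported as excluded here is the kind of file the Ignore Robot Exclusions option would let SiteSucker download.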

If the errors or warnings don’t reveal the problem, you might want to try changing the Identity option in the Request settings. Some sites are particular about which browsers they will allow. The Identity setting allows you to “fool” the site into thinking that you’re using an approved browser.
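A site that filters by browser typically looks at the User-Agent header of each request. The Python sketch below sends the same request with two different User-Agent strings so you can compare the responses. The Safari-like string is an illustrative assumption, not one of SiteSucker's Identity choices.

```python
import urllib.error
import urllib.request

# Illustrative browser-like identity (an assumption, not taken from SiteSucker).
SAFARI_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)


def make_request(url, user_agent):
    """Build a GET request that identifies itself with user_agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent):
    """Return the HTTP status code the site sends for this identity."""
    try:
        with urllib.request.urlopen(make_request(url, user_agent), timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Example (needs network access):
# print(fetch_status("https://example.com/", "Python-urllib"))
# print(fetch_status("https://example.com/", SAFARI_LIKE_UA))
```

If the first call returns 403 and the second returns 200, the site is probably filtering by browser, and changing the Identity option is worth a try.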

If none of that works, try turning on the Download Using Web Views option under the Webpage settings. Some webpages are built using JavaScript, and without this option turned on, SiteSucker may download an incomplete page.

You can also use the Suggested Settings feature, which recommends changing certain settings when specific conditions are detected while downloading a site. If you apply these changes and download the site again, SiteSucker may be able to download more files and get better results.

Why doesn’t SiteSucker remember my changes to the settings?

The settings in SiteSucker only apply to the document containing the settings. So, if you create a new document, adjust the settings, download a site using the document, and then close the document without saving it, the settings for that document will not be “remembered” by SiteSucker. If you save the document after adjusting the settings, then that document will have the same settings whenever you open it.

New documents are assigned the user default settings. To set the user default settings, create a new document or open an existing document, change the document’s settings, and then choose Save As User Defaults under the Settings menu or click the Save As User Defaults button in the Settings dialog to save those settings as the new user defaults.

Why won’t SiteSucker download files from my site’s www subdomain?

By default, SiteSucker limits the download to those files on the host of the original file being downloaded. So, if you enter the URL http://site.com, SiteSucker will only download from site.com and not from www.site.com. To have SiteSucker include www.site.com URLs in the download, add http://www.site.com and/or https://www.site.com to the Include table under the URL settings.
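The host limit can be modeled roughly in Python. In this sketch the hypothetical `in_scope` function accepts a URL if its host matches the starting URL's host, or if it matches an Include table entry. Treating Include entries as regular expressions matched at the start of the URL is an assumption made for illustration.

```python
import re
from urllib.parse import urlsplit


def in_scope(url, start_url, include_patterns=()):
    """Rough model of the default host limit plus an Include table.
    Assumption: Include entries are regexes matched at the start of the URL."""
    if urlsplit(url).hostname == urlsplit(start_url).hostname:
        return True
    return any(re.match(pattern, url) for pattern in include_patterns)


start = "http://site.com"
include = ["http://www.site.com", "https://www.site.com"]
print(in_scope("http://site.com/about.html", start))                # True
print(in_scope("https://www.site.com/about.html", start))           # False
print(in_scope("https://www.site.com/about.html", start, include))  # True
```

Because site.com and www.site.com are different host names, the second URL is rejected until the www entries are added.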

If the site redirects all www.site.com URLs to site.com, then all of your downloaded files will end up in the site.com folder. If not, you can use the Replace table under the Path settings to redirect downloaded files to the site.com folder. Specifically, enter strings like these:

Search Pattern: www\.site\.com(.*)
Substitution Template: site.com$1

This will replace “www.site.com” at the beginning of any path (relative to the destination folder) with “site.com”.
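You can preview this substitution with Python's `re.sub`, where SiteSucker's $1 is written \1. The paths below are hypothetical, and the last one shows that a path already under site.com is left unchanged.

```python
import re

SEARCH = r"www\.site\.com(.*)"
TEMPLATE = r"site.com\1"  # SiteSucker's $1 becomes \1 in Python

for path in ["www.site.com/index.html", "www.site.com/images/logo.png", "site.com/about.html"]:
    print(re.sub(SEARCH, TEMPLATE, path))
# site.com/index.html
# site.com/images/logo.png
# site.com/about.html
```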

Why won’t SiteSucker download rollover images?

Rollover images are images that appear when your mouse moves over a link on a webpage. SiteSucker doesn’t download rollover images because they are displayed using JavaScript. If the JavaScript is embedded in the HTML text and is fairly simple, you might be able to extract the rollover image URL from the JavaScript by using the Patterns setting in the Webpage settings or by using a script.
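As an illustration of extracting rollover image URLs, the Python sketch below pulls image paths out of simple inline JavaScript that assigns to an image's src property. The HTML is a made-up example of an old-style rollover; real pages vary widely, so the expression would need adjusting for each site.

```python
import re

# Made-up example of a simple inline rollover.
HTML = """<a href="products.html"
   onmouseover="document.images['nav1'].src='images/nav1_over.gif'"
   onmouseout="document.images['nav1'].src='images/nav1.gif'">
   <img name="nav1" src="images/nav1.gif"></a>"""

# Capture image paths assigned to .src inside single-quoted JavaScript strings.
ROLLOVER = re.compile(r"\.src\s*=\s*'([^']+\.(?:gif|png|jpe?g))'")

print(ROLLOVER.findall(HTML))
# ['images/nav1_over.gif', 'images/nav1.gif']
```

A Patterns entry built on a similar expression could, for example, rewrite such assignments into markup SiteSucker already follows; whether that works depends on how the page's script is written.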