File Type
The File Type settings allow you to specify which file types SiteSucker should download and which it should treat as HTML. These settings also allow you to fix incorrect file type information sent from a site.
The File Type pane of the Settings dialog provides the following controls:
Filter
Use settings under this tab to specify which file types SiteSucker is allowed to download. (SiteSucker uses media types to identify file types, since the media type is standard information included in the HTTP response headers.) The following options are available:
- Allow All File Types - Download all files regardless of file type
- Allow Specified File Types - Only download the specified file types
- Disallow Specified File Types - Never download the specified file types
Note: SiteSucker always downloads HTML and CSS files regardless of this setting.
If the Allow Specified File Types or Disallow Specified File Types setting is selected, you can choose from the following options (shown with representative media types):
- Archives - application/zip, application/tar, application/stuffit, …
- Audio Files - audio/aiff, audio/mp3, audio/wav, audio/au, …
- Images - image/jpeg, image/gif, image/tiff, image/png, …
- Video Files - video/avi, video/mpg, video/x-ms-asf, … (SiteSucker Pro only)
- Custom Types - as specified under the Custom Types tab
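The filter rules above can be sketched as a small decision function. This is only an illustration of the described behavior, not SiteSucker's implementation: the names `FilterMode`, `ALWAYS_DOWNLOADED`, and `should_download` are hypothetical, and the assumption that "HTML and CSS files" means exactly `text/html` and `text/css` is ours.

```python
from enum import Enum


class FilterMode(Enum):
    """The three Filter options described above (names are hypothetical)."""
    ALLOW_ALL = "Allow All File Types"
    ALLOW_SPECIFIED = "Allow Specified File Types"
    DISALLOW_SPECIFIED = "Disallow Specified File Types"


# Assumption: HTML and CSS are identified by these two media types.
ALWAYS_DOWNLOADED = {"text/html", "text/css"}


def should_download(media_type, mode, specified):
    """Return True if a file with this media type passes the filter."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = media_type.split(";")[0].strip().lower()
    if media_type in ALWAYS_DOWNLOADED:
        return True  # HTML and CSS are downloaded regardless of the setting
    if mode is FilterMode.ALLOW_ALL:
        return True
    if mode is FilterMode.ALLOW_SPECIFIED:
        return media_type in specified
    # DISALLOW_SPECIFIED: everything except the listed types
    return media_type not in specified
```

For instance, with `specified = {"image/jpeg", "image/png"}` and the Allow Specified File Types mode, `image/png` passes and `application/zip` does not, while `text/html` passes in every mode.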
Custom Types and HTML Types
If the Custom Types option is selected under the Filter tab, then settings under the Custom Types tab let you specify which media types SiteSucker should allow or disallow.
Settings under the HTML Types tab let you specify which media types SiteSucker should treat as HTML. When SiteSucker downloads one of these files, it scans it for URLs.
Warning: If the text/html option under the HTML Types tab is off, then webpages will not be scanned and nothing will be downloaded.
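To see why turning off text/html stops the download, consider this rough sketch of "scan only the types treated as HTML". It is an assumption-laden illustration, not SiteSucker's code: `LinkCollector` and `urls_to_follow` are hypothetical names, and only `href` and `src` attributes are collected here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from start tags (simplified)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def urls_to_follow(media_type, body, html_types):
    """Scan a document for URLs only if its media type is treated as HTML."""
    if media_type not in html_types:
        return []  # not an HTML type: the file is never scanned
    collector = LinkCollector()
    collector.feed(body)
    return collector.urls
```

If `html_types` does not contain `text/html`, the function returns an empty list for every ordinary webpage, so no further URLs are discovered.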
Note: You can use the Log Media Types option under the Log settings to determine the media types of downloaded files.
To add a new media type to the Custom Types or HTML Types, click the button and enter the new media type.
To remove media types, select the media types that you want to remove and click the button.
To select a custom type (for the Filter setting) or a file type to treat as HTML, check the box next to the media type in the table.
Media Type Replacement
The Replace file type setting allows you to replace the media type assigned by the server to a downloaded file with a different media type. Some sites provide the wrong media type for certain files. This can cause SiteSucker to save files with the wrong file extension or to modify files that should not be modified. You can use this setting to correct the media type associated with a file and avoid these problems.
Enter a URL pattern and a new media type for each media type you would like to replace. The URL pattern is a regular expression: if it matches the URL from an HTTP response, the server-provided media type is replaced with the media type specified in the setting. For a match to occur, the regular expression must match the entire URL. Patterns are evaluated in the order in which they appear in the table; you can rearrange the replacements by dragging them in the table. Only the first matching pattern is applied, even if the URL matches multiple patterns.
For example, in the image shown above, SiteSucker is instructed to do the following:
- associate the application/epub+zip media type with any URL that has the epub file extension, and
- associate the application/x-mobipocket-ebook media type with any URL that has the mobi file extension
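The matching rules (whole-URL match, table order, first match wins) can be sketched as follows. The two patterns are hypothetical stand-ins for the rows described in the example, since the actual patterns are not given in the text, and Python's `re` module stands in for whatever regular expression engine SiteSucker uses, whose syntax may differ.

```python
import re

# Hypothetical table rows: (URL pattern, replacement media type).
REPLACEMENTS = [
    (r".*\.epub", "application/epub+zip"),
    (r".*\.mobi", "application/x-mobipocket-ebook"),
]


def effective_media_type(url, server_media_type, replacements=REPLACEMENTS):
    """Return the media type to use for a URL after applying replacements."""
    for pattern, new_type in replacements:
        # fullmatch: the pattern must match the entire URL, not a substring.
        if re.fullmatch(pattern, url):
            return new_type  # only the first matching row is applied
    return server_media_type  # no row matched: keep the server's type
```

Under these assumed patterns, a URL ending in `.epub` gets `application/epub+zip`, but a URL such as `https://example.com/a.epub?x=1` keeps the server-provided type, because the pattern no longer matches the entire URL.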
To add a row to the table, click the button, enter the URL pattern and media type, and press ↩.
To remove rows from the table, select them in the table and click the button.
To modify a row, double-click on a string in the table, enter a new string, and press ↩.