Limitations

SiteSucker is a relatively simple program and it has a number of limitations. When SiteSucker analyzes HTML, it only examines the following tags:
If a link is specified in a different tag, SiteSucker will not see it.

SiteSucker totally ignores JavaScript. Any link specified within JavaScript will not be seen by SiteSucker. (If the Log Warnings option is on in the download settings, SiteSucker will include a warning in the log file for any page that uses JavaScript.)

SiteSucker does scan Flash (SWF) files for embedded plain text links, but it can only detect links to files that have one of the following extensions: html, swf, mp3, sit, zip, mov, gif, jpg, png, doc, or txt. SiteSucker cannot localize Flash files, and SiteSucker does not examine other media files for embedded links.

By default, SiteSucker honors robots.txt exclusions and the Robots META tag. Therefore, any directories or pages disallowed by robot exclusions will not be downloaded by SiteSucker. This behavior, however, can be overridden with the Ignore Robot Exclusions setting under the Advanced tab in the download settings.
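
To see in advance which pages a robots.txt file would exclude, the general idea of robot exclusions can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not SiteSucker's own code. The sample rules, the "ExampleCrawler" user-agent name, the example.com URLs, and the ignore_robot_exclusions parameter (a stand-in for the Ignore Robot Exclusions setting) are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str,
            user_agent: str = "ExampleCrawler",
            ignore_robot_exclusions: bool = False) -> bool:
    """Return True if a crawler honoring robots.txt may fetch the URL.

    ignore_robot_exclusions is a hypothetical flag that mimics a
    setting which overrides the exclusions entirely.
    """
    if ignore_robot_exclusions:
        return True
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS)
for url in ("https://example.com/index.html",
            "https://example.com/private/page.html"):
    print(url, allowed(parser, url))
```

With the sample rules above, the page under /private/ is reported as disallowed unless ignore_robot_exclusions is set to True. Note that this sketch covers robots.txt only; it does not inspect the Robots META tag inside a page.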