Limitations

SiteSucker is a relatively simple program with a number of limitations.

When SiteSucker analyzes HTML, it examines only the following tags:

  • <a>
  • <area>
  • <body>
  • <div>
  • <embed>
  • <frame>
  • <iframe>
  • <img>
  • <input>
  • <link>
  • <meta>
  • <object>
  • <script>
  • <style>
  • <table>
  • <td>
  • <th>
  • <tr>

If a link is specified in a different tag, SiteSucker will not see it.
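
The effect of this tag filter can be sketched in Python. This is only an illustration of the idea, not SiteSucker's actual code: the names EXAMINED_TAGS, LINK_ATTRS, LinkCollector, and collect_links are hypothetical, and the set of attributes checked is an assumption made for the example.

```python
from html.parser import HTMLParser

# The tags listed above.
EXAMINED_TAGS = {
    "a", "area", "body", "div", "embed", "frame", "iframe", "img", "input",
    "link", "meta", "object", "script", "style", "table", "td", "th", "tr",
}

# Assumption: attributes that may carry a link. The real program may differ.
LINK_ATTRS = ("href", "src", "background", "data")


class LinkCollector(HTMLParser):
    """Collects link attributes, but only from tags in EXAMINED_TAGS."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag not in EXAMINED_TAGS:
            return  # a link in any other tag is never seen
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                self.links.append(value)


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<a href="page.html">Page</a>'
    '<img src="pic.png">'
    '<video src="clip.mp4"></video>'
)
print(collect_links(sample))  # ['page.html', 'pic.png']
```

In this sketch the <video> link is skipped because <video> is not in the list, which mirrors the limitation described above.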

SiteSucker ignores JavaScript entirely. It does not see links specified within JavaScript, so it does not download the files they point to. (If the Log Warnings option is on in the settings, SiteSucker writes a warning to the log file for any page that uses JavaScript.)
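
To see why such links are missed, consider a scanner that reads only tag attributes and never runs scripts. The Python sketch below is a hypothetical illustration, not SiteSucker's implementation: the class StaticLinkScanner, the sample page, and the wording of the warning are all assumptions.

```python
from html.parser import HTMLParser


class StaticLinkScanner(HTMLParser):
    """Reads href/src attributes only; script bodies are never executed."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.uses_javascript = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.uses_javascript = True
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


page = """
<a href="about.html">About</a>
<script>
  document.getElementById("next").onclick = function () {
    window.location = "hidden.html";
  };
</script>
"""

scanner = StaticLinkScanner()
scanner.feed(page)
scanner.close()

print(scanner.links)  # ['about.html']
if scanner.uses_javascript:
    # Hypothetical warning text, for illustration only.
    print("warning: page uses JavaScript; some links may be missed")
```

The address "hidden.html" exists only as a string inside the script, so a scanner of this kind never finds it.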

SiteSucker does not scan PDFs, Flash (swf) files, QuickTime movie (mov) files, or other media files for embedded links.

By default, SiteSucker does not download any directories or files disallowed by robot exclusions. See Robot Exclusions for more information.
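
As a rough illustration of how a downloader can honor exclusions of this kind, the Python sketch below checks URLs against robots.txt-style rules using the standard library's urllib.robotparser. It assumes the exclusions are expressed as robots.txt rules; the rules, the agent name "ExampleCrawler", the example.com URLs, and the helper allowed are hypothetical, and this is not a description of how SiteSucker does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline so no network is needed.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())


def allowed(url, agent="ExampleCrawler"):
    """Return True if the rules permit this agent to fetch the URL."""
    return rules.can_fetch(agent, url)


print(allowed("https://example.com/index.html"))          # True
print(allowed("https://example.com/private/notes.html"))  # False
```

A downloader built this way would skip every URL for which allowed returns False.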