Path

The Path section of the Settings dialog lets you specify which paths should be included in or excluded from the download. In the Paths to Include and Paths to Exclude text boxes, enter absolute URLs (that is, URLs beginning with "http://" or "https://") or regular expressions, one per line.

The path settings work in conjunction with the Path Constraint setting under the General settings and the Include Supporting Files setting under the Webpage settings. For each URL it encounters, SiteSucker applies the following rules in order:

  1. If this is the original URL (that is, the URL specified in the URL text box), then the file is downloaded.
  2. Otherwise, if the URL begins with one of the strings (or matches one of the regular expressions) in the Paths to Exclude text box, then the file is not downloaded.
  3. Otherwise, if the URL meets the requirements of the current Path Constraint setting, then the file is downloaded.
  4. Otherwise, if the URL begins with one of the strings (or matches one of the regular expressions) in the Paths to Include text box, then the file is downloaded.
  5. Otherwise, if the Include Supporting Files setting is on and the URL references a non-HTML file type, then the file is downloaded.
  6. Otherwise, the file is not downloaded.
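
The rules above amount to a first-match-wins chain of checks. The following Python sketch is only an illustration of that order, not SiteSucker's actual code. The function names, the parameters, and the idea of passing the Path Constraint result and the file type in as booleans are all assumptions made for the example. It also assumes that a regular expression must match the whole URL, and it uses Python's re module rather than ICU.

```python
import re


def matches_any(url, patterns, use_regex):
    """Return True if url matches any entry in patterns (hypothetical helper)."""
    for pattern in patterns:
        if use_regex:
            # Assumption: the expression must match the entire URL.
            if re.fullmatch(pattern, url):
                return True
        elif url.startswith(pattern):
            # Plain strings are compared against the start of the URL.
            return True
    return False


def should_download(url, original_url, exclude, include,
                    meets_path_constraint, include_supporting_files,
                    is_html, use_regex=False):
    """Apply rules 1 to 6 in order; the first rule that applies decides."""
    if url == original_url:
        return True                      # rule 1: the original URL
    if matches_any(url, exclude, use_regex):
        return False                     # rule 2: Paths to Exclude
    if meets_path_constraint:
        return True                      # rule 3: Path Constraint
    if matches_any(url, include, use_regex):
        return True                      # rule 4: Paths to Include
    if include_supporting_files and not is_html:
        return True                      # rule 5: supporting (non-HTML) files
    return False                         # rule 6: everything else
```

Because the exclusion check comes second, a URL listed in Paths to Exclude is skipped even when it also satisfies the Path Constraint or appears in Paths to Include. Only the original URL is exempt.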

SiteSucker allows you to use regular expressions in the path strings. If the Use Regular Expressions box is checked, every entry in the path text boxes is interpreted as a regular expression. For example, to match any URL that contains an underscore, enter the following regular expression: ".*_.*". Expressions are interpreted according to ICU v3 (for details see the ICU User Guide for Regular Expressions). Consult the Regular Expressions Reference for additional guidance on regular expressions.
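
To try the underscore example outside SiteSucker, a short Python sketch can apply the same pattern to a few URLs. The URLs here are made up for illustration. Python's re module is not ICU, so more elaborate expressions may behave differently, but a pattern this simple should work the same way in both. The sketch assumes whole-URL matching.

```python
import re

# The example expression: any characters, an underscore, any characters.
pattern = re.compile(r".*_.*")

# Hypothetical URLs used only for this illustration.
urls = [
    "https://example.com/my_page.html",
    "https://example.com/index.html",
    "https://example.com/img/logo_small.png",
]

# Keep only the URLs that the whole expression matches.
matching = [url for url in urls if pattern.fullmatch(url)]

for url in matching:
    print(url)
```

Only the first and third URLs contain an underscore, so only those two are printed.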