File Names

In most cases, SiteSucker uses the last path component of the URL being downloaded for the file name, and the other path components for the enclosing folder names. For example, when downloading https://www.example.com/directory/home.html, SiteSucker will save the file at www.example.com/directory/home.html in the destination folder. Specifically, SiteSucker will do the following:

  • Create a folder named “www.example.com” in the destination folder.
  • Create a folder named “directory” in the “www.example.com” folder.
  • Create a file named “home.html” in the “directory” folder.
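The steps above can be sketched in a few lines of Python. This is only an illustration of the mapping described here, not SiteSucker's actual code; the function names local_path and save are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit


def local_path(url: str, destination: Path) -> Path:
    """Map a URL to a path under the destination folder (hypothetical helper)."""
    parts = urlsplit(url)
    # The host becomes the top-level folder; each non-empty path
    # component becomes a nested folder or, for the last one, the file.
    components = [c for c in parts.path.split("/") if c]
    return destination.joinpath(parts.netloc, *components)


def save(url: str, destination: Path, data: bytes) -> Path:
    """Create the enclosing folders, then write the file (hypothetical helper)."""
    target = local_path(url, destination)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With a destination folder named Downloads, local_path("https://www.example.com/directory/home.html", Path("Downloads")) yields Downloads/www.example.com/directory/home.html.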

If a URL ends with a ‘/’, the file is given the name “index” with the appropriate file extension (usually html). So, when downloading the URL https://www.example.com/directory/, SiteSucker will save the file at www.example.com/directory/index.html in the destination folder.

If a URL does not end with a ‘/’ or a file extension, SiteSucker considers it to be ambiguous. By default, SiteSucker will get the file name from the last path component of an ambiguous URL and will add the appropriate file extension (usually html). For example, if the URL is https://www.example.com/directory, SiteSucker will save the file at www.example.com/directory.html in the destination folder. However, if the Treat Ambiguous URLs as Folders option is on in the URL settings, the same URL will be saved at www.example.com/directory/index.html in the destination folder.
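The naming rules for URLs that end with a ‘/’ and for ambiguous URLs can be summarized in a short Python sketch. It is an approximation for illustration only: the function name relative_path and its parameters are hypothetical, the extension is passed in rather than derived from the downloaded content, and the handling of a URL with no path at all is an assumption that this page does not cover.

```python
import posixpath
from urllib.parse import urlsplit


def relative_path(url: str, ambiguous_as_folders: bool = False,
                  extension: str = "html") -> str:
    """Return the path, relative to the destination folder, for a URL."""
    parts = urlsplit(url)
    # Assumption: a URL with an empty path is treated like one ending in '/'.
    path = parts.path or "/"
    if path.endswith("/"):
        # A trailing '/' means the file is named "index".
        path += f"index.{extension}"
    elif not posixpath.splitext(path)[1]:
        # No trailing '/' and no file extension: the URL is ambiguous.
        if ambiguous_as_folders:
            path += f"/index.{extension}"
        else:
            path += f".{extension}"
    return parts.netloc + path
```

For example, relative_path("https://www.example.com/directory") returns "www.example.com/directory.html", while the same call with ambiguous_as_folders=True returns "www.example.com/directory/index.html".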

By default, if the server response includes an HTTP Content-Disposition header with a filename directive, SiteSucker will get the file name from that directive. You can override this behavior by turning on the Ignore Filename in Headers option in the Path settings.
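The following Python sketch shows one way the choice between a header-supplied name and a URL-derived name might work. It relies on the standard library's email.message.Message to parse the header; the function choose_file_name and its parameters are hypothetical and do not reflect how SiteSucker itself parses headers.

```python
from email.message import Message
from typing import Optional


def choose_file_name(url_name: str,
                     content_disposition: Optional[str] = None,
                     ignore_filename_in_headers: bool = False) -> str:
    """Prefer the filename directive unless the option says to ignore it."""
    if ignore_filename_in_headers or not content_disposition:
        return url_name
    # Reuse the standard library's MIME header parser to read the directive.
    message = Message()
    message["Content-Disposition"] = content_disposition
    header_name = message.get_filename()  # None if there is no filename
    return header_name or url_name
```

For example, choose_file_name("download", 'attachment; filename="report.pdf"') returns "report.pdf", and passing ignore_filename_in_headers=True returns "download" instead.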

Any characters that should never appear in a folder or file name (such as ‘/’, ‘:’, and ‘\’) or that could cause problems when loading a downloaded file in a web browser (such as ‘#’, ‘%’, ‘?’, and ‘|’) will be replaced with look-alike characters. However, if the Replace Special Characters with ‘_’ option is on in the Path settings, special characters are replaced with the ‘_’ character in folder and file names. In addition, any file or folder name that is longer than 255 characters will be truncated to 255 characters.
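As a rough illustration, the replacement and truncation rules might look like the Python sketch below. The set of special characters contains only the examples named above, and the look-alike characters (Unicode fullwidth forms) are an assumption made for this sketch; this page does not specify which look-alikes SiteSucker actually uses. The function name sanitize is hypothetical.

```python
# Only the special characters named as examples on this page.
SPECIAL = "/:\\#%?|"

# Assumption: use the Unicode fullwidth form of each character as its
# look-alike (the fullwidth block mirrors ASCII at an offset of 0xFEE0).
LOOKALIKES = {ord(c): chr(ord(c) + 0xFEE0) for c in SPECIAL}
UNDERSCORES = {ord(c): "_" for c in SPECIAL}


def sanitize(name: str, replace_with_underscore: bool = False,
             max_length: int = 255) -> str:
    """Replace special characters in one file or folder name, then truncate."""
    table = UNDERSCORES if replace_with_underscore else LOOKALIKES
    return name.translate(table)[:max_length]
```

For example, sanitize("page?id=1", replace_with_underscore=True) returns "page_id=1". The function is meant to be applied to each path component separately, since it also replaces ‘/’.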

Finally, you can use regular expressions in the Replace table in the Path settings to replace the normal path or name of a downloaded file with a different path or name.
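Conceptually, a table of regular-expression replacements can be thought of as a list of pattern and replacement pairs applied to the path. The Python sketch below assumes the rules are applied in order and uses Python's regular expression syntax; both are assumptions, and the rules shown are hypothetical examples rather than SiteSucker settings.

```python
import re
from typing import Iterable, Tuple


def apply_replace_table(path: str, rules: Iterable[Tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) pair to the path, in order."""
    for pattern, replacement in rules:
        path = re.sub(pattern, replacement, path)
    return path


# Hypothetical rules: shorten the host folder and normalize an extension.
example_rules = [
    (r"^www\.example\.com/", "example/"),
    (r"\.htm$", ".html"),
]
```

With these rules, apply_replace_table("www.example.com/directory/page.htm", example_rules) returns "example/directory/page.html".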