Batch Image Download With wget

(Note: This article was originally posted on my old Wikidot site on 2009-10-17)

wget is often used to download single files from the command line, but it can also mirror a website locally or download just part of one. With the right parameters, wget acts as a batch downloader, retrieving only the files we want.

In this example we assume a website with a sequence of pages, where each page links to the next in the sequence and they all contain a JPEG image. We want to download all the images to the current directory. The following command line does this:

$ wget --recursive --level=inf --no-directories --no-parent --accept '*.jpg' URL

Or if you prefer, the shorter but more obscure:

$ wget -r -l inf -nd -np -A '*.jpg' URL

Note that the pattern is quoted in both forms, so the shell passes it to wget instead of expanding it against files in the current directory.

Let’s take a look at the parameters:

--recursive
    Makes wget follow links from the start page.

--level=inf
    Allows infinite recursion depth. In combination with another option that limits the recursion, like --no-parent, we don't need to know the necessary depth. Otherwise you should specify a number to set a limit.

--no-directories
    The default behaviour is to recreate the directory structure of the website. This option makes wget put all files in the same directory.

--no-parent
    Do not follow links to pages above the starting page in the hierarchy.

--accept '*.jpg'
    Here we specify what kind of files to download. The parameter can be a comma-separated list, as shown in the example below.
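
For instance, to also fetch PNG files, the accept list could look like this (URL stands for the actual start page):

$ wget -r -l inf -nd -np -A 'jpg,png' URL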

For other options and details about those listed here, check the wget man page.

In one scenario where I used wget, the downloaded files had to be zero-padded (img-01.jpg instead of img-1.jpg). For a single directory, this did the job of padding single-digit numbers to two digits:

$ rename 's/-([0-9])\./-0$1\./' *.jpg
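
This assumes the Perl-based rename utility (shipped on some systems as prename or file-rename). To preview the changes without renaming anything, its -n flag does a dry run:

$ rename -n 's/-([0-9])\./-0$1\./' *.jpg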

For a directory tree this was used:

$ find . -name '*.jpg' -type f -print0 | xargs --null rename 's/-([0-9])\./-0$1\./'
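
If you prefer to avoid xargs, find's own -exec can batch the filenames; this sketch assumes the same Perl-based rename as above:

$ find . -name '*.jpg' -type f -exec rename 's/-([0-9])\./-0$1\./' {} +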