Mirror a web site with wget

To mirror a web site for offline use or for preservation, wget is the right tool:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent <URL to the website>

The options are as follows:

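  • --mirror: turns on recursive download with unlimited depth and time-stamping (currently equivalent to -r -N -l inf --no-remove-listing)
  • --convert-links: after the download finishes, rewrites links in the saved pages so they work for local, offline viewing (links to downloaded files become relative)
  • --adjust-extension: appends a suitable extension (e.g. .html or .css) to saved files whose names don't match their content type
  • --page-requisites: downloads all files needed to display the web pages (e.g. style sheets, inlined images, etc.)
  • --no-parent: never ascends to the parent directory, so only pages at or below the starting URL are downloaded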

Before you start, make sure you have the right to mirror the web site. Using too much bandwidth or sending too many requests in a short time might get you blocked.
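
One way to reduce the risk of being blocked is to throttle wget. The following is only a sketch: it wraps the command above in a small shell function (the name mirror_politely is made up for this example) and adds three standard wget options, --wait, --random-wait and --limit-rate. The values 1 second and 200k are arbitrary assumptions; pick whatever suits the site you are mirroring.

```shell
#!/bin/sh
# mirror_politely is a hypothetical helper name, not part of wget.
# It runs the same mirror command as above, plus throttling options.
mirror_politely() {
    # Abort with a usage message if no URL was given.
    url=${1:?usage: mirror_politely URL}

    # --wait=1           pause about 1 second between requests (assumed value)
    # --random-wait      vary that pause so requests are less regular
    # --limit-rate=200k  cap download speed at roughly 200 KB/s (assumed value)
    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent \
         --wait=1 --random-wait --limit-rate=200k \
         "$url"
}
```

It would be called as mirror_politely <URL to the website>. Note that waiting between requests makes a large mirror take noticeably longer.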
