Thursday 28 March 2019

download - wget decides not to load because of black list


I'm trying to make a full copy of a web site; e.g.,


http://vfilesarchive.bgmod.com/files/

I'm running


wget -r --level=inf -R "index.html*" --debug http://vfilesarchive.bgmod.com/files/

and getting, for example


Deciding whether to enqueue "http://vfilesarchive.bgmod.com/files/Half-Life%D0%92%D0%86/".
Already on the black list.
Decided NOT to load it.

What is happening?  What does wget mean by "black list", why is it downloading only parts of what is there, and what should I do to get the entire web site?


The version of wget is


GNU Wget 1.20 built on mingw32

(running on Windows 10 x64).


P.S. I think I've managed to solve this with


wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" 

even though the filenames come out slightly mangled because of the special characters in the URLs. Is there a better solution?



Answer



I think I've managed to solve this with


wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" 

even though the filenames come out slightly mangled because of the special characters in the URLs.
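
For reference, here is a sketch of that command wrapped in a small shell function, with each option annotated according to the wget manual. The function name mirror_site and its URL argument are illustrative and not part of the original command; the command above omits the URL, so it is passed in explicitly here. It assumes a POSIX-style shell (on Windows, something like Git Bash or MSYS2). As far as the wget source shows, the "black list" in the debug output is wget's internal set of URLs it has already queued or fetched, so the message means wget believes it has seen that URL before, not that the URL matched some filter.

```shell
# Hypothetical helper: mirror the site rooted at the URL given as $1.
mirror_site() {
    # -m / --mirror                    shorthand for -r -N -l inf --no-remove-listing
    # --restrict-file-names=nocontrol  do not escape control characters
    #                                  (bytes 0-31 and 128-159) in local file names
    # --no-iri                         turn off internationalized URI (IRI) support
    # -R "index.html*"                 reject files matching the pattern; HTML pages
    #                                  are still fetched to extract links, then deleted
    wget -m --restrict-file-names=nocontrol --no-iri -R "index.html*" "$1"
}
```

It would be called as mirror_site http://vfilesarchive.bgmod.com/files/ (the URL from the question). Whether --no-iri alone is enough, or both options are needed, probably depends on how the server encodes its directory names.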

