[PLUG] wget and politeness
Jeme A Brelin
jeme at brelin.net
Sat Dec 25 01:24:17 UTC 2004
On Fri, 24 Dec 2004, Keith Lofstrom wrote:
> I am looking at twiki as a possible replacement for kwiki, and did a
> "wget -rk" of the twiki.org site so I could peruse it offline. Now,
> wget -r respects robots.txt, so it isn't downloading anything the site
> does not want a site scraper to see.
[snip]
> > You are black listed at the TWiki web site due to excessive access or
> > suspicious activities. Please contact site administrator
> > peter.thoeny at attglobalSTOPSPAM.net if you got on the list by mistake.
> > Black listed IP addresses will be submitted to major blacklist databases.
>
> So, the question; is there some wget etiquette that I don't know about?
Golly, Keith, I think something funny must have happened. Your download
tree is only 41MB? My immediate suspicion was that there was some odd
circular symlinking on the twiki site that caused you to spiral down some
directory and make a mess of things.
As you point out, wget's respect for robots.txt makes it ideal for site
scraping, and grabbing one copy of each file on a site is hardly suspicious
or excessive access.
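For anyone curious what "respecting robots.txt" amounts to in practice,
here is a minimal sketch of the same kind of check using Python's
standard-library urllib.robotparser. It is not wget's own code. The
robots.txt content, the example.org URLs, the "Wget" user-agent string and
the helper names are all hypothetical, chosen just for illustration.
Crawl-delay is a non-standard extension, and whether a given crawler
honors it varies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
# (this is NOT twiki.org's actual file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = "Wget") -> bool:
    """Return True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("http://example.org/index.html",
                "http://example.org/cgi-bin/view"):
        # A polite crawler skips anything the site has disallowed.
        print(url, "->", "fetch" if allowed(rp, url) else "skip")
    # Seconds to wait between requests, if the site asks for one.
    print("crawl delay:", rp.crawl_delay("Wget"))
```

Note that robots.txt only says which paths a robot may visit; apart from
the optional Crawl-delay line, it says nothing about how fast to fetch them.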
I think you need to contact Peter and explain the situation. From what
you say, you did nothing offensive whatsoever.
J.
--
-----------------
Jeme A Brelin
jeme at brelin.net
-----------------
[cc] counter-copyright
http://www.openlaw.org