[PLUG] wget and politeness

Jeme A Brelin jeme at brelin.net
Sat Dec 25 01:24:17 UTC 2004


On Fri, 24 Dec 2004, Keith Lofstrom wrote:
> I am looking at twiki as a possible replacement for kwiki, and did a
> "wget -rk" of the twiki.org site so I could peruse it offline.  Now,
> wget -r respects robots.txt, so it isn't downloading anything the site
> does not want a site scraper to see.
[snip]
> > You are black listed at the TWiki web site due to excessive access or
> > suspicious activities. Please contact site administrator
> > peter.thoeny at attglobalSTOPSPAM.net if you got on the list by mistake.
> > Black listed IP addresses will be submitted to major blacklist databases.
>
> So, the question;  is there some wget etiquette that I don't know about?

Golly, Keith, I think something funny must have happened.  Your download
tree is only 41MB?  My immediate suspicion was that there was some odd
circular symlinking on the twiki site that caused you to spiral down some
directory and make a mess of things.

As you point out, wget's respect for robots.txt makes it ideal for site
scraping, and grabbing one copy of each file on a site is hardly suspicious
or excessive access.
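
For what it's worth, here is a rough sketch of what "respecting
robots.txt" amounts to, using Python's standard urllib.robotparser rather
than wget itself.  The robots.txt contents, the example.org URLs, and the
polite_plan helper are all made up for illustration; I have not looked at
what twiki.org actually serves.

```python
import urllib.robotparser

# Hypothetical robots.txt, invented for this example (not twiki.org's file).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 5
"""


def polite_plan(robots_text, user_agent, urls, default_delay=1.0):
    """Return (allowed_urls, delay_seconds) for a crawl that honors robots.txt.

    polite_plan is a hypothetical helper: it filters out URLs the rules
    disallow and reports the Crawl-delay, falling back to default_delay
    when the file does not specify one.
    """
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_text.splitlines())

    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    delay = parser.crawl_delay(user_agent)
    return allowed, (float(delay) if delay is not None else default_delay)


if __name__ == "__main__":
    candidates = [
        "http://example.org/index.html",
        "http://example.org/cgi-bin/view",
    ]
    allowed, delay = polite_plan(SAMPLE_ROBOTS, "Wget", candidates)
    print("fetch:", allowed)
    print("seconds to pause between requests:", delay)
```

Note that robots.txt only says which paths are off limits.  How fast you
fetch is a separate matter; wget has a --wait option for pausing between
retrievals.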

I think you need to contact Peter and explain the situation.  From what
you say, you did nothing offensive whatsoever.

J.
-- 
   -----------------
     Jeme A Brelin
    jeme at brelin.net
   -----------------
 [cc] counter-copyright
 http://www.openlaw.org


