[PLUG] hiding email info

Colin Kuskie ckuskie at dalsemi.com
Tue Apr 15 10:23:01 UTC 2003


On Tue, Apr 15, 2003 at 10:11:14AM -0700, Michael Montagne wrote:
> Is it possible to hide email addresses
> on your web site by using a PHP or JavaScript script to build the
> address?

Yes; see the Slashdot archives.
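
As a rough illustration of the script-built address idea, here is a minimal
sketch in Python that emits a small JavaScript snippet. The function name and
the example address are hypothetical. The assumption is that the harvester
only scans the raw page source and does not execute scripts; one that does
run JavaScript would still see the address.

```python
import json

def obfuscated_mailto(user: str, domain: str) -> str:
    """Return an HTML snippet whose script assembles the address in the browser."""
    # json.dumps produces valid JavaScript string literals; escaping "<"
    # keeps a stray "</script>" in the input from closing the tag early.
    u = json.dumps(user).replace("<", "\\u003c")
    d = json.dumps(domain).replace("<", "\\u003c")
    return (
        "<script>\n"
        # Character code 64 is "@", so the literal sign never appears in the source.
        f"var a = {u} + String.fromCharCode(64) + {d};\n"
        "document.write('<a href=\"mailto:' + a + '\">' + a + '</a>');\n"
        "</script>"
    )

print(obfuscated_mailto("jane", "example.com"))
```

The generated page source contains the user and domain parts separately, so a
simple pattern match for user@domain finds nothing.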

> What if all email addresses come from a text file?  Are those
> crawled too?

I'd guess that most spiders only parse HTML, and not less common
formats like plain ASCII text, PDF, doc, xls, etc.  But ask yourself
this: how hard is it really to do MIME-type detection and custom
backend processing on the spider?
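
To give a feel for how little work that is, here is a minimal sketch in
Python. It guesses the MIME type from the file name and scans any text type
for address-like strings. The function name and the simplified regular
expression are my own assumptions, not taken from any real spider; PDF, doc
and xls would need a converter, which is only noted in a comment.

```python
import mimetypes
import re

# Deliberately simple pattern; real addresses can be more varied than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(filename: str, body: bytes) -> list:
    """Return address-like strings found in a fetched text document."""
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("text/"):
        # A more determined spider would hand PDF, doc, xls, etc. to a
        # format-specific converter here instead of giving up.
        return []
    text = body.decode("utf-8", errors="replace")
    return EMAIL_RE.findall(text)

print(harvest("staff.txt", b"Contact: jane@example.com\n"))
```

So a plain text file of addresses is no protection once somebody bothers to
write those dozen lines.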

> If your whole site is generated dynamically can it still
> be harvested?  Can google index it?

The spiders follow links just like users do, so any content on your website
that can be reached by a link can be searched/indexed.

This behavior is exploited by the spider honeypots/SPAM traps to tie up
resources on the spider's computer.

Note that this is also something that can be used to protect email addresses.
Suppose you have to make a list of email addresses available on the web but
you don't want them to be easily searchable.

You provide a 2-page solution via CGI or other programmatic method.
Page 1 is a list of names with radio buttons (all default off) and a
submit button.  A human will select a name, press submit, and get an
email address.  A spider will probably just follow the submit link
without selecting anything, and that empty submission can be a
frontend to a SPAM trap.
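
The two-page scheme can be sketched roughly as follows in Python. The
directory contents, the form field name "who", and the /lookup and /trap
paths are all hypothetical, and treating a submission with no selection as
spider traffic is my reading of the idea rather than a tested rule. Wiring
these functions into CGI or another framework is left out.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical directory; a real site would load this from its own data.
DIRECTORY = {
    "1": ("Jane Doe", "jane@example.com"),
    "2": ("John Roe", "john@example.com"),
}
TRAP_PATH = "/trap"  # hypothetical URL of a SPAM-trap frontend

def render_list_page() -> str:
    """Page 1: names with radio buttons, none preselected, plus a submit button."""
    rows = "\n".join(
        f'<label><input type="radio" name="who" value="{escape(key)}"> {escape(name)}</label><br>'
        for key, (name, _addr) in DIRECTORY.items()
    )
    return (
        '<form method="post" action="/lookup">\n'
        f"{rows}\n"
        '<input type="submit" value="Show address">\n'
        "</form>"
    )

def handle_submit(body: str):
    """Page 2: reveal one address, or divert an empty or unknown choice to the trap."""
    choice = parse_qs(body).get("who", [""])[0]
    entry = DIRECTORY.get(choice)
    if entry is None:
        # Nothing valid was selected: probably a blind submission.
        return ("redirect", TRAP_PATH)
    name, addr = entry
    return ("page", f"<p>{escape(name)}: {escape(addr)}</p>")

print(handle_submit("who=1"))
print(handle_submit(""))
```

Page 1 never contains an address, and each valid submission reveals only the
one address that was asked for.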

It's a little less convenient for your users, but it provides a measure
of protection for the people with email addresses on the site.  It also
doesn't scale very well to a general-purpose email address solution.

Colin



