[PLUG] Training a bayesian filter

Paul Johnson baloo at ursine.ca
Sat Apr 10 19:19:02 UTC 2004


Paul Heinlein <heinlein at madboa.com> writes:

> On average, I get about 100 spam messages a day. It's not so much that
> I couldn't handle it manually, but I like having SpamAssassin
> performing sentinel duty for me.

I usually get at least three times that, and with judicious use of
SpamAssassin, ClamAV, procmail and Gnus scoring, it's more than
manageable.

> I've got four folders for unwanted messages:
[...]
> 4. spam.kept: where I keep about 250 known-spam messages. This is
>    useful if I have to rebuild a bayes database.

I'm curious why you bother keeping this one around when you could
easily get your fill and then some by browsing through
nntp://news.spamcop.net/spamcop.spam,
news:news.admin.net-abuse.sightings or http://www.spamarchive.org/ .

> Any that are left (typically these days, no more than 3 to 5% of the
> total number of filtered messages), I move to the spam.uncaught
> folder. Every so often, I feed the uncaught spam to sa-learn and then
> delete all the messages in spam.uncaught.

If you pipe the messages through spamassassin -r instead of dumping a
folder at sa-learn, spamassassin will also submit hashes to whichever
of razor, razor2 and pyzor you have installed.  Installing those gives
spamassassin more things to check against, making it more accurate.
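
For what it's worth, here is a rough sketch of the piping step in
Python, assuming the uncaught spam lives in an mbox file.  The function
name report_mbox and its runner parameter are made up for illustration;
the only SpamAssassin-specific piece is the -r flag mentioned above.
Whether anything actually reaches razor or pyzor depends on what is
installed and configured on your machine.

```python
import mailbox
import subprocess


def report_mbox(path, command=("spamassassin", "-r"), runner=subprocess.run):
    """Pipe each message in the mbox at `path` through `command`, one at a time.

    Returns the number of messages for which the command exited with status 0.
    `runner` is a hypothetical hook so the subprocess call can be swapped out.
    """
    reported = 0
    box = mailbox.mbox(path, create=False)  # fail rather than create a missing file
    try:
        for key in box.keys():
            # Raw message bytes, without the mbox "From " separator line.
            raw = box.get_bytes(key)
            result = runner(list(command), input=raw, check=False)
            if result.returncode == 0:
                reported += 1
    finally:
        box.close()
    return reported
```

Calling report_mbox("spam.uncaught") would feed every message in that
file to spamassassin -r in turn; messages are sent individually because
spamassassin treats its standard input as a single message.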

-- 
Paul Johnson
<baloo at ursine.ca>