[PLUG] Training corpus for bayesian spamassassin

Wil Cooley wcooley at nakedape.cc
Fri Apr 23 11:21:01 UTC 2004


On Fri, 2004-04-23 at 10:05, Keith Lofstrom wrote:

> It is superficially plausible to train the bayesian filter on
> spamassassin with just the misclassified false positives and
> false negatives.  For the last week or so, I have been running 
> spamassassin with the bayesian filter and training turned off,
> to see what kinds of mistakes it makes with the heuristic rules.
> It seems to pass about 30% of the spam (false negative) and trap
> about 5% (false positive!!) of the ham, with the threshold set
> to 3.0 and a whole bunch of addresses whitelisted.

Is 3.0 your threshold for marking a message as spam, or for ham?  A
spam threshold of 3.0 is really low; I use 6.3 and rarely get false
positives other than newsletters and similar mailings.
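
To see how much the threshold alone moves the two error rates, a rough
Python sketch follows.  It assumes you already have a hand-sorted set
of messages with their scores; the sample scores are made up for
illustration, and error_rates is a hypothetical helper, not part of
spamassassin.

```python
def error_rates(scored, threshold):
    """Return (false_negative_rate, false_positive_rate) for a threshold.

    scored is a list of (score, is_spam) pairs, where is_spam is the
    hand-assigned label. Assumption: a message is tagged as spam when
    its score is greater than or equal to the threshold.
    """
    spam_scores = [score for score, is_spam in scored if is_spam]
    ham_scores = [score for score, is_spam in scored if not is_spam]

    # False negative: real spam that scored below the threshold.
    missed_spam = sum(1 for score in spam_scores if score < threshold)
    # False positive: real ham that scored at or above the threshold.
    trapped_ham = sum(1 for score in ham_scores if score >= threshold)

    fn_rate = missed_spam / len(spam_scores) if spam_scores else 0.0
    fp_rate = trapped_ham / len(ham_scores) if ham_scores else 0.0
    return fn_rate, fp_rate


# Made-up scores, for illustration only: (score, is_spam)
sample = [
    (12.4, True), (7.1, True), (4.0, True), (2.2, True),
    (0.3, False), (3.5, False), (-1.0, False), (6.0, False),
]

for threshold in (3.0, 6.3):
    fn_rate, fp_rate = error_rates(sample, threshold)
    print(f"threshold {threshold}: {fn_rate:.0%} of spam passed, "
          f"{fp_rate:.0%} of ham trapped")
```

Raising the threshold trades trapped ham for passed spam, so it is
worth running both numbers against your own mail before settling on a
value.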

Wil
-- 
Wil Cooley                          mailto:wcooley at nakedape.cc
Naked Ape Consulting                        http://nakedape.cc 
* * * * * Portland's Premier Open Source Consultancy * * * * *