[PLUG] Training corpus for bayesian spamassassin
Wil Cooley
wcooley at nakedape.cc
Fri Apr 23 11:21:01 UTC 2004
On Fri, 2004-04-23 at 10:05, Keith Lofstrom wrote:
> It is superficially plausible to train the bayesian filter on
> spamassassin with just the misclassified false positives and
> false negatives. For the last week or so, I have been running
> spamassassin with the bayesian filter and training turned off,
> to see what kinds of mistakes it makes with the heuristic rules.
> It seems to pass about 30% of the spam (false negative) and trap
> about 5% (false positive!!) of the ham, with the threshold set
> to 3.0 and a whole bunch of addresses whitelisted.
Is 3.0 the threshold for marking a message as spam, or as ham? For spam,
3.0 is really low; I use 6.3 and rarely get any false positives that
aren't newsletters or things like that.
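
To make the trade-off concrete, here is a rough sketch in Python of how
moving the threshold shifts errors between false positives and false
negatives. The scores in SAMPLE are made up for illustration (not real
mail), error_rates is a hypothetical helper, and the sketch assumes a
message is marked as spam when its score is at or above the threshold.

```python
# Hypothetical (score, is_spam) pairs; illustrative values, not real mail data.
SAMPLE = [
    (1.2, False),
    (3.4, False),
    (4.8, False),
    (2.1, True),
    (5.0, True),
    (7.5, True),
    (12.3, True),
]


def error_rates(scored, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    Assumption: a message is marked as spam when score >= threshold.
    """
    ham = [score for score, is_spam in scored if not is_spam]
    spam = [score for score, is_spam in scored if is_spam]
    false_pos = sum(1 for score in ham if score >= threshold)   # ham trapped
    false_neg = sum(1 for score in spam if score < threshold)   # spam passed
    return false_pos / len(ham), false_neg / len(spam)


if __name__ == "__main__":
    for threshold in (3.0, 6.3):
        fp, fn = error_rates(SAMPLE, threshold)
        print(f"threshold {threshold}: {fp:.0%} of ham trapped, "
              f"{fn:.0%} of spam passed")
```

Running the same tally over a hand-sorted pile of your own mail at a
few thresholds would show whether 3.0 is what is costing you the 5%.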
Wil
--
Wil Cooley mailto:wcooley at nakedape.cc
Naked Ape Consulting http://nakedape.cc
* * * * * Portland's Premier Open Source Consultancy * * * * *