[PLUG] Bayesian filtering

Sat Nov 22 19:02:02 UTC 2003

On Sat, Nov 22, 2003 at 06:47:06PM -0500, Brian Derr wrote:
> With all this talk about spam I decided to try bogofilter.  However,
> before I get going with it I have a question.  When teaching it the
> difference between spam/ham is it alright to use a mailing list as
> food? 

Why?  I dunno about you, but I filter out all my mailing lists first and
just run Bogofilter on Inbox mail.  I initially trained it on the
contents of my Inbox and a collection of recent spam.  
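
For anyone wanting to do the same initial training, here is a rough
sketch.  It assumes the good mail and the spam each sit in a single
file; the function name and the BOGOFILTER override are made up for
illustration.  -n registers input as non-spam and -s registers it as
spam.  Whether a multi-message mbox needs an extra flag depends on your
bogofilter version, so check the man page first.

```shell
#!/bin/sh
# Hypothetical helper: seed the word lists from one ham file and one spam file.
# BOGOFILTER is an illustrative override; it defaults to "bogofilter" on PATH.
train_initial() {
    ham_file=$1
    spam_file=$2
    # Register the known-good mail as non-spam, then the spam as spam.
    # The second step runs only if the first one succeeded.
    "${BOGOFILTER:-bogofilter}" -n < "$ham_file" &&
        "${BOGOFILTER:-bogofilter}" -s < "$spam_file"
}

# Example call (paths are placeholders):
#   train_initial "$HOME/Mail/inbox" "$HOME/Mail/spam"
```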

> Will it become skewed because of the constant subject repeats?
> Or does that not matter with Bayesian filtering?  I guess I'm just a
> bit confused, any help would be appreciated.  Thanks!

Train it on the stuff you want it to filter.  So if it's not gonna be
filtering PLUG stuff, don't show it PLUG mail.  What little spam I still
get, I save to a spam folder.  Every once in a while, I do a 
"cat spam | bogofilter -Ns && rm -f spam".
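
If you'd rather wrap that cleanup in something a bit more defensive,
a sketch like the following might do.  It skips an empty or missing
folder and deletes the folder only when bogofilter exits successfully.
The function name and the BOGOFILTER override are my own inventions;
the -Ns flags are the ones from the command above (un-register as
non-spam, register as spam).

```shell
#!/bin/sh
# Hypothetical helper: re-register missed spam, then remove the folder.
# BOGOFILTER is an illustrative override; it defaults to "bogofilter" on PATH.
retrain_spam() {
    spam_file=$1
    # Nothing to do if the folder is missing or empty.
    [ -s "$spam_file" ] || return 0
    # Delete the folder only if the re-registration succeeded.
    "${BOGOFILTER:-bogofilter}" -Ns < "$spam_file" && rm -f "$spam_file"
}

# Example call (path is a placeholder):
#   retrain_spam "$HOME/Mail/spam"
```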

Here's the tail-end of my procmailrc:
  :0:
  * ^To: pdxlug@pdxlug.org
  pdxlug

  # This matches the start of a base64-encoded PE executable.  I have no
  # desire to ever receive them by email.  Despite the varying headers,
  # every(?) worm to date has contained this in the body.
  #   --Martin Pool <mbp at sourcefrog.net>
  :0B:
  *TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  virus

  # To re-register false negatives as spam:
  # cat spam | bogofilter -Ns
  :0HB:
  * ? /usr/bin/bogofilter -u
  bogofilter
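
If you're curious why the virus recipe's pattern starts with "TVqQ",
you can decode the front of it yourself.  The sketch below assumes a
base64 tool that accepts -d (GNU coreutils does); the function name is
made up.  The first two decoded bytes are "MZ", the magic number that
opens DOS/PE executables.

```shell
#!/bin/sh
# Hypothetical helper: print the first two bytes of a base64-decoded string.
# Assumes a base64 command that supports -d (as in GNU coreutils).
sig_prefix() {
    printf '%s' "$1" | base64 -d 2>/dev/null | head -c 2
}

# Example call:
#   sig_prefix TVqQ    # prints MZ
```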