Bayesian spam filtering library for Python

6

18

I am looking for a Python library which does Bayesian Spam Filtering. I looked at SpamBayes and OpenBayes, but both seem to be unmaintained (I might be wrong).

Can anyone suggest a good Python (or Clojure, Common Lisp, even Ruby) library which implements Bayesian Spam Filtering?

Thanks in advance.

Clarification: I am actually looking for a Bayesian spam classifier and not necessarily a spam filter. I just want to train it on some data and later have it tell me whether some given data is spam. Sorry for any confusion.

Gouty answered 17/2, 2009 at 18:50 Comment(0)

11

Do you want spam filtering or Bayesian classification?

For Bayesian classification there are a number of Python modules. I was recently reviewing Orange, which looks very impressive. R also has a number of Bayesian modules, and you can use Rpy to hook into R from Python.
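To give a rough idea of what "train it, then ask whether something is spam" involves, here is a minimal pure-Python sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the API of Orange, R, or any other library mentioned in this thread; the `NaiveBayes` class, its method names, and the toy training data are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical class for illustration only, not any library's API.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()                       # every word seen in training

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                count = self.word_counts[label][w]  # 0 for unseen words
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training data, purely for demonstration.
nb = NaiveBayes()
nb.train("spam", "cheap pills buy now")
nb.train("spam", "buy cheap watches now")
nb.train("ham", "meeting notes attached")
nb.train("ham", "lunch at noon tomorrow")

print(nb.classify("buy cheap pills"))
print(nb.classify("notes from the meeting"))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen word from zeroing out a whole category. A real library will tokenize far more carefully than `split()` does.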

Backache answered 17/2, 2009 at 19:35 Comment(0)

11

Try Reverend. It's a spam filtering module.

Catricecatrina answered 18/2, 2009 at 15:50 Comment(3)

@dfrankow: yup, divmod.org no longer seems to be online. PyPI has a page for Reverend here: pypi.python.org/pypi/Divmod%20Reverend/0.2.4 – Minton 6/3, 2011 at 14:19

And on that page, the homepage (divmod.org) is busted. So, one can install the PyPI package, but the original source of the package is gone. – Polygon 10/3, 2011 at 16:46

I got hold of the Divmod people and asked about Reverend. The original source code for Reverend is available here: bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/files/head:/… – Horan 13/3, 2011 at 12:36

9

RedisBayes looks good to me:

http://pypi.python.org/pypi/redisbayes/0.1.3

In my experience Redis is an awesome addition to your stack and can help process data at blazing fast speeds compared to MySQL, PostgreSQL or any other RDBMS.

import redis
import redisbayes

# Connect to a Redis server (redis.Redis() defaults to localhost:6379)
rb = redisbayes.RedisBayes(redis=redis.Redis())

# Train each category on some sample text
rb.train('good', 'sunshine drugs love sex lobster sloth')
rb.train('bad', 'fear death horror government zombie god')

# Classify new text
assert rb.classify('sloths are so cute i love them') == 'good'
assert rb.classify('i fear god and love the government') == 'bad'

# Print the scores for the same text (print() works on Python 2 and 3)
print(rb.score('i fear god and love the government'))

# Undo the training above
rb.untrain('good', 'sunshine drugs love sex lobster sloth')
rb.untrain('bad', 'fear death horror government zombie god')

Hope that helps a bit.

Chartism answered 11/8, 2012 at 20:11 Comment(0)

3

Try bogofilter, though I'm not sure how it can be used from Python. Bogofilter is integrated with many mail systems, so interfacing with it should be relatively easy.

Chromatid answered 17/2, 2009 at 19:10 Comment(0)

3

SpamBayes is maintained, and is mature (i.e. it works without having to have new releases all the time). It will easily do what you want. Note that SpamBayes is only loosely Bayesian (it uses chi-squared combining), but presumably you're after any sort of statistical token-based classification, rather than something specifically Bayesian.
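For anyone wondering what "chi-squared combining" means in practice, here is a rough sketch of the general technique (Fisher's method applied to per-token spam probabilities). This is not SpamBayes's actual code; the function names `chi2q` and `combine` and the example probabilities are my own assumptions for illustration.

```python
import math


def chi2q(x2, dof):
    """P(X > x2) for a chi-squared variable with an even number of
    degrees of freedom, using the closed-form series for that case."""
    assert dof % 2 == 0, "this shortcut only works for even dof"
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def combine(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)
    into one score in [0, 1]; 0.5 means the evidence is balanced."""
    n = len(probs)
    if n == 0:
        return 0.5
    # Evidence for spam: how surprising the (1 - p) values are under chance.
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Evidence for ham: the same test applied to the p values themselves.
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0


print(combine([0.9, 0.9, 0.9]))  # strongly spammy tokens
print(combine([0.1, 0.1, 0.1]))  # strongly hammy tokens
print(combine([0.9, 0.1]))       # conflicting tokens
```

Unlike a straight naive Bayes product, this scheme lands near 0.5 when the tokens disagree, which gives a natural "unsure" zone in the middle.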

Unpaid answered 30/4, 2009 at 9:31 Comment(0)

1

A module in the Python natural language toolkit (nltk) does naïve Bayesian classification: nltk.classify.naivebayes.

Disclaimer: I know crap all about Bayesian classification, naïve or worldly.

Minton answered 10/6, 2009 at 20:44 Comment(0)
