Design pattern for blocking undesirable content

Last year I was working on a Christmas project which allowed customers to send emails to each other with a 256-character free-text field for their Christmas request. The project worked by searching the (very large) product database to suggest products that matched the text field, but offered a free-text option for those customers who could not find the product in question.

One obvious concern was the opportunity for customers to send rather explicit requests to some unsuspecting customer with the company's branding sitting around it.

The project did not go ahead in the end, for various reasons, the profanity aspect being one.

However, I've come back to thinking about the project and wondering what kinds of validation could be used here. I'm aware of clbuttic (the classic failure of naive substring replacement, which turns "classic" into "clbuttic"), which I know is the standard response to any question of this nature.

The solutions that I considered were:

  • Run it through something like WebPurify
  • Use MechanicalTurk
  • Write a regex pattern which looks for each word in a blacklist. A more complicated version of this would also catch plurals and past tenses of each word.
  • Build an array of suspicious words, each with a score. If the submission's total score exceeds a threshold, the validation fails (a rough sketch of these last two ideas follows this list).
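
For concreteness, here is a minimal sketch of how those last two ideas might combine on a LAMP stack. The words, scores and threshold are hypothetical placeholders, not a recommended blacklist:

```php
<?php
// Sketch of the regex + scoring ideas above. Words, scores and threshold
// are made-up placeholders.
$scoredWords = array(
    'badword' => 3,   // hypothetical severe word
    'naughty' => 1,   // hypothetical mild word
);
$threshold = 3;

function profanityScore($text, array $scoredWords)
{
    $score = 0;
    foreach ($scoredWords as $word => $points) {
        // \b word boundaries avoid the clbuttic substring problem;
        // (s|ed)? loosely covers plurals and past tenses.
        $pattern = '/\b' . preg_quote($word, '/') . '(s|ed)?\b/i';
        $score += preg_match_all($pattern, $text, $matches) * $points;
    }
    return $score;
}

if (profanityScore($_POST['request'], $scoredWords) >= $threshold) {
    // Fail validation, or hand the message to a moderation queue.
}
```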

So there are two questions:

  1. If the submission fails, how do you handle it from a UI perspective?
  2. What are the pros and cons of these solutions, or any others that you can suggest?

NB - answers like "profanity filters are evil" are irrelevant. In this semi-hypothetical situation, I haven't decided to implement a profanity filter or been given the choice of whether or not to implement one. I just have to do the best I can with my programming skills (ideally on a LAMP stack).

Flimsy answered 25/4, 2011 at 16:52 Comment(1)
+1 for referencing the standard clbuttic response =) – Repose

Have you thought about Bayesian filtering? Bayesian filters are not just for detecting spam; you can train them for a variety of text-recognition tasks. Grab a Bayesian filter, collect a bunch of request texts and start marking them as containing profanity or not. After some time (how much depends a lot on the amount and type of training data) your filter will be able to distinguish requests containing profanity from those that don't.

It's not fool-proof, but it's much, much better than simple string matching and trying to deal with clbuttic problems. You have a variety of options for Bayesian filtering in PHP.

Bogofilter

Bogofilter is a stand-alone Bayesian filter that runs on any unix-y OS. It's targeted at filtering e-mail, but you can train it on any kind of text. I have successfully used it to implement a custom comment spam filter for my own website (source). You can interface with bogofilter as you would with any other command-line application. See my source code link for an example.
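
For illustration, calling bogofilter from PHP is just a matter of piping the text to the binary and checking the exit status. The sketch below assumes `bogofilter` is on the PATH and uses its documented exit codes (0 = "spam", here meaning profane; 1 = clean; 2 = unsure), but check the man page for your installed version:

```php
<?php
// Pipe a message to bogofilter and read its verdict from the exit code.
function classifyWithBogofilter($text)
{
    $spec = array(
        0 => array('pipe', 'r'),   // stdin
        1 => array('pipe', 'w'),   // stdout
        2 => array('pipe', 'w'),   // stderr
    );
    $proc = proc_open('bogofilter', $spec, $pipes);
    if (!is_resource($proc)) {
        return 'error';
    }
    fwrite($pipes[0], $text);
    foreach ($pipes as $pipe) {
        fclose($pipe);
    }
    switch (proc_close($proc)) {
        case 0:  return 'profane';   // bogofilter says "spam"
        case 1:  return 'clean';     // bogofilter says "ham"
        default: return 'unsure';    // unsure or error: send to moderation
    }
}

// Training uses the same binary: feed labelled texts to `bogofilter -s`
// (profane/"spam") or `bogofilter -n` (clean/"ham") over stdin.
```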

Roll your own

If you like a challenge, you could implement a Bayesian filter entirely from scratch. Here's a decent article about implementing a Bayesian filter in PHP.
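
As a rough illustration of what "from scratch" involves, a deliberately tiny two-class version might look like the sketch below; it keeps everything in memory, assumes equal class priors, and only shows the counting and log-probability steps:

```php
<?php
// Minimal two-class naive Bayes sketch: "profane" vs "clean".
// Deliberately simplified: no persistence, equal class priors assumed.
class TinyBayesFilter
{
    private $counts = array('profane' => array(), 'clean' => array());
    private $totals = array('profane' => 0, 'clean' => 0);
    private $vocab  = array();

    private function tokens($text)
    {
        return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    public function train($text, $class)
    {
        foreach ($this->tokens($text) as $t) {
            $this->counts[$class][$t] = isset($this->counts[$class][$t])
                ? $this->counts[$class][$t] + 1 : 1;
            $this->totals[$class]++;
            $this->vocab[$t] = true;
        }
    }

    public function isProfane($text)
    {
        $v   = max(1, count($this->vocab));
        $log = array('profane' => 0.0, 'clean' => 0.0);
        foreach ($this->tokens($text) as $t) {
            foreach (array('profane', 'clean') as $class) {
                $seen = isset($this->counts[$class][$t]) ? $this->counts[$class][$t] : 0;
                // Laplace smoothing so unseen words don't zero everything out.
                $log[$class] += log(($seen + 1) / ($this->totals[$class] + $v));
            }
        }
        return $log['profane'] > $log['clean'];
    }
}
```

You would train it on a batch of labelled requests (`$filter->train($text, 'profane')` / `'clean'`) before trusting `isProfane()`; with little or no training data it has nothing to go on.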

Existing PHP libraries

(Ab)use an existing e-mail filter

You could use a standard SpamAssassin or DSpam installation and train it to recognise profanity. Just make sure you disable the options specifically aimed at e-mail messages (e.g. parsing MIME blocks, reading headers) and enable only the options that deal with Bayesian text processing. DSpam may be easier to adapt. SpamAssassin has the advantage that you can add custom rules on top of the Bayesian filter; if you go that route, disable all the default rules and write your own instead, since the default rules are all targeted at spam e-mail detection.
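
As a rough illustration of the SpamAssassin route, a local.cf along the lines below leans on the Bayes engine and zeroes out e-mail-oriented rules. The option and rule names (use_bayes, BAYES_99, BAYES_00, MISSING_HEADERS) come from the standard distribution, but treat this as a sketch and verify against the docs for your version:

```
# local.cf sketch: rely on the Bayes engine, not the e-mail-oriented rules
use_bayes             1
bayes_auto_learn      0        # learn only from explicit sa-learn runs
required_score        5.0
score BAYES_99        5.0      # strongly "profane" according to the Bayes DB
score BAYES_00       -5.0      # strongly clean
score MISSING_HEADERS 0        # zero out e-mail-specific rules like this one
```

Training is then a matter of feeding labelled texts to `sa-learn --spam` (profane examples) and `sa-learn --ham` (clean examples).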

Rosierosily answered 27/4, 2011 at 20:52 Comment(2)
Other than installing Akismet on Wordpress, I've never touched Bayesian filtering. Do you know any filters that you'd recommend? – Flimsy
I was writing them up as you were typing your comment :-) – Rosierosily

In the past, I've used a glorified form of str_replace. Here was my rationale:

  1. Profane words could safely be replaced with silly words, conveying the original point of the message but discouraging the use of profanity (see the sketch after this list)
  2. On successful posts where filtering took place, users were shown a success message, but there was a notification that sanitization had taken place (something like, "Your post was added, potty mouth.")
  3. I never wanted the submission to fail. Posts were either posted uncensored, or censored. In your case, you might want to prevent profane posts entirely.
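
A minimal sketch of what that replacement step might look like, assuming a hand-maintained word list (the entries here are made-up placeholders) and whole-word matching to dodge the clbuttic problem:

```php
<?php
// Replace blacklisted words with silly ones and remember whether we did,
// so the UI can show the "Your post was added, potty mouth." notice.
$sillyReplacements = array(
    'badword' => 'stinky sock',   // hypothetical entries
    'naughty' => 'cheeky',
);

function sillify($text, array $replacements, &$wasFiltered)
{
    $wasFiltered = false;
    foreach ($replacements as $bad => $silly) {
        $pattern = '/\b' . preg_quote($bad, '/') . '\b/i';
        $text = preg_replace($pattern, $silly, $text, -1, $count);
        if ($count > 0) {
            $wasFiltered = true;
        }
    }
    return $text;
}
```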

For what it's worth, Apple only recently stopped banning obscene language in their free laser engravings. Perhaps they had a reasonable rationale?

Marxismleninism answered 25/4, 2011 at 17:23 Comment(4)
How did you get round the 'clbuttic' problem? Did you only replace whole words? And what about stuff like f.u.c.k or @$$? – Flimsy
I stripped punctuation, and had to hard-code extra listings for alternate spellings. – Marxismleninism
Did you strip punctuation or convert it? For example, converting all @s to a's. I tried using regex but it got very messy. – Flimsy
I just blacklisted @55 and its asinine equivalent. – Marxismleninism

What about using a few string-matching rules and putting only the messages they flag into a moderation queue?

It sounds like many requests may not use the free-text field at all, so those should go through safely.

Then only a small percentage should trip your string matches and end up in moderation. Even with a large userbase, this should keep moderation time to a minimum. You might even make obvious profanity, like the f- or n-word, an automatic fail to cut the remaining list down even more.

Make your moderation page easy to use and highlight the words that flagged each message; that should make it a quick process to scan through and clean up. Adjust as needed if people are trying to post too much garbage or if there are too many false positives.
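
A sketch of that triage, assuming two hand-maintained word lists (placeholders here) and returning the matched words so the moderation page can highlight them:

```php
<?php
// Hard-fail obvious words, queue merely suspicious ones, and keep the
// matched words so the moderation page can highlight them.
function triageMessage($text, array $autoFail, array $suspicious)
{
    $findHits = function (array $words) use ($text) {
        $hits = array();
        foreach ($words as $w) {
            if (preg_match('/\b' . preg_quote($w, '/') . '\b/i', $text)) {
                $hits[] = $w;
            }
        }
        return $hits;
    };

    if ($hits = $findHits($autoFail)) {
        return array('status' => 'rejected', 'hits' => $hits);
    }
    if ($hits = $findHits($suspicious)) {
        return array('status' => 'moderate', 'hits' => $hits);
    }
    return array('status' => 'approved', 'hits' => array());
}
```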

Or just use this strategy together with Bayesian filtering, like @Sander suggested.

Edit: A "report abuse" button will also help you find out if bad stuff is getting through, but it would involve saving sent messages for a while, which might not be ideal if the feature is going to be heavily used.

Corody answered 27/4, 2011 at 21:3 Comment(4)
The 'report abuse' button is a very good idea, since it gives a constructive feedback loop for receivers whenever the filter does fail. For the moderation queue, it still raises the question of how to recognise f.u.c.k and @$$. – Flimsy
Hmmm... maybe the best way to deal with that is to flag words that contain a high % of non-alphanumeric characters. Then the only problem would be f u c k, but you could strip all whitespace and scan again to catch those. Also thought I should mention that 'report abuse' protects your image in the eyes of the recipient. By providing it you are letting people know that the system has potential for abuse and that you are trying hard to prevent it. In other words, the second they report it to you is the second they stop blaming you, because now you're on their side as a protector. – Corody
I thought about stripping out whitespace, but that gives me a new problem - lack of word boundaries. – Flimsy
Stripping the whitespace is only for catching words like b u t t h o l e - you would do it after your other checks. To summarize: 1. Scan for obvious words and either fail or flag for moderation. 2. Scan for words with a high % of non-alphanumerics and flag for moderation if needed. 3. Strip whitespace and scan one final time for obvious words and flag for moderation. 4. Include a button to report abuse for anything that sneaks through. 5. Once active, adjust your filters if you are getting false positives or abuse reports. – Corody
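
A quick sketch of checks 2 and 3 from that summary; the 0.5 ratio and 3-character minimum are arbitrary choices, not recommendations:

```php
<?php
// Check 2: flag tokens that are mostly non-alphanumeric (e.g. "@$$").
function looksObfuscated($text, $ratio = 0.5)
{
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $symbols = strlen(preg_replace('/[a-z0-9]/i', '', $token));
        if (strlen($token) >= 3 && $symbols / strlen($token) > $ratio) {
            return true;
        }
    }
    return false;
}

// Check 3: strip all whitespace, then re-run the plain word scan so
// spaced-out words like "b u t t h o l e" are caught too.
function collapseWhitespace($text)
{
    return preg_replace('/\s+/', '', $text);
}
```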
