Last year I was working on a Christmas project which allowed customers to send emails to each other with a 256-character free-text field for their Christmas request. The project worked by searching the (very large) product database for suggested products that matched the text field, but offered a free-text option for those customers who could not find the product in question.
One obvious concern was the opportunity for customers to send rather explicit requests to some unsuspecting customer with the company's branding sitting around it.
The project did not go ahead in the end, for various reasons, the profanity aspect being one.
However, I've come back to thinking about the project and wondering what kinds of validation could be used here. I'm aware of the "clbuttic" problem (naive substring replacement mangling innocent words such as "classic"), which I know is the standard response to any question of this nature.
The solutions that I considered were:
- Run it through something like WebPurify
- Use MechanicalTurk
- Write a regex pattern which looks for any word from a list of banned words. A more complicated version of this would consider plurals and past tenses of each word as well.
- Write an array of suspicious words, and give each one a score. If the total score for a submission goes above a threshold, the validation fails.
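To make the last two options concrete, here is a rough sketch of how they might be combined. It is written in Python for brevity; the regex features it relies on (`\b` word boundaries and non-capturing groups) are also available in PHP's PCRE functions, so it should port to a LAMP stack. The word list, the weights, the threshold and the function names are all made up for illustration, and the suffix handling is deliberately crude.

```python
import re

# Hypothetical word list with made-up weights; a real list would be far larger.
SUSPICIOUS_WORDS = {"darn": 1, "heck": 2, "ass": 5}

# Hypothetical cut-off: a submission fails once its score goes above this.
THRESHOLD = 5


def build_pattern(words):
    """Compile one case-insensitive pattern that matches whole words only."""
    # Longest words first, so a longer entry wins over a shorter prefix.
    ordered = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # \b stops matches inside innocent words such as "classic".
    # The optional suffix group is a crude take on plurals and past tenses.
    return re.compile(
        r"\b(" + alternation + r")(?:s|es|ed|ing)?\b",
        re.IGNORECASE,
    )


PATTERN = build_pattern(SUSPICIOUS_WORDS)


def score(text):
    """Sum the weights of every suspicious word found in the text."""
    return sum(
        SUSPICIOUS_WORDS[match.group(1).lower()]
        for match in PATTERN.finditer(text)
    )


def is_acceptable(text):
    """Return True while the score has not gone above the threshold."""
    return score(text) <= THRESHOLD
```

With these example weights, `is_acceptable("A classic Christmas jumper")` passes because the word boundaries prevent a match inside "classic", while a message whose hits add up to more than 5 is rejected. This does nothing about deliberate misspellings or character substitutions, which would need extra normalisation before matching.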
So there are two questions:
- If the submission fails, how do you handle it from a UI perspective?
- What are the pros and cons of these solutions, or any others that you can suggest?
NB - answers like "profanity filters are evil" are irrelevant. In this semi-hypothetical situation, I haven't decided to implement a profanity filter, nor have I been given the choice of whether or not to implement one. I just have to do the best I can with my programming skills (and the solution should run on a LAMP stack if possible).