Let's assume we have a table of People (name, surname, address, SSN, etc).
We want to find all rows that are "very similar" to specified person A. I would like to implement some kind of fuzzy logic comparation of A and all rows from table People. There will be several fuzzy inference rules working separately on several columns (e.g. 3 fuzzy rules for name, 2 rules on surname, 5 rules on address)
The question is Which of the following 2 approaches would be better and why?
Implement all fuzzy rules as stored procedures and use one heavy SELECT statement to return all rows that are "very similar" to A. This approach may include using soundex, sim metric etc.
Implement one or more simplier SELECT statements, that returns less accurate results, "rather similar" to A, and then fuzzy-compare A with all returned rows (outside database) to get "very similar" rows. So fuzzy comparation would be implemented in my favorit programming language.
Table People should have up to 500k rows, and I would like to make about 500-1000 queries like this a day. I use MySQL (but this is yet to be considered).