Counting word occurrences in a table column
Asked Answered
V

6

7

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.

If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

    word | count()
    ---------------
    like  3
    eat   2
    to    2
    i     2
    a     1

Any idea? I am using MySQL 5.2.

Variole answered 7/7, 2009 at 21:8 Comment(1)
Possible duplicate of Count the frequency of each word. The present question is older, but a solution has been accepted in the other question.Octahedral
C
5

@Elad Meidar, I like your question and I found a solution:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

Citrin answered 23/9, 2015 at 14:43 Comment(0)
S
1

I would recommend not to do this in SQL at all. You're loading DB with something that it isn't best at. Selecting a group of rows and doing frequency calculation on the application side will be easier to implement, will work faster and will be maintained with less issues/headaches.

Spikenard answered 7/7, 2009 at 21:11 Comment(3)
Really, isn't it smarter to run it as a stored procedure? or a function?. i always thought that offloading such calculations to the DB is better.Variole
Agreed, there is no way of doing this using regular SQL. You could write a stored procedure to do it using a temporary table and a modified version of the one of the techniques in this article: sommarskog.se/arrays-in-sql-2000.html (this is for SQL Server 2000 but could be adapted to MySQL's dialect fairly easily) However, the issue still is that you're going outside SQL's "comfort zone" in doing soHeighttopaper
You certainly can do that in SP, using multiple passes over data set and extracting each word with regexp, then putting results in temp table and iterating over it and so on. But it is quite complicated, it will be both difficult to write and maintain, and I would said maintenance is even more important then writing it in the first place. DBs are good for what they designed for - storing/reading/writing relational data. Making them compute things are just suboptimal design and will eventually return as a lot of PITA.Spikenard
Y
1

You can try this perverted-a-little way:

SELECT 
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
ORDER BY `count` DESC

This query can be very slow. Also, it looks pretty ugly.

Yasmeen answered 16/2, 2010 at 16:59 Comment(0)
S
0

I think you should do it like indexing, with additional table. Whenever u create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have the columns: word, and the number of occurrences.

Shoal answered 8/7, 2009 at 5:46 Comment(1)
Yeah, but the problem is i need in in a scope of a user (FK on the table with the string i need to parse) using a join table between a word, a phrase and a user seems a bit of over kill no?Variole
T
0

ChatGPT answer that worked for me :

SELECT SUBSTRING_INDEX(word, '@', -1) AS word, COUNT(*) AS word_count
FROM words
GROUP BY word
ORDER BY word_count;
Tyrolienne answered 19/1 at 10:8 Comment(0)
T
-1

I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.

Tunisia answered 7/7, 2009 at 21:30 Comment(1)
Really, isn't it smarter to run it as a stored procedure? or a function?. i always thought that offloading such calculations to the DB is better.Variole

© 2022 - 2024 — McMap. All rights reserved.