I'd like to make an SQL query where the condition is that column1 contains three or more words. Is there something to do that?
maybe try counting spaces ?
SELECT *
FROM table
WHERE (LENGTH(column1) - LENGTH(replace(column1, ' ', ''))) > 1
and assume words is number of spaces + 1
len()
is a microsoft invention, it's not portable –
Workmanship If you want a condition that a column contains three or more words and you want it to work in a bunch of databases and we assume that words are separated by single spaces, then you can use like
:
where column1 like '% % %'
where column1 like '%_ %_ %_'
is only slightly better. –
Workmanship I think David nailed it above. However, as a more complete answer:
LENGTH(RTRIM(LTRIM(REPLACE(column1,' ', ' ')))) - LENGTH(REPLACE(RTRIM(LTRIM(REPLACE(column1, ' ', ' '))), ' ', '')) + 1 AS number_of_words
This will remove double spaces, as well as leading and trailing spaces in your string.
Of course, you may go further by adding replacements for more than 2 spaces in a row...
In Postgres you can use regexp_split_to_array()
for this:
select *
from the_table
where array_length(regexp_split_to_array(the_column, '\s+'), 1) >= 3;
This will split the contents of the column the_column
into array elements. One ore more whitespace are used as the delimiter. It won't respect "quoted" spaces though. The value 'one "two three" four'
will be counted as four words.
array_length()
not for regexp_split_to_array()
–
Librium The best way to do this, is to NOT do this.
Instead, you should use the application layer to count the words during INSERT and save the word count into its own column.
While I like, and upvoted, some of the answers here, all of them will be very slow and not 100% accurate.
I know people want a simple answer to SELECT the word count, but it just is NOT POSSIBLE with accuracy and speed.
If you want it to be 100% accurate, and very fast, then use this solution.
Steps to solve:
- Add a column to your table and index it:
ALTER TABLE tablename ADD COLUMN wordcount INT UNSIGNED NULL, ADD INDEX idxtablename_count (wordcount ASC);
. - Before doing your INSERT, count the number of words using your application. For example in PHP:
$count = str_word_count($somevalue);
- During the INSERT, include the value of
$count
for the columnwordcount
likeinsert into tablename (col1, col2, col3, wordcount) values (val1, val2, val3, $count);
Then your select statement becomes super easy, clean, uber-fast, and 100% accurate.
select * from tablename where wordcount >= 3;
Also remember when you are updating any rows that you will need to recount the words for that column.
select * from tablename where wordcount >= 3;
is my answer, if you want to be semantically accurate. OP asked for a query. OP got a query. –
Resupine For "n" or more words
select *
from table
where (length(column)- length(replace(column, " ", "")) + 1) >= n
PS: This would not work if words have multiple spaces between them.
With ClickHouse DB You can use splitByWhitespace() function.
None of the other answers seem to take multiple spaces into account. For example, a lot of people use two spaces between sentences; these space-counters would count an extra word per sentence. "Also, scenarios such as spaces around a hyphen - like that. "
For my purposes, this was far more accurate:
SELECT
LENGTH(REGEXP_REPLACE(myText, '[ \n\t\|\-]{1,}',' ')) -
LENGTH(REGEXP_REPLACE(myText, '[ \n\t\|\-]{1,}', '')) wordCount FROM myTable;
It counts any sets of 1 or more consecutive characters from any of: [space
, linefeed
, tab
, pipe
, or hyphen
] and counts it as one word.
This can work:
SUM(LENGTH(a) - LENGTH(REPLACE(a, ' ', '')) + 1)
Where a
is the string column. It will count the number of spaces, which is 1 less than the number of words.
I like @ashleydawg's answer with REGEX but stackoverflow won't let me comment or like it due to "reputation points".. so annoying!
Only thing to change there -- add 1 to the result. In this example, it returns 3 instead of 4.
set @mystring = "This is four words";
select LENGTH(REGEXP_REPLACE(@mystring, '[ \n\t\|\-]{1,}',' ')) -
LENGTH(REGEXP_REPLACE(@mystring, '[ \n\t\|\-]{1,}', '')) wordCount
To handle multiple spaces too, use the method shown here
Declare @s varchar(100)
set @s=' See how many words this has '
set @s=ltrim(rtrim(@s))
while charindex(' ',@s)>0
Begin
set @s=replace(@s,' ',' ')
end
select len(@s)-len(replace(@s,' ',''))+1 as word_count
https://exploresql.com/2018/07/31/how-to-count-number-of-words-in-a-sentence/
© 2022 - 2025 — McMap. All rights reserved.