Counting the number of occurrences of a substring within a string in PostgreSQL
Asked Answered
E

6

82

How can I count the number of occurrences of a substring within a string in PostgreSQL?


Example:

I have a table

CREATE TABLE test."user"
(
  uid integer NOT NULL,
  name text,
  result integer,
  CONSTRAINT pkey PRIMARY KEY (uid)
)

I want to write a query so that the result contains column how many occurrences of the substring o the column name contains. For instance, if in one row, name is hello world, the column result should contain 2, since there are two o in the string hello world.

In other words, I'm trying to write a query that would take as input:

enter image description here

and update the result column:

enter image description here


I am aware of the function regexp_matches and its g option, which indicates that the full (g = global) string needs to be scanned for the presence of all occurrences of the substring).

Example:

SELECT * FROM regexp_matches('hello world', 'o', 'g');

returns

{o}
{o}

and

SELECT COUNT(*)  FROM regexp_matches('hello world', 'o', 'g');

returns

2

But I don't see how to write an UPDATE query that would update the result column in such a way that it would contain how many occurrences of the substring o the column name contains.

Enate answered 2/4, 2016 at 17:18 Comment(1)
Possible duplicate of PostgreSQL count number of times substring occurs in textXylol
F
95

A common solution is based on this logic: replace the search string with an empty string and divide the difference between old and new length by the length of the search string

(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'substring', ''))) 
/ CHAR_LENGTH('substring')

Hence:

UPDATE test."user"
SET result = 
    (CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'o', ''))) 
    / CHAR_LENGTH('o');
Flamenco answered 2/4, 2016 at 17:28 Comment(3)
This is a solid answer, and it's right. You may be interested on my write up all the methods of doing thisXylol
Thanks! Does anyone know, why there is no simpler way? I mean, REPLACE already goes through the trouble of scanning the whole string for all the occurrences, why not have something that does half of the work of REPLACE - just count the occurrencesRabbinism
@AleksandrLevchuk: Well, you can write your own User Defined Function doing this calculation, e.g. there's Oracle's REGEXP_COUNT in enterprisedb.com/docs/en/9.5/eeguide/….Flamenco
H
75

A Postgres'y way of doing this converts the string to an array and counts the length of the array (and then subtracts 1):

select array_length(string_to_array(name, 'o'), 1) - 1

Note that this works with longer substrings as well.

Hence:

update test."user"
    set result = array_length(string_to_array(name, 'o'), 1) - 1;
Heyes answered 2/4, 2016 at 17:31 Comment(5)
If someone needs regexp, this solution with "regexp_split_to_array" instead of "string_to_array" works too.Artois
This solution is substantially slower than @dnoeth's suggestion. I don't think it's more-Postgres-y. When things are faster and more portable in a different method, I think we call that better. =)Xylol
@EvanCarroll Unfortunately, dnoeth's answer won't work for regex matches, since you may not know the length of the match. This answer will work for both regex matches and raw string matches. I think what we call better is the solution that works for everything you are trying to do :)Demerit
upvoted but i don't see what's "postgres'y" about it; let's not borrow pythonic characterizations :)Sunbow
@Sunbow . . . Most databases don't support arrays natively.Heyes
W
12

Other way:

UPDATE test."user" SET result = length(regexp_replace(name, '[^o]', '', 'g'));
Wageworker answered 16/8, 2016 at 9:18 Comment(1)
This is the simplest way.Unreel
S
9

Return count of character,

 SELECT (LENGTH('1.1.1.1') - LENGTH(REPLACE('1.1.1.1','.',''))) AS count
--RETURN COUNT OF CHARACTER '.'
Souse answered 12/4, 2019 at 12:43 Comment(3)
This was the most readable solution for me.Rehm
This works fine for just one character, but the division is needed in the accepted answer for substrings longer than one characterScrounge
@Scrounge - you could try with the TRANSLATE() function!Piceous
T
7
Occcurence_Count = LENGTH(REPLACE(string_to_search,string_to_find,'~'))-LENGTH(REPLACE(string_to_search,string_to_find,''))

This solution is a bit cleaner than many that I have seen, especially with no divisor. You can turn this into a function or use within a Select.
No variables required. I use tilde as a replacement character, but any character that is not in the dataset will work.

Tepee answered 3/4, 2018 at 13:37 Comment(0)
P
0
SELECT array_length(string_to_array('a long name here', 'o'),1)
  • 1 is for 1-dimension array
  • 'o' is the occurrence you want to count
Paginal answered 9/12, 2021 at 11:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.