How do I select and group by a portion of a string?
Asked Answered
W

3

5

Given I have data like the following, how can I select and group by portions of a string?

Version  Users
1.1.1    1
1.1.23   3
1.1.45   1
2.1.24   3
2.1.12   1
2.1.45   3
3.1.10   1
3.1.23   3

What I want is to sum up the users using version 1.1.x and 2.2.x and 3.3.x etc, but I'm not sure how I can group on a partial string in a select statement.

edit What the data should return like is this:

Version  Users
1.1.XX   5
2.1.XX   7
3.1.XX   4

There is an infinite variable number of versions, some are in this format (major, minor, build) some are just major, minor and some are just major, the only time I want to "roll up" the versions is when there is a build.

Welby answered 25/5, 2010 at 17:31 Comment(1)
does postgresql have a SUBSTRING_INDEX function like MySQL's? It allows you to specify a delimiter and how many chunks you want. dev.mysql.com/doc/refman/5.1/en/…Obstreperous
G
5
select rtrim(Version, '0123456789') ||'XX', sum(users) 
from Table
group by rtrim(Version, '0123456789')
Greenback answered 25/5, 2010 at 21:47 Comment(0)
R
3

You did not specify what you want to see after you group by the first portion nor any specification on the length of each subsection. Supposing that you want the count of users and that each section is no more than a single character (i.e., there is no 10.1.xxx nor 10.10.xxx):

Select substring(Version, 1, 3), Count(*)
From Table
Group By substring(Version, 1, 3)
Rosaliarosalie answered 25/5, 2010 at 17:43 Comment(1)
I think this substring is a more elegant way than rtrimSocket
D
3

The Postgres substring(string from pattern) function also supports regular expressions. This makes it easy to solve this problem also for complex strings where the rtrim or substring with fixed start and length would not work.

Example (substring(string, pattern) is a shorthand, the from can be omitted):

WITH x ("Version", "Users") AS (
select '1.1.1', 1 union all
select '1.1.23', 3 union all
select '1.1.45', 1 union all
select '2.1.24', 3 union all
select '2.1.12', 1 union all
select '2.1.45', 3 union all
select '3.1.10', 1 union all
select '3.1.23', 3)
select
  substring(x."Version", '([0-9]*\.[0-9]*)\.[0-9]*') || '.XX' as "Version",
  sum("Users") as "Users"
from x
group by
  substring(x."Version", '([0-9]*\.[0-9]*)\.[0-9]*');

Note that we use a parenthesized subexpression to return only part of the match, and the is not null is omitted in the example. This results in:

Version Users
1.1.XX  5
2.1.XX  7
3.1.XX  4

From the documentation:

The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below.

Note: the logic for handling versions in the form (major, minor) or only (major) is omitted above. You can easily also group these e.g. by changing the regex to ([0-9]*|[0-9]*\.[0-9]*), but I don't see how that would make sense. E.g. if you have '1', '1.2', '1.2.3', grouping only by major would be more sensible: '1' could mean anything, from '1.0.0' to '1.2.4' but it could also mean '1.2.3'.

Dinothere answered 9/1, 2021 at 12:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.