Produce DISTINCT values in STRING_AGG
Asked Answered
O

6

69

I'm using the STRING_AGG function in SQL Server 2017. I'd like to create the same effect as COUNT(DISTINCT <column>). I tried STRING_AGG(DISTINCT <column>,',') but that is not legal syntax.

I'd like to know if there is a T-SQL work-around. Here is my sample:

WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 
SELECT State, City, COUNT(DISTINCT Siting) [# Of Types], STRING_AGG(Siting,',') Animals
FROM Sitings 
GROUP BY State, City

The above produces the following result:

+---------+-----------+--------------+-------------------------+
|  State  |   City    | # Of Types   |         Animals         |
+---------+-----------+--------------+-------------------------+
| Arizona | Flagstaff |            1 | dog                     |
| Florida | Orlando   |            2 | dog,bird                |
| Arizona | Phoenix   |            2 | bird,bird,bird,dog,bird |
+---------+-----------+--------------+-------------------------+

The output is exactly what I want, except I want the concatenated "Animals" listed for Phoenix Arizona to be DISTINCT, like this:

+---------+-----------+--------------+--------------------+
|  State  |   City    | # Of Types   |      Animals       |
+---------+-----------+--------------+--------------------+
| Arizona | Flagstaff |            1 | dog                |
| Florida | Orlando   |            2 | dog,bird           |
| Arizona | Phoenix   |            2 | bird,dog           |
+---------+-----------+--------------+--------------------+

Any ideas?

When I use my real data set, which is much larger, I get an error about the "Animals" column exceeding 8000 characters.

My question I think is the same as this one, except my example is much simpler.

Orthography answered 2/8, 2018 at 5:52 Comment(0)
G
41

Here is one way to do it.

Since you want the distinct counts as well, it can be done simply by grouping the rows twice. The first GROUP BY will remove duplicates, the second GROUP BY will produce the final result.

WITH
Sitings
AS
(
    SELECT * FROM (VALUES 
    (1, 'Florida', 'Orlando', 'bird'),
    (2, 'Florida', 'Orlando', 'dog'),
    (3, 'Arizona', 'Phoenix', 'bird'),
    (4, 'Arizona', 'Phoenix', 'dog'),
    (5, 'Arizona', 'Phoenix', 'bird'),
    (6, 'Arizona', 'Phoenix', 'bird'),
    (7, 'Arizona', 'Phoenix', 'bird'),
    (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
)
,CTE_Animals
AS
(
    SELECT
        State, City, Siting
    FROM Sitings
    GROUP BY State, City, Siting
)
SELECT
    State, City, COUNT(1) AS [# Of Sitings], STRING_AGG(Siting,',') AS Animals
FROM CTE_Animals
GROUP BY State, City
ORDER BY
    State
    ,City
;

Result

+---------+-----------+--------------+----------+
|  State  |   City    | # Of Sitings | Animals  |
+---------+-----------+--------------+----------+
| Arizona | Flagstaff |            1 | dog      |
| Arizona | Phoenix   |            2 | bird,dog |
| Florida | Orlando   |            2 | bird,dog |
+---------+-----------+--------------+----------+

If you are still getting an error message about exceeding 8000 characters, then cast the values to varchar(max) before STRING_AGG.

Something like

STRING_AGG(CAST(Siting AS varchar(max)),',') AS Animals
Goodkin answered 2/8, 2018 at 6:7 Comment(1)
BTW: the explicit double GROUP BY is no performance drawback, since a DISTINCT in an aggregate function will do an implicit grouping (or sort distinct) too. When you compare the execution plans of your original query with those in this solution, you would find, that the double grouping is much faster (I tested it on a multi-milion row table and found a factor of ~3 in CPU time and factor ~12 in total execution time)Indo
T
16

Here is one more way of doing it (sql fiddle):

  WITH Sitings 
  AS
  (
    SELECT * FROM (VALUES 
      (1, 'Florida', 'Orlando', 'bird'),
      (2, 'Florida', 'Orlando', 'dog'),
      (3, 'Arizona', 'Phoenix', 'bird'),
      (4, 'Arizona', 'Phoenix', 'dog'),
      (5, 'Arizona', 'Phoenix', 'bird'),
      (6, 'Arizona', 'Phoenix', 'bird'),
      (7, 'Arizona', 'Phoenix', 'bird'),
      (8, 'Arizona', 'Flagstaff', 'dog')
    ) F (ID, State, City, Siting)
  ) 

select State,City,count(*) as [# Of Sitings],(select string_agg(value,', ') from (select distinct value from string_split(string_agg(Siting, ','),',')) t) AS Animals
FROM Sitings 
GROUP BY State, City

You may easily convert the splitting and merging part into a reusable scalar valued function.

NOTE

This is NOT an optimal solution, if you group first and then do aggregate (like answers above) it is better. Also, it does not get # of Types, it gets # of Sitings instead. However, it becomes handy as a quick inline function.

Tongue answered 10/10, 2020 at 13:50 Comment(4)
I didn't understand how it works, but it works for me!)Lowndes
@dzhukov, it groups by state and city first to get output like bird, bird, dog etc. Then it splits them again, gets distinct values and then joins them to get bird, dog.Tongue
I love this for the utility of being able to do it in the select clause in one line! Thanks! To combine this with a character limit and concatenation - I use ,BigList = (SELECT STRING_AGG( CONVERT(varchar(max),VALUE),', ') FROM (SELECT DISTINCT VALUE FROM STRING_SPLIT(STRING_AGG(CONVERT(varchar(max),CONCAT(tbla.[ID],'-',[tbla.Name])),','),',')) t)Delly
I always use this one to not have a big change on the query, and you even have the opportunity to order it. aggregate > split [and order]> reaggregateGault
P
13

just use sub-query

WITH Sitings 
      AS
      (
        SELECT * FROM (VALUES 
          (1, 'Florida', 'Orlando', 'bird'),
          (2, 'Florida', 'Orlando', 'dog'),
          (3, 'Arizona', 'Phoenix', 'bird'),
          (4, 'Arizona', 'Phoenix', 'dog'),
          (5, 'Arizona', 'Phoenix', 'bird'),
          (6, 'Arizona', 'Phoenix', 'bird'),
          (7, 'Arizona', 'Phoenix', 'bird'),
          (8, 'Arizona', 'Flagstaff', 'dog')
        ) F (ID, State, City, Siting)
      ) 

    select State,City,count(*) as [# Of Types],STRING_AGG(Siting,',') AS Animals from 
    (
      SELECT State, City, Siting
    FROM Sitings 
    GROUP BY State, City,Siting
    ) as T  group by State,City

http://sqlfiddle.com/#!18/ba4b8/11

  State     City    # Of Types  Animals
Arizona     Flagstaff   1   dog
Florida     Orlando     2   bird,dog
Arizona     Phoenix     2   bird,dog
Pachisi answered 2/8, 2018 at 6:42 Comment(0)
L
8

You can use this in postgres. I am not sure about mysql. But this works in postgres.

select state, city, string_agg(distinct (siting), ', ') from sitings group by state, city;

This will aggregate only distinct values.

Lintwhite answered 7/7, 2022 at 4:58 Comment(3)
Yet the question is neither about Postgres nor MySQL, but rather SQL Server.Kingfish
@Jeremy Caney: But it doesn't say 'SQL Server' in the title of the question. That's why PostgreSQL folks will end up here as well and will appreciate this answer. So, better to have the questions title restricted to SQL Server and wait for some to ask the very same question but regarding PostgreSQL and MySQL and Oracle and Informix and...?Filmdom
@MyBrainHurts: Stack Overflow discourages placing technology or product names in titles, and instead placing them within tags. I believe the logic is that it's redundant, and since answers often come from people who follow specific tags, it helps focus their attention onto the immediate problem. That said, I agree that it can lead to confusion—and especially from people who aren't looking carefully at the tags, or who following, in this case, tags from multiple SQL platforms.Kingfish
L
0

Install CLR functions GROUP_CONCAT:

https://github.com/orlando-colamatteo/ms-sql-server-group-concat-sqlclr

then:

SELECT 
   State, 
   City, 
   COUNT(DISTINCT Siting) [# Of Types], 
   dbo.GROUP_CONCAT(distinct Siting) Animals
FROM Sitings 
GROUP BY State, City
Leery answered 24/11, 2023 at 6:45 Comment(0)
A
0

You can create such function

CREATE FUNCTION fn_DistinctSeparatedList (@InputString VARCHAR(MAX), @separator nvarchar(10))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @Items TABLE (Item VARCHAR(MAX));

    INSERT INTO @Items
    SELECT value 
    FROM STRING_SPLIT(replace(@InputString, @separator,'~'),'~')
    WHERE value IS NOT NULL AND value != '';

    WITH DistinctItems AS (
        SELECT DISTINCT Item AS Item
        FROM @Items
    )

    SELECT @InputString = STRING_AGG(Item, '~') 
    FROM DistinctItems;
    
    RETURN Replace(@InputString,'~',@separator);
END

and use that in your code that way:

drop table if exists #PetsOwner
Select 'Olivier' as Person, 'Cat' as Pet, 'Charlie' as PetName
into #PetsOwner
union
Select 'Olivier' as Person, 'cat' as Pet, 'Luna' as PetName
union
Select 'Olivier' as Person, 'Cat' as Pet, 'Cooper '  as PetName
union 
Select 'Leo' as Person, 'Cat' as Pet, 'Daisy'  as PetName
union 
Select 'Leo' as Person, 'Dog' as Pet, 'Milo'  as PetName
union
Select 'Michael' as Person, 'Fish' as Pet, 'Max'  as PetName

and now without function:

select Person, STRING_AGG(Pet, ', ')
from #PetsOwner
group by Person

and now with function distinct

select Person, BIStaging.fn_DistinctSeparatedList(STRING_AGG(Pet, ', '),', ')
from #PetsOwner
group by Person
Agra answered 13/3 at 10:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.