Optimal way to concatenate/aggregate strings

Asked 30/11, 2012 at 4:57 Answered 12/4, 2022 at 16:33

Solved sql sql-server aggregate-functions string-concatenation

139

I'm finding a way to aggregate strings from different rows into a single row. I'm looking to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE and FOR XML, but they just don't cut it for me.

String aggregation would do something like this:

id | Name                    Result: id | Names
-- - ----                            -- - -----
1  | Matt                            1  | Matt, Rocks
1  | Rocks                           2  | Stylus
2  | Stylus

I've taken a look at CLR-defined aggregate functions as a replacement for COALESCE and FOR XML, but apparently SQL Azure does not support CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.

Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but hey I'll take what I can get) that I can use to aggregate my stuff?

Strauss answered 30/11, 2012 at 4:57 Comment(7)

In what way does for xml not work for you? – Swear 3/12, 2012 at 6:40

It does work, but I took a look at the execution plan and each for xml shows a 25% usage in terms of query performance (a bulk of the query!) – Strauss 3/12, 2012 at 7:12

There are different ways of doing the for xml path query. Some faster than others. It could depend on your data but the ones using distinct is in my experience slower than using group by. And if you are using .value('.', nvarchar(max)) to get the concatenated values you should change that to .value('./text()[1]', nvarchar(max)) – Swear 3/12, 2012 at 8:36

Your accepted answer resembles my answer on #11137575 which I thought is faster than XML. Don't get fooled by query cost,you need ample data to see which is faster. XML is faster,which happens to be @MikaelEriksson's answer on the same question. Opt for XML approach – Uxorious 29/12, 2012 at 11:54

Please vote for a native solution for this here: connect.microsoft.com/SQLServer/feedback/details/1026336 – Goof 10/11, 2014 at 14:39

There's a useful performance analysis here: sqlperformance.com/2014/08/t-sql-queries/… BTW, the XML approach BLOWS-UP when your data has emojis or surrogate characters unless you jump through a number of hoops. – Anglian 9/5, 2017 at 7:10

So, @MikaelEriksson, in my comment above, I hope I answered your question -- at least for myself personally. – Anglian 9/5, 2017 at 7:10

SOLUTION

The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.

;WITH Partitioned AS
(
    SELECT 
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM dbo.SourceTable
),
Concatenated AS
(
    SELECT 
        ID, 
        CAST(Name AS nvarchar) AS FullName, 
        Name, 
        NameNumber, 
        NameCount 
    FROM Partitioned 
    WHERE NameNumber = 1

    UNION ALL

    SELECT 
        P.ID, 
        CAST(C.FullName + ', ' + P.Name AS nvarchar), 
        P.Name, 
        P.NameNumber, 
        P.NameCount
    FROM Partitioned AS P
        INNER JOIN Concatenated AS C 
                ON P.ID = C.ID 
                AND P.NameNumber = C.NameNumber + 1
)
SELECT 
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount

EXPLANATION

The approach boils down to three steps:

Number the rows using OVER and PARTITION grouping and ordering them as needed for the concatenation. The result is Partitioned CTE. We keep counts of rows in each partition to filter the results later.
Using recursive CTE (Concatenated) iterate through the row numbers (NameNumber column) adding Name values to FullName column.
Filter out all results but the ones with the highest NameNumber.

Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).

I've quickly tested the solution on SQL Server 2012 with the following data:

INSERT dbo.SourceTable (ID, Name)
VALUES 
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')

The query result:

ID          FullName
----------- ------------------------------
2           Stylus
3           Bar, Baz, Foo
1           Matt, Rocks

Simulcast answered 3/12, 2012 at 10:50 Comment(8)

I checked the time consumption of this way against xmlpath and i reached about 4 milliseconds vs about 54 milliseconds. so the xmplath way is better specially in large cases. I'll write the compare code in an separate answer. – Hunker 19/2, 2014 at 9:4

It is far better since this approach only works for 100 values maximum. – Baldhead 15/10, 2014 at 11:35

@romano-zumbé Use MAXRECURSION to set the CTE limit to whatever you need. – Simulcast 15/10, 2014 at 22:31

Surprisingly, CTE was way slower for me. sqlperformance.com/2014/08/t-sql-queries/… compares a bunch of techniques, and seems to agree with my results. – Telephoto 28/11, 2016 at 14:5

This solution for a table with more than 1 million record doesn't work. Also, we have a limit on recursive depth – Colquitt 9/12, 2016 at 16:31

@ArdalanShahgholi you may want to limit your Range of data to be operated on. Otherwise, you will need to use an XML form for the concatenation. Recursive CTE's are layered evaluations, so a 1M record action would result in roughly 2M records as a result, before you do the final limitation. – Acentric 4/5, 2017 at 18:6

mssql in azure now has string_agg – Tenterhook 24/8, 2017 at 6:16

While I'm awestruck that you (or anyone) could come up with this solution, the XML version (first by "slachterman" and them much more read-ably by "QMaster" below) still (barring having SS 2017 and therefore String_Agg), IMHO, beats it hands down: 1) It's ~40X faster (for me SQL Server 2008 R2 on Windows Server 2008 R2 on Intel Xeon E5-2630 v4 @2.20 GHZ x2 w/ ~1 GB free w/ my sample), 2) (Perhaps more importantly) It avoids 2 extra copies of almost the same code (in the sub-Selects). – Ibert 1/5, 2018 at 20:19

STRING_AGG() in SQL Server 2017, Azure SQL, and PostgreSQL: https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql

GROUP_CONCAT() in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

(Thanks to @Brianjorden and @milanio for Azure update)

Example Code:

select Id
, STRING_AGG(Name, ', ') Names 
from Demo
group by Id

SQL Fiddle: http://sqlfiddle.com/#!18/89251/1

Guarded answered 13/12, 2016 at 15:19 Comment(2)

I've just tested it and now it works fine with Azure SQL Database. – Spectacled 24/3, 2017 at 17:9

STRING_AGG got pushed back to 2017. It's not available in 2016. – Spancake 2/5, 2017 at 20:41

Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).

create table #t (id int, name varchar(20))

insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')

select  id
        ,Names = stuff((select ', ' + name as [text()]
        from #t xt
        where xt.id = t.id
        for xml path('')), 1, 2, '')
from #t t
group by id

Craving answered 8/12, 2012 at 22:59 Comment(8)

Don't forget to put an index on that id column once the size of a table becomes a problem. – Unblock 9/12, 2012 at 15:15

And after reading how stuff/for xml path work (https://mcmap.net/q/73946/-how-stuff-and-39-for-xml-path-39-work-in-sql-server), I'm confident that it's a good solution despite XML in its name :) – Telephoto 28/11, 2016 at 14:9

@slackterman Depends on the number of records to be operated on. I think XML is deficient at the low counts, compared to CTE, but at the upper volume counts, alleviates the Recursion Dept limitation and is easier to navigate, if done correctly and succinctly. – Acentric 4/5, 2017 at 18:9

FOR XML PATH methods blow up if you have emojis or special / surrogate characters in your data!!! – Anglian 9/5, 2017 at 5:46

This code results in xml-encoded text (& switched to &, and so on). A more correct for xmlsolution is provided here. – Kirtley 3/5, 2019 at 21:20

Can you do this with JSON? – Greco 11/12, 2019 at 6:56

I've tried without temporary table and if it's fast for simple query but it's very slow with complex query. At the opposite if I put the result of my complex query, like this answer, in a temporary table it's very fast. – Inhalant 12/6, 2020 at 9:37

When I wrote about "temporary table", I meant "intermediate table". I didn't test with #temporary table. – Inhalant 3/12, 2021 at 10:5

Although @serge answer is correct but i compared time consumption of his way against xmlpath and i found the xmlpath is so faster. I'll write the compare code and you can check it by yourself. This is @serge way:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (ID int, Name nvarchar(50))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE()

;WITH Partitioned AS
(
    SELECT 
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM @YourTable
),
Concatenated AS
(
    SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1

    UNION ALL

    SELECT 
        P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
    FROM Partitioned AS P
        INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT 
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 54 milliseconds

And this is xmlpath way:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE();

set nocount off
SELECT
    t1.HeaderValue
        ,STUFF(
                   (SELECT
                        ', ' + t2.ChildValue
                        FROM @YourTable t2
                        WHERE t1.HeaderValue=t2.HeaderValue
                        ORDER BY t2.ChildValue
                        FOR XML PATH(''), TYPE
                   ).value('.','varchar(max)')
                   ,1,2, ''
              ) AS ChildValues
    FROM @YourTable t1
    GROUP BY t1.HeaderValue

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 4 milliseconds

Hunker answered 19/2, 2014 at 9:14 Comment(3)

+1, you QMaster (of the Dark Arts) you! I got an even more dramatic diff. (~3000 msec CTE vs. ~70 msec XML on SQL Server 2008 R2 on Windows Server 2008 R2 on Intel Xeon E5-2630 v4 @2.20 GHZ x2 w/ ~1 GB free). Only suggestions are: 1) Either use OP's or (preferably) generic terms for both versions, 2) Since OP's Q. is how to "concatenate/aggregate strings" and this is only needed for strings (vs. a numeric value), generic terms are too generic. Just use "GroupNumber" and "StringValue", 3) Declare and use a "Delimiter" Variable and use "Len(Delimiter)" vs. "2". – Ibert 1/5, 2018 at 19:55

+1 for not expanding special character to XML encoding (e.g. '&' doesn't get expanded to '&' like in so many other inferior solutions) – Gould 27/8, 2019 at 16:9

I must say... SCORE! In my tests, this reduced the time my query took from 2.3 seconds, to 25 ms (2600 rows returned from a 80K or so source rows..) Thanks! SQL Server 2016, Enterprise... – Houseleek 24/6, 2022 at 22:45

Update: Ms SQL Server 2017+, Azure SQL Database

You can use: STRING_AGG.

Usage is pretty simple for OP's request:

SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id

Well my old non-answer got rightfully deleted (left in-tact below), but if anyone happens to land here in the future, there is good news. They have implimented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.

--- Old Post: Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great, however it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Datababse soon as well..

Licensee answered 19/12, 2016 at 14:10 Comment(3)

I've just tested it and it works like a charm in Azure SQL Database – Spectacled 24/3, 2017 at 17:7

STRING_AGG() is stated to become available in SQL Server 2017, in any compatibility level. learn.microsoft.com/en-us/sql/t-sql/functions/… – Myrilla 3/5, 2017 at 14:32

Yes. STRING_AGG is not available in SQL Server 2016. – Intractable 6/12, 2017 at 14:6

You can use += to concatenate strings, for example:

declare @test nvarchar(max)
set @test = ''
select @test += name from names

if you select @test, it will give you all names concatenated

Intersex answered 8/3, 2018 at 15:20 Comment(4)

Please specify SQL dialect or version since when is it supported. – Guarded 11/7, 2018 at 11:56

This works in SQL Server 2012. Note that a comma-separated list can be created with select @test += name + ', ' from names – Petrillo 8/2, 2019 at 21:0

This uses undefined behavior, and is not safe. This is especially likely to give a strange/incorrect result if you have an ORDER BY in your query. You should use one of the listed alternatives. – Magallanes 12/11, 2019 at 20:27

This type of query was never defined behavior, and in SQL Server 2019 we found it to have the incorrect behavior more consistently than in prior versions. Don't use this approach. – Fee 2/7, 2020 at 12:40

I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:

DECLARE @STRUCTURED_VALUES TABLE (
     ID                 INT
    ,VALUE              VARCHAR(MAX) NULL
    ,VALUENUMBER        BIGINT
    ,VALUECOUNT         INT
);

INSERT INTO @STRUCTURED_VALUES
SELECT   ID
        ,VALUE
        ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
        ,COUNT(*) OVER (PARTITION BY ID)    AS VALUECOUNT
FROM    RAW_VALUES_TABLE;

WITH CTE AS (
    SELECT   SV.ID
            ,SV.VALUE
            ,SV.VALUENUMBER
            ,SV.VALUECOUNT
    FROM    @STRUCTURED_VALUES SV
    WHERE   VALUENUMBER = 1

    UNION ALL

    SELECT   SV.ID
            ,CTE.VALUE + ' ' + SV.VALUE AS VALUE
            ,SV.VALUENUMBER
            ,SV.VALUECOUNT
    FROM    @STRUCTURED_VALUES SV
    JOIN    CTE 
        ON  SV.ID = CTE.ID
        AND SV.VALUENUMBER = CTE.VALUENUMBER + 1

)
SELECT   ID
        ,VALUE
FROM    CTE
WHERE   VALUENUMBER = VALUECOUNT
ORDER BY ID
;

Mockery answered 10/5, 2018 at 5:44 Comment(0)

Try this, i use it in my projects

DECLARE @MetricsList NVARCHAR(MAX);

SELECT @MetricsList = COALESCE(@MetricsList + '|', '') + QMetricName
FROM #Questions;

Alcoran answered 12/4, 2022 at 16:33 Comment(1)

Ok... so post the code to do it do more than 1 rowset at a time, which is what the OP needs. Note... forget it if it's a scalar or mTVF function. :D – Emoryemote 26/7, 2022 at 0:32

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Example Code:

Recommended topics

Hot tags