How to group-concatenate multiple columns?
Asked Answered
U

3

7

Assume this table:

PruchaseID | Customer | Product  | Method
-----------|----------|----------|--------
 1         | John     | Computer | Credit
 2         | John     | Mouse    | Cash
 3         | Will     | Computer | Credit
 4         | Will     | Mouse    | Cash
 5         | Will     | Speaker  | Cash
 6         | Todd     | Computer | Credit

I want to generate a report on each customer of what they bought, and their payment methods.
But I want that report to be one row per customer, such as:

Customer | Products                 | Methods
---------|--------------------------|--------------
 John    | Computer, Mouse          | Credit, Cash
 Will    | Computer, Mouse, Speaker | Credit, Cash
 Todd    | Computer                 | Credit

What I've found so far is to group-concatenate using the XML PATH method, such as:

SELECT
    p.Customer,
    STUFF(
        SELECT ', ' + xp.Product
        FROM Purchases xp
        WHERE xp.Customer = p.Customer
        FOR XML PATH('')), 1, 1, '') AS Products,
    STUFF(
        SELECT ', ' + xp.Method
        FROM Purchases xp
        WHERE xp.Customer = p.Customer
        FOR XML PATH('')), 1, 1, '') AS Methods
FROM Purchases

This gives me the result, but my concern is the speed of this.
At first glance there are three different selects going on here, two would each multiply by the number of rows Purchases has. Eventually this would slow down expenentially.

So, is there a way to do this with better performance?
I want to add even more columns to aggregate, should I do this STUFF() block for every column? That doesn't sound fast enough for me.

Siggestions?

Underscore answered 7/6, 2016 at 16:24 Comment(2)
Well you are denormalizing your data to do this so performance is going to be a potential challenge. The XML method is the best way of denormalizing data into a delimited list.Scott
Be careful when using for xml path, it might surprise you if you have for example & in your data. Aaron Bertrand has made a comparison of different methods you might want to check out.Blubber
A
5

Just an idea:

DECLARE @t TABLE (
    Customer VARCHAR(50),
    Product VARCHAR(50),
    Method VARCHAR(50),
    INDEX ix CLUSTERED (Customer)
)

INSERT INTO @t (Customer, Product, Method)
VALUES
    ('John', 'Computer', 'Credit'),
    ('John', 'Mouse', 'Cash'),
    ('Will', 'Computer', 'Credit'),
    ('Will', 'Mouse', 'Cash'),
    ('Will', 'Speaker', 'Cash'),
    ('Todd', 'Computer', 'Credit')

SELECT t.Customer
     , STUFF(CAST(x.query('a/text()') AS NVARCHAR(MAX)), 1, 2, '')
     , STUFF(CAST(x.query('b/text()') AS NVARCHAR(MAX)), 1, 2, '')
FROM (
    SELECT DISTINCT Customer
    FROM @t
) t
OUTER APPLY (
    SELECT DISTINCT [a] = CASE WHEN id = 'a' THEN ', ' + val END
                  , [b] = CASE WHEN id = 'b' THEN ', ' + val END
    FROM @t t2
    CROSS APPLY (
        VALUES ('a', t2.Product)
             , ('b', t2.Method)
    ) t3 (id, val)
    WHERE t2.Customer = t.Customer
    FOR XML PATH(''), TYPE
) t2 (x)

Output:

Customer   Product                    Method     
---------- -------------------------- ------------------
John       Computer, Mouse            Cash, Credit
Todd       Computer                   Credit
Will       Computer, Mouse, Speaker   Cash, Credit

Another idea with more performance benefits:

IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
    DROP TABLE #EntityValues

DECLARE @Values1 VARCHAR(MAX)
      , @Values2 VARCHAR(MAX)

SELECT Customer
     , Product
     , Method
     , RowNum = ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY 1/0)
     , Values1 = CAST(NULL AS VARCHAR(MAX))
     , Values2 = CAST(NULL AS VARCHAR(MAX))
INTO #EntityValues
FROM @t

UPDATE #EntityValues
SET 
      @Values1 = Values1 =
        CASE WHEN RowNum = 1 
            THEN Product
            ELSE @Values1 + ', ' + Product 
        END
    , @Values2 = Values2 = 
        CASE WHEN RowNum = 1 
            THEN Method
            ELSE @Values2 + ', ' + Method
        END

SELECT Customer
      , Values1 = MAX(Values1) 
      , Values2 = MAX(Values2)
FROM #EntityValues
GROUP BY Customer

But with some limitations:

Customer      Values1                       Values2
------------- ----------------------------- ----------------------
John          Computer, Mouse               Credit, Cash
Todd          Computer                      Credit
Will          Computer, Mouse, Speaker      Credit, Cash, Cash

Also check my old post about string aggregation:

http://www.codeproject.com/Articles/691102/String-Aggregation-in-the-World-of-SQL-Server

Albaugh answered 7/6, 2016 at 16:59 Comment(7)
Good to know alternate way.Polymyxin
Hi Devart, I like that!Marilla
@Underscore added another idea about "how to..."Albaugh
@Albaugh -- this is not "just an idea"; this was a great answer and a great article. Thanks!Underscore
@Albaugh -- one question though.. in your article you specifically mentioned the using .query() is time-consuming. You still think it's best to use this method in my scenario? Because for me, I'm actually implementing this in a view, so I'm trying to avoid declarations, cursors, and variable-assigning. So I believe .query() or simple XML-concatenation works best for my scenario. Yes?Underscore
@Albaugh -- also, is there a minimum MSSQL version for the first method in your answer?Underscore
Both examples will work from SQL Server 2005+. All xquery methods are time-consuming so the main rule which I follow minimize the count of xquery calls in query. About the best solution - just try both and provide results. IMO I like ex #1. Thanks in advance ;)Albaugh
X
3

Another solution is the CLR method for group concatenation @aaron bertrand has done a performance comparison on this here. If you can deploy CLR then download the script from https://orlando-colamatteo.github.io/ms-sql-server-group-concat-sqlclr/ which is free. and all details are there in the documentation. Your query will just change into like this

SELECT Customer,dbo.GROUP_CONCAT(product),dbo.GROUP_CONCAT(method)
FROM Purchases
GROUP BY Customer

This query is short, easy to remember and use, XML method also does the job but remembering the code is a bit difficult(atleast for me) and creeps in the problem like XML entitization which can be solved sure and some pitfalls also described in his blog.

Also from a performance view point using .query is time consuming I had the same issues with performance. I hope you can find this question I raised here in https://dba.stackexchange.com/questions/125771/multiple-column-concatenation check the version 2 given by kenneth fisher a nested xml concatenation method or a unpivot /pivot method suggested by spaggettidba.

Xenogamy answered 8/6, 2016 at 20:33 Comment(0)
P
1

This is one of the use cases for recursive CTEs (Common Table Expressions). You can learn more here https://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

;
WITH CTE1 (PurchaseID, Customer, Product, Method, RowID)
AS
(
    SELECT 
        PurchaseID, Customer, Product, Method, 
        ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Customer)
    FROM
        @tbl 
        /* This table holds source data. I ommited declaring and inserting 
        data into it because that's not important. */
)
, CTE2 (PurchaseID, Customer, Product, Method, RowID)
AS
(
    SELECT 
        PurchaseID, Customer, 
        CONVERT(VARCHAR(MAX), Product), 
        CONVERT(VARCHAR(MAX), Method), 
        1
    FROM 
        CTE1 
    WHERE 
        RowID = 1
    UNION ALL
    SELECT 
        CTE2.PurchaseID, CTE2.Customer, 
        CONVERT(VARCHAR(MAX), CTE2.Product + ',' + CTE1.Product), 
        CONVERT(VARCHAR(MAX), CTE2.Method + ',' + CTE1.Method), 
        CTE2.RowID + 1 
    FROM 
        CTE2 INNER JOIN CTE1 
            ON CTE2.Customer = CTE1.Customer
            AND CTE2.RowID + 1 = CTE1.RowID
)

SELECT Customer, MAX(Product) AS Products, MAX(Method) AS Methods 
FROM CTE2 
GROUP BY Customer

Output:

Customer    Products                Methods
John        Computer,Mouse          Credit,Cash
Todd        Computer                Credit
Will        Computer,Mouse,Speaker  Credit,Cash,Cash
Polymyxin answered 7/6, 2016 at 17:26 Comment(1)
Hi, @JamesZ has posted above a link to a performance comparison. You might have a look on this. Your code works, but it is very low performing...Marilla

© 2022 - 2024 — McMap. All rights reserved.