How can I find groups of records that match other groups of records (relational division?)
Asked Answered
O

2

6

For setting consolidated account handling, I want to find out accounts that have "exactly the same" set of owners.

I think it might work to pivot the owners with dynamic sql, then use ranking functions, but I don't want to pursue that approach; I don't have an upper limit on how many names can be associated with a given account, so I want to avoid dynamic SQL.

My data (also this is at http://www.sqlfiddle.com/#!3/1d36e)

 CREATE TABLE allacctRels
 (account INT NOT NULL,
 module CHAR(3) NOT NULL,
 custCode CHAR(20) NOT NULL)


 INSERT INTO allacctrels
 (account, module, custCode)
 VALUES
 (1, 'DDA', 'Wilkie, Walker'),
 (1, 'DDA', 'Houzemeal, Juvy'),
 (2, 'CDS', 'Chase, Billy'),
 (2, 'CDS', 'Norman, Storm'),
 (3, 'CDS', 'Chase, Billy'),
 (3, 'CDS', 'Norman, Storm'),
 (7, 'CDS', 'Perkins, Tony'),
 (15, 'SVG', 'Wilkie, Walker'), --typo in name before mwigdahl's response
 (16, 'SVG', 'Wilkie, Walker'), -- corrected typo here too
 (606, 'DDA', 'Norman, Storm'),
 (606, 'DDA', 'Chase, Billy'),-- corrected 2nd typo found 
 (4, 'LNS', 'Wilkie, Walker'),
 (4, 'LNS', 'Houzemeal, Juvy'),
 (44, 'DDA', 'Perkins, Tony'),
 (222, 'DDA', 'Wilkie, Walker'),
 (222, 'DDA', 'Houzemeal, Juvy'),
 (17, 'SVG', 'Wilkie, Walker'), -- added these three rows in edit, SVG 17 doesn't match any dda 
 (17, 'SVG', 'Welch, Raquel'),
 (17, 'SVG', 'Houzemeal, Juvy')

I want to find out, for each MODULE-ACCOUNT, what the lowest DDA account is that has the exact same owners associated with it.

In the sample data, I would want these results, the third column being the lowest DDA account that has the same owners. The results should have the same number of rows as therea re module/account combos - one row per each row in "SELECT DISTINCT module, account FROM allAcctRels")

1, DDA, 1
2, CDS, 606
3, CDS, 606
15, SVG, NULL
16, SVG, NULL
606, DDA, 606
4, LNS, 1
7, CDS, 44
44, DDA, 44
222, DDA, 1
17, SVG, NULL -- added to original post.

SVG 15 and 16 don't match any DDA account, so it doesn't matter that they match each other, they get NULL for the account to consolidate to. EDIT: SVG 17 doesn't match anything, even though there is a DDA acct that has all its holders on in SVG 17, the combination of holders in SVG 17 doesn't occur for any one DDA acct. Every DDA account will match itself, unless a dda account with the same owners and lower DDA exists (as is the case for DDA 222).

I can see that one general approach is to pivot each account, group the pivoted table, and use row_number. Given the unbounded number of holders associated with each account, I think the pivoting would take dynamic SQL I'd rather avoid.

It seems to me that this is a "relational divsion" problem, with the relational division probably being "fed" by a CROSS APPLY. I tried writing a function that would take a table of account holders associated with a specific account and find the lowest dda account, along the lines shown below, the idea being to see if all the the number of people in a given account is the same as the number of people when that account is joined to a given dda account, but I can't figure out how to "feed" tables of account numbers in to the function.

-- this is what I tried but I'm not sure it the logic would work
-- and I can't figure out how to pass the account holders for each
-- account in.  This is a bit changed from the function I wrote, some
    -- extraneous fields removed and cryptic column names changed.  So it 
    -- probably won't run as is.

    -- to support a parameter type to a tape
-- CREATE type VisionCustomer as Table
-- (customer varchar(30))

CREATE FUNCTION  consolidatable 
(@custList dbo.VisionCustomer READONLY)
RETURNS char(10)
AS  
BEGIN
DECLARE @retval Varchar(10)
DECLARE @howmany int
select @howmany=Count(*) FROM @custlist;

SELECT @retval = min (acct) FROM allAcctRels
    JOIN @custlist
        On VendorCustNo = Customer
            WHERE acctType = 'DDA'
            GROUP BY acct
            HAVING (count(*) = @howmany)
            and
            COUNT(*) = (select Count(*) FROM allAcctRels X
    WHERE X.acctType = 'DDA'
    AND X.account = AllAcctRels.account) ;
RETURN @retval
END;
Ordeal answered 10/4, 2012 at 19:8 Comment(2)
Note that "Chase, Billie" in the 606 DDA row doesn't jibe with the resultset you say you want back; I think you want this to be "Chase, Billy", right?Gossip
Yes, that is correct, sorry about that, and thank you, I'll re-edit nowOrdeal
D
2

I believe this is what you are looking for ( http://www.sqlfiddle.com/#!3/f96c5/1 ):

;WITH AccountsWithOwners AS
(
  SELECT DISTINCT
    DA.module
    , DA.account
    , STUFF((SELECT 
                 ',' + AAR.custCode
               FROM allacctRels AAR 
               WHERE AAR.module = DA.module 
                 AND AAR.account = DA.account
               ORDER BY AAR.custCode
               FOR XML PATH(''))
              , 1, 1, '') AS Result
  FROM allacctRels DA
) 
, WithLowestDda AS
(
    SELECT
        AWO.module
        , AWO.account
        , MatchingAccounts.account AS DdaAccount
        , ROW_NUMBER() OVER(PARTITION BY AWO.module, AWO.account ORDER BY MatchingAccounts.account) AS Row
    FROM AccountsWithOwners AWO
    LEFT JOIN AccountsWithOwners MatchingAccounts
        ON MatchingAccounts.module = 'DDA'
        AND MatchingAccounts.Result = AWO.Result
)
SELECT
    account
    , module
    , DdaAccount
FROM WithLowestDda
WHERE Row = 1
Decrement answered 10/4, 2012 at 21:47 Comment(2)
This looks like a winner, it got in just as I have to start some backups and head home, I'll look at it more tonight but I see it gets correct results on my test data. I'm not familiar with the FOR XML, so I'll want to figure out how it works before accepting.Ordeal
@LevinMagruder It's basically making XML that has no actual markup and is used to create a comma separated list. Let me know if you have any questions.Decrement
G
1

This actually turns out to be pretty simple, if I understand you correctly. Try this:

SELECT a.account, a.module, MIN(b.account) 
FROM allacctRels a
    LEFT JOIN allacctRels b ON a.custCode = b.custCode AND b.module = 'DDA'
GROUP BY a.account, a.module

EDIT: The above doesn't work after the clarifications, but this should. It is indeed a type of relational division. Probably not the most efficient query plan in the world, but it works.

SELECT a.account, a.module, MIN(b.account)
FROM allacctRels a
    LEFT JOIN allacctRels b ON b.module = 'DDA'
    AND
    -- first test is to confirm that the number of matching names for this combination equals the number of names for the DDA set...
    (
        SELECT COUNT(*) 
        FROM allacctRels b2 
            INNER JOIN allacctRels a2 ON b2.custCode = a2.custCode 
        WHERE a.account = a2.account AND b.account = b2.account
    ) = 
    (
        SELECT COUNT(*) 
        FROM allacctRels b2 
        WHERE b.account = b2.account
    )
    AND 
    -- second test is to confirm that the number of names for the DDA set equals the number of names for the base set...
    (
        SELECT COUNT(*) 
        FROM allacctRels b2 
        WHERE b.account = b2.account
    ) = 
    (
        SELECT COUNT(*) 
        FROM allacctRels a2 
        WHERE a.account = a2.account
    )
GROUP BY a.account, a.module
Gossip answered 10/4, 2012 at 19:18 Comment(6)
+1; this is the first thing that popped into my head when I was looking at this.Meaning
But here you preserve all rows that have any match at all in on the 'b' (DDA) side, don't you? E.g., if I add "Welch Raquel" to every non-DDA account, the results for Non-dda accts won't change but there shouldn't be any matches, since no non-DDA acct has Raquel Welch. I had two typos in "Walker Wilkie" in my sample data that made this query give the same results as my example, and I didn't have any examples in my data of accounts with an "extra" person that shouldn't match. I've updated sql fiddle and I'll update examples above, too.Ordeal
Thanks for clarifying. I'm looking at it again.Gossip
Updated answer, should be what you're looking for now.Gossip
I had to introduce module to the where clauses (sqlfiddle.com/#!3/8d7c0/4, to avoid error shown in sqlfiddle.com/#!3/cbb1b/1). The results look good and I think I get how it works, but it's not immediately obvious how the min(b.account) is getting tied to being an account that "goes with" the a.account, since the LEFT JOIN is just putting all DDA accts with every record on the left hand side. – Levin Magruder 6 mins agoOrdeal
That's not what the LEFT JOIN is doing, though. The LEFT JOIN is there only so that you will get a record back even for accounts that don't match any DDA subgroup. If there is at least one match, it will give you only the DDA subgroups that exactly match the custCode set for a given account. MIN(b.account) simply collapses those returned subgroups and gives you the lowest ID.Gossip

© 2022 - 2024 — McMap. All rights reserved.