GROUP BY another table that have been grouped with two sub query
Asked Answered
H

12

7

I have table like this

Table1

ID      |    Val         |     Val2       |
606541  |3175031503131004|3175032612900004|
606542  |3175031503131004|3175032612900004|
677315  |3175031503131004|3175032612980004|
222222  |1111111111111111|8888888888888888|
231233  |1111111111111111|3175032612900004|
111111  |9999992222211111|1111111111111111|
57      |3173012102121018|3173015101870020|
59      |3173012102121018|3173021107460002|
2       |900             |7000            |
4       |900             |7001            |

I have two condition with column Val and Val2. Show the result if the Val:

  1. Val column has at least two or more duplicate values AND
  2. Val2 column has no duplicate value (unique)

For example :

Sample 1

 ID      |    Val         |     Val2       |
 606541  |3175031503131004|3175032612900004|
 606542  |3175031503131004|3175032612900004|  
 677315  |3175031503131004|3175032612980004|

 False, because  even the Val column 
 had two or more duplicate but the Val2 
 had dulicate value (ID 606541  and 606542)

Sample Expected 1 Result

 No records

Sample 2

 ID      |    Val         |     Val2       |
 222222  |1111111111111111|8888888888888888|
 231233  |1111111111111111|3175032612900004|   
 111111  |9999992222211111|1111111111111111|

 True, Because the condition is match, 
 Val column had duplicate value AND Val2 had unique values

Sample 2 Expected Result

 ID      |    Val         |     Val2       |
 222222  |1111111111111111|8888888888888888|
 231233  |1111111111111111|3175032612900004|

Sample 3

 ID      |    Val         |     Val2       |
 606541  |3175031503131004|3175032612900004|
 606542  |3175031503131004|3175032612900004|
 677315  |3175031503131004|3175032612980004|
 222222  |1111111111111111|8888888888888888|     
 231233  |1111111111111111|3175032612900004|
 111111  |9999992222211111|1111111111111111|

 Note : This is false condition, Because even the value for id 606541, 606542, and
 677315 in column Val had duplicate value at least 
 two or more but the value in column Val2 had no unique value (it could be true condition if id 606541, 
 606542, and 677315 had 3 different value on Val2).

 NOte 2 : for Id 222222 and 231233 that had duplicate value, this is still false, because the column 
 Val2 with ID 231233 had the same value with ID 606542 and 606541 (3175032612900004), so it didnt match 
 the second condition which only have no duplicate value

Sample 3 Expected Result

 No records

Now back to Table1 in the earlier, i tried to show result from the two condition with this query

SELECT
tb.* FROM table1 tb 
WHERE
    tb.Val2 IN (
    SELECT ta.Val2 
    FROM (
        SELECT
            t.* 
        FROM
            table1 t 
        WHERE
            t.Val IN ( 
            SELECT Val FROM table1 
            GROUP BY Val 
            HAVING count( Val ) > 1 ) 
        ) ta 
    GROUP BY
        ta.Val2 
    HAVING
    count( ta.Val2 ) = 1 
    )

The result

ID         Val                   Val2
677315  3175031503131004    3175032612980004
222222  1111111111111111    8888888888888888
57      3173012102121018    3173015101870020
59      3173012102121018    3173021107460002
2       900                  7000            
4       900                  7001 

While i expect the result was like this:

ID         Val                   Val2
57  3173012102121018    3173015101870020
59  3173012102121018    3173021107460002
2       900             7000            
4       900             7001            

Is there something wrong with my query ?

Here is my DB Fiddle.

Hegelianism answered 6/7, 2020 at 22:36 Comment(8)
The way you’ve written the query as nested sub queries, it’s only checking that val2 is singular in the dataset returned from the check that val1 has multiples. Instead of nesting the sub worries try writing them as two separate subqueries ANDed in the WHERE clause.Episiotomy
Long strings don't aid clarityFrequentation
is there any references for my case @Episiotomy ?Hegelianism
Not sure what you mean?Episiotomy
I mean i dont know how to do your advice with my condition. Is there any link or references for that ?Hegelianism
In many cases nested conditions will return the same results as ANDed conditions, however in this case because your conditions are supposed to be based on counts from the total rowset, they will return different results.Episiotomy
Do you know how to use AND?Episiotomy
i do know but i dont know how to do based on your advice.Hegelianism
S
7

Excuse for any mistakes as this would be my first answer in this forum. Could you also try with below, i agree to the answer with window function though.

SELECT t.*
FROM   table1 t 
WHERE  t.val IN (SELECT val 
                   FROM table1 
                 GROUP BY val 
                 HAVING COUNT(val) > 1 
                    AND COUNT(val) = COUNT(DISTINCT val2)
                 )
AND    t.val NOT IN (SELECT t.val
                     FROM   table1 t
                     WHERE  EXISTS (SELECT 1
                             FROM   table1 tai
                             WHERE  tai.id != t.id
                             AND    tai.val2 = t.val2));

/* first part of where clause makes sure we have distinct values in column val2 for repeated value in column val

second part of where clause with not in tells us there is no value shares across different ids with respect to value in column val2 */

--reverse order query ( not sure gives the expected result)

SELECT t.*
FROM   table2 t
WHERE  t.val IN (SELECT val FROM table2 GROUP BY val HAVING COUNT(val) = 1)
AND    t.val2 IN (SELECT t.val2
                  FROM   table2 ta
                  WHERE  EXISTS (SELECT 1
                          FROM   table2 tai
                          WHERE  tai.id != ta.id
                          AND    tai.val = ta.val));
Spindell answered 12/7, 2020 at 8:3 Comment(6)
do you know how could i do it reverse ? like val colum is unique and val2 is duplicate ?Hegelianism
@Gagantous: If i understood you and the conditions remains unchanged wrt original question, just replacing val -> val2 and val2 -> val in the query I posted should do the work. or did I not get the point ?Spindell
i cant do that because the process should be in this order : val -> val2, if i used your suggestion, it would be end up with this order val2 -> valHegelianism
Pardon me as i certainly cannot imagine the output data. Could you try with the second query i added to the answer ?Spindell
the aggregate will be different if i used with this order val2 -> valHegelianism
Great!! Nice to hear that. Cheers :)Spindell
C
7

You have to use Group By to find val & val2 with duplicate values and need to use Inner Join and Left Join in order to include/eliminate records as given conditions (oppose to IN, NOT IN etc. clauses that might cause performance issues in case you're dealing with large data).

Please find the query below:

select t1.*from table1 t1 left join
      (select val from table1
       where val2 in (select val2 from table1 group by val2 having count(id) > 1)
        ) t2
 on t1.val = t2.val
 inner join
     (select val from table1 group by val having count(id) >1) t3
     on t1.val = t3.val
 where t2.val is null

Query for Reverse Condition:

select t1.*from table1 t1 inner join
       (select val from table1 group by val having count(id) = 1)
         t2
 on t1.val = t2.val
 inner join
     (select val2 from table1 group by val2 having count(id) >1) t3
     on t1.val2 = t3.val2

Please find fiddle for both queries here.

Cindicindie answered 10/7, 2020 at 18:35 Comment(6)
can u add expected output in case of val colum is unique and val2 is duplicate?Cindicindie
I believe with 'val colum is unique and val2 is duplicate' condition & val->val2 order, you'll get 0 rows for the given data sample. Am I correct?Cindicindie
yes that's right the result would be 0 rows if i used this condition val colum is unique and val2 is duplicateHegelianism
thank you for the clarification. I have added query for the same & fiddle link for both the cases. Please validate & let me know whether it's working or not.Cindicindie
Also...please accept the answer & award the Bounty to the answer that you think resolved your issue in the best way. Thank you.Cindicindie
@Hegelianism Thank you for rewarding the Bounty. You can tag me in the comments for any SQL related questions.Cindicindie
S
7

Excuse for any mistakes as this would be my first answer in this forum. Could you also try with below, i agree to the answer with window function though.

SELECT t.*
FROM   table1 t 
WHERE  t.val IN (SELECT val 
                   FROM table1 
                 GROUP BY val 
                 HAVING COUNT(val) > 1 
                    AND COUNT(val) = COUNT(DISTINCT val2)
                 )
AND    t.val NOT IN (SELECT t.val
                     FROM   table1 t
                     WHERE  EXISTS (SELECT 1
                             FROM   table1 tai
                             WHERE  tai.id != t.id
                             AND    tai.val2 = t.val2));

/* first part of where clause makes sure we have distinct values in column val2 for repeated value in column val

second part of where clause with not in tells us there is no value shares across different ids with respect to value in column val2 */

--reverse order query ( not sure gives the expected result)

SELECT t.*
FROM   table2 t
WHERE  t.val IN (SELECT val FROM table2 GROUP BY val HAVING COUNT(val) = 1)
AND    t.val2 IN (SELECT t.val2
                  FROM   table2 ta
                  WHERE  EXISTS (SELECT 1
                          FROM   table2 tai
                          WHERE  tai.id != ta.id
                          AND    tai.val = ta.val));
Spindell answered 12/7, 2020 at 8:3 Comment(6)
do you know how could i do it reverse ? like val colum is unique and val2 is duplicate ?Hegelianism
@Gagantous: If i understood you and the conditions remains unchanged wrt original question, just replacing val -> val2 and val2 -> val in the query I posted should do the work. or did I not get the point ?Spindell
i cant do that because the process should be in this order : val -> val2, if i used your suggestion, it would be end up with this order val2 -> valHegelianism
Pardon me as i certainly cannot imagine the output data. Could you try with the second query i added to the answer ?Spindell
the aggregate will be different if i used with this order val2 -> valHegelianism
Great!! Nice to hear that. Cheers :)Spindell
E
4

Can you try this and let me know the results? SQL fiddle

SELECT t1.id, t1.val, t1.val2 FROM table1 t1
JOIN (
  select val from
  (select id, val, val2 from table1 group by val2 having count(1) = 1) a
  group by a.val having count(1) > 1
)t2 on t1.val = t2.val;
Elenaelenchus answered 11/7, 2020 at 16:24 Comment(0)
D
3

you can use group by :

select * from (select * from #table1 where Val2 in (select Val2 val from #table1 group by Val2 having COUNT(*) =1 )) select1
         where select1.val in  (select Val val from #table1 group by Val having COUNT(*) >1)

or you can use RANK :

 select * from  ( SELECT 
     i.id,
    i.Val val,
    RANK() OVER (PARTITION BY i.val ORDER BY i.id DESC) AS Rank1,
        RANK() OVER (PARTITION BY i.val2 ORDER BY i.id DESC) AS Rank2
FROM #table1 AS i 

) select1 where  select1.Rank1 >1 or select1.Rank2 =2 
Dapple answered 7/7, 2020 at 7:1 Comment(7)
why there is table2 in there ? i only used one tableHegelianism
why CTE_select2 didnt calculate column Val2 ? i dont see any Val2 aggregate in thereHegelianism
got an error dbfiddle.uk/…Hegelianism
I used SQL server, not my SQL: dbfiddle.uk/…Dapple
the result still not what i expect, see my expected resultHegelianism
code only answers are frowned upon on SO. Please provide explanation highlighting which parts of your answer addresses OP's issue, and why. For long term value, future visitors should be able to learn from your answer to apply to their own coding issues. Quality answers keeps quality of SO high, prevents "haz me the codz" Q/A, and increases usefulness which also leads to more upvotes. Code is rather terse and easy to misinterpret, so natural language can help break through all that, leading to quick understanding. Please consider editing to add more info.Mayemayeda
@Hegelianism did you check my answer?Dapple
T
3

You don't need group by or having. Sub-selects will do the job just fine.

SELECT * FROM MyTable a
WHERE  (SELECT Count(*) FROM MyTable b WHERE  a.val = b.val) >= 2 
AND (SELECT Count(*)  FROM MyTable c WHERE  a.val2 = c.val2) = 1;

This looks at the table as if it was 3 identical tables, but it's just one. The first sub select

(SELECT Count(*) FROM MyTable b WHERE a.val = b.val)

returns a number containing how many occurrences of "Val" are in the table; if there are at least 2 we're good to go. The second sub select

(SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2)

returns a number containing how many occurrences of "Val2" are in the table; if it's 1 and the first sub select returns at least 2 then we print the record.

Tergiversate answered 14/7, 2020 at 22:1 Comment(0)
F
2

If you want a solution, i think this will help.

I got the val2s which has no duplicates vals which has more than 1 duplicates and join

Select t.* from 
table1 t
inner join 
(Select val2 from table1 group by val2 having count(*) = 1) tv2 on t.val2 = tv2.val2
inner join 
(Select val from table1 group by val having count(*) > 1) tv on t.val = tv.val; 
Fluecure answered 7/7, 2020 at 2:44 Comment(3)
this is same with my query result, see this fiddle dbfiddle.uk/…. see the end of the question, i added the expected resultHegelianism
Your fiddle data and the data in the question samples do not match. Foe ex. Id = 2,4 are not in fiddle.Fluecure
still same results, i have updated my fiddle, here is fiddle of your query syntax dbfiddle.uk/…Hegelianism
S
2

You can do it with EXISTS and NOT EXISTS.

If you want only the column Val:

select t1.val from table1 t1
where not exists (
  select 1 from table1 
  where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
group by t1.val
having count(t1.val) > 1

If you want full rows:

select t1.* from table1 t1
where exists (select 1 from table1 where id <> t1.id and val = t1.val)
and not exists (
  select 1 from table1 
  where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)

And one solution with window functions for MySql 8.0+:

select t.id, t.val, t.val2
from (
  select *, max(counter2) over (partition by val) countermax
  from (
    select *,
      count(*) over (partition by val) counter,
      count(*) over (partition by val2) counter2
    from table1
  ) t
) t 
where t.counter > 1 and t.countermax = 1 

See the demo.

Substituent answered 7/7, 2020 at 10:51 Comment(14)
I know this query seems work, but why count val2 that more than 1 ? i thought the duplicate value is column val, not column val2Hegelianism
select val2 from table1 group by val2 having count(*) > 1 returns all the val2s that are duplicated and these are the ones that must be excluded.Substituent
OHHHHH i didnt realizedHegelianism
i have another question, where is the query that search for column val that had duplicate value more than 1 ? i couldnt find itHegelianism
This is what EXISTS does: WHERE exists (select 1 from table1 where id <> t1.id and val = t1.val)Substituent
how could i do it reverse ? i mean if Val had no duplicates value and Val2 had duplicate value more than 1 ?Hegelianism
i have tried this, but it's run very slow on table with huge data ( i have 1 Milion data ), it tooks 3000S to processHegelianism
i suggested you to mark this as answer, the problem with the speed on query are the other problem.Hirokohiroshi
@FachryDzaky i will wait for another answer, if there is no other answer, i will give the bounty to forpassHegelianism
if you run this on mysql, i suggest you to make index or make it in store procedure if the query run slowHirokohiroshi
@FachryDzaky i just want to see other people answer that's all, maybe there is better approach or some interesting way to solve my problem, i did upvoted the answer thoHegelianism
no, i meant, if thee issues also about performance of run query (because there's a lot of data) i suggest you to also made index, see drupal.org/docs/7/guidelines-for-sql/…Hirokohiroshi
my apologize, i misunderstood the context of your comment beforeHegelianism
I'm skeptical of the performance of NOT EXISTS ( ... IN ( ... HAVING ) ). I worry that the SELECT .. HAVING will be repeatedly be executed. (Can you get the EXPLAIN?) suggest you try to turn it inside out so the SELECT ... HAVING is performed only once.Proverbs
K
2

Common Table Expressions may help readability and perhaps performance as well.

with dup as (select  val, count(*) -- two or more of val
             from table1
             group by val
             having count(*)>1)   
select  tb1.* 
from table1 tb1
 inner join dup
  on dup.val = tb1.val
where not exists (select val2, count(*) -- Not exists is generally fast
                  from table1
                  where val = tb1.val
                  group by 1
                  having count(*) > 1)

Fiddle

Keirakeiser answered 13/7, 2020 at 21:48 Comment(0)
D
2

I'm going through your dataset at the moment, and I feel like your final result is accurate when you compare the results to your original dataset. Your criteria used are:

  1. Val is duplicated at least once
  2. Val2 is unique

9999992222211111 is the only unique value in the Val list, so that's the only value I don't expect to see in the final result. For Val2, the only duplicated value is 3175032612900004, so I don't expect to see in the final result.

What it sounds like you're trying to do is to apply the original conditions to your final result table (which is different from your original data table). If that's what you're after, you can go through the same process applied to the original table to your new table, in which you'll get the exact result you want.

I've taken that and included all of this in my fiddle below. You'll see two output queries, one with the result you're seeing, and one with the result you want. Let me know if this answers your question! =)

Here's my fiddle: fiddle

Dredi answered 14/7, 2020 at 23:51 Comment(0)
P
2

The answer to your query

Is there something wrong with my query ?

is in your Note 2 of Sample 3

NOte 2 : for Id 222222 and 231233 that had duplicate value, this is still false, because the column Val2 with ID 231233 had the same value with ID 606542 and 606541 (3175032612900004), so it didnt match the second condition which only have no duplicate value

You are not eliminating the records where Val2 is duplicate with another record outside the set. So, all you need to do in your query is to add the below condition

AND tb.Val NOT IN (SELECT t.Val
               FROM table1 t 
               WHERE t.Val2 IN (SELECT Val2 FROM table1 GROUP BY Val2 HAVING count( Val2 ) > 1 ))

I have added this condition to your query and see the expected results. See fiddle below

My Fiddle

The answer given by @Govind feels like a better re-write of your requirements. It is checking for the duplicates of Val column only when there are no duplicates in Val2 column. Very neat and concise query.

Answer by Govind

Potbellied answered 15/7, 2020 at 8:9 Comment(0)
W
1

Something like this?

SELECT *
  FROM table1
 WHERE val IN
       (SELECT val
          FROM table1
         GROUP BY val
        HAVING COUNT(*) > 1 AND COUNT(DISTINCT val2) = COUNT(*))
   AND val NOT IN (SELECT t.val
                     FROM table1 t
                    INNER JOIN (SELECT val2
                                 FROM table1
                                GROUP BY val2
                               HAVING COUNT(*) > 1) x
                       ON x.val2 = t.val2);
Warfore answered 15/7, 2020 at 13:57 Comment(0)
N
0
`select val, count(*) from table1 group by val having count(*)>=2;`

`val                count(*)`

`1111111111111111   2`

`3173012102121018   2`

`3175031503131004   3`

`900                2`
  1. Val column has at least two or more duplicate values - TRUE

select val2, count(*) from table1 group by val2 having count(*)>1;

`val2   count(*)`
`3175032612900004   3`
  1. Val2 column has no duplicate value (unique) - FALSE

So ideally you should get no records found right?

Neolatin answered 14/7, 2020 at 14:46 Comment(2)
what do you mean by this ?Hegelianism
from your SQL fiddler, I found duplicates for val2, so answer for the given content is no records, did I get that right?Neolatin

© 2022 - 2024 — McMap. All rights reserved.