SQL query to return one single record for each unique value in a column
Asked Answered
L

14

8

I have a table in SQL Server 2000 that I am trying to query in a specific way. The best way to show this is with example data.

Behold, [Addresses]:

Name         Street                 City          State
--------------------------------------------------------
Bob          123 Fake Street        Peoria        IL
Bob          234 Other Street       Fargo         ND
Jim          345 Main Street        St Louis      MO

This is actually a simplified example of the structure of the actual table. The structure of the table is completely beyond my control. I need a query that will return a single address per name. It doesn't matter which address, just that there is only one. The result could be this:

Name         Street                 City          State
--------------------------------------------------------
Bob          123 Fake Street        Peoria        IL
Jim          345 Main Street        St Louis      MO

I found a similar question here, but none of the solutions given work in my case because I do not have access to CROSS APPLY, and calling MIN() on each column will mix different addresses together, and although I don't care which record is returned, it must be one intact row, not a mix of different rows.

Recommendations to change the table structure will not help me. I agree that this table is terrible, (it's worse than shown here) but this is part of a major ERP database that I can not change.

There are about 3000 records in this table. There is no primary key.

Any ideas?

Lamrouex answered 11/6, 2009 at 18:52 Comment(4)
Can you give an idea how many records in your table? I have some ideas on doing this but may not be very fast if there are thousands / millions of records.Wellordered
Do you have any primary keys on this table?Slaphappy
~3000 records, and no PK, amazingly enough. I added this info to the question.Lamrouex
ok, check my answer, I added a distinctInvolucre
O
4

Well, this will give you pretty bad performance, but I think it'll work

SELECT t.Name, t.Street, t.City, t.State
FROM table t 
INNER JOIN (
     SELECT m.Name, MIN(m.Street + ';' + m.City  + ';' + m.State) AS comb
     FROM table m
     GROUP BY m.Name
) x
   ON  x.Name = t.Name
   AND x.comb = t.Street + ';' + t.City  + ';' + t.State
Onetime answered 11/6, 2009 at 19:0 Comment(11)
Unfortunately, there is no unique id field for this table. Yes, I know. This sucks.Lamrouex
I think those are part of the street name. It doesn't appear to have any keys.Slaphappy
123, 234, etc. are not IDs, they are part of the street address. The problem as given doesn't include an ID field in the table, which is a pity because this solution is great otherwise.Aaronson
You can create a unique id by using a temp table / table variable. Eg: DECLARE @data TABLE (id int IDENTITY(1,1), ...) And then, insert all the data from the table into the temp table / table variable, and then use this script.Courser
See my version below on a way to change this into something that might work without keys.Slaphappy
Doh, keep forgetting you have the street numbers FIRST in the street address :-)Onetime
@Tadmas: the problem here is now we've gone past simple queries and now we have to do this in a Procedure where before we could do it in a query or view.Slaphappy
@Orion Adrian: Correctness is more important than convenience.Courser
Changed the query to accomodate for the missing id fieldOnetime
@Tadmas: Except when you don't have permissions to change what you need to change for correctness.Slaphappy
NULLs will make this solution much longer.Brocky
H
3

Use a temp table or table variable and select a distinct list of names into that. Use that structure then to select the top 1 of each record in the original table for each distinct name.

Homoousian answered 11/6, 2009 at 19:15 Comment(3)
The performance for this would be very poor as you would have to use a cursor for this, unless you have code for it.Slaphappy
Looking at the source table I don't think performance is the issue.Homoousian
I should have read all the answers first because this is the same as my solution. The table isn't indexed and there's only 3000 records so I don't think a cursor will be significantly slower than a strictly SQL solution. This also doesn't strike me as a query that will be run frequently.Sidran
O
3

If you can use a temp table:

select * -- Create and populate temp table 
into #Addresses
from Addresses 

alter table #Addresses add PK int identity(1, 1) primary key

select Name, Street, City, State 
-- Explicitly name columns here to not return the PK
from #Addresses A
where not exists 
    (select *
    from #Addresses B
    where B.Name = A.Name
    and A.PK > B.PK)

This solution would not be advisable for much larger tables.

Octangular answered 11/6, 2009 at 19:40 Comment(1)
++1. This is an excellent answer. Short and sweet. It doesn't require you to list out all the field names individually, and it complete avoids the issue of nullable field comparison. One correlated subquery, no aggregates, no joins. For 3000 rows, this is a) the least coding, b) good performance, and c) fool-proof results. Beautiful!Brocky
I
2
select distinct Name , street,city,state
from table t1 where street =  
(select min(street) from table t2 where t2.name = t1.name)
Involucre answered 11/6, 2009 at 19:2 Comment(8)
That won't work: You can have the same street address in multiple cities.Courser
There are actually examples of multiple rows that contain the same address, so in those cases, I'd still get duplicates.Lamrouex
The same address for the same Name ? if name + address is unique, this will work, I thinkInvolucre
Yes, there are some examples of the same address occurring twice for the same name.Lamrouex
Yeah, it will work if they are unique, but that's not a valid assumption. The odds of a name and street address collision are not high, but it's still nonzero.Courser
@Tadmas: Really to get what you want you need to either create a primary key, get the row index, or create a hash of all the fields together to prevent collisions. Then again I don't think the OP is really sorting on just name as name/name collisions are very likely.Slaphappy
There really isn't a "name" field. This is an analogy for the structure of the table. And I do not have the ability to add any indexes or columns.Lamrouex
With DISTINCT, this looks promising. I think this will probably work.Lamrouex
G
2

select Name , street,city,state FROM( select Name , street,city,state, ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) AS rn from table) AS t WHERE rn=1

Gigantopithecus answered 11/6, 2009 at 19:14 Comment(0)
W
1

A temporary table solution would be as follows

CREATE Table #Addresses
(
    MyId int IDENTITY(1,1),
    [Name] NVARCHAR(50),
    Street NVARCHAR(50),
    City NVARCHAR(50),
    State NVARCHAR(50)
)

INSERT INTO #Addresses ([Name], Street, City, State) SELECT [Name], Street, City, State FROM Addresses

SELECT
    Addresses1.[Name],
    Addresses1.Street,
    Addresses1.City,
    Addresses1.State
FROM
    #Addresses Addresses1
WHERE
    Addresses1.MyId =
(
    SELECT
        MIN(MyId)
    FROM
        #Addresses Addresses2
    WHERE
        Addresses2.[Name] = Addresses1.[Name]
)

DROP TABLE #Addresses
Wellordered answered 11/6, 2009 at 19:38 Comment(2)
This doesn't work because address components don't always increase or decrease together. For example, 123 < 234, but Peoria > Fargo.Lamrouex
Ok, tried it with your dataset and indeed, seems my logic was very wrong. Have left just the temporary table solution though which works fine for me.Wellordered
F
1

This is ugly as hell, but it sounds like your predicament is ugly, too... so here goes...

select  name,
    (select top 1 street from [Addresses] a1 where a1.name = a0.name) as street,
    (select top 1 city from [Addresses] a2 where a2.name = a0.name) as city,
    (select top 1 state from [Addresses] a3 where a3.name = a0.name) as state
from    (select distinct name from [Addresses]) as a0
Frau answered 11/6, 2009 at 19:40 Comment(7)
I was writing this solution in my head as I was reading through the previous answers. It isn't a slick or pretty answer but it should work. Another option here would be to change the FROM subquery to a GROUP BY to improve performance slightly over DISTINCT.Ethnarch
Is this guaranteed not to mix adresses?Lamrouex
needs and ORDER BY I think to be safeOnetime
This is wrong - you may end up with city and street from different addressesGigantopithecus
Yes, Alex and recursive, the chance that it could mix up is there. However, by selecting TOP 1 on the same table using the same index should return the same row almost without fail. Depending on the size of the table, only a high frequency of updates and deletes should cause a mix up. So if this is not a high-volume OLTP table that the query is being run on, you could apply a lock and virtually guarantee yourself the same row every time.Frau
Joe, to be safe you need to add an ORDER BY clause on something unique. Even without concurrency, different subselects can go against different indexes.Gigantopithecus
Without candidate keys, you would have to order by the whole table to assure consistency.Brocky
S
1

I think this is a good candidate for a cursor based solution. It's been so long since I've used a cursor that I won't attempt to write the T-SQL but here's the idea:

  1. Create temp table with same schema as Addresses
  2. Select distinct Names into cursor
  3. Loop through cursor selecting top 1 from Addresses into temp table for each distinct Name
  4. Return select from temp table
Sidran answered 11/6, 2009 at 20:7 Comment(0)
S
0

A slight modification on the above should work.

SELECT Name, Street, City, State
FROM table t 
INNER JOIN (
     SELECT Name, MIN(Street) AS Street
     FROM table m
     GROUP BY Name
) x
   ON x.Name = t.Name AND x.Street = t.Street

Now this won't work if you have the same street but the other pieces of information are different (e.g. with typos).

OR a more complete hash would include all the fields (but you likely have too many for performance):

SELECT Name, Street, City, State
FROM table t 
INNER JOIN (
     SELECT Name, MIN(Street + '|' + City  + '|' + State) AS key
     FROM table m
     GROUP BY Name
) x
   ON  x.Name = t.Name
   AND x.key = Street + '|' + City  + '|' + State
Slaphappy answered 11/6, 2009 at 19:11 Comment(3)
I checked this. Unfortunately, there are some examples of records that have the same street.Lamrouex
It would have to be name and street for there to be a problem. You can add more columns to reduce the likelihood of collision, but overall it's not a great solution to the problem. If at all possible getting a primary key on the records is better.Slaphappy
Close, except "table" x doesn't have a field named Street. You would need something like "SELECT Name, MIN(Street) a" and "ON x.Name = t.Name AND x.a = t.Street"Poltergeist
H
0

I don't think that you can do that, given your constraints. You can pull out distinct combinations of those fields. But if someone spelled Bob and Bobb with the same address you'd end up with two records. [GIGO] You are correct that any grouping (short of grouping on all of the fields-equivalent to DISTINCT) will mix rows. It's too bad that you don't have a unique identifier for each customer.

You might be able to nest queries together in such as way as to select the top 1 for each name and join all of those together.

Hydrocellulose answered 11/6, 2009 at 19:14 Comment(0)
T
0
SELECT name,
       ( SELECT TOP 1 street, city, state
           FROM addresses b
          WHERE a.name = b.name )
  FROM addresses a
 GROUP BY name
Tyche answered 11/6, 2009 at 19:25 Comment(0)
S
0
SELECT name, street, address, state
FROM
 (SELECT name, street, address, state,
  DENSE_RANK() OVER (PARTITION BY name ORDER BY street DESC) AS r 
 FROM tbl) AS t
WHERE r = 1; 
Stander answered 11/6, 2009 at 19:35 Comment(1)
SQL 2000 does not support DENSE_RANKLamrouex
W
0

And still another way:

-- build a sample table  
DECLARE @T TABLE (Name VARCHAR(50),Street VARCHAR(50),City VARCHAR(50),State VARCHAR(50))  
INSERT INTO @T   
SELECT 'Bob','123 Fake Street','Peoria','IL' UNION  
SELECT 'Bob','234 Other Street','Fargo','ND' UNION  
SELECT 'Jim','345 Main Street','St Louis','MO' UNION  
SELECT 'Fred','234 Other Street','Fargo','ND'  

-- here is all you do to get the unique record  
SELECT * FROM @T a WHERE (SELECT COUNT(*) FROM @T b WHERE a.Name = b.name and a.street <= b.street) = 1
Winegrower answered 11/6, 2009 at 20:34 Comment(0)
B
0
select c.*, b.* from companies c left outer join 
(SELECT *,
    ROW_NUMBER()
        OVER(PARTITION BY FKID ORDER BY PKId) AS Seq
 FROM Contacts) b on b.FKID = c.PKID and b.Seq = 1
Bounden answered 11/6, 2012 at 9:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.