SQL INNER QUERY returns more than one value in an UPDATE query

Asked 23/8, 2012 at 15:2 Answered 23/8, 2012 at 16:23

Solved sql sql-server sql-update inner-query

I have a table to be updated using records of another table and I am doing this to bring over information from one system (database) to another. The scenario is bit complex, but I desperately need help :-s

There are 3 tables - component , scan and stage_link

component

component_id  stage_id
------------  --------
1              NULL
2              NULL
3              NULL
4              NULL
5              NULL

scan

scan_id  component_id  scanner_id           date_scanned
-------  ------------  ----------  -----------------------
 1         1           scanner_a    2012-01-01 07:25:15.125
 2         1           scanner_b    2012-01-02 08:14:05.456
 3         2           scanner_a    2012-01-01 12:05:45.465
 4         3           scanner_a    2012-01-01 19:45:12.536
 5         1           scanner_c    2012-01-03 23:33:54.243
 6         2           scanner_b    2012-01-02 11:59:12.545

stage_link

stage_link_id  scanner_id  stage_id
    -------     ----------  ----------  
       1         scanner_a    1   
       2         scanner_b    1    
       3         scanner_c    2    
       4         scanner_d    2    
       5         scanner_e    2   
       6         scanner_f    3

I need to update the table component and set the field stage_id according to the latest scan. Each scan takes the component to a stage according to the scanner involved. I have written the following query in order to update the table component, but it throws an error saying;

Subquery returned more than 1 value. This is not permitted when the subquery follows '='

The query is;

UPDATE component
SET stage_id = (select stage_id
                from(
                    select scn.scanner_id, sl.stage_id
                    from scan scn
                    INNER JOIN stage_link sl ON scn.scanner_id = sl.scanner_id
                    where scn.date_scanned = (  select temp_a.max_date 
                                                from (  SELECT x.component_id, MAX(x.date_scanned) as max_date
                                                        FROM scan x
                                                        where component_id = x.component_id 
                                                        GROUP BY x.component_id
                                                      ) as temp_a
                                                where component_id = temp_a.component_id)
                    ) as temp_b
                )

I am working on MS SQL Server and want to sort this out using no PHP or any other language.

I have tried for a day to make this work but still didn't get a way to make this work. Any help would be highly appreciated!

Thank you very much in advance :-)

Limon answered 23/8, 2012 at 15:2 Comment(0)

Check this out without using correlated subqueries:

UPDATE  Com
SET     stage_id = Temp4.stage_id
FROM    dbo.component Com
        INNER JOIN 
        ( 
            SELECT Temp2.component_id ,SL.stage_id
            FROM   dbo.stage_link SL
            INNER JOIN (
                            SELECT component_id ,scanner_id
                            FROM   scan
                            WHERE  date_scanned IN (
                                SELECT  MaxScanDate
                                FROM    
                                ( 
                                    SELECT component_id , MAX(date_scanned) MaxScanDate
                                    FROM scan
                                    GROUP BY component_id
                                ) Temp 
                            )
                        ) Temp2 ON Temp2.scanner_id = SL.scanner_id
        ) Temp4 ON Com.component_id = Temp4.component_id

The output:

component_id stage_id
------------ -----------
1            2
2            1
3            1
4            NULL
5            NULL

Oceanic answered 23/8, 2012 at 16:23 Comment(2)

Thank you very much for this suggestion!! it came in great help!! This is exactly what I wanted!!! :-) thank you very much again!!!!! – Limon 23/8, 2012 at 16:58

@mithilatw: You are welcome. Please note that there might be 2 issues with this query (for future reference): (1) This assumes that the date_scanned is always unique. If 2 scanners scan at the same time (upto the nanosecond part), then the SELECT component_id ,scanner_id would return 2 values. (2) The fields in the join Temp2.scanner_id = SL.scanner_id are char based columns, hence if you have many scanner_ids, then this query might start slowing down. – Oceanic 23/8, 2012 at 17:56

Well, your subquery is returning more than one value. One easy way is to do an aggregation:

SET stage_id = (select max(stage_id)
. . .

A likely reason is because there is more than one scan on the most recent date. Since you can only choose one, given the context, then the MIN or MAX should suffice.

However, I think the real reason is that you don't have the right aliases for he correlated subquery. I think these lines:

where component_id = x.component_id
where component_id = temp_a.component_id

Should include aliases, probably being:

where component.component_id = x.component_id
where component.component_id = temp_a.component_id

If this is not sufficient, you need to explain what you want. Do you want the query to return a random scan from the most recent date? Do you want to update component for all scans on the most recent date?

You need to investigate this further. Try something like this:

select scn.scanner_id, sl.stage_id, count(*)
from scan scn INNER JOIN
     stage_link sl
     ON scn.scanner_id = sl.scanner_id join
     (SELECT x.component_id,
             MAX(x.date_scanned) as max_date
      FROM scan x
      GROUP BY x.component_id
     ) cmax
     on scn.component_id = cmax.component_id
where scn.date_scanned = cmax.maxdate 
group by scn.scanner_id, sl.stage_id
order by count(*) desc

Singley answered 23/8, 2012 at 15:12 Comment(3)

Thank you very much for the answer!! I am trying to update all the components of the table component by looking at the scans which are done on one component at a time. Therefore, there can only be one scan on the max date when grouped by component. The issue with your suggestion is the accuracy of information. And the maximum numeric value of stage_id is not the latest stage the component could be (sorry this scenario is this much complex!) My problem is, there can only be one scan at the max date per component and it still returns more than 1 :-( – Limon 23/8, 2012 at 15:23

@Limon . . . Take the subquery out of the update and start investigating where the duplicate occurs. It could be a duplicate in one of the tables by scanner_id or by component. Experience suggests, though, that duplicates on the same date is a very likely culprit. – Singley 23/8, 2012 at 15:26

Tried this, didn't work. There should be a better way than using correlated subqueries. – Oceanic 23/8, 2012 at 15:34

This needed an OLAP function to work:

UPDATE Component SET Component.stage_id = Stage_Link.stage_id
FROM Component
JOIN (SELECT component_id, scanner_id, 
             ROW_NUMBER() OVER(PARTITION BY component_id 
                               ORDER BY date_scanned DESC) rownum
      FROM Scan) Scan
  ON Scan.component_id = Component.component_id
     AND Scan.rownum = 1
JOIN Stage_Link
  ON Stage_Link.scanner_id = Scan.scanner_id
WHERE Component.stage_id IS NULL

Which generates a result set of:

Component
component_id   stage_id
========================
1              2
2              1
3              1
4              null
5              null

(I also have a working SQL Fiddle example.)

Lovemaking answered 23/8, 2012 at 15:35 Comment(3)

Thank you very much for your time! I tried this suggestion in my database and the query ran for more than 1 hour, but still didn't return a result. I see the idea you try to implement and I do agree with it.. I don't understand why my database wouldn't accept it :-P – Limon 23/8, 2012 at 16:51

#winces# Actually, your problem is that your database did accept it - it's just not running as fast as it could be. It's probably running the value retrieval for each line. This isn't a problem when it's small (like for the test data), but obviously doesn't scale well. Doesn't SQL Server support CTEs in UPDATE statements? You might try to get at least the part with ROW_NUMBER() into a CTE, so that it only gets run once (this is likely to be the best savings). Alternatively, dump the results of the sub-select into a temporary table, and run a correlated update from that. – Lovemaking 23/8, 2012 at 17:22

@X-Zero: I think you meant correlated subquery instead of CTE. SQL Server supports both, but with correlated subqueries, as you mentioned, it evaluates for every row of Component and hence not exactly set-based which would explain why it takes a long time. – Oceanic 23/8, 2012 at 17:32

Recommended topics

Hot tags