SQL Server Query failing when executed from R
Asked Answered
M

3

6

I have the following simple SQL server code:

set nocount on;
if OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;

set nocount on;
create table #A
( obj_id int,
  obj_name varchar(50),
  obj_dt   datetime);

  insert into #A (
  obj_id,
  obj_name,
  obj_dt)
  values
  ( 1
   ,'name'
   ,'2019-01-01 00:00:00'
  ),
  ( 2
   ,NULL
   ,NULL
  ),
  ( 2
   ,'alias'
   ,'2019-02-01 00:00:00'
  );

set nocount on;
if OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;

set nocount on;
select 
 #A.obj_id
,subq.obj_name
,subq.obj_dt
into #B 
from #A
join (select
        obj_id,
        max(obj_name) as obj_name,
        max(obj_dt) as obj_dt
      from #A
      group by obj_id) as subq
on #A.obj_id = subq.obj_id;

set nocount on;
select * from #B;

which, as expected, returns the following data when executed in Microsoft SQL Server Management Studio:

obj_id  obj_name    obj_dt
1       name        2019-01-01 00:00:00.000
2       alias       2019-02-01 00:00:00.000
2       alias       2019-02-01 00:00:00.000

So far so good. Now I wish to run this code from R and have that same output returned to R. I have the query above stored in the string query and my RODBC connection stored in the variable connection, and attempt to retrieve the data with

sqlQuery(connection,query)

The result is character(0). However, if I modify the query above by commenting out the subq.obj_name and subq.obj_dt fields in the definition of #B then the code successfully returns the expected dataset

obj_id
1      1
2      2
3      2

from R.

So what is going on here? Both sql queries are valid and run successfully in the Microsoft SQL server environment, but only one works when piped through R. I can't figure out what accounts for the failure of the RODBC code to handle the second query.

Mola answered 7/3, 2019 at 13:39 Comment(2)
Not an answer, but you should definitely try to see if you can get error output from SQL Server, via your R script.Inclination
Typically if there is an error message it will be stored in the output variable. In this case, it just produces character(0) so I'm thinking there is no actual error but rather that just nothing is being returned? Not sure though ...Mola
M
0

Okay, so I think I've figured out what is going wrong here. The subquery

select
    obj_id,
    max(obj_name) as obj_name,
    max(obj_dt) as obj_dt
    from #A
group by obj_id

produces a hidden warning. You don't see the warning if you just run the code as is, but if you store the output in a temporary table then the warning message is produced:

select
    obj_id,
    max(obj_name) as obj_name,
    max(obj_dt) as obj_dt
    into #C
    from #A
group by obj_id

Warning: Null value is eliminated by an aggregate or other SET operation.

The warning is hidden when this is run as part of the subquery in the original SQL code in the question. I believe this message is somehow part of output that R is "seeing", and once R sees that output it terminates the query. But since no results have been returned yet the output in R is empty (i.e. character(0)).

To solve this issue I coalesced the variables that I'm computing the max of to some minimal values (I'm not sure about what the minimal character is in sql server collation but '0' worked for my purposes). The idea is to remove NULL values before aggregation so no warning is generated. The final working SQL code is below:

set nocount on;
if OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;

set nocount on;
create table #A
( obj_id int,
  obj_name varchar(50),
  obj_dt   datetime);

  insert into #A (
  obj_id,
  obj_name,
  obj_dt)
  values
  ( 1
   ,'name'
   ,'2019-01-01 00:00:00'
  ),
  ( 2
   ,NULL
   ,NULL
  ),
  ( 2
   ,'alias'
   ,'2019-02-01 00:00:00'
  );

set nocount on;
if OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;

set nocount on;
select 
 #A.obj_id
,subq.obj_name
,subq.obj_dt
into #B 
from #A
join
(select
        obj_id,
        max(isnull(obj_name,'0')) as obj_name,
        max(isnull(obj_dt,cast(-1 as datetime))) as obj_dt
      from #A
      group by obj_id) as subq 
on #A.obj_id = subq.obj_id;

set nocount on;
select * from #B; 

I believe this behavior should be addressed in the RODBC package as it is quite likely to trip up others and can be a bit tricky to track down the root cause and troubleshoot.

Mola answered 7/3, 2019 at 14:46 Comment(0)
D
1

It's a known issue about local temporary table (#mytable not ##mytable) and not only in R but all external calls of temporary table even in Microsoft Tools like SSMS (see the comment of the 1st link below).

Look at those links:

After reading those links It's therefore strange that it does work without subq.obj_name and subq.obj_dt: maybe it works because the queries are in a unique call.

Desiccate answered 7/3, 2019 at 14:1 Comment(3)
The links you provided indicate that the problem is returned row counts causing RODBC to behave as if the entire query is complete, and that the workaround is to put set nocount at the beginning of each statement. You'll notice that my code above does that, so I believe that this is a distinct issue from the ones referenced at these links (which I have already previously encountered and dealt with as indicated). I'm guessing the RODBC package is no longer actively developed?Mola
I've switched since Nov'18 from RODBC to DBI and RStudio/odbc, and many of my problems has gone, but in this case it's related to multi-queries in one or many external calls and not only in R but also in SSMS. Perhaps it will resolve your issue, but I doubt. Tell us if it's the case. I didn't try again to use #table after my switch from RODBC to to DBI and RStudio/odbc.Desiccate
Yeah, I've been thinking about switching as well but its a daunting thought given how integrated the RODBC approach is in all my assets. When time permits I'll explore other options for sure. In this case though, I've identified the root cause and implemented a workaround (see answer below), and I believe the issue is tied to the links in your answer as it seems to relate to R terminating the query once it sees any kind of output.Mola
M
0

Okay, so I think I've figured out what is going wrong here. The subquery

select
    obj_id,
    max(obj_name) as obj_name,
    max(obj_dt) as obj_dt
    from #A
group by obj_id

produces a hidden warning. You don't see the warning if you just run the code as is, but if you store the output in a temporary table then the warning message is produced:

select
    obj_id,
    max(obj_name) as obj_name,
    max(obj_dt) as obj_dt
    into #C
    from #A
group by obj_id

Warning: Null value is eliminated by an aggregate or other SET operation.

The warning is hidden when this is run as part of the subquery in the original SQL code in the question. I believe this message is somehow part of output that R is "seeing", and once R sees that output it terminates the query. But since no results have been returned yet the output in R is empty (i.e. character(0)).

To solve this issue I coalesced the variables that I'm computing the max of to some minimal values (I'm not sure about what the minimal character is in sql server collation but '0' worked for my purposes). The idea is to remove NULL values before aggregation so no warning is generated. The final working SQL code is below:

set nocount on;
if OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A;

set nocount on;
create table #A
( obj_id int,
  obj_name varchar(50),
  obj_dt   datetime);

  insert into #A (
  obj_id,
  obj_name,
  obj_dt)
  values
  ( 1
   ,'name'
   ,'2019-01-01 00:00:00'
  ),
  ( 2
   ,NULL
   ,NULL
  ),
  ( 2
   ,'alias'
   ,'2019-02-01 00:00:00'
  );

set nocount on;
if OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B;

set nocount on;
select 
 #A.obj_id
,subq.obj_name
,subq.obj_dt
into #B 
from #A
join
(select
        obj_id,
        max(isnull(obj_name,'0')) as obj_name,
        max(isnull(obj_dt,cast(-1 as datetime))) as obj_dt
      from #A
      group by obj_id) as subq 
on #A.obj_id = subq.obj_id;

set nocount on;
select * from #B; 

I believe this behavior should be addressed in the RODBC package as it is quite likely to trip up others and can be a bit tricky to track down the root cause and troubleshoot.

Mola answered 7/3, 2019 at 14:46 Comment(0)
F
0

As an expansion on Rookatu's answer you can turn off the warnings by adding the below to the beginning of your query:

SET ANSI_WARNINGS OFF

Of course this may not be ideal if you are trying to capture other warnings, but can be an easier to implement solution in a pinch.

Flash answered 31/1, 2022 at 18:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.