Difference between PROC SQL and sqldf
Asked Answered
S

2

6

I am in the midst of re-writing a SAS program I normally use into R. Many statements in the SAS program are written with proc sql, so for ease of programming I am replicating those statements with sqldf in R. All of the sql code works the same with sqldf as it does in SAS, except for this statement:

SAS:

proc sql;
create table merged32 as
select max(ctf) as ctf,
       max(ctms) as ctms,
       max(site_name) as site_name,
       provider_id, npi,
       first_name, middle_name, last_name, specialty,
       address_1, city, state, site_ct, zip, site_key
    from merged2
    group by 9,10,11,12,14,15;
quit;
run;

sqldf:

sqldf("select max(ctf) as ctf,
              max(ctms) as ctms,
              max(site_name) as site_name,
              provider_id, npi,
              first_name, middle_name, last_name, specialty,
              address_1, city, state, site_ct, zip, site_key
       from merged2
       group by 9,10,11,12,14,15")

In SAS, it returns a dataset with 1878 rows; in sqldf, a dataframe with 1375.

Are there any major differences between proc sql and sqldf that could be causing this, or in general that need to be considered? I didn't provide data because 1) the datasets are huge, and 2) I'm more just interested in knowing the differences between the two sql systems.

Stronghold answered 2/8, 2012 at 19:4 Comment(4)
Not sure about sqldf, but for meaningful results, you should group by all your key variables in SQL.Quimper
I never knew why I tend to put the "key" variables first, and the aggregates last. now I know...Schapira
shoot, didn't even notice the missing key variables in the list. Thanks guysStronghold
@user1445246, has this question been answered through the inputs in the comments? If so, please feel free to add your solution as an answer and mark it as accepted to help remove it from the "unanswered question" queue.Pedropedrotti
Q
3

Not sure about sqldf, but for meaningful results, you should group by all your key variables in SQL.

Quimper answered 30/10, 2012 at 18:43 Comment(0)
C
0

The results in SAS maybe have duplicated records, while those in R do not

Crassulaceous answered 23/1, 2014 at 18:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.