How to import a CSV file with delimiter as ";" and decimal separator as "," into SAS?

Asked 5/2, 2015 at 15:6 Answered 1/7, 2016 at 15:34

I`ve got (and will receive in the future) many CSV files that use the semicolon as delimiter and the comma as decimal separator. So far I could not find out how to import these files into SAS using proc import -- or in any other automated fashion without the need for messing around with the variable names manually.

Create some sample data:

%let filename = %sysfunc(pathname(work))\sap.csv;

data _null_;
  file "&filename";
  put 'a;b';
  put '12345,11;67890,66';
run;

The import code:

proc import out = sap01 
datafile= "&filename"
dbms = dlm; 
delimiter = ";";
GETNAMES = YES; 
run;

After the import a value for the variable "AMOUNT" such as 350,58 (which corresponds to 350.58 in the US format) would look like 35,058 (meaning thirtyfivethousand...) in SAS (and after re-export to the German EXCEL it would look like 35.058,00). A simple but dirty workaround would be the following:

data sap02; set sap01;
AMOUNT = AMOUNT/100;
format AMOUNT best15.2;
run;

I wonder if there is a simple way to define the decimal separator for the CVS-import (similar to the specification of the delimiter). ..or any other "cleaner" solution compared to my workaround. Many thanks in advance!

Sandblind answered 5/2, 2015 at 15:6 Comment(0)

You technically should use dbms=dlm not dbms=csv, though it does figure things out. CSV means "Comma separated values", while DLM means "delimited", which is correct here.

I don't think there's a direct way to make SAS read in with the comma via PROC IMPORT. You need to tell SAS to use the NUMXw.d informat when reading in the data, and I don't see a way to force that setting in SAS. (There's an option for output with a comma, NLDECSEPARATOR, but I don't think that works here.)

Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program; then for each of the read-in records add :NUMX10. or whatever the appropriate maximum width of the field is. It will end up looking something like this:

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  input
    firstnumvar :NUMX10.
    secondnumvar :NUMX10.
    thirdnumvar :NUMX10.
    fourthnumvar :NUMX10.
    charvar :$15.
    charvar2 :$15.
  ;
run;

It will also generate lots of informat and format code; you can alternately convert the informats to NUMX10. instead of BEST. instead of adding the informat to the read-in. You can also just remove the informats, unless you have date fields.

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  informat firstnumvar secondnumvar thirdnumvar fourthnumvar NUMX10.;
  informat charvar $15.;
  format  firstnumvar secondnumvar thirdnumvar fourthnumvar BEST12.;
  format charvar $15.;
  input
    firstnumvar
    secondnumvar
    thirdnumvar
    fourthnumvar
    charvar $
  ;
run;

Ontina answered 5/2, 2015 at 15:10 Comment(6)

Thanks for your help, Joe! I've just edited my post wrt the dbms option. Your other ideas are quite interesting but still of the workaround-kind. Hard to believe that SAS cant handle such a mundane problem. But I fear that you might be right... – Sandblind 5/2, 2015 at 15:32

SAS can handle it and Joe showed you how. PROC IMPORT is nothing but a utility to read in basic files -- it writes a Data Step to read the file. If the file does not conform to its idea of "basic" then you need to write the Data Step. – Valencia 5/2, 2015 at 16:19

Edit: Hard to believe that SAS cant handle such a mundane problem in an automated fashion. But I'm currently working with the solution of Joe, thanks a lot again! – Sandblind 5/2, 2015 at 16:28

PROC IMPORT for delimited files has a very limited feature set. Data step input is what you're largely expected to use for anything remotely complicated. It is not a 'workaround', and it is automated (and automatable) as far as I am concerned. I will agree that it's a bit surprising that SAS doesn't have an option to allow it to consider ',' a decimal separator, but I'm not sure how truly common that is in the field. – Ontina 5/2, 2015 at 17:1

The ability to specify the delimiter as well as the decimal separator for csv file imports is standard for R (read.csv) -- but e.g. not for STATA as far as I know. Google tells me that it is also possible for MATHEMATICA. Quite ironically, it seems to be possible for SAS to export to a CSV (by means of ODS) allowing for both the delimiter and the decimal separator to be set up individually. Since the comma is the standard decimal separator in Germany, and given the many related search requests in google, I thought that the issue to import such a file was relatively common. – Sandblind 6/2, 2015 at 8:51

I would assume it's simply because not enough demand for it exists - and there are easily-enough-done methods (including automatic methods) for doing this anyway. SAS is pretty good at adding new features when requested - you can recommend a new feature and have others vote on it, and the highly-voted ones are usually implemented. Of course, it's not open source so you don't get quite as many features, but they're pretty good about it. – Ontina 6/2, 2015 at 15:48

Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program

This has a drawback. If there is a change in the stucture of the csv file, for example a changed column order, then one has to change the code in the SAS programm.
So it is safer to change the input, substituting in the numeric fields the comma with dot and passing SAS the modified input.

The first idea was to use a perl program for this, and then use in SAS a filename with a pipe to read the modified input.
Unfortunately there is a SAS restriction in the proc import: The IMPORT procedure does not support device types or access methods for the FILENAME statement except for DISK.
So one has to create a workfile on disk with the adjusted input.

I used the CVS_PP package to read the csv file.
testdata.csv contains the csv data to read.
substitute_commasep.perl is the name of the perl program

perl code:

# use lib "/........";    # specifiy, if Text::CSV_PP is locally installed. Otherwise error message: Can't locate Text/CSV_PP.pm in ....;
use Text::CSV_PP;
use strict;
   my $csv = Text::CSV_PP->new({ binary => 1
                                ,sep_char   => ';'
                             }) or die "Error creating CSV object: ".Text::CSV_PP->error_diag ();
   open my $fhi, "<", "$ARGV[0]" or die "Error reading CSV file: $!";
   while ( my $colref = $csv->getline( $fhi) ) {
      foreach (@$colref) {              # analyze each column value
         s/,/\./ if /^\s*[\d,]*\s*$/;   # substitute,  if the field contains only numbers and ,
      }
      $csv->print(\*STDOUT, $colref);
      print "\n";
   }
   $csv->eof or $csv->error_diag();
   close $fhi;

SAS code:

filename readcsv pipe "perl substitute_commasep.perl testdata.csv";
filename dummy "dummy.csv";
data _null_;
     infile readcsv;
     file dummy;
     input;
     put _infile_;
run;
proc import datafile=dummy
     out=data1
     dbms=dlm
     replace;
     delimiter=';';
     getnames=yes;
     guessingrows=32767;
run;

Chump answered 1/7, 2016 at 15:34 Comment(0)

Recommended topics

Hot tags