Read files with extension .data into R

I need to read a data file into R for my assignment. You can download it from the following site.

http://archive.ics.uci.edu/ml/datasets/Acute+Inflammations

The data file has the extension .data, which I have never seen before. I tried read.table and similar functions but could not read it into R properly. Can anyone help me with this, please?

Adventitious answered 13/1, 2014 at 21:40 Comment(0)
11

It's a UTF-16 little-endian file with a byte order mark at the beginning. read.table will fail unless you specify the correct encoding. Decimals are indicated by a comma, hence dec=",". This works for me on macOS:

read.table("diagnosis.data", fileEncoding="UTF-16", dec=",")

      V1  V2  V3  V4  V5  V6  V7  V8
1   35.5  no yes  no  no  no  no  no
2   35.9  no  no yes yes yes yes  no
3   35.9  no yes  no  no  no  no  no
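
One way to check the encoding yourself is to look at the first two bytes of the file: a UTF-16 little-endian byte order mark is ff fe. The sketch below does not touch the real download. It builds a small stand-in file (the temporary path and the two sample rows are assumptions for illustration) and reads it back the same way. Whether fileEncoding="UTF-16" consumes the BOM depends on the platform's iconv, so treat this as a starting point rather than a guarantee.

```r
# Build a small stand-in for diagnosis.data: UTF-16LE text preceded by a BOM.
# (Temporary file and sample rows are illustrative, not the real dataset.)
sample_text <- "35,5\tno\tyes\tno\tno\tno\tno\tno\n35,9\tno\tno\tyes\tyes\tyes\tyes\tno\n"
body <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
path <- tempfile(fileext = ".data")
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Peek at the first two bytes: ff fe marks UTF-16 little endian.
bom <- readBin(path, what = "raw", n = 2)
print(bom)

# Read with the matching encoding; dec = "," parses "35,5" as 35.5.
df <- read.table(path, fileEncoding = "UTF-16", sep = "\t", dec = ",")
str(df)
```

For the real file, only the readBin() and read.table() calls are needed, pointed at wherever diagnosis.data was saved.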
Vadnais answered 13/1, 2014 at 22:5 Comment(0)
5

From your link:

The data is in an ASCII file. Attributes are separated by TAB.

Thus you need to use read.table() with sep = "\t"

-- Attribute lines:
For example, '35,9 no no yes yes yes yes no'
Where:
'35,9' Temperature of patient
'no' Occurrence of nausea
'no' Lumbar pain
'yes' Urine pushing (continuous need for urination)
'yes' Micturition pains
'yes' Burning of urethra, itch, swelling of urethra outlet
'yes' decision: Inflammation of urinary bladder
'no' decision: Nephritis of renal pelvis origin

Also looks like it uses a comma for the decimal, so also specify dec = "," inside read.table().

It looks like you'll need to put in the column headings manually, though your link defines them.

Make sure you see @Gavin Simpson's comment below to clean up other undocumented "features" of this dataset.
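
Putting sep, dec and manual column headings together might look like the sketch below. The column names are my own shorthand for the attribute descriptions quoted above, not names defined by the dataset, and the two rows are passed in through text= so the example runs without the file.

```r
# Column names are made up here from the dataset description; pick your own.
cols <- c("temperature", "nausea", "lumbar_pain", "urine_pushing",
          "micturition_pains", "urethra_burning",
          "bladder_inflammation", "nephritis")

# Two sample rows in the file's layout (tab-separated, comma decimals).
rows <- "35,5\tno\tyes\tno\tno\tno\tno\tno\n35,9\tno\tno\tyes\tyes\tyes\tyes\tno"

# sep = "\t" splits on tabs, dec = "," parses "35,5" as the number 35.5.
df <- read.table(text = rows, sep = "\t", dec = ",", col.names = cols)
str(df)
```

With the real file you would replace text = rows with the file path and, given the encoding problems mentioned in the comments, probably add a fileEncoding argument as well.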

Asepsis answered 13/1, 2014 at 21:54 Comment(3)
I am unable to read the file using read.table or readLines. There are a few rogue characters at the beginning of the file, too. - Georgeta
That may or may not work, however. My text editor reports "15:56:01: The file "~/Downloads/diagnosis.data" is not valid UTF-8.", and that is what causes read.table() to error out on my Linux box. Opening the file in the editor, setting the encoding to UTF-8 and resaving allowed me to read the file with read.table("~/Downloads/diagnosis.data", sep = "\t", dec = ","). - Fanatical
Some .data files use "," as the separator. In that case data <- read.table(file.choose(), fileEncoding = "UTF-8", sep = ",") can be used; that is what I did. Without it, I saw an extra "," at the end of each column. - Corso
3

You have a UTF-16LE file, a.k.a. "Unicode" on Windows (in case you're on that OS). Try this:

# Open the remote file as a UTF-16LE text connection, then parse it.
f <- file("http://archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data", open = "r", encoding = "UTF-16LE")
data <- read.table(f, dec = ",", header = FALSE)
close(f)

Trying what @Gavin Simpson suggested might also help, since it lets you add your own headings and save the file.
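
The "add headings and resave" route can also be done from R instead of a text editor. This is only a sketch: it uses a three-column stand-in file rather than the real eight-column download, the header names are hypothetical, and it relies on the platform's iconv consuming the BOM when the encoding is given as "UTF-16".

```r
# Stand-in for the downloaded file: UTF-16LE with a BOM (sample rows only).
src <- tempfile(fileext = ".data")
txt <- "35,5\tno\tyes\n35,9\tno\tno\n"
writeBin(c(as.raw(c(0xff, 0xfe)),
           iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]), src)

# Decode the lines once through a connection with the right encoding.
con <- file(src, open = "r", encoding = "UTF-16")
lines <- readLines(con)
close(con)

# Prepend a header row (names are hypothetical) and save a plain-text copy.
header <- paste(c("temperature", "nausea", "lumbar_pain"), collapse = "\t")
dst <- tempfile(fileext = ".tsv")
writeLines(c(header, lines), dst)

# The cleaned copy no longer needs fileEncoding.
clean <- read.table(dst, header = TRUE, sep = "\t", dec = ",")
str(clean)
```

After this one-off conversion, every later read only needs header, sep and dec.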

Andri answered 13/1, 2014 at 22:6 Comment(0)
3

I also struggled with a ".data" file, but got it working in the end. Here is my code:

df <- read.table("diagnosis.data", fileEncoding = "UTF-8", sep = ",")

I hope this code helps you and others a lot.

Monumental answered 6/9, 2021 at 3:58 Comment(0)
1

The above responses are very useful. A slightly hackier approach is to rename the file so that it has a .csv extension and then read it with read.csv.
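
As far as I know, R does not look at the extension when parsing, so renaming should not be strictly necessary: read.csv is read.table with different defaults (sep = ",", header = TRUE, dec = "."). The sketch below uses a made-up comma-separated .data file to show both calls working on the unrenamed file; the path and rows are assumptions for illustration.

```r
# A made-up comma-separated file that keeps its .data extension.
path <- tempfile(fileext = ".data")
writeLines(c("5.1,3.5,setosa", "4.9,3.0,setosa"), path)

# read.table needs the separator spelled out.
by_table <- read.table(path, sep = ",")

# read.csv assumes commas already; header = FALSE because there is no header row.
by_csv <- read.csv(path, header = FALSE)

print(by_table)
print(by_csv)
```

Note that this only applies to .data files that really are comma-separated; the tab-separated UTF-16 file from the question still needs sep, dec and an encoding.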

Phytogeography answered 23/6, 2019 at 16:9 Comment(0)
