as.numeric returns NA for no apparent reason for some of the values in a column
I am trying to convert a column of character strings that hold numbers (e.g. "0.1234") to numeric with as.numeric, but some of the values come back as NA with the warning 'NAs introduced by coercion'. The strings that become NA don't look any different from the ones that convert correctly. Does anyone know what the problem could be?

I have already looked for non-numeric characters (such as ',') hiding inside some of the values. I did find strings containing '-' (e.g. "-0.123") that turned into NAs, but these account for only some of the NAs. I also looked for spaces inside the strings; that doesn't seem to be the problem either.

data$y
 [1] "0.833250539"  "0.820323535"  "0.462284612"  "0.792943985"  "0.860587952"  "0.729665177"  "0.461503956"  "0.625871118" 
 [9] "0.740999346"  "0.962727964"  "0.971089266"  "0.869004848"  "0.828651766"  "0.900648732"  "0.970326033"  "0.898123286" 
[17] "0.911640765"  "0.902442126"  "0.843392097"  "0.763421844"  "0.892426243"  "0.380433624"  "0.925017633"  "0.725470821" 
[25] "0.699924767"  "0.689061225"  "0.907462936"  "0.888064239"  "0.913547115"  "-‬0.625103904‭" "0.897385961"  "0.889727462" 
[33] "0.90127339"   "0.947012474"  "0.948883588"  "0.845845512"  "0.97866966"   "0.796247738"  "0.864627056"  "0.266656189‭" 
[41] "0.894915463"  "0.969690678"  "0.771365656‭"  "0.88304436"   "0.954039006"  "0.836952199"  "0.731558669‭"  "0.907224294" 
[49] "0.622059127"  "0.887742343"  "0.917550343"  "0.97240334‭"   "0.902841957"  "0.617403052"  "0.82926708"   "0.674903846" 
[57] "0.947132958"  "0.929213613‭"  "-‬0.297844476" "0.871767367"

y = as.numeric(data$y)

Warning message: NAs introduced by coercion

y
 [1] 0.8332505 0.8203235 0.4622846 0.7929440 0.8605880 0.7296652 0.4615040 0.6258711 0.7409993 0.9627280 0.9710893 0.8690048 0.8286518
[14] 0.9006487 0.9703260 0.8981233 0.9116408 0.9024421 0.8433921 0.7634218 0.8924262 0.3804336 0.9250176 0.7254708 0.6999248 0.6890612
[27] 0.9074629 0.8880642 0.9135471        NA 0.8973860 0.8897275 0.9012734 0.9470125 0.9488836 0.8458455 0.9786697 0.7962477 0.8646271
[40]        NA 0.8949155 0.9696907        NA 0.8830444 0.9540390 0.8369522        NA 0.9072243 0.6220591 0.8877423 0.9175503        NA
[53] 0.9028420 0.6174031 0.8292671 0.6749038 0.9471330        NA        NA 0.8717674
Maurer answered 26/8, 2019 at 10:21 Comment(2)
It is very nice that you pasted in your data and code, and I agree something is amiss. Could you please paste your data using the dput command? If your data frame is too large, a subset that includes some of the values returned as NA will do.Mensch
Possible duplicate of #38829120Amey

Your strings contain some non-ASCII characters. They are invisible when printed, which is why the affected values look identical to the others. If you are certain that it is safe to remove them, use

as.numeric(iconv(data$y, 'utf-8', 'ascii', sub=''))

Ref on the conversion
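
To illustrate the conversion, here is a minimal sketch on a made-up three-element vector. It assumes the hidden characters are the Unicode directional-formatting marks U+202C and U+202D (my reading of the pasted data, not something confirmed by a dput), and `y_raw` and `y_clean` are names chosen for the example.

```r
# Hypothetical sample: the 2nd and 3rd strings carry invisible characters,
# written here as \u escapes so they are visible in the source.
y_raw <- c("0.833250539", "-\u202c0.625103904\u202d", "0.266656189\u202d")

# Direct conversion yields NA for the contaminated strings.
y_direct <- suppressWarnings(as.numeric(y_raw))

# Drop every byte that cannot be represented in ASCII, then convert.
y_clean <- as.numeric(iconv(y_raw, from = "UTF-8", to = "ASCII", sub = ""))
print(y_clean)
```

Note that `sub = ""` silently discards anything non-ASCII, so this is only appropriate when the column should contain nothing but plain digits, signs and decimal points.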

Leventhal answered 26/8, 2019 at 10:37 Comment(1)
That worked great for me. All of the numbers seem to be exactly the same as they looked as strings, so I don't really understand what characters were the problem, but it worked nonetheless. Thanks!Maurer
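
For anyone wondering which characters were the problem, one way to make them visible is to flag strings that contain anything outside printable ASCII and then list the code points. This is only a sketch: the sample vector is made up, and the assumption that the culprits are U+202C and U+202D comes from inspecting the pasted output, not from the asker's actual data.

```r
# Hypothetical sample: the second string hides two invisible characters.
y_raw <- c("0.833250539", "-\u202c0.625103904\u202d")

# TRUE for strings with any character outside printable ASCII (0x20 to 0x7E).
has_hidden <- grepl("[^\\x20-\\x7E]", y_raw, perl = TRUE)
print(which(has_hidden))

# Code points of the non-ASCII characters in the flagged string, in hex.
cp <- utf8ToInt(y_raw[2])
hidden <- sprintf("U+%04X", cp[cp > 127])
print(hidden)
```

Running `which(has_hidden)` on the real column gives the positions of the affected values, which can then be compared with the positions of the NAs.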

Copying and pasting your data gives me (taking the last NA as an example) "-,0.297844476". There is something wrong with the encoding. You can work around it by using

as.numeric(gsub(",","",data$y))

Edit: This answer does not work on all of your NAs. I don't really know what is going on with your data; please provide a dput if possible.
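
A possible reason the comma replacement misses some values is that the pasted "," may just be how an invisible character got rendered, so there is no literal comma in the string to match. A sketch of a more general workaround, assuming the column should only ever contain digits, a sign, a decimal point and possibly an exponent, is to strip everything else. The sample strings and the hidden characters in them (U+202C, U+202D) are assumptions made for illustration.

```r
# Hypothetical sample with invisible characters written as \u escapes.
y_raw <- c("-\u202c0.297844476", "0.97240334\u202d")

# Removing literal commas changes nothing, since there are none.
unchanged <- identical(gsub(",", "", y_raw), y_raw)

# Keep only characters that can be part of a number, then convert.
y_clean <- as.numeric(gsub("[^0-9.eE+-]", "", y_raw))
print(y_clean)
```

Unlike removing one specific character, this whitelist does not depend on knowing what the stray characters are, but it would also hide genuinely malformed entries, so it is worth checking the flagged values first.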

Cycloid answered 26/8, 2019 at 10:54 Comment(0)
