Proper way to test for NA in Julia DataFrames

What is the proper way to test if a value in a DataFrame is NA in the Julia DataFrames package?

So far I have found that typeof(var) == NAtype works, but is there a more elegant way of doing it?

Defense answered 26/1, 2015 at 15:52

Using typeof(var) == NAtype for this is awkward, in particular because it is not vectorized.

The canonical way of testing for NA values is to use the (vectorized) function called isna.

Example

Let's generate a toy DataFrame with some NA values in the B column:

julia> using DataFrames

julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
| Row | A  | B  |
|-----|----|----|
| 1   | 1  | 2  |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | 8  |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | 16 |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

julia> df[[1,4,8],symbol("B")] = NA
NA

julia> df
10x2 DataFrame
| Row | A  | B  |
|-----|----|----|
| 1   | 1  | NA |
| 2   | 2  | 4  |
| 3   | 3  | 6  |
| 4   | 4  | NA |
| 5   | 5  | 10 |
| 6   | 6  | 12 |
| 7   | 7  | 14 |
| 8   | 8  | NA |
| 9   | 9  | 18 |
| 10  | 10 | 20 |

Now let's pretend we don't know the contents of our DataFrame and ask, for example, the following question:

Does column B contain any NA values?

The typeof approach won't work here, because it inspects the type of the whole column rather than the type of each element:

julia> typeof(df[:,symbol("B")]) == NAtype
false

The isna function is better suited to the task:

julia> any(isna(df[:,symbol("B")]))
true
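
For readers on current DataFrames.jl releases, where NA and isna have been replaced by missing and ismissing, the same question can probably be answered along the following lines. This is a minimal sketch rather than part of the original answer: it assumes Julia 1.x with a recent DataFrames.jl, and the variable names has_missing, missing_rows and missing_counts are made up for illustration.

```julia
using DataFrames

# Column B holds `missing` in rows 1, 4 and 8, mirroring the NA example above.
df = DataFrame(A = 1:10, B = [missing, 4, 6, missing, 10, 12, 14, missing, 18, 20])

# Does column B contain any missing values?
has_missing = any(ismissing, df.B)

# Which rows of column B are missing?
missing_rows = findall(ismissing, df.B)

# How many values are missing in each column (in column order A, B)?
missing_counts = [count(ismissing, col) for col in eachcol(df)]
```

Passing ismissing as a predicate to any, findall and count avoids building an intermediate Boolean array, although ismissing.(df.B) also works when the full mask is needed.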
Hepatic answered 26/1, 2015 at 16:24

In addition to @jub0bs's answer: if you would like to check whether a DataFrame contains any NaN values (a floating-point value, which is not the same thing as NA), the following code can help:

julia> df = DataFrame(A = 1:10, B = 2:2:20)
10×2 DataFrame
 Row │ A      B
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      4
   3 │     3      6
   4 │     4      8
   5 │     5     10
   6 │     6     12
   7 │     7     14
   8 │     8     16
   9 │     9     18
  10 │    10     20


julia> any(isnan.(Matrix(df)))
false

This means there aren't any NaN values in the given DataFrame!
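
Note that isnan.(Matrix(df)) assumes every column is numeric and free of missing values. When that cannot be guaranteed, a per-cell check along these lines may be more robust. This is only a sketch assuming a recent DataFrames.jl; the sample data and the helper is_nan_cell are hypothetical and not part of any library.

```julia
using DataFrames

# Hypothetical data: a Float64 column containing NaN and a column containing `missing`.
df = DataFrame(A = [1.0, NaN, 3.0], B = [1, missing, 3])

# Hypothetical helper: the type check runs first, so `isnan` is only
# ever called on floating-point values and never on `missing` or strings.
is_nan_cell(x) = x isa AbstractFloat && isnan(x)

# Scan every column for NaN and, separately, for missing values.
has_nan = any(col -> any(is_nan_cell, col), eachcol(df))
has_missing = any(col -> any(ismissing, col), eachcol(df))
```

Keeping the two checks separate makes it explicit that NaN and missing are different things: NaN is an ordinary Float64 value, whereas missing marks the absence of a value.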

Marketable answered 17/7, 2022 at 14:23
