What is the proper way to test if a value in a DataFrame is NA in the Julia DataFrames package?
So far I have found that typeof(var) == NAtype works, but is there a more elegant way of doing it?
Using typeof(var) == NAtype for this is awkward, in particular because it is not vectorized.
The canonical way of testing for NA values is to use the (vectorized) function isna.
Let's generate a toy DataFrame with some NA values in the B column:
julia> using DataFrames
julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
| Row | A | B |
|-----|----|----|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
| 4 | 4 | 8 |
| 5 | 5 | 10 |
| 6 | 6 | 12 |
| 7 | 7 | 14 |
| 8 | 8 | 16 |
| 9 | 9 | 18 |
| 10 | 10 | 20 |
julia> df[[1,4,8],symbol("B")] = NA
NA
julia> df
10x2 DataFrame
| Row | A | B |
|-----|----|----|
| 1 | 1 | NA |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
| 4 | 4 | NA |
| 5 | 5 | 10 |
| 6 | 6 | 12 |
| 7 | 7 | 14 |
| 8 | 8 | NA |
| 9 | 9 | 18 |
| 10 | 10 | 20 |
Now let's pretend we don't know the contents of our DataFrame and ask, for example, the following question:
Does column B contain any NA values?
The typeof approach won't work here, because it inspects the type of the whole column rather than of each element:
julia> typeof(df[:,symbol("B")]) == NAtype
false
The isna function is better suited to this:
julia> any(isna(df[:,symbol("B")]))
true
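For readers on current releases: DataFrames.jl later replaced NA with Julia's built-in missing value, and isna with ismissing. A minimal sketch of the same check under that assumption follows. The column contents mirror the toy example above, but building the column directly from missing literals is a shortcut chosen for illustration, not something taken from the answer.

```julia
using DataFrames

# Same toy data as above, with `missing` where the answer used NA
# (assumption: a DataFrames.jl version that uses `missing`).
df = DataFrame(A = 1:10,
               B = [missing, 4, 6, missing, 10, 12, 14, missing, 18, 20])

# Does column B contain any missing values?
has_missing = any(ismissing, df.B)      # true

# How many are there, and in which rows?
n_missing = count(ismissing, df.B)      # 3
missing_rows = findall(ismissing, df.B) # [1, 4, 8]

# Element-wise test, the counterpart of the vectorized isna call
mask = ismissing.(df.B)
```

Passing ismissing as a predicate to any avoids allocating the intermediate Boolean vector that the broadcast form creates.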
In addition to @jub0bs's answer: if one would like to check whether a DataFrame contains any NaN values, the following code can help:
julia> df = DataFrame(A = 1:10, B = 2:2:20)
10×2 DataFrame
Row │ A B
│ Int64 Int64
─────┼──────────────
1 │ 1 2
2 │ 2 4
3 │ 3 6
4 │ 4 8
5 │ 5 10
6 │ 6 12
7 │ 7 14
8 │ 8 16
9 │ 9 18
10 │ 10 20
julia> any(isnan.(Matrix(df)))
false
This means there aren't any NaN values in the given DataFrame!
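Note that NaN is an ordinary floating-point value, not a missing-data marker, so this check is separate from the NA test discussed earlier. As a rough sketch of the positive case, the hypothetical DataFrame below has a NaN planted in column A. It also shows a column-wise variant that skips the Matrix conversion. Both forms assume every column is numeric, since isnan is not defined for strings.

```julia
using DataFrames

# Hypothetical data with one NaN in column A
df = DataFrame(A = [1.0, NaN, 3.0], B = [4.0, 5.0, 6.0])

# The approach from the answer: convert to a Matrix, then broadcast isnan
has_nan_matrix = any(isnan.(Matrix(df)))               # true

# Column-wise alternative that avoids copying the data into a Matrix
has_nan_cols = any(col -> any(isnan, col), eachcol(df)) # true
```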