Is there a way to remove nan from a dictionary filled with data?
D

5

14

I have a dictionary that is filled with data from two files I imported, but some of the data comes out as nan. How do I remove the pieces of data with nan?

My code is:

import matplotlib.pyplot as plt 
from pandas import Timestamp
import numpy as np   
from datetime import datetime
import pandas as pd
import collections

orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date'])
specificdrugs=pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\Drugs.txt',sep=',')

"""This is a dictionary that collects data from the .txt file
This dictionary has a key,value pair for every generic name with its corresponding approval date """
drugdict={}
for d in specificdrugs['Generic Name']:
    drugdict.dropna()
    drugdict[d]=orangebook[orangebook.Ingredient==d.upper()]['Approval_Date'].min()

What should I add or take away from this code to make sure that there are no key,value pairs in the dictionary with a value of nan?

Dorren answered 5/6, 2014 at 19:9 Comment(2)
You can use filter() with a dictionary comprehension. See this for reference: https://mcmap.net/q/99508/-how-to-filter-a-dictionary-according-to-an-arbitrary-condition-function .Jobey
are your nans stored in the dict as keys or values?Drucilla
D
30
from math import isnan

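If NaNs are being stored as keys:

# functional: filter() over the dict alone yields only keys, so filter the items and rebuild the dict
clean_dict = dict(filter(lambda item: not isnan(item[0]), my_dict.items()))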

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not isnan(k)}

If NaNs are being stored as values:

# functional: filter the (key, value) pairs and rebuild the dict
clean_dict = dict(filter(lambda item: not isnan(item[1]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not isnan(my_dict[k])}
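
Note that math.isnan raises TypeError for non-numeric input such as strings, so a dictionary with mixed value types needs a guard. Here is a minimal sketch, assuming NaN only ever shows up as a float; the helper name is_nan_value and the sample data are made up for illustration:

```python
from math import isnan


def is_nan_value(value):
    """Return True only for a float NaN; any other type is treated as not-NaN."""
    return isinstance(value, float) and isnan(value)


# Hypothetical sample data mixing numbers, strings, and a NaN value.
my_dict = {"a": 1.5, "b": "text", "c": float("nan")}

# Keep every pair whose value is not a float NaN.
clean_dict = {k: v for k, v in my_dict.items() if not is_nan_value(v)}
print(clean_dict)  # {'a': 1.5, 'b': 'text'}
```

The isinstance check runs first, so isnan is never called on a string.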
Drucilla answered 5/6, 2014 at 19:19 Comment(3)
This didn't work for me where the keys are not numeric... in case it's helpful for anyone, for strings I changed this to: c = {k: c[k] for k in c if type(k) is str}Vardon
you can also iterate over .items(): {k: v for k, v in my_dict.items() if not isnan(v)}Dietrich
In my case the keys were strings but the values could be numbers, strings, or NaN, so I used: {k: my_dict[k] for k in my_dict if type(my_dict[k]) is str or not isnan(my_dict[k])}Drawknife
B
5

With simplejson

import simplejson

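# ignore_nan=True serializes NaN as null, so those values come back as None
clean_dict = simplejson.loads(simplejson.dumps(my_dict, ignore_nan=True))
# or, depending on your needs: allow_nan=False raises ValueError if a NaN is present
clean_dict = simplejson.loads(simplejson.dumps(my_dict, allow_nan=False))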
Beeck answered 3/4, 2017 at 19:3 Comment(1)
I'm still getting None as the value.Babbette
F
2

A slightly modified version of twinlakes's approach is to use pandas.isna() instead of math.isnan().

If NaNs are being stored as keys:

# functional: filter the (key, value) pairs and rebuild the dict
clean_dict = dict(filter(lambda item: not pd.isna(item[0]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not pd.isna(k)}

If NaNs are being stored as values:

# functional: filter the (key, value) pairs and rebuild the dict
clean_dict = dict(filter(lambda item: not pd.isna(item[1]), my_dict.items()))

# dict comprehension
clean_dict = {k: my_dict[k] for k in my_dict if not pd.isna(my_dict[k])}

This way it still works even when the fields are non-numeric.
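
This matters for the dictionary in the question: as far as I can tell, calling .min() on an empty datetime selection gives NaT rather than a float NaN, and math.isnan would raise TypeError on that, while pd.isna accepts it. A small sketch with made-up keys and values (none of these come from the asker's files):

```python
import pandas as pd

# Hypothetical values: a real timestamp, a missing timestamp (NaT),
# a float NaN, and a plain string.
my_dict = {
    "drug_a": pd.Timestamp("2001-05-17"),
    "drug_b": pd.NaT,
    "drug_c": float("nan"),
    "drug_d": "unknown",
}

# pd.isna is True for NaT and NaN, and False for timestamps and strings.
clean_dict = {k: v for k, v in my_dict.items() if not pd.isna(v)}
print(sorted(clean_dict))  # ['drug_a', 'drug_d']
```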

Fadein answered 30/4, 2022 at 23:57 Comment(0)
L
1

Instead of trying to remove the NaNs from your dictionary, you should further investigate why NaNs are getting there in the first place.

It gets difficult to use NaNs in a dictionary, as a NaN does not equal itself.

Check this out for more information: NaNs as key in dictionaries
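
To illustrate why NaN keys are awkward, here is a short sketch using only built-in floats (the variable names are arbitrary):

```python
nan = float("nan")

# NaN never compares equal to anything, including itself.
print(nan == nan)  # False

d = {nan: "value"}

# Looking up the very same object works, because dict lookup
# checks identity before equality.
print(nan in d)  # True

# A different NaN object is neither identical nor equal, so it is not found.
print(float("nan") in d)  # False
```

So a NaN key can only be retrieved through the exact object that was inserted, which is rarely what you have on hand.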

Lithometeor answered 5/6, 2014 at 19:12 Comment(0)
C
0

I know this is old, but here is a simple approach that worked for me. Remove the NaNs up front, when reading the CSV:

orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date']).dropna()

I also like to convert to dictionary at the same time:

orangebook = pd.read_csv(r'C:\Users\WEGWEIS_JAKE\Desktop\Work Programs\Code Files\products2.txt',sep='~', parse_dates=['Approval_Date']).dropna().to_dict()
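
Keep in mind that dropna() without arguments drops every row that has a missing value in any column. If only missing approval dates matter, the subset parameter restricts the check to that column. A sketch on a small made-up frame (the column names Ingredient and Approval_Date follow the question; the Strength column and all rows are invented):

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one row lacks a date, another lacks a strength.
orangebook = pd.DataFrame(
    {
        "Ingredient": ["DRUG A", "DRUG B", "DRUG C"],
        "Approval_Date": pd.to_datetime(["2001-05-17", None, "1999-01-04"]),
        "Strength": ["10MG", "20MG", None],
    }
)

# Drops any row with a missing value in any column.
all_columns = orangebook.dropna()

# Drops only rows whose Approval_Date is missing.
date_only = orangebook.dropna(subset=["Approval_Date"])

print(len(all_columns))  # 1
print(len(date_only))    # 2
```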
Chronological answered 16/12, 2019 at 13:8 Comment(1)
Note that .dropna() with no arguments removes every row that contains a null value in any column.Guimar
