I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data
I have loaded and processed the data using:
# import required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# load the data from the file
df = pd.read_csv('./data/train.csv')
# import the scatter_matrix functionality
from pandas.tools.plotting import scatter_matrix
# define colors list, to be used to plot survived either red (=0) or green (=1)
colors=['red','green']
# make a scatter plot
scatter_matrix(df,figsize=[20,20],marker='x',c=df.Survived.apply(lambda x:colors[x]))
df.info()
How can I add the categorical columns like Sex and Embarked to the plot?
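One possible approach, sketched here under assumptions: scatter_matrix only draws numeric columns, so string columns such as Sex and Embarked could first be converted to integer codes. The helper name encode_categoricals and the small sample frame below are hypothetical stand-ins for illustration, not part of pandas or of the real train.csv.

```python
import pandas as pd

def encode_categoricals(df, columns):
    """Return a copy of df with the given columns replaced by integer codes."""
    encoded = df.copy()
    for col in columns:
        # Each distinct value gets an integer code (categories are sorted);
        # missing values are encoded as -1.
        encoded[col] = encoded[col].astype('category').cat.codes
    return encoded

# Tiny hypothetical stand-in for the Titanic training data
sample = pd.DataFrame({
    'Survived': [0, 1, 1, 0],
    'Age': [22.0, 38.0, 26.0, 35.0],
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', 'S', None],
})

encoded = encode_categoricals(sample, ['Sex', 'Embarked'])
print(encoded)
```

The encoded frame can then be passed to scatter_matrix in place of df. Note that the integer codes impose an arbitrary ordering on the categories, so the resulting panels show group membership rather than a true numeric relationship.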
from pandas.tools.plotting import scatter_matrix
should be replaced by
from pandas.plotting import scatter_matrix
(cf. reference answer) – Daffodil