Removing Time from Date Axis in Seaborn Heatmap

5

13

I am attempting to create a Seaborn heatmap to visualize scientific measurements. The heatmap should display vertical depth on the y-axis, time on the x-axis, and the intensity of the measurement as the heat function. The data is structured with 'depth', 'date', and 'capf' as columns in a pandas DataFrame.

Here is the code snippet I'm working with:

sns.set()
nametag = 'Well_4_all_depths_capf'
Dp = D[D.well == 'well4']
print(Dp.date)

heat = Dp.pivot(index="depth", columns="date", values="capf")
plt.title(nametag)
sns.heatmap(heat, linewidths=.25)
plt.savefig('%s%s.png' % (pathheatcapf, nametag), dpi=600)

The output of print(Dp.date) shows the dates formatted as I want (year-month-day), but the heatmap's date axis also displays times (00:00 etc.), which I want to remove.

Here's the current output of Dp.date:

0    2016-08-09
1    2016-08-09
...
6    2016-08-09

Is the problem related to how I'm handling dates? Specifically, could the use of pd.to_datetime when creating the 'date' column be an issue? Here's how I've parsed dates from filenames:

D['date'] = pd.to_datetime(['%s-%s-%s' % (f[0:4], f[4:6], f[6:8]) for f in D['filename']])

I suspect this formatting might be causing the issue with unwanted time data appearing on the x-axis of the heatmap. How can I remove these timestamps?
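A minimal sketch of what is probably happening, assuming the filenames start with YYYYMMDD (the filenames below are hypothetical). pd.to_datetime stores full timestamps; printing the Series hides the midnight time, but the underlying values still carry it, and that raw form is likely what ends up on the heatmap axis.

```python
import pandas as pd

# Hypothetical filenames that start with YYYYMMDD, as the slicing above implies
filenames = pd.Series(["20160809_well4.csv", "20160810_well4.csv"])
dates = pd.to_datetime(filenames.str[:8], format="%Y%m%d")

# What printing the Series shows: dates only, because every time is midnight
printed = dates.to_string(index=False)
# The underlying NumPy value still includes a time component
raw = str(dates.to_numpy()[0])

print(printed)
print(raw)
```

So the parsing itself is fine; the time only becomes visible when the raw values are turned into tick labels.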


Swetlana answered 2/12, 2016 at 5:35 Comment(0)
L
13

You have to apply strftime to your dates so that the x-tick labels are formatted correctly:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import random

dates = [datetime.today() - timedelta(days=x * random.getrandbits(1)) for x in range(25)]
df = pd.DataFrame({'depth': [0.1,0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001],\
 'date': dates,\
 'value': [-4.1808639999999997, -9.1753490000000006, -11.408113999999999, -10.50245, -8.0274750000000008, -0.72260200000000008, -6.9963940000000004, -10.536339999999999, -9.5440649999999998, -7.1964070000000007, -0.39225599999999999, -6.6216390000000001, -9.5518009999999993, -9.2924690000000005, -6.7605589999999998, -0.65214700000000003, -6.8852289999999989, -9.4557760000000002, -8.9364629999999998, -6.4736289999999999, -0.96481800000000006, -6.051482, -9.7846860000000007, -8.5710630000000005, -6.1461209999999999]})
pivot = df.pivot(index='depth', columns='date', values='value')

sns.set()
ax = sns.heatmap(pivot)
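# Reformat the labels seaborn already placed, so the label count always matches the tick count
ax.set_xticklabels([pd.to_datetime(t.get_text()).strftime('%d-%m-%Y') for t in ax.get_xticklabels()])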
plt.xticks(rotation=-90)

plt.show()
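An alternative sketch, not part of the original answer: format the pivoted column labels as strings before plotting, so that seaborn can place the labels itself. The tiny DataFrame below is a made-up stand-in for the data above.

```python
import pandas as pd

# Hypothetical stand-in data: two dates, two depths
df = pd.DataFrame({
    "depth": [0.1, 0.05, 0.1, 0.05],
    "date": pd.to_datetime(["2016-08-09", "2016-08-09", "2016-08-10", "2016-08-10"]),
    "value": [-4.18, -9.18, -0.72, -7.00],
})

pivot = df.pivot(index="depth", columns="date", values="value")

# The columns are a DatetimeIndex at this point; converting them to strings
# means any tick labels derived from them carry no time component.
pivot.columns = pivot.columns.strftime("%d-%m-%Y")
print(list(pivot.columns))
```

Passing this pivot to sns.heatmap(pivot) should then show the formatted strings, with no need to call set_xticklabels afterwards.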


Lelia answered 21/12, 2016 at 9:13 Comment(3)
This now gives ValueError: The number of FixedLocator locations (13), usually from a call to set_ticks, does not match the number of ticklabels (25), so not sure it's quite right... - Misadventure
@ChrisWithers "Note that it is important to set both, the tick locations (set_xticks) as well as the tick labels (set_xticklabels), otherwise they would become out of sync." (from the matplotlib docs) - Estren
@BennyJobigan - maybe edit the answer to take this into account and make it work? - Misadventure
B
5

Example with standard heatmap datetime labels

import numpy as np
import pandas as pd
import seaborn as sns

dates = pd.date_range('2019-01-01', '2020-12-01')

df = pd.DataFrame(np.random.randint(0, 100, size=(len(dates), 4)), index=dates)

sns.heatmap(df)

standard_heatmap

We can create some helper classes/functions to get better-looking labels and placement. AxTransformer converts data coordinates to tick locations, and set_date_ticks applies custom date ranges to a plot's axis.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections.abc import Iterable
from sklearn import linear_model

class AxTransformer:
    def __init__(self, datetime_vals=False):
        self.datetime_vals = datetime_vals
        self.lr = linear_model.LinearRegression()
        
        return
    
    def process_tick_vals(self, tick_vals):
        if not isinstance(tick_vals, Iterable) or isinstance(tick_vals, str):
            tick_vals = [tick_vals]
            
        if self.datetime_vals == True:
            tick_vals = pd.to_datetime(tick_vals).astype("int64").values
            
        tick_vals = np.array(tick_vals)
            
        return tick_vals
    
    def fit(self, ax, axis='x'):
        axis = getattr(ax, f'get_{axis}axis')()
        
        tick_locs = axis.get_ticklocs()
        tick_vals = self.process_tick_vals([label.get_text() for label in axis.get_ticklabels()])
        
        self.lr.fit(tick_vals.reshape(-1, 1), tick_locs)
        
        return
    
    def transform(self, tick_vals):        
        tick_vals = self.process_tick_vals(tick_vals)
        tick_locs = self.lr.predict(np.array(tick_vals).reshape(-1, 1))
        
        return tick_locs
    
def set_date_ticks(ax, start_date, end_date, axis='y', date_format='%Y-%m-%d', **date_range_kwargs):
    dt_rng = pd.date_range(start_date, end_date, **date_range_kwargs)

    ax_transformer = AxTransformer(datetime_vals=True)
    ax_transformer.fit(ax, axis=axis)
    
    getattr(ax, f'set_{axis}ticks')(ax_transformer.transform(dt_rng))
    getattr(ax, f'set_{axis}ticklabels')(dt_rng.strftime(date_format))

    ax.tick_params(axis=axis, which='both', bottom=True, top=False, labelbottom=True)
    
    return ax

These provide us a lot of flexibility, e.g.

fig, ax = plt.subplots(dpi=150)

sns.heatmap(df, ax=ax)

set_date_ticks(ax, '2019-01-01', '2020-12-01', freq='3MS')

cleaned_heatmap_date_labels

or if you really want to get weird you can do stuff like

fig, ax = plt.subplots(dpi=150)

sns.heatmap(df, ax=ax)

set_date_ticks(ax, '2019-06-01', '2020-06-01', freq='2MS', date_format='%b `%y')

weird_heatmap_date_labels

For your specific example, where the dates run along the x-axis, you'll have to pass axis='x' to set_date_ticks.
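If pulling in scikit-learn feels heavy, a simpler sketch may do. It relies on the assumption that seaborn draws row (or column) i of the heatmap between i and i + 1, so the cell centre sits at i + 0.5. The names positions and labels are illustrative and are not part of the helper above.

```python
import pandas as pd

# Same daily index as the example DataFrame above
dates = pd.date_range("2019-01-01", "2020-12-01")
# The dates we actually want to label: the first day of every third month
wanted = pd.date_range("2019-01-01", "2020-12-01", freq="3MS")

# Look up each wanted date's integer position in the index, then shift to the cell centre
positions = dates.get_indexer(wanted) + 0.5
labels = wanted.strftime("%Y-%m-%d").tolist()

print(positions[:2], labels[:2])
```

These can then be applied with ax.set_yticks(positions) followed by ax.set_yticklabels(labels), or the x-axis equivalents when the dates are the columns.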

Bibliolatry answered 27/12, 2020 at 0:32 Comment(1)
tick_vals = pd.to_datetime(tick_vals).astype(int).values causes a FutureWarning: The behavior of .astype from datetime64[ns] to int32 is deprecated. Replace with tick_vals = pd.to_datetime(tick_vals).astype("int64").values. - Barnabas
P
1

To address the issue of timestamps appearing on the seaborn.heatmap date axis, follow these steps:

  1. Ensure Correct Date Format: Convert the 'date' column to a datetime type using pandas.to_datetime. If you wish to display only the date without the time, you can further refine this by using the .dt.date attribute to extract the date component, or dt.strftime to format the string as desired.
  2. Data Preparation for Heatmap: When reshaping your DataFrame for the heatmap, consider using pandas.DataFrame.pivot_table rather than pivot if there are multiple entries for each combination of 'depth' and 'date'. pivot_table allows you to specify an aggregation function (e.g., mean), which is necessary if your dataset includes multiple measurements for the same depth on the same date. If each depth and date combination is unique, you could alternatively use pivot.
    • This is better than .groupby because the dataframe is correctly shaped for plotting.
  3. Plotting the Heatmap: With your DataFrame reshaped, using sns.heatmap will now reflect the date labels correctly without displaying unwanted timestamps. Customize the heatmap with the cmap parameter to enhance visual appeal.
  • Tested in python 3.12.0, pandas 2.2.1, matplotlib 3.8.1, seaborn 0.13.2

Here's a simplified code example illustrating these steps:

import pandas as pd
import numpy as np
import seaborn as sns

# load the data (the `data` dict is built in the Sample Data section below)
df = pd.DataFrame(data)

# Convert 'date' to datetime, then extract just the date part
df['date'] = pd.to_datetime(df['date']).dt.date

# reshape the data for the heatmap; if no aggregation is needed, .pivot(...) works too
dfp = df.pivot_table(index='depth', columns='date', values='capf', aggfunc='mean')

# plot
ax = sns.heatmap(data=dfp, cmap='GnBu')

This method ensures that the date axis in the heatmap displays correctly formatted dates without redundant time information, thereby enhancing the clarity and usability of the visualizations.
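To illustrate why step 2 recommends pivot_table when a depth and date combination repeats, here is a minimal sketch with three made-up rows: pivot refuses to reshape duplicates, while pivot_table averages them.

```python
import pandas as pd

# Made-up rows: the first two share the same depth and date
df = pd.DataFrame({
    "date": ["2016-08-09T00:00:00", "2016-08-09T00:00:00", "2016-08-11T00:00:00"],
    "depth": [1.5, 1.5, 1.5],
    "capf": [0.30, 0.34, 0.28],
})
# Keep only the date part, as in the example above
df["date"] = pd.to_datetime(df["date"]).dt.date

try:
    df.pivot(index="depth", columns="date", values="capf")
    pivot_failed = False
except ValueError:
    # pivot cannot reshape duplicate (depth, date) pairs
    pivot_failed = True

# pivot_table aggregates the duplicates instead
dfp = df.pivot_table(index="depth", columns="date", values="capf", aggfunc="mean")
print(pivot_failed)
print(dfp)
```

The column labels of dfp are plain date objects, so they print without a time component.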


Sample Data

# create sample data
dates = [f'2016-08-{d:02d}T00:00:00.000000000' for d in range(9, 26, 2)] + ['2016-09-09T00:00:00.000000000']
depths = np.arange(1.25, 5.80, 0.25)
np.random.seed(365)
p1 = np.random.dirichlet(np.ones(10), size=1)[0]  # random probabilities for random.choice
p2 = np.random.dirichlet(np.ones(19), size=1)[0]  # random probabilities for random.choice
data = {'date': np.random.choice(dates, size=1000, p=p1), 'depth': np.random.choice(depths, size=1000, p=p2), 'capf': np.random.normal(0.3, 0.05, size=1000)}

df.head()

                            date  depth      capf
0  2016-08-19T00:00:00.000000000   4.75  0.339233
1  2016-08-19T00:00:00.000000000   3.00  0.370395
2  2016-08-21T00:00:00.000000000   5.75  0.332895
3  2016-08-23T00:00:00.000000000   1.75  0.237543
4  2016-08-23T00:00:00.000000000   5.75  0.272067

dfp.head()

date   2016-08-09  2016-08-11  2016-08-13  2016-08-15  2016-08-17  2016-08-19  2016-08-21  2016-08-23  2016-08-25  2016-09-09
depth                                                                                                                        
1.50     0.334661         NaN         NaN    0.302670    0.314186    0.325257    0.313645    0.263135         NaN         NaN
1.75     0.305488    0.303005    0.410124    0.299095    0.313899    0.280732    0.275758    0.260641         NaN    0.318099
2.00     0.322312    0.274105         NaN    0.319606    0.268984    0.368449    0.311517    0.309923         NaN    0.306162
2.25     0.289959    0.315081         NaN    0.302202    0.306286    0.339809    0.292546    0.314225    0.263875         NaN
2.50     0.314227    0.296968         NaN    0.312705    0.333797    0.299556    0.327187    0.326958         NaN         NaN
Phrixus answered 15/9, 2021 at 2:7 Comment(0)
E
-1

I found the simplest thing to do for me is to just get the ticks, reformat them, and then put them back. This avoids the need to figure out the number and location of the ticks, because matplotlib already has done that. You just substitute the text. (see note below)

def format_date_ticks(old_ticks: list[plt.Text]) -> list[str]:
    text = [l.get_text() for l in old_ticks]  # plt.Text to str
    # str to datetime, then format as desired (here: date only)
    return pd.to_datetime(text).strftime('%Y-%m-%d').tolist()

# ...

new_ticks = format_date_ticks(ax.get_xticklabels()) # get and transform old ticks
ax.set_xticklabels(new_ticks) # replace the old with new

Note: per comments on another answer:

This now gives ValueError: The number of FixedLocator locations (13), usually from a call to set_ticks, does not match the number of ticklabels (25)., so not sure it's quite right...

"Note that it is important to set both, the tick locations (set_xticks) as well as the tick labels (set_xticklabels), otherwise they would become out of sync." from matplotlib doc
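A minimal sketch of what the quoted advice looks like in practice, using plain matplotlib with pcolormesh as a stand-in for the heatmap. The data and the choice to label every third column are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2016-08-09", periods=10, freq="D")
values = np.arange(30).reshape(3, 10)

fig, ax = plt.subplots()
ax.pcolormesh(values)  # stand-in for the heatmap; column i spans i to i + 1 on the x-axis

# Label every third column, centred on its cell
positions = np.arange(0, len(dates), 3) + 0.5
labels = dates[::3].strftime("%Y-%m-%d").tolist()

# Set locations and labels together so the two cannot get out of sync
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=90)
```

Because the positions and labels come from the same slice of dates, their lengths always match and the FixedLocator error quoted above cannot occur.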

Estren answered 6/3, 2024 at 7:4 Comment(0)
C
-3

I had a similar problem, but the date was the index. I've just converted the date to string (pandas 1.0) before plotting and it worked for me.

heat['date'] = heat.date.astype('string')
Cleocleobulus answered 3/3, 2020 at 14:24 Comment(0)
