What does the levels parameter mean in a seaborn KDE plot?

I am trying to make a contour plot of my 2D data, but I would like to set the contours manually. I found the "levels" option in the seaborn.kdeplot documentation, which lets me define the contour levels manually. However, I have no idea what these levels mean. The documentation gives this definition:

Levels correspond to iso-proportions of the density.

What does iso-proportions of density mean? Are there any references that I could read up on this?

Dessiatine asked 14/10, 2020 at 21:55
Note that when levels is set to a single number, it is supposed to be the number of contour lines (or filled areas when fill=True). When levels is an array, each entry defines one contour line; these numbers should be between 0 and 1 (close to 0 means almost all samples fall inside the contour; close to 1 means only the most central samples do). An array with one element draws exactly one contour line. – Witching

Basically, the contour line for the level 0.05 is drawn such that 5% of the distribution lies "below" it, meaning in the region where the density is lower than the density along that line. Alternatively, because the integral over the full density equals 1 (that's what makes it a PDF), the integral over the area outside of the contour line will be 0.05.
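To make the "integral outside the contour" idea concrete, here is a minimal NumPy sketch. It is my own illustration, not seaborn's code, and it assumes a standard bivariate normal as the density because its contours have a closed form: the region where the density is at least t is a disk of radius r holding mass 1 - exp(-r²/2), and the density on its edge is exp(-r²/2)/(2π), so the contour for level q sits at density q/(2π). The helper density_threshold is hypothetical; it sorts grid densities from highest to lowest and finds the density value at which the enclosed mass reaches 1 - q.

```python
import numpy as np


def density_threshold(density, cell_area, q):
    """Hypothetical helper: return the density value t such that roughly a
    fraction q of the probability mass lies where the density is below t."""
    values = np.sort(density.ravel())[::-1]       # highest density first
    mass_above = np.cumsum(values) * cell_area    # mass of the region {f >= values[i]}
    mass_above /= mass_above[-1]                  # renormalize away grid truncation
    idx = np.searchsorted(mass_above, 1 - q)      # first index enclosing >= 1 - q
    return values[min(idx, values.size - 1)]


# Assumed example density: standard bivariate normal on a fine grid
xs = np.linspace(-6, 6, 1201)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
f = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

q = 0.05
t = density_threshold(f, dx * dx, q)
mass_outside = f[f < t].sum() * dx * dx           # integral outside the contour

print(f"numeric threshold {t:.6f}, closed form {q / (2 * np.pi):.6f}")
print(f"mass outside the contour: {mass_outside:.4f}")
```

The numeric threshold should agree with q/(2π), and the mass outside the contour should come out close to 0.05.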

Amygdaline answered 14/10, 2020 at 22:25

The level here describes the cumulative mass below a given threshold, as explained, with an example, in the documentation:

Number of contour levels or values to draw contours at. A vector argument must have increasing values in [0, 1]. Levels correspond to iso-proportions of the density: e.g., 20% of the probability mass will lie below the contour drawn for 0.2. Only relevant with bivariate data.

You can specify levels in two ways:

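  1. Specify the number of levels as an integer. For example, levels=5 requests five iso-proportions spaced evenly between thresh (0.05 by default) and 1. The level at 1 corresponds to the density peak and encloses no area, so you see 4 contour lines that partition the probability mass into 5 parts.
  2. Explicitly pass the threshold for each contour as a vector of increasing values in [0, 1].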
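The two forms can be sketched in plain NumPy. This assumes, based on my reading of seaborn's source for versions 0.11 and later, that an integer levels is expanded with np.linspace(thresh, 1, levels) using the thresh parameter (0.05 by default); please check this against your installed version. Both helper functions are hypothetical and only mimic the documented rule that "a vector argument must have increasing values in [0, 1]".

```python
import numpy as np


def integer_levels(levels, thresh=0.05):
    """Hypothetical helper: assumed expansion of an integer `levels` into
    evenly spaced iso-proportions between `thresh` and 1."""
    return np.linspace(thresh, 1, levels)


def vector_levels(levels):
    """Hypothetical helper: validate an explicit vector of iso-proportions
    against the documented rule (increasing values in [0, 1])."""
    arr = np.asarray(levels, dtype=float)
    if arr.min() < 0 or arr.max() > 1:
        raise ValueError("levels must be in [0, 1]")
    if np.any(np.diff(arr) <= 0):
        raise ValueError("levels must be increasing")
    return arr


props = integer_levels(5)
print(props)           # iso-proportions 0.05, 0.2875, 0.525, 0.7625, 1.0
print(np.diff(props))  # mass between neighbouring contours: about 0.2375 each
print(vector_levels([0.3, 0.4, 0.8]))
```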

The proportions describe the probability mass outside each contour line. So 0.2 means that 20% of the probability mass lies outside the contour drawn for 0.2. Playing around with the following code makes this clearer.

I show both implementations below for your reference.

import matplotlib.pyplot as plt
import seaborn as sns

geyser = sns.load_dataset("geyser")

# levels as an integer: evenly spaced iso-proportion levels
sns.kdeplot(
    data=geyser, x="waiting", y="duration", hue="kind",
    levels=5
)
plt.show()

# levels as a vector: contours drawn at explicit iso-proportions
sns.kdeplot(
    data=geyser, x="waiting", y="duration", hue="kind",
    levels=[0.3, 0.4, 0.8]
)
plt.show()

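As a sanity check of the "outside" reading for the vector levels [0.3, 0.4, 0.8], here is a Monte Carlo sketch. It swaps the geyser KDE for a bivariate normal with a made-up covariance (an assumption, chosen so the exact answer is known): for a zero-mean bivariate normal, the density at a sample equals the peak density times exp(-m²/2), where the squared Mahalanobis distance m² follows a chi-square distribution with 2 degrees of freedom, so exp(-m²/2) is uniform on (0, 1). The contour for level q therefore sits at q times the peak density, and a fraction q of the samples should fall outside it. The helper gaussian_pdf is hypothetical, written only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 2.0]])   # assumed example covariance
samples = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)


def gaussian_pdf(points, cov):
    """Hypothetical helper: density of a zero-mean bivariate normal."""
    inv = np.linalg.inv(cov)
    m2 = np.einsum("ij,jk,ik->i", points, inv, points)  # squared Mahalanobis distance
    return np.exp(-m2 / 2) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))


dens = gaussian_pdf(samples, cov)
peak = 1 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))  # density at the mode

for q in (0.3, 0.4, 0.8):
    t_exact = q * peak                  # density on the contour for level q
    t_sample = np.quantile(dens, q)     # same threshold estimated from samples
    outside = np.mean(dens < t_exact)   # share of samples outside the contour
    print(f"level {q}: threshold {t_exact:.5f} (sample {t_sample:.5f}), "
          f"outside {outside:.3f}")
```

The sample-based threshold is just the q-quantile of the density evaluated at the samples, which is another way to read "iso-proportion": a fraction q of the draws land where the density is lower than on that contour.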

Crustaceous answered 14/10, 2020 at 22:34
I guess what confused me was the "probability mass will lie below the contour drawn" statement. For 2D contours it would have been clearer if it said "area outside of the contour" (or, as @mwaskom said, the integral over the area outside of the contour line). – Dessiatine
Yes, the examples I show clarify that as well, and I edited the answer for more clarity. The other part of your question, setting your own custom thresholds, is also covered in my answer. – Crustaceous

© 2022 - 2024 — McMap. All rights reserved.