Drawing line plot for a histogram

I'm trying to reproduce this chart using Altair as much as I can. https://fivethirtyeight.com/wp-content/uploads/2014/04/hickey-bechdel-11.png?w=575

I'm stuck on drawing the black line that divides pass from fail. It is similar to this Altair example: https://altair-viz.github.io/gallery/step_chart.html. However, in the 538 chart the value for the final date is extended across the full width of the last bar. In the step chart example and in my attempt below, the line stops as soon as it reaches the last date element.

I have looked through Altair's GitHub and Google Groups and found nothing similar to this problem.

import altair as alt
import pandas as pd

movies=pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious','men', 'notalk', 'nowomen']

base=alt.Chart(movies).encode(
    alt.X("year:N", bin=alt.BinParams(step=5, extent=[1970, 2015]),
          axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8), title=None),
    alt.Y("count()", stack='normalize', title=None,
          axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1]))
).properties(width=400)
main=base.transform_calculate(cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
                ).mark_bar(stroke='white' #add horizontal lines
                ).encode(  
  alt.Color("clean_test:N",scale=alt.Scale(
      domain=domain,
      range=['dodgerblue', 'skyblue', 'pink', 'coral','red']))
    ,order=alt.Order('cleanrank:O', sort='ascending')
)

extra=base.transform_calculate(cleanpass='datum.clean_test == "ok" ? "PASS" : datum.clean_test == "dubious" ? "PASS" : "FAIL"'
                      ).mark_line(interpolate='step-after'
                      ).encode(alt.Color("cleanpass:N",scale=alt.Scale(domain=['PASS','FAIL'],range=['black','white']))
                      )



alt.layer(main,extra).configure_scale(
    bandPaddingInner=0.01 #smaller vertical lines
).resolve_scale(color='independent')
Chain answered 10/9, 2019 at 22:23

One rather hacky way to make the step line span from the start of the first bin to the end of the last bin is to control the bin positions manually, using the rank of the ordered bins as the x position.

This way we can add two lines: one with 'step-after' and another with 'step-before' shifted right by one bin. The first covers every bin except the last, the second covers every bin except the first, and they overlap in between. The tick labels would still need to be replaced with the appropriate bin labels and centered, e.g. using the levels from pd.cut.
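To see why the two lines together span every bin, here is a minimal pure-Python sketch of the horizontal runs each interpolation produces. The helper `step_segments` and the share values are made up for illustration; they are not part of Altair.

```python
def step_segments(xs, ys, mode):
    """Return (x_start, x_end, y) for each horizontal run of a step line."""
    segments = []
    for i in range(len(xs) - 1):
        # step-after holds the left point's value until the next point;
        # step-before jumps first, so the run carries the right point's value
        y = ys[i] if mode == "step-after" else ys[i + 1]
        segments.append((xs[i], xs[i + 1], y))
    return segments


ranks = [0, 1, 2]        # three bins, ranked in order
share = [0.3, 0.5, 0.4]  # made-up pass share per bin

# Unshifted step-after: covers every bin except the last one
after = step_segments(ranks, share, "step-after")
# Step-before on ranks shifted by +1: covers every bin except the first one
before = step_segments([r + 1 for r in ranks], share, "step-before")

# Together, each bin [k, k + 1] is drawn at its own value
covered = sorted(set(after) | set(before))
print(covered)
```

With n points a single step line only has n - 1 runs, which is why one line alone always leaves one bin uncovered.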


Dataframe preparation

import altair as alt
import pandas as pd

movies=pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious','men', 'notalk', 'nowomen']

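# Bin years into 5-year intervals; pd.cut's default bins are right-closed, e.g. (1970, 1975]
movies['year_bin'] = pd.cut(movies['year'], range(1970, 2016, 5))
# Integer rank of each bin (0, 1, 2, ...); years outside all bins get code -1
movies['year_rank'] = movies['year_bin'].cat.codes
movies = movies[movies['year_rank']>=0]
df_plot = movies[['year_rank', 'clean_test']].copy()
# Right edge of each bar, one rank unit after its left edge
df_plot['year_rank_end'] = df_plot['year_rank'] + 1
df_plot['clean_pass'] = df_plot['clean_test'].apply(lambda x: 'PASS' if x in ['ok', 'dubious'] else 'FAIL')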
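As a quick check of how the ranks come out, here is a small sketch on a handful of made-up years (not the movies data). It shows that the default right-closed bins leave 1970 itself unbinned, and that `right=False` is one possible alternative if left-closed bins such as [1970, 1975) are wanted. Which of the two matches the original chart is an assumption to verify against the data.

```python
import pandas as pd

# Made-up sample years, chosen to sit on and around the bin edges
years = pd.Series([1969, 1970, 1971, 1975, 1976, 2013, 2015, 2016])
edges = list(range(1970, 2016, 5))  # 1970, 1975, ..., 2015 -> 9 bins

# Default right=True: bins are (1970, 1975], (1975, 1980], ...
codes_right = pd.cut(years, edges).cat.codes.tolist()

# right=False: bins are [1970, 1975), [1975, 1980), ...
codes_left = pd.cut(years, edges, right=False).cat.codes.tolist()

# A code of -1 marks a year that fell outside every bin
print(codes_right)
print(codes_left)
```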

Chart declaration

base=alt.Chart(df_plot).encode(
    x=alt.X('year_rank',
        axis=alt.Axis(labelAngle=0, labelLimit=50,labelFontSize=8),
        title=None
        ),
    x2='year_rank_end',
    y=alt.Y('count()',title=None, stack='normalize',
        axis=alt.Axis(format='%',values=[0, 0.25,0.50,0.75,1])
        )
).properties(width=400)

main=base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
    ).mark_bar(
        stroke='white' #add horizontal lines
    ).encode( 
  alt.Color("clean_test:N",scale=alt.Scale(
      domain=domain,
      range=['dodgerblue', 'skyblue', 'pink', 'coral','red']))
    ,order=alt.Order('cleanrank:O', sort='ascending')
)

# clean_pass already exists in df_plot, so no transform_calculate is needed here
extra=base.mark_line(
        interpolate='step-after'
    ).encode(
        alt.Color("clean_pass:N",scale=alt.Scale(domain=['PASS','FAIL'],range=['black','white']))
    )

extra2=base.transform_calculate(
    # shift data by one bin, so that step-before matches the unshifted step-after
    year_rank='datum.year_rank +1' 
    ).mark_line(
        interpolate='step-before'
    ).encode(
        alt.Color("clean_pass:N",scale=alt.Scale(domain=['PASS','FAIL'],range=['black','white']), legend=None)
    )

alt.layer(main, extra, extra2).configure_scale(
    bandPaddingInner=0.01 #smaller vertical lines
).resolve_scale(color='independent')
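For the remaining tick-label step, one possible approach is to place ticks at the middle of each rank and map them back to bin labels with an axis label expression. The sketch below only builds the two values. The label format ("1970-75") and the names `tick_values` and `label_expr` are hypothetical choices, and the expression assumes Vega's expression language accepts an indexed array literal.

```python
import json

edges = list(range(1970, 2016, 5))  # same bin edges as in the pd.cut call
n_bins = len(edges) - 1

# Hypothetical label format built from consecutive edges, e.g. "1970-75"
labels = [f"{lo}-{str(hi)[-2:]}" for lo, hi in zip(edges[:-1], edges[1:])]

# One tick in the middle of each bin: 0.5, 1.5, ...
tick_values = [rank + 0.5 for rank in range(n_bins)]

# Vega expression that looks up the label for the bin a tick falls in
label_expr = f"{json.dumps(labels)}[floor(datum.value)]"
print(label_expr)
```

These could then be tried on the x axis as `alt.Axis(values=tick_values, labelExpr=label_expr)`, assuming an Altair version whose `alt.Axis` supports `labelExpr`. This has not been checked against the chart above.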
Phlox answered 28/11, 2022 at 0:11
