Plotly: How to group data and specify colors using go.box instead of px.box?
The question:

Using Plotly Express, you can group data and assign different colors using color=<group> in px.box(). But how can you do the same thing using plotly.graph_objects and go.Box()?

Some details:

Plotly Express is nice, but sometimes we need more than the basics. So I tried to use Plotly Go instead, but I can't figure out how to draw box plots with grouped boxes without manually adding a go.Box for each group, as in the documentation.

Here is the code that I took from the documentation for Plotly Express:

import plotly.express as px

df = px.data.tips()
fig = px.box(df, x="time", y="total_bill", color="smoker",
             notched=True, # used notched shape
             title="Box plot of total bill",
             hover_data=["day"] # add day column to hover data
            )
fig.show()

How can you achieve the same thing in Plotly Go? Because the color property is not recognised as valid.

import plotly.express as px
import plotly.graph_objects as go

df = px.data.tips()
fig = go.Figure(go.Box(
    x=df.time,
    y=df.total_bill,
    color="smoker",  # not a valid go.Box property
    notched=True,  # use notched shape
))
fig.show()

Moreover, how can you define the colors for the boxes? In Plotly Go, marker_color accepts only a single color (not a list) and sets all boxes to that color, and it is not a valid argument for Plotly Express. I tried using colorscale, and that doesn't work either.

Psychosocial answered 8/3, 2020 at 14:16 Comment(5)
I'd be happy to answer your questions, but please make it easier by asking one question at a time. As it stands, you risk having the question closed as too unfocused. I'd at least suggest that you save the last part, "Generally why is...etc", for later and raise it as a separate question. - Zamboanga
@Zamboanga I removed the last part. I can also remove the question at the end and move it to a different post. I posted the color question because that is essentially why I wanted to switch from Plotly Express, which seems to have more limited options, to Plotly Go. - Psychosocial
I'll have a closer look at it tomorrow unless someone beats me to it. - Zamboanga
How did my suggestion work out for you? - Zamboanga
That sounds great. Thanks! - Psychosocial

Let's jump straight to the answer and shed some light on the details afterwards. To set the colors for go.Box traces, you have to split the dataset into the groups you want to study and assign a color to each subcategory using line=dict(color=<color>). The code snippet below shows how to use Plotly's built-in color cycle to get the same result as Plotly Express without specifying a color for each category by hand. You also have to set boxmode='group' in the figure layout to prevent the boxes from being drawn on top of each other.

Plot 1 - Using go.box:


Code 1 - Using go.box:

# imports
import plotly.graph_objects as go
import plotly.express as px

# data and Plotly's default qualitative color cycle
df = px.data.tips()
colors = px.colors.qualitative.Plotly

fig = go.Figure()
# one go.Box trace per smoker category
for i, smokes in enumerate(df['smoker'].unique()):
    df_plot = df[df['smoker'] == smokes]

    fig.add_trace(go.Box(x=df_plot['time'], y=df_plot['total_bill'],
                         notched=True,
                         line=dict(color=colors[i]),
                         name='smoker=' + smokes))

fig.update_layout(boxmode='group', xaxis_tickangle=0)
fig.show()

Now for the...

how can you define the colors for the boxes?

...part.

The color of the boxes is defined by fillcolor, which defaults to a half-transparent variant of the line color. In the example above, you can give all boxes a transparent green fill using fillcolor='rgba(0,255,0,0.5)':

Plot 2: fillcolor='rgba(0,255,0,0.5)'

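To make the "half-transparent variant of the line color" idea concrete, here is a minimal sketch of how you could build such a fill yourself from a hex line color. Note that hex_to_rgba is a hypothetical helper written for this illustration; it is not part of Plotly.

```python
def hex_to_rgba(hex_color: str, alpha: float = 0.5) -> str:
    """Convert a '#RRGGBB' string to an 'rgba(r,g,b,a)' string.

    Hypothetical helper for illustration only; not a Plotly function.
    """
    hex_color = hex_color.lstrip('#')
    # Parse the three two-digit hex channels
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return f'rgba({r},{g},{b},{alpha})'


line_color = '#636EFA'  # a hex color you might use for the box outline
fill = hex_to_rgba(line_color, alpha=0.5)
```

The resulting string can then be passed as fillcolor=fill next to line=dict(color=line_color) in go.Box.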

Or you can pick the fill colors from the same color cycle you're using for the line colors by indexing the colors list with an offset, like fillcolor=colors[i+4].

Plot 3: fillcolor=colors[i+4]


The simplest approach of all is to use the same line=dict(color='black') and fillcolor='yellow' for every group:

Plot 4: Back to the basics


Complete code:

# imports
import plotly.express as px
import plotly.graph_objects as go

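# data and Plotly's default color cycle (used by the commented-out alternatives)
df = px.data.tips()
colors = px.colors.qualitative.Plotly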

# plotly setup
fig=go.Figure()

# a plotly trace for each subcategory
for i, smokes in enumerate(df['smoker'].unique()):
    df_plot=df[df['smoker']==smokes]

    fig.add_trace(go.Box(x=df_plot['time'], y=df_plot['total_bill'],
                         notched=True,
                         line=dict(color='black'),
                         #line=dict(color=colors[i]),
                         fillcolor='yellow',
                         #fillcolor=colors[i+4],
                         name='smoker=' + smokes))

# figure layout adjustments
fig.update_layout(boxmode='group', xaxis_tickangle=0)
fig.show()

Some details about it all:

How can you achieve the same thing in Plotly Go? Because the color property is not recognised as valid.

If you study the documentation for go.Box, you'll quickly discover that go.Box has no color property, while px.box has this argument:

color: str or int or Series or array-like
        Either a name of a column in `data_frame`, or a pandas Series or
        array_like object. Values from this column or array_like are used to
        assign color to marks.

In other words, what color in px.box does for you is split the dataset into groups, one per unique value of the chosen column, which suits a long-format dataset such as px.data.tips().

go.Box has no such argument, so passing color raises a ValueError:

ValueError: Invalid property specified for object of type plotly.graph_objs.Box: 'color'

Zamboanga answered 9/3, 2020 at 9:34 Comment(1)
This works perfectly. I did not think about using the for loop. What would be the reasoning for having something like this built into px.box but not go.Box? - Psychosocial
