PatsyError: Number of rows mismatch between data argument and column (statsmodels)

I'm using statsmodels with R-style formulas (via the Patsy package) and getting an error I can't make heads or tails of. Any tips or tricks would be greatly appreciated.

PatsyError: Number of rows mismatch between data argument and C('Industry_Banking&CapitalMarkets') (8137 versus 1)

The DataFrame does have 8137 rows and no missing data.

Full code is below:

mixed = smf.mixedlm("""count_SoldServiceName ~ date_int + AzureActiveEngagementCount + AzureEngagementPartnerCount 
                     + DCount_learning_path_name + Industry_Automotive + C('Industry_Banking&CapitalMarkets') + C('Industry_Chemicals&Agrochemicals') + Industry_CivilianGovernment
                     + Industry_ConsumerGoods + C('Industry_Defense&Intelligence') + Industry_DiscreteManufacturing + Industry_Energy + Industry_Gaming 
                     + Industry_HealthPayor + Industry_HealthProvider + Industry_HigherEducation + Industry_Insurance + C('Industry_Media&Entertainment') + Industry_Nonprofit 
                     + Industry_PartnerProfessionalServices + Industry_Pharmaceuticals + C('Industry_Primary&SecondaryEdu/K-12') + Industry_ProfessionalServices 
                     + C('Industry_PublicSafety&Justice') + Industry_Retailers + Industry_SmartSpaces + Industry_Telecommunications +  C('Industry_Travel,Transport&Hospitality') 
                     + Industry_other + InvestmentArea_AA + InvestmentArea_ACO + InvestmentArea_CSE + InvestmentArea_CSM + InvestmentArea_ECIF + InvestmentArea_FT 
                     + InvestmentArea_GBB + InvestmentArea_PAL + active_flag_int + annual_sales_in_us_dollars + commitment_int 
                     + completed_lp_learners + edx_number_completed_courses + employees_total + esi_offer_int
                     +  health_int + s500_int + segmentname_int + fundamentals_flag + role_based_flag"""                    
                     ,workloads_agg
                     ,groups=workloads_agg['tpid_sub']
                     ,exog_re=workloads_agg['date_int']
                     ,missing='drop'
                   ,use_sqrt=True)
mixed_fit = mixed.fit(method=['bfgs', 'lbfgs', 'cg','powell'])
Lovash answered 7/11, 2019 at 0:29

Just in case anyone else comes across this: the solution was to rename all the columns, removing every special character such as ',', '/' and '&'.
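For anyone who wants to script the renaming, here is a minimal sketch of one way to do it. The helper names `sanitize_column_name` and `sanitize_columns` are my own (hypothetical, not part of pandas, Patsy or statsmodels), and replacing each run of special characters with an underscore is just one possible convention. As far as I can tell, the "8137 versus 1" in the error comes from Patsy treating the quoted name inside C('...') as a single Python string rather than as a column, so once the names are plain identifiers they can go into the formula unquoted.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Return `name` with every run of special characters replaced by '_'.

    Hypothetical helper: only ASCII letters, digits and underscores are kept.
    """
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name).strip("_")
    # A formula term has to look like a Python identifier, which cannot
    # start with a digit, so prefix an underscore in that case.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def sanitize_columns(names):
    """Build an {old_name: new_name} mapping, refusing silent collisions."""
    mapping = {}
    seen = {}
    for old in names:
        new = sanitize_column_name(old)
        if new in seen:
            raise ValueError(
                f"{old!r} and {seen[new]!r} both sanitize to {new!r}"
            )
        seen[new] = old
        mapping[old] = new
    return mapping


# Assumed usage with the DataFrame from the question:
#   workloads_agg = workloads_agg.rename(
#       columns=sanitize_columns(workloads_agg.columns))
```

After the rename, a term such as C('Industry_Banking&CapitalMarkets') would be written simply as Industry_Banking_CapitalMarkets in the formula. Patsy also provides a Q() helper for quoting awkward column names, e.g. C(Q('Industry_Banking&CapitalMarkets')), but that is not the route I took here.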

Lovash answered 7/11, 2019 at 1:11 Comment(1)
If I may add: after removing all special characters, also make sure you did not additionally quote any column names, e.g. write """column1 ~ columns2 + column3""", not """column1 ~ columns2 + 'column3'""". - Elater
