Issues with Data Preprocessing and Changing Type of DataFrame Columns

I defined student_sub_set dataframe as below:

# Select the subset of characteristics for the regression
student_sub_set = student[['acad_lang_home', 'absent_freq', 'tired_freq', 'sex',
                           'bullying', 'like_math', 'clear_math',
                           'disorder_math', 'confident_math', 'value_math',
                           'like_science', 'clear_science', 'confident_science',
                           'value_science', 'study_support',
                           'parent_edu_max', 'internet_access', 'desired_edu',
                           'parent_immig_1', 'mmat_avg', 'ssci_avg']].dropna()

When I run student_sub_set.info(), I get this output:

Int64Index: 2565 entries, 1 to 4573
Data columns (total 21 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   acad_lang_home     2565 non-null   category
 1   absent_freq        2565 non-null   category
 2   tired_freq         2565 non-null   category
 3   sex                2565 non-null   object  
 4   bullying           2565 non-null   category
 5   like_math          2565 non-null   category
 6   clear_math         2565 non-null   category
 7   disorder_math      2565 non-null   category
 8   confident_math     2565 non-null   category
 9   value_math         2565 non-null   category
 10  like_science       2565 non-null   category
 11  clear_science      2565 non-null   category
 12  confident_science  2565 non-null   category
 13  value_science      2565 non-null   category
 14  study_support      2565 non-null   category
 15  parent_edu_max     2565 non-null   category
 16  internet_access    2565 non-null   float64 
 17  desired_edu        2565 non-null   category
 18  parent_immig_1     2565 non-null   float64 
 19  mmat_avg           2565 non-null   float64 
 20  ssci_avg           2565 non-null   float64 
dtypes: category(16), float64(4), object(1)
memory usage: 162.9+ KB

Then I defined X_stud as below:

X_stud = student_sub_set[['acad_lang_home', 'absent_freq','tired_freq','sex', 'bullying','like_math', 'clear_math', 'disorder_math', 'confident_math', 'value_math', 'like_science', 'clear_science','confident_science', 'value_science','study_support', 'parent_edu_max', 'internet_access', 'desired_edu', 'parent_immig_1']]

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2565 entries, 1 to 4573
Data columns (total 45 columns):
 #   Column                                                Non-Null Count  Dtype  
---  ------                                                --------------  -----  
 0   internet_access                                       2565 non-null   float64
 1   parent_immig_1                                        2565 non-null   float64
 2   acad_lang_home_Sometimes                              2565 non-null   uint8  
 3   acad_lang_home_Almost always                          2565 non-null   uint8  
 4   acad_lang_home_Always                                 2565 non-null   uint8  
 5   absent_freq_Once every two month                      2565 non-null   uint8  
 6   absent_freq_Once a month                              2565 non-null   uint8  
 7   absent_freq_Once every two weeks                      2565 non-null   uint8  
 8   absent_freq_Once a week                               2565 non-null   uint8  
 9   tired_freq_Sometimes                                  2565 non-null   uint8  
 10  tired_freq_Almost every day                           2565 non-null   uint8  
 11  tired_freq_Every day                                  2565 non-null   uint8  
 12  sex_Male                                              2565 non-null   uint8  
 13  bullying_About Monthly                                2565 non-null   uint8  
 14  bullying_About Weekly                                 2565 non-null   uint8  
 15  like_math_Somewhat Like Learning Mathematics          2565 non-null   uint8  
 16  like_math_Very Much Like Learning Mathematics         2565 non-null   uint8  
 17  clear_math_Moderate Clarity of Instruction            2565 non-null   uint8  
 18  clear_math_High Clarity of Instruction                2565 non-null   uint8  
 19  disorder_math_Some Lessons                            2565 non-null   uint8  
 20  disorder_math_Most Lessons                            2565 non-null   uint8  
 21  confident_math_Somewhat Confident in Mathematics      2565 non-null   uint8  
 22  confident_math_Very Confident in Mathematics          2565 non-null   uint8  
 23  value_math_Somewhat Value Mathematics                 2565 non-null   uint8  
 24  value_math_Strongly Value Mathematics                 2565 non-null   uint8  
 25  like_science_Somewhat Like Learning Science           2565 non-null   uint8  
 26  like_science_Very Much Like Learning Science          2565 non-null   uint8  
 27  clear_science_Moderate Clarity of Instruction         2565 non-null   uint8  
 28  clear_science_High Clarity of Instruction             2565 non-null   uint8  
 29  confident_science_Somewhat Confident in Science       2565 non-null   uint8  
 30  confident_science_Very Confident in Science           2565 non-null   uint8  
 31  value_science_Somewhat Value Science                  2565 non-null   uint8  
 32  value_science_Strongly Value Science                  2565 non-null   uint8  
 33  study_support_Either Own Room or Internet Connection  2565 non-null   uint8  
 34  study_support_Both Own Room and Internet Connection   2565 non-null   uint8  
 35  parent_edu_max_Lower Secondary                        2565 non-null   uint8  
 36  parent_edu_max_Upper Secondary                        2565 non-null   uint8  
 37  parent_edu_max_Post-secondary but not University      2565 non-null   uint8  
 38  parent_edu_max_University or Higher                   2565 non-null   uint8  
 39  desired_edu_ISCED Level 2                             2565 non-null   uint8  
 40  desired_edu_ISCED Level 3                             2565 non-null   uint8  
 41  desired_edu_ISCED Level 4                             2565 non-null   uint8  
 42  desired_edu_ISCED Level 5                             2565 non-null   uint8  
 43  desired_edu_ISCED Level 6                             2565 non-null   uint8  
 44  desired_edu_ISCED Level 7                             2565 non-null   uint8  
dtypes: float64(2), uint8(43)
memory usage: 167.8 KB

What is the difference between them? I cannot figure out why the column types of these two DataFrames are not the same. I have looked at this code many times, but I cannot find the difference. Can anyone tell me what causes it?

Drachm answered 10/3 at 23:36 Comment(1)
Nothing in the code you show would produce your second output, since the categorical columns you select from student_sub_set do not appear in it. – Bumbling

It is unlikely that the code shown produced that output. Based on the column types in the student_sub_set dataframe and the definition of X_stud, X_stud.info() should give this output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2565 entries, 1 to 4573
Data columns (total 19 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   acad_lang_home     2565 non-null   category
 1   absent_freq        2565 non-null   category
 2   tired_freq         2565 non-null   category
 3   sex                2565 non-null   object  
 4   bullying           2565 non-null   category
 5   like_math          2565 non-null   category
 6   clear_math         2565 non-null   category
 7   disorder_math      2565 non-null   category
 8   confident_math     2565 non-null   category
 9   value_math         2565 non-null   category
 10  like_science       2565 non-null   category
 11  clear_science      2565 non-null   category
 12  confident_science  2565 non-null   category
 13  value_science      2565 non-null   category
 14  study_support      2565 non-null   category
 15  parent_edu_max     2565 non-null   category
 16  internet_access    2565 non-null   float64 
 17  desired_edu        2565 non-null   category
 18  parent_immig_1     2565 non-null   float64 
dtypes: category(16), float64(2), object(1)
memory usage: 122.8+ KB
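
This can be checked on a small example. The sketch below uses a hypothetical toy DataFrame (the name `toy` and its values are made up for illustration, not taken from the question) and shows that selecting columns with a list of names does not change their dtypes:

```python
import pandas as pd

# Hypothetical toy data with the same kinds of dtypes as student_sub_set.
toy = pd.DataFrame({
    "acad_lang_home": pd.Categorical(["Never", "Sometimes", "Always"]),
    "sex": ["Female", "Male", "Male"],
    "internet_access": [1.0, 0.0, 1.0],
    "mmat_avg": [480.5, 512.0, 530.2],
})

# Plain column selection, as in the definition of X_stud.
cols = ["acad_lang_home", "sex", "internet_access"]
subset = toy[cols]

# The selected columns keep the dtypes they had in the original frame.
print(subset.dtypes)
```

Column selection alone only narrows the frame; it does not rename columns or convert them to another dtype.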
Sloth answered 29/6 at 10:8 Comment(0)

Based on the information given, it is unlikely that the code in the post produced the second output: the original categorical columns do not appear in that output.

Rework the code and make sure you are creating X_stud as intended.

If, on the other hand, you are transforming the data further, as the derived column names such as acad_lang_home_Sometimes suggest, those steps need to be posted.
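
For illustration only: column names of the form `<column>_<category>` are what one-hot encoding typically produces. The sketch below assumes the missing step was something like `pd.get_dummies(..., drop_first=True)`; that is a guess based on the column names, not something the question confirms, and the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the real step used in the question is unknown.
toy = pd.DataFrame({
    "acad_lang_home": pd.Categorical(
        ["Never", "Sometimes", "Almost always", "Always"],
        categories=["Never", "Sometimes", "Almost always", "Always"],
    ),
    "sex": ["Female", "Male", "Male", "Female"],
    "internet_access": [1.0, 0.0, 1.0, 1.0],
})

# One-hot encode the category/object columns, dropping the first level
# of each so that it serves as the reference category.
encoded = pd.get_dummies(toy, drop_first=True)

# Numeric columns come first, followed by the indicator columns.
print(encoded.columns.tolist())
print(encoded.dtypes)
```

The indicator columns are `uint8` in older pandas versions and `bool` from pandas 2.0 on, which would be consistent with the `uint8` columns in the second output if an older version was used.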

Bumbling answered 13/3 at 5:49 Comment(0)
