Is There A Neat/Simplest Way To This data.table R Code?
Asked Answered
B

3

4

The STRATUM from OECD data is so long, for simplicity I put this name and would like to simplified it to a more short and precise naming as in the code below.

pisaMas[,`:=`
             (SchoolType = c(ifelse(STRATUM == "National Secondary School", "Public", 
                                    ifelse(STRATUM == "Religious School", "Religious", 
                                           ifelse(STRATUM == "MOE Technical School", "Technical",0)))))]
pisaMas[,table(SchoolType)]

I would like to know if there are a simple way to this problems, using data.table package.

Bipinnate answered 8/1, 2020 at 3:41 Comment(2)
I would create a two column dataframe with the long name in one column the short name in the other column, then merge on the long name and then drop the long name column, keeping the short name column. The only thing you need to do is be sure that you use the long-name column name for the names you do not want to replace in BOTH columns of the new Data Frame so you do not drop them or introduce NA values.Chaechaeronea
Yes, a join is the data.table approach for this.Hashish
H
6

Current development version of data.table has new function fcase (modeled after SQL CASE WHEN) for this situation:

pisaMas[ , SchoolType := fcase(
  STRATUM == "National Secondary School", "Public", 
  STRATUM == "Religious School", "Religious", 
  STRATUM == "MOE Technical School", "Technical",
  default = ''
)]
pisaMas[ , table(SchoolType)]

To get the development version, try

install.packages(
  'data.table', type = 'source',repos = 'http://Rdatatable.github.io/data.table'
)

If the simple install doesn't work, you can check the Installation wiki for some more details:

https://github.com/Rdatatable/data.table/wiki/Installation

You can also solve this with a lookup table, see this Q&A for details:

https://mcmap.net/q/1923170/-best-way-to-replace-a-lengthy-ifelse-structure-in-r

Hatter answered 8/1, 2020 at 4:12 Comment(4)
Interesting! Thanks. That's one cool way to tackle this problem. Do I really need to put the default = ' ' as well? Why?Bipinnate
@ZeeA. you can make default whatever you want. if you don't specify, it'll be default = NA_character_Hatter
unfortunately, I spent my whole day figuring out how to use fcase by following all the instructions online, but seems to failed with so many errors. Starting from removing the package and re-install it using the 'source'.Bipinnate
@ZeeA. oh no! sorry to hear that. If you can distill the issue you can file a usability request on GitHub. Or if there's something you're unsure about, you can try a new question specific to fcase.Hatter
B
0

This is what I come out with after a few thoughts.

#' First I create a function (rname.SchType) that have oldname and newname using else if:

rname.SchType <- function(x){
  if (is.na(x)) NA
  else if (x == "MYS - stratum 01: MOE National Secondary School\\Other States")"Public"
  else if(x == "MYS - stratum 02: MOE Religious School\\Other States")"Religious" 
  else if(x == "MYS - stratum 03: MOE Technical School\\Other States")"Technical"
  else if(x == "MYS - stratum 04: MOE Fully Residential School")"SBP"
  else if(x == "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States")"MARA"
  else if(x == "MYS - stratum 06: non-MOE Other Schools\\Other States")"Private"
  else if(x == "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”")"Perlis Fully Residential"
  else if(x == "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”")"Putrajaya Fully Residential"
  else if(x == "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”")"Labuan Fully Residential"
}

By using the function I just created, I past it through the data.table with just a line of code, by applying base R (sapply) inside data.table, hence managed to avoid code clutter-ness and look much simpler:

pisaMalaysia[,`:=`(jenisSekolah = sapply(STRATUM,rname.SchType))]
Bipinnate answered 8/1, 2020 at 4:11 Comment(3)
May simpler (in your opinion!) but extremely inefficient.Hashish
Because it is not vectorized. You are using an R-level loop when you could use a C-level loop.Hashish
Thanks @Hashish ! That's whole new knowledge for me. Didn't know anything about the different between C-level or R-level before. I'll look into that!Bipinnate
B
0

I think I finally got the answer for my question above! This answer overcome the issue of 'not vectorized' as mentioned by @Roland, thank you sir! And it is in my opinion is much faster even though it took me literally couple of weeks to understand the concept and finding the right questions on the web!

First, I create a new data.table that consist of 2 columns, one with the original name and the second is the desired name for the school.

lookUpStratum <- data.table(STRATUM=c("MYS - stratum 01: MOE National Secondary School\\Other States",
                                      "MYS - stratum 02: MOE Religious School\\Other States",
                                      "MYS - stratum 03: MOE Technical School\\Other States",
                                      "MYS - stratum 04: MOE Fully Residential School",
                                      "MYS - stratum 05: non-MOE MARA Junior Science College\\Other States",
                                      "MYS - stratum 06: non-MOE Other Schools\\Other States",
                                      "MYS - stratum 07: Perlis non-“MOE Fully Residential Schools”",
                                      "MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools”",
                                      "MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools”"),
                            SCH.TYPE=c("Public",
                                       "Religious",
                                       "Technical",
                                       "SBP",
                                       "MARA",
                                       "Private",
                                       "Perlis Fully Residential",
                                       "Putrajaya Fully Residential",
                                       "Labuan Fully Residential"))

The answer lies on the setDT (Coerce lists and data.frames to data.table by reference).

Using this line of code I read here, it looks kinda long but it solved my problem! And to be honest I understand this first before I understand the shortest one code below.

setDT(pisaMalaysia)[,SCH.TYPE := lookUpStratum$SCH.TYPE[match(pisaMalaysia$STRATUM,lookUpStratum$STRATUM)]]

After a few minutes I finally managed to get my head around this code here and produced this code:

setDT(pisaMalaysia)[lookUpStratum,SCH.TYPE1 := i.SCH.TYPE, on = c(STRATUM = "STRATUM")]

I got these answers from the same post here.

To check if everything is the same:

table(pisaMalaysia$SCH.TYPE)
table(pisaMalaysia$SCH.TYPE1)
#' original data
pisaMalaysia[,table(STRATUM)]

results:

> table(pisaMalaysia$SCH.TYPE)
   Labuan Fully Residential                        MARA    Perlis Fully Residential 
                         54                         122                          78 
                    Private                      Public Putrajaya Fully Residential 
                        385                        4929                          78 
                  Religious                         SBP                   Technical 
                        273                        2661                         281 

> table(pisaMalaysia$SCH.TYPE1)
   Labuan Fully Residential                        MARA    Perlis Fully Residential 
                         54                         122                          78 
                    Private                      Public Putrajaya Fully Residential 
                        385                        4929                          78 
                  Religious                         SBP                   Technical 
                        273                        2661                         281 

> pisaMalaysia[,table(STRATUM)]
STRATUM
                      MYS - stratum 01: MOE National Secondary School\\Other States 
                                                                               4929 
                               MYS - stratum 02: MOE Religious School\\Other States 
                                                                                273 
                               MYS - stratum 03: MOE Technical School\\Other States 
                                                                                281 
                                     MYS - stratum 04: MOE Fully Residential School 
                                                                               2661 
                MYS - stratum 05: non-MOE MARA Junior Science College\\Other States 
                                                                                122 
                              MYS - stratum 06: non-MOE Other Schools\\Other States 
                                                                                385 
                       MYS - stratum 07: Perlis non-“MOE Fully Residential Schools” 
                                                                                 78 
MYS - stratum 08: Wilayah Persekutuan Putrajaya non-“MOE Fully Residential Schools” 
                                                                                 78 
   MYS - stratum 09: Wilayah Persekutuan Labuan non-“MOE Fully Residential Schools” 
                                                                                 54 

Thanks! Hope this will help others too.

Bipinnate answered 20/1, 2020 at 8:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.