How to slice a dataframe by selecting a range of columns and rows based on names and not indexes?

This is a follow-up to a question I asked here. There I learned a) how to do this for columns (see below) and b) that R seems to handle the selection of rows and the selection of columns quite differently, which means that I cannot use the same approach for rows.

So suppose I have a pandas dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=["r" + str(i) for i in range(6)])

    c0  c1  c2  c3  c4  c5
r0   4   2   3   9   9   0
r1   9   0   8   1   7   5
r2   2   6   7   5   4   7
r3   6   9   9   1   3   4
r4   1   1   1   3   0   3
r5   0   8   5   8   2   9

then I can easily select rows and columns by their names like this:

print(df.loc['r3':'r5', 'c1':'c4'])

which returns

    c1  c2  c3  c4
r3   9   9   1   3
r4   1   1   3   0
r5   8   5   8   2

How would I do this in R? Given a dataframe like this

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

   c1 c2 c3 c4 c5 c6
r1  1  2  3  4  5  6
r2  2  3  4  5  6  7
r3  3  4  5  6  7  8
r4  4  5  6  7  8  9
r5  5  6  7  8  9 10
r6  6  7  8  9 10 11

Of course, if I know the positions of the rows/columns I want, I can simply do:

df[3:5, 1:4]

but I might delete rows/columns during my analysis, so I would rather select by name than by position. From the link above I learned that the following works for columns:

subset(df, select=c1:c4)

which returns

   c1 c2 c3 c4
r1  1  2  3  4
r2  2  3  4  5
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
r6  6  7  8  9

but how could I also select a range of rows by name at the same time?

In this particular case I could of course use grep, but what about rows and columns that have arbitrary names?

And I don't want to use

df[c('r3', 'r4', 'r5'), c('c1', 'c2', 'c3', 'c4')]

but an actual slice.

Osithe answered 9/6, 2016 at 0:28 Comment(0)

You can use which() with rownames:

subset(df[which(rownames(df)=='r3'):which(rownames(df)=='r5'),], select=c1:c4)


   c1 c2 c3 c4
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
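
A minimal sketch of the same idea wrapped in a reusable function. The names `slice_by_name` and `pos` are hypothetical helpers, not part of base R or of the answer above. The length check is there because `which()` returns a zero-length result for a name that does not exist, and more than one position if a column name is duplicated; either case makes the `:` range unreliable.

```r
# Hypothetical helper: position of exactly one label, or an error.
pos <- function(labels, name) {
  i <- which(labels == name)
  if (length(i) != 1) {
    stop("expected exactly one match for '", name, "', found ", length(i))
  }
  i
}

# Hypothetical helper: rows and columns between the given names, inclusive.
slice_by_name <- function(data, row_from, row_to, col_from, col_to) {
  rows <- pos(rownames(data), row_from):pos(rownames(data), row_to)
  cols <- pos(colnames(data), col_from):pos(colnames(data), col_to)
  # drop = FALSE keeps a data frame even when a single column is selected
  data[rows, cols, drop = FALSE]
}

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

res <- slice_by_name(df, 'r3', 'r5', 'c1', 'c4')
print(res)
```

Because the positions are looked up at call time, the slice should keep working after rows or columns have been deleted, as long as the four boundary names still exist.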
Flexion answered 9/6, 2016 at 0:36 Comment(3)
Yes, I should be more precise (will edit my question): it should work for arbitrary names; the ones here would indeed be easy to parse :) - Osithe
OK, I read it too quickly at first. Is this what you need? - Flexion
Great! Yes, that works fine. I'll upvote it for now and accept it later, depending on the quality of the other answers. - Osithe

Use match() to find the positions of specific row and column names.

df[match("r3", rownames(df)):match("r5", rownames(df)), match("c1", colnames(df)):match("c4", colnames(df))]

   c1 c2 c3 c4
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
Miscellanea answered 9/6, 2016 at 0:37 Comment(5)
But then I need to specify the rows and columns, which is what I actually want to avoid (I edited my question to make that clearer); just imagine doing this for 100 rows/columns you want to select... - Osithe
Yes, that works, too (upvoted)! You just need to fix the typos in index.c. - Osithe
Thanks for noticing! - Miscellanea
Doesn't this assume that the row names are in lexicographic order? That may not always be the case (row names as IDs or something). - Irresolute
I liked the previous version better; the row and column names can be arbitrary, using r and c, respectively, was just an example. - Osithe
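
On the lexicographic-order concern raised in the comments: a small sketch suggesting that `match()` works on stored positions, not on sorted names. The data frame `ids` and its row names are made up for illustration.

```r
# Illustrative data: row names are arbitrary IDs in no sorted order.
ids <- data.frame(x = 1:5, y = 6:10,
                  row.names = c('zeta', 'alpha', 'mid', 'beta', 'omega'))

first <- match('alpha', rownames(ids))  # position 2
last  <- match('beta', rownames(ids))   # position 4

# Every row stored between 'alpha' and 'beta', inclusive, in stored order.
between <- ids[first:last, ]
print(rownames(between))

# If the start name is stored after the end name, `:` counts downward,
# so the rows come back in reverse order.
reversed <- ids[last:first, ]
print(rownames(reversed))
```

The slice depends only on where the rows sit in the data frame, so re-sorting the frame changes what lies between two names.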

You can write a function that gives you roughly the same behavior:

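# Infix operator: returns the names of `object` lying between the two
# entries of `range` (given as names or as numeric positions), inclusive.
'%:%' <- function(object, range) {
  # Pick the accessor: colnames for matrices, names for other objects with
  # dimensions (e.g. data frames), and the vector itself otherwise.
  FUN <- if (!is.null(dim(object))) {
    if (is.matrix(object)) colnames else names
  } else identity
  wh <- if (is.numeric(range)) range else which(FUN(object) %in% range)
  FUN(object)[seq(wh[1], wh[2])]
}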

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

Use it like this:

df %:% c('c2', 'c4')
# [1] "c2" "c3" "c4"

rownames(df) %:% c('r2', 'r4')
# [1] "r2" "r3" "r4"

For your question:

df[rownames(df) %:% c('r3', 'r5'), df %:% c('c1', 'c5')]
#    c1 c2 c3 c4 c5
# r3  3  4  5  6  7
# r4  4  5  6  7  8
# r5  5  6  7  8  9
Ehrlich answered 9/6, 2016 at 1:7 Comment(0)
