r - Find rows in dataframe with maximum values grouped by values in another column -

July 15, 2010

i solve problem in r without using sql.

how can select rows max(column value), distinct column in sql?

sure, use sqldf it, there must cool apply method in r it, too?

setup data first read in data:

lines <- "id  home  datetime  player   resource 1   10   04/03/2009  john    399  2   11   04/03/2009  juliet  244 5   12   04/03/2009  borat   555 3   10   03/03/2009  john    300 4   11   03/03/2009  juliet  200 6   12   03/03/2009  borat   500 7   13   24/12/2008  borat   600 8   13   01/01/2009  borat   700 " df <- read.table(text = lines, header = true) df$datetime <- as.date(df$datetime, format = "%d/%m/%y")

1) base - by there many ways process using various packages here show base solution first:

> do.call("rbind", by(df, df$home, function(x) x[which.max(x$datetime), ]))    id home   datetime player resource 10  1   10 2009-03-04   john      399 11  2   11 2009-03-04 juliet      244 12  5   12 2009-03-04  borat      555 13  8   13 2009-01-01  borat      700

1a) base - ave , variation (also using base of r):

fun <- function(x) which.max(x) == seq_along(x) is.max <- ave(xtfrm(df$datetime), df$home, fun = fun) == 1 df[is.max, ]

2) sqldf , here using sqldf in case:

> library(sqldf) > sqldf("select id, home, max(datetime) datetime, player, resource  +        df  +        group home")   id home   datetime player resource 1  1   10 2009-03-04   john      399 2  2   11 2009-03-04 juliet      244 3  5   12 2009-03-04  borat      555 4  8   13 2009-01-01  borat      700

Search This Blog

Three

r - Find rows in dataframe with maximum values grouped by values in another column -

Comments

Post a Comment

Popular posts from this blog

.htaccess - First slash is removed after domain when entering a webpage in the browser -

Automatically create pages in phpfox -

c# - Farseer ContactListener is not working -