r - Find rows in dataframe with maximum values grouped by values in another column -
i solve problem in r without using sql.
how can select rows max(column value), distinct column in sql?
sure, use sqldf it, there must cool apply method in r it, too?
setup data first read in data:
lines <- "id home datetime player resource 1 10 04/03/2009 john 399 2 11 04/03/2009 juliet 244 5 12 04/03/2009 borat 555 3 10 03/03/2009 john 300 4 11 03/03/2009 juliet 200 6 12 03/03/2009 borat 500 7 13 24/12/2008 borat 600 8 13 01/01/2009 borat 700 " df <- read.table(text = lines, header = true) df$datetime <- as.date(df$datetime, format = "%d/%m/%y")
1) base - by there many ways process using various packages here show base solution first:
> do.call("rbind", by(df, df$home, function(x) x[which.max(x$datetime), ])) id home datetime player resource 10 1 10 2009-03-04 john 399 11 2 11 2009-03-04 juliet 244 12 5 12 2009-03-04 borat 555 13 8 13 2009-01-01 borat 700
1a) base - ave , variation (also using base of r):
fun <- function(x) which.max(x) == seq_along(x) is.max <- ave(xtfrm(df$datetime), df$home, fun = fun) == 1 df[is.max, ]
2) sqldf , here using sqldf in case:
> library(sqldf) > sqldf("select id, home, max(datetime) datetime, player, resource + df + group home") id home datetime player resource 1 1 10 2009-03-04 john 399 2 2 11 2009-03-04 juliet 244 3 5 12 2009-03-04 borat 555 4 8 13 2009-01-01 borat 700
Comments
Post a Comment