R: data.table object, trying to achieve a ddply-like operation using by
I've searched quite a bit to no avail, let alone managed to make it work myself, so without further ado:
I have a data.table:
library(data.table)
dt = data.table(date=rep(c(as.Date("2010-01-01"),as.Date("2010-01-02")), each=3),
                bucket=rep(c("bucket1","bucket2","bucket3"),each=2),
                kbucket=c("(0,.5]","(.5,1]","(1,1.5]","(1.5,2]","(1.5,2]","(2.5,3]"),
                vol=1:6,o=10:15,m=20:25)
which looks like:
         date  bucket kbucket vol  o  m
1: 2010-01-01 bucket1  (0,.5]   1 10 20
2: 2010-01-01 bucket1  (.5,1]   2 11 21
3: 2010-01-01 bucket2 (1,1.5]   3 12 22
4: 2010-01-02 bucket2 (1.5,2]   4 13 23
5: 2010-01-02 bucket3 (1.5,2]   5 14 24
6: 2010-01-02 bucket3 (2.5,3]   6 15 25
I've used ddply like so on df, a data frame that is a facsimile of dt:
out <- ddply(df,.(date,bucket,kbucket),wrap_summarize)
where wrap_summarize is defined as:
wrap_summarize = function(x) {
  out <- summarize(x,
                   n = length(x$date),
                   sumvol = sum(x$vol),
                   sumo = sum(x$o),
                   avgm = mean(x$m,na.rm=TRUE))
}
to get
        date  bucket kbucket n sumvol sumo avgm
1 2010-01-01 bucket1  (.5,1] 1      2   11   21
2 2010-01-01 bucket1  (0,.5] 1      1   10   20
3 2010-01-01 bucket2 (1,1.5] 1      3   12   22
4 2010-01-02 bucket2 (1.5,2] 1      4   13   23
5 2010-01-02 bucket3 (1.5,2] 1      5   14   24
6 2010-01-02 bucket3 (2.5,3] 1      6   15   25
This is the desired result.
The actual data has the same structure but hundreds of thousands of rows, so I need data.table methods. I tried this:
test <- dt[,list(n=length(dt$date),sumvol=sum(dt$vol),sumo=sum(dt$o),avgm=mean(dt$m,na.rm=TRUE)), by=list(date,bucket,kbucket)]
but I only get the following, which is not the desired result:
         date  bucket kbucket n sumvol sumo avgm
1: 2010-01-01 bucket1  (0,.5] 6     21   75 22.5
2: 2010-01-01 bucket1  (.5,1] 6     21   75 22.5
3: 2010-01-01 bucket2 (1,1.5] 6     21   75 22.5
4: 2010-01-02 bucket2 (1.5,2] 6     21   75 22.5
5: 2010-01-02 bucket3 (1.5,2] 6     21   75 22.5
6: 2010-01-02 bucket3 (2.5,3] 6     21   75 22.5
I think I need to use .SD here, but at this point I thought it best to ask and share the problem, in case there is a more efficient solution. Thanks in advance!
You're looking for this:
dt[,list(
  .N,
  sumvol=sum(vol),
  sumo=sum(o),
  avgm=mean(m,na.rm=TRUE)
),by=list(date,bucket,kbucket)]
which gives
#          date  bucket kbucket N sumvol sumo avgm
# 1: 2010-01-01 bucket1  (0,.5] 1      1   10   20
# 2: 2010-01-01 bucket1  (.5,1] 1      2   11   21
# 3: 2010-01-01 bucket2 (1,1.5] 1      3   12   22
# 4: 2010-01-02 bucket2 (1.5,2] 1      4   13   23
# 5: 2010-01-02 bucket3 (1.5,2] 1      5   14   24
# 6: 2010-01-02 bucket3 (2.5,3] 1      6   15   25
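As for why the attempt in the question returned the same totals on every row: inside the j expression, dt$vol reaches back to the whole table and ignores by, whereas a bare column name such as vol is evaluated on the current group's rows only. The sketch below illustrates the difference on the question's sample data and, since .SD was mentioned, shows one way it might be used when the same function is applied to several columns. The names whole, grouped and sums are made up for this illustration.

library(data.table)

# Sample data from the question
dt <- data.table(
  date    = rep(as.Date(c("2010-01-01", "2010-01-02")), each = 3),
  bucket  = rep(c("bucket1", "bucket2", "bucket3"), each = 2),
  kbucket = c("(0,.5]", "(.5,1]", "(1,1.5]", "(1.5,2]", "(1.5,2]", "(2.5,3]"),
  vol = 1:6, o = 10:15, m = 20:25
)

# dt$vol always refers to all six rows, so every group gets the grand total
whole <- dt[, list(sumvol = sum(dt$vol)), by = list(date, bucket, kbucket)]

# a bare column name is evaluated within each group
grouped <- dt[, list(N = .N, sumvol = sum(vol)), by = list(date, bucket, kbucket)]

# .SD holds the current group's rows; .SDcols restricts it to the named columns
sums <- dt[, lapply(.SD, sum), by = list(date, bucket, kbucket), .SDcols = c("vol", "o")]

For a handful of differently named summaries, the explicit list() form in the answer is usually the simpler choice; .SD tends to pay off only when one function is applied across many columns.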