Normalizing data in R


Hello, I have the following data.frame (appended below). I would like to add an additional column of normalized counts, n.norm = n/sum(n). I had a previous data.frame without the date column and was able to do this using

oo[, n.norm := n/sum(n), by=operator]

I have tried to add date to the grouping:

oo[, n.norm := n/sum(n), by=operator,date] 

but I receive the error message

Error in `[.data.frame`(oo, , `:=`(n.norm, n/sum(n)), by = operator, date) :    unused argument(s) (by = operator)

For example, for operator 'a' in the month 'jan 2013', I have a number of counts n for each roi_score = c("good","ok","poor","crap"). I want to sum n over the combination (a, jan 2013) and then divide each count n by that sum(n).

On a side note, can anyone point me to a decent introduction to manipulating data.frames in R?

structure(list(operator = structure(c(
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L),
    .Label = c("a", "d", "j", "l", "m"), class = "factor"),
    roi_score = structure(c(
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
    1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
    .Label = c("crap", "good", "ok", "poor"), class = "factor"),
    date = c(
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013",
    "apr 2013", "feb 2013", "jan 2013", "mar 2013", "may 2013"),
    n = c(
    0, 0, 0, 0, 0, 1, 2, 15, 1, 5, 3, 2, 3, 1, 0, 3, 0, 5, 5, 1,
    0, 0, 0, 1, 0, 14, 17, 16, 8, 7, 5, 10, 6, 1, 5, 24, 27, 31, 16, 15,
    0, 0, 0, 0, 0, 26, 24, 20, 11, 18, 3, 4, 17, 3, 2, 20, 36, 12, 21, 9,
    0, 0, 0, 0, 0, 3, 12, 5, 12, 4, 0, 0, 3, 4, 0, 29, 37, 41, 25, 10,
    0, 0, 0, 0, 0, 9, 9, 15, 17, 3, 6, 4, 5, 4, 1, 14, 13, 9, 15, 9)),
    .Names = c("operator", "roi_score", "date", "n"),
    row.names = c(NA, -100L), class = "data.frame")

I am uncertain whether my data is in data.frame or data.table format. Here is my code, adapted from the solution given by Arun (reshape/remould data frame to create normalized bar chart and pie chart):

library(data.table)

df <- data.frame(read.csv("/misc/jaguar_data/report/system/db_fs/roi_scores.csv"))
# get the date into a nice structure for faceting
# (the input format string must match the dates in the CSV)
df$date = strftime(strptime(df$date, format="%d/%m/%y"), "%b %Y")
dt <- data.table(df)
ops <- as.character(unique(dt$operator))
scr <- as.character(unique(dt$roi_score))
dts <- unique(dt$date)

# count rows per combination (named n), then fill missing combinations with 0
oo <- setkey(dt[, list(n = .N), by="operator,roi_score,date"], operator, roi_score, date)[CJ(ops, scr, dts)][is.na(n), n := 0L]

oo[, n.norm := n/sum(n), by=operator]

Your code is (almost) perfect. There are two slight issues.

1: You are using data.table syntax, so instead of oo being a data.frame it should be a data.table. Use:

    library(data.table)
    oo <- data.table(oo)

2: When using by with more than one column, make sure to wrap the columns in list(..) or to pass them as one single comma-separated string. Examples:

    oo[, n.norm := n/sum(n), by=list(operator,date)]
    # - or - #
    oo[, n.norm := n/sum(n), by="operator,date"]
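To see the two forms of by side by side, here is a minimal sketch on a small made-up table. The name toy and its values are illustrative assumptions, not the data from the question. Within each operator-date group the normalized counts should sum to 1.

```r
library(data.table)

# Toy data (hypothetical values): two operators, two months, two scores each
toy <- data.table(
  operator  = rep(c("a", "d"), each = 4),
  date      = rep(rep(c("jan 2013", "feb 2013"), each = 2), times = 2),
  roi_score = rep(c("good", "poor"), times = 4),
  n         = c(3, 1, 2, 2, 5, 5, 0, 4)
)

# Form 1: grouping columns wrapped in list()
toy[, n.norm := n / sum(n), by = list(operator, date)]

# Form 2: one comma-separated string; same grouping, same result
toy[, n.norm2 := n / sum(n), by = "operator,date"]

# Sanity check: the shares in every (operator, date) group add up to 1
check <- toy[, list(total = sum(n.norm)), by = list(operator, date)]
print(toy)
print(check)
```

Both calls add the column by reference, so no reassignment of toy is needed.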

Edit: If you are hoping to divide each count by the total of its operator-date group, the code should be as above. If instead you want to divide by the total of the entire data, use

    oo[, n.norm := n/sum(oo$n), by=list(operator,date)]
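The difference between the two denominators can be sketched on a toy table (the name toy, the column names within.group and of.total, and the values are all made up for illustration). Dropping by altogether is one simple way to get the grand total, since sum(n) then runs over every row.

```r
library(data.table)

# Toy data (hypothetical values): two operators in a single month
toy <- data.table(
  operator = c("a", "a", "d", "d"),
  date     = "jan 2013",
  n        = c(1, 3, 2, 4)
)

# Share within each operator-date group: sum(n) is evaluated per group
toy[, within.group := n / sum(n), by = list(operator, date)]

# Share of the whole table: without by, sum(n) covers all rows
toy[, of.total := n / sum(n)]

print(toy)
```

The within.group column sums to 1 inside each group, while of.total sums to 1 over the whole table.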

After fixing those two things and using everything else as you have it:

         operator roi_score     date  n    n.norm
      1:        a      crap apr 2013  0 0.0000000
      2:        a      crap feb 2013  0 0.0000000
      3:        a      crap jan 2013  0 0.0000000
      4:        a      crap mar 2013  0 0.0000000
      5:        a      crap may 2013  0 0.0000000
     ---
     96:        m      poor apr 2013 14 0.4827586
     97:        m      poor feb 2013 13 0.5000000
     98:        m      poor jan 2013  9 0.3103448
     99:        m      poor mar 2013 15 0.4166667
    100:        m      poor may 2013  9 0.6923077

Edit 2:

Just a note: in general, if you are using expressions within [brackets] such as the assign-by-reference operator :=, then your object should be a data.table.

If you see an error such as

    Error in `[.data.frame`( _<your object name>_, ...

then it is due to the fact that either (a) your object is not a data.table or (b) you forgot to load the data.table package.
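Both causes can be checked quickly. The sketch below uses a small made-up data.frame (the name df and its values are assumptions for illustration); is.data.table(), as.data.table() and setDT() are functions from the data.table package.

```r
library(data.table)        # (b) the package has to be loaded first

# A plain data.frame with hypothetical values
df <- data.frame(operator = c("a", "a", "d"), n = c(1, 3, 4))
is.data.table(df)          # FALSE: := would fail on this object

dt <- as.data.table(df)    # (a) convert into a new data.table ...
is.data.table(dt)          # TRUE

setDT(df)                  # ... or convert df itself by reference
df[, n.norm := n / sum(n), by = operator]
print(df)
```

as.data.table() leaves the original object untouched, whereas setDT() changes it in place, which avoids a copy on large data.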

