R - table() generating NAs when there are no NAs in the underlying data
I want to generate a column of counts of a particular variable. The easiest way seems to be using table(). With reasonably small amounts of data, there seems to be no problem.
    a <- data.frame(a1 = sample(1:1000, 100000, replace = TRUE))
    b <- data.frame(b1 = sample(1:1000, 100000, replace = TRUE))
    c <- cbind(a, b)
    c$countc <- table(as.factor(c$a1))[c$a1]
    summary(c$countc)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         65      94     101     101     108     132

However, if I build the table from a larger set of values (note that I'm sampling from 1:10k rather than 1:1k), it generates NAs, despite there being no NAs in the data I'm building the table from:
    a <- data.frame(a1 = sample(1:10000, 100000, replace = TRUE))
    b <- data.frame(b1 = sample(1:10000, 100000, replace = TRUE))
    c <- cbind(a, b)
    c$countc <- table(as.factor(c$a1))[c$a1]
    summary(c$a1)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          1    2512    5005    5008    7502   10000
    summary(c$countc)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
       1.00    8.00   10.00   10.18   12.00   25.00       7

The problem does not occur if the data is not in a data frame.
    a <- sample(1:10000, 1000000, replace = TRUE)
    summary(table(as.factor(a))[a])
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         57      94     101     101     108     144

Does anyone know the reason why?
After installing the data.table package and doing some preliminaries...
    require(data.table)
    n0 <- 1e5
    n <- 1e6
    dt <- data.table(a1 = sample(1:n0, n, replace = TRUE),
                     b1 = sample(1:n0, n, replace = TRUE))

This should do the trick:
    setkey(dt, a1)
    dt[dt[, .N, by = a1], countc := N]

When you access a data.table with dt[i, j], i selects rows and j says what to do with them, similar to indexing a data.frame.
dt[, .N, by = a1] selects all rows (since i is blank) and counts the rows for each value of "a1" using the special variable .N.
After setting column "a1" as the key of dt, we can pass another data.table (in this case dt[, .N, by = a1]) in i to merge in the information from that data.table. In j, we create a new column in dt using countc := N. The three vignettes on data.table's CRAN page are a good place to start learning more about how it works.
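To make the two steps easier to follow, here is a minimal sketch on a six-row table. The values and the names counts and countc2 are made up for illustration and are not part of the original answer. It also shows that a grouped assignment yields the same counts without the explicit join.

```r
library(data.table)

# Small illustrative table; the values are made up for this sketch
dt <- data.table(a1 = c(3L, 1L, 3L, 2L, 3L, 1L),
                 b1 = c(10L, 20L, 30L, 40L, 50L, 60L))

# Step 1: one row per distinct a1, with its row count in column N
counts <- dt[, .N, by = a1]

# Step 2: key dt on a1 (this sorts dt by a1), then join counts in i
# and assign its N column to a new column of dt in j
setkey(dt, a1)
dt[counts, countc := N]

# Grouped assignment: same counts in a single call, no join needed
dt[, countc2 := .N, by = a1]

print(dt)
```

Either form adds the column by reference, so dt is modified in place and no reassignment is needed.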
As for the question at hand: oh, I think I see what the original problem was. Suppose unique(x) = c(1, 2, 4). If you try table(x)[x], you are trying to access table(x)[1], table(x)[2] and table(x)[4]. The last one is undefined, since the length of the table is 3. R returns NA when you access indices greater than the length of a vector. For example, look at (1:3)[4].
In this case, if the values missing from 1:n0 are not all at the top of the range, you will see NAs.
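The explanation above can be checked on a toy vector. This is a small sketch, not code from the original answer: the vector x and the names by_position, by_name and by_ave are illustrative. It contrasts indexing the table by position (a numeric subscript) with indexing it by cell name (a character subscript), which is one way to avoid the NAs.

```r
# Toy vector in which the value 3 never occurs, so unique(x) is 1, 2, 4
x <- c(1, 2, 4, 4, 2, 4)

tab <- table(x)
names(tab)   # "1" "2" "4": three cells, one per observed value

# A numeric subscript is a position in the table, not a value,
# so tab[4] is out of range and comes back as NA
by_position <- as.vector(tab[x])
by_position  # 1 2 NA NA 2 NA

# A character subscript looks the cell up by its name instead
by_name <- as.vector(tab[as.character(x)])
by_name      # 1 2 3 3 2 3

# Base R alternative that skips table() entirely
by_ave <- ave(x, x, FUN = length)
```

Note that positional indexing can also return a wrong count without any NA: a value that lies above a gap but is still within the table's length picks up the count stored at that position, which belongs to a different value.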