R Text mining - how to change texts in R data frame column into several columns with bigram frequencies?
In addition to the question "R Text mining - how to change texts in R data frame column into several columns with word frequencies?", I am wondering how I can make columns with bigram frequencies instead of just word frequencies. Again, many thanks in advance!
This is the example data frame (thanks to Tyler Rinker).
   person     sex adult state                            code
1  sam        m   0     computer fun. not fun.           k1
2  greg       m   0     no it's not, it's dumb.          k2
3  teacher    m   1     should do?                       k3
4  sam        m   0     liar, stinks!                    k4
5  greg       m   0     telling truth!                   k5
6  sally      f   0     how can certain?                 k6
7  greg       m   0     there no way.                    k7
8  sam        m   0     distrust you.                    k8
9  sally      f   0     talking about?                   k9
10 researcher f   1     shall move on? then.             k10
11 greg       m   0     i'm hungry. let's eat. already?  k11
The data set above ships with qdap as DATA:
library(qdap); DATA
The dev version of qdap (which should go to CRAN within the next few days) does ngrams. You'll need to use the dev version. On the toy data set this is fast, but on a larger data set such as qdap's mraja1 data set it requires ~5 minutes to complete. You could:
- select the bigrams more wisely (i.e., don't use them all, as there's going to be a ton)
- wait the time
- run it in parallel
- figure out another way to do this
- get a faster computer
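One way to read the first option (selecting bigrams more wisely) is to keep only the bigrams that occur at least some minimum number of times before running the search. This is a minimal base R sketch, not qdap's own method; `state`, `get_bigrams`, and `min_count` are hypothetical names and toy values chosen for illustration:

```r
# Toy stand-in for the `state` column (hypothetical data).
state <- c("it is fun", "it is not fun", "it is dumb")

# Hypothetical helper: split one text into lowercase word bigrams.
get_bigrams <- function(text) {
  # Replace everything except letters, apostrophes and spaces with a space.
  cleaned <- tolower(gsub("[^[:alpha:]' ]", " ", text))
  words <- strsplit(cleaned, " +")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  # Pair each word with the word that follows it.
  paste(head(words, -1), tail(words, -1))
}

# Count every bigram across all texts, then keep only the frequent ones.
all_counts <- table(unlist(lapply(state, get_bigrams)))
min_count <- 2  # assumed threshold; tune for your data
frequent_bigrams <- names(all_counts)[as.vector(all_counts) >= min_count]
frequent_bigrams
```

The shorter `frequent_bigrams` vector could then be passed to the search in place of the full bigram list, which should reduce the work on larger data sets.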
Here's the code to get the dev version of qdap and run the bigram search:
library(devtools)
install_github("trinker/qdap")  # current devtools takes "user/repo"
library(qdap)

## get the bigrams
bigrams <- sapply(ngrams(DATA$state)[[c("all_n", "n_2")]], paste, collapse=" ")

## search by grouping variable for bigram use
termco(DATA$state, DATA$person, bigrams)

## to get the raw values
termco(DATA$state, DATA$person, bigrams)[["raw"]]
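If the goal is one bigram-frequency column per row of the data frame, as the question asks, rather than counts per grouping variable, something like the following base R sketch may help. It does not use qdap and is not the answer's method; `df`, `get_bigrams`, `counts`, and `by_person` are hypothetical names, and the data is a toy stand-in:

```r
# Toy stand-in for the example data frame (hypothetical values).
df <- data.frame(
  person = c("sam", "greg", "sam"),
  state  = c("it is fun", "it is not fun", "it is dumb"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one text into lowercase word bigrams.
get_bigrams <- function(text) {
  cleaned <- tolower(gsub("[^[:alpha:]' ]", " ", text))
  words <- strsplit(cleaned, " +")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

row_bigrams <- lapply(df$state, get_bigrams)
vocab <- sort(unique(unlist(row_bigrams)))

# One row per text, one column per bigram; cells hold counts.
counts <- do.call(rbind, lapply(row_bigrams, function(b) {
  as.integer(table(factor(b, levels = vocab)))
}))
colnames(counts) <- vocab

# Append the bigram columns to the original data frame.
result <- cbind(df, counts)

# Optional: totals per person, similar in spirit to a grouped search.
by_person <- rowsum(counts, group = df$person)
```

Here `result` keeps the original columns and adds one count column per bigram, while `by_person` sums those counts within each person.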