R Text mining - how to change texts in R data frame column into several columns with bigram frequencies?

May 15, 2011

This is a follow-up to the question "R text mining - how to change texts in R data frame column into several columns with word frequencies?" I am wondering how I can make columns with bigram frequencies instead of word frequencies. Again, many thanks in advance!

This is the example data frame (thanks to Tyler Rinker).

           person sex adult                             state code
    1         sam   m     0            computer fun. not fun.   k1
    2        greg   m     0           no it's not, it's dumb.   k2
    3     teacher   m     1                        should do?   k3
    4         sam   m     0                     liar, stinks!   k4
    5        greg   m     0                    telling truth!   k5
    6       sally   f     0                  how can certain?   k6
    7        greg   m     0                     there no way.   k7
    8         sam   m     0                     distrust you.   k8
    9       sally   f     0                    talking about?   k9
    10 researcher   f     1             shall move on?  then.  k10
    11       greg   m     0 i'm hungry.  let's eat.  already?  k11

The data set above ships with qdap as DATA:

    library(qdap); DATA

The dev version of qdap (it should go to CRAN within the next few days) has ngrams, so you'll need to use the dev version. On this toy data set it is fast, but on a larger data set such as qdap's mraja1 data set it requires ~5 minutes to complete. You could:

select bigrams more wisely (i.e., don't use all of them, as there's going to be a ton)
wait the time
run it in parallel
figure out another way to do this
get a faster computer
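
As a rough illustration of the first option, one way to select bigrams more wisely is to keep only those that occur at least a few times before searching for them. This is a minimal base R sketch, not part of qdap; the `bigrams` vector and the `min_count` threshold are made up for the example.

```r
## Hypothetical vector of bigrams already extracted from a corpus
bigrams <- c("it is", "is fun", "it is", "is not", "it is", "is not", "no way")

## Count each distinct bigram, most frequent first
freq <- sort(table(bigrams), decreasing = TRUE)

## Keep only bigrams seen at least `min_count` times
## (the threshold is arbitrary and should be tuned to the data)
min_count <- 2
frequent_bigrams <- names(freq)[as.vector(freq) >= min_count]
frequent_bigrams
```

The shorter vector could then be used as the list of terms to search for, which should cut the search time roughly in proportion to the number of bigrams dropped.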

Here's the code to install the dev version of qdap and run the bigram search:

    library(devtools)
    ## current devtools expects "user/repo";
    ## older versions used install_github("qdap", "trinker")
    install_github("trinker/qdap")
    library(qdap)

    ## gets the bigrams
    bigrams <- sapply(ngrams(DATA$state)[[c("all_n", "n_2")]], paste, collapse=" ")

    ## searches by grouping variable for bigram use
    termco(DATA$state, DATA$person, bigrams)

    ## the raw values
    termco(DATA$state, DATA$person, bigrams)[["raw"]]
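
If installing qdap is not an option, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the toy data frame `dat` and the helper `get_bigrams` are hypothetical, punctuation is simply stripped, and bigrams are allowed to span sentence boundaries, which may differ from what qdap does.

```r
## Toy data (hypothetical, not the qdap data set)
dat <- data.frame(
  person = c("sam", "greg", "sam"),
  state  = c("it is fun. it is not fun.", "no it is not", "it is dumb"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: lowercase the text, drop punctuation other than
## apostrophes, split on whitespace, and paste adjacent word pairs together.
get_bigrams <- function(x) {
  words <- unlist(strsplit(tolower(gsub("[^[:alnum:]' ]", " ", x)), "\\s+"))
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

## Bigrams for each row, and the full set of distinct bigrams
bigram_list <- lapply(dat$state, get_bigrams)
all_bigrams <- sort(unique(unlist(bigram_list)))

## One row per text, one column per bigram, cells hold counts
counts <- do.call(rbind, lapply(bigram_list, function(b) {
  as.vector(table(factor(b, levels = all_bigrams)))
}))
colnames(counts) <- all_bigrams

## Original data frame with the bigram frequency columns appended
result <- cbind(dat, counts)

## Frequencies summed per person
by_person <- rowsum(counts, group = dat$person)
by_person
```

Here `result` has one bigram column per distinct bigram next to the original columns, and `by_person` is the grouped version, comparable in spirit to the raw counts returned by the termco call.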
