NLP - Part-of-speech tagging for unknown and known words
What is the difference between part-of-speech tagging of unknown words and part-of-speech tagging of known words? Is there a tool that can predict the part-of-speech tag of a word?
One common way of handling out-of-vocabulary words is to replace every word with a low occurrence count (e.g., frequency < 3) in the training corpus with the token *rare*, so that the tagger learns how to tag rare words. In the testing phase, treat every word that is not in the tagger's vocabulary as *rare*.
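The replacement step can be sketched in plain Python. This is only an illustration under assumptions: the helper names (`build_vocab`, `replace_rare`, `map_unknown`) and the toy training sentences are made up for the example, and the threshold of 3 is the one mentioned above.

```python
from collections import Counter

RARE = "*rare*"  # placeholder token; the exact spelling is an assumption

def build_vocab(tagged_sents, min_freq=3):
    """Return the set of words seen at least min_freq times in training."""
    counts = Counter(word for sent in tagged_sents for word, _ in sent)
    return {word for word, count in counts.items() if count >= min_freq}

def replace_rare(tagged_sents, vocab):
    """Replace every training word outside vocab with RARE, keeping its tag."""
    return [[(w if w in vocab else RARE, t) for w, t in sent]
            for sent in tagged_sents]

def map_unknown(words, vocab):
    """At test time, map every word outside vocab to RARE before tagging."""
    return [w if w in vocab else RARE for w in words]

# Toy training data (hypothetical), as lists of (word, tag) pairs.
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

vocab = build_vocab(train, min_freq=3)
train_rare = replace_rare(train, vocab)          # train the tagger on this
test_words = map_unknown(["the", "zebra", "runs"], vocab)
```

The tagger is then trained on `train_rare` instead of `train`, so it sees *rare* with the tags that infrequent words actually take, and test sentences are passed through `map_unknown` before tagging.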
A simpler way is to tag every out-of-vocabulary word with the majority tag. The following code, using the NLTK toolkit, tags every unseen word as 'NN'.
import nltk
tagger = nltk.UnigramTagger(trainingcorpus, backoff=nltk.DefaultTagger('NN'))
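To make the idea behind the backoff concrete, here is a rough pure-Python sketch of a most-frequent-tag tagger that falls back to the majority tag for unseen words. It is not NLTK's implementation; the function names and the toy corpus are assumptions for illustration, and the majority tag is computed from the data rather than hard-coded.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Learn each word's most frequent tag and the corpus-wide majority tag."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    best = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    majority = all_tags.most_common(1)[0][0]
    return best, majority

def tag(words, best, majority):
    """Tag known words with their learned tag, unseen words with majority."""
    return [(w, best.get(w, majority)) for w in words]

# Toy training data (hypothetical): NN is the most common tag here.
train = [
    [("the", "DT"), ("dog", "NN"), ("chased", "VBD"), ("the", "DT"), ("cat", "NN")],
    [("a", "DT"), ("bird", "NN"), ("saw", "VBD"), ("food", "NN")],
]

best, majority = train_unigram(train)
tagged = tag(["the", "zebra", "saw"], best, majority)
```

In this toy corpus the unseen word "zebra" receives the majority tag, while "the" and "saw" keep the tags learned from training.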