Information extraction, indexing and search of PDF, Word and text documents with MongoDB
Does MongoDB have a feature to store PDF, text or .doc/.docx documents, search them, or match two documents on a keyword found in their content?
For example:
I might want to store one document called 'claim.txt' that has values for
diagnosis code, short description, date and amount in it.
I also need to store one called 'physician_diagnosis.pdf' that has, among other text, a matching short description in it.
I would then issue a query to find the document that has both a matching date and the same diagnosis (e.g. 'pneumonia', '12/12/2012').
Is this possible in MongoDB using its API, or does it need pre-processing?
If it is possible, please point me to an example and documentation.
Your task is better suited to Solr (http://lucene.apache.org/solr/), which accepts many different document types as input (http://wiki.apache.org/solr/extractingrequesthandler). You will have to write some code for the proper extraction, though.
MongoDB is meant more for structured data. Although they call them documents, that does not mean "PDF documents" or "Word documents" here. It is a generic format that supports nested field types, which they call a document, as opposed to a relational database row, which doesn't allow that.
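To make the pre-processing point concrete, here is a minimal Python sketch of the kind of extraction step that would have to run before anything is stored. It assumes the text has already been pulled out of the PDF or Word file by some other tool. The sample contents, the 'key: value' layout of claim.txt, the diagnosis code 486, the amount, and every function and field name are made up for illustration, and the diagnosis is assumed to be a single word.

```python
import re

# Hypothetical text, assumed to be already extracted from the original files.
CLAIM_TEXT = """diagnosis code: 486
short description: pneumonia
date: 12/12/2012
amount: 150.00"""

PHYSICIAN_TEXT = """Patient seen on 12/12/2012 with fever and cough.
Chest X-ray consistent with pneumonia. Antibiotics prescribed."""

# Matches dates written as two digits / two digits / four digits.
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def parse_claim(filename, text):
    """Turn 'key: value' lines into a structured record (a plain dict)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().replace(" ", "_")] = value.strip()
    return {
        "filename": filename,
        "diagnosis_code": fields.get("diagnosis_code"),
        "short_description": fields.get("short_description", "").lower(),
        "date": fields.get("date"),
        "amount": float(fields["amount"]) if "amount" in fields else None,
    }


def parse_free_text(filename, text):
    """Index a free-text document by the dates and lowercase words it contains."""
    return {
        "filename": filename,
        "dates": DATE_RE.findall(text),
        "keywords": sorted(set(re.findall(r"[a-z]+", text.lower()))),
    }


def matches(claim, other):
    """True when the other document mentions the claim's date and diagnosis."""
    return (claim["date"] in other["dates"]
            and claim["short_description"] in other["keywords"])


claim = parse_claim("claim.txt", CLAIM_TEXT)
report = parse_free_text("physician_diagnosis.pdf", PHYSICIAN_TEXT)
print(matches(claim, report))
```

The two dicts are the sort of structured records MongoDB is designed to hold. Stored that way, a filter such as {"dates": "12/12/2012", "keywords": "pneumonia"} could probably replace the matches() function, since a scalar in a MongoDB query matches any element of an array field. The extraction itself still happens outside the database.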