Information extraction, indexing and search of PDF, Word and text documents with MongoDB


Does MongoDB have a feature to store PDF, text, or .doc/.docx documents, search them, or match two documents on a keyword found in their content?

For example:

I might want to store one document called 'claim.txt' that has values for
diagnosis code, short description, date, and amount in it.
I also need to store one called 'physician_diagnosis.pdf' that has, among other text, the matching short description in it.

I would then issue a query to find the documents that have both a matching date and the same diagnosis (e.g. 'pneumonia', '12/12/2012').

Is this possible with MongoDB using its API, or do I need some pre-processing?

If it is possible, please point me to an example and the documentation.

Your task is better suited to Solr (http://lucene.apache.org/solr/), which has inputs for many different document types (http://wiki.apache.org/solr/extractingrequesthandler). You will have to write code for proper extraction, though.
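As a rough illustration of that extraction step, here is a minimal Python sketch for the plain-text claim file only. It assumes 'claim.txt' holds one "label: value" pair per line; that layout, the sample values, and the `parse_claim` helper are all hypothetical, since the question does not say how the file is formatted. Pulling text out of a PDF or Word file would need a separate extraction tool and is not shown.

```python
import re

# Hypothetical contents of claim.txt: one "label: value" pair per line.
# The values are made up for illustration.
SAMPLE_CLAIM = """\
diagnosis code: J18.9
short description: pneumonia
date: 12/12/2012
amount: 450.00
"""


def parse_claim(text):
    """Turn 'label: value' lines into a dict with snake_case keys."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            key = match.group(1).lower().replace(" ", "_")
            fields[key] = match.group(2)
    return fields


claim = parse_claim(SAMPLE_CLAIM)
print(claim["short_description"], claim["date"])
```

Once the fields are in a structure like this, they can be handed to whichever index or database you choose.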

MongoDB is meant more for structured data. Although they call them documents, they do not mean "PDF documents" or "Word documents" here. A document is a generic format that supports nested field types, as opposed to a relational database row, which doesn't allow that.
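To make the distinction concrete, here is a small Python sketch of what such structured records might look like after pre-processing, along with the kind of match the question describes. The field names, the sample values, and the `matches` helper are assumptions for illustration; the report text is assumed to have been extracted from the PDF beforehand by some other tool.

```python
# Structured records as they might look after pre-processing.
# All field names and values here are hypothetical.
claim = {
    "filename": "claim.txt",
    "diagnosis": {"code": "J18.9", "description": "pneumonia"},  # nested field
    "date": "12/12/2012",
    "amount": 450.00,
}
report = {
    "filename": "physician_diagnosis.pdf",
    # Text assumed to have been extracted from the PDF by a separate tool.
    "text": "Seen on 12/12/2012. Chest X-ray consistent with pneumonia.",
}


def matches(claim_doc, report_doc):
    """True if the report text mentions the claim's date and diagnosis."""
    text = report_doc["text"].lower()
    return (
        claim_doc["date"] in text
        and claim_doc["diagnosis"]["description"].lower() in text
    )


print(matches(claim, report))  # True
```

With the PyMongo driver, dicts like these could be stored with `collection.insert_one(claim)` and looked up with a filter such as `collection.find({"date": "12/12/2012", "diagnosis.description": "pneumonia"})`, but only after the fields have been extracted from the original files.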

