Information extraction, indexing and search of PDF, Word and text documents with MongoDB


Does MongoDB have a feature to store PDF, text or .doc/.docx documents, search them, or match two documents on a keyword found in their content?

For example:

I might want to store one document called 'claim.txt' that has values for a diagnosis code, short description, date, and amount in it.
I also need to store one called 'physician_diagnosis.pdf' that has, among other text, a matching short description in it.

I would then issue a query to find the document that has both the matching date and the same diagnosis (e.g. 'pneumonia', '12/12/2012').

Is this possible with MongoDB using its API, or does it need pre-processing?

If it is possible, please point me to an example and documentation.

Your task is better suited to Solr (http://lucene.apache.org/solr/), which accepts many different document formats as input (http://wiki.apache.org/solr/extractingrequesthandler). You will have to write code for proper extraction, though.
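As a rough illustration of what that extraction code might look like for the plain-text claim file, here is a minimal sketch. It assumes 'claim.txt' holds one "key: value" pair per line; that layout, the field names, the sample values and the parse_claim helper are all hypothetical and not part of the question.

```python
from datetime import datetime

# Hypothetical contents of 'claim.txt'. The "key: value" layout and the
# sample values are assumptions made for this sketch.
CLAIM_TEXT = """\
diagnosis_code: J18.9
short_description: pneumonia
date: 12/12/2012
amount: 250.00
"""


def parse_claim(text):
    """Turn 'key: value' lines into a dict of typed fields."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    # Normalise types so the values can be compared and sorted reliably.
    parsed_date = datetime.strptime(fields["date"], "%m/%d/%Y")
    fields["date"] = parsed_date.date().isoformat()
    fields["amount"] = float(fields["amount"])
    return fields


claim = parse_claim(CLAIM_TEXT)
print(claim)
```

PDF and Word files would need a separate text-extraction step before anything like this could run on them.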

MongoDB is meant more for structured data. Although we call them documents, that does not mean "PDF documents" or "Word documents" here. It is a generic format that supports nested field types, which we call a document, as opposed to a relational database row, which doesn't allow that.
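To make the distinction concrete, here is a small sketch in plain Python of what such nested documents could look like once the files have been pre-processed, along with the kind of match the question describes. The field names, sample values and the matches helper are assumptions for illustration, and the PDF text is assumed to have been extracted already by some other tool.

```python
# A "document" in the MongoDB sense: a nested structure of fields.
# All field names and values below are hypothetical.
claim_doc = {
    "filename": "claim.txt",
    "diagnosis": {"code": "J18.9", "short_description": "pneumonia"},
    "date": "12/12/2012",
    "amount": 250.0,
}

# Text assumed to have been extracted from the PDF beforehand.
physician_doc = {
    "filename": "physician_diagnosis.pdf",
    "text": "Patient seen on 12/12/2012. Findings are consistent with pneumonia.",
}


def matches(claim, other):
    """True if the other document's text mentions the claim's diagnosis and date."""
    text = other["text"].lower()
    description = claim["diagnosis"]["short_description"].lower()
    return description in text and claim["date"] in text


print(matches(claim_doc, physician_doc))
```

Dictionaries shaped like these are what a driver such as pymongo would accept for insertion (for example via insert_one on a collection), but the matching logic itself still has to come from your own pre-processing or from a search engine.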

