lucene - Elasticsearch - higher scoring if higher frequency of term


I have 2 documents, and I am searching for the keyword "twitter". Suppose both documents are blog posts with a "tags" field.

Document A has 1 term in the "tags" field, and it's "twitter". Document B has 100 terms in the "tags" field, and 3 of them are "twitter".

Elasticsearch gives the higher score to document A, even though document B has the higher frequency. B's score is "diluted" because it has more terms. How do I give document B a higher score, since it has a higher frequency of the search term?

I know Elasticsearch/Lucene performs some normalization based on the number of terms in the document. How can I disable this normalization, so that document B gets a higher score than document A?

As the other answer says, it would be interesting to see whether you get the same result on a single shard. I think you would, and that it depends on the norms for the tags field, which are taken into account when computing the score using the TF/IDF similarity (the default).

In fact, Lucene does take into account the term frequency, in other words the number of times the term appears within the field (1 or 3 in your case), and the inverse document frequency, in other words how frequent the term is in the index, in order to compare it with the other terms in the query (in your case it doesn't make any difference, since you are searching for a single term).
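To make the two factors concrete, here is a minimal sketch of how Lucene's classic TF/IDF similarity computes them, assuming the default formulas (square root of the raw frequency, and 1 + ln(numDocs / (docFreq + 1))). The function names and the two-document index are illustrative only and are not part of any Elasticsearch API.

```python
import math

def tf(freq):
    # Classic Lucene TF/IDF: term frequency factor is the square root
    # of the number of times the term occurs in the field.
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    # Classic Lucene TF/IDF: inverse document frequency.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical index holding only the two documents from the question;
# both contain "twitter", so doc_freq == num_docs == 2.
tf_a = tf(1)   # document A: "twitter" appears once
tf_b = tf(3)   # document B: "twitter" appears three times
shared_idf = idf(num_docs=2, doc_freq=2)

print(f"tf A = {tf_a:.3f}, tf B = {tf_b:.3f}, idf = {shared_idf:.3f}")
```

On these two factors alone document B would rank first, since the idf is the same for both documents and B has the larger tf.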

But there's another factor called norms, which rewards shorter fields and takes into account any index-time boosting, which can be set per field (in the mapping) or even per document. You can verify that norms are the reason for your result by enabling the explain option in your search request and looking at the explain output.
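The effect of the length norm on the two documents can be sketched as follows, assuming the classic similarity's default of 1 / sqrt(number of terms in the field). This is a simplification: it ignores boosts, the idf (identical for both documents here), and the lossy single-byte encoding Lucene uses to store norms. The function names are illustrative.

```python
import math

def length_norm(num_terms):
    # Classic Lucene TF/IDF: shorter fields get a larger norm.
    return 1.0 / math.sqrt(num_terms)

def field_score(freq, num_terms, use_norms=True):
    # tf * norm only; idf and boosts are left out of this sketch.
    norm = length_norm(num_terms) if use_norms else 1.0
    return math.sqrt(freq) * norm

score_a = field_score(freq=1, num_terms=1)
score_b = field_score(freq=3, num_terms=100)
score_b_no_norms = field_score(freq=3, num_terms=100, use_norms=False)

print(f"A with norms:    {score_a:.3f}")           # 1.000
print(f"B with norms:    {score_b:.3f}")           # 0.173
print(f"B without norms: {score_b_no_norms:.3f}")  # 1.732
```

With norms, document A wins by a wide margin; without them, document B's higher term frequency puts it first.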

I guess the fact that the first document contains only that tag makes it more important than the other one, which contains that tag multiple times but a lot of other tags as well. If you don't like this behaviour you can just disable norms in your mapping for the tags field. They should be enabled by default if the field is "index":"analyzed" (the default). You can either switch to "index":"not_analyzed" if you don't want your tags field to be analyzed (this often makes sense, but it depends on your data and domain) or add the "omit_norms": true option in your mapping for the tags field.
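As a rough sketch, the mapping change and a search request with explain enabled might look like the JSON bodies built below. The type name "post" is hypothetical, and the "string" type and "omit_norms" option reflect the Elasticsearch versions this answer was written for; newer releases spell the norms option differently, so check the mapping documentation for your version.

```python
import json

# Hypothetical mapping for a "post" type: keep the tags field analyzed
# but drop norms, so field length no longer affects the score.
mapping = {
    "mappings": {
        "post": {
            "properties": {
                "tags": {"type": "string", "omit_norms": True}
            }
        }
    }
}

# Search body with explain enabled, to inspect how each hit was scored.
search_body = {
    "explain": True,
    "query": {"term": {"tags": "twitter"}},
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Norms are written at index time, so the documents would most likely need to be reindexed after changing the mapping before the scores change.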

