algorithm - Find the optimum number of non-uniform bins
Problem: find the optimum number of non-uniform bins to show the range of a set of data points.
I have a bunch of data points (let's assume the prices of different mobile phones), and I need to sort these phones into categories based on price. The bin size (in this example, the price range) need not be uniform: there might be lots of phones in the low-price category and only a few in the long-tail category.
Is there an efficient algorithm to find the optimum number of bins required, and the number of data points (in this case, mobile phones) that should go into each category?
This is not a standard formula, but I wanted to post it because it seems to work on the data set I tested.
Find the average price of the mobiles.
Example: 5 mobiles with prices 10, 20, 40, 80 and 200.
Average = 350 / 5 = 70.
Subtract the minimum price from the average price: 70 - 10 = 60. Call this n1.
Subtract the average price from the maximum price: 200 - 70 = 130. Call this n2.
Find the ratio n2/n1: 130/60 ≈ 2.17, which rounds to 2.
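The arithmetic in these steps can be checked with a short Python sketch. `spread_ratio` is a hypothetical helper name, and rounding the ratio to the nearest integer is an assumption based on the example turning 130/60 into 2.

```python
def spread_ratio(prices):
    """Return (avg, n1, n2, n2 / n1) for the mean-split heuristic (hypothetical helper)."""
    avg = sum(prices) / len(prices)
    n1 = avg - min(prices)  # distance from the minimum up to the average
    n2 = max(prices) - avg  # distance from the average up to the maximum
    if n1 == 0:  # only possible when every price is identical
        raise ValueError("all prices are equal; nothing to bin")
    return avg, n1, n2, n2 / n1


avg, n1, n2, ratio = spread_ratio([10, 20, 40, 80, 200])
# Assumption: the ratio is rounded to the nearest whole number of bins.
print(avg, n1, n2, round(ratio))  # 70.0 60.0 130.0 2
```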
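This indicates that it is better to have 2 bins in the lower price range for every 1 bin in the higher range, since the prices below the average are packed into a range about half as wide as the prices above it.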
So, in this example, take 2 bins below 70: ranges 0-35 (2 mobiles) and 36-70 (1 mobile).
Take 1 bin above 70: range 71-200 (2 mobiles).
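The whole rule can be sketched end to end in Python. This is a minimal sketch under a few assumptions not stated in the post: the lowest bin starts at 0 as in the example (the hypothetical `lower_edge` parameter), each side of the average is split into equal-width bins, and when n2 < n1 the rule is mirrored so the upper range gets more bins, a case the post does not cover. `heuristic_bins` is a hypothetical name.

```python
from bisect import bisect_left


def heuristic_bins(prices, lower_edge=0.0):
    """Sketch of the mean-split heuristic; returns (edges, counts)."""
    avg = sum(prices) / len(prices)
    hi = max(prices)
    n1 = avg - min(prices)
    n2 = hi - avg
    if n1 == 0:  # only possible when every price is identical
        raise ValueError("all prices are equal; nothing to bin")
    ratio = n2 / n1
    if ratio >= 1:
        below, above = round(ratio), 1  # the case described in the post
    else:
        below, above = 1, round(1 / ratio)  # assumption: mirrored rule
    step_lo = (avg - lower_edge) / below
    step_hi = (hi - avg) / above
    # Equal-width bins on each side of the average; the last edge is set
    # to the maximum exactly so float rounding cannot drop the top price.
    edges = ([lower_edge + i * step_lo for i in range(below)]
             + [avg + i * step_hi for i in range(above)]
             + [hi])
    counts = [0] * (len(edges) - 1)
    for p in prices:
        # Bins are (lo, hi], except the first, which also includes its lower
        # edge, matching the ranges 0-35, 36-70, 71-200 in the example.
        idx = bisect_left(edges, p) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    return edges, counts


edges, counts = heuristic_bins([10, 20, 40, 80, 200])
print(edges)   # [0.0, 35.0, 70.0, 200]
print(counts)  # [2, 1, 2]
```

Using `bisect_left` on the sorted edge list keeps each lookup logarithmic in the number of bins, so the sketch stays cheap even for large price lists.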
As you can see, the number of bins and the bin sizes are reasonably optimal.