python - Understanding Shannon entropy of a data set


I'm reading Machine Learning in Action and going through the decision tree chapter. I understand that decision trees are built by splitting the data set in a way that gives structure to the branches and leaves. This gives you more information at the top of the tree and limits how many decisions you need to go through.

The book shows a function that determines the Shannon entropy of a data set:

from math import log

def calcShannonEnt(dataSet):
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:  # the number of unique elements and their occurrences
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)  # log base 2
    return shannonEnt

where the input data set is an array of arrays, and each array represents a potentially classifiable feature:

dataSet = [[1, 1, 'yes'],
           [1, 1, 'yes'],
           [1, 0, 'no'],
           [0, 1, 'no'],
           [0, 1, 'no']]

What I don't get is why the Shannon entropy function in the book is only ever looking at the last element in each feature array. It looks like it is only calculating the entropy of the "yes" or "no" items, and not the entropy of any of the other features.
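For example, here is a small standalone check of my own (not from the book): computing the entropy of just the last column by hand gives exactly the number the function returns, because the label distribution is 2/5 'yes' and 3/5 'no'.

from math import log

labels = ['yes', 'yes', 'no', 'no', 'no']                      # the last column of the data set above
probs = [labels.count(l) / len(labels) for l in set(labels)]   # 2/5 and 3/5
print(-sum(p * log(p, 2) for p in probs))                      # ≈ 0.971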


It doesn't make sense to me, because the entropy for this data set

dataSet = [[1, 1, 'yes'],
           [1, 'asdfasdf', 'yes'],
           [1900, 0, 'no'],
           [0, 1, 'no'],
           ['ddd', 1, 'no']]

is the same as the entropy above, even though it has a lot more diverse data.
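To make that concrete, here is another quick check of my own (reusing the calcShannonEnt function from above, and naming the second data set dataSet2 just for this snippet): featVec[-1] is the only value the function ever reads, and the last column is identical in both data sets, so the two calls return the same number.

dataSet2 = [[1, 1, 'yes'],
            [1, 'asdfasdf', 'yes'],
            [1900, 0, 'no'],
            [0, 1, 'no'],
            ['ddd', 1, 'no']]

# Only the 'yes'/'no' column is counted, so both results are identical:
print(calcShannonEnt(dataSet))    # 0.9709505944546686
print(calcShannonEnt(dataSet2))   # 0.9709505944546686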

Shouldn't the other feature elements be counted as well, in order to give the total entropy of the data set, or am I misunderstanding what the entropy calculation is supposed to do?

If you are curious, the full source for the book (which is where this code came from) is here, under the Chapter03 folder.

The potential ambiguity here is that the data set you are looking at contains both the features and the outcome variable, the outcome variable being in the last column. The problem you are trying to solve is: "do feature 1 and feature 2 help me predict the outcome?"

Another way to state it is: if I split my data according to feature 1, do I get better information on the outcome?

In this case, without splitting, the outcome variable is [ yes, yes, no, no, no ]. If you split on feature 1, you get 2 groups:

feature 1 = 0 -> outcome is [ no, no ]
feature 1 = 1 -> outcome is [ yes, yes, no ]

The idea here is to see if you are better off splitting. Initially, you had a certain amount of information, described by the Shannon entropy of [ yes, yes, no, no, no ]. After the split, you have two groups, with "better information" for the group where feature 1 = 0: you know that in that case the outcome is no, and that is measured by the entropy of [ no, no ].

In other words, the approach is to figure out whether, out of the features you have available, there is one which, if used, increases your information about the thing you care about, that is, the outcome variable. Tree building will greedily pick the feature with the highest information gain at each step, and then see whether it is worth splitting the resulting groups further.
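As a minimal sketch of that information-gain calculation (my own helper below, not the book's code), the numbers for the split on feature 1 work out like this:

from math import log

def entropy(labels):
    # Shannon entropy of a list of outcome labels
    probs = [labels.count(l) / len(labels) for l in set(labels)]
    return -sum(p * log(p, 2) for p in probs)

before = entropy(['yes', 'yes', 'no', 'no', 'no'])   # ≈ 0.971, before splitting
group0 = entropy(['no', 'no'])                       # 0.0, the feature 1 = 0 group
group1 = entropy(['yes', 'yes', 'no'])               # ≈ 0.918, the feature 1 = 1 group
after  = (2/5) * group0 + (3/5) * group1             # ≈ 0.551, weighted by group size
print(before - after)                                # information gain ≈ 0.420

For this data, splitting on feature 1 gives a larger gain (about 0.42) than splitting on feature 2 (about 0.17), which is why the tree-building code would pick feature 1 first.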

