python - How to determine the learning rate and the variance in a gradient descent algorithm?


I started learning machine learning last week. When I tried to write a gradient descent script to estimate the model parameters, I came across a problem: how to choose an appropriate learning rate and variance (the "variance" here is my stopping threshold on how much the parameters change between iterations). I found that different (learning rate, variance) pairs may lead to different results, and sometimes the algorithm doesn't converge at all. Moreover, a (learning rate, variance) pair that is well chosen for one training set may not work for another. For example (script below), when I set the learning rate to 0.001 and the variance to 0.00001, I get suitable theta0_guess and theta1_guess for 'data1'. For 'data2', however, I can't make the algorithm converge; I tried dozens of (learning rate, variance) pairs and still couldn't reach convergence.

So I'd like to know if there are criteria or methods to determine the (learning rate, variance) pair.

data1 = [(0.000000, 95.364693),
         (1.000000, 97.217205),
         (2.000000, 75.195834),
         (3.000000, 60.105519),
         (4.000000, 49.342380),
         (5.000000, 37.400286),
         (6.000000, 51.057128),
         (7.000000, 25.500619),
         (8.000000, 5.259608),
         (9.000000, 0.639151),
         (10.000000, -9.409936),
         (11.000000, -4.383926),
         (12.000000, -22.858197),
         (13.000000, -37.758333),
         (14.000000, -45.606221)]

data2 = [(2104., 400.),
         (1600., 330.),
         (2400., 369.),
         (1416., 232.),
         (3000., 540.)]

def create_hypothesis(theta1, theta0):
    return lambda x: theta1 * x + theta0

def linear_regression(data, learning_rate=0.001, variance=0.00001):
    theta0_guess = 1.
    theta1_guess = 1.
    theta0_last = 100.
    theta1_last = 100.

    m = len(data)

    # Iterate until both parameters change by less than `variance`
    while (abs(theta1_guess - theta1_last) > variance or
           abs(theta0_guess - theta0_last) > variance):

        theta1_last = theta1_guess
        theta0_last = theta0_guess

        hypothesis = create_hypothesis(theta1_guess, theta0_guess)

        theta0_guess = theta0_guess - learning_rate * (1. / m) * sum([hypothesis(point[0]) - point[1] for point in data])
        theta1_guess = theta1_guess - learning_rate * (1. / m) * sum([(hypothesis(point[0]) - point[1]) * point[0] for point in data])

    return (theta0_guess, theta1_guess)

points = [(float(x), float(y)) for (x, y) in data1]

res = linear_regression(points)
print(res)
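As a side note (not part of the original question): for plain linear regression you can sanity-check whatever gradient descent returns against the closed-form least-squares fit. A minimal sketch with numpy, assuming the data1 list from the script above:

import numpy as np

# Degree-1 polyfit returns (slope, intercept), i.e. (theta1, theta0)
# in the notation of the script above.
x = np.array([p[0] for p in data1])
y = np.array([p[1] for p in data1])
theta1, theta0 = np.polyfit(x, y, 1)
print(theta0, theta1)  # a converged gradient descent should approach these values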

Plotting is the best way to see how your algorithm is performing. To check whether you have achieved convergence, you can plot the evolution of the cost function after each iteration; after a given number of iterations, once you see it isn't improving, you can assume convergence. Take the following code:

cost_f = []
while (abs(theta1_guess - theta1_last) > variance or
       abs(theta0_guess - theta0_last) > variance):

    theta1_last = theta1_guess
    theta0_last = theta0_guess

    hypothesis = create_hypothesis(theta1_guess, theta0_guess)
    # Record the cost J(theta) = 1/(2m) * sum((h(x) - y)^2) at every iteration
    cost_f.append((1. / (2 * m)) * sum([pow(hypothesis(point[0]) - point[1], 2) for point in data]))

    theta0_guess = theta0_guess - learning_rate * (1. / m) * sum([hypothesis(point[0]) - point[1] for point in data])
    theta1_guess = theta1_guess - learning_rate * (1. / m) * sum([(hypothesis(point[0]) - point[1]) * point[0] for point in data])

import pylab
pylab.plot(range(len(cost_f)), cost_f)
pylab.show()

which plots the following graphic (from an execution with learning_rate=0.01 and variance=0.00001): [plot of the cost function value against the iteration number]

As you can see, after a thousand iterations there is no further improvement. I normally declare convergence if the cost function decreases by less than 0.001 in one iteration, but that threshold is based on my own experience.
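That rule of thumb can be wired straight into the loop. A minimal sketch (my own variation, not in the original answer), assuming cost_f is filled as in the code above and using a hypothetical tolerance cost_tol:

def has_converged(cost_f, cost_tol=0.001):
    # Needs at least two recorded costs to measure an improvement.
    if len(cost_f) < 2:
        return False
    # Converged once one iteration improves the cost by less than cost_tol.
    return (cost_f[-2] - cost_f[-1]) < cost_tol

# Usage: replace the parameter-change test with a cost-based one, e.g.
# while not has_converged(cost_f):
#     ... one gradient descent update, then cost_f.append(...) ...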

For choosing the learning rate, the best thing you can do is plot the cost function and see how it behaves, keeping these two things in mind (see the sketch after this list):

  • if the learning rate is too small, you get slow convergence
  • if the learning rate is too large, the cost function may not decrease at every iteration, and therefore the algorithm may not converge
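One practical way to apply these two rules is to run a fixed number of iterations for several candidate learning rates and plot the cost curves side by side. A minimal sketch (an illustration, not from the original answer), reusing create_hypothesis and data1 from the question:

import pylab

def cost_curve(data, learning_rate, iterations=1000):
    # Run plain gradient descent for a fixed number of steps and
    # record the cost J(theta) after every update.
    theta0, theta1 = 1., 1.
    m = len(data)
    costs = []
    for _ in range(iterations):
        h = create_hypothesis(theta1, theta0)
        costs.append((1. / (2 * m)) * sum((h(x) - y) ** 2 for x, y in data))
        grad0 = (1. / m) * sum(h(x) - y for x, y in data)
        grad1 = (1. / m) * sum((h(x) - y) * x for x, y in data)
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return costs

# A curve that decreases quickly and flattens indicates a good rate;
# a curve that grows indicates the rate is too large.
for lr in (0.0001, 0.001, 0.01):
    pylab.plot(cost_curve(data1, lr), label='learning_rate=%s' % lr)
pylab.legend()
pylab.show()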

If you run the code on the second data set ('data2') with learning_rate > 0.029 and variance=0.001, gradient descent doesn't converge, while if you choose learning_rate < 0.0001 with variance=0.001, you'll see that the algorithm takes many iterations to converge.
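The reason data2 is so much touchier than data1 is the scale of its x values (house sizes in the thousands): the gradient for theta1 is multiplied by x, so a rate that works for data1 overshoots badly on data2. A common remedy, which the original answer doesn't show, is to normalize the feature before running gradient descent. A minimal sketch:

def normalize(data):
    # Rescale x to zero mean and unit spread so one learning rate
    # works regardless of the original units.
    xs = [x for x, _ in data]
    mean = sum(xs) / len(xs)
    spread = max(xs) - min(xs)
    return [((x - mean) / spread, y) for x, y in data]

# With normalized inputs, data2 converges with an "ordinary" rate such as
# learning_rate=0.1 (the fitted thetas are then in the scaled units).
res = linear_regression(normalize(data2), learning_rate=0.1, variance=0.00001)
print(res)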

Non-convergence example with learning_rate=0.03

Slow convergence example with learning_rate=0.0001

