HTTP error 403 in Python 3 Web Scraping


I was trying to scrape a website for practice, but I kept on getting HTTP Error 403 (does it think I'm a bot?).

Here is my code:

#import requests
import urllib.request
from bs4 import BeautifulSoup
#from urllib import urlopen
import re

webpage = urllib.request.urlopen('http://www.cmegroup.com/trading/products/#sortfield=oi&sortasc=false&venues=3&page=1&cleared=1&group=1').read
findrows = re.compile('<tr class="- banding(?:on|off)>(.*?)</tr>')
findlink = re.compile('<a href =">(.*)</a>')

row_array = re.findall(findrows, webpage)
links = re.findall(findlink, webpage)

print(len(row_array))

iterator = []

The error is:

  File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 479, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 517, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

This is probably because of mod_security or a similar server security feature that blocks known spider/bot user agents (urllib sends a user agent like Python-urllib/3.3, so it's easily detected). Try setting a known browser user agent with:

from urllib.request import Request, urlopen

# Send a browser-like User-Agent instead of urllib's default one
req = Request('http://www.cmegroup.com/trading/products/#sortfield=oi&sortasc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

This works for me.
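To see what the server is reacting to, you can compare the User-Agent that urllib adds by default with the one set on a Request like the one above. This is a minimal offline sketch (nothing is actually sent); the helper name browser_request is hypothetical, and 'Mozilla/5.0' is just the same placeholder browser string used above.

```python
from urllib.request import Request, build_opener

# A default opener carries a header like ('User-agent', 'Python-urllib/3.x'),
# which server-side filters such as mod_security can recognise as a script.
default_headers = dict(build_opener().addheaders)
print(default_headers['User-agent'])


def browser_request(url, user_agent='Mozilla/5.0'):
    """Hypothetical helper: build a Request that overrides urllib's default User-Agent."""
    # urllib stores header names via str.capitalize(), so the key becomes 'User-agent'
    return Request(url, headers={'User-Agent': user_agent})


req = browser_request('http://www.cmegroup.com/trading/products/')
print(req.get_header('User-agent'))  # Mozilla/5.0
```

When such a Request is passed to urlopen(), urllib only adds its default User-agent header if the request doesn't already have one, so the browser string is what the server sees.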

By the way, in your code you are missing the () after .read in the urlopen line, but I think it's a typo.
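The parentheses matter: without them webpage is the bound method object, not the page contents. Also, read() returns bytes, so the data needs decoding before a str regex can run on it. A small offline sketch, assuming io.BytesIO as a stand-in for the HTTP response (both have a read() method returning bytes), a made-up HTML snippet, and utf-8 as the encoding (a real page may declare a different charset):

```python
import io
import re

# Stand-in for urlopen(req): hypothetical markup, not the real CME page
response = io.BytesIO(b'<tr class="bandingOn"><td>CL</td></tr><tr class="bandingOff"><td>ES</td></tr>')

not_called = response.read      # a bound method, not the data
print(callable(not_called))     # True

raw = response.read()           # bytes
html = raw.decode('utf-8')      # str, so str patterns can be applied

# Hypothetical row pattern written for the sample markup above
find_rows = re.compile(r'<tr class="banding(?:On|Off)">(.*?)</tr>')
print(find_rows.findall(html))  # ['<td>CL</td>', '<td>ES</td>']
```

Passing the undecoded bytes to a str pattern would raise a TypeError, which is why the decode step sits between read() and findall().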

Tip: since this is an exercise, choose a different, non-restrictive site. Maybe they are blocking urllib for some reason...
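Whichever site you practice on, you can catch urllib.error.HTTPError to find out that a site is refusing you, instead of crashing with a traceback like the one in the question. A minimal sketch; the helper name fetch and its opener parameter are made up for illustration (the parameter only exists so a fake can be swapped in for offline testing):

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Hypothetical helper: return (status, body); body is None on an HTTP error status."""
    try:
        # url may be a plain string or a urllib.request.Request object
        with opener(url) as response:
            return response.status, response.read()
    except HTTPError as err:
        # err.code holds the numeric status, e.g. 403 for "Forbidden"
        return err.code, None
```

For example, fetch(req) with the Request built above returns the status and the page bytes, or (403, None) if the site still refuses the request.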

