HTTP error 403 in Python 3 Web Scraping
I was trying to scrape a website for practice, but I kept on getting HTTP Error 403 (does it think I'm a bot)?
Here is my code:
#import requests
import urllib.request
from bs4 import BeautifulSoup
#from urllib import urlopen
import re

webpage = urllib.request.urlopen('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1').read
findrows = re.compile('<tr class="- banding(?:on|off)>(.*?)</tr>')
findlink = re.compile('<a href =">(.*)</a>')

row_array = re.findall(findrows, webpage)
links = re.finall(findlink, webpate)

print(len(row_array))
iterator = []
The error is:

  File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 479, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 517, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
This is probably because of mod_security or some similar server security feature that blocks known spider/bot user agents (urllib sends something like python urllib/3.3.0, which is easily detected). Try setting a known browser user agent with:
from urllib.request import Request, urlopen

req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
This works for me.
By the way, in your code you are missing the () after .read in the urlopen line, but I think it's just a typo.
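If you end up using the requests package that is commented out at the top of your script, the same User-Agent trick applies there too. A minimal sketch, assuming requests is installed (I have not tested it against that particular server):

import requests

url = 'http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1'
# Send a browser-like User-Agent, just as with urllib above.
resp = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
resp.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
webpage = resp.text      # decoded HTML as str (urlopen(...).read() returns bytes)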
Tip: since this is an exercise, choose a different, non-restrictive site. Maybe they are blocking urllib for a reason...