html - How to read text from a website in Python
This question already has an answer here:
- Parsing HTML in Python [closed] (6 answers)
I would like to read some information from a website: http://www.federalreserve.gov/monetarypolicy/beigebook/beigebook201301.htm
I have the following code, which reads the HTML source:
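    import urllib2  # Python 2 standard library

    def connect2web():
        # Fetch the Beige Book page and print its raw HTML source.
        aresp = urllib2.urlopen("http://www.federalreserve.gov/monetarypolicy/" +
                                "beigebook/beigebook201301.htm")
        web_pg = aresp.read()
        print web_pg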
I am lost on how to parse this information, however, because the HTML parsers I have seen require a file or the original website, whereas I already have the information I need in a string.
We started with BS (Beautiful Soup) some time ago but have since moved to lxml, which can parse HTML directly from a string:
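    from lxml import html

    # web_pg is the HTML source string read from the page
    my_tree = html.fromstring(web_pg)
    # Flatten the tree into a list of every element it contains
    elements = [item for item in my_tree.iter()]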
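To make the string-parsing step concrete, here is a minimal sketch that runs without a network connection. The sample_pg string is a made-up stand-in for web_pg, not the real page, and pulling text with text_content() is just one possible way to get at the content.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page source (web_pg).
sample_pg = "<div><h1>Title</h1><p>Some <b>bold</b> text.</p></div>"

# fromstring() takes the markup as a string; no file or URL is needed.
my_tree = html.fromstring(sample_pg)

# iter() walks the root element and all of its descendants in document order.
tags = [item.tag for item in my_tree.iter()]

# text_content() returns the text of an element together with its descendants.
text = my_tree.text_content()

print(tags)
print(text)
```

Note that iter() returns nested elements as separate items, so the text of the b element shows up both on its own and as part of its parent p.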
So you have to decide which elements you want, and you need to make sure the elements you keep are not children of other elements you have decided to keep. For instance:
    <div>
      stuff
      <table>
        <tr> <td> banana </td> </tr>
      </table>
      more stuff
    </div>
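In the HTML above, the table is a child of the div, so everything in the table is also contained in the div. You have to use some logic to keep only those elements whose parents are not kept, otherwise the same content ends up in your results twice.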
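One way that filtering logic could look is sketched below. The selection rule (wanted_tags) and the helper keep_outermost are hypothetical names made up for illustration; the idea is simply to drop any selected element that has a selected ancestor, using lxml's iterancestors().

```python
from lxml import html

snippet = ("<div> stuff <table> <tr> <td> banana </td> </tr> </table>"
           " more stuff </div>")
tree = html.fromstring(snippet)

# Hypothetical selection rule: suppose we want every <div> and every <table>.
wanted_tags = {"div", "table"}
candidates = [el for el in tree.iter() if el.tag in wanted_tags]

def keep_outermost(elements):
    """Drop any element that has an ancestor in the same selection."""
    selected = set(elements)
    return [el for el in elements
            if not any(anc in selected for anc in el.iterancestors())]

kept = keep_outermost(candidates)
kept_tags = [el.tag for el in kept]

# Collapse runs of whitespace so the extracted text is easier to read.
kept_text = [" ".join(el.text_content().split()) for el in kept]

print(kept_tags)
print(kept_text)
```

Here both the div and the table match the rule, but only the div survives, and its text already includes the table's "banana" cell.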