html - How to read text from a website in Python

June 15, 2012

This question already has an answer here:

Parsing HTML in Python [closed] (6 answers)

I would like to read some of the information on this website: http://www.federalreserve.gov/monetarypolicy/beigebook/beigebook201301.htm

I have the following code, which reads the HTML source:

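    import urllib2  # Python 2 only

    def connect2web():
        # urlopen returns a file-like response object for the page
        aresp = urllib2.urlopen("http://www.federalreserve.gov/monetarypolicy/" +
                                "beigebook/beigebook201301.htm")
        web_pg = aresp.read()  # the raw HTML source, as one string
        print web_pg
        return web_pg  # return it so the string can be parsed afterwards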
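For readers on Python 3: `urllib2` was merged into `urllib.request`, and `read()` there returns bytes rather than a string, so the page has to be decoded before it is treated as text. A minimal sketch, assuming the charset declared in the server's `Content-Type` header is correct and falling back to UTF-8 (an assumption) when none is given; `fetch_html` is a hypothetical helper name:

```python
from urllib.request import urlopen

BEIGE_BOOK_URL = ("http://www.federalreserve.gov/monetarypolicy/"
                  "beigebook/beigebook201301.htm")


def fetch_html(url):
    """Download `url` and return its body as a str (hypothetical helper)."""
    with urlopen(url) as resp:
        raw = resp.read()  # bytes in Python 3
        # Charset from the Content-Type header; assume UTF-8 if none is declared.
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage (needs network access):
# web_pg = fetch_html(BEIGE_BOOK_URL)
```

The result is an ordinary `str`, which is what the string-based parsers below expect.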

I am lost on how to parse the information, however, because the HTML parsers I have seen seem to require a file or the original website, whereas I have the information I need in a string.
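HTML parsers do not actually need a file: most accept a string directly. As one illustration using only the standard library (`html.parser` in Python 3), here is a minimal sketch that collects the visible text from an HTML string. `TextExtractor` and `html_to_text` are hypothetical names, and skipping the contents of `script` and `style` tags is an assumption about what counts as page text:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect non-blank text nodes from an HTML string (hypothetical helper)."""

    SKIP = {"script", "style"}  # assumption: their contents are not page text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html_string):
    parser = TextExtractor()
    parser.feed(html_string)  # feed() takes a str; no file is involved
    parser.close()            # flush any buffered text
    return "\n".join(parser.chunks)
```

With the page already downloaded, `print(html_to_text(web_pg))` would print its text one chunk per line.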

We started out with BeautifulSoup (bs) some time ago but have since moved to lxml, which parses HTML directly from a string:

    from lxml import html

    my_tree = html.fromstring(web_pg)  # parse the HTML string directly
    elements = [item for item in my_tree.iter()]  # every element, in document order
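To make concrete that `lxml.html.fromstring` works on a plain string, here is a short sketch that parses an inline snippet (an illustrative stand-in for `web_pg`, not the Beige Book page) and prints each element's tag together with its `text_content()`:

```python
from lxml import html

# Illustrative stand-in for the downloaded page; any HTML string works the same way.
sample = "<div><p>First paragraph.</p><p>Second <b>bold</b> one.</p></div>"

tree = html.fromstring(sample)  # parses directly from a str
elements = [item for item in tree.iter()]  # the root first, then every descendant

for el in elements:
    # text_content() joins the text of the element and all of its descendants
    print(el.tag, repr(el.text_content()))
```

Note that the outer `div` reports all of the text, including the text of its children. That overlap is why the choice of which elements to keep matters.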

So you have to decide which elements you want, and you need to make sure that the elements you keep are not children of other elements you have decided to keep. For instance:

    <div> stuff
      <table>
        <tr>
          <td> banana </td>
        </tr>
      </table>
      more stuff
    </div>

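In the HTML above, the table is a child of the div, so the text in the table is also contained in the div. You have to use some logic to keep only those elements whose parents (and other ancestors) are not already kept; otherwise the same text is collected twice.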
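The rule described above (keep an element only if none of its ancestors has already been kept) can be sketched with lxml's `iterancestors()`. This is a minimal sketch, assuming that "the elements you want" are chosen by tag name; `keep_outermost` and the `wanted_tags` argument are hypothetical. It relies on the list being in document order, as `iter()` produces it, so ancestors are always seen before their descendants:

```python
from lxml import html

# The snippet from the question, with the second <table> written as </table>.
snippet = ("<div> stuff <table> <tr> <td> banana </td> </tr> </table>"
           " more stuff </div>")


def keep_outermost(elements, wanted_tags):
    """Keep wanted elements that have no kept ancestor (hypothetical helper).

    `elements` must be in document order, e.g. list(tree.iter()).
    """
    kept = []
    kept_set = set()  # the kept elements themselves, for fast ancestor checks
    for el in elements:
        if el.tag not in wanted_tags:
            continue
        if any(anc in kept_set for anc in el.iterancestors()):
            continue  # its text is already covered by a kept ancestor
        kept.append(el)
        kept_set.add(el)
    return kept


tree = html.fromstring(snippet)
elements = [item for item in tree.iter()]
# Assumption for illustration: both divs and tables are "wanted".
for el in keep_outermost(elements, {"div", "table"}):
    print(el.tag, " ".join(el.text_content().split()))
```

Here the table is dropped because its parent div is kept, so "banana" is collected only once. The set lookup works because lxml hands back the same Python object for a node while a reference to it is alive, and `kept` holds those references.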
