html - How to read text from a website in Python

June 15, 2012

This question already has an answer here:

Parsing HTML in Python [closed] (6 answers)

I want to read a lot of information from this website: http://www.federalreserve.gov/monetarypolicy/beigebook/beigebook201301.htm

I have the following code, which reads the HTML source:

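    import urllib2  # Python 2 standard library

    def connect2web():
        # Download the Beige Book page and return its raw HTML as a string
        aresp = urllib2.urlopen("http://www.federalreserve.gov/monetarypolicy/" +
                                "beigebook/beigebook201301.htm")
        web_pg = aresp.read()
        print web_pg
        return web_pg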
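The code above is Python 2 (`urllib2`, the `print` statement). On Python 3 the same call lives in `urllib.request`, and `read()` returns bytes, which must be decoded before string-based parsing. A minimal sketch, assuming the page declares its charset in the `Content-Type` header and falling back to UTF-8 otherwise; the helper name `fetch_html` is hypothetical:

```python
from urllib.request import urlopen

BEIGE_BOOK_URL = ("http://www.federalreserve.gov/monetarypolicy/"
                  "beigebook/beigebook201301.htm")


def fetch_html(url=BEIGE_BOOK_URL):
    """Download a page and return its HTML as a str (hypothetical helper)."""
    with urlopen(url) as resp:
        raw = resp.read()  # bytes on Python 3
        # Use the charset from the Content-Type header if present, else assume UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling `web_pg = fetch_html()` gives you the same HTML string that `connect2web()` prints.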

I am lost on how to parse the information, however, because the HTML parsers seem to require a file or the original website, whereas I have the information I need in a string.
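In practice most HTML parsers accept a string directly. As an illustration using only the standard library (a swapped-in technique, not the lxml approach the answer uses), `html.parser.HTMLParser.feed()` takes a `str`. The class and function names here are hypothetical, and skipping `<script>`/`<style>` content is an assumption about what counts as "text":

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the visible text chunks of an HTML string (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_from_html(html_string):
    parser = TextCollector()
    parser.feed(html_string)  # feed() takes a str: no file or URL needed
    parser.close()            # flush any buffered data
    return parser.chunks
```

For example, `text_from_html("<p>Hello <b>Fed</b></p><script>x=1</script>")` returns `["Hello", "Fed"]`.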

We started with BeautifulSoup (bs) some time ago and then moved to lxml:

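    from lxml import html

    # web_pg is the HTML string read from the page
    my_tree = html.fromstring(web_pg)
    # iter() walks every element in the tree, in document order
    elements = [item for item in my_tree.iter()]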
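Because `iter()` yields every element, including nested ones, taking the full text of each element repeats content. A small sketch of that effect, using the standard library's `xml.etree.ElementTree` as a stand-in for lxml (its `iter()` and `itertext()` behave similarly here); it assumes well-formed markup, which real pages usually are not, and `full_texts` and `SNIPPET` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for web_pg; real pages usually need an HTML parser such as lxml.html
SNIPPET = "<div>stuff <table><tr><td>banana</td></tr></table> more stuff</div>"


def full_texts(markup):
    """Return (tag, all text inside the element) for every element, in document order."""
    root = ET.fromstring(markup)
    # iter() yields the root and every descendant, like lxml's my_tree.iter()
    return [(el.tag, "".join(el.itertext())) for el in root.iter()]


for tag, text in full_texts(SNIPPET):
    print(tag, repr(text))
```

Here "banana" shows up four times (in `div`, `table`, `tr`, and `td`), which is why you have to choose which elements to keep.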

So you have to decide which elements you want, and you need to make sure that the elements you keep are not children of other elements you have already decided to keep. For instance:

    <div>
      stuff
      <table>
        <tr>
          <td>banana</td>
        </tr>
      </table>
      more stuff
    </div>

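In the HTML above, the table is a child of the div, so the text in the table is also contained in the div. You have to use some logic to keep only elements whose parents (and other ancestors) are not kept.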
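One way to implement that logic is a top-down walk that stops descending as soon as an element is kept, so no kept element can sit inside another. A minimal sketch, again using `xml.etree.ElementTree` as a stand-in for lxml and assuming well-formed markup; `keep_outermost` and the `wanted` predicate are hypothetical. With lxml you could instead check each element from `elements` against its ancestors via `iterancestors()`.

```python
import xml.etree.ElementTree as ET


def keep_outermost(root, wanted):
    """Return elements matching `wanted` whose ancestors were not already kept.

    Walks top-down: once an element is kept, its subtree is skipped, so no
    kept element is ever a descendant of another kept element.
    """
    kept = []
    stack = [root]
    while stack:
        el = stack.pop()
        if wanted(el):
            kept.append(el)  # keep it and do not descend into its children
        else:
            # push children in reverse so they are visited in document order
            stack.extend(reversed(list(el)))
    return kept


SNIPPET = "<div>stuff <table><tr><td>banana</td></tr></table> more stuff</div>"
root = ET.fromstring(SNIPPET)
# Hypothetical rule: we want both <div> and <td> elements
kept = keep_outermost(root, lambda el: el.tag in ("div", "td"))
print([el.tag for el in kept])  # ['div']
```

The `td` is dropped because its ancestor `div` was already kept; if the rule asked for `table` and `td` instead, only the `table` would survive.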
