Regex in Python for TR


I want to scrape a website, specifically the contents of the tr tags under tbody:

http://www.cmegroup.com/trading/products/#sortfield=oi&sortasc=false&venues=3&page=1&cleared=1&group=1

Under the <tbody> tag there are many

<tr class = "- bandingon"> <tr class = "- bandingoff"> [...] tags.

I want the information stored in each table row (tr class).

For this I have written a regex definition for each tr class:

    findrows = re.compile('<tr class="-bandingon">(.*)</tr>')
    findrows = re.compile('<tr class="-bandingoff">(.*)</tr>')

Is there a way to combine the two into one regex?

Don't use regex. Use an HTML parser:

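    from bs4 import BeautifulSoup

    # html is assumed to hold the page source as a string
    soup = BeautifulSoup(html, 'html.parser')

    # one CSS selector matches rows carrying either class
    for row in soup.select('tr.bandingon, tr.bandingoff'):
        print(row.get_text())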

It's cleaner, easier to work with, and more robust than regex.
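To answer the literal question anyway: the two patterns can be merged with an alternation group. This is only a minimal sketch, assuming the class attribute appears exactly as written in the question; `sample_html` is made-up markup for illustration, not the real page. It is still brittle compared with a parser, since any change in attribute order or spacing breaks the match.

```python
import re

# Hypothetical sample markup for illustration; the real page may differ.
sample_html = '''
<tbody>
<tr class="-bandingon"><td>A</td><td>1</td></tr>
<tr class="-bandingoff"><td>B</td><td>2</td></tr>
</tbody>
'''

# (?:on|off) is a non-capturing alternation covering both class names.
# (.*?) is non-greedy, so each match stops at the first closing </tr>.
# re.DOTALL lets "." match newlines in case a row spans several lines.
find_rows = re.compile(r'<tr class="-banding(?:on|off)">(.*?)</tr>', re.DOTALL)

rows = find_rows.findall(sample_html)
print(rows)
```

Note that the original patterns use a greedy `(.*)`, which would swallow everything up to the last `</tr>` on a line; the non-greedy form avoids that.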

Also, before resorting to scraping, look for APIs. This site has a JSON API, which is easier to use:

http://www.cmegroup.com/cmews/mvc/productslate/v1/list/500/1?sortfield=oi&sortasc=false&venues=3&page=1&cleared=1&group=1&r=ndgwctx4 
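A rough sketch of calling that endpoint with only the standard library follows. The helper names `fetch_json` and `describe` are hypothetical, the User-Agent header is an assumption (some servers reject the default one), and the structure of the response is not documented here, so the sketch only inspects the top-level shape rather than relying on any field names. Whether the URL still responds is also not guaranteed.

```python
import json
from urllib.request import Request, urlopen

# URL taken from the answer above; it may no longer be served.
API_URL = ("http://www.cmegroup.com/cmews/mvc/productslate/v1/list/500/1"
           "?sortfield=oi&sortasc=false&venues=3&page=1&cleared=1&group=1"
           "&r=ndgwctx4")


def fetch_json(url, timeout=10):
    """Download url and decode the response body as JSON (hypothetical helper)."""
    # Assumption: a browser-like User-Agent may be needed by the server.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(data):
    """Summarise the top-level shape of a decoded JSON value."""
    if isinstance(data, dict):
        return sorted(data.keys())   # key names of a JSON object
    if isinstance(data, list):
        return len(data)             # number of items in a JSON array
    return type(data).__name__       # scalar: report its Python type
```

Typical use would be `data = fetch_json(API_URL)` followed by `print(describe(data))` to see which keys the payload has before deciding how to read the rows out of it.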
