Regex in Python for TR
I want to scrape a website whose contents are in tr tags under tbody.
Under the <tbody> tag there are many
<tr class="- bandingon"> <tr class="- bandingoff"> [...] tags.
I want the information stored in each table row (tr class).
For this I have written a regex definition for each tr class:
    findrows = re.compile('<tr class="-bandingon">(.*)</tr>')
    findrows = re.compile('<tr class="-bandingoff">(.*)</tr>')
Is there a way to combine the two into one regex?
Don't use regex. Use an HTML parser:
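    from bs4 import BeautifulSoup

    # html holds the page source as a string
    soup = BeautifulSoup(html, 'html.parser')
    # one CSS selector covers both row classes
    for row in soup.select('tr.bandingon, tr.bandingoff'):
        print(row.get_text())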
It's cleaner, easier to work with, and more robust than regex.
Also, before resorting to scraping, check for APIs. The site has a JSON API, which is easier to use:
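If a regex is still wanted, the two patterns can be merged with an alternation group. This is only a minimal sketch: the sample markup is made up for illustration, and it assumes the class attribute is spelled exactly as in the question's patterns. It will break as soon as the attribute order, quoting, or whitespace differs, which is the fragility mentioned above.

```python
import re

# Hypothetical sample markup, made up for illustration only.
html = (
    '<tbody>'
    '<tr class="-bandingon"><td>A</td></tr>'
    '<tr class="-bandingoff"><td>B</td></tr>'
    '</tbody>'
)

# (?:on|off) is a non-capturing group matching either suffix.
# .*? is non-greedy, so each match stops at the first closing </tr>.
# re.DOTALL lets a row span several lines.
findrows = re.compile(r'<tr class="-banding(?:on|off)">(.*?)</tr>', re.DOTALL)

rows = findrows.findall(html)
print(rows)
```

The original patterns use a greedy (.*), which on a single-line page would swallow everything up to the last </tr>; the non-greedy form avoids that.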
http://www.cmegroup.com/cmews/mvc/productslate/v1/list/500/1?sortfield=oi&sortasc=false&venues=3&page=1&cleared=1&group=1&r=ndgwctx4
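A rough sketch of calling that endpoint with only the standard library. The helper names build_url and fetch_json are hypothetical, the query parameters are copied as-is from the URL above, and nothing is assumed about the shape of the JSON that comes back or about whether the endpoint still responds.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.cmegroup.com/cmews/mvc/productslate/v1/list/500/1"

def build_url(params):
    """Append the query string to the base URL (hypothetical helper)."""
    return BASE_URL + "?" + urlencode(params)

def fetch_json(url, timeout=10):
    """Download the URL and parse the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

# Parameters copied from the URL in the answer; their meaning is not documented here.
params = {
    "sortfield": "oi",
    "sortasc": "false",
    "venues": 3,
    "page": 1,
    "cleared": 1,
    "group": 1,
    "r": "ndgwctx4",
}
url = build_url(params)
print(url)
```

Calling fetch_json(url) would then return the parsed response as ordinary Python objects, with no HTML parsing involved.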