java - HtmlCleaner parse with tags
I am trying to extract part of a page. I use the HtmlCleaner parser, and it removes all the tags. Are there any settings that keep the HTML tags? Or is there maybe a better way to extract this part of the page, using something else?
My code:
    static final String XPATH_STATS = "//div[@class='text']/p/";

    // config cleaner properties
    HtmlCleaner htmlCleaner = new HtmlCleaner();
    CleanerProperties props = htmlCleaner.getProperties();
    props.setAllowHtmlInsideAttributes(false);
    props.setAllowMultiWordAttributes(true);
    props.setRecognizeUnicodeChars(true);
    props.setOmitComments(true);
    props.setTransSpecialEntitiesToNCR(true);

    // create URL object
    URL url = new URL(BLOG_URL);

    // HTML page root node
    TagNode root = htmlCleaner.clean(url);
    Object[] statsNode = root.evaluateXPath(XPATH_STATS);
    for (Object tag : statsNode) {
        stats = stats + tag.toString().trim();
    }
    return stats;
Thanks, nikhil.thakkar! I switched to jsoup. Here is the code, in case it helps someone:
    URL url2 = new URL(BLOG_URL);
    Document doc2 = Jsoup.parse(url2, 3000);
    Element masthead = doc2.select("div.main_text").first();
    String linkOuterH = masthead.outerHtml();
You can use the jsoup parser. More info here: http://jsoup.org/
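For anyone who wants to see the idea without pulling in a library, here is a rough sketch that uses only the JDK's XML classes instead of HtmlCleaner or jsoup. It selects nodes by XPath and serializes each matched node, so the tags survive instead of being flattened to text. The class name KeepTagsSketch, the method extractWithTags, and the sample markup are made up for illustration. Note the big assumption: the JDK parser only accepts well-formed markup, so real-world pages would still need a tolerant parser such as jsoup first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HtmlCleaner or jsoup.
class KeepTagsSketch {

    // Returns the markup (tags included) of every node matched by the XPath.
    // Assumes the input is well-formed XML/XHTML.
    static String extractWithTags(String xhtml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);

        // An identity transformer serializes a DOM node back to markup.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        for (int i = 0; i < nodes.getLength(); i++) {
            serializer.transform(new DOMSource(nodes.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for this sketch.
        String page = "<html><body><div class=\"text\">"
                + "<p>Hello <b>world</b></p></div></body></html>";
        System.out.println(extractWithTags(page, "//div[@class='text']/p"));
    }
}
```

The point is the same as with outerHtml() in jsoup: serialize the matched node rather than calling toString() on it or reading its text.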