java - HtmlCleaner: parse with tags


I'm trying to extract part of a page. I use the HtmlCleaner parser, but it removes the tags. Is there a setting to keep the HTML tags? Or is there a better way to extract part of the code, using something else?

My code:

    static final String XPATH_STATS = "//div[@class='text']/p";

    // config cleaner properties
    HtmlCleaner htmlCleaner = new HtmlCleaner();
    CleanerProperties props = htmlCleaner.getProperties();
    props.setAllowHtmlInsideAttributes(false);
    props.setAllowMultiWordAttributes(true);
    props.setRecognizeUnicodeChars(true);
    props.setOmitComments(true);
    props.setTransSpecialEntitiesToNCR(true);

    // create URL object
    URL url = new URL(BLOG_URL);
    // get the HTML page's root node
    TagNode root = htmlCleaner.clean(url);

    // evaluateXPath throws XPatherException
    Object[] statsNode = root.evaluateXPath(XPATH_STATS);
    for (Object tag : statsNode) {
        stats = stats + tag.toString().trim();
    }

    return stats;
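Whether the tags survive depends on how each XPath match is turned back into a string. A minimal sketch of that distinction follows. It swaps HtmlCleaner for the JDK's standard `javax.xml` DOM/XPath APIs, so it assumes well-formed XHTML (real-world pages often are not, which is why a cleaner or jsoup is used in the first place); `FragmentExtractor` and `extract` are hypothetical names. Reading a matched node's text content drops the markup, while serializing the node with a `Transformer` keeps it. HtmlCleaner ships its own serializer classes (for example `SimpleHtmlSerializer`) that fill the `Transformer`'s role there; check their API for your version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper. Assumption: the input is well-formed XHTML, and this
// uses the JDK's javax.xml APIs, not HtmlCleaner.
public class FragmentExtractor {

    // Evaluates the XPath and concatenates every match, either serialized
    // with its tags (keepTags = true) or as plain text (keepTags = false).
    public static String extract(String xhtml, String xpath, boolean keepTags) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);

        // Identity transformer: writes a node back out as markup.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        // Emit just the fragment, without an <?xml ...?> declaration.
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < matches.getLength(); i++) {
            Node node = matches.item(i);
            if (keepTags) {
                StringWriter buffer = new StringWriter();
                serializer.transform(new DOMSource(node), new StreamResult(buffer));
                out.append(buffer.toString().trim());
            } else {
                // Text content only: all nested tags are discarded.
                out.append(node.getTextContent().trim());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><div class='text'><p>Hello <b>world</b></p></div></body></html>";
        String xpath = "//div[@class='text']/p";
        System.out.println(extract(page, xpath, false)); // prints: Hello world
        System.out.println(extract(page, xpath, true));  // prints: <p>Hello <b>world</b></p>
    }
}
```

With `keepTags` set to `false` only the text remains; with `true` the `<p>` and `<b>` tags are kept.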

Thanks nikhil.thakkar! jsoup did the job. Here's the code, in case it helps someone:

    URL url2 = new URL(BLOG_URL);
    Document doc2 = Jsoup.parse(url2, 3000); // 3000 ms timeout
    Element masthead = doc2.select("div.main_text").first(); // null if nothing matches
    String linkOuterH = masthead.outerHtml(); // the element with its own tags and nested markup

You can use the jsoup parser. More info here: http://jsoup.org/

