Java - HtmlCleaner parse with tags

July 15, 2014

I am trying to extract part of a page. I use the HtmlCleaner parser, but it removes the tags. Are there any settings to keep the HTML tags? Or is there a better way to extract part of the markup, using something else?

My code:

    static final String XPATH_STATS = "//div[@class='text']/p/";

    // configure cleaner properties
    HtmlCleaner htmlCleaner = new HtmlCleaner();
    CleanerProperties props = htmlCleaner.getProperties();
    props.setAllowHtmlInsideAttributes(false);
    props.setAllowMultiWordAttributes(true);
    props.setRecognizeUnicodeChars(true);
    props.setOmitComments(true);
    props.setTransSpecialEntitiesToNCR(true);

    // create URL object
    URL url = new URL(BLOG_URL);
    // get the HTML page root node
    TagNode root = htmlCleaner.clean(url);

    Object[] statsNode = root.evaluateXPath(XPATH_STATS);
    for (Object tag : statsNode) {
        stats = stats + tag.toString().trim();
    }

    return stats;

Thanks, nikhil.thakkar! I used jsoup. This code may help someone:

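    URL url2 = new URL(BLOG_URL);
    // fetch and parse the page with a 3000 ms timeout
    Document doc2 = Jsoup.parse(url2, 3000);
    Element masthead = doc2.select("div.main_text").first();
    // outerHtml() returns the element together with its tags
    String linkOuterH = masthead.outerHtml();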

You can use the jsoup parser. More info here: http://jsoup.org/
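
The underlying problem is the difference between a node's text and its outer markup. Here is a minimal sketch of that distinction using only the JDK's built-in XML APIs, not HtmlCleaner or jsoup. It assumes the input is well-formed XHTML, which real pages often are not, so treat it as an illustration rather than a replacement for a proper HTML parser. The class name, the helper names and the sample markup are all made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OuterMarkupSketch {

    // Hypothetical sample input; it must be well-formed XML for the JDK parser.
    static final String SAMPLE =
        "<html><body><div class=\"text\"><p>Hello <b>world</b></p></div></body></html>";

    /** Returns the first node matching the XPath expression, or null if none matches. */
    static Node selectFirst(String xhtml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the input", e);
        }
    }

    /** Serializes a node back to markup, keeping its own tag and all child tags. */
    static String outerMarkup(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> header so only the fragment is returned.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the node", e);
        }
    }

    public static void main(String[] args) {
        Node p = selectFirst(SAMPLE, "//div[@class='text']/p");
        // Text content only: the <b> tag is dropped.
        System.out.println(p.getTextContent());
        // Serialized markup: the <p> and <b> tags are kept.
        System.out.println(outerMarkup(p));
    }
}
```

Selecting the element itself (the XPath stops at `p`) and then serializing it is what keeps the tags. Reading only the text of the match is what loses them.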
