java - Extracting certain text from html file -

February 15, 2011

I want to extract the text placed between paragraph (p) and link (a href) tags in an HTML file. I want to do it without Java regex or HTML parsers. I thought of this:

while ((word = reader.readLine()) != null) { // iterate to the end of the file
    if (word.contains("<p>")) { // catching the p tag
        while (!word.contains("</p>")) { // iterate to the end of the tag
            try { // start writing
                out.write(word);
            } catch (IOException e) {
            }
        }
    }
}

But it is not working, even though the code seems pretty valid to me. How can the reader catch the "p" and "a href" tags?

The problems start when you have <p>blah</p> on a single line. One simple solution is to change every < to \n<, something like this:

boolean insidePar = false;
while ((line = reader.readLine()) != null) {
    // break the line so that every tag starts its own piece
    for (String word : line.replaceAll("<", "\n<").split("\n")) {
        if (word.contains("<p>")) {
            insidePar = true;
        } else if (word.contains("</p>")) {
            insidePar = false;
        }
        if (insidePar) {
            // write word
        }
    }
}

Still, I'd recommend using a parser library, as @hovercraftfullofeels suggested.

Edit: I've updated the code so it's a bit closer to a working version, but there will be more problems along the way.
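For reference, here is a minimal sketch of the same idea done with plain String.indexOf, so it uses neither regex nor a parser library. It is not the code from the answer above: the class name TagTextExtractor and the method textBetween are made up for illustration. It assumes the whole file has already been read into one String, that tag names are lower case, and that tags of the same kind are not nested. Any markup inside a matched element (for example a link inside a paragraph) is returned as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class TagTextExtractor {

    // Returns the raw content between each <tagName ...> and the next </tagName>.
    // Assumes lower-case tag names and no nesting of the same tag.
    static List<String> textBetween(String html, String tagName) {
        List<String> results = new ArrayList<>();
        String open = "<" + tagName;
        String close = "</" + tagName + ">";
        int pos = 0;
        while (true) {
            int start = html.indexOf(open, pos);
            if (start < 0) {
                break; // no more opening tags
            }
            int afterName = start + open.length();
            if (afterName >= html.length()) {
                break; // input ends right after the tag name
            }
            char next = html.charAt(afterName);
            // Skip tags that only share a prefix, e.g. <pre> when looking for <p>.
            if (next != '>' && !Character.isWhitespace(next)) {
                pos = afterName;
                continue;
            }
            int openEnd = html.indexOf('>', afterName); // end of the opening tag
            if (openEnd < 0) {
                break;
            }
            int end = html.indexOf(close, openEnd + 1); // start of the closing tag
            if (end < 0) {
                break; // unclosed tag: stop rather than guess
            }
            results.add(html.substring(openEnd + 1, end).trim());
            pos = end + close.length();
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<p>first\nparagraph</p><pre>skip</pre>\n"
                + "<a href=\"http://example.com\">a link</a> <p>second</p>";
        System.out.println(textBetween(html, "p"));
        System.out.println(textBetween(html, "a"));
    }
}
```

Because it searches the whole text instead of going line by line, it does not matter whether <p>blah</p> sits on one line or spans several. Real-world HTML (upper-case tags, comments, a > inside an attribute value) will still break it, which is why a parser library remains the safer choice.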
