java - Extracting certain text from html file


I want to extract the text in an HTML file that sits between paragraph (p) and link (a href) tags. I want to do this without Java regex or HTML parsers. I thought of this:

    while ((word = reader.readLine()) != null) { // iterate to the end of the file
        if (word.contains("<p>")) { // catching the p tag
            while (!word.contains("</p>")) { // iterate to the end of the tag
                try { // start writing
                    out.write(word);
                } catch (IOException e) {
                }
            }
        }
    }

But it is not working. The code seems pretty valid to me. How can the reader catch the "p" and "a href" tags?

The problems start when you have <p>blah</p> on a single line. One simple solution is to change every < to \n< so that each tag starts its own piece, like this:

    boolean insidePar = false;
    while ((line = reader.readLine()) != null) {
        // break the line before every tag, then look at each piece separately
        for (String word : line.replaceAll("<", "\n<").split("\n")) {
            if (word.contains("<p>")) {
                insidePar = true;
            } else if (word.contains("</p>")) {
                insidePar = false;
            }
            if (insidePar) {
                // write word
            }
        }
    }

Still, I'd recommend using a parser library, as @hovercraftfullofeels suggested.

Edit: I've updated the code so it's a bit closer to a working version, but there will be more problems along the way.
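Some of those remaining problems are tags that carry attributes (such as <p class="x"> or <a href="...">), paragraphs that span several lines, and tag markup ending up in the output. Here is a rough sketch of one way around them, still without regex or a parser library: scan the whole document as one string with indexOf and keep a counter of how many p/a elements are currently open. The class name TagTextExtractor and the method extract are made up for this illustration, and the sketch assumes every <p> and <a> is explicitly closed. It does not handle comments, scripts, or HTML entities, and it treats any <a ...> tag as a link.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the original answer.
class TagTextExtractor {

    // Collects the text found inside <p>...</p> and <a ...>...</a> elements.
    // A link nested inside a paragraph is merged into that paragraph's text.
    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of p/a elements we are currently inside
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                open = html.length(); // no more tags: the rest is plain text
            }
            if (depth > 0) {
                current.append(html, pos, open); // keep text only inside p/a
            }
            if (open == html.length()) {
                break;
            }
            int close = html.indexOf('>', open);
            if (close < 0) {
                break; // unterminated tag: stop instead of guessing
            }
            String name = nameOf(html.substring(open + 1, close).trim().toLowerCase());
            if (name.equals("p") || name.equals("a")) {
                depth++;
            } else if ((name.equals("/p") || name.equals("/a")) && depth > 0) {
                depth--;
                if (depth == 0) {
                    String text = current.toString().trim();
                    if (!text.isEmpty()) {
                        result.add(text);
                    }
                    current.setLength(0);
                }
            }
            pos = close + 1;
        }
        return result;
    }

    // Returns the tag name: everything up to the first whitespace character.
    private static String nameOf(String tag) {
        int end = 0;
        while (end < tag.length() && !Character.isWhitespace(tag.charAt(end))) {
            end++;
        }
        return tag.substring(0, end);
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line, or falls back to a sample.
        String html = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])))
                : "<div>skip</div><p class=\"x\">one</p>\n<a href=\"u\">two</a>";
        for (String text : extract(html)) {
            System.out.println(text);
        }
    }
}
```

Reading the whole file into one string avoids the line-by-line bookkeeping, at the cost of holding the document in memory. For real-world HTML, where </p> is often omitted, a parser library is still the safer choice.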

