Re: [Htmlparser-user] Failure parsing html with StringBean
From: Ian M. <ian...@gm...> - 2006-06-07 22:13:24
The File class in Java has a method, listFiles(), that returns the File objects for everything in a directory. The rest should be easy.

Ian

On 6/7/06, Mark Stark <htm...@ey...> wrote:
> Do you have any idea how to recursively pass a list of files in a directory
> to the StringBean or any given visitor?
>
> Derrick Oswald wrote:
> > If you don't care how many carriage returns are present in the output,
> > just output one after processing each tag in visitTag() and visitEndTag().
> >
> > Mark Stark wrote:
> >
> >> Thanks Derrick,
> >>
> >> I have to add that I've removed the breaksFlow() statement. I add a
> >> carriage return after every segment (text between brackets) and later
> >> save it in a file (key - value).
> >>
> >> My intention is to extract all strings from a given HTML file, write them
> >> into a file, and replace these strings with other values (translation).
> >>
> >> The problem is that if "Organisationseinheiten $[weblogEnabled$" is
> >> recognized as one connected segment, it is not possible to replace it in
> >> a second run with the translation. Is that understandable? :)
> >>
> >> P.S.: It is important that the parser can parse the templates with these
> >> $$ subs.
> >>
> >> Thanks a lot
> >>
> >> Derrick Oswald wrote:
> >>
> >>> Mark,
> >>>
> >>> A newline is only inserted in the output if the tag breaks the normal
> >>> flow of text.
> >>> The list of tags that do this is from the HTML specification and is
> >>> encoded in the org.htmlparser.nodes.TagNode class as the breakTags list.
> >>>
> >>> The StringBean processing is driven by the tags that are encountered. If
> >>> it doesn't see a tag that causes a break, none is emitted.
> >>>
> >>> Since the text $[weblogEnabled$ is outside of any TD tag in a table, an
> >>> argument could be made that it shouldn't print at all, but if your
> >>> browser prints something and it inserts a newline, an argument could
> >>> also be made to change the operation of the StringBean to assume that a
> >>> break is pending *after* tags that break the flow, and output newlines
> >>> accordingly. I fear this would cause more problems than it solves though.
> >>>
> >>> Presumably this 'dollar text' will be substituted by some server-side
> >>> processing into a real <TD>xxxx</TD> section; perhaps the parser should
> >>> be applied after this processing.
> >>>
> >>> Derrick
> >>>
> >>> Mark Stark wrote:
> >>>
> >>>> I added a System.out before collapsing the string and got the following hint:
> >>>>
> >>>> Txt (3664[96,78],3672[96,86]): Personen
> >>>> Txt (3676[96,90],3683[97,2]): \t\t\t\n\t\t
> >>>> Txt (3688[97,7],3697[98,7]): \n \t
> >>>> Txt (3712[98,22],3720[100,2]): \n\t\t\n\t\t
> >>>> Txt (3797[100,79],3805[100,87]): Projekte
> >>>> Txt (3809[100,91],3816[101,2]): \t\t\t\n\t\t
> >>>> Txt (3821[101,7],3830[102,7]): \n \t
> >>>> Txt (3846[102,23],3850[103,2]): \n\t\t
> >>>> Txt (3858[103,10],3880[103,32]): Organisationseinheiten
> >>>> Txt (3889[103,41],3900[105,2]): \n\t\t\t\t\t\n\t\t
> >>>> Txt (3905[105,7],3934[107,7]): \n\t\t$[weblogEnabled$\n \t
> >>>>
> >>>> The output from these lines after collapse() is
> >>>> Personen
> >>>> Projekte
> >>>> Organisationseinheiten $[weblogEnabled$
> >>>>
> >>>> The "failure" (I don't know if it's a failure at all) should be in
> >>>> collapse().
> >>>>
> >>>> Mark Stark wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I'm using StringBean to extract strings from a given HTML source. This
> >>>>> code causes htmlparser to recognize only one connected string:
> >>>>>
> >>>>> <td class="yes">
> >>>>> <strong>Organisationseinheiten</strong>
> >>>>> </td>
> >>>>> $[weblogEnabled$
> >>>>> <td class="no">
> >>>>>
> >>>>> returned: Organisationseinheiten $[weblogEnabled$
> >>>>>
> >>>>> But it should be
> >>>>>
> >>>>> Organisationseinheiten
> >>>>>
> >>>>> $[weblogEnabled$
> >>>>>
> >>>>> Can someone give me a hint which part of StringBean causes this?
> >>>>>
> >>>>> Thanks a lot
> >>>>>
> >>>>> _______________________________________________
> >>>>> Htmlparser-user mailing list
> >>>>> Htm...@li...
> >>>>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
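
Ian's suggestion at the top of this message (use the directory-listing method of java.io.File and recurse) can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name HtmlFileCollector, the method collect(), and the .html/.htm filter are assumptions made for the example, and how each file is then handed to the StringBean or a visitor is left as a comment.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: name and file-extension filter are assumptions for illustration.
class HtmlFileCollector {

    /** Recursively collects files under root whose names end in .html or .htm. */
    static List<File> collect(File root) {
        List<File> result = new ArrayList<>();
        collectInto(root, result);
        return result;
    }

    private static void collectInto(File dir, List<File> result) {
        // listFiles() returns null if dir is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collectInto(entry, result); // descend into subdirectories
            } else {
                String name = entry.getName().toLowerCase();
                if (name.endsWith(".html") || name.endsWith(".htm")) {
                    result.add(entry);
                }
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File file : collect(root)) {
            // Here each file would be passed to the StringBean or a visitor.
            System.out.println(file.getPath());
        }
    }
}
```

Collecting the files into a list first, rather than parsing inside the recursion, keeps the directory walk separate from whatever is done with each file afterwards.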