Re: [Htmlparser-user] Failure parsing html with StringBean

If you don't care how many carriage returns are present in the output, 
just output one after processing each tag in visitTag() and visitEndTag().
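
As a rough illustration of that idea, here is a minimal sketch in plain Java. It does not use the htmlparser API at all: the class SegmentSketch and its segments() method are hypothetical names made up for this example, and the regular expression is a simplification that ignores comments, scripts and '>' characters inside attribute values. It treats every tag as a break and throws away the empty pieces, which is roughly what emitting a newline in both visitTag() and visitEndTag() and ignoring the extra blank lines would give you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of htmlparser: shows the effect of
// treating EVERY tag as a break instead of only the flow-breaking ones.
class SegmentSketch {

    /** Splits markup into text segments, one per run of text between tags. */
    public static List<String> segments(String html) {
        List<String> out = new ArrayList<>();
        // Each <...> run acts as a separator (simplified tag matcher).
        for (String piece : html.split("<[^>]*>")) {
            // Collapse whitespace runs and drop segments that end up empty,
            // since we do not care how many breaks were emitted.
            String text = piece.replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<td class=\"yes\">\n\t<strong>Organisationseinheiten</strong>\t\t\n</td>\n"
                + "\t$[weblogEnabled$\n<td class=\"no\">";
        for (String s : segments(html)) {
            System.out.println(s);
        }
    }
}
```

With the fragment from the original message, the two strings come out as separate segments, so each can be stored and replaced on its own.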

Mark Stark wrote:

>Thanks Derrick,
>
>I should add that I've removed the breaksFlow() check. I add a
>carriage return after every segment (the text between brackets) and later
>save the segments in a file (key - value).
>
>My intention is to extract all strings from a given HTML file, write them
>into a file, and replace these strings with other values (translation).
>
>The problem is that if "Organisationseinheiten $[weblogEnabled$" is
>recognized as one connected segment, it is not possible to replace it
>with the translation in a second run. Does that make sense? :)
>
>P.S.: It is important that the parser can parse the templates with these
>$$ substitutions.
>
>thanks a lot
>
>
>
>Derrick Oswald wrote:
>  
>
>>Mark,
>>
>>A newline is only inserted in the output if the tag breaks the normal 
>>flow of text.
>>The list of tags that do this is from the HTML specification and is 
>>encoded in the org.htmlparser.nodes.TagNode class as the breakTags list.
>>
>>The StringBean processing is driven by the tags that are encountered. If 
>>it doesn't see a tag that causes a break, none is emitted.
>> 
>>Since the text $[weblogEnabled$ is outside of any TD tag in a table, an 
>>argument could be made that it shouldn't print at all, but if your 
>>browser prints something and it inserts a newline, an argument could 
>>also be made to change the operation of the StringBean to assume that a 
>>break is pending *after* tags that break the flow, and output newlines 
>>accordingly. I fear this would cause more problems than it solves though.
>>
>>Presumably this 'dollar text' will be substituted by some server side 
>>processing into a real <TD>xxxx</TD> section, perhaps the parser should 
>>be applied after this processing.
>>
>>Derrick
>>
>>Mark Stark wrote:
>>
>>    
>>
>>>I added a System.out.println before collapsing the string and got the following output:
>>>
>>>Txt (3664[96,78],3672[96,86]): Personen
>>>Txt (3676[96,90],3683[97,2]): \t\t\t\n\t\t
>>>Txt (3688[97,7],3697[98,7]): \n      \t
>>>Txt (3712[98,22],3720[100,2]): \n\t\t\n\t\t
>>>Txt (3797[100,79],3805[100,87]): Projekte
>>>Txt (3809[100,91],3816[101,2]): \t\t\t\n\t\t
>>>Txt (3821[101,7],3830[102,7]): \n      \t
>>>Txt (3846[102,23],3850[103,2]): \n\t\t
>>>Txt (3858[103,10],3880[103,32]): Organisationseinheiten
>>>Txt (3889[103,41],3900[105,2]): \n\t\t\t\t\t\n\t\t
>>>Txt (3905[105,7],3934[107,7]): \n\t\t$[weblogEnabled$\n      \t
>>>
>>>The output from these lines after collapse() is:
>>>Personen
>>>Projekte
>>>Organisationseinheiten $[weblogEnabled$
>>>
>>>The "failure" (I don't know if it's a failure at all) should be in
>>>collapse().
>>>
>>>
>>>Mark Stark wrote:
>>> 
>>>
>>>      
>>>
>>>>Hi,
>>>>
>>>>I'm using StringBean to extract strings from a given HTML source. This
>>>>code causes htmlparser to recognize only one connected string:
>>>>
>>>><td class="yes">
>>>>	<strong>Organisationseinheiten</strong>				
>>>></td>
>>>>	$[weblogEnabled$
>>>><td class="no">
>>>>
>>>>returned: Organisationseinheiten $[weblogEnabled$
>>>>
>>>>But it should be
>>>>
>>>>Organisationseinheiten
>>>>
>>>>$[weblogEnabled$
>>>>
>>>>Can someone give me a hint which part of StringBean causes this?
>>>>
>>>>thanks a lot
>>>>
>>>>
>>>>
>>>>_______________________________________________
>>>>Htmlparser-user mailing list
>>>>Htm...@li...
>>>>https://lists.sourceforge.net/lists/listinfo/htmlparser-user