Re: [Htmlparser-user] Failure parsing html with StringBean

Thank you, this works fine :)

This is not an htmlparser question, but do you know how to run multiple
test classes with JUnit4TestAdapter? So far I only have a single class:

return new JUnit4TestAdapter(SegmentFindingVisitorTest.class);

Derrick Oswald wrote:
> If you don't care how many carriage returns are present in the output, 
> just output one after processing each tag in visitTag() and visitEndTag().
> 
> Mark Stark wrote:
> 
>> Thanks Derrick,
>>
>> I should add that I've removed the breaksFlow() check. I now add a
>> carriage return after every segment (the text between tags), and later
>> save the segments to a file as key/value pairs.
>>
>> My intention is to extract all strings from a given HTML file, write them
>> into a file, and replace them with other values (translation).
>>
>> The problem is that if "Organisationseinheiten $[weblogEnabled$" is
>> recognized as one connected segment, it cannot be replaced with the
>> translation in a second run. Does that make sense? :)
>>
>> P.S.: It is important that the parser can parse the templates with these
>> $$ substitutions.
>>
>> thanks a lot
>>
>>
>>
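One way to keep the placeholder replaceable in the second run, regardless of how the extractor groups the text, might be to split each extracted segment at the `$[...$` markers before writing the key/value file. The following is a minimal sketch using only `java.util.regex`. The class name `PlaceholderSplitter` and the assumption that every placeholder has the form `$[name$` with no `$` inside the name are illustrative and not part of htmlparser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSplitter {
    // Assumption: a placeholder starts with "$[" and runs to the next "$".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\[[^$]*\\$");

    /** Splits a segment into plain-text parts and placeholder parts, in order. */
    public static List<String> split(String segment) {
        List<String> parts = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(segment);
        int last = 0;
        while (m.find()) {
            // Text before the placeholder, if any, becomes its own part.
            addIfNotBlank(parts, segment.substring(last, m.start()));
            parts.add(m.group());
            last = m.end();
        }
        // Text after the last placeholder (or the whole segment if none matched).
        addIfNotBlank(parts, segment.substring(last));
        return parts;
    }

    private static void addIfNotBlank(List<String> parts, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            parts.add(trimmed);
        }
    }

    public static void main(String[] args) {
        for (String part : split("Organisationseinheiten $[weblogEnabled$")) {
            System.out.println(part);
        }
    }
}
```

Run on the segment from this thread, `main` prints `Organisationseinheiten` and `$[weblogEnabled$` on separate lines, so each can be stored under its own key.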
>> Derrick Oswald wrote:
>>
>>> Mark,
>>>
>>> A newline is only inserted in the output if the tag breaks the normal 
>>> flow of text.
>>> The list of tags that do this is from the HTML specification and is 
>>> encoded in the org.htmlparser.nodes.TagNode class as the breakTags list.
>>>
>>> The StringBean processing is driven by the tags that are encountered. If 
>>> it doesn't see a tag that causes a break, none is emitted.
>>>
>>> Since the text $[weblogEnabled$ is outside of any TD tag in a table, an 
>>> argument could be made that it shouldn't print at all, but if your 
>>> browser prints something and it inserts a newline, an argument could 
>>> also be made to change the operation of the StringBean to assume that a 
>>> break is pending *after* tags that break the flow, and output newlines 
>>> accordingly. I fear this would cause more problems than it solves though.
>>>
>>> Presumably this 'dollar text' will be substituted by some server-side 
>>> processing into a real <TD>xxxx</TD> section. Perhaps the parser should 
>>> be applied after this processing.
>>>
>>> Derrick
>>>
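The behaviour described above can be illustrated with a toy model. This is not the StringBean source: it assumes, as the explanation suggests, that a newline is emitted only when an opening flow-breaking tag is seen, and it uses a hypothetical subset of break tags. The class `BreakModel` and its `breakOnEndTag` switch are made up for this sketch. The switch shows what changes if a break is also emitted after closing tags.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class BreakModel {
    // Hypothetical subset of flow-breaking tag names, for illustration only.
    private static final Set<String> BREAK_TAGS =
            new HashSet<>(Arrays.asList("TD", "P", "DIV", "BR"));

    private final StringBuilder out = new StringBuilder();
    private final boolean breakOnEndTag;

    public BreakModel(boolean breakOnEndTag) {
        this.breakOnEndTag = breakOnEndTag;
    }

    /** Called for an opening tag such as <td>. */
    public void startTag(String name) {
        if (BREAK_TAGS.contains(name.toUpperCase(Locale.ROOT))) newline();
    }

    /** Called for a closing tag such as </td>. */
    public void endTag(String name) {
        if (breakOnEndTag && BREAK_TAGS.contains(name.toUpperCase(Locale.ROOT))) newline();
    }

    /** Appends text with whitespace runs collapsed to single spaces. */
    public void text(String s) {
        String collapsed = s.trim().replaceAll("\\s+", " ");
        if (collapsed.isEmpty()) return;
        int len = out.length();
        // Join with the previous text on the same line using one space.
        if (len > 0 && out.charAt(len - 1) != '\n') out.append(' ');
        out.append(collapsed);
    }

    // Emits at most one newline, and none at the very start of the output.
    private void newline() {
        int len = out.length();
        if (len > 0 && out.charAt(len - 1) != '\n') out.append('\n');
    }

    public String result() {
        return out.toString();
    }

    /** Replays the fragment from this thread through the model. */
    public static String demo(boolean breakOnEndTag) {
        BreakModel m = new BreakModel(breakOnEndTag);
        m.startTag("td");
        m.text("\n\t");
        m.startTag("strong");
        m.text("Organisationseinheiten");
        m.endTag("strong");
        m.text("\t\t\t\t\n");
        m.endTag("td");
        m.text("\n\t$[weblogEnabled$\n");
        m.startTag("td");
        return m.result();
    }

    public static void main(String[] args) {
        System.out.print(demo(false));
        System.out.println("---");
        System.out.print(demo(true));
    }
}
```

With `breakOnEndTag` set to `false` the model yields the single line `Organisationseinheiten $[weblogEnabled$`. With `true` the two strings land on separate lines, because the closing `</td>` now forces a break before the stray text.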
>>> Mark Stark wrote:
>>>
>>>> I added a System.out.println before collapsing the string and got the following output:
>>>>
>>>> Txt (3664[96,78],3672[96,86]): Personen
>>>> Txt (3676[96,90],3683[97,2]): \t\t\t\n\t\t
>>>> Txt (3688[97,7],3697[98,7]): \n      \t
>>>> Txt (3712[98,22],3720[100,2]): \n\t\t\n\t\t
>>>> Txt (3797[100,79],3805[100,87]): Projekte
>>>> Txt (3809[100,91],3816[101,2]): \t\t\t\n\t\t
>>>> Txt (3821[101,7],3830[102,7]): \n      \t
>>>> Txt (3846[102,23],3850[103,2]): \n\t\t
>>>> Txt (3858[103,10],3880[103,32]): Organisationseinheiten
>>>> Txt (3889[103,41],3900[105,2]): \n\t\t\t\t\t\n\t\t
>>>> Txt (3905[105,7],3934[107,7]): \n\t\t$[weblogEnabled$\n      \t
>>>>
>>>> The output from these lines after collapse() is
>>>> Personen
>>>> Projekte
>>>> Organisationseinheiten $[weblogEnabled$
>>>>
>>>> The "failure" (I don't know if it is a failure at all) should be in
>>>> collapse()
>>>>
>>>>
>>>> Mark Stark wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using StringBean to extract strings from a given HTML source. This
>>>>> markup causes htmlparser to recognize only one connected string:
>>>>>
>>>>> <td class="yes">
>>>>> 	<strong>Organisationseinheiten</strong>				
>>>>> </td>
>>>>> 	$[weblogEnabled$
>>>>> <td class="no">
>>>>>
>>>>> returned: Organisationseinheiten $[weblogEnabled$
>>>>>
>>>>> But it should be
>>>>>
>>>>> Organisationseinheiten
>>>>>
>>>>> $[weblogEnabled$
>>>>>
>>>>> Can someone give me a hint which part of StringBean causes this?
>>>>>
>>>>> thanks a lot
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Htmlparser-user mailing list
>>>>> Htm...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user