Hey Martin!
We are trying out the new 3.4-dev version in Okapi and one of our unit tests failed. The cause is a small tag reordering. We are not calling the tidy methods. Is there some new option we should be setting? In most applications this isn't a problem, but for localization we want the output to reflect the source document - even if it is malformed.
Original Input with context:
our accomplishments
blah troglm blah trabo blah blah.
blah glubta blah burbtle blah blah
borgot blah blah, plusgup blah turgatle.
transpec blah pargat blah blah blah
blah fob purtleo blah pribato blah
trannle fanbut blah tranda blah brop
blah blah blah burdg blah raptle.
Yes, our accomplishments are big.
Jericho 3.3 output snippet:
our accomplishments
Jericho 3.4-dev output snippet:
our accomplishments
I can send the full file if needed, but I don't see a way to attach it via the topic interface.
Jim
Hi Jim,
You'll need to supply some code to demonstrate how you're getting that output. I thought Okapi was using only StreamedSource which isn't even capable of reordering tags. I tested StreamedSource with both 3.3 and 3.4-dev and it iterates the tags in the same order as the source.
In any case I'm also a bit confused about your question. You say you want the output to reflect the source document. Your results show 3.4-dev doing that correctly while 3.3 swaps the tags around, so wouldn't that be an improvement anyway?
Regards
Martin
Sorry Martin - I was trying to spare you the details but ended up making it more complex. Also my examples above are reversed (it is 3.4-dev that changed the tag order) :-)
But you've given me the sanity check I needed - that StreamedSource does not muck with the tags. The issue must be deeper in our system. I thought these same tests passed with the 3.3 version - but there must be some other change I did not account for.
Thanks for the quick reply.
Jim
No worries Jim. Nice to see you're still making good use of the library!
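For anyone chasing a similar tag-order discrepancy, the sanity check discussed in this thread (does the output keep the tags in the same order as the source?) can be sketched roughly as below. This is only an illustration under stated assumptions: it does not use Jericho or Okapi at all, the class and method names (TagOrderCheck, tagSequence, firstMismatch) are hypothetical, the sample markup is invented because the original snippets did not survive in this thread, and the regex is a deliberately crude tag scanner rather than a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jericho or Okapi.
final class TagOrderCheck {
    // Crude pattern for start and end tags such as <b>, </b>, <a href="x">.
    // Assumption: no comments, CDATA sections, or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^>]*>");

    // Returns the tags of the markup in the order they appear.
    static List<String> tagSequence(String markup) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    // Returns the index of the first tag that differs between the two
    // documents, or -1 if both contain the same tags in the same order.
    static int firstMismatch(String source, String output) {
        List<String> a = tagSequence(source);
        List<String> b = tagSequence(output);
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) {
        // Invented sample: malformed nesting in the source, "repaired" in the output.
        String source = "<p><b><i>our accomplishments</b></i></p>";
        String reordered = "<p><b><i>our accomplishments</i></b></p>";
        System.out.println(firstMismatch(source, source));
        System.out.println(firstMismatch(source, reordered));
    }
}
```

Running a check like this on the text at each stage of a pipeline (parser output, intermediate representation, final writer output) is one way to narrow down which stage changes the tag order.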