From: SourceForge.net <no...@so...> - 2005-09-30 18:50:02
Bugs item #1288756, was opened at 2005-09-12 16:21
Message generated for change (Comment added) made by mihmax
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=520347&aid=1288756&group_id=68187

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: 1.4.6
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Didier Br (didierbr)
Assigned to: Maxym Mykhalchuk (mihmax)
Summary: <br> and <address> do not segment in HTML documents

Initial Comment:
In 1.4.6 (beta 2 and beta 3)

In HTML documents, the tags <br> and <address> do not generate new segments.
<br> is obviously a break.
<address> is a block-level element, and thus should segment.

From W3C:

HTML 3.2
The ADDRESS element requires start and end tags, and specifies information
such as authorship and contact details for the current document. *User agents
should render the content with paragraph-breaks before and after*. Note that
the content is restricted to paragraphs, plain text and text-like elements as
defined by the %text entity.

HTML 4.01
<!ENTITY % block "P | %heading; | %list; | %preformatted; | DL | DIV |
NOSCRIPT | BLOCKQUOTE | FORM | HR | TABLE | FIELDSET | ADDRESS">

The attached file reproduces the issue.

----------------------------------------------------------------------

>Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-09-30 19:48

Message:
Logged In: YES
user_id=488500

Thanks to Jean-Christophe Helary, I now know how to solve this bug and
segment on <br> only in sentence-segmenting mode.

2005/9/30, JC Helary wrote:
> I tried to set a rule that would put a break after a
> "<b.>" sequence, and it worked: it put a break
> after all the <b1>, <b2> etc. that are
> supposed to be <br> in HTML...
>
> I don't know the regexp way to say "'<b' followed
> by any number of digits until a '>'", so I am
> limited to <b9>, but I suppose a wider
> rule could be possible.
>
> Which means that _any_ OmegaT tag can be
> set as a breaking point :)

Of course, to be sure, I'll modify the HTML filter so that the <br> tag is
not shortened: it will stay <brXXX>, to distinguish it from the <b> tag, and
I'll add a <br\d+> break rule ("<br" and ">" are literal symbols, and \d+
means one or more digits).

Thanks a lot for the idea, JC!

----------------------------------------------------------------------

Comment By: Didier Br (didierbr)
Date: 2005-09-19 12:47

Message:
Logged In: YES
user_id=1343245

<< <br> is definitely a soft break, no need for paragraph-segmenting, in my
opinion.>>

Sorry, I wasn't clear on that. In the current version, <br> doesn't segment,
even in sentence-segmenting mode. It should at least segment in that case.

For paragraph-segmenting, I think it really depends on how <br> is used. In
some cases, the sentences on either side of the <br> are really different
paragraphs. In other cases, they might belong to the same paragraph. I would
generally be in favour of segmenting on <br>, even in paragraph mode. But
that's just my opinion.

----------------------------------------------------------------------

Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-09-19 12:17

Message:
Logged In: YES
user_id=488500

<br> is definitely a soft break, no need for paragraph-segmenting, in my
opinion.

<address> was definitely forgotten; it will appear in 1.4.6 beta 4.

----------------------------------------------------------------------
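For readers following the thread, the <br\d+> rule discussed above can be illustrated with a minimal Python sketch. This is not OmegaT's code (OmegaT applies its own segmentation rules); the function name split_after_break_tags and the sample string are hypothetical, and the sketch only assumes that the HTML filter emits <br> as numbered tags such as <br1> or <br10>, as proposed in the latest comment.

```python
import re

# "<br" and ">" are literal; \d+ means one or more digits (e.g. <br1>, <br12>).
# A numbered bold tag such as <b2> does not match, because "r" is required.
BREAK_TAG = re.compile(r"<br\d+>")


def split_after_break_tags(text):
    """Split text into segments, ending a segment after each <brN> tag.

    Hypothetical helper for illustration only; it is not part of OmegaT.
    """
    segments = []
    start = 0
    for match in BREAK_TAG.finditer(text):
        # Close the current segment right after the matched break tag.
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        # Keep any trailing text that follows the last break tag.
        segments.append(text[start:])
    return segments


sample = "First line<br1>Second <b2>bold</b2> line<br10>Third"
for segment in split_after_break_tags(sample):
    print(segment)
```

By contrast, the single-character pattern "<b.>" from the quoted message matches <b1> through <b9> but not <b10>, which is the limitation JC describes, and it cannot tell a bold tag from a break tag.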