#680 OmegaT HTML filter confuses <head> and <header>

Milestone: 3.1
Status: closed-fixed
Priority: 5
Updated: 2014-05-23
Created: 2014-05-16
Private: No

There seems to be some odd bug related to incomplete (X)HTML files:

Take the following file as an example, let's call it "test-case.html":

<header><p>This is the header line.</p></header>
<article><p>This is the first article.</p></article>
<article><p>This is the second article.</p></article>
<footer><p>This is the footer line.</p></footer>

I've added some additional segmentation rules to split segments at <p> and </p>. Due to this, I get all four lines without any tags to translate.

Once the lines are translated (to German) and the target file is created, you end up with this:

<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8"><p>Dies ist die Kopfzeile.</p></header>
<article><p>Dies ist der erste Artikel.</p></article>
<article><p>Dies ist der zweite Artikel.</p></article>
<footer><p>Dies ist die Fußzeile.</p></footer>

As you can see, OmegaT obviously tried to add the content-type meta tag to prevent any character display issues. However, in the target file the unrelated <header> tag is overwritten with the now incomplete <head> tag as well as the <meta> tag.

The correct output should be like this:

<header><p>Dies ist die Kopfzeile.</p></header>
<article><p>Dies ist der erste Artikel.</p></article>
<article><p>Dies ist der zweite Artikel.</p></article>
<footer><p>Dies ist die Fußzeile.</p></footer>

I encountered the issue in 3.1.0 beta as well as 3.1.1 beta.
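
The difference between the broken and the expected output can be checked mechanically by comparing the sequence of tags in the source and target files. The following is only a minimal sketch using Python's standard html.parser module; it is not part of OmegaT, and the names TagCollector and tag_sequence are made up for this illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start and end tag in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def tag_sequence(markup):
    """Return the list of tag names found in markup (end tags prefixed with '/')."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# The three files from the report, reproduced as strings.
SOURCE = (
    "<header><p>This is the header line.</p></header>\n"
    "<article><p>This is the first article.</p></article>\n"
    "<article><p>This is the second article.</p></article>\n"
    "<footer><p>This is the footer line.</p></footer>\n"
)

BROKEN_TARGET = (
    "<head>\n"
    '    <meta http-equiv="content-type" content="text/html; charset=UTF-8">'
    "<p>Dies ist die Kopfzeile.</p></header>\n"
    "<article><p>Dies ist der erste Artikel.</p></article>\n"
    "<article><p>Dies ist der zweite Artikel.</p></article>\n"
    "<footer><p>Dies ist die Fußzeile.</p></footer>\n"
)

EXPECTED_TARGET = (
    "<header><p>Dies ist die Kopfzeile.</p></header>\n"
    "<article><p>Dies ist der erste Artikel.</p></article>\n"
    "<article><p>Dies ist der zweite Artikel.</p></article>\n"
    "<footer><p>Dies ist die Fußzeile.</p></footer>\n"
)

# Translation should change the text only, never the tag structure.
print(tag_sequence(SOURCE) == tag_sequence(EXPECTED_TARGET))  # True
print(tag_sequence(SOURCE) == tag_sequence(BROKEN_TARGET))    # False
```

The broken target fails the comparison because its sequence starts with head and meta instead of header.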

1 Attachments

Discussion

  • Mario Liebisch

    Mario Liebisch - 2014-05-16

    Just one more note: I forgot to mention that the <meta> tag isn't properly closed either (it's missing a trailing slash).

     
  • Didier Briel

    Didier Briel - 2014-05-19
    • summary: Header tag in partial HTML files is overwritten/replaced --> OmegaT HTML filter confuses <head> and <header>
    • assigned_to: Didier Briel
     
  • Didier Briel

    Didier Briel - 2014-05-19

    There are a number of issues here.

    First, you don't have to let OmegaT add a <meta> tag. Go to Options > File Filters > HTML and XHTML > Options and select Never.

    Secondly, even when "Only if (X)HTML has a header" is selected, OmegaT still tries to write a <head> tag. That's because the HTML filter was written before HTML 5 introduced the <header> tag, and it confuses the two tags. This will be fixed.

    Last (for meta), HTML doesn't require closing standalone tags, even in HTML 5.
    Examples from the W3C clearly do not use a trailing slash:
    http://www.w3.org/TR/html5/document-metadata.html#standard-metadata-names

    <meta name=generator content="Frontweaver 8.2">

    Didier

     
  • Didier Briel

    Didier Briel - 2014-05-19
    • status: open --> open-fixed
     
  • Didier Briel

    Didier Briel - 2014-05-19

    Fixed in SVN (/trunk).

    The HTML filter no longer mistakes <header> for <head>.

    Didier
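
The ticket does not show the filter's code, so the following is a hypothetical illustration of how such a confusion can arise and be avoided, not OmegaT's actual implementation. It assumes the faulty check was a plain prefix test on the tag text; the function names are invented for this sketch.

```python
import re


def looks_like_head_naive(tag_text):
    """Assumed faulty check: a prefix test, which also fires on <header>."""
    return tag_text.lower().startswith("<head")


# Exact match on the element name: "head" must be followed by
# whitespace, "/" or ">", so "<header" is rejected.
_HEAD_START = re.compile(r"<head(?=[\s/>])", re.IGNORECASE)


def looks_like_head_exact(tag_text):
    """Return True only for a start tag whose element name is exactly 'head'."""
    return _HEAD_START.match(tag_text) is not None


for sample in ("<head>", '<HEAD lang="de">', "<header>"):
    print(sample, looks_like_head_naive(sample), looks_like_head_exact(sample))
```

With the prefix test, <header> is reported as a head tag; with the exact name match it is not, while <head> with or without attributes is still recognised.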

     
  • Mario Liebisch

    Mario Liebisch - 2014-05-19

    Awesome. Just picking the third option (only set the encoding if there's already a matching meta tag) also worked as an immediate fix - I didn't expect that option to be there, to be honest.

    I also looked up the trailing slash, and the output is actually fine the way it is now. The closing / is only required in XML/XHTML; HTML5 is supposed to ignore it (so <meta /> is the same as <meta> there).
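
As a rough illustration of the point about the trailing slash, a lenient HTML parser reports the same element for both spellings. This sketch uses Python's standard html.parser, which is not a full HTML5 parser, so it only approximates browser behaviour; the class StartTags and the helper start_tags are names made up for the example.

```python
from html.parser import HTMLParser


class StartTags(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # Also reached for "<meta ... />": the default handle_startendtag
        # forwards to handle_starttag and then handle_endtag.
        self.seen.append((tag, attrs))


def start_tags(markup):
    """Return the start tags found in markup, with their attributes."""
    parser = StartTags()
    parser.feed(markup)
    parser.close()
    return parser.seen


without_slash = start_tags('<meta name="generator" content="Frontweaver 8.2">')
with_slash = start_tags('<meta name="generator" content="Frontweaver 8.2" />')

print(without_slash == with_slash)
```

Both calls yield a single meta start tag with identical attributes.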

     
  • Didier Briel

    Didier Briel - 2014-05-23
    • status: open-fixed --> closed-fixed
     
  • Didier Briel

    Didier Briel - 2014-05-23

    Closing.

    Fixed in the released version 3.1.1 update 1 of OmegaT.

    Didier

     
