OmegaT - multiplatform CAT tool / Bugs / #1001 Tag validation false positives for some tags from html2 filter

Description has changed:

Diff:

```diff
--- old
+++ new
@@ -1,9 +1,10 @@
-In html (using html2 filter), if you have "<form> <span> text <input> text </span></form>", you get tags <s0>, <i1>, </s0> and if you paste that in the translation too, you'll get a tag validation error: bad nesting.
-It considers <i1> a start tag, since it is not like '<i1/>', and thus expects an end-tag before </s0>.
+In html (using html2 filter), if you have "`<form> <span> text <input> text </span></form>`", you get tags `<s0>`, `<i1>`, `</s0>` and if you paste that in the translation too, you'll get a tag validation error: bad nesting.
+It considers `<i1>` a start tag, since it is not like '`<i1/>`', and thus expects an end-tag before `</s0>`.

-Same for <br>
+Same for `<br>`

 possible fixes: 
-a) don't complain if the source has no closing tag either
-b) fix html2 filter to create <i1/>, <br1/> etc.  (breaks existing translations!)
-c) allow ignoring these errors
+
+1. don't complain if the source has no closing tag either
+2. fix html2 filter to create `<i1/>`, `<br1/>` etc.  (breaks existing translations!)
+3. allow ignoring these errors
```
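The nesting rule described in the report, and the first proposed fix (don't complain if the source has no closing tag either), can be illustrated with a small sketch. It is not OmegaT's validator: the simplified tag pattern `TAG_RE`, the helper `unclosed_in` and the function `check_nesting` with its `source` parameter are hypothetical names chosen for illustration. Without `source`, every tag lacking a trailing `/` is treated as a start tag that must be closed; with `source`, tags that the source segment itself never closes are skipped.

```python
import re
from typing import Optional

# Simplified OmegaT-style shortcut tags such as <s0>, </s0>, <i1/> (assumed pattern).
TAG_RE = re.compile(r"<(/?)([a-z]+\d+)(/?)>")


def unclosed_in(source: str) -> set:
    """Names of tags that appear as start tags in `source` but never as end tags."""
    starts, ends = set(), set()
    for m in TAG_RE.finditer(source):
        if m.group(3):  # self-closing, e.g. <i1/>: nothing to close
            continue
        (ends if m.group(1) else starts).add(m.group(2))
    return starts - ends


def check_nesting(translation: str, source: Optional[str] = None) -> Optional[str]:
    """Return None if tags nest properly, else a short error message.

    If `source` is given (fix 1), tags the source never closes are treated
    as standalone instead of as start tags.
    """
    standalone = unclosed_in(source) if source is not None else set()
    stack = []
    for m in TAG_RE.finditer(translation):
        closing, name, self_closing = bool(m.group(1)), m.group(2), bool(m.group(3))
        if self_closing or (not closing and name in standalone):
            continue
        if not closing:
            stack.append(name)          # start tag: expect a matching end tag later
        elif stack and stack[-1] == name:
            stack.pop()                 # end tag closes the innermost open tag
        else:
            return "bad nesting at " + m.group(0)
    return "unclosed: " + ", ".join(stack) if stack else None


segment = "<s0> text <i1> text </s0>"
print(check_nesting(segment))                  # bad nesting at </s0>
print(check_nesting(segment, source=segment))  # None
```

In this sketch, passing the source segment lets the validator accept whatever the filter produced without knowing anything about HTML; the trade-off is that an end tag genuinely missing from the source is also accepted in the translation.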

Aaron Madlon-Kay - 2020-07-04

There is an impedance mismatch between HTML's conception of tags and OmegaT's conception of tags (basically XML-like).

In my opinion, it doesn't make sense for the tag validation logic to try to handle this, because it would need to understand the HTML spec to know which tags are OK to leave unclosed, and that is the filter's job.

I also don't like making tag validation too configurable, because it increases the user's burden. How can a user know what is valid and what's not? It will probably depend on the specific project and maybe even differ between files in the same project.

Thus I think the correct thing to do is to ensure that the HTML filter outputs tags compatible with OmegaT's tag format; this is your suggestion (2). Yes, it's breaking, but that isn't necessarily disqualifying.
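Suggestion (2) amounts to having the filter emit a self-closing shortcut for HTML elements that can never have an end tag. A rough sketch, assuming the filter already knows both the element name and the shortcut name it assigns; `render_shortcut` is a hypothetical helper, not the html2 filter's API, and how the real filter picks names like `s0` or `i1` is left out. The element list is the set of void elements from the HTML Living Standard.

```python
# Void elements per the HTML Living Standard: they never take an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def render_shortcut(element: str, shortcut: str, closing: bool = False) -> str:
    """Render an OmegaT-style shortcut tag for an HTML element (hypothetical helper).

    `shortcut` is the name the filter has already chosen, e.g. "s0" or "i1".
    """
    if closing:
        return f"</{shortcut}>"
    if element.lower() in VOID_ELEMENTS:
        return f"<{shortcut}/>"  # standalone: nothing to close later
    return f"<{shortcut}>"


segment = " text ".join([
    render_shortcut("span", "s0"),
    render_shortcut("input", "i1"),
    render_shortcut("span", "s0", closing=True),
])
print(segment)  # <s0> text <i1/> text </s0>
```

Segments already translated with `<i1>` would no longer match the new `<i1/>` form, which is the compatibility break noted in the bug description.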

A bigger idea is to allow filters to participate in the validation process somehow. Obviously, knowledge of which HTML tags can be standalone belongs in the HTML filter, so it makes sense that the HTML filter should also know whether tags in the translation are valid.
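One way a filter could take part in validation is to expose a hook that a generic validator consults. The sketch below is hypothetical throughout: `TagInfo`, `Html2TagInfo` and `validate` are invented names, the tag pattern is simplified, and the filter is assumed to remember which HTML element each shortcut came from.

```python
import re
from typing import Dict, Optional, Protocol

TAG_RE = re.compile(r"<(/?)([a-z]+\d+)(/?)>")  # simplified shortcut-tag pattern (assumption)
VOID_ELEMENTS = frozenset({"area", "base", "br", "col", "embed", "hr", "img",
                           "input", "link", "meta", "source", "track", "wbr"})


class TagInfo(Protocol):
    """Hypothetical hook a file filter implements for the tag validator."""

    def is_standalone(self, shortcut: str) -> bool: ...


class Html2TagInfo:
    """Hypothetical HTML-side answer: standalone iff the shortcut came from a void element."""

    def __init__(self, shortcut_to_element: Dict[str, str]):
        self._elements = shortcut_to_element

    def is_standalone(self, shortcut: str) -> bool:
        return self._elements.get(shortcut, "").lower() in VOID_ELEMENTS


def validate(translation: str, info: TagInfo) -> Optional[str]:
    """Generic nesting check that defers 'is this tag standalone?' to the filter."""
    stack = []
    for m in TAG_RE.finditer(translation):
        closing, name = bool(m.group(1)), m.group(2)
        if m.group(3) or (not closing and info.is_standalone(name)):
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            return "bad nesting at " + m.group(0)
    return "unclosed: " + ", ".join(stack) if stack else None


html_info = Html2TagInfo({"s0": "span", "i1": "input"})
print(validate("<s0> text <i1> text </s0>", html_info))  # None
```

If `i1` had instead come from an `<i>` element (`Html2TagInfo({"s0": "span", "i1": "i"})`), the same translation fails with `bad nesting at </s0>`: the shortcut alone does not say whether a closing tag is required, and only the filter knows which element it saw.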

Aaron Madlon-Kay - 2020-07-04

Actually "1. don't complain if the source has no closing tag either" is a reasonable idea too. I don't expect it will be easy to implement though.

Martin Fleurke - 2020-07-04

I implemented 1.
It was a fairly simple fix.

Aaron Madlon-Kay - 2020-08-03

summary: some tags recognized as start-tag causing validation errors --> Tag validation false positives for some tags from html2 filter

status: open --> open-fixed
Aaron Madlon-Kay - 2020-08-03

Fixed in OmegaT 5.3.0.

Aaron Madlon-Kay - 2020-08-03

status: open-fixed --> closed-fixed