From: SourceForge.net <no...@so...> - 2013-01-22 14:31:15
Feature Requests item #3556996, was opened at 2012-08-13 11:33
Message generated for change (Comment added) made by didierbr
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=520350&aid=3556996&group_id=68187

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: File filters
>Group: 2.6
Status: Open
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Bob Myers (rtmyers)
>Assigned to: Didier Briel (didierbr)
Summary: In OpenXML filter, add xml:space="preserve" to w:t elts

Initial Comment:
If a piece of text starts or ends with a space, and the w:t element does not contain the xml:space="preserve" attribute, the space gets ignored. OmegaT's OpenXML file filter should add that attribute if necessary. For instance, the OpenXmlPowerTools RemoveOpenXmlMarkup does this.

----------------------------------------------------------------------

>Comment By: Didier Briel (didierbr)
Date: 2013-01-22 06:31

Message:
Implemented in SVN (/trunk).

Didier

----------------------------------------------------------------------

Comment By: Bob Myers (rtmyers)
Date: 2012-08-19 02:10

Message:
If the filter architecture makes it hard to change/add attributes, then is some kind of post-processing step a possible solution? For instance, this would be a very simple XSLT transformation.

----------------------------------------------------------------------

Comment By: Bob Myers (rtmyers)
Date: 2012-08-18 06:41

Message:
Actually OmegaT is working perfectly. It replaces the content of the <w:t> element, such as <w:t>sourcetext</w:t>, with the target, yielding <w:t> targettext</w:t> (note the space at the beginning of the text content). Exactly what it should do. The problem is, when Word reads the target file, it ignores the leading space because the <w:t> element has no xml:space="preserve" attribute.
That attribute did not need to be present in the source, and therefore Word did not output it, since the text content of the tag in the source file did not have a leading or trailing space. If it had, then Word would have inserted (and does insert) the xml:space="preserve" attribute.

So nobody is really at fault here. The problem is created by the infelicitous conjunction of (1) the fact that Word does not put xml:space="preserve" on <w:t> elements unless it thinks it needs to, and (2) a particular translation situation where a subsegment with no leading or trailing space in the source language is translated into a subsegment that does use a leading or trailing space, something that can happen quite easily when going from a language such as Japanese with no interword spaces, into English, in the presence of tags, which is my case.

In case it matters, the absence of the xml:space="preserve" attribute will also cause two or more adjacent spaces in the middle of the textual content to be treated as one. I presume the number of cases where someone really wants two spaces in the target translation is very small, but we might nevertheless wish to be cognizant of the fact that in such cases the overall end-to-end behavior of OmegaT together with Word will be to silently compress multiple spaces.

The safest thing to do when considering such behavior might therefore be to simply add the xml:space="preserve" attribute to ALL <w:t> elements processed through the filter.

----------------------------------------------------------------------

Comment By: Didier Briel (didierbr)
Date: 2012-08-18 06:17

Message:
Of course, a missing space in the target document is an issue. The point was: if the attribute is not set in the original document, and Word manages fine without it, then the issue might be somewhere else in OmegaT, which is worth investigating (spaces should not be lost).
Another point is, as no OmegaT filter knows how to change attributes in the target document, it's not a question of a two-line change, but of implementing this concept. It is not a philosophical issue, it's a technical one.

Didier

----------------------------------------------------------------------

Comment By: Bob Myers (rtmyers)
Date: 2012-08-18 05:43

Message:
Stupid me, I thought the purpose of a filter was to allow the user to actually create a usable translation with OmegaT. Instead there is some philosophical objection related to the inner nature of filters that prevents this enhancement, and therefore means that I end up with translations with missing spaces, which I have to either fix manually, or run through some separate program such as the OpenXmlPowerTools RemoveXmlMarkup filter to do what the filter could have done in two lines of code?

I guess this is the point where I should join the project and provide the patch, although given this attitude I'm not even sure it would be accepted, and it seems more logical for someone who knows the code base and has the dev environment set up to do it instead of me setting things up and learning the code.

For your information, the relevant code from OpenXmlPowerTools is:

    if (textElementValue.Length > 0 &&
        (textElementValue[0] == ' ' ||
         textElementValue[textElementValue.Length - 1] == ' '))
        return new XAttribute(XNamespace.Xml + "space", "preserve");

----------------------------------------------------------------------

Comment By: Didier Briel (didierbr)
Date: 2012-08-18 04:28

Message:
Understood, but then what is requested is a bit beyond what OmegaT filters are able to do currently. Besides, if the Word document is opened in Word (without any CAT tool), I guess the xml:space="preserve" attribute is not there either. OmegaT filters are supposed to mimic the behaviour of a translation directly in the file.
Didier

----------------------------------------------------------------------

Comment By: Bob Myers (rtmyers)
Date: 2012-08-16 03:48

Message:
The filter option does not do what is required. Apparently, what this option does is to force preservation of white space in the INPUT document, even if xml:space="preserve" is missing. The OUTPUT document still has no xml:space="preserve" attribute on the w:t elements, and that attribute is what is needed.

----------------------------------------------------------------------

Comment By: Didier Briel (didierbr)
Date: 2013-08-16 01:18

Message:
You can force xml:space="preserve" for all tags in the Open XML filter (Options > File Filters > Open XML). Is that solution not enough?

Didier Briel

----------------------------------------------------------------------
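
The behaviour requested in this thread (put xml:space="preserve" on any w:t element whose text starts or ends with a space) could also be approximated as a post-processing step, as suggested in one of the comments. The following is a minimal Python sketch of that idea, not the change that went into OmegaT's SVN and not OpenXmlPowerTools code. The function names needs_preserve and add_space_preserve are hypothetical. The check for a run of two spaces is an extension beyond the quoted C# condition, added because the thread notes that adjacent spaces are collapsed too.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace and the built-in XML namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
W_T = "{%s}t" % W_NS
XML_SPACE = "{%s}space" % XML_NS


def needs_preserve(text):
    """Hypothetical helper: True if the text has a leading or trailing
    space, or (an assumption beyond the quoted check) a run of two spaces."""
    if not text:
        return False
    return text[0] == " " or text[-1] == " " or "  " in text


def add_space_preserve(xml_text):
    """Hypothetical helper: return xml_text with xml:space="preserve"
    added to every w:t element that needs it and does not have it yet."""
    # Keep the conventional "w" prefix when serializing.
    ET.register_namespace("w", W_NS)
    root = ET.fromstring(xml_text)
    for t in root.iter(W_T):
        if needs_preserve(t.text) and t.get(XML_SPACE) is None:
            t.set(XML_SPACE, "preserve")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    fragment = (
        '<w:p xmlns:w="%s">'
        "<w:r><w:t> targettext</w:t></w:r>"
        "<w:r><w:t>plain</w:t></w:r>"
        "</w:p>" % W_NS
    )
    print(add_space_preserve(fragment))
```

This sketch is only meant for small fragments. A real word/document.xml declares many more namespaces, and ElementTree renames any prefix that has not been registered, so a production tool would need to register them all or use a parser that preserves prefixes.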