#977 Word filter: reduce number of unnecessary tags

word (1)
Guido Leenders

The Word filter shows a lot of tags during translation, especially for "old" documents with many revisions.

In the past, I assisted a team working on a Word add-on that fills Word documents through SQL using Word templates. During development I saw many XML fragments that had no visual representation. We are currently working on loading filled-in Word forms back into the database, and we are running into even more of these meaningless XML fragments.

These XML fragments also show up as tags in OmegaT and clutter the segments badly.

According to http://blogs.msdn.com/b/ericwhite/archive/2008/11/04/remove-rsid-attributes-and-elements-before-comparing-documents.aspx, most of these tags are introduced automatically by Microsoft Word, which stamps content with random rsid values so that it can merge two branches of a document without access to the original source from which they diverged.

In my view, these XML fragments are rarely needed, so they can be deleted. Hiding them from display is another option, but that might cause problems when people try to compare the original and translated versions.

I suggest adding an option to the Word Open XML filter that deletes the XML fragments described in Eric White's post from the source.
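
To illustrate what such an option might do, here is a minimal sketch in Python (not OmegaT's Java, and not taken from the filter's code). The function name strip_rsid is hypothetical. Matching every attribute in the WordprocessingML namespace whose name starts with "rsid", and dropping the w:rsids element, is my assumption about what should be removed; Eric White's post may list the names differently.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml and word/settings.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the familiar "w:" prefix on output


def strip_rsid(xml_text: str) -> str:
    """Return xml_text without w:rsid* attributes and without w:rsids elements.

    Hypothetical helper: assumes everything named rsid* is safe to drop.
    """
    root = ET.fromstring(xml_text)
    attr_prefix = "{%s}rsid" % W_NS
    rsids_tag = "{%s}rsids" % W_NS
    # Snapshot the elements first so removing children does not disturb iteration.
    for elem in list(root.iter()):
        for name in list(elem.attrib):
            if name.startswith(attr_prefix):
                del elem.attrib[name]
        for child in list(elem):
            if child.tag == rsids_tag:
                elem.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<w:document xmlns:w="%s"><w:body>'
        '<w:p w:rsidR="00A1B2C3" w:rsidRDefault="00D4E5F6">'
        '<w:r w:rsidRPr="00112233"><w:t>Hello</w:t></w:r>'
        '</w:p></w:body></w:document>' % W_NS
    )
    print(strip_rsid(sample))
```

Note that this only removes the revision identifiers. Adjacent runs that Word split because their rsid values differed would probably still need to be merged before the number of tags shown in a segment actually goes down.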