After discussion on the mailing list and in #288, I was able to establish that Excel files without paragraphs (which is the case for most Excel files) are not read correctly by the StaX filter, although the old filter reads them correctly.
Word and PowerPoint files, which always contain paragraphs, are not affected.
@miurahr9 I am not seeing this fix in the change.txt file.
Yes, it is still open.
@t_cordonnier can you point @miurahr9 at your PR so that he can merge it?
@t_cordonnier, I guess the PR was about disabling the Excel handling in StaX, right?
No. This ticket is about a bug in the StaX filter which prevents some Excel files from being read correctly. We had agreed together to disable the filter for Excel files until I find a solution, but that is another ticket, whose number I don't remember. That other ticket is probably closed, while the current one is still open.
For the moment I have not found a correct solution. I had implemented something, but the correction then affected Word files, so I could not publish it as a pull request.
It seems that the Excel format is a little different from the other OpenXML formats. And more generally, the OpenXML format is very complicated.
Last edit: Thomas CORDONNIER 2025-02-24
The filter is disabled by default in 6.0
https://github.com/omegat-org/omegat/pull/1611
@t_cordonnier, could you share an update on the StaX filter enhancement? This has been pending for a while, so a status update would be appreciated.
@miurahr9 in case the message from 2025-02-24 was not clear: I could only make a diagnosis, not find a solution. For the moment my attempts to solve this have had side effects on other Excel files, which is why I cannot publish anything.
Thank you for the status update. The StaX filter is very complex and difficult for other developers to modify, and modifications can easily produce side effects...
If you are OK with sharing your attempt on the dev list, it would be valuable for educating other developers.