File Release Notes and Changelog
Notes:
Changes:
Revision history for file2xliff4j.
2007/02/15
ExcelExporter.java ExcelImporter.java PPTExporter.java
PPTImporter.java RTFExporter.java RTFImporter.java
WordExporter.java WordImporter.java: Fix renaming
bug that occurs if the JOOConvert call fails.
OdfHandler.java: Fix bug that stored resolved entity values in the
format file, resulting in invalid content.xml/styles.xml at
export time.
2007/02/07
OdfExporter.java, HtmlHandler.java, MifTuPreener.java,
TuPreener.java, HtmlExporter.java, OdfHandler.java,
XMLExporter.java: Remove foreign namespace elements from
source and target elements (per the XLIFF spec). Instead use
the XLIFF mrk element (with mtype='x-coretext') to delimit the
the core text of the source and targets.
2007/01/31
ExcelExporter.java: Handle case where the .xls extension
is in mixed or uppercase.
HtmlHandler.java: Include indication in a sentence segment's
trans-unit tag of whether it is followed by another sentence
in the same paragraph.
HtmlImporter.java: Default to sentence segment boundaries rather
than paragraph boundaries.
MifImporter.java: Include indication in a sentence segment's
trans-unit tag of whether it is followed by another sentence
in the same paragraph. Default to sentence segment boundaries
rather than paragraph boundaries.
OdfExporter.java: During export, remove all XLIFF mrk tags
from exported segments.
OdfHandler.java: Convert Wingdings characters (present in ODF that
was generated by OOo from RTF documents) to unicode. Compress
text:tab elements that are surrounded by text:span open and
close tags to a single XLIFF x tag. Default to sentence
segmentation.
OdfImporter.java: Handle nested text:p elements in styles.xml
(in the same way as in content.xml).
OOoTextExporter.java: Handle original file extensions that are in
mixed or all uppercase.
PPTExporter.java: Handle .ppt extension in mixed or all uppercase.
RTFExporter.java: Handle .rtf extension in mixed or all uppercase.
TuPreener.java: Handle XLIFF mrk elements of
mtype='x-mergeboundary'
TuStrings.java: Make "Not yet translated" message (in exported
documents) more helpful. Fix unnecessary (and bug-infested)
conversions between String and UUID.
WordExporter: Handle .doc extension in mixed or all uppercase.
XMLExporter: Make sentence the default segment boundary. Fix a few
XLIFF generation bugs.
2007/01/09
Converter.java: Add support for styles.xml skeleton (if
applicable); increase BLKSIZE to 8192. Added skipList
parameter to the convert method (a list of potential
structures to omit)--used (for now) in the generic XML
converters.
OdfStateObject.java: New class to maintain states between
conversion/import of content.xml and styles.xml files in OOo
ODF documents.
XMLImporter.java, XMLExporter.java, XMLSkeletonMerger.java: Three
classes for the import/export of generic XML to/from XLIFF.
ConverterFactory.java: Add support for generic XML
conversions. Add getDataTypeFromXliff() method.
OdfExporter.java, OdfImporter.java, OdfSkeletonMerger.java,
OdfHandler.java: Add support for styles.xml file (in addition
to content.xml), which frequently has translatable text for
headers and footers (etc.)
HtmlHandler.java: No longer preserve newlines in translation
units--unless within a <pre> tag. Fix inconsistencies in the
handling of the <br> tag. Fix bug (introduced when sentence
segmentation was added) in the writing of attribute values to
the temporary skeleton file.
MifImporter.java, MifExporter.java, MifSkeletonMerger.java,
MifTuPreener.java: Add support for sentence
segmentation. Otimized regex Matchers.
WordImporter.java, WordExporter.java, RTFImporter.java,
RTFExporter.java, PPTImporter.java, PPTExporter.java,
ExcelImporter.java, ExcelExporter.java: Added workaround to
latest JOOConverter's inability to convert files with names
containing non-ASCII characters.
TuPreener.java: Modifications to support the generic XML
converters.
ConversionStatus.java: Added WARNING_INVALID_XML_EXPORTED status.
OOoTextExporter.java: Optimize reading of input streams. Support
styles.xml.
HtmlSkeletonMerger.java: Increased BLKSIZE to 8192 (for reading
streams); optimized regex Matchers.
FileType.java: Added XML file type.
HtmlImporter.java: Added form, option and input tags to the list
of tags that can break translation units.
Format.java: Read the format file a block at a time rather than a
character at a time.
XliffImporter.java: Upgraded convert signature to include skipList
parameter. (This importer still isn't fully supported ... and
may never be.)
Added new fx2utils package to be used by the generic XML importer.
2006/12/07
Converter.java, ExcelExporter.java, ExcelImporter.java,
HtmlExporter.java, HtmlImporter.java, HtmlHandler.java,
OdfExporter.java, OdfImporter.java, OdfHandler.java,
OOoTextExporter.java, OOoTextImporter.java, PPTExporter.java,
PPTImporter.java, RTFExporter.java, RTFImporter.java,
WordExporter.java, WordImporter.java: Added boundary and
generatedFileName parameters to the convert method. The
boundary is an indication of where segmentation is to occur on
import (i.e., at the sentence or paragraph level). The
StringWriter generatedFileName is where converters can return
the actual generated filename (if generatedFileName is
non-null). Removed the oldest deprecated convert() method
variant.
MifExporter.java, MifImporter.java: Sentence segmentation not yet
implemented. However, fixed a bug that prevented MIF
conversion on Tomcat (viz., Tomcat's non-support of the
Charset Service Provider Interface.)
OdfSkeletonMerger.java: Support sentence segmentation; fix a bug
or two in existing skeleton generation.
SegmentBoundary.java: New enum for paragraph and sentence
segmentation boundaries.
TuPreener.java, TuStrings.java, HtmlSkeletonMerger: Modifications
to support sentence segmentation.
XliffImporter.java: Changed method signatures to match the
Converter interface. (No other changes implemented.)
2006/11/20
OOoTextImporter.java: Before attempting to extract content.xml from
an odt file with a Japanese characters in the file name, rename
(temporarily) the file to a straight ASCII name. Then extract
content.xml. Restore the file name afterward. (This is a workaround
for a bug that occurs in the Windows JDK's ZipFile class.)
2006/11/01
TuPreener.java, TuStrings.java, HtmlExporter.java: Implement
TuPreener.checkAndRepairTuTags method, a "relaxed" version of
the TuPreener.validateAndRepairTu method. The new method doesn't
require that all bx tags be matched by an ex tag. (This is
consistent with the way many HTML pages are constructed.)
2006/10/30
HtmlHandler.java: Treat the text of buttons (etc.) as translatable
text.
2006/10/24
OdfHandler.java: Add collapseAnnotations and collapseFootnotes
method to collapse the sequences of bx/ex/x tags required
to define an office:annotation or text:note sequences into
a single x element. This will make XLIFF editors less
susceptible to unbalanced XLIFF tag errors.
OdfExporter.java: Make changes to recursively expand the bx/ex/x
tags (in particular, expand the single x tag above into
a sequence of bx/ex/x tags, then to expand the x tag in the
tag sequence by substituting the trans-unit of the note or
annotation for the x tag.
2006/10/20
TuPreener.java: Add the Unicode byte-order mark (U+FEFF) to the
character class of whitespace characters.
2006/10/17
HtmlImporter.java: Make the quotation marks around Content-Type
optional in the guessEncoding method.
2006/10/12
OdfExporter.java: Use String's replace method instead of
replaceFirst when expanding XLIFF tags to the text from the
format file (in case the original text includes the
literal string "$0", which will cause an infinite loop).
HtmlExporter.java: Use String's replace method instead of
replaceFirst when expanding XLIFF tags to the text from the
format file. (See OdfExporter above. Bug never encountered in
HTML, but could conceivably occur.)
2006/10/10
HtmlHandler.java: Fix the regular expression that moves empty
bookmarks outside the core of the translation unit. (Fixes a
bug that produced invalid XML.)
HtmlImporter.java: Have the guessEncoding method read the first
1024 lines, rather than the first 25 lines of the HTML
file. Make the guessEncoding method public and static.
MifParser.java: Comment out import of
com.thoughtworks.xstream.converters.collections.CharArrayConverter
(not used).
OOoTextExporter.java: Modifications to handle RTF, ODT, ODS, ODP
and Word files correctly, especially when the files were
originally imported without filename extensions.
2006/10/07
OdfHandler.java: Set hasTranslatableText later in the code--after
closing ex tag has been appended.
OdfSkeletonMerger.java: Fix initial passes through the original
ODF file (during conversion to a skeleton file), so that it
will work properly for text:p tags with nesting level greater
than 2.
2006/10/05
HtmlImporter.java: Use the detected encoding when calling the
skeleton merger method (the same as when we parsed the
input file.)
OdfExporter.java: Fix signature (almost) in exported ODT,
ODS and ODP files. (Byte at offset 4--counting from
zero--should be 0x14. Maybe later ...)
OdfHandler.java, Format.java, OdfExporter, OdfSkeletonMerger:
Rewritten to handle text:p elements at any
depth (e.g., within tables, draw:* elements, etc.)
ExcelExporter.java, OOoTextExporter, PPTExporter. Fix bugs
that prevent export from happening.
OOoTextImporter.java, OOoTextExporter.java, FileType.java,
ConverterFactory: Officially support ODS and ODP
documents.
ConverterFactory.java: Implemented format detection (based on
magic file signatures).
All java files: Replaced GPL statement in prolog with LGPL
statement.
Notifier.java: New interface that lets converters send/
make notifications in case of error.
Converter.java, ExcelExporter.java, ExcelImporter.java,
HtmlExporter.java, HtmlImporter.java, MifExporter.java,
MifImporter.java, OOoTextExporter.java, OOoTextImporter.java,
OdfExporter.java, OdfHandler.java, OdfImporter.java,
PPTExporter.java, PPTImporter.java, RTFExporter.java,
RTFImporter.java, WordExporter.java, WordImporter.java,
XliffImporter.java: Modifications to implement the
Notifier interface,
2006/09/01
MifImporter.java, MifExporter.java, MifParser.java,
MifTuPreener.java, MifFrameRomanCharset.java,
MifCharsetProvider.java, MifSkeletonMerger.java,
META-INF/services/java.nio.charset.spi.CharsetProvider: New files
to support FrameMaker's Maker Interchange Format (MIF).
PPTImporter.java, PPTImporter.java: New files to support
PowerPoint format files.
Converter.java and its implementers (Importers and Exporters):
Added phase-name support; deprecated methods that were
replaced.
Most other files were changed to reflect MIF support, to add
full name and e-mail address in @author JavaDoc tags, and
to fix numerous bugs.
2006/06/05
ExcelExporter.java, ExcelImporter.java: New converters to convert
Excel spreadsheets to and from XLIFF, using OpenOffice.org.
RTFExporter.java, RTFImporter.java: New converters to convert Rich
Text Format (RTF) documents to and from XLIFF, using
OpenOffice.org.
FileType.java, ConverterFactory.java: Modified to handle Excel and
RTF.
OdfHandler.java: Create separate ctypes for text:s and text:tab
tags (of OpenDocument format).
XliffImporter.java: New importer for generic XLIFF to an XLIFF
subset. (All format tags map to bx, ex and x; sub tags map
recursively to new trans-units; multiple file elements in
input XLIFF become a single file element; non-text elements
are omitted.
XliffSkeletonMerger.java: New partial implementation of a merger
of the tskeleton into a final skeleton. (To do: account for
CTYPE and comment areas in the original document; implement
Xliff Exporter.)
2006/06/05
WordImporter.java: Close the OpenOffice socket connection only if
it isn't null ... and remove it from the finally statement,
since that *always* results in a null pointer exception.
WordExporter.java: See WordImporter.java (above)
TuPreener.java: Recognize more kinds of bullets.
OpenOfficeConnectException.java: New subclass of
ConversionException, thrown by WordImporter.java and
WordExporter.java if unable to connect to a listening
OpenOffice.org (soffice) process.
HtmlSkeletonMerger.java: Fixed bug in locating trans-units within
attribute values. Also fixed bug in stack of pointers to
locations within the skeleton buffer. (The stack cries for a
new implementation.)
2006/05/31:
OdfImporter.java: Really ignore the native encoding specified as a
parameter to the convert method (as the JavaDoc claimes that
OdfImporter does). This ensures that XLIFF ends up encoded in
UTF-8 (which is nice for things like Arabic documents).
build.xml: Added rudimentary ant build file.
file2xliff4j_intro.html: An overview of how to use file2xliff4j.
2006/05/30:
HtmlExporter.java: Handle trans-units that appear within title
and alt attributes of img and a (anchor) tags. (Those tags
appear in the format file.)
HtmlHandler.java: Move "empty" bookmarks (i.e., anchor tags with
name attribute but no href attribute, and no characters
between the beginning and end "a" tags) "outside" the
trans-unit--i.e., to the "left" of the beginning <lt:core> tag
that introduces the "core" area of the trans-unit.
Recognize img, br, param, applet, embed and object tags. (The
br tag now maps to an <x ctype='lb' ... /> XLIFF tag.)
Expand ctype attributes in bx and x tags. (This release
introduces x tags in HTML.) Now all bx and x tags have a ctype
attribute. If no "standard" ctype value maps to an HTML tag,
use a ctype value of "x-html-<tag_name>" (where <tag-name> is
a lower-case HTML tag). Examples: x-html-big, x-html-cite.
Replace the non-compliant superscript and subscript ctype
values (values of the ctype attribute of x and bx tags) with
compliant x-html-sup and x-html-sub.
HtmlSkeletonMerger.java: Handle applet, br, embed, img, object and
param HTML tags in the creation of the skeleton file. In the
process, fix a bug in text of the alt attribute of img tags
(which were previously skipped under certain conditions).
OdfHandler.java: Expand the ctype attributes of bx and x tags to
match the expansion done in HtmlHandler.java. (Also replace
superscript and subscript with x-odf-superscript and
x-odf-subscript.)
Fix bug in Word Documents that include annotations (which
become office:annotation tags in the ODF XML generated by
OpenOffice.org).
OcfSkeletonMerger.java: Fix for the office:annotation bug
mentioned above.