As I'm trying to copy an access 97 db to a new 2007 format, I'm iterating columns and The new db and table seem fine (ex: TEXT length of 25 is ported just fine).
However, when writing the data (string) to the new table new columns (of length= 25), the max writable string length is 12, and when larger I get:
com.healthmarketscience.jackcess.InvalidValueException: Text is wrong length for TEXT column, max 12, min 0, got 13 (Db=std jet db.pnc_8970746374852730438.accdb;Table=foo_table;Column=name).
Sample code to reproduce:
static void testSimple(File tmp) throws IOException {
try(Database db2 = DatabaseBuilder.create(FileFormat.V2007, tmp)) {
List<ColumnBuilder> colBuilders = Arrays.asList(new ColumnBuilder("name", DataType.TEXT).setLength(25));
TableBuilder tbb = new TableBuilder("foo_table");
tbb.addColumns(colBuilders);
Table tb = tbb.toTable(db2);
String src = "12345678901234567890123456789012345678901234567890";
for(int i=0; i<=25; i++) {
Map<String,Object> r = new HashMap<>();
r.put("name", src.substring(0,i));
try {
tb.addRowFromMap(r);
} catch(Exception e) {
System.out.println("failed at length = "+i+"; "+e);
}
}
}
}
Looking at source code a bit reveals something about units. There is some validation comparing string.length with the columns units length, which is an integer division of its lenght / 2 (after digging into jet format stuff and I got lost and couldn't tell you where it came from exactly...). Basically, 25/2 = 12.
Note that I was able to call columnbuilder.setLength(...) with 255*2 without error or warning, and it wasn't saved in the new table schema, it was the previous value from the columnbuildewr.setFromColumn().
if you are having a problem, it's best to ask a question in the help forums as opposed to going straight to filing a bug. that also increases the chances that other people in the community can see the question and chime in with suggestions.
Last edit: James Ahlborn 2024-05-05
correct, the column length is in bytes (noted here), but, depending on the encoding in use, a character can end up being more than one byte. that's why there is a
setLengthInUnits()method which can be used to set the column length for a column in units (characters), which will be appropriately translated to bytes for the resulting database column, noted here.Last edit: James Ahlborn 2024-05-05
Thanks for answering. Sorry for resorting to bug first, but it seemed reasonable given the following.
Unfortunately, the example I gave is just one way to reproduce.
Another way is that I load a db/table/column (from an access 97 db if it matters), and I create a ColumnBuilder from it. So, I'm not actually involved in choosing the length or lengthInUnits; one would expect the columnbuilder to pickup all necessary information to duplicate the column spec. I have not seen any charset input anywhere either (the size per unit seems hard coded to 2, likely for the typical MS utf-16).
The old column (showing _columnLength = 25, _lengthInUnits=-1 and _type._unitSize = 2) managed to store a 14 chars string correctly in the retrieved Row (map), but it can't be inserted to the new table. By the -1 up there, I suspect the column spec was initialized incorrectly, or if it is an expected state by the implementation upon loading the table spec, it may not be properly handled upon writes.
Hope this helps narrow the problem. Thanks.
ah, think i see the problem. (note, there's no reason to add bold emphasis to your comments, i can read them quite well). since the source is access 97 single type encoding, the column length is not getting correctly translated over to the appropriate 2 byte column length for a modern access variant.
Great.
And take no offense. People frequently use bold/italics (when supported) to grab the attention for fast readers on the essence of the post. Efficient communications for all, not just destined to you personally. It's not the same as all caps for sure.
Thanks.
i think the bold does have a similar effect as all caps, so i'd recommend choosing carefully when to use that in the future.
Oh, and the old column (TestColumnImpl) also shows _codePage = 1252 (well known windows single-byte charset) but there is no way to port that in a new column.
yes, modern access dbs don't need a code page because they use utf-16 instead of a single byte character encoding.
Last edit: James Ahlborn 2024-05-06
checked in some changes to trunk that should fix the problem. will be in the 4.0.6 release.