dbUnit / Bugs / #91 problem because tableName is case insensitive

Sycorob - 2005-06-22

Logged In: YES
user_id=1301399

Duplicated the same problem - Oracle keeps files in the
recycle bin after they've been dropped. I had a table named:
gABzUHjPFjgNAADuo6wMQ==$0

There was another table in the recycle bin with the exact
same name, except with a different case on one of the
letters. I removed the .toUpperCase() calls, and it removed
the error.

I also don't see why this check is neccesary. In bilaco's
case, this could actually cause problems if you had two
identically named, but differently cased, tables. How would
you differentiate them, if everything is forced to upper-case?

Thanks.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Felipe Leme - 2006-05-28

Logged In: YES
user_id=129356

Hi all,

I think it's dangerous to change such behaviour, as it might break backwards
compatibility. Besides, there must be a reason the class use it - maybe it's an
issue with some databases.

On the other hand, we could add something like a isCaseSensitive() method in
the IDataSet interface (and the respective setCaseSensitive() method in the
implementations).

And if we add such methods, we have 2 decisions to make:

- keep the current behaviour (to keep compabitibility) or the new one
- whether this flag should be explicitely set by the end user or implicitly by
the database factories.

If we change the behaviour and the current tests pass, I'm favorable to set it
as default and leave the change to the end user - eventually, we will identify
databases where that is a problem (in the form of bug requests and/or
complaints in the lists :-) and then we could delegate the decision to the
factory.

Anyway, if someone can come with the proper patch (including the changes,
documentation and test cases), I would be glad to review and apply it.

-- Felipe

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-10-20

Hi there,

I introduced some changes related to this on the current trunk rev. 841 and tried to maintain backwards compatibility (i.e. default is case-insensitive). I also added a constructor to the DatabaseDataSet taking a boolean "caseSensitiveTableNames" which should solve this issue.

I deliberately did not add the "isCaseSensitive()" method to the IDataSet interface since I believe it should not be there. I prefer the pre-1.5 way where an IDataSet can be decorated with a CaseInsensitiveDataSet. In my eyes this is more elegant.

Another important item to me is <b>not</b> to add a method "setCaseSensitive()" since in my eyes an IDataSet instance should be immutable after its creation (it also was before I think). Instead I added a constructor to the AbstractDataSet taking a boolean arg "caseSensitiveTableNames".

Moreover I heavily thought about how to easily set the case sensitivity via a factory but nothing sensible came to my mind. The DatabaseConfiguration object is only available to DatabaseDataSet and QueryDataSet. All other datasets would not be able to access it. So this was not an option.

The advantage of the current implementation is that the only location where the case sensitivity is handled is in class OrderedTableNameMap. If you have a more detailed look at this you will find several locations where it is used - unfortunately at two of those locations I had to invoke the default constructor (which means that caseSensitiveTableNames is set to false). These are SequenceTableFilter and FlatXmlProducer. If anyone has an idea on how to handle/fix this I would be very grateful.

So please drop your comments on this quite big change so that I know how to proceed with this.

Thanks in advance and regards,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-08

I've just upgraded from 2.3.0 to 2.4.1. This fixed a problem having to do with running DbUnit inside Ant inside CruiseControl (thanks!), but created a new one having to do with the case of table names. The code that tries to do a clean insert now receives a NoSuchTableException from DataBaseDataSet.getTableMetaData(String tablename).

The XML dataset that I'm trying to insert was originally extracted from a MySql DB on Linux, where MySql is case sensitive. It contains table names with mixed case (poor form I know but it isn't my DB). The exception is thrown when I try to do the clean insert on a MySql DB on Windows, where MySql is not case sensative. This worked on 2.3.0 --- that is, a dataset extracted from a case-sensitive DB could be inserted into a case-insensitive one.

One way of fixing this might be to have DataBaseDataSet.getTableMetaData(String tablename) not use _caseSensitiveTableNames to determine whether to convert the case of the tablenames but to use the "case" properties of the metadata object of the DB being inserted into to --- e.g., if DatabaseMetaDate.supportsMixedCaseIdentifiers() is true just use the table name as stored in the extracted dataset.

/Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-08

Hi Paul,

thanks a lot for your suggestion! I like the idea of calling "DatabaseMetaData.supportsMixedCaseIdentifiers()" for identifying the behaviour of the DB. On the other hand I want to avoid putting a quick hack into the DatabaseDataSet to fix this problem. I can imagine change the "CASE_SENSITIVE_TABLE_NAMES" feature to a property which might allow the following values: "true", "false", "native" (which would be the "supportsMixedCaseIdentifiers"). Do you have any ideas on how to consistently implement this?

Regards,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-08

assigned_to: rlogiacco --> gommma
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-09

Hi Mat,

Thanks for all your work on DbUnit.

I've only had a small look at how DbUnit works internally but it seems to me a good way to go would be to remove the concept of case-sensitivity from DataSets entirely. When a DataSet is extracted from a DB it would get whatever case the DB presents itself as all upper case, all lower case, or mixed. When one goes to insert a DataSet into a DB some logic would use the properties of the DatabaseMetaData to figure out how best find the corresponding tables in the DB. E.g., if DatabaseMetaData.storesLowerCaseIdentifiers() is true it would convert the keys from the tableNameMap to lower case before comparing them with the metadata. If DatabaseMetaData.storesUpperCaseIdentifiers() it would first convert the keys to upper case before comparing them with the metadata.

I'm not sure whether there are cases where one needs a DataSet not to be case-impartial. If there are, using a decorator to wrap the DataSet would seem to be a good solution.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-09

You're welcome - I actually enjoy working on dbunit :-)

I agree that removing the case sensitivity from the DataSet would be a nice thing. There is one case I am not sure about how it will work in the future. Imagine you have a FlatXmlDataSet having one table name with different cases (for examle "CUSTOMER" and "Customer"). If the database (e.g. oracle) has both of the tables as well how should dbunit decide which one should be populated during a DatabaseOperation.INSERT? During the creation of the FlatXmlDataSet dbunit must somehow - without having database access at this time - figure out whether the both "CUSTOMER" and "Customer" represent the same ITable (adding all rows to the same dbunit object) or whether they are two different tables. I do not yet see a neat solution for this.

BTW, I appreciate your input very much! Thanks a lot for your participation.

Best regards,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-10

Hi Mat -

It's very possible that I'm missing something becuase I'm not nearly as familiar with DbUnit as you are, but the the INSERT case doesn't seem to present a problem to me. If the FlatXmlDataSet where to be case-impartial then it would create a separate table entry for each distinct table name for which data is added to it. So if somehow data were added to it for "sometable", "SomeTable" and "SoMeTaBle" it would store these as three separate table entries. When the FlatXmlDataSet is INSERTed into a DB it would use the case properties of the DB it's writing to to determine how to map its table names to DB table names. If the DB uses case sensitive table names then it would try to map them as three different tables, if the DB stores all tables names as uppercase it would try to map them all as "SOMETABLE", . . .

In the case where the DB is case sensitive it works correctly. In the case where the DB is not case sensitive it results in three sets of INSERTS to the same table instead of one, but the final result will still be correct too.

The CLEAN_INSERT case would require a bit of extra logic however. Doing three separate CLEAN_INSERTS will produce a different result to doing a single CLEAN_INSERT with all the same data.

One way to make this work would be to have the DeleteAllOperation used by CLEAN_INSERT store the names (in their DB format) of all the tables it has deleted and throw an exception if it is ever called to delete the same table more than once. This would not only catch this case but would also catch cases where someone accidentally adds more than one set of data the same table to a FlatXmlDataSet file.

/Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-10

Hi Paul,

this is a quite challenging solution. I'd like to draw the following use case:

FlatXmlDataSet (caseSensitive according to your proposal):
1. <SOMETABLE ...>
2. <SomeTable ...>

Now imagine to have the following database tables on an Oracle Database (where DbMetaData.supportsMixedCaseQuotedIdentifiers()==true but DbMetaData.supportsMixedCaseIdentifiers()==false):

A) SOMETABLE, "SomeTable"
This case is clear to me. XmlTable "1" is inserted into SOMETABLE, XmlTable "2" is inserted into "SomeTable".

B) SOMETABLE
In this case, following your suggestion, both XmlTables are stored to SOMETABLE automatically. I find this is a bit confusing but I could live with this.

C) "SomeTable"
Here it is not clear to me what should happen. I can also not see a clean solution to this. Do you have an idea for this?

Another interesting point would be how to handle the DatabaseConfig#PROPERTY_ESCAPE_PATTERN property which is used to escape table names for cases like those mentioned above.

BTW, you could solve your problem by setting the case-sensitivity depending on the current platform you are testing using the ant environment property:

<condition property="not.windows">
<and>
<not><os family="windows" /></not>
</and>
</condition>
...
<dbunit driver="${dbunit.profile.driverClass}"
url="${dbunit.profile.connectionUrl}"
userid="${dbunit.profile.user}"
password="${dbunit.profile.password}">
<dbconfig>
<feature name="caseSensitiveTableNames" value="${not.windows}" />
</dbconfig>
...

mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-10

Hi Mat -

In case C), wouldn't it in fact be an error if there was a <SOMETABLE> element in the FlatXmlDataSet? If so then it would be appropriate for DbUnit to just through an exception.

Thanks for the suggested work-around.

/Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-11

Hi Paul,

for case C) an exception would be appropriate. But then for consistency reasons I'd say that case B) should also throw an exception (because table "SomeTable" does not exist). Do you agree? I think that your use case (MYSQL on Windows and Unix having different default-case settings) is quite a bit special, but probably the removal of the "_tableMap.getTableName(tableName)" in method "DatabaseDataSet.getTableMetaData" (where the new DatabaseTableMetaData is created) might solve your issue? This is one location I want to investigate in detail.

Regarding your CLEAN_INSERT proposal: it sounds a bit like a hack to me because we would introduce a dependency from the DeleteAll to the Insert operation. Or did I miss something here?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-11

Hi Mat -

I can see the argument for throughing an exception for B) too, but to me B) seems ok. The same issue comes up with SQL as well --- if you run SQL against a case-sensitive DB then it treats it case-sensively, if you run it against a case-insensitive DB it just silently converts it into whatever case it uses. I think DbUnit should be no more restrictive then SQL.

There are a few ways that the CLEAN_INSERT probelm could be handled. One way would be to do it in two passes: on the first pass do all the cleans and then on the second pass do all the inserts. However, because CLEAN_INSERT is implmented as a CompositeOperation that combines the DeleteAllOperation and the InsertOperation, this would require a lot ofextra logic.

The alternative that I was suggesting was to have the DeleteAllOperation just keep a list of all the tables that it has already been run against (using the names from the tables from their metadata). Then if the same name comes up again just throw an exception.

/Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-12

Hi Paul,

there is one more scenario we probably should think about. It is when first importing and then exporting a dataset to/from a database, should dbunit not guarantee both datasets to be equal? So another example, following your suggestion (if I got everything right):

FlatXmlDataSet (caseSensitive according to your proposal):
1. <SOMETABLE ...>
2. <SomeTable ...>

The oracle database (having DbMetaData.supportsMixedCaseQuotedIdentifiers()==true) has the following tables:
A) SOMETABLE

So during the INSERT both XML tables SOMETABLE and "SomeTable" are written into the one DB table SOMETABLE.
If we now export this table from the DB we will get a FlatXmlDataSet which has the following entries:

1. <SOMETABLE ...>
2. <SOMETABLE ...>

So this 2nd FlatXmlDataSet would later fail (or even worse fill the wrong DB table) when being INSERTed into MYSQL if there both tables SOMETABLE and SomeTable exist. So I think with your approach this information is lost and you do not have any possibility to avoid this. With the current implementation you can do so by explicitly specifying "caseSensitive=true".

Is these thoughts sensible or am I missing something?

Best regards,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-14

Hi Mat -

I believe the case that you give shouldn't come up, and if it does something has gone wrong in producing the FlatXmlDataSet and so DbUnit should throw an exception. It would only ever make sense for DbUnit to convert the case of a dataset table name when it is exporting into a DB that is case insensitive, in a case like your example where the DB allows mixed case names it should require exact case matches.

Lets pretend, just to simplify things for a moment, that the distinction between quoted and unquoted identifiers doesn't exist. Then there are three possible types of DB:

1) DBs that are case sensitive --- when you query their metadata for table names the table names may have mixed case.

2) DBs that are case insensitive and store identifiers in lower case --- when you query their metadata you get table names in lower case.

3) DBs that are case insensitive and store identifiers in upper case --- when you query their metadata you get table names in upper case.

What I have been arguing for, although there may be complicating factors that I have overlooked, is really simple:

A) When DbUnit extracts data from a DB to write into a dataset, it should use the case of the identifiers as the DB defines them in its metadata. For DBs of type 1 these may contain any combination of upper and lower case. For DBs of type 2 these will be all lower case and for type 3 these will be all upper case.

B) Don't try to associate any sort of "isCaseSensitive" state with each dataset.

C) At the time that you go to write a dataset to a DB, use the DB's metadata to figure out how to map the dataset's table names to the DB's table names. If the DB is case insensative, determine whether it is type 2 or 3 and change the case of the dataset's table names to the appropriate case before comparing them with the names given in the DB's metadata. If the DB is case sensative, don't do any conversion of the case of the dataset's table names, but just compare them as they are.

D) Valid datasets never need to have more than one table entry for each table name. To support case-sensitive DBs DbUnit needs to allow dataset table entries to differ only in case. However, it should throw an exception when such a dataset is exported to a case insensitive DB.

This would follow the principle of being strict in what you produce but being lenient in what you accept. In cases where none of the table names in a dataset differ only in case, these datasets could be exported without problem to any of the three types of DBs. In cases where the original DB was case sensitive and two or more table names differ only in case, it would allow a such dataset to be exported to any other case sensative DB, which is the best that you could hope for.

The fact that some databases allow mixed case for quoted but not for unquoted table names makes things slightly more complicated. I don't believe that there will be any DBs which support mixed case for unquoted but not for quoted table names. That is, if a DB supports mixed case for unquoted names it will also support mixed case for quoted names. And if it doesn't support mixed case for unquoted names it may allow names to be quoted to allow mixed case. So I think the following logic should work.

When DbUnit needs to find the table in a DB that matches a table name in a dataset:

- First, try to find the table in the DB whose name matches the table name in the dataset exactly as it appears in the dataset without mdifying its case. If a match is found return it.

- If no match was found and DatabaseMetaData.storesUpperCaseQuotedIdentifiers() is true, convert the dataset table name to uppercase and look for a match for this.

- If no match was found and DatabaseMetaData.storesLowerCaseQuotedIdentifiers() is true, convert the dataset table name to lowercase and look for a match for this.

Hope this makes sense,

/Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-15

Hi Paul,

thanks for your exhaustive explanation - they absolutely do make sense.

Nevertheless I see two more things to look at (maybe you missed them or I still didn't get it):
1) FlatXmlDataSet (and others that are not base on a DB, e.g. csv, xls) internally keep ITable objects, each of them representing a single table. So if you have an XML like the following

<SOMETABLE ...>
<SomeTable ...>

how should dbunit handle this? (note that at this stage the XML is only parsed into memory, no database access is available for this) Should it create one ITable or two of them? This question is crucial for a comparison/assertion of the XML (as "expected results") with the actual database state, since the IDataSet will either check for the existence of a single table or two tables.

2) One more question I do have: what if you did your first export on the Windows machine (if I got it right all tables would be in uppercase here) and then try to import this uppercase FlatXmlDataSet on your linux box? Would that work and if yes, how?

Best regards and thanks for your patience,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-16

Hi Mat -

If, as in your example, a dataset contains two tables whose names differ only in case, then that dataset can only match another dataset with the same two table names, so the comparison must necessarily be case-sensitive. If neither dataset contains pairs of tables that differ only in case, the I think it's fair and sensible to make the comparison between them case-insensitive.

So if we were to follow the principle of not specifying datasets as either case-sensitive or case-insensitive then there are at least two ways of handling the problem of comparing datasets for equivilence:

1) Have comparison operations be specified as case-sensitive or -insensitive when the comparison is coded.

2) Have DbUnit automatically check whether either dataset contains two tables whose names differ only in case, if so do a case-sensitive comparison and otherwise do a case-insensitve comparison.

Regarding the second question, MySql on Windows converts all table names to lower case, so if your table names on a Linux box have some upper case characters and you try to insert into it a dataset extracted from the same schema on a Windows box, it doesn't work. Exporting data from a Linux box to a Windows box works without problems. This isn't unique to DbUnit datasets, it's an SQL issue. You can run SQL extracted from a case-sensitive DB in a case-insensitive DB, but not, generally, vice versa.

/Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-16

Hi Paul,

If I had to choose one I'd prefer solution 1) because the user can control how comparison is performed. In older dbunit releases where DataSets have been case sensitive by default you could use the CaseInsensitiveDataSet wrapper to achieve exactly this (whereas this kind of "merge" behaviour for same tables with different case was not available here - it just throws an AmbiguousTableNameException in this case).

So if I got that right you implicitly state that all IDataSets should be case sensitive? Hopefully one of the other dbunit developers (Felipe & Sebastien are longer on the project than myself) does remember the reason for switching from case-sensitive to case-insensitive as default and can drop a comment here.

If I have some more time I will browse the sources again for the locations where I had to introduce the case sensitive flag on IDataSets to recap whether we did not oversee anything.

Thanks so far,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

paulr - 2008-12-16

Hi Mat,

I think we are talking about the same thing but just to clarify, what I've been suggesting is not to make datsets implicltiy case-senstive or case-insensitive but to explicitly make them case-neutral. What I mean by this is:

- There is no "_isCaseSensitive" like property on a dataset that specifies how it will behave or should be treated

- The table names in a dataset can have any combination of upper and lower case letters (and case may turn out to be significant or insignificant, just as in SQL)

- Whether the case of its tables names is significant or not is determined when an operation a dataset.

- For insertion operations, if the DB is case-insensitive then first convert the table names to the appropriate case before looking up tables in the DB's metadata. If the DB is case-sensitive then use the table names as they are.

- For comparison operations, first check whether either dataset has two or more tables whose names differ only in case, if not then do comparison in a case-insensitive way, if so then the comparison must be done case-sensitively.

When a schema contains two or more tables whose names differ only in case, the developer should be aware of it. So making the developer have to explicitly choose a case-sensitive comparison operation is probably a good idea.

I was able to get around my problem with MySql by reverting back to 2.3.0. However, I haven't had time to look into why it works.

Best regards,

Paul

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-18

Hi Paul,

could you try the 2.4.2 + the suggested workaround. This should then work for you again.

I am still thinking about all your proposals...I think the next time we touch the table names thing we should really provide a proper and consistent solution so that problems with case sensitivity would be past.

BTW, do we also need to consider the case of schema names? If someone uses an XML with a fully qualified table name (e.g. "<MY_SCHEMA.CUSTOMER name='john' />") might the "MY_SCHEMA" case also be relevant?

Regards,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

matthias g - 2008-12-28

Hi Paul,

I am still thinking about the question how dbunit IDataSets should store table names while completely ignoring the _caseSensitive property. In 2.3 there was an "implicit" toUpperCase() in all IDataSets (the tables were stored in a map whereas the table names were used as map-key being always "toUpperCase'd"). With the 2.4 I tried to make this explicit by introducing the class OrderedTableNameMap. If I got that right to be case neutral we must remove this general "toUpperCase" from the IDataSets and always take table names as they are. I would also like this but it would definitely be incompatible to the current dbunit release. I do not yet have a solution on how to keep backwards compatibility...

Regards,
mat

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

problem because tableName is case insensitive

Automated tests with databases: prep, verify, restore tables

Fixed Release

Found Release

Searches

Help

#91 problem because tableName is case insensitive

Discussion