formalParameterList[i].formalType contains the value nvarchar allthough in the database it is set correctly to ntext. The problem then is that a limit of 4000 characters is implemented....
Is this a bug?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, it is a bug (IMHO) and I have run into this issue myself. I opened issue 861154 for this (which I have not spent the time to create a proper patch for yet) and patch 861821. If this is a critical issue for you and you are only working in a SQL Server 2000 environment let me know and I can give a copy of my driver which works around the issue.
-Brian
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The problem is as follows: jTDS uses temporary stored procedures for PreparedStatements. Other drivers, as far as I have seen, are using sp_executesql instead.
The problem with jTDS's approach is that it has to declare the parameters as having one type or another. In the case of String values this is done by checking the length the first time the procedure is executed and declaring the parameter as either VARCHAR or TEXT, based on this. The next time the PreparedStatement is called it will have to use the same type for each parameter. So even if your column in the database is TEXT, if you first used a String shorter than 8000 characters the type will be VARCHAR (or NVARCHAR) so the new String will be cut down to the maximum length of the type.
The sp_executesql approach is much better as the driver doesn't have to declare a stored procedure, so it doesn't have to use fixed types. It simply submits a query with placeholders and values for those placeholders can be of any type. The server then internally identifies if the query was used before and it's already compiled and reuses it. The problem with this approach is a tiny note on the reference page for sp_executesql: "If object names in the statement string are not fully qualified, the execution plan is not reused."
That is why in order to use sp_executesql we would need an SQL parser (which the drivers using it have) in order to rewrite the query. If anyone has anything like that or is interested in writing it, it would really help a lot, as we could also get rid of the procedure cache maintained by jTDS (which could be seen as a big memory leak in its current implementation).
By the way, I just got an idea about how the current implementation could be fixed as I was writing this message. In case we have declared a parameter as VARCHAR, for example, and we get a value which does not fit that type, we could simply drop the stored procedure and declare another. How about that? It seems to be an acceptable idea...
Alin.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Seems reasonable to me. The way I worked around it was by implementing setCharacterStream to always declare the parameter as NTEXT. While not the best solution, it certainly allowed us to work past the problem.
I agree that the sp_execsql (and the variant sp_prepexec) methods are an elegant solution. Just so you know, they appear to be slower than the current jTDS implementation. I have tested the jTDS driver against the Microsoft Free JDBC driver, both of BEAs drivers, and two commercial drivers (which prevent me from naming them based on their licensing) and jTDS is at least 2x faster than even even the fastest driver I tested. Perhaps setting up jTDS to work either way would be beneficial?
On a side note, looking at the code, it would be nice if there were an abstract Tds class that could be extended to Tds42, Tds50, Tds70, Tds80 etc. and perhaps delegates for SQL Server and Sybase. I think this might make it a little easier to make changes without causing adverse side-effects with the other Tds implementations. If this abstraction was made for all the appropriate underlying classes, it would make it much easier to add support for SQL Server 2000 and Yukon (which I may be tasked with adding support for). It may also improve performance slightly since methods can be overloaded for a Tds implementation instead of having checks all over the place to determine the version. Just some thoughts...
-Brian
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I've written a SQL parser and a stored procedure parser in the past... I could probably reuse much of the code I already have.
I would throw up a big flag with the nvarchar versus ntext being an automatic determination... that could have some side affects. I have experienced some cases where using nvarchar rather than ntext when the column in the database is defined as a ntext caused CRLF to be stored as only a CR or only a LF (don't remember which at this point). This probably wouldn't be a problem if you are pulling the text out and displaying it in java; but if you were using .NET against the same database you would see blocks in your text rather than 'nextlines'. It might also be a problem if you expect to do a binary compare of that data to something stored in a text file to see if there were changes. What I'm not sure of is whether this problem was caused by the driver I was using at the time or by SQL server. Just thought I'd bring this up because it might be worth checking into.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This is an interesting discussion ... it sounds like sp_executesql is the right way to go but parsing the sql ourselves sounds kind of error-prone. Plus, how do we know how to make the object names fully qualified? Is there some programmatic way to do this?
I like Alin's suggestion for dealing with nvarchar vs. ntext, as long as we can ensure Mark's concern about \r\n turned into \n can be addressed.
FYI, I found and fixed a bug recently where long integers in prepared statements were sometimes being incorrectly treated as regular (32 bit) integers ... another example of where having to make SQL type decisions on the client side bit us, but luckily in this case it was an easy fix.
--matt
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
It is nice to see this project spring back to life.
I have posted a patch containing my solution to the problem of cached procedure parameter errors. It passes all the test cases OK and probably fixes the problems at the root of bugs: 684567, 732644, 857211, 861154 etc.
It is a pretty generic solution unlike some of the other patches I have seen.
The sp_executesql approach is interesting and would probably solve problems such as that reported in 705749 which is caused by the decimal(28,10) parameter definition in the current procedure logic.
There are a few things that concern me about sp_executesql:
1. Which versions of SQL Server/Sybase support this function and will this harm Jtds backwards compatibility?
2. If the client is parsing the sql and the server is parsing the sql and all of the sql is being sent each time, the only saving is in optimizing the query on the server. The documentation does not give any guarantees about reusing an existing query plan even if the sql is the same.
3. It looks like a lot of work to get right. My vote would go to implementing additional functionality for example the getGeneratedKeys() method.
People seem to be using jtds in preference to the free Microsoft driver even with SQL 2000 because it is faster. I would recommend benchmarking the sp_executesql approach against the stored procedures before embarking on a major rewrite to ensure that this advantage is maintained.
Mike
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Wow! This is really impressive. I never imagined there were people interested in more than using jTDS. ;) Thanks a lot, guys.
Getting down to business, thank you for the ideas and oppinions. Before your postings I was sure that sp_executesql was the way to go, but it was never going to be implemented as it required a really big effort. Now, on one hand I see that it's not as impossible as I thought it was and on the other that it's not necessarily the best approach.
I haven't had the time yet to look at the patches that were submitted (there are now 5 of them) but here are a few ideas: first of all I just realized that sp_executesql just moves the responsability of "compiling" down to the driver; the driver has to parse the query and practically rewrite it.
There are two possible implementations here, none of which sounds particularly promising. First, there is the brute-force approach, which implies that we would have to parse and rewrite each and every submitted query. This is obviously very time-consuming and it defeats the goal of PreparedStatements (i.e. to avoid compilation/parsing of the query at each run). The second approach would be to cache queries, which is very similar to what happens now (caching temporary stored procedures). But this very much brings up to where we left from in the first place. The only advantage would be that we wouldn't need fixed types anymore. But the big problem would be the SQL parsing: what if it's not 100% correct?
Now, looking at the idea in my last posting (dumping and recreating the SP each time one of the parameters doesn't fit), there is a problem with this approach too. Suppose that for a reason or another the user runs the procedure with a large parameter and the procedure is resubmitted to match the new parameter. But what happens is that the parameter's new type is incompatible with the actual type that is needed there. This means that every single execution of the PreparedStatement from then on would return the same conversion error. I don't know if this is a real issue or not, that's (partially) the reason I'm posting it here: maybe one of you has a better view. I just hope it's understandable (i.e. that I'm not just talking out of my a$$ here).
About Brian's proposition to break up Tds into specific implementations, I don't think that's necessary. The reason I think that is that jTDS actually only implements two protocols: TDS 4.2 and TDS 7.0. Tds 8.0 is VERY similar to 7.0 (basically it's two more data types and a version number that the Microsoft driver hangs on to, just to make sure it doesn't work with SQL Server 7.0). Sybase compatibility is only achieved through TDS 4.2, which is common. However, if you want to refactor the current Tds implementation, you are welcome to do it. But then, splitting it in 2 would be more than enough.
Mark, your problem with newline characters seems very weird. I don't think SQL Server is doing that. Someone would have noticed it before and it would have been fixed. Maybe it was a driver issue (jTDS could also have it) but that is fixable.
Benchmarking temporary stored procedures against sp_executesql sounds interesting (and could prove good advertisement for jTDS ;) ) but I don't think that actually shows the complete picture. Even the related things I discussed above don't show the complete picture. But if any of you has some clear numbers showing a comparison between jTDS and other drivers (even if we can't publicize their names) that would be awsome. :)
Whoa, that was a long post. Sorry for taking up so much of your time, please go on with your lives. ;)
Thanks again for the ideas and hope to get even more of them,
Alin.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I guess one reason for seperation of the two protocols might be that it may make it easier for those that only work with one protocol or another to track down and make appropriate fixes. In places where the implmentation is the same, perhaps a base class with devrived variations is a simple way to prevent duplicate code? I'm not claiming to be for or against... just tossing out ideas.
On a more philosophical note... if you have a statement that you are executing over and over again with only the parameter values changing, why are you using client produced statements? This is a perfect situation for a stored procedure (permanent). I guess the cases I have encountered where SQL was being dynamically generated (based on user input and / or variables) and then reused over and over where very very rare and when they did occurr a small degredation in performance was not that large of an issue. I mean are we talking about 200 milliseconds per hundred executions? I guess my point is that if going to another method makes the driver more robust because it doesn't have to worry about the datatype issue and it makes it every so slightly slower, I think it is worth the performance drop; but it all depends on how frequently this functionality is really getting used.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The reason we do not use stored procedures is because our application needs to work with 4 different databases (SQL Server 2000, Oracle 8.1.7.4.0 & 9.2.0.4.0, DB2 8.1 and MySQL). Some databases (such as MySQL) do not support stored procedures, while others (DB2) have poor support (IMHO). I have thought of creating statement proxies that would create database specific stored procedures on the fly the same way that jTDS does. The names and signatures of the procedures could then be written to a table to be loaded upon first connection; when the list is in memory it would be shared statically and a few concurrency issues would need to be addressed for updating the table from different VM's (application servers) but should all be very doable.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
When using MSSQL2000 the following situation occurs
on a column set to use datatype ntext:
in method
Tds.executeProcedure(
String procedureName,
ParameterListItem[] formalParameterList,
ParameterListItem[] actualParameterList,
TdsStatement stmt,
SQLWarningChain wChain,
int timeout)
the
formalParameterList[i].formalType contains the value nvarchar allthough in the database it is set correctly to ntext. The problem then is that a limit of 4000 characters is implemented....
Is this a bug?
Hansique,
Yes, it is a bug (IMHO) and I have run into this issue myself. I opened issue 861154 for this (which I have not spent the time to create a proper patch for yet) and patch 861821. If this is a critical issue for you and you are only working in a SQL Server 2000 environment let me know and I can give a copy of my driver which works around the issue.
-Brian
Guys,
The problem is as follows: jTDS uses temporary stored procedures for PreparedStatements. Other drivers, as far as I have seen, are using sp_executesql instead.
The problem with jTDS's approach is that it has to declare the parameters as having one type or another. In the case of String values this is done by checking the length the first time the procedure is executed and declaring the parameter as either VARCHAR or TEXT, based on this. The next time the PreparedStatement is called it will have to use the same type for each parameter. So even if your column in the database is TEXT, if you first used a String shorter than 8000 characters the type will be VARCHAR (or NVARCHAR) so the new String will be cut down to the maximum length of the type.
The sp_executesql approach is much better as the driver doesn't have to declare a stored procedure, so it doesn't have to use fixed types. It simply submits a query with placeholders and values for those placeholders can be of any type. The server then internally identifies if the query was used before and it's already compiled and reuses it. The problem with this approach is a tiny note on the reference page for sp_executesql: "If object names in the statement string are not fully qualified, the execution plan is not reused."
That is why in order to use sp_executesql we would need an SQL parser (which the drivers using it have) in order to rewrite the query. If anyone has anything like that or is interested in writing it, it would really help a lot, as we could also get rid of the procedure cache maintained by jTDS (which could be seen as a big memory leak in its current implementation).
By the way, I just got an idea about how the current implementation could be fixed as I was writing this message. In case we have declared a parameter as VARCHAR, for example, and we get a value which does not fit that type, we could simply drop the stored procedure and declare another. How about that? It seems to be an acceptable idea...
Alin.
Seems reasonable to me. The way I worked around it was by implementing setCharacterStream to always declare the parameter as NTEXT. While not the best solution, it certainly allowed us to work past the problem.
I agree that the sp_execsql (and the variant sp_prepexec) methods are an elegant solution. Just so you know, they appear to be slower than the current jTDS implementation. I have tested the jTDS driver against the Microsoft Free JDBC driver, both of BEAs drivers, and two commercial drivers (which prevent me from naming them based on their licensing) and jTDS is at least 2x faster than even even the fastest driver I tested. Perhaps setting up jTDS to work either way would be beneficial?
On a side note, looking at the code, it would be nice if there were an abstract Tds class that could be extended to Tds42, Tds50, Tds70, Tds80 etc. and perhaps delegates for SQL Server and Sybase. I think this might make it a little easier to make changes without causing adverse side-effects with the other Tds implementations. If this abstraction was made for all the appropriate underlying classes, it would make it much easier to add support for SQL Server 2000 and Yukon (which I may be tasked with adding support for). It may also improve performance slightly since methods can be overloaded for a Tds implementation instead of having checks all over the place to determine the version. Just some thoughts...
-Brian
I've written a SQL parser and a stored procedure parser in the past... I could probably reuse much of the code I already have.
I would throw up a big flag with the nvarchar versus ntext being an automatic determination... that could have some side affects. I have experienced some cases where using nvarchar rather than ntext when the column in the database is defined as a ntext caused CRLF to be stored as only a CR or only a LF (don't remember which at this point). This probably wouldn't be a problem if you are pulling the text out and displaying it in java; but if you were using .NET against the same database you would see blocks in your text rather than 'nextlines'. It might also be a problem if you expect to do a binary compare of that data to something stored in a text file to see if there were changes. What I'm not sure of is whether this problem was caused by the driver I was using at the time or by SQL server. Just thought I'd bring this up because it might be worth checking into.
This is an interesting discussion ... it sounds like sp_executesql is the right way to go but parsing the sql ourselves sounds kind of error-prone. Plus, how do we know how to make the object names fully qualified? Is there some programmatic way to do this?
I like Alin's suggestion for dealing with nvarchar vs. ntext, as long as we can ensure Mark's concern about \r\n turned into \n can be addressed.
FYI, I found and fixed a bug recently where long integers in prepared statements were sometimes being incorrectly treated as regular (32 bit) integers ... another example of where having to make SQL type decisions on the client side bit us, but luckily in this case it was an easy fix.
--matt
It is nice to see this project spring back to life.
I have posted a patch containing my solution to the problem of cached procedure parameter errors. It passes all the test cases OK and probably fixes the problems at the root of bugs: 684567, 732644, 857211, 861154 etc.
It is a pretty generic solution unlike some of the other patches I have seen.
The sp_executesql approach is interesting and would probably solve problems such as that reported in 705749 which is caused by the decimal(28,10) parameter definition in the current procedure logic.
There are a few things that concern me about sp_executesql:
1. Which versions of SQL Server/Sybase support this function and will this harm Jtds backwards compatibility?
2. If the client is parsing the sql and the server is parsing the sql and all of the sql is being sent each time, the only saving is in optimizing the query on the server. The documentation does not give any guarantees about reusing an existing query plan even if the sql is the same.
3. It looks like a lot of work to get right. My vote would go to implementing additional functionality for example the getGeneratedKeys() method.
People seem to be using jtds in preference to the free Microsoft driver even with SQL 2000 because it is faster. I would recommend benchmarking the sp_executesql approach against the stored procedures before embarking on a major rewrite to ensure that this advantage is maintained.
Mike
Wow! This is really impressive. I never imagined there were people interested in more than using jTDS. ;) Thanks a lot, guys.
Getting down to business, thank you for the ideas and oppinions. Before your postings I was sure that sp_executesql was the way to go, but it was never going to be implemented as it required a really big effort. Now, on one hand I see that it's not as impossible as I thought it was and on the other that it's not necessarily the best approach.
I haven't had the time yet to look at the patches that were submitted (there are now 5 of them) but here are a few ideas: first of all I just realized that sp_executesql just moves the responsability of "compiling" down to the driver; the driver has to parse the query and practically rewrite it.
There are two possible implementations here, none of which sounds particularly promising. First, there is the brute-force approach, which implies that we would have to parse and rewrite each and every submitted query. This is obviously very time-consuming and it defeats the goal of PreparedStatements (i.e. to avoid compilation/parsing of the query at each run). The second approach would be to cache queries, which is very similar to what happens now (caching temporary stored procedures). But this very much brings up to where we left from in the first place. The only advantage would be that we wouldn't need fixed types anymore. But the big problem would be the SQL parsing: what if it's not 100% correct?
Now, looking at the idea in my last posting (dumping and recreating the SP each time one of the parameters doesn't fit), there is a problem with this approach too. Suppose that for a reason or another the user runs the procedure with a large parameter and the procedure is resubmitted to match the new parameter. But what happens is that the parameter's new type is incompatible with the actual type that is needed there. This means that every single execution of the PreparedStatement from then on would return the same conversion error. I don't know if this is a real issue or not, that's (partially) the reason I'm posting it here: maybe one of you has a better view. I just hope it's understandable (i.e. that I'm not just talking out of my a$$ here).
About Brian's proposition to break up Tds into specific implementations, I don't think that's necessary. The reason I think that is that jTDS actually only implements two protocols: TDS 4.2 and TDS 7.0. Tds 8.0 is VERY similar to 7.0 (basically it's two more data types and a version number that the Microsoft driver hangs on to, just to make sure it doesn't work with SQL Server 7.0). Sybase compatibility is only achieved through TDS 4.2, which is common. However, if you want to refactor the current Tds implementation, you are welcome to do it. But then, splitting it in 2 would be more than enough.
Mark, your problem with newline characters seems very weird. I don't think SQL Server is doing that. Someone would have noticed it before and it would have been fixed. Maybe it was a driver issue (jTDS could also have it) but that is fixable.
Benchmarking temporary stored procedures against sp_executesql sounds interesting (and could prove good advertisement for jTDS ;) ) but I don't think that actually shows the complete picture. Even the related things I discussed above don't show the complete picture. But if any of you has some clear numbers showing a comparison between jTDS and other drivers (even if we can't publicize their names) that would be awsome. :)
Whoa, that was a long post. Sorry for taking up so much of your time, please go on with your lives. ;)
Thanks again for the ideas and hope to get even more of them,
Alin.
Alin,
I guess one reason for seperation of the two protocols might be that it may make it easier for those that only work with one protocol or another to track down and make appropriate fixes. In places where the implmentation is the same, perhaps a base class with devrived variations is a simple way to prevent duplicate code? I'm not claiming to be for or against... just tossing out ideas.
On a more philosophical note... if you have a statement that you are executing over and over again with only the parameter values changing, why are you using client produced statements? This is a perfect situation for a stored procedure (permanent). I guess the cases I have encountered where SQL was being dynamically generated (based on user input and / or variables) and then reused over and over where very very rare and when they did occurr a small degredation in performance was not that large of an issue. I mean are we talking about 200 milliseconds per hundred executions? I guess my point is that if going to another method makes the driver more robust because it doesn't have to worry about the datatype issue and it makes it every so slightly slower, I think it is worth the performance drop; but it all depends on how frequently this functionality is really getting used.
aussiemm,
The reason we do not use stored procedures is because our application needs to work with 4 different databases (SQL Server 2000, Oracle 8.1.7.4.0 & 9.2.0.4.0, DB2 8.1 and MySQL). Some databases (such as MySQL) do not support stored procedures, while others (DB2) have poor support (IMHO). I have thought of creating statement proxies that would create database specific stored procedures on the fly the same way that jTDS does. The names and signatures of the procedures could then be written to a table to be loaded upon first connection; when the list is in memory it would be shared statically and a few concurrency issues would need to be addressed for updating the table from different VM's (application servers) but should all be very doable.