Incorrect charset determination w/ MS SQL 2K

John Craig
  • John Craig
    John Craig

    We've discovered an annoying circumstance: apparently, w/ MS SQL 2K, if you
    have a different collation/default charset in master vs. a user DB, when you
    connect to the user DB via jTDS, you end up with the
    net.sourceforge.jtds.jdbc.ConnectionJDBC3 instance thinking the
    collation/charset is that of the master DB, rather than the user DB you
    connected to. The result being that you get back bad String values for char
    data that contains byte values > 127. To put this another way, MS SQL 2K (at
    least--haven't tested against other versions) sends a TDS_ENV_CHARSET token in
    response to the login packet. But, the token identifies the charset of the
    master DB (in this case iso_1), not the DB you're going to actually connect to
    (TDS 7.0 includes the DB in the login packet). It never sends another
    TDS_ENV_CHARSET token with the right charset info for the actual DB you're
    connected to.

    I tried coding around this by doing an explicit USE <dbname> call, but even if
    you change DB context later, it does not send a TDS_ENV_CHARSET token along
    with acknowledging the change in DB context.

    My programs call this query to find the charset of the current DB, but unless
    the master DB's charset matches that of the user DB, I end up with a different
    answer than the ConnectionJDBC3 instance has determined:

                                     , DATABASEPROPERTYEX( DB_NAME(), 'Collation') ) , 'CodePage' )

    This may be a fairly rare circumstance (we have run for years without seeing
    it, but it sure bit us when we encountered it). Does anyone know if this is
    particular to MS SQL 2K? And, whether it is or is not, it seems to me we'll
    have to add code to our jTDS implementations to do the above query (in the
    ConnectionJDBC2 constructor after the login, but before it returns the
    Connection instance)--which is kind of clunky, but I don't know how else to
    force MS SQL to give up the right data. If this is also a characteristic of
    later MS SQL versions, then probably something needs to happen in the trunk
    itself. (Can't you have separate collations right on the column level in
    recent versions? Does jTDS handle that at all?)

    Thanks for any suggestions you can offer.


    I'd be interested to hear other suggestions, if anyone has them. In
    particular, is there a way to poke MS SQL so it will give you a
    TDS_ENV_CHARSET packet?


  • John Craig
    John Craig

    Well, we ultimately reimplemented ConnectionJDBC2.determineServerCharset() as

    private String determineServerCharset() throws SQLException {

    String queryStr = null;

    switch (serverType) {

    case Driver.SQLSERVER:

    if (databaseProductVersion.indexOf("6.5") >= 0) {


    } else {

    // not reliably returning charset when environment changed; explicitly ask

    queryStr = "select COLLATIONPROPERTY( CONVERT( sysname"

    • " , DATABASEPROPERTYEX( DB_NAME(), 'Collation') )"
    • " , 'CodePage' )";


    case Driver.SYBASE:

    // There's no need to check for versions here




    Statement stmt = this.createStatement();

    ResultSet rs = stmt.executeQuery(queryStr);;

    String charset = rs.getString(1);

    if ( ! charset.toUpperCase().startsWith( "CP" ) ) {

    charset = "CP" + charset;




    return charset;

    // determineServerCharset()


    And modified this logic from the ConnectionJDBC2 constructor so that it always
    checks for the charset rather than trying to figure out when it needs to:

    // If charset is still unknown and the collation is not set either,

    // determine the charset by querying (we're using Sybase or SQL Server

    // 6.5)

    // if ((serverCharset == null || serverCharset.length() == 0)

    // && collation == null) {

    // reget the charset info now that connected to right DB


    // }

    I'd like to see this added to trunk, but maybe this is not a problem w/ later
    versions of MS SQL Server and the whole thing is probably a lot bigger issue
    with column-wise collation settings anyway so maybe it does not matter much.