Menu

Unable to insert Arabic data

Help
tnabil
2006-09-06
2012-08-15
  • tnabil

    tnabil - 2006-09-06

    Hi,

    I'm using jTDS to access a SQL server 2000 database, as part of a migration project from plain old ASP and VB COM objects to Java EE.

    I'm also using iBatis for database access.

    I have a problem inserting Arabic values in the database, when the data type of the field is varchar.

    I don't have a problem with nvarchar fields.

    Previously, I had a problem with both read and update, but adding the "charset=Cp1256" property to the URL fixed the read problem, unfortunately, it didn't work for the insert/update.

    For that particular case, I switched to plain JDBC in order to debug the problem, and I found a solution, albeit a very awkward one. The solution will only work with plain JDBC (not iBatis), so I'm not inclined to use it.

    The solution I found was to set the value using setBinaryStream instead of setString. I'm encoding the String using Cp1256 to a byte array and then using a ByteArrayInputStream to set the value on the CallableStatement.

    I would very much appreciate if someone can guide me on how to fix this problem without those kinds of tricks, so I can revert back to iBatis.

    Please note that the ASP code works with the same database.

    Here's the code I used to fix the problem (I'm using a prepared statement here to simplify things a bit as the original procedure has 46 parameters):

        String update =
        "update tmp_company_request set company_name_ar = ? where registration_id = ?";
    
        ps = con.prepareStatement(update);
        String value = "اختبار آخر للغة العربية"; // Those are Arabic characters
        InputStream is = Utilities.getStringAsByteStream(value);
        ps.setBinaryStream(1, is, is.available());
    
        ps.setInt(2, 703);
        ps.execute();
    

    The utility method:

    public static ByteArrayInputStream getStringAsByteStream(String value) {
    try {
        byte[] valueBytes;
        if (value == null)
        valueBytes = new byte[0];
        else
        valueBytes = value.getBytes(Constants.DB_CHARSET); // Cp1256
    
        return new ByteArrayInputStream(valueBytes);        
    } catch (UnsupportedEncodingException e) {
        throw new SystemException(new StringBuffer(
        "An error occurred while attempting to use the ")
        .append(Constants.DB_CHARSET)
        .append(" character encoding").toString(), e);
    }
    }
    

    The database server configuration as provided by the DBA is as follows:

    Ø Server default collation Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

    Ø select @@language

                     us_english
    

    Ø SQL Server 2000 supports these code pages.

    Code page Description

    1258 Vietnamese

    1257 Baltic

    1256 Arabic <-- Relevant

    1255 Hebrew

    And so on...

    Every help is appreciated.

     
    • Alin Sinpalean

      Alin Sinpalean - 2006-09-06

      Why do you expect storing and retrieving arabic characters (Cp1256) in VARCHAR columns in a Latin-1 (aka Cp1252) database to work?

      Either use NVARCHAR or create the database/relevant column(s) with a Cp1256 charset.

      BTW the "read works, update/insert doesn't" part, is not very helpful. It should work, but there are tens of reasons why it might not, so unless you provide some code, database schema and description of what happens I can't really help. And -- this is quite important -- even if this gets fixed and starts working for you, it's still a hack and switching to another JDBC driver or DB access library will most likely break all your existing data. So do it right and take my first suggestion.

      Alin.

       
      • tnabil

        tnabil - 2006-09-06

        Alin,

        Thanks a lot for your help.

        Had this not been a migration project, then I would have taken your first suggestion right away. But, it's impossible to mess with those fields, there are multiple existing applications that are using them, and changing the fields (even if the data could be migrated) will mandate change to all of them.

        I can supply the code of a test case, and here it is

        [code]
        Connection con = null;
        PreparedStatement ps = null;

        try {
            con = ds.getConnection();
            con.setAutoCommit(false);
        
            String update =
            &quot;update tmp_company_request set company_name_ar = ? where registration_id = ?&quot;;
        
            ps = con.prepareStatement(update);
            ps.setString(1, &quot;طارق نبيل&quot;);
            ps.setInt(2, 703);
            ps.execute();
            con.commit();
        
        } finally {
            if (ps != null) {
            try {
                ps.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            }
        
            if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            }
        
        }
        

        [/code]

        This code causes question marks to be inserted in the field.

        As for the table definition, here it is (the column of interest is company_name_ar)

        CREATE TABLE [tmp_company_request] (
        [registration_id] numeric IDENTITY (1, 1) NOT NULL ,
        [registration_date] [smalldatetime] NOT NULL ,
        [registration_type] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
        [company_type] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
        [company_code] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [company_name] [varchar] (35) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
        [company_name_ar] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [business_name] [varchar] (80) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        [po_boxno] [varchar] (6) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
        .
        .
        Some more column definitions
        .
        .
        [receipt_amount] numeric NULL ,
        [receipt_location] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
        CONSTRAINT [PK_tmp_company_request] PRIMARY KEY CLUSTERED
        (
        [registration_id]
        ) ON [PRIMARY] ,
        CONSTRAINT [company_type_CK] CHECK ([company_type] = 'I' or [company_type] = 'F' or [company_type] = 'C' or [company_type] = 'S'),
        CONSTRAINT [registration_status_CK] CHECK ([registration_status] = 'P' or [registration_status] = 'A' or [registration_status] = 'R'),
        CONSTRAINT [registration_type_CK] CHECK ([registration_type] = 'N' or [registration_type] = 'M' or [registration_type] = 'A')
        ) ON [PRIMARY]
        GO

        If there's more information that can help solve the problem, please tell me.

         

        Related

        Code: code


Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.