From: SourceForge.net <no...@so...> - 2007-05-01 09:40:54
Bugs item #1663711, was opened at 2007-02-19 13:44
Message generated for change (Comment added) made by cgroves
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1663711&group_id=12867

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core
Group: None
Status: Open
Resolution: None
Priority: 3
Private: No
Submitted By: craig (codecraig)
Assigned to: Nobody/Anonymous (nobody)
Summary: 32767 characters is max string size

Initial Comment:
Jython can't handle strings over 32767 characters....

StringBuffer sb = new StringBuffer();
for (int i = 0; i < 32768; i++) {
    sb.append("a");
}
PythonInterpreter pi = new PythonInterpreter();
pi.exec("data = {}");
String x = "data[\"stuff\"] = {\"val\" : \"" + sb + "\"}";
pi.exec(x);

When the line "pi.exec(x)" is executed, the following exception occurs:

Exception in thread "main" Traceback (innermost last):
  (no code object) at line 0
SyntaxError: ('string constant too large (more than 32767 characters)', ('<string>', 1, 23, ''))

Can this be fixed?

----------------------------------------------------------------------

>Comment By: Charles Groves (cgroves)
Date: 2007-05-01 04:40

Message:
Logged In: YES
user_id=1174327
Originator: NO

Just to be clear, Jython can handle any string that fits in memory. It can't handle a string *literal* longer than 32767 characters. Because you're putting the string directly into the exec'd code, it's turned into a literal and compiled into bytecode. If you set a PyString in the interpreter, this will work fine.

StringBuffer sb = new StringBuffer();
for (int i = 0; i < 32768; i++) {
    sb.append('a');
}
PythonInterpreter pi = new PythonInterpreter();
pi.exec("data = {}");
pi.set("s", new PyString(sb.toString()));
pi.exec("data[\"stuff\"] = {\"val\" : s}");
pi.exec("print len(s)");

That prints '32768'.

----------------------------------------------------------------------

Comment By: craig (codecraig)
Date: 2007-02-19 20:48

Message:
Logged In: YES
user_id=1258995
Originator: YES

Currently I have to do my own management for this problem: I check the length of any string before putting it into Python and split it into pieces smaller than 32767 characters.

guess that'll do for now :)

----------------------------------------------------------------------

Comment By: Khalid Zuberi (kzuberi)
Date: 2007-02-19 19:23

Message:
Logged In: YES
user_id=18288
Originator: NO

To clarify your description, it's a limit on the size of string constants in the source and not a limit on the size of strings handled by the program (i think that's what you mean anyway).

Looking in the source history, it seems to have been introduced with this ancient checkin:

http://jython.svn.sourceforge.net/viewvc/jython?view=rev&revision=131

But the bug number mentioned there refers to a system that predated our use of the sourceforge trackers (a jitterbug instance?), and i've not been able to dig up the actual bug report.

Experimenting with that limit removed in CodeCompiler.java using a little one-liner like:

exec('x="%s"' % ('1' * 65536 ))

shows an underlying problem. The relevant bit of stacktrace is:

java.io.UTFDataFormatException: encoded string too long: 65536 bytes
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:347)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
    at org.python.compiler.ConstantPool.UTF8(ConstantPool.java:88)
    at org.python.compiler.ConstantPool.String(ConstantPool.java:188)

So i think what's happening here is that the string constants that appear in the source are stored in the java class's constant pool, but that the max size allowed there and allowed by writeUTF() is 64k bytes.

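The writeUTF() limit described above can be reproduced outside Jython with the JDK alone. The following is a minimal sketch, not Jython code: the class name WriteUtfLimitDemo and the helper fitsInWriteUtf are made up for illustration. It shows that the limit is on encoded bytes (at most 65535), so the number of characters that fit depends on how many bytes each character needs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

// Hypothetical demo class; only DataOutputStream.writeUTF is real JDK API.
class WriteUtfLimitDemo {

    /** Returns true if writeUTF accepts a string made of n copies of c. */
    static boolean fitsInWriteUtf(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(sb.toString());
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form needs more than 65535 bytes.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII characters take 1 byte each in the encoding writeUTF uses.
        System.out.println(fitsInWriteUtf('1', 65535));      // true
        System.out.println(fitsInWriteUtf('1', 65536));      // false
        // U+20AC takes 3 bytes, so far fewer characters fit.
        System.out.println(fitsInWriteUtf('\u20ac', 21845)); // true  (65535 bytes)
        System.out.println(fitsInWriteUtf('\u20ac', 21846)); // false (65538 bytes)
    }
}
```

Under that reading, a 32767-character cap is conservative for pure ASCII input, but a cap counted in characters cannot be raised to 65535 safely, because characters that encode to 3 bytes would overflow well before that.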
Here's an old reference to this limit:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4071592

Notice that the check in CodeCompiler.java is actually comparing the number of (presumably 16-bit encoded) characters in the string to this 32767 limit and not the length of its encoding in UTF-8. So it's possible that we are disallowing string constants that would actually fit, say in the case of the plain old ascii subset that is represented by 1-byte chars in UTF-8.

Anyhow, if you can control your input, you may be able to work around this by transforming your large string constants into smaller constants concatenated at runtime. It would be interesting to see if a similar transformation were possible to do automagically within jython, but i wouldn't expect it for the upcoming release.

Lowering priority and removing assignment to next beta.

- kz

----------------------------------------------------------------------
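The "smaller constants concatenated at runtime" workaround could be sketched on the Java side roughly as follows. This is an illustration only, not part of Jython: the class PyLiteralChunker and its method are hypothetical names, the escaping covers only backslash, double quote, CR and LF, and it assumes the compiler keeps each chunk as its own constant rather than folding the concatenation back into one literal.

```java
// Hypothetical helper; builds Python source text, it does not call Jython.
class PyLiteralChunker {

    /** Per-literal character cap reported in the SyntaxError above. */
    static final int MAX_LITERAL_CHARS = 32767;

    /**
     * Turns value into a Python expression made of double-quoted literals,
     * each holding at most chunkSize characters, joined with " + ".
     */
    static String toConcatenatedLiterals(String value, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder expr = new StringBuilder("(");
        if (value.length() == 0) {
            expr.append("\"\"");
        }
        for (int start = 0; start < value.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, value.length());
            if (start > 0) {
                expr.append(" + ");
            }
            expr.append('"');
            for (int i = start; i < end; i++) {
                char c = value.charAt(i);
                switch (c) {
                    case '\\': expr.append("\\\\"); break; // backslash
                    case '"':  expr.append("\\\""); break; // double quote
                    case '\n': expr.append("\\n");  break; // newline
                    case '\r': expr.append("\\r");  break; // carriage return
                    default:   expr.append(c);
                }
            }
            expr.append('"');
        }
        return expr.append(')').toString();
    }
}
```

With the interpreter from the original report, the call might then look like pi.exec("data[\"stuff\"] = {\"val\" : " + PyLiteralChunker.toConcatenatedLiterals(sb.toString(), PyLiteralChunker.MAX_LITERAL_CHARS) + "}"); this has not been tested against Jython. Where the string already lives in Java, passing it in with pi.set() as in cgroves's comment avoids literals altogether and is likely the simpler route.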