Bugs item #1709162, was opened at 2007-04-28 05:58
Message generated for change (Comment added) made by fwierzbicki
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1709162&group_id=12867
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core
Group: None
Status: Closed
Resolution: Wont Fix
Priority: 5
Private: No
Submitted By: Dennis Ushakov (dushakov)
Assigned to: Nobody/Anonymous (nobody)
Summary: String performance problem
Initial Comment:
Appending one string to another in a loop, like this:
    for line in lines:
        xml = xml + line
is extremely slow in comparison with CPython.
A simple test case is attached.
generator.py generates a file of a given size (first argument) with an approximate line count (second argument).
test.py reads the lines from the generated file, collects them all into one string, and prints the time taken by this operation.
For input:
python generator.py 21 90000
(that is 2M file with approx. 90K lines)
python takes 1882 ms
jython fails to finish in more than 10 minutes
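The attached scripts are not reproduced in this message. A minimal sketch of the kind of measurement described above might look like the following; the helper names make_lines and time_concat are hypothetical, the lines are built in memory rather than read from a generated file, and the sizes in the main block are scaled down from the 2M / 90K case so that it finishes quickly on any implementation.

```python
import time


def make_lines(total_bytes, line_count):
    # Hypothetical stand-in for generator.py: equal-length lines in memory.
    line_len = max(1, total_bytes // line_count)
    return ["x" * (line_len - 1) + "\n" for _ in range(line_count)]


def time_concat(lines):
    # Same loop as in the report: one new string per iteration.
    start = time.time()
    xml = ""
    for line in lines:
        xml = xml + line
    elapsed_ms = (time.time() - start) * 1000.0
    return xml, elapsed_ms


if __name__ == "__main__":
    # Scaled-down input (assumption); raise the sizes to approach the report.
    lines = make_lines(200 * 1024, 9000)
    xml, elapsed_ms = time_concat(lines)
    print("%d chars in %.0f ms" % (len(xml), elapsed_ms))
```

Running the same script under both interpreters and increasing the sizes should show whether the time grows roughly linearly or quadratically with the input.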
----------------------------------------------------------------------
>Comment By: Frank Wierzbicki (fwierzbicki)
Date: 2007-04-28 09:22
Message:
Logged In: YES
user_id=193969
Originator: NO
This discussion is getting long enough that we should probably move any
further discussion to jython-dev so others can benefit from any decisions
we make. iceslice, any patch would need to address the issues that
pedronis brings up.
----------------------------------------------------------------------
Comment By: Frank Wierzbicki (fwierzbicki)
Date: 2007-04-28 09:17
Message:
Logged In: YES
user_id=193969
Originator: NO
iceslice,
We will be happy to review a patch if you wish to submit one based on your
proposal. Because we are getting so close to 2.2's release, it stands a
good chance of missing this release. If StringBuilder is used it will
definitely need to wait for the next release, since StringBuilder is a JDK
5 feature.
----------------------------------------------------------------------
Comment By: Samuele Pedroni (pedronis)
Date: 2007-04-28 09:14
Message:
Logged In: YES
user_id=61408
Originator: NO
There are synchronisation issues; also, what is the cost of charAt or
indexOf for a StringBuilder, etc.?
We disagree about what counts as reasonably simple; in particular, I'm
not at all eager to add more synchronisation code.
----------------------------------------------------------------------
Comment By: Sergey Salishev (iceslice)
Date: 2007-04-28 09:07
Message:
Logged In: YES
user_id=1608697
Originator: NO
Of course the 'join' workaround is more correct in this particular case.
But this inefficiency can affect other existing code which runs well on
CPython. So in my opinion it should be considered a bug and not an
implementation detail.
Actually, reference counting isn't needed, as memory management is done by Java itself.
What is needed to optimize append operations:
1. Separate PyString from the string storage. The storage will keep the char
vector and PyString will be a wrapper keeping (storage, start, length). The
PyString will still be immutable while the storage can be mutable.
2. Use StringBuilder as the storage. This way, appending to the end of the storage
would be very cheap.
3. Copy the storage on write when the internal characters need to be modified.
These changes require fewer than roughly 100 lines of code and are localized
in PyString.
As this doesn't add an additional indirection compared to the current
implementation, the performance of other string uses will probably not be
affected.
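The three steps above can be sketched in Python for illustration. This is not Jython code and not a patch: Storage and SharedStr are hypothetical names, a Python list stands in for the StringBuilder, and the synchronisation questions raised elsewhere in this thread are not addressed at all.

```python
class Storage(object):
    """Hypothetical growable character buffer (stands in for StringBuilder)."""

    def __init__(self, chars=""):
        self.chars = list(chars)


class SharedStr(object):
    """Hypothetical immutable view (storage, start, length) over a Storage."""

    def __init__(self, storage, start, length):
        self._storage = storage
        self._start = start
        self._length = length

    @classmethod
    def from_text(cls, text):
        return cls(Storage(text), 0, len(text))

    def __add__(self, other):
        text = str(other)
        end = self._start + self._length
        if end == len(self._storage.chars):
            # This view ends exactly at the end of the buffer, so no other
            # view can see the slots past `end`: append in place.
            self._storage.chars.extend(text)
            return SharedStr(self._storage, self._start,
                             self._length + len(text))
        # Another view has already extended the buffer: copy, then append.
        copy = Storage(self._storage.chars[self._start:end])
        copy.chars.extend(text)
        return SharedStr(copy, 0, self._length + len(text))

    def __len__(self):
        return self._length

    def __str__(self):
        end = self._start + self._length
        return "".join(self._storage.chars[self._start:end])
```

In this sketch, a loop such as s = s + line always takes the in-place branch, so each append costs time proportional to the appended piece rather than to the whole accumulated string. A second append to an older view falls back to copying, which keeps every view immutable from the outside.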
----------------------------------------------------------------------
Comment By: Samuele Pedroni (pedronis)
Date: 2007-04-28 07:55
Message:
Logged In: YES
user_id=61408
Originator: NO
That code should be written as:
    xml = ''.join(lines)
This is a quality-of-implementation issue. CPython uses ref-counting-based
optimisations to avoid the quadratic performance, but
it is nevertheless the case that the ''.join idiom is the right way to
concatenate many strings, and it works across Python implementations.
Given that Jython doesn't use ref counting, it is unlikely that something
reasonably simple can be done to change this.
In Java, simply switching to StringBuilder is possible because types are
known at compile time.
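For loops where each piece is computed on the fly, the same idiom applies by collecting the pieces in a list and joining once at the end. A minimal sketch, with build_xml and the sample rows being hypothetical names and data chosen for illustration:

```python
def build_xml(rows):
    # Collect the pieces in a list; list.append is amortized constant time.
    pieces = []
    for row in rows:
        pieces.append("<row>%s</row>\n" % row)
    # A single join copies each piece once, independent of ref counting.
    return "".join(pieces)


def build_xml_slow(rows):
    # The pattern from the report, kept only for comparison.
    xml = ""
    for row in rows:
        xml = xml + "<row>%s</row>\n" % row
    return xml
```

Both functions return the same string; only the first avoids creating a new, ever larger string on every iteration.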
----------------------------------------------------------------------
Comment By: Dennis Ushakov (dushakov)
Date: 2007-04-28 06:12
Message:
Logged In: YES
user_id=1736034
Originator: YES
This happens because on every addition a new Java String is created.
Something more like StringBuilder should be used for PyString.
----------------------------------------------------------------------
Comment By: Dennis Ushakov (dushakov)
Date: 2007-04-28 06:02
Message:
Logged In: YES
user_id=1736034
Originator: YES
This happens because on every addition a new Java String is created.
Something more like StringBuilder should be used for PyString.
----------------------------------------------------------------------