Thanks Jim. I've had reasonable success getting the float %-formatting cases covered by the new FloatFormatter (to save us two skips). I'll try an IntegerFormatter along the same lines, to fix the oddness I noted, and make the refactoring neat.

I've no intention of replacing the switch here, but I'll look out for places where I may be better off using one to choose between actions. I understand how it compiles.

I agree non-BMP codepoints would not be an issue unless maybe you put one where a formatting type was expected.

Jeff
Jeff Allen
On 05/05/2014 00:16, Jim Baker wrote:
Jeff,

I'm glad you are looking at this! In looking at StringFormatter, I agree with your refactoring plan. StringFormatter grew by accretion, and at this point it's quite hard to follow.

StringFormatter's current method of using a driving switch statement for its format string interpreter remains the fastest way to dispatch in Java. However, it will be both more inlineable by the JVM and more testable by us if we break out each chunk of work into its own small method.

Another thing: although the peek/push/pop mechanism, in conjunction with index, is not codepoint aware and just uses format.charAt, this doesn't matter in the end. This is because the format characters are all in the BMP, and in fact are ASCII. So we can continue to use charAt. However, I would recommend also splitting this functionality into a small helper class, for the above inlining/testing reasons.
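
To make the shape of that suggestion concrete, here is a minimal sketch in Python rather than Java, so it says nothing about JVM inlining. The names FormatCursor, format_int, format_str and interpret are hypothetical and are not taken from StringFormatter; the driver only understands %d, %s and %%, with no flags, width or precision.

```python
class FormatCursor:
    """Hypothetical helper: walks a format string one character at a time."""

    def __init__(self, fmt):
        self.fmt = fmt
        self.index = 0

    def peek(self):
        # Return the next character without consuming it, or "" at the end.
        return self.fmt[self.index] if self.index < len(self.fmt) else ""

    def pop(self):
        # Consume and return the next character; fail clearly at the end.
        if self.index >= len(self.fmt):
            raise ValueError("incomplete format")
        c = self.fmt[self.index]
        self.index += 1
        return c

    def push(self):
        # Step back one character so that it can be read again.
        if self.index > 0:
            self.index -= 1


def format_int(value):
    # One small, separately testable unit of work per format type.
    return str(int(value))


def format_str(value):
    return str(value)


def interpret(fmt, args):
    """Tiny driver: dispatch on the type character to a small function."""
    cursor = FormatCursor(fmt)
    values = iter(args)
    out = []
    while cursor.peek():
        c = cursor.pop()
        if c != "%":
            out.append(c)
            continue
        kind = cursor.pop()
        if kind == "d":
            out.append(format_int(next(values)))
        elif kind == "s":
            out.append(format_str(next(values)))
        elif kind == "%":
            out.append("%")
        else:
            raise ValueError("unsupported format character %r" % kind)
    return "".join(out)
```

With this, interpret("%d apples, %s%%", (3, "50")) gives '3 apples, 50%', and the cursor and each formatting function can be tested without going through the driver.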

- Jim



On Sun, May 4, 2014 at 4:39 PM, Jeff Allen <ja.py@farowl.co.uk> wrote:
I hope to give %-formatting for floats the same treatment I gave
float.__format__ recently.  It has the same difficulty getting exactly
the right digits that I fixed then, and I hope to use the same code.

str.__mod__, which does the work, has to understand all the format codes
and types, and does it in one highly-integrated piece of code. The
parsing, conversion and padding are all wrapped up together for all
types. While replacing just the float part should be possible, reworking
the integer parts in the same framework is attractive, and would be more
compact in the long run.

I've been working through this code, which is quite old (<2006), and
adding comments. Although it closely mirrors CPython 2.7, it is tortured
stuff.  For example, in calculating "%#8.5X" % 429L, the code gets the
right answer ' 0X001AD' but it discards and adds the 0X prefix three
times over, and it involves two StringBuilders along the way.
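
For comparison, the same result can be built in one pass: digits first, then zero-fill to the precision, then the prefix, then padding to the width. The following is only a sketch of that ordering in Python, not the existing code nor a proposed replacement; format_hex_upper is a hypothetical name, and the '0', '-', '+' and ' ' flags are not handled.

```python
def format_hex_upper(value, width=0, precision=0, alternate=False):
    """Sketch of %X for an integer, right-justified with spaces."""
    sign = "-" if value < 0 else ""
    digits = format(abs(value), "X")
    # Precision is the minimum number of digits, made up with zeros.
    digits = digits.rjust(precision, "0")
    # The '#' flag adds the prefix exactly once, after the zero fill.
    prefix = "0X" if alternate else ""
    # Width is the minimum total size, made up with spaces on the left.
    return (sign + prefix + digits).rjust(width)


print(repr(format_hex_upper(429, width=8, precision=5, alternate=True)))
# prints ' 0X001AD'
```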

It also exhibits an odd divergence from CPython, in calling __hex__,
__oct__ and __str__ as it does. It is upset by redefinition of __hex__
in a way CPython is not:
>>> class mylong(long):
...     def __hex__(self): return "(16):" + long.__hex__(self)[2:-1]
...
>>> nn = mylong(429)
>>> hex(nn)
'(16):1ad'
>>> "%#12.8X" % nn
'  0X006):1AD'

This is because formatting long assumes __hex__ will return what it does
in the base class. Maybe I deserve what I get here too much to call it a
bug, but this is another reason I think there's some mileage in a wider
rationalisation.
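
One way a rationalised formatter could avoid the problem is to take the digits from the numeric value, rather than from whatever __hex__ returns. The sketch below illustrates only that idea. It assumes Python 3, where long is gone and __hex__ is an ordinary method with no special meaning, so it cannot reproduce the Jython behaviour shown above; MyInt and hex_digits are hypothetical names.

```python
class MyInt(int):
    # An ordinary method in Python 3; it stands in for the redefined
    # __hex__ in the session above.
    def __hex__(self):
        return "(16):" + format(int(self), "x")


def hex_digits(value):
    """Upper-case hex digits of abs(value), taken from the number itself."""
    # int.__index__ yields a plain int, so subclass overrides are bypassed.
    n = abs(int.__index__(value))
    return format(n, "X")


nn = MyInt(429)
print(nn.__hex__())    # prints (16):1ad
print(hex_digits(nn))  # prints 1AD
```

The prefix, precision and width can then be applied to those digits without any assumption about what the subclass method returns.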

Jeff