From: Colin P. A. <co...@co...> - 2007-01-27 08:38:06
Under what circumstances is it necessary to call rescale (0, shared_context) on an MA_DECIMAL representing a zero? I ask because the following program:

    indexing
        description: "Re-usable test root"
    class TEST
    create
        make
    feature
        make is
            local
                i: INTEGER
                l_sum, l_index: MA_DECIMAL
            do
                from
                    i := 1
                    create l_sum.make_zero
                until
                    i > 1_000_000
                loop
                    create l_index.make_from_integer (i)
                    l_sum := l_sum + l_index
                    i := i + 1
                end
                print (l_sum.to_scientific_string + "%N")
            end
    end

prints 3.90000975E+11 rather than the correct value of 500000500000, so I am wondering if I need to rescale at some point. But in XM_XPATH_INTEGER_VALUE, the only points at which I am calling rescale are: 1) after creation from another MA_DECIMAL, 2) after a divide_integer operation, and 3) when deliberately changing the scale. Omitting rescale was the only thing I could think of that might be causing the problem.

Incidentally, this test arose from a problem using gexslt on a test program in the FXSL library (functional programming using XSLT). The author reported gexslt as not running in constant space. I have fixed this problem (I'm about to make my first check-ins to Subversion, i.e. the fix is not in Gobo 3.5), but gexslt still runs on the order of 60 times slower than Saxon. At Franck's suggestion, I reproduced the algorithm in pure Eiffel, first using INTEGER_64 and then MA_DECIMAL. The runtime for INTEGER_64 was 3 milliseconds, whereas for MA_DECIMAL it was over 6 seconds. This satisfactorily explains the slow running time of gexslt (at present I represent all xs:integers using MA_DECIMAL), so my proposed fix is to introduce a second class, XM_XPATH_SMALL_INTEGER_VALUE, which uses INTEGER_64 if the xs:integer is small enough, and to check for possible overflow/underflow before attempting any arithmetic operation. If the check indicates it is not safe, then and only then revert to decimal arithmetic.

This should eliminate the vast majority of cases where using xs:integer results in slow speed. So I need permission to start using INTEGER_64. I guess we will need a few infrastructure changes first (like maximum and minimum platform INTEGER_64 attributes). -- Colin Adams Preston Lancashire
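Colin's plan (use INTEGER_64 while it is safe, test for overflow before each operation, and only then fall back to decimal arithmetic) can be sketched for addition as below. This is an illustration in Python, not Gobo code: Python's `int` never overflows, so the 64-bit bounds are simulated with the constants `INT64_MIN` and `INT64_MAX` (standing in for the platform attributes Colin asks for), `Decimal` stands in for MA_DECIMAL, and `add_checked` is a hypothetical name. The overflow test is written so that it could not itself overflow in a real 64-bit register.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1   # stands in for a platform "maximum INTEGER_64" attribute
INT64_MIN = -2**63      # stands in for a platform "minimum INTEGER_64" attribute

def add_checked(a, b):
    """Add two integer values, each held either as a 'small' int (within the
    INT64 range) or as a Decimal. Stay on the int path only when a + b fits."""
    if isinstance(a, int) and isinstance(b, int):
        # Test *before* adding: in a 64-bit register a + b could silently wrap.
        # INT64_MAX - b (for b > 0) and INT64_MIN - b (for b < 0) cannot overflow.
        overflow = (b > 0 and a > INT64_MAX - b) or (b < 0 and a < INT64_MIN - b)
        if not overflow:
            return a + b
    # Slow path: decimal arithmetic. The default context keeps 28 digits,
    # which is enough for the exact sum of any two 64-bit values.
    return Decimal(a) + Decimal(b)

print(add_checked(2, 3))           # 5 (int fast path)
print(add_checked(INT64_MAX, 1))   # 9223372036854775808 (Decimal fallback)
```

Subtraction and multiplication would need their own pre-checks in the same style; only addition is shown here.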
From: Eric B. <er...@go...> - 2007-01-28 15:58:53
Colin Paul Adams wrote: > So I need permission to start using INTEGER_64. OK. If I remember correctly this will rule out VE, but it should be OK with the other compilers. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
From: Colin P. A. <co...@co...> - 2007-02-02 09:05:03
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes: Eric> Colin Paul Adams wrote: >>> So I need permission to start using INTEGER_64. Eric> OK. If I remember correctly this will rule out VE, but it Eric> should be OK with the other compilers. Colin> SE fails to compile it as I am Colin> using {STRING}.is_integer_64, which SE 1.2r7 doesn't Colin> have. As at this point in the code I already know that the Colin> string consists of decimal digits only, I could just check Colin> the count, and err on the safe side. But I think we will Colin> need a STRING_.is_integer_64 routine anyway, assuming we Colin> are going to stay with SE for a while. I've coded STRING_.is_integer_64 and (but not for VE) STRING_.to_integer_64. I'm not sure what to do about writing tests. Test_is_integer_64 is no problem. But test_to_integer_64 will fail to compile for VE, and kl_test_string_routines.e is not a ge class. Perhaps a separate class? Also, I should really put in a conditional in the .eant files for the xpath and xslt tests, so as not to run them for VE. Or do we just drop VE completely now? -- Colin Adams Preston Lancashire
From: Colin P. A. <co...@co...> - 2007-02-02 16:49:42
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> We drop VE completely. If it's not for INTEGER_64, it will Eric> be for Agents/Tuples. So let's drop it now. OK. There appears to be a problem with gec. I have the following tests, which se and ise (5.7) pass:

    test_is_integer_64 is
            -- Test feature `is_integer_64'.
        local
            uc_string: UC_UTF8_STRING
        do
            assert ("is_integer_64_1", STRING_.is_integer_64 ("1234"))
            assert ("is_integer_64_2", STRING_.is_integer_64 ("00078"))
            create uc_string.make_from_string ("4534")
            assert ("is_integer_64_3", STRING_.is_integer_64 (uc_string))
            assert ("is_integer_64_4", STRING_.is_integer_64 ("9223372036854775807"))
            assert ("is_integer_64_5", STRING_.is_integer_64 ("00000000009223372036854775807"))
            assert ("not_is_integer_64_1", not STRING_.is_integer_64 ("9223372036854775808"))
            assert ("not_is_integer_64_2", not STRING_.is_integer_64 ("10223372036854775807"))
            assert ("not_is_integer_64_3", not STRING_.is_integer_64 ("00019223372136854775807"))
        end

    test_to_integer_64 is
            -- Test feature `to_integer_64'.
        local
            uc_string: UC_UTF8_STRING
        do
            assert_equal ("to_integer_64_1", (1234).to_integer_64, STRING_.to_integer_64 ("1234"))
            assert_equal ("to_integer_64_2", (78).to_integer_64, STRING_.to_integer_64 ("00078"))
            create uc_string.make_from_string ("4534")
            assert_equal ("to_integer_64_3", (4534).to_integer_64, STRING_.to_integer_64 (uc_string))
            assert_equal ("to_integer_64_4", (9223372036854775807).to_integer_64, STRING_.to_integer_64 ("9223372036854775807"))
            assert_equal ("to_integer_64_5", (9223372036854775807).to_integer_64, STRING_.to_integer_64 ("00000000009223372036854775807"))
        end

Gec fails one:

    Test Results:
    FAIL: [KL_TEST_STRING_ROUTINES.test_to_integer_64] to_integer_64_4
    expected: -1
    but got: 9223372036854775807

so it looks like GEC's built-in {INTEGER}.to_integer_64 gets the edge case wrong. -- Colin Adams Preston Lancashire
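The "expected: -1" in the gec failure is a useful clue. If the literal 9223372036854775807 (hex 0x7FFFFFFFFFFFFFFF) is squeezed into a 32-bit INTEGER by keeping only its low 32 bits, those bits are all ones, which reads as -1 in two's complement; a correct widening `to_integer_64` then yields -1. Whether gec truncated the literal in exactly this way is an assumption, but the arithmetic is easy to check in Python (`low_32_bits_signed` is a hypothetical helper):

```python
def low_32_bits_signed(value: int) -> int:
    """Keep only the low 32 bits of `value` and read them as a signed
    two's-complement 32-bit integer (what a 32-bit INTEGER would hold)."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low

max_int64 = 9223372036854775807       # 2**63 - 1 == 0x7FFFFFFFFFFFFFFF
print(hex(max_int64))                 # 0x7fffffffffffffff
print(low_32_bits_signed(max_int64))  # -1: the low 32 bits are all ones
```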
From: Colin P. A. <co...@co...> - 2007-02-03 14:56:43
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> I had a look at the routines `is_integer_64' and Eric> `to_integer_64' in class KL_STRING_ROUTINES. And there are Eric> two things that I don't like. The first thing is that they Eric> create too many temporary objects (calling `substring', Eric> creating manifest arrays). Yes. I wasn't happy with that myself. Eric> The second thing that I don't Eric> like is the precondition "is_integer". I don't think that it Eric> is a good idea to have to write: Eric> is_integer (s) and then is_integer_64 (s) Well, I thought about that. And I thought about the use case I have, which is parsing numbers. The tokenizer has already created tokens one of which is certified to consist entirely of decimal digits (there may be a preceding token consisting of a minus sign). In this use case the pre-condition "is_integer" is guaranteed true. So I tried to think about other use cases for "is_integer_64". All I could think of was parsing, and I presumed therefore that the pre-condition would always be satisfied. -- Colin Adams Preston Lancashire
From: Colin P. A. <co...@co...> - 2007-02-10 10:29:08
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> Colin Paul Adams wrote: >>>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> I had a look at the routines `is_integer_64' and Eric> `to_integer_64' in class KL_STRING_ROUTINES. And there are Eric> two things that I don't like. The first thing is that they Eric> create too many temporary objects (calling `substring', Eric> creating manifest arrays). >> Yes. I wasn't happy with that myself. Eric> The second thing that I don't like is the precondition Eric> "is_integer". I don't think that it is a good idea to have Eric> to write: is_integer (s) and then is_integer_64 (s) >> Well I thought about that. And I thought about the use case I >> have, which is parsing numbers. The tokenizer has already >> created tokens one of which is certified to consist entirely of >> decimal digits (there may be a preceding token consisting of a >> minus sign). In this use case the pre-condition "is_integer" >> is guaranteed true. So I tried to think about other use cases >> for "is_integer_64". All I could think of was parsing, and I >> presumed therefore that the pre-condition would always be >> satisfied. Eric> I think that this assumption is too strong for a general Eric> purpose library class like KL_STRING_ROUTINES. It would have Eric> been OK if this routine was part of your parsing class, but Eric> here with KL_STRING_ROUTINES we cannot make such an Eric> assumption. Then the question is under what circumstances should `is_integer_64' return True? One answer that is inconsistent with your desired post-condition would be an Eiffel INTEGER_64 literal (as it might contain underscores). Likewise it can't contain a leading negative sign. I still don't see any other uses for it. In any case, I don't want to be using a routine that is doing the additional checks, so I think it might be best if I simply remove these routines, and add my own local routines to the XPath parser. -- Colin Adams Preston Lancashire
From: Colin P. A. <co...@co...> - 2007-02-10 12:19:43
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> Colin Paul Adams wrote: >> Then the question is under what circumstances should >> `is_integer_64' return True? One answer that is inconsistent >> with your desired post-condition would be an Eiffel INTEGER_64 >> literal (as it might contain underscores). Likewise it can't >> contain a leading negative sign. Eric> That's true I forgot about that. I have taken that as expressing some more sympathy with my original usage presumption. Therefore I have updated the tests and routines (which should now be more efficient). All tests pass with all three compilers (but I didn't try with assertions on for SE, as I'm getting too short of remaining years). -- Colin Adams Preston Lancashire
From: Eric B. <er...@go...> - 2007-02-10 12:37:59
Colin Paul Adams wrote: >>>>>> "Eric" == Eric Bezault <er...@go...> writes: > > Eric> Colin Paul Adams wrote: > >> Then the question is under what circumstances should > >> `is_integer_64' return True? One answer that is inconsistent > >> with your desired post-condition would be an Eiffel INTEGER_64 > >> literal (as it might contain underscores). Likewise it can't > >> contain a leading negative sign. > > Eric> That's true I forgot about that. > > I have taken that as expressing some more sympathy with my original > usage presumption. Well, I still believe that in a general purpose class like KL_STRING_ROUTINES there should not be any assumption. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
From: Colin P. A. <co...@co...> - 2007-02-23 07:51:12
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> Colin Paul Adams wrote: >> And I thought about the use case I have, which is parsing >> numbers. The tokenizer has already created tokens one of which >> is certified to consist entirely of decimal digits (there may >> be a preceding token consisting of a minus sign). In this use >> case the pre-condition "is_integer" is guaranteed true. So I >> tried to think about other use cases for "is_integer_64". All I >> could think of was parsing, and I presumed therefore that the >> pre-condition would always be satisfied. Eric> I still believe that a routine in KL_STRING_ROUTINES should Eric> not make such an assumption as to whether the string comes from Eric> a parser or not. Actually, the assumption is not that it ALWAYS comes from a parser, but that it usually does. In the other cases, the client can always call: if is_integer and then is_integer_64 then Of course you didn't like this. It wouldn't look quite so bad if it were written as: if is_decimal_digits and then is_integer_64 then and is_decimal_digits is certainly a better name than is_integer, because the latter suggests (to me) a necessary and sufficient condition for converting to INTEGER, which it isn't. But I'm guessing you won't like it any better. Eric> if my_string.count < 19 then use INTEGER_64 else use Eric> MA_DECIMAL end I had thought of that, and indeed it may be the better practical solution (this isn't clear - runtime performance is far more important than XPath parsing time on the whole, but the extra times that MA_DECIMAL is needed are unlikely to occur often). In any case, an is_integer_64 routine is needed as a pre-condition for to_integer_64. And we first need to decide what it should stand for. The most intuitive meaning is that is_integer_64 implies that the contents of the string concerned matches the lexical pattern of an Eiffel INTEGER_64 literal. In which case my initial implementation of to_integer_64 is insufficient.
But then we have an inconsistent story with is_integer. Can we rename this to is_decimal_digits? This will break existing clients, so we can only do that if the next Gobo release is numbered 4.0 (there is a precedent in that the upgrade from 2.0 to 3.0 broke some existing clients). -- Colin Adams Preston Lancashire
From: Colin P. A. <co...@co...> - 2007-02-24 09:18:32
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> For consistency with Eric> `is_hexadecimal' I would name the old feature `is_decimal' Eric> (rather than `is_decimal_digits'). OK. And perhaps the header comment should explain that it returns True only if every character is a decimal digit from the ASCII block of Unicode (there are other sets of decimal digits). Eric> what is in ELKS for `is_integer' to see how we can apply it Eric> to `is_integer_64'. As far as I can see, none of those Eric> accepts underscores (unlike Eiffel INTEGER_64 literals). What about leading +/-? >> But then we have an inconsistent story with is_integer. Can we >> rename this to is_decimal_digits? This will break existing >> clients, so we can only do that if the next Gobo release is >> numbered 4.0 (there is a precedent in that the upgrade from 2.0 >> to 3.0 broke some existing clients). Eric> Do we need to break code now, or go to the "obsolete" stage Eric> before? I guess we can define is_decimal as a duplicate of is_integer for now, and mark is_integer as obsolete for the next release. -- Colin Adams Preston Lancashire
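Colin's remark about other sets of decimal digits matters in practice: a generic Unicode "is digit" test accepts characters that a parser of Eiffel or XPath numbers should reject. A Python illustration of the ASCII-only rule he suggests documenting for `is_decimal` (the function is only a sketch named after the proposed feature; treating the empty string as not decimal is an assumption):

```python
def is_decimal(s: str) -> bool:
    """True if `s` is non-empty and every character is an ASCII digit '0'..'9'.
    Other Unicode decimal digits (Arabic-Indic, fullwidth, ...) are rejected."""
    return len(s) > 0 and all('0' <= c <= '9' for c in s)

arabic_indic_42 = "\u0664\u0662"    # ARABIC-INDIC DIGIT FOUR, ARABIC-INDIC DIGIT TWO
print(arabic_indic_42.isdecimal())  # True: both are in Unicode category Nd
print(is_decimal(arabic_indic_42))  # False: neither is in the ASCII block
```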
From: Eric B. <er...@go...> - 2007-02-24 16:22:13
Colin Paul Adams wrote: >>>>>> "Eric" == Eric Bezault <er...@go...> writes: > > Eric> For consistency with > Eric> `is_hexadecimal' I would name the old feature `is_decimal' > Eric> (rather than `is_decimal_digits'). > > OK. > And perhaps the header comment should explain that it returns True > only if every character is a decimal digit from the ASCII block of > Unicode (there are other sets of decimal digits). OK. That's what the ELKS spec for `is_integer' does: http://www.gobosoft.com/eiffel/nice/elks01/string.html#is_integer I wonder whether ISE (and hence FreeELKS) accepts leading and trailing spaces. > Eric> what is in ELKS for `is_integer' to see how we can apply it > Eric> to `is_integer_64'. As far as I can see, none of those > Eric> accepts underscores (unlike Eiffel INTEGER_64 literals). > > What about leading +/-? They're accepted. > >> But then we have an inconsistent story with is_integer. Can we > >> rename this to is_decimal_digits? This will break existing > >> clients, so we can only do that if the next Gobo release is > >> numbered 4.0 (there is a precedent in that the upgrade from 2.0 > >> to 3.0 broke some existing clients). > > Eric> Do we need to break code now, or go to the "obsolete" stage > Eric> before? > > I guess we can define is_decimal as a duplicate of is_integer for now, > and mark is_integer as obsolete for the next release. Yes. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
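Following Eric's answer that ELKS accepts a leading sign, here is a Python sketch of the ELKS-style rules applied to a 64-bit range. It is not the KL_STRING_ROUTINES code. It illustrates the one subtle point that a sign introduces: the range is asymmetric, so "-9223372036854775808" must be accepted even though "9223372036854775808" must not. Rejecting spaces is an assumption, since the thread leaves that question open.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def is_integer_64(s: str) -> bool:
    """ELKS-like rule applied to 64 bits: an optional leading '+' or '-',
    then one or more ASCII digits, with a value in [INT64_MIN, INT64_MAX].
    Spaces and underscores are rejected (an assumption, not settled in the thread)."""
    digits = s[1:] if s[:1] in ("+", "-") else s
    if not digits or not all("0" <= c <= "9" for c in digits):
        return False
    significant = digits.lstrip("0")
    if len(significant) > 19:          # longer than any INT64 magnitude
        return False
    value = int(significant or "0")
    if s[:1] == "-":
        value = -value
    return INT64_MIN <= value <= INT64_MAX
```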
From: Eric B. <er...@go...> - 2007-03-02 17:49:29
Colin Paul Adams wrote: >>>>>> "Eric" == Eric Bezault <er...@go...> writes: > > Eric> I wonder whether ISE (and hence FreeELKS) accepts leading > Eric> and trailing spaces. > > I've assumed not (this needs checking, but do we need to be > consistent?). We need to be consistent if we want to call the compiler version (which might be more optimized than what we can do) when available. And having {STRING}.to_integer_64 (when available) and {KL_STRING_ROUTINES}.to_integer_64 with different behaviors is just asking for trouble. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
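Eric's point about matching the compiler's own version can be made concrete with a differential test. In this Python analogy (not Eiffel; the function names are hypothetical), Python's built-in `int()` plays the role of a native `{STRING}.to_integer_64`, and it happens to accept exactly the inputs the thread is debating: surrounding spaces, underscores and non-ASCII digits. A wrapper that delegates to a native routine on some compilers and to hand-written code on others would give different answers on these inputs.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def library_is_integer_64(s: str) -> bool:
    """Hand-written check: optional sign, then ASCII digits only, in INT64 range."""
    body = s[1:] if s[:1] in ("+", "-") else s
    if not body or not all("0" <= c <= "9" for c in body):
        return False
    significant = body.lstrip("0")
    if len(significant) > 19:
        return False
    magnitude = int(significant or "0")
    value = -magnitude if s[:1] == "-" else magnitude
    return INT64_MIN <= value <= INT64_MAX

def native_is_integer_64(s: str) -> bool:
    """Stand-in for a compiler's built-in parser: Python's own int()."""
    try:
        value = int(s)
    except ValueError:
        return False
    return INT64_MIN <= value <= INT64_MAX

def disagreements(samples):
    """Inputs on which the two implementations give different answers."""
    return [s for s in samples if library_is_integer_64(s) != native_is_integer_64(s)]

samples = ["42", " 42 ", "1_000", "\u0664\u0662",
           "-9223372036854775808", "9223372036854775808"]
print(ascii(disagreements(samples)))  # [' 42 ', '1_000', '\u0664\u0662']
```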
From: Paul G. C. <pau...@sc...> - 2007-01-28 17:04:06
Hello, Colin Paul Adams wrote: > Under what circumstances is it necessary to call rescale (0, > shared_context) on an MA_DECIMAL representing a zero. > > <snip/> > Prints 3.90000975E+11 > rather than the correct value of 500000500000, so I am wondering if I > need to rescale at some point. This is not a question of rescale. This is a question of setting the correct number of digits desired in the shared decimal context. By default, it's 9. If you set the digits to 20, you'll get the correct answer and it will go much faster because there won't be any overflow handling:

    l_sum.shared_decimal_context.set_digits (20)

Time on my computer: 9 digits: 5.828 seconds; 20 digits: 2.953 seconds. Hope this helps, Paul G. Crismer
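Paul's diagnosis is easy to reproduce with Python's `decimal` module, which, as far as I know, implements the same General Decimal Arithmetic model that MA_DECIMAL follows. The code is an analogy, not the Gobo API, and `add_with_digits` and `sum_to` are hypothetical helpers. With a 9-digit context, any result that needs more significant digits is rounded, so once the running sum passes 999,999,999 each addend starts to lose its low-order digits. With 20 digits, every partial sum of 1..1_000_000 is exact.

```python
from decimal import Decimal, localcontext

def add_with_digits(a, b, digits):
    """Add two numbers as Decimals in a context limited to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(a) + Decimal(b)

def sum_to(n, digits):
    """Sum 1..n, rounding every partial sum to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        total = Decimal(0)
        for i in range(1, n + 1):
            total += Decimal(i)
        return total

# 1000000001 needs 10 digits; with 9, the trailing 1 is rounded away.
print(add_with_digits(1_000_000_000, 1, 9))    # 1.00000000E+9
print(add_with_digits(1_000_000_000, 1, 20))   # 1000000001
print(sum_to(1_000_000, 20))                   # 500000500000
```

The full loop can also be run with 9 digits, but its exact result depends on the rounding mode of the context, so no expected value is given for it here.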
From: Colin P. A. <co...@co...> - 2007-01-28 17:22:00
>>>>> "Paul" == Paul G Crismer <pau...@sc...> writes: Paul> This is not a question of rescale. Paul> This is a question of setting the correct number of digits Paul> desired in the shared decimal context. Of course! I should have remembered. Paul> By default, it's 9. If you set the digits to 20, you'll get Paul> the correct answer and it will go much faster because there Paul> won't be any overflow handling. Paul> l_sum.shared_decimal_context.set_digits (20) Paul> Time on my computer: 9 digits : 5.828 seconds 20 digits : Paul> 2.953 seconds Mine improved from 6.3 to 4.4 seconds. Thanks. -- Colin Adams Preston Lancashire
From: Colin P. A. <co...@co...> - 2007-01-28 21:33:24
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> Colin Paul Adams wrote: >> So I need permission to start using INTEGER_64. Eric> OK. If I remember correctly this will rule out VE, but it Eric> should be OK with the other compilers. I have it mostly working now with ISE and GEC. It gives a 30% speed-up on the original problem, which makes it worthwhile. I've not checked anything in yet as: 1) there are still a few tests failing, and 2) SE fails to compile it as I am using {STRING}.is_integer_64, which SE 1.2r7 doesn't have. As at this point in the code I already know that the string consists of decimal digits only, I could just check the count, and err on the safe side. But I think we will need a STRING_.is_integer_64 routine anyway, assuming we are going to stay with SE for a while. -- Colin Adams Preston Lancashire
From: Berend de B. <be...@po...> - 2007-01-29 00:45:35
>>>>> "Colin" == Colin Paul Adams <co...@co...> writes: Colin> the count, and err on the safe side. But I think we will Colin> need a STRING_.is_integer_64 routine anyway, assuming we Colin> are going to stay with SE for a while. Yes, as far as possible for now please. -- Live long and prosper, Berend de Boer
From: Eric B. <er...@go...> - 2007-02-02 13:21:20
Colin Paul Adams wrote: >>>>>> "Colin" == Colin Paul Adams <co...@co...> writes: > > Eric> Colin Paul Adams wrote: > >>> So I need permission to start using INTEGER_64. > > Eric> OK. If I remember correctly this will rule out VE, but it > Eric> should be OK with the other compilers. > > Colin> SE fails to compile it as I am > Colin> using {STRING}.is_integer_64, which SE 1.2r7 doesn't > Colin> have. As at this point in the code I already know that the > Colin> string consists of decimal digits only, I could just check > Colin> the count, and err on the safe side. But I think we will > Colin> need a STRING_.is_integer_64 routine anyway, assuming we > Colin> are going to stay with SE for a while. > > I've coded STRING_.is_integer_64 and (but not for VE) > STRING_.to_integer_64. > > I'm not sure what to do about writing tests. > > Test_is_integer_64 is no problem. > > But test_to_integer_64 will fail to compile for VE, and > kl_test_string_routines.e is not a ge class. Perhaps a separate class? > > Also, I should really put in a conditional in the .eant files for the > xpath and xslt tests, so as not to run them for VE. > > Or do we just drop VE completely now? We drop VE completely. If it's not for INTEGER_64, it will be for Agents/Tuples. So let's drop it now. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
From: Eric B. <er...@go...> - 2007-02-03 12:40:38
Colin Paul Adams wrote:
>>>>>> "Eric" == Eric Bezault <er...@go...> writes:
>
> Eric> We drop VE completely. If it's not for INTEGER_64, it will
> Eric> be for Agents/Tuples. So let's drop it now.
>
> OK.
>
> There appears to be a problem with gec.
>
> I have the following tests, which se and ise (5.7) pass:
>
>     test_is_integer_64 is
>             -- Test feature `is_integer_64'.
>         local
>             uc_string: UC_UTF8_STRING
>         do
>             assert ("is_integer_64_1", STRING_.is_integer_64 ("1234"))
>             assert ("is_integer_64_2", STRING_.is_integer_64 ("00078"))
>             create uc_string.make_from_string ("4534")
>             assert ("is_integer_64_3", STRING_.is_integer_64 (uc_string))
>             assert ("is_integer_64_4", STRING_.is_integer_64 ("9223372036854775807"))
>             assert ("is_integer_64_5", STRING_.is_integer_64 ("00000000009223372036854775807"))
>             assert ("not_is_integer_64_1", not STRING_.is_integer_64 ("9223372036854775808"))
>             assert ("not_is_integer_64_2", not STRING_.is_integer_64 ("10223372036854775807"))
>             assert ("not_is_integer_64_3", not STRING_.is_integer_64 ("00019223372136854775807"))
>         end
>
>     test_to_integer_64 is
>             -- Test feature `to_integer_64'.
>         local
>             uc_string: UC_UTF8_STRING
>         do
>             assert_equal ("to_integer_64_1", (1234).to_integer_64, STRING_.to_integer_64 ("1234"))
>             assert_equal ("to_integer_64_2", (78).to_integer_64, STRING_.to_integer_64 ("00078"))
>             create uc_string.make_from_string ("4534")
>             assert_equal ("to_integer_64_3", (4534).to_integer_64, STRING_.to_integer_64 (uc_string))
>             assert_equal ("to_integer_64_4", (9223372036854775807).to_integer_64, STRING_.to_integer_64 ("9223372036854775807"))
>             assert_equal ("to_integer_64_5", (9223372036854775807).to_integer_64, STRING_.to_integer_64 ("00000000009223372036854775807"))
>         end
>
> Gec fails one:
>
>     Test Results:
>     FAIL: [KL_TEST_STRING_ROUTINES.test_to_integer_64] to_integer_64_4
>     expected: -1
>     but got: 9223372036854775807
>
> so it looks like GEC's built-in {INTEGER}.to_integer_64 gets the edge
> case wrong.

This might be surprising at first, but the other compilers are wrong. According to ECMA Eiffel, the number 9223372036854775807 in

    (9223372036854775807).to_integer_64

is of type INTEGER. Therefore this cannot work. The problem with GEC (and with the other compilers) is that it should have reported an error at compile time. The ECMA way to do it is:

    {INTEGER_64} 9223372036854775807

but this won't work with SE I think.

Note that I'm surprised that `assert_equal' in your test above works with SE. It accepts ANY as arguments and in SE expanded types (such as INTEGER_64) don't conform to ANY. That's why we introduced `assert_integers_equal', `assert_characters_equal', etc. Or do SE 1.2 and 2.* have a different behavior in this respect?

One way (which is not 100% ECMA compliant but would work with all Eiffel compilers) would be to introduce a routine `assert_integer_64_equal' and write:

    assert_integer_64_equal ("to_integer_64_4", 9223372036854775807, STRING_.to_integer_64 ("9223372036854775807"))

Here it works because the compilers know that the expected type for 9223372036854775807 (the type of the formal argument of the routine) is INTEGER_64. Likewise, this would work:

    i64: INTEGER_64
    ...
    i64 := 9223372036854775807
    assert_equal ("to_integer_64_4", i64, STRING_.to_integer_64 ("9223372036854775807"))

We could also have a feature:

    maximum_integer_64: INTEGER_64

in class KL_PLATFORM. This latter solution would probably be better.

I had a look at the routines `is_integer_64' and `to_integer_64' in class KL_STRING_ROUTINES. And there are two things that I don't like. The first thing is that they create too many temporary objects (calling `substring', creating manifest arrays). I know that Eiffel is a language equipped with GC, but the less work we give to the GC, the faster the program will run. The second thing that I don't like is the precondition "is_integer". I don't think that it is a good idea to have to write:

    is_integer (s) and then is_integer_64 (s)

On the other hand "is_integer" should be a postcondition of `is_integer_64':

    Result implies is_integer (a_string)

-- Eric Bezault mailto:er...@go... http://www.gobosoft.com
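Both of Eric's objections can be met by a single left-to-right scan that validates the digits itself, so that `is_integer` becomes a consequence rather than a precondition, and that never builds a substring or an intermediate number. Below is a Python sketch of the idea for the digits-only case. The names mirror the thread, using the later name `is_decimal` for the old `is_integer`, but the code is illustrative, not the Gobo implementation. Eric's proposed postcondition appears as an assertion.

```python
MAX_INT64_DIGITS = "9223372036854775807"   # 2**63 - 1, exactly 19 digits

def is_decimal(s: str) -> bool:
    return len(s) > 0 and all("0" <= c <= "9" for c in s)

def _scan_is_integer_64(s: str) -> bool:
    n = len(s)
    if n == 0:
        return False
    start = 0
    while start < n and s[start] == "0":    # skip leading zeros by index
        start += 1
    for k in range(start, n):               # every remaining char must be a digit
        if not ("0" <= s[k] <= "9"):
            return False
    significant = n - start
    if significant != 19:
        return significant < 19             # fewer digits always fit; more never do
    for k in range(19):                     # same length: compare digit by digit
        if s[start + k] != MAX_INT64_DIGITS[k]:
            return s[start + k] < MAX_INT64_DIGITS[k]
    return True                             # equal to the maximum

def is_integer_64(s: str) -> bool:
    """Digits-only check (no sign), as in Colin's parser use case."""
    result = _scan_is_integer_64(s)
    assert not result or is_decimal(s)      # postcondition: Result implies is_decimal (s)
    return result
```

Applied to the inputs of Colin's `test_is_integer_64`, this returns True for the first five strings and False for the last three.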
From: Colin P. A. <co...@co...> - 2007-02-10 10:17:36
>>>>> "Eric" == Eric Bezault <er...@go...> writes: Eric> We could also have a feature: Eric> maximum_integer_64: INTEGER_64 Eric> in class KL_PLATFORM. This latter solution would probably be Eric> better. I've done this. -- Colin Adams Preston Lancashire
From: Eric B. <er...@go...> - 2007-02-03 21:07:41
Colin Paul Adams wrote: >>>>>> "Eric" == Eric Bezault <er...@go...> writes: > > Eric> I had a look at the routines `is_integer_64' and > Eric> `to_integer_64' in class KL_STRING_ROUTINES. And there are > Eric> two things that I don't like. The first thing is that they > Eric> create too many temporary objects (calling `substring', > Eric> creating manifest arrays). > > Yes. I wasn't happy with that myself. > > Eric> The second thing that I don't > Eric> like is the precondition "is_integer". I don't think that it > Eric> is a good idea to have to write: > > Eric> is_integer (s) and then is_integer_64 (s) > > Well I thought about that. > And I thought about the use case I have, which is parsing > numbers. The tokenizer has already created tokens one of which is > certified to consist entirely of decimal digits (there may be a > preceding token consisting of a minus sign). > In this use case the pre-condition "is_integer" is guaranteed true. > > So I tried to think about other use cases for "is_integer_64". All I > could think of was parsing, and I presumed therefore that the > pre-condition would always be satisfied. I think that this assumption is too strong for a general purpose library class like KL_STRING_ROUTINES. It would have been OK if this routine was part of your parsing class, but here with KL_STRING_ROUTINES we cannot make such an assumption. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
From: Eric B. <er...@go...> - 2007-02-10 11:09:10
Colin Paul Adams wrote: > Then the question is under what circumstances should `is_integer_64' > return True? > One answer that is inconsistent with your desired post-condition would > be an Eiffel INTEGER_64 literal (as it might contain > underscores). Likewise it can't contain a leading negative sign. That's true I forgot about that. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
From: Eric B. <er...@go...> - 2007-02-17 13:32:59
Colin Paul Adams wrote: > And I thought about the use case I have, which is parsing > numbers. The tokenizer has already created tokens one of which is > certified to consist entirely of decimal digits (there may be a > preceding token consisting of a minus sign). > In this use case the pre-condition "is_integer" is guaranteed true. > > So I tried to think about other use cases for "is_integer_64". All I > could think of was parsing, and I presumed therefore that the > pre-condition would always be satisfied. I still believe that a routine in KL_STRING_ROUTINES should not make such an assumption as to whether the string comes from a parser or not. If you don't want to parse the string twice, then ask your tokenizer to make the segregation between numbers that fit into INTEGER_64 and those that don't. And if this appears to be too complicated, then there is another solution. If I understood correctly the whole idea is to improve performance by using INTEGER_64 instead of MA_DECIMAL whenever possible. But should you use INTEGER_64 every time it is possible, or have a heuristic which is even faster than your current implementation of `is_integer_64' and will work most of the time? A possible heuristic for example is just to check the number of characters in your string:

    if my_string.count < 19 then
        use INTEGER_64
    else
        use MA_DECIMAL
    end

This check is just a heuristic. It will not catch numbers with leading zeros, nor numbers between 1000000000000000000 and 9223372036854775807, even though they would fit into INTEGER_64. But it's super-fast. So it might be worth using that, even if for some numbers you will use MA_DECIMAL instead of INTEGER_64. But hey, we have to make trade-offs and stop somewhere. For example in your current implementation there are numbers that could fit into a NATURAL_64 but are handled as MA_DECIMAL because they don't fit into INTEGER_64. -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
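Eric's length test is safe because the largest 18-character digit string, 999999999999999999, is below 2^63 - 1 = 9223372036854775807, and its negation is well inside the INTEGER_64 range too. The Python sketch below shows both the guarantee and the two kinds of value that are sent to MA_DECIMAL unnecessarily. The representation names are only labels, `choose_representation` is a hypothetical name, and tokens are assumed to be digit-only with any minus sign split off, as in Colin's tokenizer.

```python
INT64_MAX = 2**63 - 1

def choose_representation(digits: str) -> str:
    """Eric's heuristic: a digit token shorter than 19 characters always fits INTEGER_64.
    `digits` is assumed to contain only ASCII digits (any sign is a separate token)."""
    return "INTEGER_64" if len(digits) < 19 else "MA_DECIMAL"

# Why it is safe: the largest 18-character token is below 2**63 - 1.
assert int("9" * 18) < INT64_MAX

print(choose_representation("123456789012345678"))  # INTEGER_64
print(choose_representation(str(10**18)))           # MA_DECIMAL, although 10**18 fits
print(choose_representation("0" * 17 + "42"))       # MA_DECIMAL: leading zeros count
```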
From: Eric B. <er...@go...> - 2007-02-23 14:35:11
Colin Paul Adams wrote: > and is_decimal_digits is certainly a better name than is_integer, > because the latter suggests (to me) a necessary and sufficient > condition for converting to INTEGER, which it isn't. Class KL_STRING_ROUTINES was initially written to have a compiler independent version of features that are in class STRING. At the time of writing, feature `is_integer' was what was available in some compilers but not in others. ELKS fixed the specification of `is_integer' and now it is clear that we should rename this feature in KL_STRING_ROUTINES and provide another implementation (following the ELKS spec) for `is_integer'. In fact when `is_integer' was introduced in KL_STRING_ROUTINES it was to be used as a precondition of STRING.to_integer, so this modification will just fix a potential bug (old versions of compilers were not dealing with the fact that the value had to fit into an INTEGER). For consistency with `is_hexadecimal' I would name the old feature `is_decimal' (rather than `is_decimal_digits'). > In any case, an is_integer_64 routine is needed as a pre-condition for > to_integer_64. > > And we first need to decide what it should stand for. > > The most intuitive meaning is that is_integer_64 implies that the > contents of the string concerned matches the lexical pattern of an > Eiffel INTEGER_64 literal. In which case my initial implementation of > to_integer_64 is insufficient. I don't think that KL_STRING_ROUTINES is there to reinvent the wheel (we have the string library for that). In KL_STRING_ROUTINES we should just make routines compiler independent. So we should just look at what is already provided in Eiffel compilers supporting this feature, and what is in ELKS for `is_integer' to see how we can apply it to `is_integer_64'. As far as I can see, none of those accepts underscores (unlike Eiffel INTEGER_64 literals).
> But then we have an inconsistent story with is_integer. Can we rename > this to is_decimal_digits? This will break existing clients, so we can > only do that if the next Gobo release is numbered 4.0 (there is a > precedent in that the upgrade from 2.0 to 3.0 broke some existing clients). Do we need to break code now, or go to the "obsolete" stage before? -- Eric Bezault mailto:er...@go... http://www.gobosoft.com
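Since both agree that `is_integer_64` must be the precondition of `to_integer_64`, a contract-style pair might look like the following Python sketch. The names and code are hypothetical, and `assert` stands in for Eiffel's `require`. One detail worth carrying into any real implementation: accumulating the value as a negative number lets "-9223372036854775808" be converted without an intermediate overflow, because its magnitude has no positive INTEGER_64 counterpart.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def is_integer_64(s: str) -> bool:
    """Precondition: optional sign, ASCII digits, value in the INT64 range."""
    body = s[1:] if s[:1] in ("+", "-") else s
    if not body or not all("0" <= c <= "9" for c in body):
        return False
    significant = body.lstrip("0")
    if len(significant) > 19:
        return False
    magnitude = int(significant or "0")
    return (-magnitude >= INT64_MIN) if s[:1] == "-" else (magnitude <= INT64_MAX)

def to_integer_64(s: str) -> int:
    assert is_integer_64(s), "precondition violated: is_integer_64 (s)"
    negative = s[0] == "-"
    start = 1 if s[0] in ("+", "-") else 0
    acc = 0
    for c in s[start:]:
        # Accumulate as a non-positive value: INT64_MIN has no positive
        # INTEGER_64 counterpart, but every valid magnitude fits as a negative.
        acc = acc * 10 - (ord(c) - ord("0"))
        assert acc >= INT64_MIN   # simulates the 64-bit bound; never fires for valid input
    return acc if negative else -acc
```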