From: Eric B. <gob...@if...> - 2002-04-29 21:06:00
Franck Arnaud wrote:
>
> Once Eric's UC_STRING (that inherits from STRING) is in, it will
> be trivially possible to have UC_STRINGs only for strings
> that actually contain >127 characters.

I finally committed the new implementation of UC_STRING in CVS. I ran the bootstrap and the test cases in debug mode (i.e. with all assertions on) with no problems. It was under Windows NT with:

    ISE 5.1.14
    HACT 4.0.1
    VE 4.0 (build 4001)
    SE -0.74b21

and:

    MSVC 6.0

I took this opportunity to add to the global test procedure (i.e. what runs when one executes 'geant test_*' in $GOBO/test/) the test cases for the XML library using Oasis, as well as the compilation of the XML examples. I ran into some problems with the Oasis-based test cases for the XML library, but I got the same problems with the old implementation of UC_STRING, so I will report them to this mailing list in a separate message since they are not related to the new UC_STRING.

As already mentioned many times in this mailing list, having UC_STRING inherit from STRING is not a great design, to say the least. What we want is a common class interface for our XML library whether it uses STRING or UC_STRING, and hence to avoid duplicated code. A better design would probably be a common ancestor for STRING and UC_STRING sharing that interface, but no such class exists in ELKS. So having UC_STRING inherit from STRING is a workaround.

If you were using the old implementation of UC_STRING, the major changes are that UC_CHARACTER is no longer expanded (so make sure to create it explicitly now), and that UC_STRING is now deferred. It currently has one concrete descendant, UC_UTF8_STRING, but 16-bit and 32-bit implementations will follow. Unless you explicitly want to create a UTF-8 string, it is recommended to keep declaring Unicode strings as UC_STRING and to create them with the factory routine UC_UNICODE_FACTORY.new_unicode_string (other factory routines with different signatures could be added in the future).
This makes it easier for a project to switch from one Unicode encoding to another by modifying just one routine (or a small set of routines).

The routines in the new UC_STRING try to follow those in ELKS 2001 STRING. I also added routines from the old UC_STRING and marked them as obsolete to make the transition smoother. If there is a routine in the old UC_STRING that you used to call and that is no longer available in the new UC_STRING, just let me know and I'll try to add it to the new UC_STRING, marked as obsolete.

Now, for those who want to take advantage of the fact that UC_STRING inherits from STRING, please note that the only routines guaranteed to be portable and polymorphically available in both STRING and UC_STRING are those listed in KS_STRING (in $GOBO/library/kernel/elks/). So when writing a routine that accepts STRINGs but where UC_STRINGs are expected, please use only these routines.

Note that routines in STRING which assumed that their arguments were characters with codes less than `Maximum_character_code' have been renamed with 'latin1' in their names in class UC_STRING, even though there is clearly no guarantee that characters with codes between 128 and 255 are encoded using Latin-1. When STRING.item is called polymorphically on a UC_STRING and the character has a code greater than 255, '%U' is returned. Therefore it is recommended to handle character codes (i.e. INTEGER) instead of CHARACTER (or even UC_CHARACTER: since UC_CHARACTER is not expanded, a new object is created each time, which costs both time and memory). For that there is STRING.item_code, and many other routines in UC_STRING with names containing 'code' (e.g. `append_code'). I started doing this in the Regexp library to make it Unicode aware, and it seems to work quite well, in terms of correctness as well as performance and memory usage (when compiled with SE with no GC).
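The Python sketch below models the class relationships described above; it is not Gobo code, and it assumes nothing about the real Eiffel signatures. KsString, UcString, UcUtf8String, new_unicode_string and count_code are hypothetical stand-ins for STRING, UC_STRING, UC_UTF8_STRING, UC_UNICODE_FACTORY.new_unicode_string and a client routine restricted to the KS_STRING interface. It shows a deferred UC_STRING with a single UTF-8 descendant, one factory routine as the only place that names the encoding, `item` falling back to '%U' (modelled as NUL) for codes above 255, and a client routine that stays portable by using only `count` and `item_code`.

```python
from abc import ABC, abstractmethod

# Stands in for Eiffel's '%U' (the null character).
NULL_CHAR = "\0"


class KsString:
    """Hypothetical model of ELKS STRING: 1-based positions, codes 0..255."""

    def __init__(self, text: str = "") -> None:
        if any(ord(c) > 255 for c in text):
            raise ValueError("a plain STRING only holds codes 0..255")
        self._codes = [ord(c) for c in text]

    def _check_index(self, i: int) -> None:
        if not 1 <= i <= self.count():
            raise IndexError(f"index {i} out of 1..{self.count()}")

    def count(self) -> int:
        return len(self._codes)

    def item_code(self, i: int) -> int:
        self._check_index(i)
        return self._codes[i - 1]

    def item(self, i: int) -> str:
        return chr(self.item_code(i))


class UcString(KsString, ABC):
    """Deferred, like the new UC_STRING: descendants choose the encoding."""

    @abstractmethod
    def count(self) -> int: ...

    @abstractmethod
    def item_code(self, i: int) -> int: ...

    @abstractmethod
    def append_code(self, code: int) -> None: ...

    def item(self, i: int) -> str:
        # A CHARACTER cannot hold codes above 255, so the polymorphic
        # `item` falls back to '%U' for them; `item_code` does not.
        code = self.item_code(i)
        return chr(code) if code <= 255 else NULL_CHAR


class UcUtf8String(UcString):
    """Concrete descendant storing UTF-8 bytes, like UC_UTF8_STRING."""

    def __init__(self, text: str = "") -> None:
        self._bytes = bytearray(text.encode("utf-8"))

    def _decoded(self) -> str:
        # Simple but O(n) per call; a real implementation would walk the bytes.
        return self._bytes.decode("utf-8")

    def count(self) -> int:
        return len(self._decoded())

    def item_code(self, i: int) -> int:
        self._check_index(i)
        return ord(self._decoded()[i - 1])

    def append_code(self, code: int) -> None:
        self._bytes.extend(chr(code).encode("utf-8"))


def new_unicode_string(text: str = "") -> UcString:
    """Hypothetical analogue of UC_UNICODE_FACTORY.new_unicode_string.

    Switching the project to another encoding means editing only this line.
    """
    return UcUtf8String(text)


def count_code(s: KsString, code: int) -> int:
    """Client routine using only `count` and `item_code`, so it is portable."""
    return sum(1 for i in range(1, s.count() + 1) if s.item_code(i) == code)


u = new_unicode_string("a\u00e9\u20ac")  # 'a', e-acute (233), euro sign (8364)
u.append_code(0x20AC)
print(u.count())                               # 4
print(u.item_code(3))                          # 8364
print(repr(u.item(3)))                         # '\x00'
print(repr(u.item(2)))                         # 'é'
print(count_code(u, 0x20AC))                   # 2
print(count_code(KsString("abca"), ord("a")))  # 2
```

Because `count_code` only calls `count` and `item_code`, it gives the right answer on both plain and Unicode strings; a version comparing the results of `item` would confuse every character above 255 with '%U'.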
As already discussed with Franck, some of the routines of STRING (even those listed in KS_STRING) will cause problems (probably a run-time crash) when the target is dynamically attached to a STRING and the argument is dynamically attached to a UC_STRING. This is because the implementations of STRING provided by the Eiffel vendors are not aware of the Unicode encoding used in UC_STRING. To work around this problem, helper routines will be provided when possible, such as a `concat' routine to call instead of `append_string', as already explained during the discussion on this topic with Franck. These routines are not available yet.

PS: A test case for the new Unicode classes is available in UC_TEST_UTF8_STRING in $GOBO/test/kernel/. As a reminder, there is also a test case for the routines listed in KS_STRING, run against the class STRING provided by each Eiffel compiler. This test case is in KS_TEST_STRING.

PPS: Because of a bug in VE 4.0, the new UC_STRING does not work when compiling with the inlining optimization. This bug prevents polymorphic calls of `put' and `item' in class STRING. So until this bug is fixed, it is recommended not to use VE's ESD inlining optimization option.

--
Eric Bezault
mailto:er...@go...
http://www.gobosoft.com