From: <tj...@us...> - 2009-01-27 13:17:05
Revision: 11511
          http://alleg.svn.sourceforge.net/alleg/?rev=11511&view=rev
Author:   tjaden
Date:     2009-01-27 13:00:38 +0000 (Tue, 27 Jan 2009)

Log Message:
-----------
Start UTF-8 string routines, based on bstrlib for the underlying storage.

This initial commit sets the foundation for the types. Functions will be
added incrementally, with test cases and documentation.

Modified Paths:
--------------
    allegro/branches/4.9/cmake/FileList.cmake
    allegro/branches/4.9/docs/src/refman/CMakeLists.txt
    allegro/branches/4.9/docs/src/refman/inc.a.txt
    allegro/branches/4.9/docs/src/refman/index.txt
    allegro/branches/4.9/examples/CMakeLists.txt

Added Paths:
-----------
    allegro/branches/4.9/docs/src/refman/utf8.txt
    allegro/branches/4.9/examples/ex_utf8.c
    allegro/branches/4.9/include/allegro5/utf8.h
    allegro/branches/4.9/src/utf8.c

Modified: allegro/branches/4.9/cmake/FileList.cmake
===================================================================
--- allegro/branches/4.9/cmake/FileList.cmake 2009-01-27 12:59:22 UTC (rev 11510)
+++ allegro/branches/4.9/cmake/FileList.cmake 2009-01-27 13:00:38 UTC (rev 11511)
@@ -31,6 +31,7 @@
     src/timernu.c
     src/tls.c
     src/unicode.c
+    src/utf8.c
     src/misc/bstrlib.c
     src/misc/vector.c
     )

Modified: allegro/branches/4.9/docs/src/refman/CMakeLists.txt
===================================================================
--- allegro/branches/4.9/docs/src/refman/CMakeLists.txt 2009-01-27 12:59:22 UTC (rev 11510)
+++ allegro/branches/4.9/docs/src/refman/CMakeLists.txt 2009-01-27 13:00:38 UTC (rev 11511)
@@ -14,12 +14,13 @@
     mouse
     opengl
     path
+    primitives
     state
     system
     threads
     time
     timer
-    primitives
+    utf8
     )
 
 set(PAGES_TXT)

Modified: allegro/branches/4.9/docs/src/refman/inc.a.txt
===================================================================
--- allegro/branches/4.9/docs/src/refman/inc.a.txt 2009-01-27 12:59:22 UTC (rev 11510)
+++ allegro/branches/4.9/docs/src/refman/inc.a.txt 2009-01-27 13:00:38 UTC (rev 11511)
@@ -21,6 +21,7 @@
 * [Threads](threads.html)
 * [Time](time.html)
 * [Timer](timer.html)
+* [UTF-8](utf8.html)
 * [Audio addon](kcm_audio.html)
 * [Color addon](color.html)
 * [Font addons](font.html)

Modified: allegro/branches/4.9/docs/src/refman/index.txt
===================================================================
--- allegro/branches/4.9/docs/src/refman/index.txt 2009-01-27 12:59:22 UTC (rev 11510)
+++ allegro/branches/4.9/docs/src/refman/index.txt 2009-01-27 13:00:38 UTC (rev 11511)
@@ -20,6 +20,7 @@
 * [Threads](threads.html)
 * [Time](time.html)
 * [Timer](timer.html)
+* [UTF-8](utf8.html)
 
 # Addons
 

Added: allegro/branches/4.9/docs/src/refman/utf8.txt
===================================================================
--- allegro/branches/4.9/docs/src/refman/utf8.txt (rev 0)
+++ allegro/branches/4.9/docs/src/refman/utf8.txt 2009-01-27 13:00:38 UTC (rev 11511)
@@ -0,0 +1,95 @@
+% UTF-8 string routines
+
+Here we should give a short overview of Unicode/UCS and in particular UTF-8
+encoding.
+
+Explain about code points and relationship to "characters".
+
+Explain that pos parameters are in byte offsets, not code point indices.
+
+Explain about half-open intervals.
+
+
+# Creating and destroying strings
+
+## API: al_ustr_new
+
+Create a new string from a C-style string. It must be freed with [al_ustr_free].
+
+
+## API: al_ustr_free
+
+Free a previously allocated string.
+
+
+## API: al_cstr
+
+Get a `char *` pointer to the data in a string. This pointer will only be
+valid while the underlying string is not modified and not destroyed.
+The pointer may be passed to functions expecting C-style strings,
+with the following caveats:
+
+* ALLEGRO_USTRs are allowed to contain embedded NUL ('\0') bytes.
+  That means `al_ustr_size(u)` and `strlen(al_cstr(u))` may not agree.
+
+* An ALLEGRO_USTR may be created in such a way that it is not NUL terminated.
+  A string which is dynamically allocated will always be NUL terminated,
+  but a string which references the middle of another string or region
+  of memory will *not* be NUL terminated.
+
+
+# Predefined strings
+
+## API: al_ustr_empty_string
+
+Return a pointer to a static empty string. The string is read only.
+
+
+# Creating strings by referencing other data
+
+## API: al_ref_cstr
+
+Create a string that references the storage of a C-style string. The
+information about the string (e.g. its size) is stored in the structure
+pointed to by the `info` parameter. The string will not have any other
+storage allocated of its own, so if you allocate the `info` structure on the
+stack then no explicit "free" operation is required.
+
+The string is valid until the underlying C string disappears.
+
+Example:
+
+    ALLEGRO_USTR_INFO info;
+    ALLEGRO_USTR us = al_ref_cstr(&info, "my string");
+
+
+## API: al_ref_buffer
+
+Like [al_ref_cstr] but the size of the string data is passed in as a
+parameter. Hence you can use it to reference only part of a string or an
+arbitrary region of memory.
+
+The string is valid while the underlying memory is valid.
+
+
+## API: al_ref_ustr
+
+Create a read-only string that references the storage of another string.
+The information about the string (e.g. its size) is stored in the structure
+pointed to by the `info` parameter. The string will not have any other
+storage allocated of its own, so if you allocate the `info` structure on the
+stack then no explicit "free" operation is required.
+
+The referenced interval is \[start_pos, end_pos).
+
+The string is valid until the underlying string is modified or destroyed.
+
+
+# Sizes and offsets
+
+## API: al_ustr_size
+
+Return the size of the string in bytes. This is equal to the number of code
+points in the string if the string is empty or contains only 7-bit ASCII
+characters.
+

Modified: allegro/branches/4.9/examples/CMakeLists.txt
===================================================================
--- allegro/branches/4.9/examples/CMakeLists.txt 2009-01-27 12:59:22 UTC (rev 11510)
+++ allegro/branches/4.9/examples/CMakeLists.txt 2009-01-27 13:00:38 UTC (rev 11511)
@@ -83,6 +83,7 @@
 example(ex_threads2)
 example(ex_timedwait)
 example(ex_user_events)
+example(ex_utf8)
 
 if(SUPPORT_D3D)
     example(ex_d3d ex_d3d.cpp d3dx9)

Added: allegro/branches/4.9/examples/ex_utf8.c
===================================================================
--- allegro/branches/4.9/examples/ex_utf8.c (rev 0)
+++ allegro/branches/4.9/examples/ex_utf8.c 2009-01-27 13:00:38 UTC (rev 11511)
@@ -0,0 +1,126 @@
+/*
+ *    Example program for the Allegro library.
+ *
+ *    Test UTF-8 string routines.
+ */
+
+#include <allegro5/allegro5.h>
+#include <allegro5/utf8.h>
+#include <stdio.h>
+
+typedef void (*test_t)(void);
+
+int error = 0;
+
+#define CHECK(x) \
+   do { \
+      bool ok = (x); \
+      if (!ok) { \
+         printf("FAIL %s\n", #x); \
+         error++; \
+      } else { \
+         printf("OK %s\n", #x); \
+      } \
+   } while (0)
+
+/* Test that we can create and destroy strings and get their data and size. */
+void t1(void)
+{
+   ALLEGRO_USTR us1 = al_ustr_new("");
+   ALLEGRO_USTR us2 = al_ustr_new("áƵ");
+
+   CHECK(0 == strcmp(al_cstr(us1), ""));
+   CHECK(0 == strcmp(al_cstr(us2), "áƵ"));
+   CHECK(4 == al_ustr_size(us2));
+
+   al_ustr_free(us1);
+   al_ustr_free(us2);
+}
+
+void t2(void)
+{
+   CHECK(0 == al_ustr_size(al_ustr_empty_string()));
+   CHECK(0 == strcmp(al_cstr(al_ustr_empty_string()), ""));
+}
+
+/* Test that we can make strings which reference other C strings. */
+/* No memory needs to be freed. */
+void t3(void)
+{
+   ALLEGRO_USTR_INFO info;
+   ALLEGRO_USTR us = al_ref_cstr(&info, "A static string.");
+
+   CHECK(0 == strcmp(al_cstr(us), "A static string."));
+}
+
+/* Test that we can make strings which reference arbitrary memory blocks. */
+/* No memory needs to be freed. */
+void t4(void)
+{
+   const char s[] = "This contains an embedded NUL: \0 <-- here";
+   ALLEGRO_USTR_INFO info;
+   ALLEGRO_USTR us = al_ref_buffer(&info, s, sizeof(s));
+
+   CHECK(al_ustr_size(us) == sizeof(s));
+   CHECK(0 == memcmp(al_cstr(us), s, sizeof(s)));
+}
+
+/* Test that we can make strings which reference (parts of) other strings. */
+void t5(void)
+{
+   ALLEGRO_USTR us1;
+   ALLEGRO_USTR us2;
+   ALLEGRO_USTR_INFO us2_info;
+
+   us1 = al_ustr_new("aábdðeéfghiíjklmnoóprstuúvxyýþæö");
+
+   us2 = al_ref_ustr(&us2_info, us1, 36, 36 + 4);
+   CHECK(0 == memcmp(al_cstr(us2), "þæ", al_ustr_size(us2)));
+
+   /* Left pos underflow */
+   us2 = al_ref_ustr(&us2_info, us1, -10, 7);
+   CHECK(0 == memcmp(al_cstr(us2), "aábdð", al_ustr_size(us2)));
+
+   /* Right pos overflow */
+   us2 = al_ref_ustr(&us2_info, us1, 36, INT_MAX);
+   CHECK(0 == memcmp(al_cstr(us2), "þæö", al_ustr_size(us2)));
+
+   al_ustr_free(us1);
+}
+
+/*---------------------------------------------------------------------------*/
+
+const test_t all_tests[] =
+{
+   NULL, t1, t2, t3, t4, t5
+};
+
+#define NUM_TESTS (int)(sizeof(all_tests) / sizeof(all_tests[0]))
+
+int main(int argc, const char *argv[])
+{
+   int i;
+
+   if (argc < 2) {
+      for (i = 1; i < NUM_TESTS; i++) {
+         printf("# t%d\n\n", i);
+         all_tests[i]();
+         printf("\n");
+      }
+   }
+   else {
+      i = atoi(argv[1]);
+      if (i > 0 && i < NUM_TESTS) {
+         all_tests[i]();
+      }
+   }
+
+   if (error) {
+      exit(EXIT_FAILURE);
+   }
+
+   return 0;
+}
+END_OF_MAIN()
+
+/* vim: set sts=3 sw=3 et: */

Added: allegro/branches/4.9/include/allegro5/utf8.h
===================================================================
--- allegro/branches/4.9/include/allegro5/utf8.h (rev 0)
+++ allegro/branches/4.9/include/allegro5/utf8.h 2009-01-27 13:00:38 UTC (rev 11511)
@@ -0,0 +1,123 @@
+#ifndef __al_included_utf8_h
+#define __al_included_utf8_h
+
+#include "allegro5/base.h"
+
+#ifdef __cplusplus
+   extern "C" {
+#endif
+
+typedef struct ALLEGRO_USTR ALLEGRO_USTR;
+typedef const struct ALLEGRO_USTR_INFO ALLEGRO_USTR_INFO;
+
+struct ALLEGRO_USTR {
+   struct _al_tagbstring *b; /* internal */
+};
+
+struct ALLEGRO_USTR_INFO {
+   /* This struct needs to be at least as big as struct _al_tagbstring. */
+   int __pad[4];
+};
+
+/* Creating strings */
+AL_FUNC(ALLEGRO_USTR, al_ustr_new, (const char *s));
+AL_FUNC(void, al_ustr_free, (ALLEGRO_USTR us));
+AL_FUNC(const char *, al_cstr, (ALLEGRO_USTR us));
+
+/* Predefined string */
+AL_FUNC(ALLEGRO_USTR, al_ustr_empty_string, (void));
+
+/* Reference strings */
+AL_FUNC(ALLEGRO_USTR, al_ref_cstr, (ALLEGRO_USTR_INFO *info, const char *s));
+AL_FUNC(ALLEGRO_USTR, al_ref_buffer, (ALLEGRO_USTR_INFO *info, const char *s,
+      size_t size));
+AL_FUNC(ALLEGRO_USTR, al_ref_ustr, (ALLEGRO_USTR_INFO *info, ALLEGRO_USTR us,
+      int start_pos, int end_pos));
+
+/* Sizes and offsets */
+AL_FUNC(size_t, al_ustr_size, (ALLEGRO_USTR us));
+
+/* To be added:
+
+UTF-8 HELPERS
+
+   al_utf8_width
+   al_utf8_encode
+
+CREATE
+
+   al_ustr_newf
+   al_ustr_dup
+   al_ustr_dup_substr
+   al_cstr_dup
+
+LENGTH AND OFFSET
+
+   al_ustr_length
+   al_ustr_offset
+   al_ustr_next
+   al_ustr_prev
+
+GET CODE POINTS
+
+   al_ustr_get
+   al_ustr_get_next
+   al_ustr_prev_get
+
+INSERT
+
+   al_ustr_insert_chr
+   al_ustr_insert
+   al_ustr_insert_cstr
+
+APPEND
+
+   al_ustr_append_chr
+   al_ustr_append
+   al_ustr_append_cstr
+   al_ustr_appendf
+   al_ustr_vappendf
+
+REMOVE
+
+   al_ustr_remove_chr
+   al_ustr_remove_range
+   al_ustr_truncate
+   al_ustr_ltrim_ws
+   al_ustr_rtrim_ws
+   al_ustr_trim_ws
+
+REPLACE
+
+   al_ustr_set_char
+   al_ustr_replace_from
+   al_ustr_replace_range
+   al_ustr_assign
+   al_ustr_assign_mid (too similar to replace_range?)
+
+SEARCHING
+
+   al_ustr_find_chr
+   al_ustr_rfind_chr
+   al_ustr_find_any
+   al_ustr_find_str
+   al_ustr_find_span
+   al_ustr_find_cspan
+
+COMPARE
+
+   al_ustr_compare
+   al_ustr_ncompare
+   al_ustr_equal
+   al_ustr_has_prefix
+   al_ustr_has_suffix
+
+*/
+
+#ifdef __cplusplus
+   }
+#endif
+
+#endif
+
+/* vim: set sts=3 sw=3 et: */

Added: allegro/branches/4.9/src/utf8.c
===================================================================
--- allegro/branches/4.9/src/utf8.c (rev 0)
+++ allegro/branches/4.9/src/utf8.c 2009-01-27 13:00:38 UTC (rev 11511)
@@ -0,0 +1,103 @@
+/*         ______   ___    ___
+ *        /\  _  \ /\_ \  /\_ \
+ *        \ \ \L\ \\//\ \ \//\ \      __     __   _ __   ___
+ *         \ \  __ \ \ \ \  \ \ \   /'__`\ /'_ `\/\`'__\/ __`\
+ *          \ \ \/\ \ \_\ \_ \_\ \_/\  __//\ \L\ \ \ \//\ \L\ \
+ *           \ \_\ \_\/\____\/\____\ \____\ \____ \ \_\\ \____/
+ *            \/_/\/_/\/____/\/____/\/____/\/___L\ \/_/ \/___/
+ *                                           /\____/
+ *                                           \_/__/
+ *
+ *      UTF-8 string handling functions.
+ *
+ *      By Peter Wang.
+ *
+ *      See LICENSE.txt for copyright information.
+ */
+
+
+#include "allegro5/allegro5.h"
+#include "allegro5/utf8.h"
+#include "allegro5/internal/bstrlib.h"
+
+
+/* Function: al_ustr_new
+ */
+ALLEGRO_USTR al_ustr_new(const char *s)
+{
+   return (ALLEGRO_USTR) { _al_bfromcstr(s) };
+}
+
+
+/* Function: al_ustr_free
+ */
+void al_ustr_free(ALLEGRO_USTR us)
+{
+   _al_bdestroy(us.b);
+}
+
+
+/* Function: al_cstr
+ */
+const char *al_cstr(ALLEGRO_USTR us)
+{
+   /* May or may not be NUL terminated. */
+   return _al_bdata(us.b);
+}
+
+
+/* Function: al_ustr_empty_string
+ */
+ALLEGRO_USTR al_ustr_empty_string(void)
+{
+   static struct _al_tagbstring empty = _al_bsStatic("");
+   return (ALLEGRO_USTR) { &empty };
+}
+
+
+/* Function: al_ref_cstr
+ */
+ALLEGRO_USTR al_ref_cstr(ALLEGRO_USTR_INFO *info, const char *s)
+{
+   struct _al_tagbstring *tb = (struct _al_tagbstring *) info;
+   ASSERT(info);
+   ASSERT(s);
+
+   _al_btfromcstr(*tb, s);
+   return (ALLEGRO_USTR) { tb };
+}
+
+
+/* Function: al_ref_buffer
+ */
+ALLEGRO_USTR al_ref_buffer(ALLEGRO_USTR_INFO *info, const char *s, size_t size)
+{
+   struct _al_tagbstring *tb = (struct _al_tagbstring *) info;
+   ASSERT(s);
+
+   _al_blk2tbstr(*tb, s, size);
+   return (ALLEGRO_USTR) { tb };
+}
+
+
+/* Function: al_ref_ustr
+ */
+ALLEGRO_USTR al_ref_ustr(ALLEGRO_USTR_INFO *info, const ALLEGRO_USTR us,
+   int start_pos, int end_pos)
+{
+   struct _al_tagbstring *tb = (struct _al_tagbstring *) info;
+
+   _al_bmid2tbstr(*tb, us.b, start_pos, end_pos - start_pos);
+   return (ALLEGRO_USTR) { tb };
+}
+
+
+/* Function: al_ustr_size
+ */
+size_t al_ustr_size(ALLEGRO_USTR us)
+{
+   return _al_blength(us.b);
+}
+
+
+/* vim: set sts=3 sw=3 et: */