Henrique - 2007-08-18

I have created a very basic utility called gstrings:
- it is similar to the well-known 'strings' utility, but instead of rejecting a character from a string because it is accented (like 'á' or 'ç', which are perfectly valid in Latin languages), it rejects only Unicode control characters (such as byte 0x80, which ISO-8859-1 maps to the C1 control code U+0080).
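A minimal sketch of the idea in Python. This is not the actual gstrings source, which isn't shown here. Because every ISO-8859-1 byte value equals its Unicode code point, a byte can be rejected exactly when it falls in the Unicode control ranges 0x00 to 0x1F or 0x7F to 0x9F. The function names and the 4-character minimum run length (borrowed from classic 'strings') are assumptions.

```python
import sys

MIN_LEN = 4  # assumption: same default minimum run length as classic `strings`


def is_string_byte(b: int) -> bool:
    """True unless the ISO-8859-1 byte decodes to a Unicode control character (Cc)."""
    # In ISO-8859-1 each byte value equals its Unicode code point, so the Cc
    # ranges U+0000..U+001F and U+007F..U+009F map directly onto byte values.
    return not (b < 0x20 or 0x7F <= b <= 0x9F)


def latin1_strings(data: bytes, min_len: int = MIN_LEN) -> list[str]:
    """Return runs of at least min_len non-control bytes, decoded as ISO-8859-1."""
    found = []
    start = None  # index where the current run began, or None if not in a run
    for i, b in enumerate(data):
        if is_string_byte(b):
            if start is None:
                start = i
        else:
            # A control byte ends the current run; keep it only if long enough.
            if start is not None and i - start >= min_len:
                found.append(data[start:i].decode("iso-8859-1"))
            start = None
    # Flush a run that reaches the end of the data.
    if start is not None and len(data) - start >= min_len:
        found.append(data[start:].decode("iso-8859-1"))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for s in latin1_strings(f.read()):
            print(s)
```

Saved under a hypothetical name such as `gstrings_sketch.py`, it would be run like the real tool: `python gstrings_sketch.py ../etc/test.txt`. Accented letters like 'ç' (0xE7) stay inside strings, while newlines (0x0A) and C1 bytes (0x80 to 0x9F) split them.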

Here's an example:

+++clip+++
[henrique@fuji src]$ ./gstrings ../etc/test.txt

2007 Prized Season and Sons

[[en: english, ISO8859-1]]

The UniCode consortium (www.unicode.org) contains multiple definitions of characters, called 'Unicode'.

[[pt: portuguese, ISO8859-1]]

O consórcio UniCode (www.unicode.org) contém multiplas

definições de caracteres, chamados 'Unicode'.

INSTALAÇÃO DEFAULT

As directorias de instalação default são:

     /usr/share/unicodeplus   => caminho de base ('Base path')

[[fr: french, ISO8859-1]]

Le consortium UniCode (www.unicode.org) contient multiples définitions de caractères, s'appelle 'Unicode '.

[[es: spanish, ISO8859-1]]

El consorcio de UniCode (www.unicode.org) contiene las múltiples definiciones de caracteres, llamadas 'Unicode'.

[Cataluña]

[[SOME MATHEMATICAL CHARS]]

2 x 3 can be written as: 2×3 (where '×' is the multiplicatin sign, ASCII 215d).

P
+++clip+++
If you ran the classic 'strings' on this ISO-8859-1 Latin text, you would see many words cut apart at their accented characters.
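To illustrate why words get cut, here is a small Python comparison, assuming classic 'strings' accepts only printable 7-bit ASCII plus tab, which is the default behaviour of GNU strings. The `runs` helper and both predicate names are hypothetical, written for this comparison only; the Latin-1 predicate follows the "reject only control characters" rule described above.

```python
from typing import Callable


def classic_keep(b: int) -> bool:
    # Printable 7-bit ASCII plus tab, like GNU strings' default.
    return b == 0x09 or 0x20 <= b <= 0x7E


def latin1_keep(b: int) -> bool:
    # Reject only bytes that are Unicode control characters in ISO-8859-1.
    return not (b < 0x20 or 0x7F <= b <= 0x9F)


def runs(data: bytes, keep: Callable[[int], bool], min_len: int = 4) -> list[str]:
    """Split data into runs of kept bytes; return those of at least min_len."""
    out, cur = [], bytearray()
    # The trailing 0x00 is rejected by both predicates, so it flushes the last run.
    for b in data + b"\x00":
        if keep(b):
            cur.append(b)
        else:
            if len(cur) >= min_len:
                out.append(cur.decode("iso-8859-1"))
            cur.clear()
    return out


if __name__ == "__main__":
    text = "INSTALAÇÃO DEFAULT".encode("iso-8859-1")
    print(runs(text, classic_keep))  # ['INSTALA', 'O DEFAULT']
    print(runs(text, latin1_keep))   # ['INSTALAÇÃO DEFAULT']
```

With the ASCII-only rule, 'Ç' and 'Ã' end the first run, so "INSTALAÇÃO" is split in two. A word like "Cataluña" loses its tail entirely, because the leftover "a" is shorter than the minimum length.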

gstrings is especially useful for text in Western languages.

Uses xpfweb_v2x.