I have created a very basic utility called gstrings:
- It is similar to the well-known 'strings' utility, but it does not reject a character just because it is accented (such as 'á' or 'ç', which are perfectly valid in Latin-script languages). It treats only control characters (such as the C1 control code 0x80, which lies outside ASCII) as not part of a string.
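The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual gstrings source: the names `is_control` and `extract_strings`, the `min_len` threshold and the assumption that the input is ISO-8859-1 are all hypothetical. Note that this sketch also treats tab and newline as separators, because they are control characters too.

```python
import unicodedata


def is_control(ch: str) -> bool:
    # Unicode general category "Cc" covers the C0 controls (U+0000-U+001F),
    # DEL (U+007F) and the C1 controls (U+0080-U+009F).
    return unicodedata.category(ch) == "Cc"


def extract_strings(data: bytes, min_len: int = 4,
                    encoding: str = "iso-8859-1") -> list[str]:
    # Hypothetical helper: decode the bytes, then split on control characters.
    # Accented letters such as 'ç' or 'ã' are not controls, so they stay
    # inside the run instead of cutting the word.
    runs = []
    current = []
    for ch in data.decode(encoding):
        if is_control(ch):
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


# Two accented words separated by control bytes; "ab" is shorter than min_len.
sample = b"\x00\x01instala\xe7\xe3o\x80\x90ab\x07defini\xe7\xf5es\n"
print(extract_strings(sample))  # ['instalação', 'definições']
```

A filter that accepts only printable ASCII would split these two words at every accented letter, which is the behaviour gstrings is meant to avoid.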
Here's an example:
+++clip+++
[henrique@fuji src]$ ./gstrings ../etc/test.txt
2007 Prized Season and Sons
[[en: english, ISO8859-1]]
The UniCode consortium (www.unicode.org) contains multiple definitions of characters, called 'Unicode'.
[[pt: portuguese, ISO8859-1]]
O consórcio UniCode (www.unicode.org) contém multiplas
definições de caracteres, chamados 'Unicode'.
INSTALAÇÃO DEFAULT
As directorias de instalação default são:
/usr/share/unicodeplus => caminho de base ('Base path')
[[fr: french, ISO8859-1]]
Le consortium UniCode (www.unicode.org) contient multiples définitions de caractères, s'appelle 'Unicode '.
[[es: spanish, ISO8859-1]]
El consorcio de UniCode (www.unicode.org) contiene las múltiples definiciones de caracteres, llamadas 'Unicode'.
[Cataluña]
[[SOME MATHEMATICAL CHARS]]
2 x 3 can be written as: 2
3 (where '
' is the multiplicatin sign, ASCII 215d).
P
+++clip+++
If you ran the classic 'strings' on this ISO-8859-1 Latin text, you would see many words cut at their accented characters.
gstrings is especially useful for text in Western languages.
Uses xpfweb_v2x.