#15 Support for Big5 encoding (larger character sets)

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-09-16
Created: 2005-11-16
Creator: Anonymous
Private: No

The following illustrates changes to the JLog parser (pTokenizerTable
and pTokenizeStream) that add support for large character sets (e.g.,
for Chinese) via the Big5 encoding. Lines marked with ** below indicate
lines to add to the methods in which they appear:

ubc.cs.JLog.Parser.pTokenizerTable:

public pTokenizerTable() {
    table = new int[65280];
    // table = new int[256];   // comment out these lines
    // resetSyntax();
};

Because every Big5 character is encoded in two bytes, character codes
can exceed 255, so I increased the table size to 65536 - 256 = 65280.
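As a sanity check on that range (this demo is not part of the patch), the sketch below decodes a Big5 byte pair with Java's standard "Big5" charset and shows that the resulting character code lands above the old 0..255 range but within the enlarged 65280-entry table:

```java
import java.nio.charset.Charset;

public class Big5RangeDemo {
    public static void main(String[] args) {
        // 0xA4 0xA4 is the Big5 encoding of U+4E2D (the CJK character "middle")
        byte[] big5 = { (byte) 0xA4, (byte) 0xA4 };
        String s = new String(big5, Charset.forName("Big5"));
        int code = s.charAt(0);            // Unicode code unit of the decoded character
        System.out.println(code);          // 20013: well above the old 0..255 table range
        System.out.println(code <= 65279); // true: still inside the enlarged table
    }
}
```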
The second change is in ubc.cs.JLog.Parser.pTokenizeStream:

protected void initTable() {
    table.setTokenType('%', TOKEN_LINECOMMENT);
    table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}

protected void initSinglesTable() {
    singles_table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}

protected void initBaseNumberTable() {
    basenumber_table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}

protected void initStringTable() {
    string_table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}

protected void initArrayTable() {
    array_table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}

protected void initLineCommentTable() {
    linecomment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}

protected void initCommentTable() {
    ** comment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ...
}
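The three-argument setTokenType(begin, end, type) calls above appear to assume a range overload alongside the existing single-character form. A minimal sketch of what such an overload might look like (the class name, field layout, and TOKEN_WORDS value here are placeholders for illustration, not JLog's actual definitions):

```java
public class TokenTableSketch {
    static final int TOKEN_WORDS = 4;   // placeholder value, not JLog's real constant
    final int[] table = new int[65280]; // enlarged table, as in the patch

    // existing single-character form, as in setTokenType('%', TOKEN_LINECOMMENT)
    void setTokenType(int ch, int type) {
        table[ch] = type;
    }

    // assumed range overload used by the patch: mark every code in [begin, end]
    void setTokenType(int begin, int end, int type) {
        for (int c = begin; c <= end; c++)
            table[c] = type;
    }

    public static void main(String[] args) {
        TokenTableSketch t = new TokenTableSketch();
        t.setTokenType(256, 65279, TOKEN_WORDS);
        System.out.println(t.table[20013] == TOKEN_WORDS); // true: CJK code now a word character
        System.out.println(t.table['a'] == TOKEN_WORDS);   // false: ASCII entries untouched
    }
}
```

A simple loop fill like this is cheap for a one-time init, and keeping the single-character overload means the existing '%' call site compiles unchanged.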

Discussion