The following illustrates changes to the JLog parser (pTokenizerTable
and pTokenizeStream) to add support for large character sets (e.g., for
Chinese) and for the Big5 encoding. The lines marked ** below are the
lines added to the methods in which they appear:
ubc.cs.JLog.Parser.pTokenizerTable:
public pTokenizerTable() {
**  table = new int[65280];
**  // table = new int[256]; resetSyntax();  // original lines, commented out
}
Because every Big5 character is encoded in two bytes, its decoded value
falls above 255, so I enlarged the table to 65536 - 256 = 65280 entries.
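As a quick sanity check (separate from the patch itself), the standalone snippet below shows that a Big5-encoded Chinese character, once decoded into a Java char, carries a code point well above 255, which is why the original 256-entry table cannot classify it. It assumes the JDK's Big5 charset is available:

```java
public class Big5RangeCheck {
    public static void main(String[] args) throws Exception {
        // The two-byte Big5 sequence 0xA4 0xA4 encodes the character "中".
        byte[] big5 = { (byte) 0xA4, (byte) 0xA4 };
        String s = new String(big5, "Big5");
        char c = s.charAt(0);
        // After decoding, the char holds the Unicode code point (U+4E2D),
        // which lies far outside the 0..255 range of the original table.
        System.out.println((int) c);
        System.out.println((int) c > 255);
    }
}
```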
Another change, in ubc.cs.JLog.Parser.pTokenizeStream:
protected void initTable() {
table.setTokenType('%', TOKEN_LINECOMMENT);
**  table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
protected void initSinglesTable() {
** singles_table.setTokenType(256, 65279, TOKEN_WORDS);
....
}
protected void initBaseNumberTable() {
** basenumber_table.setTokenType(256, 65279, TOKEN_WORDS);
....
}
protected void initStringTable() {
**  string_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
protected void initArrayTable() {
** array_table.setTokenType(256, 65279, TOKEN_WORDS);
....
}
protected void initLineCommentTable() {
**  linecomment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
protected void initCommentTable() {
**  comment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}