The following illustrates changes to the JLog parser (pTokenizerTable
and pTokenizeStream) that add support for large character sets (e.g., for
Chinese) and the Big5 encoding. Lines marked with ** are additions to
the methods in which they appear:
ubc.cs.JLog.Parser.pTokenizerTable:

public pTokenizerTable() {
    table = new int[65280];
    // table = new int[256];   // comment out these two original lines
    // resetSyntax();
}
Because every CJK character in Big5 is a two-byte sequence, it decodes to
a Java char value above 255, so I enlarged the table to
65536 - 256 = 65280 entries, which covers char values 0 through 65279.
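As a quick sanity check (not part of the patch; the class name Big5Demo and the helper are mine, and this assumes your JRE ships the optional Big5 charset), decoding a Big5 double-byte sequence in Java yields a char value well above 255, which is why the token table must extend past index 255:

```java
import java.nio.charset.Charset;

public class Big5Demo {
    // Decode a Big5 byte sequence and return the first resulting
    // Java char as an int. Throws UnsupportedCharsetException on
    // minimal JREs that omit the extended charsets.
    public static int decodeFirstChar(byte[] big5Bytes) {
        String s = new String(big5Bytes, Charset.forName("Big5"));
        return s.charAt(0);
    }

    public static void main(String[] args) {
        // 0xA4 0xA4 is a valid Big5 double-byte sequence (a common
        // CJK character), so the decoded char value exceeds 255.
        byte[] pair = { (byte) 0xA4, (byte) 0xA4 };
        System.out.println("decoded char value: " + decodeFirstChar(pair));
    }
}
```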
Another change, in ubc.cs.JLog.Parser.pTokenizeStream:
protected void initTable() {
    table.setTokenType('%', TOKEN_LINECOMMENT);
    table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}

protected void initSinglesTable() {
    singles_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}

protected void initBaseNumberTable() {
    basenumber_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}

protected void initStringTable() {
    string_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}

protected void initArrayTable() {
    array_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}

protected void initLineCommentTable() {
    linecomment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}

protected void initCommentTable() {
 ** comment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
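The calls above rely on a range form of setTokenType(from, to, type). If your copy of pTokenizerTable only has the single-character form, a minimal sketch of such an overload could look like the following (TokenTableSketch and TOKEN_WORDS are illustrative stand-ins, not the real JLog class):

```java
// Minimal sketch of a token table with a range overload of setTokenType,
// mirroring the calls in the patch above. The real pTokenizerTable holds
// more state; TOKEN_WORDS here is just an arbitrary constant.
public class TokenTableSketch {
    public static final int TOKEN_WORDS = 1;

    private final int[] table = new int[65280]; // covers chars 0..65279

    // Single-character form.
    public void setTokenType(int ch, int type) {
        table[ch] = type;
    }

    // Range form: assign the same token type to every char in [from, to].
    public void setTokenType(int from, int to, int type) {
        for (int c = from; c <= to; c++)
            table[c] = type;
    }

    public int getTokenType(int ch) {
        return table[ch];
    }
}
```

With this in place, setTokenType(256, 65279, TOKEN_WORDS) classifies every char in the Big5 double-byte range as part of a word token in one call.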