The following illustrates changes to the JLog parser (pTokenizerTable
and pTokenizeStream) to add support for large character sets (e.g., for
Chinese) and for the Big5 encoding. The lines marked ** below are the
lines added to the methods in which they appear:
ubc.cs.JLog.Parser.pTokenizerTable:
public pTokenizerTable() {
**  table = new int[65280];
**  // table = new int[256]; resetSyntax();  // original lines, commented out
}
Because every Big5 character is encoded in two bytes, its decoded value
falls above 255, so I enlarged the table to 65536 - 256 = 65280 entries.
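As a quick sanity check (separate from the patch itself), the standalone snippet below shows that a Big5-encoded Chinese character, once decoded into a Java char, carries a code point well above 255, which is why the original 256-entry table cannot classify it. It assumes the JDK's Big5 charset is available:

```java
public class Big5RangeCheck {
    public static void main(String[] args) throws Exception {
        // The two-byte Big5 sequence 0xA4 0xA4 encodes the character "中".
        byte[] big5 = { (byte) 0xA4, (byte) 0xA4 };
        String s = new String(big5, "Big5");
        char c = s.charAt(0);
        // After decoding, the char holds the Unicode code point (U+4E2D),
        // which lies far outside the 0..255 range of the original table.
        System.out.println((int) c);
        System.out.println((int) c > 255);
    }
}
```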
Another change, in ubc.cs.JLog.Parser.pTokenizeStream:
protected void initTable() {
table.setTokenType('%', TOKEN_LINECOMMENT);
**  table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
protected void initSinglesTable() {
** singles_table.setTokenType(256, 65279, TOKEN_WORDS);
....
}
protected void initBaseNumberTable() {
** basenumber_table.setTokenType(256, 65279, TOKEN_WORDS);
....
}
protected void initStringTable() {
**  string_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
protected void initArrayTable() {
** array_table.setTokenType(256, 65279, TOKEN_WORDS);
....
}
protected void initLineCommentTable() {
**  linecomment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}
protected void initCommentTable() {
**  comment_table.setTokenType(256, 65279, TOKEN_WORDS);
    ....
}