Introduction:
Tiny cc is a C compiler written by Fabrice Bellard.
When we run "tycc.exe -run gbk.c",
it fails with "gbk.c:15: error: unrecognized character \xbd".
Following https://leisurebamboo.wordpress.com/2022/12/11/tinycc-0926
we patch libTcc.dll with the command line "difc.exe fc.txt".
After that, "tycc.exe -run gbk.c" runs OK:
it can now read symbols that contain kanji.
--------- 1.symptoms -----------------------------
Once a byte in a symbol is 0x80 or greater,
Tiny cc can not compile that symbol.
That is a bug.
--------- 2.1.affected part in c code ------------
In tccPp.c\void next_nomacro1(void):
this function declares a local variable "int c"
and uses a static global variable "unsigned char isidnum_table[256]".
If c (the byte above) is 0x80 or greater,
then the check "if (!isidnum_table[c])" succeeds
and the function calls tcc_error("unrecognized character \\x%02x", c);
And where is the array isidnum_table[] initialized?
That is in "inline void preprocess_new(void)":
'for(i=0;i<256;i++){isidnum_table[i] = isid(i) || isnum(i);}
memset(hash_ident, 0, sizeof...);'
And "int isid(int c)"
"{return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';}"
And "inline int isnum(int c)"
"{return c >= '0' && c <= '9';}"
We can see that if i is 0x80 or greater, isidnum_table[i] will be 0.
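To make the observation above concrete, here is a minimal sketch that rebuilds the table with the same isid()/isnum() rules quoted in this section. The helper count_accepted_high_bytes() is a hypothetical name added only for this illustration; it is not part of tcc.

```c
/* Same character classes as the isid()/isnum() helpers quoted above. */
static int isid(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

static int isnum(int c)
{
    return c >= '0' && c <= '9';
}

static unsigned char isidnum_table[256];

/* Fill the table the way the quoted loop in preprocess_new() does. */
static void init_isidnum_table(void)
{
    int i;
    for (i = 0; i < 256; i++)
        isidnum_table[i] = isid(i) || isnum(i);
}

/* Hypothetical helper: count the bytes in 0x80..0xFF that the table accepts. */
static int count_accepted_high_bytes(void)
{
    int i, n = 0;
    for (i = 0x80; i < 256; i++)
        n += isidnum_table[i] != 0;
    return n;
}
```

After init_isidnum_table(), count_accepted_high_bytes() returns 0, so a byte such as 0xbd from a GBK-encoded symbol is rejected.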
--------- 2.2.solution in c code -----------------
So, if we call memset(&(isidnum_table[128]),1,128) after that loop,
then we kill the above bug,
and we can rebuild from the c code.
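The fix can be sketched on a stand-alone table as follows. The function name accept_high_bytes() and the table parameter are assumptions made for this illustration; in tcc itself the memset would go after the initialization loop in preprocess_new().

```c
#include <string.h>

/* Sketch of the proposed fix on a caller-supplied 256-entry table:
   mark every byte from 0x80 to 0xFF as a valid identifier character.
   Entries 0x00..0x7F are left exactly as they were. */
static void accept_high_bytes(unsigned char table[256])
{
    memset(&table[128], 1, 128);
}
```

Bytes below 0x80 keep their original classification, so plain ASCII punctuation is still rejected inside identifiers.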
But what if we have neither the c code nor any compiler
(yes, we just unzipped tcc-0.9.26-win32-bin.zip,
or tcc-0.9.26-win64-bin.zip)?
Um...
--------- 3.1.affected part in asm code ----------
libTcc.dll exports a function "TCCState *tcc_new(void)"
which can be seen in our debugger.
On win32, the debugger may be ollydbg.exe;
on win64, the debugger may be x64dbg.exe.
I used the debugger to open tcc.exe, which imports libTcc.dll,
found the above exported function, and copied its disassembly as "libTcc.asm".
In the asm, we can see that the assembly code follows the same sequence
as the c code in "libTcc.c\LIBTCCAPI TCCState *tcc_new(void)":
After the inlined function preprocess_new(),
and 4 calls to define_push(), there are
'sscanf("0.9.26", "%d.%d.%d", &a, &b, &c);
register tmp=a*10000 + b*100 + c;
sprintf(buffer, "%d", tmp);'
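The value computed by these three lines never changes for a given release, which is what makes the code replaceable. A small sketch, where version_number() is a hypothetical helper name used only here:

```c
#include <stdio.h>

/* Hypothetical helper: turn "major.minor.patch" into one integer,
   using the same formula as the fragment above.
   Returns -1 if the string does not contain three numbers. */
static int version_number(const char *version)
{
    int a, b, c;
    if (sscanf(version, "%d.%d.%d", &a, &b, &c) != 3)
        return -1;
    return a * 10000 + b * 100 + c;
}
```

For "0.9.26" this gives 0*10000 + 9*100 + 26 = 926, which is 0x39e in hexadecimal.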
--------- 3.2.solution in asm code ---------------
If we hard-code "tmp=926(0x39e)",
then we do NOT need sscanf() any more, and its bytes become free space.
So we can modify the assembly code to do the equivalent of:
'for(i=0;i<128;i++){isidnum_table[i] = isid(i) || isnum(i);}
label_1:
goto label_4;
label_2:
memset(hash_ident, 0, sizeof...);'
....
'#if 0 //org
sscanf("0.9.26", "%d.%d.%d", &a, &b, &c);
register tmp=a*10000 + b*100 + c;
#else //new_code
label_3:
goto label_5;
label_4:
memset(&(isidnum_table[128]),1,128);
goto label_2;
#endif
label_5:
sprintf(buffer, "%d", 926);'
--------- 3.3.rewrite asm code -------------------
Based on the solution and libTcc.asm,
we can write new code at the following file offsets:
+-------+---------------+---------------+
| | win32 | win64 |
| +--------+------+--------+------+
| |memory- |file- |memory- |file- |
| |address |offset|address |offset|
+-------+--------+------+--------+------+
|label_1|6bb169fb|15dfb |6dbd8abb|17ebb |
+-------+--------+------+--------+------+
|label_2|6bb16a07|15e07 |6dbd8ac0|17ec0 |
+-------+--------+------+--------+------+
|label_3|6bb16aa8|15ea8 |6dbd8b62|17f62 |
+-------+--------+------+--------+------+
|label_4|6bb16aaa|15eaa |6dbd8b64|17f64 |
+-------+--------+------+--------+------+
|label_5|6bb16adc|15edc |6dbd8ba6|17fa6 |
+-------+--------+------+--------+------+
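In each column pair of the table, memory address minus file offset is a constant. The sketch below derives that constant from the label_2 row and uses it to convert other addresses. The names WIN32_DELTA, WIN64_DELTA and file_offset() are made up for this illustration, and the constants are only assumed to hold for these two builds, at the load addresses seen in the debugger, and inside the code section that contains tcc_new().

```c
#include <stdint.h>

/* Memory address minus file offset, taken from the label_2 row of the table.
   Assumption: valid only for these builds and this code section. */
#define WIN32_DELTA (0x6bb16a07u - 0x15e07u)
#define WIN64_DELTA (0x6dbd8ac0u - 0x17ec0u)

/* Convert an address shown by the debugger into an offset in libTcc.dll. */
static uint32_t file_offset(uint32_t memory_address, uint32_t delta)
{
    return memory_address - delta;
}
```

For example, file_offset(0x6bb16adc, WIN32_DELTA) gives 0x15edc, matching the label_5 row.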
========= 3.3.1.win32 content ==============
If we use win32, we can skip 3.3.2.
15def: 80 00
15dfb: BF 1D(19 B2 6B); //edi=6bb2191d;//&(isidnum_table[128])
15e00: B0 01; //al=1
15e02: E9 A3 00(00 00); //jmp label_4;//6bb16aaa
15e07: 31 C0; label_2://eax=0
15ea8: EB 32; //jmp label_5;//6bb16adc
15eaa: 33 C9 B1 80 F3 AA//ecx=0, cl=80, rep stosb;//memset(&(isidnum_table[128]),1,128);
15eb0: 83 C7 03; //edi+=3
15eb3: B9 00 20 00 00; //ecx=2000
15eb8: E9 4A FF ff FF; //jmp label_2;//6bb16a07
15ee7: B8 9E 03 00 00 90//eax=39e
Skip 3.3.2 and go to 3.4 [backup and compare].
========= 3.3.2.win64 content ==============
But if we use win64, we can skip 3.3.1 instead.
Based on the solution and libTcc.asm,
we can rewrite the win64 code:
17eae: 50
17ebb: E9 A4 00(00 00); //jmp label_4;//6dbd8b64
17f62: EB 42; //jmp label_5;//6dbd8ba6
17f64: 31 C9 B0 01 B1 80 F3 AA //ecx=0, al=1, cl=80, rep stosb
17f6c: 66 B9 00 40; //cx=4000
17f70: E9 4B FF ff FF; //jmp label_2;//6dbd8ac0
17f75: 90 90 90 90 90
17fb3: 41 B8 9E 03 00 00 90 90; //r8d=0x39e
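The jump bytes in 3.3.1 and 3.3.2 can be checked by hand. On x86, "E9 rel32" and "EB rel8" store a signed displacement counted from the end of the jump instruction. The sketch below redoes that arithmetic on file offsets, which works here because source and target lie in the same section; both function names are hypothetical.

```c
#include <stdint.h>

/* Target of a near jump "E9 rel32": the instruction is 5 bytes long and
   the signed 32-bit displacement is added to the address that follows it. */
static uint32_t jmp_rel32_target(uint32_t at, int32_t rel32)
{
    return at + 5u + (uint32_t)rel32;
}

/* Target of a short jump "EB rel8": the instruction is 2 bytes long and
   the signed 8-bit displacement is added to the address that follows it. */
static uint32_t jmp_rel8_target(uint32_t at, int8_t rel8)
{
    return at + 2u + (uint32_t)(int32_t)rel8;
}
```

For win32, "E9 A3 00 00 00" at 15e02 lands on 15e07 + a3 = 15eaa, and "E9 4A FF FF FF" at 15eb8 (displacement -0xb6) lands back on 15e07. For win64, "E9 A4 00 00 00" at 17ebb lands on 17f64, where the rep stosb bytes start.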
--------- 3.4.backup and compare ------------------
After we back up libTcc.dll as <<libTcc.bak>>,
we can write the above new content at the listed file offsets.
That is about 40 bytes; the bytes in ( ) are the same as the old content.
After we save libTcc.dll,
"tycc.exe -run gbk.c" now runs in the console without error.
To prove we modified only about 40 bytes,
we can run "fc /b libTcc.bak libTcc.dll > fc.txt" to do
a file compare in binary mode. We will get a text file
named "fc.txt", and its content is:
'Comparing files libTcc.bak and libTcc.dll
0001xxxx: xx xx
0001xxxx: xx xx
...'
The bytes in the libTcc.dll column
are the ones we just entered.
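Each difference line written by fc /b holds a hexadecimal offset, the byte from the first file and the byte from the second file. A patch tool only needs the offset and the last byte. The sketch below parses one such line; parse_fc_line() is a hypothetical name, and this is not a description of how difc.exe itself works, which is not shown here.

```c
#include <stdio.h>

/* Hypothetical helper: parse one difference line such as
   "00015E00: 8B B0" (made-up byte values).
   Returns 1 and fills the outputs on success, 0 for any other line,
   for example the "Comparing files ..." header. */
static int parse_fc_line(const char *line, unsigned long *offset,
                         unsigned *old_byte, unsigned *new_byte)
{
    if (sscanf(line, "%lx: %x %x", offset, old_byte, new_byte) != 3)
        return 0;
    return *old_byte <= 0xFF && *new_byte <= 0xFF;
}
```

A patcher built on this would seek to the offset in a copy of libTcc.bak and write new_byte there, ideally after checking that the byte already in the file equals old_byte.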
--------- 4.afterword ----------------------------
If you have downloaded the same tinycc 0.9.26,
and you notice the bug above, you can
1)send a message to bellard.org...
but Bellard has said he is "no longer working on TCC"
2)download the source code and compile it yourself...
even though you may not be a programmer.
3)rename your libTcc.dll as libTcc.bak,
and enter "difc.exe fc.txt" in the console.
Of course, the name of the file (fc.txt) is not important.