Introduction:
Tcc 2.0 is a Borland product from 1988.
When we run "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c",
it reports the errors "Illegal character"
and "Declaration syntax error".
Following https://leisurebamboo.wordpress.com/2023/10/06/tc2
we modify tcc.exe with the command line "difc.exe fc.txt".
After that, "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c" runs OK:
tcc can now read symbols that contain kanji.
--------- 1.symptoms ----------------------------
We can compile WordCnt.c (the example from the debugging
chapter of the Turbo C User's Guide)
with the command lines
"tcc.exe -c -oWordCnt.obj ..\WordCnt.c" and
"tlink.exe c0s.obj WordCnt.obj,WordCnt.exe,,cs.lib".
But if we rename a function, variable or macro
in the C source so that any byte of the name is 0x80 or greater
(as in cnt_big5.c or cnt_gbk.c),
tcc can no longer compile that symbol.
That is a bug.
--------- 2.affected part -----------------------
========= 2.1.switch ============================
In tcc.exe, after its fgetch(), the code does:
cs:077 268A07 B400 al=es:[bx],ah=0;//means "c=fgetch()"
cs:07C 8946FC [bp-4]=ax
cs:07F 8B1ED450 bx=wo[50D4];//switch_enum_table==30B6
cs:083 035EFC + wo[bp-4]
cs:086 8A07 al=[bx];//means switch_enum_table[c]
cs:088 98 8946FE cbw, [bp-2]=ax
cs:08C 2DE0FF ax+=20
cs:08F 3D2000 7603 if(ax>20) //cmp ax,20; jbe 097
cs:094 E9F800 jmp 018F
cs:097 8BD8 D1E3 bx=ax<<1
cs:09B 2E FFA7A000 jmp cs:0A0[bx]
;+--------------------------------------------------+
;|note: |
;| ds:[30B6~31B5] are switch_enum_table, they are: |
;|db 8 dup(0EEh); 00~07 |
;|db 0EEh,4 dup(-9),0ech,2 dup(0EEh); 08~0F |
;|db 8 dup(0EEh); 10~17 |
;|db 2 dup(0EEh),0E7h,5 dup(0EEh); 18~1f |
;|db -9,0,0F4h,2 dup(0EEh),-1,-2,0F2h; 20~27 |
;|db 1,2,-3,-4,8,0F1h,0F0h,-5; 28~2f |
;|db 0Ah dup(0F5h); '0'~'9' |
;|db 1Fh,7,0EFh,0F8h,0EDh,1Eh; 3A~3F |
;| |
;|db 0EEh,1Ah dup(0F6h); '@', 'A'~'Z'|
;|db 3, 0EAh,4, 0FAh, 0F6h; 5B~5F |
;|db 0EEh,1Ah dup(0F6h); 60, 'a'~'z' |
;|db 5, 0F9h,6, 29h, 0EEh; 7B~7F |
;| |
;|db 80h dup(0EEh); 80~ff |
;+--------------------------------------------------+
Once a byte is 80 or greater, its switch_enum is 0EE,
which causes the error "Illegal character". We change
this enum_value to 0F6 (the enum_value of 'a'~'z').
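The lookup at cs:086~cs:09B can be modelled in C as a rough sketch. The function names build_table and dispatch_index are hypothetical, and only the letter entries and the 80~ff range are filled from the dump; every other entry is set to 0EE as a placeholder. The sketch assumes a platform where converting 0EEh to signed char gives -18, as cbw does.

```c
#include <assert.h>
#include <string.h>

/* Class codes copied from the switch_enum_table dump above. */
enum { CLS_ILLEGAL = 0xEE, CLS_LETTER = 0xF6 };

/* Hypothetical helper: fill a simplified table. Only letters, '_' and
   80~ff are meaningful here; the rest is a 0EE placeholder.
   'patched' selects 0F6 for 80~ff instead of the original 0EE. */
static void build_table(unsigned char table[256], int patched)
{
    int c;
    assert(table != NULL);
    memset(table, CLS_ILLEGAL, 256);
    for (c = 'A'; c <= 'Z'; c++) table[c] = CLS_LETTER;
    for (c = 'a'; c <= 'z'; c++) table[c] = CLS_LETTER;
    table['_'] = CLS_LETTER;
    if (patched)
        memset(table + 0x80, CLS_LETTER, 0x80);
}

/* Mirrors cs:086~cs:09B: sign-extend the table byte (cbw), add 20h,
   and use the result as the jump-table index. Returns -1 when the
   value is outside the jump table (the "jmp 018F" path). */
static int dispatch_index(const unsigned char table[256], unsigned char c)
{
    int v = (signed char)table[c];   /* cbw              */
    int idx = v + 0x20;              /* sub ax,0FFE0h    */
    return (idx >= 0 && idx <= 0x20) ? idx : -1;
}
```

With the original table, a byte such as 0A4h lands on index 0Eh (the 0EE case); with the patched table it lands on index 16h, the same case as 'a'~'z'.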
========= 2.2.strcpy ============================
And when tcc calls strcpy, it does:
cs:B17 A02424 do{....
cs:B27 8A4604 AL=c; //[bp+04]
cs:B2A C45EFC es:bx=dst; //les bx,[bp-4]
cs:B2D 268807 FF46FC *(dst++)=AL; //es:[bx]=al, wo[bp-4]++
...
cs:B43 C41E3850 es:bx=src; //les bx,[5038]
cs:B47 FF063850 src++; //inc wo[5038]
cs:B4B 268A07 B400 AX=*src; //al=es:[bx],ah=00
cs:B50 894604 8B5E04 c=AX; //bx=[bp+04]=ax
cs:B56 F687BD4B0E 75BA }while(attr[c]&0x0E
//test by 4BBD[bx],0E, jnz B17
cs:B5D 83FB5F 74B5 || c=='_' //cmp bx,5f, jz B17
cs:B62 83FB24 74B0 || c=='$');//cmp bx,24, jz B17
;+---------------------------------------------------+
;|note: |
;| ds:[4bbd~4cbc] are 100h attrib8 of ascii,they are:|
;|db 8 dup(20); 00~07 |
;|db 20,21,21 ; 08~0a |
;|db 5 dup(20); 0b~0f |
;|db 8 dup(20); 10~17 |
;|db 8 dup(20); 18~1f |
;|db 1,7 dup(0) ; 20~27 |
;|db 8 dup(0) ; 28~2f |
;|db 0A dup(2) ; '0'~'9' |
;|db 6 dup(0) ; 3a~3f |
;|db 0,6 dup(14); '@',"ABCDEF" |
;|db 14 dup(4) ; 'G'~'Z' |
;|db 5 dup(0) ; 5b~5f |
;|db 0,6 dup(18); 60, "abcdef" |
;|db 14 dup(08); 'g'~'z' |
;|db 4 dup(0) ; 7b~7e |
;|db 20 ; 7f |
;|db 80 dup(0) ; 80~ff |
;+---------------------------------------------------+
Once a byte is 80 or greater, its attrib8 is 00,
so the loop above stops there, which causes an error too.
To avoid it, we can change this attrib8 to 08 (the attrib8
of 'g'~'z').
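A small C model of the copy loop may make the effect of attrib8 easier to see. This is only a sketch: build_attr and copy_ident are hypothetical names, only the table entries that matter for the 0x0E mask are filled in, and the do/while of the disassembly is written as a plain while loop with a capacity check added.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill the attrib8 entries that carry bits of the
   0x0E mask, with the values from the dump above. Other entries stay 0.
   'patched' sets 80~ff to 08 instead of the original 00. */
static void build_attr(unsigned char attr[256], int patched)
{
    int c;
    assert(attr != NULL);
    for (c = 0; c < 256; c++) attr[c] = 0;
    for (c = '0'; c <= '9'; c++) attr[c] = 0x02;
    for (c = 'A'; c <= 'F'; c++) attr[c] = 0x14;
    for (c = 'G'; c <= 'Z'; c++) attr[c] = 0x04;
    for (c = 'a'; c <= 'f'; c++) attr[c] = 0x18;
    for (c = 'g'; c <= 'z'; c++) attr[c] = 0x08;
    if (patched)
        for (c = 0x80; c <= 0xFF; c++) attr[c] = 0x08;
}

/* Copy one identifier from src to dst (capacity cap), continuing while
   attr[c] & 0x0E is set or c is '_' or '$', as in cs:B56~cs:B62.
   Returns the number of bytes copied. */
static size_t copy_ident(const unsigned char attr[256],
                         const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    unsigned char c;
    assert(src != NULL && dst != NULL && cap > 0);
    c = (unsigned char)*src;
    while (n + 1 < cap && ((attr[c] & 0x0E) || c == '_' || c == '$')) {
        dst[n++] = (char)c;
        c = (unsigned char)src[n];
    }
    dst[n] = '\0';
    return n;
}
```

For the input "n\xA4\xA4=1", the original table copies only "n", while the patched table copies all three bytes of the name and stops at '='.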
========= 2.3.strlen ==============================
cs:385 8C46FE 895EFC p=es:bx; //[bp-02]=es, [bp-04]=bx
cs:38B 33F6 EB04 strlen=0; //si=0,jmp 393
cs:393 C45EFC //les bx,[bp-04]
cs:396 26803F00 7406 while(p[0]!='\0' //cmp by es:[bx],0, je 3A2
cs:39C 26F60780 74ED && (p[0]&80h)==0) //test by es:[bx],80; je 38F
cs:38F FF46FC {p++; //inc wo [bp-04]
cs:392 46 strlen++; //inc si
}
cs:3A2 C45EF4 ...
========= 2.4.next_word ============================
cs:684 8B56FA 8BC3 //dx=[bp-06]; ax=bx
cs:689 3B16B674 7516 if(lp8==g_74b6) //cmp dx, [74B6], jne 6A5
cs:68F 3B06B474 7510 //cmp ax, [74B4], jne 6A5
cs:695 C45EF4 { //les bx, [bp-0C]
cs:698 8C063A50 891E3850 g_5038=lpC; //[503A]=es; [5038]=bx
cs:6A0 EB03 };//jmp 6A5
cs:6A5 C45EF4 //les bx, [bp-0C]
cs:6A8 26803F00 7406 while(lpC[0]!='\0' //cmp by es:[bx],00; je 6B4
cs:6AE 26F60780 74EE && (lpC[0]&80h)==0)//test by es:[bx],80; je 6A2
cs:6A2 FF46F4 lpC++; //inc wo[bp-0C]
cs:6B4 C45EF8 ...
Once a byte is 80 or greater, it is treated as
a macro, which causes errors.
--------- 3.solution ----------------------------
Many kanji bytes are 0x80 or greater,
so we can:
=================================================
3.1.change their switch_enum in the switch_table.
For normal text (ds:[50D4]==30B6), the entries for
80~ff are at 2a506h~2a585h in the file tcc.exe:
change them from "db 80 dup(0EE)" to "db 80 dup(0F6)".
For macro text (ds:[50D4]==31B6), the entries for
80~ff are at 2a606h~2a685h in the file tcc.exe:
change them from "db 80 dup(0E6)" to "db 80 dup(0F6)".
=================================================
3.2.change their attrib8 to that of 'g'~'z'.
The entries for 80~ff are at 2c00dh~2c08ch
in the file tcc.exe: change them
from "db 80 dup(00)" to "db 80 dup(08)".
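The file offsets quoted in 3.1 and 3.2 all differ from their ds offsets by the same amount, so the mapping can be sketched as below. The constant 273D0h is inferred from those three address pairs (for example 30B6h+80h=3136h against 2a506h); it is an assumption that holds for this copy of tcc.exe and is not read from the EXE header. The function names are hypothetical.

```c
#include <assert.h>

/* Assumption: inferred from the address pairs in 3.1 and 3.2. */
#define DS_TO_FILE_DELTA 0x273D0UL

/* Convert an offset in tcc's data segment to an offset in tcc.exe. */
static unsigned long ds_to_file(unsigned long ds_off)
{
    assert(ds_off <= 0xFFFFUL);
    return ds_off + DS_TO_FILE_DELTA;
}

/* File offset of the entry for character c in a 256-byte table
   that starts at ds:table_base. */
static unsigned long entry_file_offset(unsigned long table_base, unsigned c)
{
    assert(c <= 0xFFu);
    return ds_to_file(table_base + c);
}
```

For example, entry_file_offset(0x4BBD, 0x80) gives 2C00Dh, the first attrib8 byte named in 3.2.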
=================================================
3.3.change strlen(file offset==19d85) to
cs:385 31f6 strlen=0; //si=0
cs:387 268A07 do{ //al=es:[bx]
cs:38A 08C0 740E if(p[0]=='\0')break; //or al,al, jz 39C
cs:38E 7908 else if(p[0]&80h // jns 398
cs:390 3C90 7208 && (p[0]<90h //if(al<90) jb 39C
cs:394 3CFC 7304 || p[0]>=FCh))break; //if(al>=FC)jae 39C
cs:398 46 43 else{strlen++; p++} //si++, bx++
cs:39A EBEB }while(1); //jmp 387
cs:39C 8C46FE p=es:bx; //[bp-2]=es
cs:39F 895EFC //[bp-4]=bx
cs:3A2 C45EF4 ...
=================================================
3.4.change next_word(file offset==1a084) to
cs:684 668B46F8 //eax=lp8;//[bp-08]
cs:688 C45EF4 p=lpC; //les bx,lpC
cs:68B 663B06B474 if(lp8==g_74b6) //cmp eax,[74B4]
cs:690 7508 //jne 69A
cs:692 8C063A50 g_5038=lpC; // [503A]=es
cs:696 891E3850 // [5038]=bx
cs:69A 268A07 do{ //al=p[0];//es:[bx]
cs:69D 0AC0 740D if(p[0]=='\0')break; //or al,al; je 6AE
cs:6A1 7908 else if(p[0]&80h // jns 6AB
cs:6A3 3C90 7207 && (p[0]<90h //cmp al,90; jb 6AE
cs:6A7 3CFC 7303 || p[0]>=FCh))break;//cmp al,FC; jnb 6AE
cs:6AB 43 else p++; //inc bx
cs:6AC EBEC }while(1); //jmp 69A
cs:6AE 895EF4 lpC=p; //[bp-0C]=bx
cs:6B1 90 90 90 //nop
cs:6B4 C45EF8 ...
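The stop condition shared by the patched loops in 3.3 and 3.4 can be compared with the original one in a few lines of C. This is a sketch of the logic only; scan_original and scan_patched are hypothetical names, and each returns how many bytes the loop skips before it stops.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop (2.3, 2.4): stop at '\0' or at any byte with bit 7 set. */
static size_t scan_original(const unsigned char *p)
{
    size_t n = 0;
    assert(p != NULL);
    while (p[n] != '\0' && (p[n] & 0x80) == 0)
        n++;
    return n;
}

/* Patched loop (3.3, 3.4): a byte with bit 7 set stops the scan only
   when it is below 90h or at/above FCh. */
static size_t scan_patched(const unsigned char *p)
{
    size_t n = 0;
    assert(p != NULL);
    for (;;) {
        unsigned char c = p[n];
        if (c == '\0')
            break;
        if ((c & 0x80) && (c < 0x90 || c >= 0xFC))
            break;
        n++;
    }
    return n;
}
```

For the bytes 'a','b',A4h,A4h,'c',85h the original loop stops after 2 bytes, while the patched loop passes the two A4h bytes and stops at 85h after 5 bytes.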
=================================================
After these changes, tcc.exe can compile symbols that contain kanji.
The demos are cnt_big5.c and cnt_gbk.c,
which failed to compile before.
They now compile with "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c"
and link with "tlink.exe c0s.obj WordCnt.obj,WordCnt.exe,,cs.lib".
And how much code did we change? Just 1c6h bytes.
--------- 4.backup and compare ------------------
To prove that we changed just 1c6h bytes:
before modifying tcc.exe, we back it up as TCC.BAK.
After modifying it, we run "fc /b TCC.EXE TCC.BAK > fc.txt"
to compare the files in binary mode. This produces a text file
named "fc.txt", whose content is:
'Comparing files TCC.EXE and TCC.BAK
00019D85: 31 8C
00019D86: F6 46
00019D87: 26 FE
...
00019D9F: 89 80
00019DA0: 5E 74
00019DA1: FC ED
0001A084: 66 8B
0001A085: 8B 56
0001A086: 46 FA
...
0001A0AF: 5E F6
0001A0B0: F4 07
0001A0B1: 90 80
0001A0B2: 90 74
0001A0B3: 90 EE
0002A506: F6 EE
0002A507: F6 EE
...
0002A580: F6 EE
0002A581: F6 EE
0002A582: F6 EE
0002A606: F6 E6
0002A607: F6 E6
0002A608: F6 E6
...
0002A681: F6 E6
0002A682: F6 E6
0002C00D: 08 00
0002C00E: 08 00
0002C00F: 08 00
...
0002C088: 08 00
0002C089: 08 00'
There are 1c6h lines above.
Obviously we changed the file content
from byte 19D85 to byte 19DA1: 0x1d bytes in strlen();
from byte 1a084 to byte 1a0b3: 0x30 bytes in next_word();
from byte 2a506 to byte 2a582: 0x7d bytes in switch_enum_table for normal text;
from byte 2a606 to byte 2a682: 0x7d bytes in switch_enum_table for macro text;
from byte 2c00d to byte 2c089: 0x7d bytes in attrib8.
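Each difference line in fc.txt has the form "offset: byte-in-TCC.EXE byte-in-TCC.BAK". As an illustration of how such a line can be read, here is a minimal C sketch. It is not the source of difc.exe, whose internals are not described here; struct fc_diff and parse_fc_line are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* One difference line from "fc /b TCC.EXE TCC.BAK". */
struct fc_diff {
    unsigned long offset;    /* file offset of the differing byte */
    unsigned char patched;   /* byte in TCC.EXE (first file)      */
    unsigned char original;  /* byte in TCC.BAK (second file)     */
};

/* Parse a line such as "00019D85: 31 8C".
   Returns 1 on success, 0 for the header, "..." and other lines. */
static int parse_fc_line(const char *line, struct fc_diff *out)
{
    unsigned long off;
    unsigned int a, b;
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%lx: %x %x", &off, &a, &b) != 3)
        return 0;
    if (a > 0xFF || b > 0xFF)
        return 0;
    out->offset = off;
    out->patched = (unsigned char)a;
    out->original = (unsigned char)b;
    return 1;
}
```

A patching tool built on this could check that the file holds the original byte at the given offset before writing the patched one, so that a wrong version of tcc.exe is left untouched.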
--------- 5.afterword ---------------------------
If you have tcc 2.0
and notice the bugs above, you can
1)send a message to borland.com...
but Borland became CodeGear,
and CodeGear was sold to Embarcadero & Micro Focus...
2)look for the author of that exe...
but Anders Hejlsberg has moved to Microsoft.
3)rename your tcc.exe to tcc.bak,
and type "difc.exe fc.txt" in the console.
Of course, the name of the file (fc.txt) is not important.