Introduction:
  Tcc 2.0 is a Borland product from 1988.
When we run "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c",
it reports the errors "Illegal character"
and "Declaration syntax error".

  Following https://leisurebamboo.wordpress.com/2023/10/06/tc2
we patch tcc.exe with the command line "difc.exe fc.txt".
After that, "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c" runs OK:
it can now read symbols containing kanji.

--------- 1.symptoms ----------------------------
  We can compile WordCnt.c (the example that comes with the
debugging chapter of Turbo C's User's Guide)
with the command lines
"tcc.exe -c -oWordCnt.obj ..\WordCnt.c" and
"tlink.exe c0s.obj WordCnt.obj,WordCnt.exe,,cs.lib".
But if we rename a function, variable or macro in the
C source so that its name contains a byte greater than 0x80
(for example in cnt_big5.c or cnt_gbk.c),
tcc can no longer compile that symbol.
  That is a bug.

--------- 2.affected part -----------------------
========= 2.1.switch ============================
  In tcc.exe, after its fgetch(), it runs:
cs:077 268A07 B400   al=es:[bx],ah=0;//means "c=fgetch()"
cs:07C 8946FC        [bp-4]=ax
cs:07F 8B1ED450      bx=wo[50D4];//switch_enum_table==30B6
cs:083 035EFC         + wo[bp-4]
cs:086 8A07          al=[bx];//means switch_enum_table[c]
cs:088 98 8946FE     cbw, [bp-2]=ax
cs:08C 2DE0FF        ax+=20
cs:08F 3D2000 7603   if(ax>20)       //cmp ax,20; jbe 097
cs:094 E9F800            jmp 018F
cs:097 8BD8 D1E3     bx=ax<<1
cs:09B 2E FFA7A000   jmp  cs:0A0[bx]

;+--------------------------------------------------+
;|note:                                             |
;| ds:[30B6~31B5] are switch_enum_table, they are:  |
;|db 8 dup(0EEh);                       00~07       |
;|db 0EEh,4 dup(-9),0ech,2 dup(0EEh);   08~0F       |
;|db 8 dup(0EEh);                       10~17       |
;|db 2 dup(0EEh),0E7h,5 dup(0EEh);      18~1f       |
;|db -9,0,0F4h,2 dup(0EEh),-1,-2,0F2h;  20~27       |
;|db 1,2,-3,-4,8,0F1h,0F0h,-5;          28~2f       |
;|db 0Ah dup(0F5h);                     '0'~'9'     |
;|db 1Fh,7,0EFh,0F8h,0EDh,1Eh;          3A~3F       |
;|                                                  |
;|db 0EEh,1Ah dup(0F6h);                '@', 'A'~'Z'|
;|db 3, 0EAh,4, 0FAh, 0F6h;             5B~5F       |
;|db 0EEh,1Ah dup(0F6h);                60, 'a'~'z' |
;|db 5, 0F9h,6, 29h, 0EEh;              7B~7F       |
;|                                                  |
;|db 80h dup(0EEh);                     80~ff       |
;+--------------------------------------------------+

  Once a byte is greater than 80h, its switch_enum is 0EEh,
which causes the error "Illegal character". We shall change
this enum value to 0F6h (the enum value of 'a'~'z').
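  To see why 0F6h fixes it, here is a minimal C model of the
dispatch at cs:077~09B. lexer_case() and patch_high_bytes() are
hypothetical names, not symbols in tcc.exe. Entry 0EEh sign-extends
to -12h, so ax becomes 0Eh; entry 0F6h gives ax = 16h, the case
that 'a'~'z' already use. Positive entries exceed 20h and take the
default jump to cs:018F.

```c
/* Hypothetical C model of the lexer dispatch at cs:077~09B.
   The function names are illustrative, not symbols in tcc.exe. */

#define DEFAULT_CASE (-1)   /* stands for "jmp 018F" */

/* Return the jump-table index tcc would use for byte c. */
int lexer_case(const unsigned char table[256], unsigned char c)
{
    /* cbw: sign-extend the 8-bit table entry */
    int v = table[c] < 0x80 ? table[c] : table[c] - 0x100;
    /* ax += 20h, computed in 16-bit arithmetic */
    unsigned int ax = (unsigned int)(v + 0x20) & 0xFFFFu;
    if (ax > 0x20)          /* cmp ax,20h; jbe skips the default jump */
        return DEFAULT_CASE;
    return (int)ax;         /* case 0..20h via jmp cs:0A0[bx] */
}

/* Section 3.1: give bytes 80h~FFh the enum of 'a'~'z' (0F6h). */
void patch_high_bytes(unsigned char table[256])
{
    for (int c = 0x80; c <= 0xFF; c++)
        table[c] = 0xF6;
}
```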

========= 2.2.strcpy ============================
  And when tcc calls strcpy, it runs:
cs:B17 A02424        do{....
cs:B27 8A4604           AL=c;        //[bp+04]
cs:B2A C45EFC           es:bx=dst;   //les  bx,[bp-4]
cs:B2D 268807 FF46FC    *(dst++)=AL; //es:[bx]=al, wo[bp-4]++
...
cs:B43 C41E3850         es:bx=src;   //les  bx,[5038]
cs:B47 FF063850         src++;       //inc  wo[5038]
cs:B4B 268A07 B400      AX=*src;     //al=es:[bx],ah=00
cs:B50 894604 8B5E04    c=AX;        //bx=[bp+04]=ax

cs:B56 F687BD4B0E 75BA  }while(attr[c]&0x0E
                                     //test by 4BBD[bx],0E, jnz B17
cs:B5D 83FB5F 74B5        || c=='_'  //cmp  bx,5f, jz  B17
cs:B62 83FB24 74B0        || c=='$');//cmp  bx,24, jz  B17


;+---------------------------------------------------+
;|note:                                              |
;| ds:[4bbd~4cbc] are 100h attrib8 of ascii,they are:|
;|db   8 dup(20); 00~07                              |
;|db  20,21,21  ; 08~0a                              |
;|db   5 dup(20); 0b~0f                              |
;|db   8 dup(20); 10~17                              |
;|db   8 dup(20); 18~1f                              |
;|db 1,7 dup(0) ; 20~27                              |
;|db   8 dup(0) ; 28~2f                              |
;|db 0A dup(2)  ; '0'~'9'                            |
;|db   6 dup(0) ; 3a~3f                              |
;|db 0,6 dup(14); '@',"ABCDEF"                       |
;|db  14 dup(4) ; 'G'~'Z'                            |
;|db   5 dup(0) ; 5b~5f                              |
;|db 0,6 dup(18); 60, "abcdef"                       |
;|db  14 dup(08); 'g'~'z'                            |
;|db   4 dup(0) ; 7b~7e                              |
;|db  20        ; 7f                                 |
;|db  80 dup(0) ; 80~ff                              |
;+---------------------------------------------------+

  Once a byte is greater than 80h, its attrib8 is 00,
which causes an error too. To avoid this error,
we can change this attrib8 to 08 (the attrib8
of 'g'~'z').
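  The loop condition at cs:B56~B62 can be modelled in C as below.
The names are hypothetical, and the bit meanings (02h digit, 04h
upper case, 08h lower case) are read off the table above rather than
taken from Borland documentation.

```c
#include <stdbool.h>

/* Hypothetical model of "while(attr[c]&0x0E || c=='_' || c=='$')":
   keep copying while the byte is a digit (02h), an upper-case
   letter (04h), a lower-case letter (08h), '_' or '$'. */
bool continues_identifier(const unsigned char attr[256], unsigned char c)
{
    return (attr[c] & 0x0E) != 0 || c == '_' || c == '$';
}

/* Section 3.2: give bytes 80h~FFh the attrib8 of 'g'~'z' (08h). */
void patch_attrib8(unsigned char attr[256])
{
    for (int c = 0x80; c <= 0xFF; c++)
        attr[c] = 0x08;
}
```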


========= 2.3.strlen ==============================
cs:385 8C46FE 895EFC   p=es:bx;            //[bp-02]=es, [bp-04]=bx
cs:38B 33F6 EB04       strlen=0;           //si=0,jmp 393
cs:393 C45EFC                              //les  bx,[bp-04]
cs:396 26803F00 7406   while(p[0]!='\0'    //cmp  by es:[bx],0,  je 3A2
cs:39C 26F60780 74ED   && (p[0]&80h)==0)   //test by es:[bx],80; je 38F
cs:38F FF46FC             {p++;            //inc  wo [bp-04]
cs:392 46                  strlen++;       //inc  si
                          }
cs:3A2 C45EF4          ...


========= 2.4.next_word ============================
cs:684 8B56FA 8BC3                         //dx=[bp-06]; ax=bx
cs:689 3B16B674 7516   if(lp8==g_74b6)     //cmp  dx, [74B6], jne 6A5
cs:68F 3B06B474 7510                       //cmp  ax, [74B4], jne 6A5
cs:695 C45EF4             {                //les  bx, [bp-0C]
cs:698 8C063A50 891E3850   g_5038=lpC;     //[503A]=es; [5038]=bx
cs:6A0 EB03               };//jmp 6A5
cs:6A5 C45EF4                              //les  bx, [bp-0C]
cs:6A8 26803F00 7406   while(lpC[0]!='\0'  //cmp  by es:[bx],00; je 6B4
cs:6AE 26F60780 74EE    && (lpC[0]&80h)==0)//test by es:[bx],80; je 6A2
cs:6A2 FF46F4             lpC++;           //inc  wo[bp-0C]

cs:6B4 C45EF8          ...


  Once a byte is greater than 80h, these loops stop at it,
treating it as a macro marker. The symbol is cut short, which causes errors.
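  The original loops in 2.3 and 2.4 share one stop rule: NUL, or
any byte with bit 7 set. A minimal C sketch (function names are
hypothetical) shows the effect on an identifier ending in the
double-byte pair A4h A4h: only the ASCII part is measured.

```c
#include <stddef.h>

/* Hypothetical model of the original strlen (2.3) and next_word (2.4):
   both stop at the terminating NUL or at the first byte >= 80h. */
static int stops_original(unsigned char c)
{
    return c == '\0' || (c & 0x80) != 0;
}

size_t tcc_strlen_original(const char *p)
{
    size_t n = 0;
    while (!stops_original((unsigned char)p[n]))
        n++;
    return n;
}

const char *next_word_original(const char *p)
{
    while (!stops_original((unsigned char)*p))
        p++;
    return p;
}
```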


--------- 3.solution ----------------------------
  Many kanji bytes are greater than 0x80,
so we can:

=================================================
  3.1.change their switch_enum in switch_table,
that is, change the switch_enum of bytes [80~ff]
(for normal text, ds:[50D4]==30B6, so the corresponding
bytes in the file tcc.exe are at 2a506h~2a585h)
from "db 80h dup(0EEh)" to "db 80h dup(0F6h)".

and change the switch_enum of bytes [80~ff]
(for macro text, ds:[50D4]==31B6, so the corresponding
bytes in the file tcc.exe are at 2a606h~2a685h)
from "db 80h dup(0E6h)" to "db 80h dup(0F6h)".

=================================================
  3.2.change their attrib8 to the same value as 'g'~'z',
that is, change the attrib8 of bytes 80h~ffh (the corresponding
bytes in the file tcc.exe are at 2c00dh~2c08ch)
from "db 80h dup(00)" to "db 80h dup(08)".
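  The file offsets quoted in 3.1~3.4 fit two fixed deltas that can be
read off this document: file offset = ds offset + 273D0h (entry 80h of
the normal-text table, ds:3136, is at 2a506h) and file offset = cs
offset + 19A00h (cs:385 is at 19d85h). The helpers below are
hypothetical; the deltas are an inference that holds only for this
particular tcc.exe, but they reproduce every range quoted above.

```c
/* Deltas inferred from the offsets quoted in this README; an
   assumption that is specific to this tcc.exe 2.0 build. */
#define DS_TO_FILE 0x273D0UL
#define CS_TO_FILE 0x19A00UL

unsigned long ds_file_offset(unsigned int ds_off) { return DS_TO_FILE + ds_off; }
unsigned long cs_file_offset(unsigned int cs_off) { return CS_TO_FILE + cs_off; }

/* File offset of entry c of a 256-byte table that starts at ds:table. */
unsigned long table_entry_offset(unsigned int table, unsigned int c)
{
    return ds_file_offset(table + c);
}
```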

=================================================
  3.3.change strlen (file offset 19d85h) to:
cs:385 31f6       strlen=0;                //si=0
cs:387 268A07     do{                      //al=es:[bx]
cs:38A 08C0 740E    if(p[0]=='\0')break;   //or al,al, jz  39C
cs:38E 7908         else if(p[0]&80h       //          jns 398
cs:390 3C90 7208    && (p[0]<90h           //if(al<90) jb  39C
cs:394 3CFC 7304     || p[0]>=FCh))break;  //if(al>=FC)jae 39C
cs:398 46 43        else{strlen++; p++}    //si++, bx++
cs:39A EBEB         }while(1);             //jmp 387

cs:39C 8C46FE     p=es:bx;                 //[bp-2]=es
cs:39F 895EFC                              //[bp-4]=bx
cs:3A2 C45EF4     ...
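  Read off the jns/jb/jae tests above, the new strlen keeps counting
bytes 90h~FBh but still stops at NUL, 80h~8Fh and FCh~FFh. A
hypothetical C rendering (names are illustrative):

```c
#include <stddef.h>

/* Hypothetical model of the patched strlen in 3.3. */
static int stops_patched(unsigned char c)
{
    if (c == '\0')
        return 1;                 /* or al,al; jz */
    if ((c & 0x80) == 0)
        return 0;                 /* jns: plain ASCII continues */
    return c < 0x90 || c >= 0xFC; /* jb / jae: still a terminator */
}

size_t tcc_strlen_patched(const char *p)
{
    size_t n = 0;
    while (!stops_patched((unsigned char)p[n]))
        n++;                      /* si++, bx++ */
    return n;
}
```

Because 80h~8Fh still ends the scan, a double-byte character whose
first byte lies in that range (possible in GBK) would presumably
still be cut short.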


=================================================
  3.4.change next_word (file offset 1a084h) to:
cs:684 668B46F8                            //eax=lp8;//[bp-08]
cs:688 C45EF4     p=lpC;                   //les bx,lpC
cs:68B 663B06B474 if(lp8==g_74b6)          //cmp eax,[74B4]
cs:690 7508                                //jne 69A
cs:692 8C063A50       g_5038=lpC;          //  [503A]=es
cs:696 891E3850                            //  [5038]=bx
cs:69A 268A07     do{                      //al=p[0];//es:[bx]
cs:69D 0AC0 740D     if(p[0]=='\0')break;  //or  al,al; je  6AE
cs:6A1 7908          else if(p[0]&80h      //           jns 6AB
cs:6A3 3C90 7207      && (p[0]<90h         //cmp al,90; jb  6AE
cs:6A7 3CFC 7303       || p[0]>=FCh))break;//cmp al,FC; jnb 6AE
cs:6AB 43            else p++;             //inc bx
cs:6AC EBEC         }while(1);             //jmp 69A
cs:6AE 895EF4     lpC=p;                   //[bp-0C]=bx
cs:6B1 90 90 90                            //nop
cs:6B4 C45EF8     ...
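  The patched next_word can be sketched the same way. The struct
tcc_globals and the function name are hypothetical; the 32-bit lp8
stands for the far pointer that the 66h-prefixed "cmp eax,[74B4]"
compares in one instruction instead of two 16-bit compares (the 66h
operand-size prefix needs a 386 or later CPU).

```c
#include <stdint.h>

/* Hypothetical model of the patched next_word in 3.4. g_74b6 and
   g_5038 stand for the far pointers at ds:74B4 and ds:5038. */
typedef struct {
    uint32_t g_74b6;     /* compared with lp8 (cmp eax,[74B4]) */
    const char *g_5038;  /* receives lpC when they match */
} tcc_globals;

/* Same stop rule as the patched strlen: NUL, 80h~8Fh or FCh~FFh. */
static int stops_patched(unsigned char c)
{
    return c == '\0' || ((c & 0x80) != 0 && (c < 0x90 || c >= 0xFC));
}

const char *next_word_patched(tcc_globals *g, uint32_t lp8, const char *lpC)
{
    const char *p = lpC;           /* les bx,lpC */
    if (lp8 == g->g_74b6)          /* one 32-bit compare */
        g->g_5038 = lpC;           /* [503A]=es, [5038]=bx */
    while (!stops_patched((unsigned char)*p))
        p++;                       /* inc bx */
    return p;                      /* lpC=p ([bp-0C]=bx) */
}
```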

=================================================
  After these changes, tcc.exe can compile symbols containing kanji.
The demos are cnt_big5.c and cnt_gbk.c,
which failed to compile before the patch.
They can now be built with "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c"
and "tlink.exe c0s.obj WordCnt.obj,WordCnt.exe,,cs.lib".

  And how much code did we change? Only a few hundred bytes.

--------- 4.backup and compare ------------------
  To verify how little we changed:
before modifying tcc.exe, back it up as TCC.BAK.
After the modification, run "fc /b TCC.EXE TCC.BAK > fc.txt"
to compare the files in binary mode. This produces a text file
named "fc.txt", whose content is:
'Comparing files TCC.EXE and TCC.BAK
00019D85: 31 8C
00019D86: F6 46
00019D87: 26 FE
...
00019D9F: 89 80
00019DA0: 5E 74
00019DA1: FC ED
0001A084: 66 8B
0001A085: 8B 56
0001A086: 46 FA
...
0001A0AF: 5E F6
0001A0B0: F4 07
0001A0B1: 90 80
0001A0B2: 90 74
0001A0B3: 90 EE
0002A506: F6 EE
0002A507: F6 EE
...
0002A580: F6 EE
0002A581: F6 EE
0002A582: F6 EE
0002A606: F6 E6
0002A607: F6 E6
0002A608: F6 E6
...
0002A681: F6 E6
0002A682: F6 E6
0002C00D: 08 00
0002C00E: 08 00
0002C00F: 08 00
...
0002C088: 08 00
0002C089: 08 00'
  The complete fc.txt has 1c6h lines; the listing above is abridged with "...".

  So we changed only these parts of the file:
from byte 19D85h to byte 19DA1h, 0x1d bytes in strlen();
from byte 1a084h to byte 1a0b3h, 0x30 bytes in next_word();
from byte 2a506h to byte 2a582h, 0x7d bytes in switch_enum_table for normal text;
from byte 2a606h to byte 2a682h, 0x7d bytes in switch_enum_table for macro text;
from byte 2c00dh to byte 2c089h, 0x7d bytes in attrib8.
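  As a cross-check, this short C sketch (range table copied from the
list above; the names are hypothetical) adds up the five ranges. They
span 1c4h bytes in total. fc lists only bytes that actually differ,
and fc.txt also has a header line, so its line count is not exactly
the number of bytes inside these ranges.

```c
#include <stddef.h>

/* Changed ranges as listed above (inclusive file offsets). */
struct span { unsigned long first, last; const char *what; };

static const struct span changed[] = {
    { 0x19D85UL, 0x19DA1UL, "strlen()" },
    { 0x1A084UL, 0x1A0B3UL, "next_word()" },
    { 0x2A506UL, 0x2A582UL, "switch_enum_table, normal text" },
    { 0x2A606UL, 0x2A682UL, "switch_enum_table, macro text" },
    { 0x2C00DUL, 0x2C089UL, "attrib8" },
};

unsigned long span_size(const struct span *s)
{
    return s->last - s->first + 1;   /* inclusive range */
}

unsigned long total_span_bytes(void)
{
    unsigned long total = 0;
    for (size_t i = 0; i < sizeof changed / sizeof changed[0]; i++)
        total += span_size(&changed[i]);
    return total;
}
```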

--------- 5.afterword ---------------------------
  If you have tcc 2.0
and have noticed the bugs above, you can
1)send a message to borland.com...
  but Borland's compiler business became CodeGear,
  which was sold to Embarcadero, and Borland itself went to Micro Focus...
2)seek out the author of that exe...
  but Anders Hejlsberg has moved to Microsoft.
3)rename your tcc.exe to tcc.bak,
  and type "difc.exe fc.txt" in the console.
  Of course, the name of the file (fc.txt) is not important.
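  This README does not describe how difc.exe works. As a rough
illustration only, here is a hypothetical C sketch of a patcher that
reads fc /b output in the "OFFSET: NEW OLD" layout shown in section 4,
checks that each byte of the original still has the OLD value, and
writes the NEW value. apply_fc_line() and apply_fc_stream() are made-up
names, not difc.exe's interface.

```c
#include <stdio.h>
#include <stddef.h>

/* Apply one line such as "00019D85: 31 8C" (first value = byte in
   the patched TCC.EXE, second = byte in the original TCC.BAK).
   Returns 1 if applied, 0 for a non-data line (e.g. the
   "Comparing files" header or "..."), -1 on a bad offset/value
   or when the old byte does not match. */
int apply_fc_line(unsigned char *buf, size_t size, const char *line)
{
    unsigned long off;
    unsigned int new_byte, old_byte;
    if (sscanf(line, "%lx: %x %x", &off, &new_byte, &old_byte) != 3)
        return 0;
    if (new_byte > 0xFF || old_byte > 0xFF)
        return -1;
    if (off >= size || buf[off] != old_byte)
        return -1;                     /* unexpected file: refuse */
    buf[off] = (unsigned char)new_byte;
    return 1;
}

/* Apply every data line of an fc listing; returns the number of
   bytes patched, or -1 at the first failing line (the buffer may
   then be partly modified and should be discarded). */
long apply_fc_stream(FILE *fc, unsigned char *buf, size_t size)
{
    char line[128];
    long patched = 0;
    while (fgets(line, sizeof line, fc) != NULL) {
        int r = apply_fc_line(buf, size, line);
        if (r < 0)
            return -1;
        patched += r;
    }
    return patched;
}
```

To use it, one would read TCC.BAK into a buffer, feed fc.txt to
apply_fc_stream(), and write the buffer out as TCC.EXE only if the
result is not -1. Checking the old byte first means a second run, or
a different tcc.exe build, is rejected instead of silently corrupted.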


Source: readme.txt, updated 2023-10-06