Introduction:
  Tcc 1.0 is a Borland product from 1987.
When we run "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c",
it reports the errors "Illegal character"
and "Declaration syntax error".

  Following https://leisurebamboo.wordpress.com/2023/10/05/tc1
we modify tcc.exe with the command line "difc.exe fc.txt".
After that, "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c" runs OK:
it can now read symbols that contain kanji.

--------- 1.symptoms ----------------------------
  We can compile WordCnt.c (the sample from the debugging
chapter of the Turbo C User's Guide)
with the command lines
"tcc.exe -c -oWordCnt.obj ..\WordCnt.c" and
"tlink.exe c0s.obj WordCnt.obj,WordCnt.exe,,cs.lib".
But if we rename a function, variable or macro in the C source
so that the name contains a byte of 0x80 or greater
(as in cnt_big5.c or cnt_gbk.c),
tcc can no longer compile that symbol.
  That is a bug.

--------- 2.affected part -----------------------
========= 2.1.data ==============================
  In tcc.exe, the code that runs right after fgetch() returns is:
cs:0024 9AB505xxxx call far fgetc();//xxxx:05B5
cs:0029 8BF0       mov  si,ax
cs:002B 8BDE       mov  bx,si
cs:002D 83C302     add  bx,02
cs:0030 D1E3       shl  bx,1
cs:0032 8B87A61A   mov  ax,[bx+1AA6]
cs:0036 8946FE     mov  [bp-02],ax
cs:0039 2DE7FF     sub  ax,FFE7;//ax+=19h
cs:003C 3D1900     cmp  ax,0019
cs:003F 7603       jbe  0044
cs:0041 E99600     jmp  00da
cs:0044 8BD8       mov  bx,ax
cs:0046 D1E3       shl  bx,1
cs:0048 2EFFA74D00 jmp  cs:[004D+bx];//cs:[004D+(0019+fff6)*2]==00a1

;+------------------------------------------------------------+
;|note:                                                       |
;| ds:[1aaa~1ca9] are switch_enum_table, they are:            |
;|dw ffee,ffee,ffee,ffee, ffee,ffee,ffee,ffee; 00~07          |
;|dw ffee,fff7,ffec,fff7, fff7,fff7,ffee,ffee; 08~0f          |
;|dw ffee,ffee,ffee,ffee, ffee,ffee,ffee,ffee; 10~17          |
;|dw ffee,ffee,ffe7,ffee, ffee,ffee,ffee,ffee; 18~1f          |
;|dw fff7,0000,fff4,fff3, ffee,ffff,fffe,fff2; 20~27          |
;|dw 0001,0002,fffd,fffc, 0008,fff1,fff0,fffb; 28~2f          |
;|dw fff5,fff5,fff5,fff5, fff5,fff5,fff5,fff5; '0'~'7'        |
;|dw fff5,fff5,001f,0007, ffef,fff8,ffed,001e; '8','9', 3a~3f |
;|                                                            |
;|dw ffee,fff6,fff6,fff6, fff6,fff6,fff6,fff6; '@', 'A'~'G'   |
;|dw fff6,fff6,fff6,fff6, fff6,fff6,fff6,fff6; 'H'~'O'        |
;|dw fff6,fff6,fff6,fff6, fff6,fff6,fff6,fff6; 'P'~'W'        |
;|dw fff6,fff6,fff6,0003, ffea,0004,fffa,fff6; 'X'~'Z', 5b~5f |
;|dw ffee,fff6,fff6,fff6, fff6,fff6,fff6,fff6; 60, 'a'~'g'    |
;|dw fff6,fff6,fff6,fff6, fff6,fff6,fff6,fff6; 'h'~'o'        |
;|dw fff6,fff6,fff6,fff6, fff6,fff6,fff6,fff6; 'p'~'w'        |
;|dw fff6,fff6,fff6,0005, fff9,0006,0029,ffee; 'x'~'z', 7b~7f |
;|                                                            |
;|dw 80 dup(ffee)                                             |
;+------------------------------------------------------------+

  Once a byte is 0x80 or greater, its switch_enum is ffee,
which leads to the error "Illegal character". So what if we change
this enum_value to fff6 (the enum_value of 'a'~'z')?
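
  The lookup and range check at cs:002B~cs:0048 can be replayed
with a little arithmetic. The Python sketch below is only an
illustration and is not part of tcc; the function names are made up
here, and the constants are taken from the listing above.

```python
# Illustrative replay of the dispatch at cs:002B~cs:0048.
# Function names are hypothetical; constants come from the listing.
TABLE_BASE = 0x1AA6   # from "mov ax,[1aa6+bx]"
BIAS = 0x19           # "sub ax,FFE7" is the same as adding 19h
MAX_CASE = 0x19       # "cmp ax,0019 / jbe 0044"
JUMP_TABLE = 0x004D   # from "jmp cs:[004D+bx]"


def table_address(c):
    """ds offset of the switch_enum word for byte value c (0..255)."""
    return TABLE_BASE + ((c + 2) << 1)


def case_index(enum_value):
    """Jump-table index, or None when the default branch (cs:00da) is taken."""
    idx = (enum_value + BIAS) & 0xFFFF   # 16-bit wrap-around, like the CPU
    return idx if idx <= MAX_CASE else None


def jump_slot(enum_value):
    """cs offset of the jump-table word that is used, or None for the default."""
    idx = case_index(enum_value)
    return None if idx is None else JUMP_TABLE + idx * 2


print(hex(table_address(0x00)), hex(table_address(0xFF)))
print(case_index(0xFFEE), case_index(0xFFF6), case_index(0x0029))
```

  With these numbers, ffee selects index 7 and fff6 selects index 0fh,
so changing the table word is enough to send a byte down the same
branch as 'a'~'z'.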

label_fff6:
cs:00A1 C6061A5F00 mov  byte[5F1A],00
 ...
cs:00BB F684574B02 test byte[si+4B57],2
cs:00C0 7574       jnz  0136
cs:00C2 9A8D04xxxx call xxxx:048D
cs:00C7 E95AFF     jmp  0024

;+---------------------------------------------------+
;|note:                                              |
;| ds:[4b57~4c56] are 100h attrib8 of ascii,they are:|
;|db   8 dup(20); 00~07                              |
;|db  20,21,21  ; 08~0a                              |
;|db   5 dup(20); 0b~0f                              |
;|db   8 dup(20); 10~17                              |
;|db   8 dup(20); 18~1f                              |
;|db 1,7 dup(0) ; 20~27                              |
;|db   8 dup(0) ; 28~2f                              |
;|db 0A dup(2)  ; '0'~'9'                            |
;|db   6 dup(0) ; 3a~3f                              |
;|db 0,6 dup(14); '@',"ABCDEF"                       |
;|db  14 dup(4) ; 'G'~'Z'                            |
;|db   5 dup(0) ; 5b~5f                              |
;|db 0,6 dup(18); 60, "abcdef"                       |
;|db  14 dup(08); 'g'~'z'                            |
;|db   4 dup(0) ; 7b~7e                              |
;|db  20        ; 7f                                 |
;|db  80 dup(0) ; 80~ff                              |
;+---------------------------------------------------+

  Once a byte is 0x80 or greater, its attrib8 is 00,
which causes an error too. To avoid this error,
we can change this attrib8 to 08 (the attrib8
of 'g'~'z').

 Everything above happens "after fgetch".
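
  For reference, the attrib8 note can be written out as a 100h-byte
table and queried. This is a Python sketch built only from the note
above (the counts in the note are hex); the helper name is made up,
and no meaning is assumed for the individual bits.

```python
# attrib8 table as listed in the note for ds:4b57~4c56 (counts are hex).
ATTRIB8 = bytes(
    [0x20] * 8 + [0x20, 0x21, 0x21] + [0x20] * 5      # 00~0f
    + [0x20] * 8 + [0x20] * 8                         # 10~1f
    + [0x01] + [0x00] * 7 + [0x00] * 8                # 20~2f
    + [0x02] * 0x0A + [0x00] * 6                      # '0'~'9', 3a~3f
    + [0x00] + [0x14] * 6 + [0x04] * 0x14             # '@', A~F, G~Z
    + [0x00] * 5                                      # 5b~5f
    + [0x00] + [0x18] * 6 + [0x08] * 0x14             # 60, a~f, g~z
    + [0x00] * 4 + [0x20]                             # 7b~7e, 7f
    + [0x00] * 0x80                                   # 80~ff
)
assert len(ATTRIB8) == 0x100


def has_bit_02(c):
    """Mirror of "test byte[si+4B57],2" at cs:00BB for byte value c."""
    return bool(ATTRIB8[c] & 0x02)


for ch in (ord("g"), ord("a"), ord("7"), 0xA4):
    print("%02X -> %02X" % (ch, ATTRIB8[ch]))
```

  Every byte from 80 to ff reads back as 00 here, while 'g'~'z'
read back as 08.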


========= 2.2.code ==============================
 When tcc calls fgetch (which may be fgetch_1, fgetch_2,
fgetch_3 or fgetch_4), it runs:

fgetch_4:;{
cs:01F9 56             push   si
                     ; do {lp_8=(lp_struct->lp8)++
cs:01FA C41E225F           les    bx,[5F22]
cs:01FE 268B570E           mov    dx,es:[bx+0E]
cs:0202 268B470C           mov    ax,es:[bx+0C]
cs:0206 26FF470C           inc    es:word ptr [bx+0C]
cs:020A 8BD8               mov    bx,ax
cs:020C 8EC2               mov    es,dx

                     ;     c=*lp8
cs:020E 268A07             mov    al,es:[bx]
cs:0211 B400               mov    ah,00
cs:0213 8BF0               mov    si,ax

                    ;      if(!(c & 0x80))
cs:0215 F7C68000           test   si,0080
cs:0219 7503               jne    021E
cs:021B E9A100                jmp    02BF;//normal

                    ;      else if(c == 0xFE)
cs:021E 81FEFE00           cmp    si,00FE
cs:0222 7503               jne    0227
cs:0224 E9A500                jmp    02CC;//nothing

                    ;      else if(c == 0xFF)
cs:0227 81FEFF00           cmp    si,00FF
cs:022B 751E               jne    024B;//judge_fd
cs:022D C41E225F label_ff:    {lp_8=(lp_struct->lp8)++
cs:0231 ...
cs:0241 268A07                 c=*lp8
cs:0244 ...
cs:0248 EB79 90               };jmp    02C3

                 judge_fd: else if(c == 0xFD)
cs:024B 81FEFD00           cmp    si,00FD
cs:024F 7557               jne    02A8;//label_8x
cs:0251 C41E225F              {lp_8=(lp_struct->lp8)++
cs:0255 ...
cs:0265 268A07                 c=*lp8
cs:0268 ...
cs:026C F7C68000               if(((c&0x80)==0)||(c>=0xFD))
cs:0270 ...
cs:0278 B81C00                    {print_err('# operator not followed by macro argument name')
cs:027B ...
cs:028F EB3B                       break;jmp    02CC
                                   }
cs:0291 C41E225F               call   replace_macro;//cs:0582
cs:0295 ...
cs:02A6 EB39                  }jmp    02E1

                label_8x: else{;//c==0x8?
cs:02A8 C41E225F               ...
cs:02B5 9A4E05880C             call   reset_pointer;//cs:054E
cs:02BA E87A01                 call   fgetch_2;//0437
cs:02BD EB22                   }jmp   02E1

                    ;   //if(!(c&80h))
                 normal:      {if(c)
cs:02BF 0BF6                   or     si,si
cs:02C1 7404                   je     02C7
cs:02C3 8BC6                       {mov    ax,si
cs:02C5 EB1A                       }jmp    02E1
cs:02C7 9A8C03880C             else call   free;//cs:038C
                               }

                 nothing: }while(lp_struct)
cs:02CC A1225F             mov    ax,[5F22]
cs:02CF 0B06245F           or     ax,[5F24]
cs:02D3 7403               je     02D8
cs:02D5 E922FF             jmp    01FA

cs:02D8 C7061C64C504   mov    word ptr [641C],04C5
cs:02DE E8E401         call   fgetch_1;//04C5
cs:02E1 5E             pop    si
cs:02E2 C3             ret
;}


=================================================
fgetch_2:;{
cs:0437 C41E265F       les    bx,[5F26]
cs:043B 26803F00       cmp    es:byte ptr [bx],00
cs:043F 740A           je     044B
cs:0441 FF06265F           inc    word ptr [5F26]
cs:0445 268A07             mov    al,es:[bx]
cs:0448 98                 cbw
cs:0449 EB29               jmp    0474;return
cs:044B C706285F0000   mov wo[5F28],0
cs:0451 C706265F0000   mov wo[5F26],0
cs:0457 A1225F         mov ax,[5F22]
cs:045A 0B06245F       or  ax,[5F24]
cs:045E 740B           je  046B
cs:0460 C7061C64F901       {mov   wo[641C],01F9
cs:0466 E890FD              call  fgetch_4;01F9
cs:0469 EB09               }jmp   0474
cs:046B C7061C64C504   else{mov   wo[641C],04C5
cs:0471 E85100              call  fgetch_1;04C5
                           }
cs:0474 C3             ret
;}


=================================================
getch_in_macro(char c):;{
cs:0661 55             push   bp
cs:0662 8BEC           mov    bp,sp
                       ;if(c & 80h)
cs:0664 F746068000     test   word ptr [bp+06],0080
cs:0669 7436           je   06A1;//nothing
cs:066B A13264             {if(lp8<&(str[SIZE-1]))
cs:066E 8CDA               ...
                 label_8x:    {*(lp8++)=0xff
cs:0679 8BD8                   mov    bx,ax
cs:067B C607FF                 mov    byte ptr [bx],FF
cs:067E FF063264               inc    word ptr [6432]
cs:0682 EB1D                  }jmp    06A1;//nothing

                
cs:0684 A13264   judge_ov: else if(lp8==&(str[SIZE-1]))
cs:0687 8CDA               ...
cs:0696 B82200                {print_err('Macro expansion too long')
cs:0699 50                     ...
cs:069F 8BE5                   mov    sp,bp
                               }
                            }
                 nothing:
cs:06D9 8BE5           mov    sp,bp
cs:06DB 8B1E3264       mov    bx,[6432]
cs:06DF C60700         mov    byte ptr [bx],00
cs:06E2 5D             pop    bp
cs:06E3 CA0200         retf   0002
;}

  Once a byte is 0x80 or greater, it is treated as
a macro. That causes the errors above.
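
  The branch structure of the unpatched fgetch_4 can be summed up
as a small function. This Python sketch only mirrors the compares and
jumps in the listing above; the function name is made up, and 0xA4 is
used as an example because it is a common Big5 lead byte.

```python
def fgetch_4_branch_original(c):
    """Label that the unpatched fgetch_4 reaches for byte value c (0..255)."""
    if not (c & 0x80):
        return "normal"     # cs:02BF
    if c == 0xFE:
        return "nothing"    # cs:02CC
    if c == 0xFF:
        return "label_ff"   # cs:022D
    if c == 0xFD:
        return "judge_fd"   # cs:024B
    return "label_8x"       # cs:02A8, every other byte 80~fc


for c in (0x41, 0xA4, 0xFD, 0xFE, 0xFF):
    print("%02X -> %s" % (c, fgetch_4_branch_original(c)))
```

  Note that 0xA4 lands on label_8x and never reaches the "normal" path.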


--------- 3.solution ----------------------------
  Many kanji bytes are 0x80 or greater,
so we can

  3.1.change their switch_enum in the switch_table:
for ascii 80~ff (the words at offsets 2632ah~26429h
in the file <<tcc.exe>>),
change "80 dup(ffee)" to "80 dup(fff6)".

  3.2.change their attrib8 to that of 'g'~'z':
for ascii 80~ff (the bytes at offsets 29357h~293d6h
in the file <<tcc.exe>>),
change "80 dup(00)" to "80 dup(08)".

  Note: according to the fc.txt listing in section 4, only the
entries for 90~fc are actually changed in both tables
(2634ah~26422h and 29367h~293d3h); 80~8f and fd~ff keep
their old values.
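
  Steps 3.1 and 3.2 amount to two table fills. The Python sketch
below applies them to an in-memory copy of the file; it is an
illustration under stated assumptions, not the difc.exe tool. The
function name is made up. By default it fills only 90~fc, which is
the range that the fc listing in section 4 shows as changed
(2634Ah~26422h and 29367h~293D3h).

```python
SWITCH_TABLE_80 = 0x2632A   # file offset of the switch_enum word for byte 80
ATTRIB8_80 = 0x29357        # file offset of the attrib8 byte for byte 80


def patch_tables(image, first=0x90, last=0xFC):
    """Return a patched copy of image (the bytes of tcc.exe).

    The input is left untouched. Old values are checked first, so a file
    that does not look like the expected tcc.exe is rejected.
    """
    out = bytearray(image)
    for c in range(first, last + 1):
        word_at = SWITCH_TABLE_80 + (c - 0x80) * 2
        if bytes(out[word_at:word_at + 2]) != b"\xEE\xFF":   # ffee, little-endian
            raise ValueError("unexpected switch_enum at %05X" % word_at)
        out[word_at] = 0xF6                                  # ffee -> fff6
        byte_at = ATTRIB8_80 + (c - 0x80)
        if out[byte_at] != 0x00:
            raise ValueError("unexpected attrib8 at %05X" % byte_at)
        out[byte_at] = 0x08                                  # 00 -> 08
    return bytes(out)
```

  A caller would read tcc.exe in binary mode, pass the bytes to
patch_tables(), and write the result to a new file, keeping the
original as a backup.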

=================================================
  3.3.change fgetch_4(file offset==10479) to
{
cs:01F9 56       push   si
cs:01FA C41E225F do        {lp_8=(lp_struct->lp8)++
cs:020E 260FB607            movzx  ax,es:byte ptr [bx]
cs:0212 8BF0                mov    si,ax
cs:0214 3C80                cmp    al,80
cs:0216 0F82A500            jb     02BF;//normal
cs:021A 3CFF                cmp    al,FF
cs:021C 740F                je     022D;//label_ff
cs:021E 3CFE                cmp    al,FE
cs:0220 0F84A800            je     02CC;//nothing
cs:0224 3C90                cmp    al,90
cs:0226 0F827E00            jb     02A8;//label_8x
cs:022A EB1F                jmp    024B;//judge_fd
cs:022C 90                  nop
           
cs:022D C41E225F   label_ff: ...
...
cs:024B 81FEFD00   judge_fd: else if(c != 0xFD)
cs:024F 756E                         jne 02BF;//normal
...
cs:02A8 C41E225F   label_8x: ...
...
cs:02BF 0BF6       normal:...
...
cs:02CC A1225F     nothing: }while(lp_struct)
}
This change sends kanji bytes (90~fc) to 02BF(normal).
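
  The patched compares can be read as the following decision
function. This is a Python sketch that mirrors the jumps in the 3.3
listing only; the function name is made up.

```python
def fgetch_4_branch_patched(c):
    """Label that the patched fgetch_4 reaches for byte value c (0..255)."""
    if c < 0x80:
        return "normal"     # cmp al,80 / jb 02BF
    if c == 0xFF:
        return "label_ff"   # cmp al,FF / je 022D
    if c == 0xFE:
        return "nothing"    # cmp al,FE / je 02CC
    if c < 0x90:
        return "label_8x"   # cmp al,90 / jb 02A8
    # 90~fd fall into judge_fd, where "cmp si,00FD / jne 02BF"
    # sends everything except fd to normal.
    return "judge_fd" if c == 0xFD else "normal"


for c in (0x41, 0x85, 0xA4, 0xFC, 0xFD):
    print("%02X -> %s" % (c, fgetch_4_branch_patched(c)))
```

  Only 80~8f, fd, fe and ff keep their special handling; 90~fc are
now read like ordinary characters.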


=================================================
  3.4.change fgetch_2(file offset==106b7) to
{
cs:0437 C41E265F les   bx,[5F26]
cs:043B 26803F00 cmp   es:byte ptr [bx],00
cs:043F 7417     je    0458
cs:0441 FF06265F    inc   word ptr [5F26]
cs:0445 260FB607    movzx ax,es:byte ptr [bx]
cs:0449 3C80        cmp   al,80
cs:044B 720A        jb    0457
cs:044D 3C90        cmp   al,90
cs:044F 7204        jb    0455
cs:0451 3CFD        cmp   al,FD
cs:0453 7202        jb    0457
cs:0455 FECC        dec   ah
cs:0457 C3          ret
cs:0458 6633C0   xor   eax,eax
cs:045B 66A3265F mov   [5F26],eax
cs:045F 663306.. xor   eax,[5F22]
cs:0464 BAC504   mov   dx,04C5;fgetch_1
cs:0467 7403     je    046C
cs:0469 BAF901   mov   dx,01F9;fgetch_4
cs:046C 89161C64 mov   [641C],dx
cs:0470 FFD2     call  dx
cs:0472 C3       ret
cs:0473 90       nop
cs:0474 C3       ret
;}
This change returns kanji bytes (90~fc) with ah=00, so sign(kanji_letter)>0.
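
  The difference in the returned AX can be worked out by hand.
The Python sketch below models only the non-zero-byte path of
fgetch_2 (the path that ends at the first "ret"); the function names
are made up.

```python
def fgetch_2_ax_original(c):
    """AX after "mov al,es:[bx] / cbw": al is sign-extended into ah."""
    return (0xFF00 | c) if (c & 0x80) else c


def fgetch_2_ax_patched(c):
    """AX after "movzx ax,..." plus the conditional "dec ah"."""
    ax = c                                    # movzx: ah = 00
    if c >= 0x80 and (c < 0x90 or c >= 0xFD):
        ax |= 0xFF00                          # dec ah: 00 -> ff
    return ax


for c in (0x41, 0x85, 0xA4, 0xFD):
    print("%02X: %04X -> %04X" % (c, fgetch_2_ax_original(c), fgetch_2_ax_patched(c)))
```

  For 0xA4 the old code returns FFA4 (negative as a 16-bit int) and
the new code returns 00A4; 80~8f and fd~ff still come back with
ah=ff, as before.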


=================================================
  3.5.change getch_in_macro(file offset==11ef1) to
{
cs:0661 55       push   bp
cs:0662 8BEC     mov    bp,sp
cs:0664 8A4606   mov    al,[bp+06]
cs:0667 3C80     cmp    al,80
cs:0669 7236     jb     06A1;//nothing
cs:066B 3C90     cmp    al,90
cs:066D 720A     jb     0679;//label_8x
cs:066F 3CFC     cmp    al,FC
cs:0671 762E     jbe    06A1;//nothing
cs:0673 EB0F     jmp    0684;//judge_ov
cs:0675 90 90 90 90
...
}

This change sends kanji bytes (90~fc) to 06A1(nothing).

  After these changes, tcc.exe can compile symbols that contain kanji.
The demos are cnt_big5.c and cnt_gbk.c,
which used to fail.
They can now be built with "tcc.exe -c -oWordCnt.obj ..\Cnt_big5.c"
and "tlink.exe c0s.obj WordCnt.obj,WordCnt.exe,,cs.lib".

  And how much code did we change? Just 140h bytes.

--------- 4.backup and compare ------------------
  To prove that we changed just 140h bytes:
before modifying tcc.exe, back it up as TCC.BAK.
After modifying it, run "fc /b TCC.EXE TCC.BAK > fc.txt"
to compare the files in binary mode. This produces a text file
named "fc.txt", whose content is:
'Comparing files TCC.EXE and TCC.BAK
0001048F: 0F 8A
00010490: B6 07
00010491: 07 B4
...
000104AA: EB 00
000104AB: 1F 75
000104AC: 90 1E
000104D0: 6E 57
000106C0: 17 0A
000106C6: 0F 8A
000106C7: B6 07
000106C8: 07 98
000106C9: 3C EB
...
000106F2: C3 51
000106F3: 90 00
00011EF4: 8A F7
00011EF7: 3C 80
00011EF8: 80 00
...
00011F03: EB 8C
00011F04: 0F D9
00011F05: 90 3B
00011F06: 90 C3
00011F07: 90 73
00011F08: 90 0B
00011F1F: C3 D1
00011F20: 72 75
00011F21: E7 0F
00011F22: 90 3B
00011F23: 90 C3
0002634A: F6 EE
0002634C: F6 EE
...
00026420: F6 EE
00026422: F6 EE
00029367: 08 00
...
000293D2: 08 00
000293D3: 08 00'
  There are 140h lines above.
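
  Each line of the listing has the form "offset: byte byte". Because
the command compared TCC.EXE (patched) first and TCC.BAK (original)
second, the first byte is the new value and the second is the old one.
The Python sketch below shows one way such a listing could be parsed
and applied to an original image. It is a hypothetical illustration;
it is not how difc.exe is implemented, and the function names are
made up.

```python
import re

# "0001048F: 0F 8A" -> offset, byte in TCC.EXE (new), byte in TCC.BAK (old)
FC_LINE = re.compile(r"^([0-9A-Fa-f]{8}): ([0-9A-Fa-f]{2}) ([0-9A-Fa-f]{2})$")


def parse_fc(text):
    """Return a list of (offset, new_byte, old_byte); other lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = FC_LINE.match(line.strip().strip("'"))
        if m:
            entries.append(tuple(int(g, 16) for g in m.groups()))
    return entries


def apply_fc(old_image, fc_text):
    """Return a patched copy of old_image, checking each old byte first."""
    out = bytearray(old_image)
    for offset, new, old in parse_fc(fc_text):
        if out[offset] != old:
            raise ValueError("byte at %08X is %02X, expected %02X"
                             % (offset, out[offset], old))
        out[offset] = new
    return bytes(out)
```

  Checking the old byte before writing makes the sketch refuse a
file that is not the expected original.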

  Obviously we changed the file content
from byte 1048Fh to byte 104ACh: 0x1d bytes in fgetch_4();
from byte 106C6h to byte 106F3h: 0x2f bytes in fgetch_2();
from byte 11EF7h to byte 11F08h: 0x12 bytes in getch_in_macro();
from byte 2634Ah to byte 26422h: 0x6d bytes in switch_enum_table;
from byte 29367h to byte 293D3h: 0x6d bytes in attrib8.
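
  To match a file offset from fc.txt with a line of the listings in
section 3, subtract the file offset of the function start and add its
cs offset. A small Python sketch, using only the start addresses given
in 3.3~3.5 (the helper name is made up):

```python
# name: (file offset of function start, cs offset of function start)
FUNCS = {
    "fgetch_4": (0x10479, 0x01F9),
    "fgetch_2": (0x106B7, 0x0437),
    "getch_in_macro": (0x11EF1, 0x0661),
}


def to_cs_offset(name, file_offset):
    """cs offset inside the named function for a file offset from fc.txt."""
    start_file, start_cs = FUNCS[name]
    return start_cs + (file_offset - start_file)


print("%04X" % to_cs_offset("fgetch_4", 0x1048F))
print("%04X" % to_cs_offset("fgetch_2", 0x106C0))
print("%04X" % to_cs_offset("getch_in_macro", 0x11EF4))
```

  For example, file offset 1048Fh is cs:020F, the second byte of the
instruction at cs:020E that 3.3 replaces with movzx.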

--------- 5.afterword ---------------------------
  If you have tcc 1.0,
once you notice the bugs above, you can
1)send a message to borland.com...
  but Borland has turned into CodeGear,
  and CodeGear has been sold to Embarcadero & Micro Focus...
2)look for the author of that exe...
  but Anders Hejlsberg has moved to Microsoft.
3)rename your tcc.exe as tcc.bak,
  and type "difc.exe fc.txt" in the console.
  Of course, the name of the file (fc.txt) is not important.


Source: readme.txt, updated 2023-10-05