SAKURA Editor / PatchUnicode / #57 (Character code conversion) Encode data that cannot be read as characters into U+DC00 through U+DCFF

R.T.H. - 2008-12-31

Attaching CJis.cpp, to which the patch does not apply

FixCodeIO20081231b__r1497_uni.zip

R.T.H. - 2008-12-31

The patch does not seem to apply to charset/CJis.cpp,
so I have attached that file.

File Added: FixCodeIO20081231b__r1497_uni.zip

R.T.H. - 2009-01-05

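Changes since the previous version:

To make items such as "EUC->SJIS code conversion",
found in the menu under
[Convert]->[Character code conversion],
work correctly,
I changed the following conversions
so that they are one-to-one.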

SJIS<->Unicode
EUC<->Unicode

File Added: ChgCodeConv20090105__r1948_uni.zip

R.T.H. - 2009-01-05

ChgCodeConv20090105__r1948_uni.zip

R.T.H. - 2009-01-08

Changes since the previous version:

CheckUtf8Char(), which is used by the CESU-8 character check CheckCesu8Char(),
did not accept CESU-8 codes in the surrogate range
when 4-byte codes were suppressed. This is fixed.
(charset/codechecker.cpp)

UnicodeToHex in CUtf8 did not support CESU-8.
It now takes a bool argument, bCESU8Mode,
so that it supports CESU-8.
(charset/CUtf8.h, charset/CUtf8.cpp)

The Unicode->XXX processing now checks the Unicode values.
(charset/CUtf8.cpp, charset/CShiftJis.cpp, charset/CEuc.cpp)
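
For readers unfamiliar with the difference between UTF-8 and CESU-8, here is a minimal sketch. It is not the code from the patch: encode_code_point, append3 and the cesu8 flag are hypothetical names, and the flag only plays a role similar to the bCESU8Mode argument mentioned above. In CESU-8, a code point above U+FFFF is written as two 3-byte sequences (one per UTF-16 surrogate) instead of one 4-byte sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not the patch's code): append the 3-byte form
// of a 16-bit value in the range 0x0800..0xFFFF.
inline void append3(std::string& out, std::uint32_t v) {
    out += static_cast<char>(0xE0 | (v >> 12));
    out += static_cast<char>(0x80 | ((v >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (v & 0x3F));
}

// Encode one code point. With cesu8 == true, code points above U+FFFF
// are written as a surrogate pair, each half in 3 bytes (6 bytes total),
// instead of a single 4-byte UTF-8 sequence.
inline std::string encode_code_point(std::uint32_t cp, bool cesu8) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        append3(out, cp);
    } else if (cesu8) {
        const std::uint32_t v = cp - 0x10000;
        append3(out, 0xD800 | (v >> 10));    // high surrogate
        append3(out, 0xDC00 | (v & 0x3FF));  // low surrogate
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For code points up to U+FFFF the two modes produce identical bytes, which is why a checker that rejects 4-byte sequences still has to accept 3-byte sequences in the surrogate range when it is validating CESU-8.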

----
charset/CJis.cpp
The patch does not apply to the file above,
so I have attached the modified source code.

----
The "r1948" in the file name ChgCodeConv20090105__r1948_uni.zip
was a mistake; it should have been r1498.
File Added: ChgCodeConv20090108__r1498_uni.zip

R.T.H. - 2009-01-08

ChgCodeConv20090108__r1498_uni.zip

R.T.H. - 2009-01-09

Changes since the previous version:

1. Fixed a missing break; inside a switch statement
in _EucjpToUni_char() in CEuc.cpp.
(CEuc.cpp)

2. Writing UTF-8 back out had stopped working, so
CheckUtf8Char() in codechecker.cpp
now treats reserved code points as invalid characters.
The same handling was added to CheckCesu8Char().
(codechecker.cpp)

3. MyMultiByteToWideChar() in codeutil.h
was applying the round-trip conversion to single-byte SJIS characters as well,
so the check did not work correctly. This is fixed.
(codeutil.h)

4. Split IsUnicodeResvdCP() in codechecker.cpp
into IsUnicodeResvdCP_normal and IsUnicodeResvdCP_surrog.
(codechecker.h, codechecker.cpp)

----
charset/CJis.cpp
The patch does not apply to the file above,
so I have attached the modified source code.

File Added: ChgCodeConv20090109__r1498_uni.zip

R.T.H. - 2009-01-09

ChgCodeConv20090109__r1498_uni.zip

R.T.H. - 2009-01-10

Changes since the previous version:

1. Inlined the following functions for speed.

IsUnicodeResvdCP_normal
IsUnicodeResvdCP_surrog
CShiftJis::_SjisToUni_char
CShiftJis::_UniToSjis_char
CEuc::_EucjpToUni_char
CEuc::_UniToEucjp_char
CUtf8::_Utf8ToUni_char
CUtf8::_UniToUtf8_char

File Added: ChgCodeConv20090110__r1498_uni.zip

R.T.H. - 2009-01-10

ChgCodeConv20090110__r1498_uni.zip

R.T.H. - 2009-01-12

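Changes since the previous version:

1. CheckUtf8Char and CheckCesu8Char in codechecker.cpp
now always return 1 when CHARSET_BINARY is detected.
(This matches the behavior of CheckEucjpChar, CheckSjisChar,
and so on.)

2. The write processing for SJIS, EUC, and UTF-8 (UniToXXX)
had a bug
(a side effect of adding the Unicode value check to the write path):
the sequence "\xffff" "XX" (as a regular expression, /\x{ffff}[0-9A-F]{2}/)
was written out as ?XX (as a regular expression, /\?[0-9A-F]{2}/).
This is fixed.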
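
To illustrate the "\xffff" "XX" representation described in item 2, here is a minimal sketch of how a writer could recognize the escape and restore the original byte. This is an assumption-laden illustration, not the patch's UniToXXX code: decode_ffff_escape and hex_value are hypothetical names, and the only thing taken from the comment is the pattern /\x{ffff}[0-9A-F]{2}/.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of an uppercase hex digit, or -1 if c is not one.
inline int hex_value(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// If src[pos..] starts with U+FFFF followed by two uppercase hex digits,
// return the escaped byte value (0..255). Otherwise return -1.
inline int decode_ffff_escape(const std::u16string& src, std::size_t pos) {
    if (src.size() < 3 || pos > src.size() - 3) return -1;  // not enough units
    if (src[pos] != 0xFFFF) return -1;
    const int hi = hex_value(src[pos + 1]);
    const int lo = hex_value(src[pos + 2]);
    if (hi < 0 || lo < 0) return -1;
    return hi * 16 + lo;
}
```

A writer that validates U+FFFF as an ordinary code point before looking for the escape would replace it with '?' and leave the two hex digits behind, which is the symptom reported above.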

File Added: ChgCodeConv20090111__r1498_uni.zip

R.T.H. - 2009-01-12

ChgCodeConv20090111__r1498_uni.zip

R.T.H. - 2009-01-26

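Replies posted on the bulletin board:

▼ 2009/1/24 (Sat) 22:21:32 ryoji
[788] Re:PatchUnicode#2478365no
▼ ラスティブ-san

(Character code conversion) Represent data that cannot be read as characters
as text    PatchUnicode#2478365

Um, sorry, I don't really understand what the benefit of applying this patch is. (^^;;;

▼ 2009/1/25 (Sun) 02:08:13 げんた
[789] Re2:PatchUnicode#2478365no
Is the goal to solve the problem that opening and then saving a file
containing unconvertible characters changes its contents?
Or do you want the binary parts to be editable as well?

If the point is to make it visible that byte sequences invalid for the character encoding
are present, then rather than representing them as "ffff" + hexadecimal,
I think a better approach is to map them into an unused part of Unicode,
treat them as codes distinct from ordinary characters, and handle them as special characters at display time.

I assume ffff was chosen because it is a code that cannot occur in Unicode,
but one option is to map onto 255 characters of the private use area and register a font for display
(can that be done dynamically? I don't really know how...).
If you avoid the private use area and use a region that is not used at all,
I suppose that means surrogate pairs. (Whether a font can handle that is even less clear to me...)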

R.T.H. - 2009-01-26

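Changes since the previous version:

1. Byte sequences that are invalid for the encoding (SJIS, EUC, and UTF8 only)
are no longer converted to "\xffff"XX. They are now mapped,
using lone surrogates, to U+D800 through U+D8FF.

2. (_CheckUtf16Char function in codechecker.cpp)
Fixed a bug where lone surrogates were not recognized
as CHARSET_BINARY.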
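
The mapping in item 1 can be sketched as a fixed offset into a 256-value block of surrogate code units. This is only an illustration under stated assumptions, not the patch's implementation: byte_to_code_unit, is_mapped_byte, code_unit_to_byte and kDefaultBase are hypothetical names. The base is left as a parameter because this comment names U+D800 while the ticket title names U+DC00.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default base of the 256-value block (taken from this comment).
constexpr std::uint16_t kDefaultBase = 0xD800;

// Map an undecodable byte to a lone surrogate code unit in base..base+0xFF.
inline std::uint16_t byte_to_code_unit(unsigned char b,
                                       std::uint16_t base = kDefaultBase) {
    return static_cast<std::uint16_t>(base + b);
}

// True if the code unit lies inside the 256-value block starting at base.
inline bool is_mapped_byte(std::uint16_t u, std::uint16_t base = kDefaultBase) {
    return u >= base && u <= base + 0xFF;
}

// Recover the original byte from a code unit inside the block.
inline unsigned char code_unit_to_byte(std::uint16_t u,
                                       std::uint16_t base = kDefaultBase) {
    assert(is_mapped_byte(u, base));
    return static_cast<unsigned char>(u - base);
}
```

Because the mapping is a plain offset, byte -> code unit -> byte is lossless, so a file containing invalid bytes can in principle be loaded and saved without its contents changing.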

File Added: ChgCodeConv20090126__r1520_uni.zip

R.T.H. - 2009-01-26

Changing the conversion scheme. A first version for now.

ChgCodeConv20090126__r1520_uni.zip

ryoji - 2009-01-27

Draw invalid characters as '〓'

Imp@FigureBinary_1_r1520.patch

ryoji - 2009-01-27

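This is an add-on patch for ChgCodeConv20090126__r1520_uni.zip.
It displays invalid characters (CHARSET_BINARY) contained in the internal code (UTF-16) as '〓'.

→Imp@FigureBinary_1_r1520.patch

# The color is the one configured for control codes
# Changing it from the default yellow to something like pink might be easier to see

Lone surrogates also count as invalid characters, so they should show up as '〓' too.
Is this the kind of thing you had in mind?

Also, about the _CheckUtf16Char function:
doesn't codechecker.cpp(531-534) look a little off?
It sets CHARSET_BINARY when IsUnicodeResvdCP_surrog() is true,
but then the very next line sets it back to CHARSET_UNI_SURROG.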
File Added: Imp@FigureBinary_1_r1520.patch

ryoji - 2009-01-28

Draw invalid characters as '〓' (2)

Imp@FigureBinary_2_r1520.patch

ryoji - 2009-01-28

I uploaded a patch that was still being edited by mistake. Sorry.
This is the corrected version.
File Added: Imp@FigureBinary_2_r1520.patch

R.T.H. - 2009-01-28

Tested with zip files.

ChgCodeConv20090128__r1520_uni.zip

R.T.H. - 2009-01-28

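Changes since the previous version:

1. CheckUtf16Char now always returns 1 in the CHARSET_BINARY case.
2. Fixed a bug in CheckUtf16Char where, after a character was judged to be CHARSET_BINARY,
the result was reset to CHARSET_UNI_NORMAL.
(Reported by ryoji. Thank you, as always m(_ _)m)
3. Fixed several memory leaks (oops).
4. CUnicode now runs CheckUtf16leChar and CheckUtf16beChar on input, and
byte sequences that are invalid for the encoding are mapped with BinToText
to U+D800 through U+D8FF.
5. CShiftJis, CJis, CEuc, and CUnicode now use
unsigned char and unsigned short internally as character types.
6. Added a round-trip conversion check for single-byte characters
to MyMultiByteToWideChar in codeutil.h.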
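
Items 1 and 2 can be pictured with the following sketch of a UTF-16 check. It is not the actual CheckUtf16Char: check_utf16_at, CheckResult and the Charset enum are hypothetical stand-ins modelled on the names used in this thread. Each branch returns its category directly, so a later assignment cannot overwrite a "binary" verdict, and a lone surrogate always has length 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical categories, modelled on CHARSET_UNI_NORMAL,
// CHARSET_UNI_SURROG and CHARSET_BINARY.
enum class Charset { UniNormal, UniSurrog, Binary };

struct CheckResult {
    Charset kind;        // category of the character at the given position
    std::size_t length;  // number of UTF-16 code units it occupies
};

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Classify the character that starts at s[pos]. Requires pos < s.size().
inline CheckResult check_utf16_at(const std::u16string& s, std::size_t pos) {
    assert(pos < s.size());
    const char16_t u = s[pos];
    if (is_high_surrogate(u)) {
        if (pos + 1 < s.size() && is_low_surrogate(s[pos + 1])) {
            return {Charset::UniSurrog, 2};  // well-formed surrogate pair
        }
        return {Charset::Binary, 1};         // high surrogate without a partner
    }
    if (is_low_surrogate(u)) {
        return {Charset::Binary, 1};         // low surrogate with no high one before it
    }
    return {Charset::UniNormal, 1};
}
```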

----
charset/CJis.cpp
The patch does not apply to the file above,
so I have attached the modified source code.

----
May I bundle the patch that ryoji posted together with this one?
File Added: ChgCodeConv20090128__r1520_uni.zip

ryoji - 2009-01-28

> May I bundle the patch that ryoji posted together with this one?
Ah, yes. Please bundle them. (^^)

R.T.H. - 2009-02-03

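Changes since the previous version:

1. Fixed a bug where, when a file with an odd number of bytes was opened as Unicode/UnicodeBe,
   the final byte was not kept. (io/CFileLoad.cpp, CEol.h, CEol.cpp)
2. Unicode reserved characters, which had been treated as invalid bytes,
   are now treated as ordinary Unicode characters.
   (Added the IsUnicodeNoncharacter function to charset/codechecker; added an argument to
   _CheckUtf16Char, CheckUtf8Char, and CheckUtf7SetB in charset/codechecker.
   For UTF-7 checking, the contents of BASE64-encoded parts used to be verified in charset/CESI;
   that processing has been moved to charset/codechecker.)
3. Fixed a possible buffer overrun in CUnicode::_UnicodeToUnicode_in().
   (For now I doubled the buffer size. I am not sure this really fixes it.)
   (This was reported on the Development U board, article No. 796. Thank you, ryoji.)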
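
For reference, the Unicode Standard defines 66 noncharacters: U+FDD0 through U+FDEF, plus the last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF). The sketch below tests for them. It assumes that this is what the name IsUnicodeNoncharacter refers to; is_noncharacter is a hypothetical function and is not the one added by the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in, not the patch's IsUnicodeNoncharacter:
// returns true for the 66 noncharacters defined by the Unicode Standard.
inline bool is_noncharacter(std::uint32_t cp) {
    if (cp > 0x10FFFF) return false;                // outside the Unicode range
    if (cp >= 0xFDD0 && cp <= 0xFDEF) return true;  // contiguous block of 32
    return (cp & 0xFFFE) == 0xFFFE;                 // U+nFFFE and U+nFFFF on each plane
}
```

Noncharacters are still valid code points that can be encoded in UTF-8 and UTF-16, which is consistent with the change in item 2 to treat such values as ordinary characters rather than as invalid bytes.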

----
・Included Imp@FigureBinary_2_r1520.patch by ryoji.
・The patch does not apply to charset/CJis.cpp,
　so I have attached the modified source code.

File Added: ChgCodeConv20090203__r1520_uni.zip

R.T.H. - 2009-02-03

Finished testing UTF-16LE/BE.

ChgCodeConv20090203__r1520_uni.zip

R.T.H. - 2009-02-03

In the post dated 2009-02-03 16:33,
readme.txt says CheckUtf7SetB where it should say
CheckUtf7BPart. Sorry.
