Patch to enable CJK charset support
Brought to you by: benlitchfield
This patch applies to version 0.7.2.
It enables support for CJK character sets.
However, two encodings, Identity-H and Identity-V, still cannot be handled.
We have created an ad-hoc solution for this problem that needs very few changes to the source code, but this patch is much cleaner.
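Under the PDF specification, Identity-H maps each two-byte character code, high-order byte first, to the CID with the same value, and Identity-V is the same mapping for vertical writing. Because these CMaps are not tied to any character set, a charset table alone cannot turn such strings into text; a font's ToUnicode CMap is usually needed for that. The minimal Java sketch below covers only the first step, splitting an Identity-H/V string into CIDs; the class `IdentityCidDecoder` and its method are hypothetical, not part of PDFBox or of this patch.

```java
import java.util.Arrays;

/** Hypothetical helper (not PDFBox API): splits an Identity-H/Identity-V string into CIDs. */
final class IdentityCidDecoder {

    private IdentityCidDecoder() {}

    /**
     * Under Identity-H and Identity-V every character code is two bytes,
     * high-order byte first, and the CID equals that 16-bit code.
     */
    static int[] toCids(byte[] codes) {
        if (codes.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Identity-H/V codes are 2 bytes each; got odd length " + codes.length);
        }
        int[] cids = new int[codes.length / 2];
        for (int i = 0; i < cids.length; i++) {
            int hi = codes[2 * i] & 0xFF;     // mask away Java's sign extension
            int lo = codes[2 * i + 1] & 0xFF;
            cids[i] = (hi << 8) | lo;
        }
        return cids;
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x03, 0x12, 0x34};
        System.out.println(Arrays.toString(toCids(raw))); // prints [3, 4660]
    }
}
```

Turning those CIDs into Unicode is the font-specific part: it would come from the font's ToUnicode CMap or, for a standard ordering such as Adobe-Japan1, from Adobe's published CMap resources for that ordering.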
0.7.2 CJK charset support.
Sample file encoded in Identity-H
Logged In: YES
user_id=1147498
Originator: YES
PS: The patch was created with diff under Cygwin.
To apply it, run
patch -p1 < pdfbox-0.7.2-cjk.patch
from the root of the 0.7.2 source tree.
File Added: Preliminary_Application_Form.pdf
Logged In: NO
Hi, I'm using the patch, but Chinese text doesn't display.
Logged In: YES
user_id=1768927
Originator: NO
Thanks.
Logged In: YES
user_id=1586822
Originator: NO
Hi, I'm trying the patched pdfbox-0.7.2, but it doesn't display Japanese PDFs encoded with Identity-H correctly either. I'm really looking forward to a new PDFBox release that supports Identity-H and Identity-V.
Sample Japanese PDF document encoded with Identity-H:
http://www.bridgestone.co.jp/info/library/csr_report/pdf/2006/BS07-08.pdf
Logged In: NO
Good!
Hello Pin,
PDFBox is now maintained as an Apache project.
Unfortunately, some of your patches do not apply to the latest Apache PDFBox source code.
I have submitted your patches, together with my own changes, to the community.
The first official release of Apache PDFBox is planned soon.
I would like PDFBox to support CJK.
To include your code, I need you to agree to contribute your changes to the Apache community (from a copyright perspective).
Please let me know whether or not you agree.
Regards,
Takashi.
Hello Pin,
Please take a look at this page:
https://issues.apache.org/jira/browse/PDFBOX-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672967#action_12672967
My address is oohack@time-trend.com.
How do I use the patch?
I applied the patch to the PDFBox 0.7.2 sources, but I don't know whether execution actually goes through the CJKEncodings class.
CJKEncodings contains mappings for Simplified Chinese, such as "_mapping.put("GB-EUC-H","GB2312");", and for Traditional Chinese, such as "_mapping.put("B5pc-H", "BIG5");", but I don't know whether the CJKEncodings class is actually used when PDFBox runs. If someone knows, could you explain how it works?
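A class like CJKEncodings, as described in the question, amounts to a lookup from a predefined CMap name to a Java charset name that is then used to decode the raw string bytes. The Java sketch below shows that idea with only the two entries quoted above; the class `CmapCharsetLookup` and its methods are hypothetical and are not the patch's actual CJKEncodings code. Decoding GB-EUC-H or B5pc-H bytes with the matching charset is an approximation suited to text extraction, not a full CMap implementation.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a CMap-name-to-charset table, in the spirit of CJKEncodings. */
final class CmapCharsetLookup {

    private static final Map<String, String> MAPPING = new HashMap<>();

    static {
        MAPPING.put("GB-EUC-H", "GB2312"); // Simplified Chinese, as quoted in the question
        MAPPING.put("B5pc-H", "BIG5");     // Traditional Chinese, as quoted in the question
    }

    private CmapCharsetLookup() {}

    /** Returns the Java charset name for a CMap name, or empty if the CMap is not in the table. */
    static Optional<String> charsetFor(String cmapName) {
        return Optional.ofNullable(MAPPING.get(cmapName));
    }

    /** Decodes raw string bytes with the mapped charset, if the CMap is known and the JVM supports it. */
    static Optional<String> decode(String cmapName, byte[] bytes) {
        return charsetFor(cmapName)
                .filter(Charset::isSupported)
                .map(name -> new String(bytes, Charset.forName(name)));
    }
}
```

If `charsetFor` returns empty for the CMap a document actually uses (Identity-H, for example), a table of this kind cannot help, which would be consistent with the symptoms reported above. A breakpoint or temporary log line in the real lookup method is a quick way to confirm whether CJKEncodings is reached at all.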
Why doesn't PDFBox 0.7.3 support Chinese? The error is: "Unknown encoding for 'UniGB-UCS2-H'"
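The CMap name in that error describes its own encoding: UniGB-UCS2-H is one of Adobe's predefined CMaps for the Adobe-GB1 (Simplified Chinese) character collection, in which each character code is a UCS-2 value, i.e. a two-byte big-endian Unicode code point, for horizontal writing. A lookup table that lacks an entry for it would fail in exactly this way. For text extraction, such codes can be read as UTF-16BE. The Java sketch below assumes that; the class `Ucs2CmapDecoder` is hypothetical, not PDFBox code, and its name check relies on the "-UCS2-" naming convention of Adobe's CMaps.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: decoding text shown with a UCS-2 based predefined CMap such as UniGB-UCS2-H. */
final class Ucs2CmapDecoder {

    private Ucs2CmapDecoder() {}

    /** Naming-convention check (an assumption): matches e.g. UniGB-UCS2-H, UniGB-UCS2-V, UniJIS-UCS2-H. */
    static boolean isUcs2Cmap(String cmapName) {
        return cmapName.startsWith("Uni") && cmapName.contains("-UCS2-");
    }

    /** Codes under a UCS-2 CMap are 2-byte big-endian Unicode values, so UTF-16BE reads them directly. */
    static String decode(String cmapName, byte[] codes) {
        if (!isUcs2Cmap(cmapName)) {
            throw new IllegalArgumentException("Not a UCS-2 CMap: " + cmapName);
        }
        return new String(codes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] codes = {0x4E, 0x2D, 0x65, (byte) 0x87}; // U+4E2D U+6587
        System.out.println(decode("UniGB-UCS2-H", codes));
    }
}
```

With these bytes, `decode` returns the two characters U+4E2D and U+6587 ("中文").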