
#41 vietnamese unicode space character error

1.0
wont-fix
nobody
None
2023-09-05
2022-05-04
No

I generated a PDF, but the font is rendered incorrectly.

1 Attachments

Related

Tickets: #41
Tickets: #43

Discussion

  • zyx

    zyx - 2022-05-04

    Do you have code that generates the page, please? A minimal test case is welcome. I also do not know what to look for in the PDF, so if something is wrong, I have no idea what it is. I'd therefore also need a picture of the expected result, to be able to compare it with the current result.

    Also, which exact version of litePDF do you use, please? If it is the latest, 2.0.4, could you try 2.0.3.0, please? There was a change on the PoDoFo side that could cause issues with Unicode letters which the previous version handled without trouble.

     
  • Hoang Van Hay

    Hoang Van Hay - 2022-05-05

    Hello @mc-zyx
    I modified the hello world example and tested it with litePDF versions 2.0.4 and 2.0.3.0, but the result is not as expected.
    The modified source code is in the helloworld.rar file.

     

    Last edit: Hoang Van Hay 2022-05-05
  • zyx

    zyx - 2022-05-05

    Thanks for the update. If it's broken the same way on both versions, then it's something other than what I thought. Could you repack the files with something more standard, like a zip, please? I have a hard time unpacking it (when I unpack it, all files but the "expected" PDF claim a size of several terabytes, which is clearly nonsense).

    By the way, there is no need to mention my nick; I receive mail notifications without it too, just as you received my comment.

     
  • Hoang Van Hay

    Hoang Van Hay - 2022-05-05

    Thank you for the reply.
    I only modified the helloworld.cpp file in your examples folder and ran the test,
    built with the option Character Set="Use Unicode Character Set".
    """

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>
    #include <tchar.h>
    #include <string> // std::wstring

    #include "share/litePDF.h"

    int main(void)
    {
    int res = 0;

    using namespace litePDF;

    try {
    TLitePDF litePDF;

      // begin write-only PDF file
      litePDF.CreateFileDocument("helloworld-1.pdf");
    
      // add a new page to it, with large-enough pixel scale
      HDC hDC = litePDF.AddPage(litePDF.MMToUnit(210), litePDF.MMToUnit(297), 2100, 2970, LitePDFDrawFlag_None);
    
      // draw the text
      LOGFONT lf = {0, };
      lf.lfHeight = -52; // ~1/10 of the page height
      _tcscpy(lf.lfFaceName, _T("Times New Roman"));
    
      HFONT fnt;
      HGDIOBJ oldFnt;
    
      fnt = CreateFontIndirect(&lf);
      oldFnt = SelectObject(hDC, fnt);
    
      SetTextColor(hDC, RGB(128, 0, 0));
    
      std::wstring szText = L"Kỹ thuật: Chụp MRI cột sống thắt lưng với các chuỗi xung T1W, T2W, STIR theo mặt phẳng sagital, T2W theo mặt phẳng axial, không tiêm thuốc đối quang từ. \r\n\
        - Đường cong sinh lý cột sống thắt lưng bình thường \r\n\
        - Các đĩa đệm có chiều cao và tín hiệu bình thường, không thấy thoát vị hay phồng đĩa đệm. \r\n\
        - Chóp tủy ở ngang mức D12-L1, hình thái và tín hiệu bình thường. Không thấy khối choán chỗ trong tuỷ hay ống sống. \r\n\
        - Các thân đốt sống có chiều cao và tín hiệu đồng đều, không thấy xẹp hay trượt đốt sống. Không thấy hình gai xương hay cầu xương. \r\n\
        - Các khớp mấu sau và cung sau đốt sống không thấy bất thường. \r\n\
        - Phần mềm quanh cột sống không thấy khối khu trú.";
    
      RECT rc;
      rc.left = 50;
      rc.top = 50;
      rc.right = rc.left+2100;
      rc.bottom = rc.top+2970;
    
      DrawText(hDC, szText.c_str(), szText.length(), &rc, DT_LEFT|DT_WORDBREAK);
    
      SelectObject(hDC, oldFnt);
      DeleteObject(fnt);
    
      // finish drawing
      litePDF.FinishPage(hDC);
    
      // close the document
      litePDF.Close();
    

    } catch (TLitePDFException &ex) {
    fprintf (stderr, "litePDF Exception: %x: %s\n", ex.getCode(), ex.getMessage());
    res = 1;
    }

    return res;
    }

    """

     
  • zyx

    zyx - 2022-05-05

    Thanks for the quick update. It looks like the font has been saved with incorrect glyph widths, causing the overlap. I'll check it sometime soon, but I can't promise when. I'm sorry.

     
  • zyx

    zyx - 2022-07-08

    I've been busy with other things, so I haven't had time to look at this yet. It is still on my to-do list, together with the other open litePDF tickets. I might get to it in the following days or weeks, maybe by the end of this month, unless something else comes up.

     
  • zyx

    zyx - 2022-07-09

    It helped me to also set the correct lfCharSet on the LOGFONT structure, so the code (and GDI) knows what to use. I tried lf.lfCharSet = CHINESEBIG5_CHARSET; and it produced better output (I do not know whether it's the right character set for your text, I'm sorry).

    Could you give it a try, please?

     
  • zyx

    zyx - 2022-07-28

    The changes.patch shows the changes I use. Note the LitePDFDrawFlag_EmbedFontsSubset flag: I get similar output without it, but it makes sure the receiving side shows the same thing as you.

     
    • zyx

      zyx - 2022-07-28

      (I'm sorry, the browser doesn't let me add more attachments to the comment.)

      It generates this PDF, which looks better than yours, but not like the expected one.

       
      • zyx

        zyx - 2022-07-28

        I shortened the text and let it draw only the first few words. The page.zip contains these files:

        • page.emf - to see what had been sent
        • page.log - decoded .emf content as the litePDF receives it
        • page-abcde.log - a log for a page that draws only the "ABC DE" text, i.e. all ASCII.

        Looking into page.log, GDI decided to split the text at the Unicode characters, sometimes those with accents, dots below, and so on.

        I do not know these low-level font things, but it seems to me the letters are composed of a base letter plus some addition (accents, ...). litePDF doesn't handle these things; if the font itself doesn't contain the specified letter, it draws what it can. In fact, it's left to the font itself.

        Looking closely at the shortened output (page.pdf), I think my guess is correct. I see the "underline dots" shifted by one letter, which can be due to the string containing a letter and the dot as two consecutive characters.
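
One way to check this guess is to dump the code units of the string before it is drawn. The following is only a minimal sketch: `dumpCodeUnits` is a hypothetical helper, not part of litePDF or PoDoFo. A letter stored as a base letter plus a combining dot shows up as two values, while a precomposed letter shows up as one.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper (not a litePDF function): formats every
// code unit of a wide string as U+XXXX, separated by spaces, so that
// combining marks stored as separate characters become visible.
std::string dumpCodeUnits(const std::wstring &text)
{
    std::string out;
    char buf[16];

    for (std::size_t i = 0; i < text.size(); ++i) {
        std::snprintf(buf, sizeof(buf), "U+%04lX",
                      static_cast<unsigned long>(text[i]));
        if (i > 0)
            out += ' ';
        out += buf;
    }

    return out;
}
```

For instance, `dumpCodeUnits(L"o\u0323")` lists two code units (the letter and U+0323, combining dot below), whereas the precomposed `L"\u1ecd"` lists only one.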

         
  • zyx

    zyx - 2022-07-28

    In terms of the Unicode standard, your text uses "Combining Diacritical Marks". Judging from the output, a lot of typography work is needed to place each diacritical mark in the right place. This is rather out of scope for litePDF, I'm sorry. Patches are welcome, but I do not plan to make the code more complex in this regard myself.

    For the record, there are three ranges for the combining marks:

    0x0300 - 0x036f    combining diacritical marks
    0x20d0 - 0x20ff    combining marks for symbols
    0xfe20 - 0xfe2f    combining half marks
    

    Each of them needs special processing in meta2pdf.cpp:drawText().

    A naive pseudo-solution:

          PdfString combiningStr;
          DWORD nChars = emrtext.nChars;
          if (nChars > 1 && (
              (wstr[nChars - 1] >= 0x0300 && wstr[nChars - 1] <= 0x036f) ||  // combining diacritical marks
              (wstr[nChars - 1] >= 0x20d0 && wstr[nChars - 1] <= 0x20ff) ||  // combining marks for symbols
              (wstr[nChars - 1] >= 0xfe20 && wstr[nChars - 1] <= 0xfe2f))) { // combining half marks
             combiningStr.setFromWchar_t ((const wchar_t *) (wstr + nChars - 1), 1);
             nChars--;
          }
          PdfString str;
          str.setFromWchar_t ((const wchar_t *) wstr, nChars);
          ....
          ....
          } else {
             painter->DrawText(pdfX, pdfY, str);
          }
          if (combiningStr.GetLength() > 0) {
             painter->DrawText(pdfX, pdfY, combiningStr);
          }
          painter->GetFont()->SetFontCharSpace(100.0);
    

    shows that it's not enough.
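
The pseudo-solution above only looks at the last character of the text run. As a slightly more general sketch, and still nothing that solves the placement problem, the range check can be pulled into a helper and used to group each base character with the combining marks that follow it. Both `isCombiningMark` and `splitClusters` are hypothetical names for illustration; they are not part of litePDF or PoDoFo.

```cpp
#include <string>
#include <vector>

// True when the code unit lies in one of the three combining ranges
// listed above (all of them are in the Basic Multilingual Plane).
bool isCombiningMark(wchar_t ch)
{
    const unsigned long c = static_cast<unsigned long>(ch);

    return (c >= 0x0300 && c <= 0x036f) ||  // combining diacritical marks
           (c >= 0x20d0 && c <= 0x20ff) ||  // combining marks for symbols
           (c >= 0xfe20 && c <= 0xfe2f);    // combining half marks
}

// Hypothetical helper: splits the text into clusters, each holding one
// base character followed by the combining marks attached to it.
// A combining mark at the very start forms a cluster of its own.
std::vector<std::wstring> splitClusters(const std::wstring &text)
{
    std::vector<std::wstring> clusters;

    for (wchar_t ch : text) {
        if (clusters.empty() || !isCombiningMark(ch))
            clusters.push_back(std::wstring(1, ch));
        else
            clusters.back() += ch;
    }

    return clusters;
}
```

With the clusters known, the drawing code could at least treat a base letter and its marks as one unit. Positioning each mark correctly over or under its base letter would still need glyph metrics from the font, which is the complex part mentioned above.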

     
  • zyx

    zyx - 2022-07-28
    • status: open --> wont-fix
     
  • zyx

    zyx - 2022-08-03

    "I hope this problem can be solved soon"

    I wouldn't have much hope; this is a complex task. Maybe if some volunteer steps in, who knows.

     
    • Hoang Van Hay

      Hoang Van Hay - 2023-09-05

      Hello,
      I would like to ask: is there any way to remove a signature that has
      already been signed in a PDF file, by signature name or signatureIndex,
      and reset it to an empty signature?

       
      • zyx

        zyx - 2023-09-05

        Ehm, this does not have much to do with this ticket, right? It would just be bad for people reading closed tickets to see two very different things in a single ticket.

        Nonetheless, the litePDF API has no function to edit existing signatures. The only way to do it is to use PoDoFo functions directly. If I recall correctly, signatures in a PDF must meet multiple conditions to be recognized, so even if you remove the corresponding Annotation (signatures are special annotations), you'd need to ensure the other conditions are void as well. I vaguely recall there is a flag set somewhere that tells Adobe Reader to check for signatures in the document, or something like that, but it was so long ago that I do not recall any specific detail, I'm sorry.

         
