
#35 DJVU Support

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2022-11-08
Created: 2010-03-05
Private: No

DjVu (pronounced "déjà vu") is an image compression technology. DjVu is an open standard. The file format specification, as well as open-source implementations of the decoder (and part of the encoder), are available.

Discussion

  • Nam-Quang Tran

    Nam-Quang Tran - 2010-03-08

    The main problem is finding a decent DjVu library for Java. I'll see what I can do about it :-)

     
  • Iwan Mouwen

    Iwan Mouwen - 2012-03-05

    Are there plans to add support for indexing DjVu files?
    If a good Java library cannot be found, would it be possible to use external programs for text extraction? For DjVu, the djvutxt(.exe) program can be used, which prints the text to stdout.
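
The external-program idea can be sketched in a few lines. This is only an illustration in Python, not DocFetcher code: the function name `extract_text` and its parameters are made up for the example, and it assumes the `djvutxt` tool from DjVuLibre is installed and on the PATH.

```python
import subprocess

def extract_text(command, encoding="utf-8", timeout=60):
    """Run an external extractor and return whatever it printed to stdout.

    `command` is an argument list such as ["djvutxt", "book.djvu"].
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # give up if the tool hangs on a damaged file
        check=True,       # raise CalledProcessError on a non-zero exit code
    )
    # Decode by hand so that undecodable bytes are replaced, not fatal.
    return result.stdout.decode(encoding, errors="replace")

# Hypothetical usage, assuming djvutxt is on PATH and book.djvu exists:
#     text = extract_text(["djvutxt", "book.djvu"])
```

Even this small sketch already has to deal with timeouts, exit codes and output encoding, which hints at the extra complexity an indexer would take on.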

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2012-03-05

    Sorry, there are currently no such plans, and the main reason is still the lack of DjVu Java libraries. Falling back on external programs is - at least for now - not an option either, as this would add a significant layer of complexity to the program and lead to all kinds of problems. (It may look simple on the outside, but internally it certainly isn't.)

     
  • Ji Ling

    Ji Ling - 2018-11-16

    I have tons of books in djvu, but DocFetcher turned into a pumpkin on them. The problem is still relevant.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2018-11-16

    @ Ji Ling: As far as I can tell, there are no open-source DjVu libraries for Java out there due to patent issues. As long as those issues remain unresolved, DocFetcher cannot have DjVu support. Therefore, you'll have to either convert your DjVu files to other formats such as PDF, or find another program with DjVu support.

     
  • Sergey Chelnokov

    Right now I use batch files to generate .meta files containing the text from djvu/mcdx (Mathcad) documents. For djvu I use DjvuLibre\djvutxt.exe; for Mathcad, 7-zip. I also have a special batch file associated with .meta that finds and opens the original document. It's usable, but neither easy nor efficient.

    Is it possible to configure a user-defined document-to-text converter (for djvu, Mathcad, etc.)?
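
A workaround of this kind mostly comes down to deciding which documents need a fresh text file. The Python sketch below illustrates only that bookkeeping step. The naming scheme (`book.djvu` becomes `book.djvu.meta`) and all function names are assumptions for the example, not the poster's actual batch files.

```python
import os

def sidecar_path(source):
    """Text file stored next to the document, e.g. book.djvu -> book.djvu.meta."""
    return source + ".meta"

def needs_update(source, sidecar):
    """True if the sidecar is missing or older than the document."""
    if not os.path.exists(sidecar):
        return True
    return os.path.getmtime(sidecar) < os.path.getmtime(source)

def pending_documents(root, extensions=(".djvu", ".djv")):
    """Yield documents under root whose sidecar must be (re)generated."""
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(extensions):
                source = os.path.join(folder, name)
                if needs_update(source, sidecar_path(source)):
                    yield source
```

Each yielded path would then be passed to the external extractor, with the output written to `sidecar_path(path)` so that the indexer picks up the plain-text file.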

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2022-11-02

    I'm not quite sure what you're talking about. But if you want to integrate some custom DjVu processing into DocFetcher, the only way is to modify the source code and build your own version of DocFetcher.

     
  • Sergey Chelnokov

    No, I don't want a version that supports djvu. I want a version that supports external user-defined converters (which convert a document to plain text, so that DocFetcher can use their stdout for indexing).
    I.e. the user just sets up a table:

    Filter          Command                                          Stdout encoding
    *.djvu;*.djv    c:\tooling\convert_djvu_to_txt.bat {file}        cp1251
    *.mcdx          c:\tooling\convert_mathcad_to_txt.bat {file}     utf-8

    Maybe it would be clearer to convert documents via an intermediate file:

    Filter          Command
    *.djvu;*.djv    "C:\Program Files (x86)\DjvuLibre\djvutxt" {file} {temp_text_file}
    *.mcdx          c:\tooling\convert_mathcad_to_text_file.bat {file} {temp_text_file}
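
To make the proposal concrete, here is a rough Python sketch of how such a filter/command table could be looked up and expanded. It is purely illustrative: DocFetcher has no such feature, and the `RULES` structure and the names `find_rule` and `build_command` are hypothetical. Commands are kept as argument lists rather than single strings so that paths with spaces need no shell quoting.

```python
import fnmatch
import os

# Hypothetical rule table mirroring the proposal above:
# (semicolon-separated filters, command template, stdout encoding)
RULES = [
    ("*.djvu;*.djv", [r"c:\tooling\convert_djvu_to_txt.bat", "{file}"], "cp1251"),
    ("*.mcdx", [r"c:\tooling\convert_mathcad_to_txt.bat", "{file}"], "utf-8"),
]

def find_rule(path, rules=RULES):
    """Return (command_template, encoding) of the first matching rule, or None."""
    name = os.path.basename(path).lower()  # match case-insensitively
    for patterns, command, encoding in rules:
        for pattern in patterns.split(";"):
            if fnmatch.fnmatchcase(name, pattern.strip().lower()):
                return command, encoding
    return None

def build_command(template, path):
    """Substitute the {file} placeholder in every argument of the template."""
    # str.replace is used instead of str.format so that braces
    # occurring in a file name cannot break the substitution.
    return [arg.replace("{file}", path) for arg in template]
```

The resulting argument list and encoding could then be handed to whatever code launches the converter and decodes its stdout.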
     

    Last edit: Sergey Chelnokov 2022-11-03
  • Sergey Chelnokov

    And then the user could at least run OCR on *.png files if they want (as one example of what a user could add to the filter/command table).

     

    Last edit: Sergey Chelnokov 2022-11-03
  • Nam-Quang Tran

    Nam-Quang Tran - 2022-11-03

    This support for external parsers might come to DocFetcher Pro someday (probably in the far future...). DocFetcher on the other hand is no longer being developed and will only receive bugfixes, not new features.

     
  • Nam-Quang Tran

    Nam-Quang Tran - 2022-11-03

    OCR support integrated into DocFetcher Pro is also under consideration.

     
  • Sergey Chelnokov

    It's annoying, but okay. I'll keep using text file generators and running DocFetcher on their output.

     
