Extracting text from DjVu files

  • Marcin Werla

    Marcin Werla - 2005-04-21


    I use the following code to extract text from DjVu files:

                com.lizardtech.djvu.Document djvuDocument = new com.lizardtech.djvu.Document(
                DjVmDir dir = djvuDocument.getDjVmDir();
                if (dir.is_indirect()) {
                    //skipping index files
                    return null;
                Vector v = dir.get_files_list();
                for (Iterator iter = v.iterator(); iter.hasNext();) {
                    DjVmDir.File djvuFile = (DjVmDir.File) iter.next();
                    int pageNum = djvuFile.get_page_num();
                    if (pageNum >= 0) {
                        DjVuPage page = djvuDocument.getPage(pageNum);
                        Codec text = page.text;
                        if (text != null) {

    I find this code quite slow. Is there a way to determine whether a DjVu file contains text without trying to extract it?

    • Dr Bill C Riemers

      Instead of using DjVuPage, use:

      Codec text=DjVuText.createDjVuText(djvuDocument).init(pageNum);

      This will avoid the bzz decoding of non-text chunks.

      I hope this helps.


      • Marcin Werla

        Marcin Werla - 2005-04-22

        Thanks for the response. In version 0.8 of JavaDjVu, the DjVuText class has no init method that takes an int parameter :-(


    • Marcin Werla

      Marcin Werla - 2005-04-22

      OK, in the end I used:
      Codec text = DjVuText.createDjVuText(djvuDocument).init(djvuDocument.get_data(pageNum));

      It works quite fast. Thanks :-)


