Getting error "org.pdfclown.util.NotImplementedException: LZWDecode" when highlighting words inside PDFs

Help
2017-05-22
  • Rohini Gate

    Rohini Gate - 2017-05-22

    Hi,

    We are using the code below to highlight a word in a PDF:

    public String highlight(String inputPath, String outputPath, String searchWord1, javax.servlet.http.HttpServletResponse res)
    {
    BufferedInputStream bis = null;
    BufferedOutputStream bos = null;
    final String searchWord= searchWord1;

            // 1. Open the PDF file!
            File file;
            try
            {
    
                file = new File(inputPath);
            }
            catch(Exception e)
            {
                throw new RuntimeException(inputPath + " file access error.",e);
            }
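            // count has to be a field of the enclosing class, because the
            // anonymous filter below increments it.
            count = 0;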
            Pattern pattern = Pattern.compile(searchWord, Pattern.CASE_INSENSITIVE);
    
            // 2. Iterating through the document pages...
            TextExtractor textExtractor = new TextExtractor(true, true);
    
            for(final Page page : file.getDocument().getPages())
            {
    
                // 2.1. Extract the page text!
                Map<Rectangle2D,List<ITextString>> textStrings = textExtractor.extract(page);
    
                // 2.2. Find the text pattern matches!
                final Matcher matcher = pattern.matcher(TextExtractor.toString(textStrings));
                // 2.3. Highlight the text pattern matches!
    
                textExtractor.filter(textStrings,
                        new TextExtractor.IIntervalFilter()
                {
                    public boolean hasNext()
                    {                   
                        if (matcher.find()) {
                            count++;
                            return true;
                        }
                        return false;
                    }
    
                    public Interval<Integer> next()
                    {
                        return new Interval<Integer>(matcher.start(), matcher.end());
                    }
    
                    public void process(Interval<Integer> interval, ITextString match)
                    {
                        // Defining the highlight box of the text pattern match...
                        List<Quad> highlightQuads = new ArrayList<Quad>();
                        {
                            Rectangle2D textBox = null;
                            for(TextChar textChar : match.getTextChars())
                            {
                                Rectangle2D textCharBox = textChar.getBox();
                                if(textBox == null)
                                {textBox = (Rectangle2D)textCharBox.clone();}
                                else
                                {
                                    if(textCharBox.getY() > textBox.getMaxY())
                                    {
                                        highlightQuads.add(Quad.get(textBox));
                                        textBox = (Rectangle2D)textCharBox.clone();
                                    }
                                    else
                                    {textBox.add(textCharBox);}
                                }
                            }
                            textBox.setRect(textBox.getX(), textBox.getY(), textBox.getWidth(), textBox.getHeight()+5);
                            highlightQuads.add(Quad.get(textBox));
                        }
    
                        // Highlight the match and color the annotation dark blue.
                        TextMarkup temp = new TextMarkup(page, searchWord, MarkupTypeEnum.Highlight, highlightQuads);
                        temp.setColor(new DeviceRGBColor(35.0/255.0, 35.0/255.0, 142.0/255.0));
                        temp.setVisible(true);
                    }
    
                    public void remove()
                    {throw new UnsupportedOperationException();}
                }
                        );
            }
    
            SerializationModeEnum serializationMode = SerializationModeEnum.Incremental;
            ServletOutputStream out = null;
            try
            {
    
                file.save(new java.io.File(outputPath), serializationMode);
                //ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
                file.close();
    
            }
    
    However, we are getting the error below for some PDFs:
    
    org.pdfclown.util.NotImplementedException: LZWDecode
    org.pdfclown.bytes.filters.Filter.get(Filter.java:74)
    org.pdfclown.objects.PdfStream.getBody(PdfStream.java:193)
    org.pdfclown.objects.PdfStream.getBody(PdfStream.java:155)
    org.pdfclown.documents.contents.Contents$ContentStream.moveNextStream(Contents.java:279)
    org.pdfclown.documents.contents.Contents$ContentStream.<init>(Contents.java:86)
    org.pdfclown.documents.contents.Contents.load(Contents.java:591)
    org.pdfclown.documents.contents.Contents.<init>(Contents.java:366)
    org.pdfclown.documents.contents.Contents.wrap(Contents.java:345)
    org.pdfclown.documents.Page.getContents(Page.java:571)
    org.pdfclown.documents.contents.ContentScanner.<init>(ContentScanner.java:1033)
    org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:297)
    com.att.bcpr.actions.MyPdfHighlighting.highlight(MyPdfHighlighting.java:92)
    com.att.bcpr.actions.MyPdfHighlighting.doGet(MyPdfHighlighting.java:59)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:620)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    org.apache.struts2.dispatcher.ng.filter.StrutsPrepareAndExecuteFilter.doFilter(StrutsPrepareAndExecuteFilter.java:86)
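
The top frame of the trace shows `Filter.get` rejecting the `LZWDecode` stream filter, so the affected PDFs appear to have LZW-compressed content streams. For anyone unfamiliar with that filter, below is a minimal, self-contained sketch of LZW decoding with the PDF default settings (9 to 12 bit codes packed most significant bit first, clear-table code 256, end-of-data code 257, EarlyChange 1). It is only an illustration: the class and method names are hypothetical, it is not part of PDF Clown's API, and it ignores predictor parameters.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a PDF Clown class.
final class LzwDecodeSketch {
    private static final int CLEAR_TABLE = 256;
    private static final int END_OF_DATA = 257;
    private static final int MAX_TABLE_SIZE = 4096;

    static byte[] decode(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<byte[]> table = newTable();
        int codeSize = 9;
        int bitBuffer = 0;
        int bitCount = 0;
        int pos = 0;
        byte[] previous = null;

        while (true) {
            // Refill the bit buffer until it holds at least one full code.
            while (bitCount < codeSize) {
                if (pos >= input.length) {
                    return out.toByteArray(); // input ended without an end-of-data code
                }
                bitBuffer = (bitBuffer << 8) | (input[pos++] & 0xFF);
                bitCount += 8;
            }
            bitCount -= codeSize;
            int code = (bitBuffer >>> bitCount) & ((1 << codeSize) - 1);

            if (code == END_OF_DATA) {
                return out.toByteArray();
            }
            if (code == CLEAR_TABLE) {
                table = newTable();
                codeSize = 9;
                previous = null;
                continue;
            }

            byte[] entry;
            if (code < table.size()) {
                entry = table.get(code);
            } else if (code == table.size() && previous != null) {
                // The code refers to the entry that is about to be created.
                entry = append(previous, previous[0]);
            } else {
                throw new IllegalArgumentException("Invalid LZW code: " + code);
            }
            out.write(entry, 0, entry.length);

            if (previous != null && table.size() < MAX_TABLE_SIZE) {
                table.add(append(previous, entry[0]));
            }
            previous = entry;

            // EarlyChange 1: widen the code one entry before the table fills up.
            int size = table.size();
            codeSize = size >= 2047 ? 12 : size >= 1023 ? 11 : size >= 511 ? 10 : 9;
        }
    }

    // Codes 0..255 map to single bytes; 256 and 257 are reserved placeholders.
    private static List<byte[]> newTable() {
        List<byte[]> table = new ArrayList<>(MAX_TABLE_SIZE);
        for (int i = 0; i < 256; i++) {
            table.add(new byte[] {(byte) i});
        }
        table.add(new byte[0]);
        table.add(new byte[0]);
        return table;
    }

    private static byte[] append(byte[] prefix, byte last) {
        byte[] result = Arrays.copyOf(prefix, prefix.length + 1);
        result[prefix.length] = last;
        return result;
    }
}
```

With this sketch, the nine bytes `80 0B 60 50 22 0C 0C 85 01` (the code sequence 256, 45, 258, 258, 65, 259, 66, 257) decode to `45 45 45 45 45 65 45 45 45 66`.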
    
    Tracing through the log, we found that the error comes from the line below:
    
    Map<Rectangle2D,List<ITextString>> textStrings = textExtractor.extract(page);
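
Since the exception only surfaces once `extract(page)` is called, one possible stopgap is to spot affected files before handing them to the highlighter. The sketch below is a rough heuristic, assuming the filter name appears as plain ASCII in the file: it scans the raw bytes for the name `/LZWDecode`. The class and method names are hypothetical. The scan can miss filters declared through compressed objects and can give false positives if the same bytes occur inside binary data, so treat a hit as a hint rather than proof.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-check, independent of PDF Clown.
final class LzwFilterCheck {
    private static final byte[] NAME = "/LZWDecode".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the raw bytes contain the ASCII name "/LZWDecode".
    static boolean mentionsLzwDecode(byte[] pdfBytes) {
        outer:
        for (int i = 0; i + NAME.length <= pdfBytes.length; i++) {
            for (int j = 0; j < NAME.length; j++) {
                if (pdfBytes[i + j] != NAME[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

It could be called with `java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(inputPath))` before opening the file, so that files with a hit are skipped or reported instead of failing part-way through the page loop.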
    
    Please help me with this.
    
    Thanks
    
     
    • Rohini Gate

      Rohini Gate - 2017-05-22

      Please guide me on this

       
