
#196 Unknown FirstFileContentDateNormalized

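Milestone: v8.5
Status: open
Owner: nobody
Labels: PDF OCR FirstFileContentDateNormalized Associations
Priority: 1
Updated: 2024-06-03
Created: 2022-12-31
Creator: Anonymous
Private: No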

Hello, I LOVE this program, and if I knew anything about coding I'd be happy to help (where can I learn?). In the last year or two, I've noticed an ongoing inability to recognize dates in the content of PDF files. It used to work very well, but lately it seems unable to identify them and gives the default output:

Unknown FirstFileContentDateNormalized

When I open the PDF, all of the text has been OCR'd and is easily readable, and I can "highlight" or select the date/text, so it should be able to find it?


Discussion

  • David Colbeth

    David Colbeth - 2022-12-31

    following

     
  • divinity666

    divinity666 - 2023-09-19

    I am wondering about this issue. I am still using it, and as long as the PDF files do contain text (e.g. via OCR), the text is being recognized properly.

    You might want to check the pattern configuration in the configuration file.

     
  • David Colbeth

    David Colbeth - 2024-06-03

    Hello Divinity, is there a better OCR program you are using? I scan all of my documents, and the built-in OCR service for ScanSnap used to be great but seems to be inadequate lately.

     
    • divinity666

      divinity666 - 2024-06-03

      We use pdftotext to extract content from PDF files, i.e. you can use any OCR on your PDF documents to recognize content (e.g. from ScanSnap), and DropIt will use exactly that recognized text for the further evaluation.
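
      To see what DropIt would be working with, it may help to run pdftotext on the file yourself and check whether a date shows up in the raw output. The following is only a rough diagnostic sketch, not DropIt's actual logic: the two date patterns, the month/day/year reading of slash dates, and the helper names `extract_text` and `find_first_date` are assumptions made up for illustration. It assumes pdftotext is installed and on the PATH.

```python
import re
import subprocess
import sys
from typing import Optional

# Hypothetical patterns for illustration only; DropIt's own
# date patterns are not shown in this thread.
DATE_PATTERNS = [
    re.compile(r"\b(\d{4})-(\d{1,2})-(\d{1,2})\b"),  # e.g. 2022-12-31
    re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b"),  # e.g. 12/31/2022 (month/day/year assumed)
]


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on the PDF and return its text layer ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_first_date(text: str) -> Optional[str]:
    """Return the date that appears earliest in the text as YYYY-MM-DD, or None."""
    best = None
    for index, pattern in enumerate(DATE_PATTERNS):
        match = pattern.search(text)
        if match and (best is None or match.start() < best[0].start()):
            best = (match, index)
    if best is None:
        return None
    match, index = best
    if index == 0:
        year, month, day = match.groups()
    else:
        month, day, year = match.groups()
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}"


if __name__ == "__main__" and len(sys.argv) > 1:
    extracted = extract_text(sys.argv[1])
    # repr() makes stray spaces or odd characters from the OCR layer visible.
    print(repr(extracted[:500]))
    print("First date found:", find_first_date(extracted))
```

      If the printed text is empty or the date looks mangled (extra spaces, letters in place of digits), the problem is likely in the OCR text layer rather than in the pattern configuration.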

       
      • David Colbeth

        David Colbeth - 2024-08-31

        Thanks for the reply.
        Unfortunately it was not helpful.
        That is exactly the issue I am experiencing.

        The PDF has been OCR'd and the text is readable, but DropIt does not find it
        for some reason.

        Please help.


