VietOCR / Discussion / Help: 3.5 not loading Hindi pdf (made through MS word), 4.0 Beta not doing any ocr at all

Forum: Help

Creator: Rawat

Created: 2014-09-05

Updated: 2017-08-14

Rawat - 2014-09-05

I am using 3.5 on Windows 8.

It was working fine, but I recently also installed 4.0 beta in a separate folder, and it runs.

In 3.5, when I loaded a Hindi PDF file (made using MS Word) today, it gave an error message mentioning tmp1B43.tif.

Everything else, English as well as Hindi images, gets OCR-ed well. Only the Hindi PDF does not load.

What is it and how to resolve it?

In 4.0 beta, any file I load, whether PDF or image, loads fine. But when I run OCR on it, I get an error message: no error number or description, just a terse "error occurred".

It does not do any OCR at all, and it gives this message every time.

What is it and how to resolve it?

Thanks.

Rawat

Quan Nguyen - 2014-09-05

Can you attach some sample files for our investigation of the issue?

Rawat - 2014-09-06

This is the 3.5 problem when loading a Hindi PDF made with MS Word.
The Word file is enclosed.

ocrerr.jpg

th_जीवन में उच्च स्तरीय सत्यनिष्ठा का निर्वाह.pdf

Rawat - 2014-09-06

This is the error 4.0 beta throws whenever I run OCR on any file.

A JPG or PDF file loads and displays fine, but is not OCR-ed at all.

ocrerr1.jpg

Quan Nguyen - 2014-09-06

Can't reproduce the issue in either version. Recognition took a little long but produced the text below:

जीवन में उच्च स्तरीय सत्यनिष्ठच्चा का निर्वाह
कबीर का एक दोहा है…
सांच बराबर प्तप नहीं, झूठ बराबर पाप ।

जाके हिरदै सांच है, ताके हिरदै झप 1।
...

Rawat - 2014-09-06

:-) How does that help resolve the problem on my system?

Anyway, I did some trials and found that if the name of the PDF file is in Unicode Hindi, it does not load and 3.5 gives the above error.

When I changed the name of the PDF to Latin (English) characters only, it loaded and the OCR worked.

So that issue is solved.
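
For anyone hitting the same thing, the workaround above (giving the PDF a Latin-only name before loading it in 3.5) can be scripted. Below is a minimal Python sketch; it is not part of VietOCR, and the function names `is_ascii_name` and `ascii_copy` and the placeholder file name `input` are made up for illustration. It only copies the file under an ASCII name and leaves the original untouched.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def is_ascii_name(path: Path) -> bool:
    """Return True if the file name (not the whole path) is pure ASCII."""
    return path.name.isascii()


def ascii_copy(src: Path, dest_dir: Optional[Path] = None) -> Path:
    """Copy src under an ASCII-only name and return the path of the copy.

    If the name is already ASCII, src is returned unchanged.
    Hypothetical helper: "input" is just a placeholder name.
    """
    if is_ascii_name(src):
        return src
    if dest_dir is None:
        # Note: the temp directory path itself may still contain non-ASCII
        # characters (for example a non-Latin Windows user name).
        dest_dir = Path(tempfile.mkdtemp())
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep the extension only if it is ASCII (".pdf", ".tif", ...).
    suffix = src.suffix if src.suffix.isascii() else ""
    dest = dest_dir / ("input" + suffix)
    shutil.copy2(src, dest)
    return dest
```

Load the returned path in the OCR program instead of the original file. Whether a non-ASCII folder name triggers the same error was not tested in this thread, so keeping the copy in a plain Latin-named folder is the safer assumption.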

--

However, I think the PDF with the Hindi name loaded fine in 4.0 beta, but I am not sure.

--

But 4.0 beta is not working. All files load but do not get OCR-ed. I just unzipped the archive to a separate folder and moved my language files from 3.5 to 4.0 beta. I don't know if some file or something else is still missing.

I shall wait for the final release .exe of 4.0, I guess.

Thanks.

Rawat

Quan Nguyen - 2014-09-06

I tried your PDF file on Windows 8.1, and both 3.5 and 4.0 beta had no problem reading and recognizing it.
