Turns out I'd not simplified too much, as the above solution works (it needs one extra close parenthesis). THANK YOU. I wasn't trying to argue; rather, I had no idea what syntax I needed to implement Hans' suggestion. strcol(1) was what I needed to know. I am curious why I can't just do "using 1:2", i.e. why doing this spits (1) out as %f with exponent.
No, I've not simplified too much at all. The above solution works (it needs one extra close parenthesis). I wasn't trying to argue; I had no idea what syntax I needed. The strcol(1) was what I needed. I am curious why I can't just do using 1:2, i.e. why doing this spits (1) out as %f with exponent.
It shouldn't hang. I didn't pick that syntax out of choice; I can't get any of the simpler options to work. It's a 10-line example with a provided input file. If you have a solution, can you paste a similar number of lines that show what you're suggesting? The goal is to write out a temp table which has the time/date in the same "%B %d %Y" format so the temp table can be processed by a subsequent plot command. Just doing "using 1:2" doesn't work as the time is written out as %f. Specifying "set format...
There seem to be some issues with how the text is formatted in the above. Here is a tarball of the files.
strftime goes into infinite loop as internal rep is NaN
I tried using scantailor. I noticed it's not really maintained. Last update was several years ago. a) it's much harder to use than NAPS b) the image quality is badly affected by the transformations as you can see in the attached. For pages that contain a lot of greyscale already, there isn't much size expansion after NAPS deskews and the quality is very good compared to the original. It's just the simpler pages containing b/w elements that suffer a size expansion. When I get some time I'll try digging...
I see. Thanks for the clarification.
Also, I'm not sure the comment "is that you start with a two-tone image (black and white only)." is true. If you look closely at 006 and 111 the hole punches for example are greyscale in both images. Obviously preserving the nature of the holes in paper isn't anything I much care about, I was just pointing out that these are not b/w images. OTOH maybe I'm not understanding something.
Also, I'm not sure the comment "is that you start with a two-tone image (black and white only)." is true. If you look closely at 006 and 111 the hole punches for example are greyscale in both images.
I was aware of the special case (pass thru on non edit). I'm fairly sure I actually hand edited many of the picture pages but of course since they are greyscale to begin with, they don't grow in size much. I understand what you're saying about the black and white pages. However, I need the images to be readable and they won't be if the whole document is converted to black and white. This would seem to be something that would be fairly common, no? Or (in the input doc) is having the simple pages optimized...
Hmmn. orig directory is the images exported out of NAPS2 after importing the PDF, with no editing. deskew.fixed directory is after deskew, hand rotating, and exporting the images.
$ stat --printf="%s " orig/img.006.png deskew.fixed/img.006.png
533326 1602573
So it's 3x the size after deskew/rotation. ImageMagick claims the resolution of the files is pretty close.
$ magick identify -format "%w x %h %x x %y" orig/img.006.png deskew.fixed/img.006.png
2550 x 3300 118.11 x 118.11 2550 x 3300 118.09999999999999...
Looking at the previous two files (img.006) using ImageMagick's identify feature, I'm not seeing a lot of differences to account for the 3x size increase. The modified version has a PNG color type of RGBA compared to TrueColor in the original. pngcrunch didn't achieve any meaningful reduction of the deskewed img.006.png; it's still 3x the size of the original. The comment on greyscale is interesting. The manual has lots of pages containing greyscale photos but none of those pages show any noticeable...
Looking at the two using ImageMagick's identify feature, I'm not seeing a lot of differences to account for the 3x size increase. The modified version has a PNG color type of RGBA compared to TrueColor in the original. pngcrunch didn't achieve any meaningful reduction for the deskewed img.006.png. The comment on greyscale is interesting. The manual has lots of pages containing greyscale photos but none of those pages show any noticeable increase in file size. Rather you get pages like the attached that...
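To double-check the color type that identify reports, the PNG header can be read directly. The following is only a minimal sketch using the Python standard library; the function names are made up for illustration, and it assumes the IHDR chunk immediately follows the 8-byte signature, as the PNG specification requires. It also estimates the raw (pre-compression) size implied by the header.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Color type codes from the PNG specification: (name, channels per pixel).
COLOR_TYPES = {
    0: ("Grayscale", 1),
    2: ("TrueColor (RGB)", 3),
    3: ("Palette", 1),
    4: ("Grayscale + alpha", 2),
    6: ("TrueColor + alpha (RGBA)", 4),
}


def describe_png_header(data):
    """Decode the IHDR fields from the first 26 bytes of a PNG file."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk not found where expected")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    name, channels = COLOR_TYPES[color_type]
    # Each scanline is one filter byte plus the packed pixel data.
    row_bytes = 1 + (width * channels * bit_depth + 7) // 8
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": name,
        "uncompressed_bytes": height * row_bytes,
    }


def describe_png_file(path):
    """Hypothetical helper: read just enough of the file to decode IHDR."""
    with open(path, "rb") as f:
        return describe_png_header(f.read(26))
```

Calling `describe_png_file("orig/img.006.png")` and `describe_png_file("deskew.fixed/img.006.png")` should show the two color types side by side. At the same bit depth, RGBA carries four channels instead of three, so about a third more raw data before compression, which on its own probably would not account for a 3x difference.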
Here is the 3x size deskewed image.
Hmmn. orig/ is the images exported out of NAPS2 after importing the PDF, with no editing. deskew.fixed is after deskew and hand rotating.
$ stat --printf="%s " orig/img.006.png deskew.fixed/img.006.png
533326 1602573
So it's 3x the size after deskew/rotation. ImageMagick claims the resolution of the files is pretty close.
$ magick identify -format "%w x %h %x x %y" orig/img.006.png deskew.fixed/img.006.png
2550 x 3300 118.11 x 118.11 2550 x 3300 118.09999999999999 x 118.09999999999999
Attached is...
Hmmn. orig/ is the images exported out of NAPS2 after importing the PDF, with no editing. deskew.fixed is after deskew and hand rotating.
$ stat --printf="%s " orig/img.006.png deskew.fixed/img.006.png
533326 1602573
So it's 3x the size after deskew/rotation. ImageMagick claims the resolution of the files is pretty close.
$ magick identify -format "%w x %h %x x %y" orig/img.006.png deskew.fixed/img.006.png
2550 x 3300 118.11 x 118.11 2550 x 3300 118.09999999999999 x 118.09999999999999
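To see which pages grew, rather than checking one file at a time with stat, the two export directories can be compared page by page. This is a rough sketch only; `growth_by_page` is a made-up name, and it assumes both directories (such as orig and deskew.fixed above) use the same file names for the same pages.

```python
import os


def growth_by_page(orig_dir, new_dir):
    """Return (name, before, after, ratio) tuples, largest growth first."""
    rows = []
    for name in sorted(os.listdir(orig_dir)):
        before_path = os.path.join(orig_dir, name)
        after_path = os.path.join(new_dir, name)
        # Only compare pages that exist as regular files in both directories.
        if not (os.path.isfile(before_path) and os.path.isfile(after_path)):
            continue
        before = os.path.getsize(before_path)
        after = os.path.getsize(after_path)
        if before == 0:
            continue  # avoid dividing by zero on empty files
        rows.append((name, before, after, after / before))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


if __name__ == "__main__":
    for name, before, after, ratio in growth_by_page("orig", "deskew.fixed"):
        print(f"{name}: {before} -> {after} bytes ({ratio:.2f}x)")
```

For img.006.png the figures above (533326 and 1602573 bytes) work out to a ratio of roughly 3.0, so that page should land near the top of the list if the simple b/w pages are the ones that expand.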
Ben. Any more suggestions here? I just started trying to clean up another PDF file, 502 pages. 31,401,008 bytes initially (http://vintagedirtbiker.com/photos/Motorcycles%20and%20Projects/Manuals/Honda/NSR250%20(English)%20Manual.PDF ) 31,495,944 bytes if I import it into NAPS and save it out again (default PDF settings) 332,370,526 bytes if I import it into NAPS, run deskew on all pages and save it out again (default PDF settings) I was actually able to cause it to save out as 1,042,226,204 bytes...
I emailed you a link to PDFs. They are large.
If I take the original PDF (85,687,738 bytes), import it and save it (no changes), the resulting pdf is 119,512,540 bytes. If I take the original PDF (85,687,738 bytes), import it, run all the pages through deskew, and save it, a) the save takes ~30x longer [I assume there is an optimization for unaltered pages] b) the resulting pdf is 221,698,700 bytes, which explains the situation I've got myself into. So there are two issues: 1) The above. I can make the original pdf available if it will help. 2) Previous...
If I take the original PDF (85,687,738 bytes), import it and save it, no changes, the resulting pdf is 119,512,540 bytes. In the above case, as I said, I added 6 pages, deskewed pages, rotated some pages that deskew didn't do a good job on etc. and combined some 11x17 pages. As I said, it's now 232,270,694 bytes when exported, which isn't great. I don't believe at any point I did a wholesale export to png and reimport. I always exported via saving the PDF but I suspect that modifying the images via...
It doesn't appear to be an issue with the PDF import. I think it's just an intrinsic issue with png files and how NAPS2 converts to/from its internal format. Using a png previously exported by NAPS2, if I compress it using pngcrunch the resulting file is reduced in size from 323,672 to 162,659 bytes (attached). If I import this into NAPS, and save it as a png, the resulting file is 326,819 bytes. I do have the very original PDF file. I can certainly make it available but I'm not sure it's useful as...
Also, if I specify no options to pdfimages, it chooses by default .ppm
I'm using the following:
NAPS 6.0.3 (I don't think it's 6.X related)
pdfimages v0.51
The help for pdfimages is rather confusing; it states:
-png : change the default output format to PNG
-tiff : change the default output format to TIFF
-j : write JPEG images as JPEG files
-jp2 : write JPEG2000 images as JP2 files
-jbig2 : write JBIG2 images as JBIG2 files
-ccitt : write CCITT images as CCITT files
-all : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
My understanding (I would welcome education on...
Add a retry option to better handle file in use error
Thanks. --density 300x300, which matches the initial scanning dpi, allowed beta2 to import.
Custom rotation in beta2 causes crash
Hmmn. I wonder if there are any useful arguments (to convert) to specify the page size.
The actual scan that it came from was only three 8.5x11 sheets. Stitched together into a single image using Microsoft Image Composite Editor. Not really sure why the page size is being reported as 3m x 1.3m.
Interesting. It was created by: 'magick convert image-0075.pbm test.pdf' $ magick identify image-0075.pbm image-0075.pbm PBM 8639x3805 8639x3805+0+0 1-bit Bilevel Gray 3.91904MiB 0.031u 0:00.034
Above is from 6.0.2
Distilled down to a single page. 2018-09-27 10:35:21.2041 Error importing PDF file. System.OverflowException: Overflow error. at Microsoft.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at Microsoft.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccess(Task task) at NAPS2.ImportExport.Pdf.PdfSharpImporter.<>c__DisplayClass6_0.<<Import>b__0>d.MoveNext()
2018-09-26 08:23:10.8547 Error importing PDF file. System.OverflowException: Overflow error. at Microsoft.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at Microsoft.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccess(Task task) at NAPS2.ImportExport.Pdf.PdfSharpImporter.<>c__DisplayClass6_0.<<Import>b__0>d.MoveNext()
Now just get "could not be imported" after 75 pages. v6.0.2
I'll check the image and if necessary upload a new one.
out of memory exception trying to import pdf
dsave segfaults if no target is specified
deldir of directory containing files does not correctly free up space
I sent you a message with a link. LMK when you've downloaded it so I can delete. Thanks for all the help!
Incorrect location of text in OCR
Odd, as the above tutorial reports version 3.03 and what I'm running on Windows is 3.04 but does not have the -png option: pdfimages -h pdfimages version 3.04 Copyright 1996-2014 Glyph & Cog, LLC On my openSUSE Linux system I get: $ pdfimages -h pdfimages version 0.56.0 Copyright 2005-2017 The Poppler Developers - http://poppler.freedesktop.org Copyright 1996-2011 Glyph & Cog, LLC So that's probably a fork. I found a version of poppler here: http://blog.alivate.com.au/poppler-windows/ which includes...
Doesn't exist in the Windows version. If it's not a useful addition, then ignore. I can always use imagemagick to convert to PNG (magick mogrify -format png *.ppm)
Possible, as I guess PNG is lossless compression. I have no idea if the actual image stored in the PDF is PPM, or if pdfimages is performing a conversion, but by default it saves as PPM so it would be nice if NAPS2 could handle it. The only other option is 'JPG' if you specify -j and if the PDF image is a "JPG image".
Add support for importing PPM.
I opened a new ticket for the above.
As an aside. Is there any possibility that NAPS2 could directly import PPM files? pdfimages is able to extract the files (all PPM) from the above two pdfs, so if NAPS2 could directly import these files it would provide an alternative when the direct PDF import goes wrong (obviously conversion to JPG is problematic as it's lossy).
File1: Attachment in Google Drive. Opens fine in eVince and Adobe Acrobat. 2017-12-06 12:29:44.1561 Error importing PDF file. PdfSharp.Pdf.IO.PdfReaderException: Unexpected character '0x0023' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file. at PdfSharp.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch) at PdfSharp.Pdf.IO.Lexer.ScanNextToken() at PdfSharp.Pdf.IO.Parser.ParseObject(Symbol stop) at PdfSharp.Pdf.IO.Parser.ReadArray(PdfArray...
At a minimum it needs an improved diagnostic. It's occurring with two large documents but I have no idea if it's the same underlying issue. I'm not sure of the distribution rights on one document so I'm attaching just the other. It's too large to attach directly so here is a Google Docs link. Thanks for all your work on this great product. https://drive.google.com/file/d/1a_AFFuEJAFjGs4iaKMknfB7biDAytJQJ/view?usp=sharing
NAPS2 fails to import large external PDF
I did create it in NAPS2. Or rather I imported an existing PDF and rotated all the pages in NAPS2 before saving it out of NAPS2 to be processed by BRISS. As an aside; a native BRISS like feature in NAPS would be nice (though maybe a bit esoteric) as the UI of BRISS is hard to fathom.
PDF scanned as a booklet and then processed using BRISS imports in NAPS still as booklet
Yes, works well now. Thanks. I'm familiar with Ghostscript. Perhaps make the dialog state that this is what it's downloading?
Works great. This program really has very useful functionality at this point. Thank you for all your effort.
https://sourceforge.net/p/naps2/tickets/364/
I'm not convinced OCR is working in the above situation. Opened ticket https://sourceforge.net/p/naps2/tickets/366/ Anyways, back to the question above about what exactly the additional install is and what it supports?
OCR does not occur for external PDF imported into NAPS2 containing PBM data.
I'm not convinced OCR is working in the above situation. I will open a ticket. Anyways, back to the question above about what exactly the additional install is and what it supports?
I'm not convinced OCR is working in the above situation. It saves the PDF instantaneously and without any OCR info even though it's enabled. I'm able to run tesseract 3.05.01 directly against the PBM files and get expected text output. If I use imagemagick to convert the PBM to JPG and then import the JPG into NAPS2, then the resultant PDF gets OCR but of course too much loss (and PDF size is 7x). I will open a ticket.
I'm not convinced OCR is working in the above situation. It saves the PDF instantaneously and without any OCR info even though it's enabled. I'm able to run tesseract 3.05.01 directly against the PBM files and get expected text output. If I use imagemagick to convert the PBM to JPG and then import the JPG into NAPS2, then the resultant PDF gets OCR but of course too much loss (and PDF size is 7x). I opened a ticket.
I'm not convinced OCR is working in the above situation. It saves the PDF instantaneously and without any OCR info even though it's enabled. Should I open a ticket, or am I missing something? I'm able to run tesseract 3.05.01 directly against the PBM files and get expected text output. If I use imagemagick to convert the PBM to JPG and then import the JPG into NAPS2, then the resultant PDF gets OCR but of course too much loss (and PDF size is 7x).
I'm not convinced OCR is working in the above situation. It saves the PDF instantaneously and seemingly without any OCR info even though it's enabled. Should I open a ticket, or am I missing something? I'm able to run tesseract 3.05.01 directly against the PBM files and get expected text output.
I'm not convinced OCR is working in the above situation. It saves the PDF instantaneously and seemingly without any OCR info even though it's enabled. Should I open a ticket, or am I missing something?
I'm not convinced OCR is working in the above situation. It saves the PDF instantaneously and seemingly without any OCR info even though it's enabled.
I see in the changelog "Added support for importing any PDF (requires an additional download, can be disabled by NoUpdatePrompt or DisableGenericPdfImport in appsettings.xml)". Thank you for adding this!!! Can you elaborate on this "additional download"? It didn't specify when I was prompted to download. Also is there more info on exactly what image formats are supported inside these external PDFs? I have an external pdf and it can now be successfully imported into NAPS. The native image format inside...
The videos are worth watching as there are several different issues over the two videos. Hard crash in one. Progress box bouncing around, getting hidden behind window, weird screen updating in the other. Also please LMK when I can remove above files off google drive.
Also please LMK when I can remove above files off google drive.
Deskew feature not working correctly, crashes, fails to progress, doesn't deskew
I opened a ticket as there are issues when trying to deskew. I imported 192 jpegs I'd exported out of naps and tried to deskew them. Several issues.
This is great news. Is there any way to control how aggressively the deskew works? I have some pages that are all about 0.7 degrees skewed. This is visible to my eye as they contain lots of tables with vertical line borders, but I can imagine that the algorithm is perhaps tuned for more extreme cases than this.
This is great news. Is there any way to control how the deskew works? I tried it and the first page in my doc that I thought was skewed, it didn't make any change to.
Peter. Would love to hear an update on your change.
On Linux using img2pdf (https://github.com/josch/img2pdf) I get way better results...
I'm trying to deskew an existing PDF file (basically remove the ADF skew) and add...
Are there any open source tools that can straighten scanned pages (of text)?
Unzip above archive. [windows desktop] right click on image-pre.jpg, properties,...
Unzip above archive. Right click on image-pre.jpg, properties, details -> 144dpi...
Attached is images pre and post NAPS2 rotation.
Similar issue still with 5.0.2.23866 Pages I rotated in NAPS2 have size of 12.75x16.75...
Interesting, as the metadata claimed they were all still 300dpi. I overwrote the originals,...
What do I need to do? Can I just re-import the existing jpegs, or do I need to (somehow...
What do I need to do? Can it just import the existing jpegs, or do I need to (somehow...
Also, unlike before (where I used faststone and there was the weird 96/300 dpi inconsistency)...
Ben. There is definitely a problem here. I just scanned another document and had...
hmmn. it's set to 300x300 dpi in the edit/set-dpi option. but sure enough other tools...
The image size in Faststone is correct. 7.96 x 10.31 inches I did a resize forcing...
Correct. I found that some of the scanned pages I wasn't happy with (ADF feeder wasn't...
Correct.
Also, I'm seeing the issue in 4.7.0 and 4.7.2
I'm having a weird issue with page size on generated PDFs. I think it is happening...
Problem of importing seems fixed in latest version. No issues saving to a PDF either....
actually another of the regular smaller images (that were directly scanned by NAPS2)...
Also, this is bizarre. I got a failed to import error on each. None were displayed...
Any idea when this version will be released?
2016-02-08 21:55:37.7512 The file 'image.91.jpg' could not be imported. System.OutOfMemoryException:...
I have some large images; NAPS2 fails to import them. After thinking about it for...