
#26 Watermarks are covering important text sections, making them unreadable

Milestone: 1.0
Status: closed
Owner: nobody
Labels: None
Updated: 2023-08-20
Created: 2020-07-26
Creator: Akos Simon
Private: No

Hi, not sure if anyone has noticed this yet: there are massive watermarks covering important text sections, making those sections unreadable. This happens when choosing a 300% zoom factor and downloading pages as individual images. It does NOT happen with a 100% zoom factor: in 100% zoom mode, the watermark does not cover that same text section at all.
Is this massive watermark in 300% zoom mode generated by the HathiTrust website, or internally by the Hathi Download Helper app itself?
There are checkboxes to remove watermarks, but they do not work at all.

2 Attachments

Discussion

  •  hdh-creator

    hdh-creator - 2020-07-26

    Hi Akos,

    the option to hide watermarks works only with downloaded PDF files, and only with PDF files downloaded from hathitrust.org.

    Watermarks within images cannot be removed with HDH.

    • Concerning your example:

    "Gothaischer genealogischer Hof-Kalender: auf das Schalt-Jahr 1840"

    I was not able to find it on hathitrust, but you can download the complete book
    from the original source: Bayerische Staatsbibliothek (select "pdf-download" on the left)

    https://reader.digitale-sammlungen.de/en/fs1/object/display/bsb10428700_00250.html

    without any watermarks :-)

    or

    from Google Books (via the settings icon -> "download pdf" on the right):
    https://books.google.de/books?id=AJpAAAAAcAAJ&printsec=frontcover&dq=bibliogroup:%22Gothaischer+genealogischer+Hof-Kalender:+auf+das+Jahr%22&hl=de&sa=X&ved=2ahUKEwjJktHnyuvqAhVMDOwKHemTDW0Q6AEwAHoECAEQAg#v=onepage&q&f=false

    • Concerning the watermark size
      I've checked your comment about the watermark size and found something interesting I had not noticed before.

      In general: the watermarks are generated by HathiTrust, and sometimes, when there is not enough space in the margins, they hide some text. There is nothing you can do about this except download the PDF files instead and hide the watermarks in them with HDH.

    Normally the ratio of the watermarks is fixed. That means even at higher resolutions (zoom factors up to 400% are available by default), the watermarks should not increase in size relative to the image.

    In your case, the watermark size increased at higher resolutions (zoom factors).

    Here is an example:
    Size = 200: Watermark stays at the bottom
    https://babel.hathitrust.org/cgi/imgsrv/image?id=wu.89097313431;seq=173;size=200;rotation=0
    Size = 400: Watermark has increased
    https://babel.hathitrust.org/cgi/imgsrv/image?id=wu.89097313431;seq=173;size=400;rotation=0

    I guess that, depending on the quality of the source material, the zoom factor is limited. That means the image size stops increasing beyond a certain zoom factor, but the watermarks kept growing, which might be why they end up larger relative to the page.

    You should re-download the book with a lower zoom factor than before.
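    Since a capped zoom factor only becomes visible after downloading, one way to act on this advice is to compare the pixel dimensions of the same page fetched at several sizes and keep the smallest size that already reaches the maximum. A minimal Python sketch, assuming you have measured each download's width and height yourself; the helper name `smallest_uncapped_size` is hypothetical and not part of HDH.

```python
from typing import Dict, Tuple


def smallest_uncapped_size(dims_by_size: Dict[int, Tuple[int, int]]) -> int:
    """Return the smallest requested 'size' whose image already has the most pixels.

    dims_by_size maps the requested size parameter (e.g. 100, 200, 400)
    to the (width, height) in pixels of the image actually returned.
    Larger sizes that return the same pixel count add no detail and,
    as observed in this ticket, may only enlarge the watermark.
    """
    if not dims_by_size:
        raise ValueError("need at least one measured download")
    max_pixels = max(w * h for w, h in dims_by_size.values())
    return min(size for size, (w, h) in dims_by_size.items() if w * h == max_pixels)
```

    With hypothetical measurements where sizes 300 and 400 return identical dimensions, the helper picks 300.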

    Unfortunately, there is no way to check this beforehand. A workaround is to download the best available quality instead (using the "res=0" parameter instead of "size=400").
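    The two example links above differ only in their `size` parameter, and the workaround swaps that parameter for `res=0`. A small Python sketch that assembles such imgsrv URLs, assuming the semicolon-separated parameter layout seen in those links; `image_url` is a hypothetical helper, and the full set of parameters HathiTrust accepts is not covered here.

```python
from typing import Optional

BASE_URL = "https://babel.hathitrust.org/cgi/imgsrv/image"


def image_url(item_id: str, seq: int, *, size: Optional[int] = None,
              res: Optional[int] = None, rotation: int = 0) -> str:
    """Build an imgsrv page-image URL with either a 'size' or a 'res' parameter."""
    if (size is None) == (res is None):
        raise ValueError("pass exactly one of size= or res=")
    scale = f"size={size}" if size is not None else f"res={res}"
    # Parameters are joined with ';' as in the example links in this thread.
    params = [f"id={item_id}", f"seq={seq}", scale, f"rotation={rotation}"]
    return f"{BASE_URL}?{';'.join(params)}"
```

    For the page discussed here, `image_url("wu.89097313431", 173, size=400)` reproduces the second link above, and passing `res=0` instead of `size=400` requests the best available quality.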

    This feature will be available in the next version of HDH.

    Best regards

    Martin aka Hathi Download Helper

     
  •  hdh-creator

    hdh-creator - 2020-07-26
    • status: open --> accepted
     
  • Anonymous

    Anonymous - 2020-07-26

    Ha! WOW!!! After two years diving into these Gothas, I have never encountered anyone who could find a rare Gotha in libraries around the world as fast as you just did! I was thinking about 100% as well, but when I tried downloading at 200% it seemed to be a genuinely higher-resolution image: it was not getting blurrier, which would have indicated it was only interpolated upward. That is why I then tried 300% and 400%, and the watermark sizes suddenly started behaving erratically; it did not seem to make any sense. That discovery did not solve the issue of the watermark covering information crucial to my project, though. I have tried several already, and 200% kept the watermark away from the text.

    I am still confused about how to activate the watermark removal, because I had the boxes checked and I was downloading from a HathiTrust web link. It was a 1940 Gotha:

    https://babel.hathitrust.org/cgi/pt?id=njp.32101063970717&view=1up&seq=7

    But the watermark remained.

    Do you mean that I would need to download without your app, directly from their site?

     
  •  hdh-creator

    hdh-creator - 2020-07-27

    As mentioned before, HDH is not able to remove watermarks from image files. The reason is quite simple:
    The watermarks and the raw page images are combined into single images, so the watermark is part of the image, and all information covered by it is lost.
    With PDF files the situation is very different. A PDF is a container: it holds the OCR text, the images of the book pages, and separate image data for the watermarks. In addition, there are parameters that define the position and size of the watermarks. By changing these parameters you can 'hide' the watermarks so that they are no longer visible.
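    To illustrate why a parameter change is enough: in a PDF page's content stream, an image is typically painted by a `cm` operator (six numbers giving scale and position) followed by `/Name Do`. The Python sketch below moves every placement of one named image far outside the page by rewriting the last two `cm` numbers (the translation). It is not HDH's implementation; it assumes the content stream is already decompressed, that the watermark's XObject name (here the hypothetical `/Wm`) is known, and that `cm` directly precedes `Do`.

```python
import re

# A PDF number: optional sign, digits with an optional fraction, or a bare fraction.
NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)"


def hide_xobject(content: str, name: str, offset: float = -10000.0) -> str:
    """Move each 'a b c d e f cm /name Do' placement to (offset, offset).

    The scale/rotation part (a b c d) is kept; only the translation
    (e f) changes, so the image is drawn far outside the visible page.
    """
    pattern = re.compile(
        rf"((?:{NUM}\s+){{4}})(?:{NUM}\s+){{2}}cm(\s+/{re.escape(name)}\s+Do)"
    )
    return pattern.sub(
        lambda m: f"{m.group(1)}{offset:g} {offset:g} cm{m.group(2)}", content
    )
```

    Because only the placement changes and the image data stays in the file, this kind of edit is in principle reversible, unlike a watermark baked into page pixels.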

    To obtain a copy without watermarks you have to download a book in PDF format.
    To hide the watermarks, make sure the "hide watermarks" checkbox is selected in the "PDF merge & conversion" group box before the PDF merge process starts.

    Alternatively, you can hide or recover watermarks in PDF files already downloaded from hathitrust.org via the menu bar: select Tools -> edit watermarks. This opens a file selection dialog where you can select files and either 'hide' or 'remove' the HathiTrust watermarks. Again, this only works for PDF files downloaded from hathitrust.org. It will not work for PDF files that have been created by HDH or other apps from downloaded image files.

    Best regards

    Martin

     
  •  hdh-creator

    hdh-creator - 2020-12-19

    Changed the download option from zoom factor to "quality levels": "best" downloads the highest resolution available for a book item; "Standard" should match the normal resolution.

    Fixed in HDH v1.2.0

     
  •  hdh-creator

    hdh-creator - 2020-12-19
    • status: accepted --> closed
     
  • Wouter Franssen

    Wouter Franssen - 2023-08-19

    Hi there,

    I recently came up with a way to remove watermarks from hi-res images from Hathi. The algorithm is as follows:

    1. Download an image using your favorite settings
    2. Download the same image again, but now with 'orient=2' set in the Hathi API. This rotates the image by 180 degrees while retaining the position of the watermark!
    3. Rotate image 2 by 180 degrees so it is upright, with the watermark at the top of the page.
    4. Combine the top half of image 1 with the bottom half of image 2.

    In this way, an image without any watermark is obtained!

    The disadvantages of the method are:
    1. Each image needs to be downloaded twice.
    2. If the final image is saved as JPEG (or another lossy format), it has to be encoded again, lowering the image quality.
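    The combination steps of the method above can be sketched in Python. To stay dependency-free, this sketch treats an image as a list of pixel rows (any pixel type works); with a real imaging library you would decode both downloads first and save the result losslessly to avoid the re-encoding loss mentioned above. It assumes both downloads have identical dimensions and that the watermark lies entirely in the bottom half of each download. It illustrates the described steps and is not HDH's code.

```python
from typing import Any, List

Grid = List[List[Any]]  # rows of pixels, top row first


def rotate_180(img: Grid) -> Grid:
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [list(reversed(row)) for row in reversed(img)]


def remove_watermark(upright: Grid, rotated: Grid) -> Grid:
    """Merge two downloads of the same page into a watermark-free page.

    upright: image 1, page upright, watermark at the bottom.
    rotated: image 2 (requested with orient=2), page upside down but
             the watermark still at the bottom.
    """
    flipped = rotate_180(rotated)  # step 3: page upright, watermark now on top
    if len(flipped) != len(upright) or any(
        len(a) != len(b) for a, b in zip(upright, flipped)
    ):
        raise ValueError("both downloads must have the same dimensions")
    half = len(upright) // 2
    # Step 4: clean top half from image 1, clean bottom half from image 2.
    return [row[:] for row in upright[:half]] + [row[:] for row in flipped[half:]]
```

    Splitting exactly at the middle keeps the sketch simple; it works as long as the watermark band in each download stays below the halfway line.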

    Perhaps something to consider for inclusion in your nice program :)

    Regards,

    Wouter

     
  •  hdh-creator

    hdh-creator - 2023-08-20

    Hi Wouter,

    this is how it works in HDH 1.2.3 😄

     
  • Wouter Franssen

    Wouter Franssen - 2023-08-20

    Haha. OK. So between me coming up with the idea two weeks ago and me posting it here, you released a new HDH version that already has the feature :D.

    Anyway, thanks for your work. It is a very nice feature to have.

     
