[bugs:#320] memory leak
Status: open
Group: djview
Created: Wed Jul 08, 2020 07:17 AM UTC by Janusz
Owner: nobody
Cf. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964506.

I've seen that, but the valgrind log is unfortunately useless. The lost
memory blocks are just known things. I suspect that the wasted memory
is still referenced in the cache, but not counted towards the maximum
cache size. This is going to be tricky and I lack the time to do this
well...
Leon
On 8/17/20 2:30 AM, Janusz wrote:
I attach the valgrind output for reading 100 pages one by one.
After browsing the first 100 pages one by one with
djview4 https://djvu.szukajwslownikach.uw.edu.pl/linde-t/01/index.djvu&
'free' shows the increase of memory used from 2209292 to 2973452, after
the next 100 pages to 3561568.
Can you reproduce this?
Regards
Janusz
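A minimal sketch of the failure mode suspected above, using a hypothetical refcounted page cache rather than DjVuLibre's actual classes: evicting an entry only shrinks the cache's own accounting, so pages still referenced elsewhere stay allocated while the cache reports itself under budget.

#include <cstddef>
#include <cstdio>
#include <list>
#include <memory>

struct Page { std::size_t bytes; };

class PageCache {
  std::list<std::shared_ptr<Page>> items;
  std::size_t used = 0, limit;
public:
  explicit PageCache(std::size_t l) : limit(l) {}
  void add(const std::shared_ptr<Page> &p) {
    items.push_back(p);
    used += p->bytes;
    while (used > limit) {            // evict oldest entries
      used -= items.front()->bytes;   // drops the cache's accounting...
      items.pop_front();              // ...but frees memory only if this
    }                                 // was the last reference
  }
  std::size_t accounted() const { return used; }
};

int main() {
  PageCache cache(1000000);                 // 1MB budget
  std::list<std::shared_ptr<Page>> stray;   // long-lived references
  for (int i = 0; i < 100; ++i) {
    auto p = std::make_shared<Page>(Page{500000});  // ~500KB per page
    stray.push_back(p);  // e.g. decoded text kept around for searching
    cache.add(p);
  }
  // The cache claims to hold at most 1MB, yet ~50MB of pages are alive.
  std::printf("cache accounts for %zu bytes\n", cache.accounted());
  return 0;
}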
I ran the valgrind "massif" tool to study the allocations while browsing
50 pages (25% magnification, continuous side-by-side, scroll until
reaching 50 pages). No memory leak per se. But I was surprised to see
that the decoded hidden text (which is kept around in order to search
quickly) can take about 500KB per page in this document. For 100 pages,
that's 50MB. This explains pretty much all the allocation growth that I
can see with massif. But that is far from enough to exhaust all your
computer memory in 200 pages....
Any additional hint about better ways to reproduce the problem you describe?
Leon
On 8/18/20 12:37 AM, Janusz wrote:
Is there a way I can help? My qualifications are very limited, but
after retiring, time is not a problem.
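A back-of-envelope check of the ~500KB/page figure above, assuming the hidden-text layer stores roughly one zone per character; the per-page character count and per-zone cost are illustrative guesses, not measurements.

#include <cstddef>
#include <cstdio>

int main() {
  const std::size_t charsPerPage = 3000;  // dense dictionary page (guess)
  const std::size_t bytesPerZone = 170;   // rectangle + text + overhead (guess)
  std::printf("~%zu KB per page\n", charsPerPage * bytesPerZone / 1024);
  std::printf("~%zu MB per 100 pages\n",
              100 * charsPerPage * bytesPerZone / (1024 * 1024));
  return 0;
}

This prints roughly 498KB per page and 48MB per 100 pages, consistent with the growth seen under massif.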
Thanks for your effort. The short answer to your question is unfortunately negative, but just in case I add some comments.
The problem was first noticed in our djview4poliqarp, so the culprit is somewhere in the shared code. The document in question is a dictionary with large, dense pages scanned at 600 DPI, so perhaps this explains the size of the decoded page. I practically never use continuous side-by-side; I browse with PageDown with zoom "wide" and "page". I was quite happy to be able to reproduce it in djview4 by just browsing page by page, as I was unable to reproduce it in djview4poliqarp. Any suggestions on how to try to reproduce it in some other way?
I had an impression, which I've not verified (I had no idea which tool to use), that the leak is stopped, or more probably restarted, by some actions like saving a page fragment. Do you want me to make more such experiments?
Massif visualisation suggested by Joachim Aleszkiewicz.
Attachment: massif.pdf (71.4 kB; application/pdf)
I did not use this fancy visualization but the simple ms_print ascii tool.
Regardless of my efforts, I did not succeed in getting this kind of explosive memory consumption in plain djview.
I tried using continuous, fit width, and go over 50 pages using the space key. No such effect….
In the context of djview4poliqarp, the key thing to check is to make sure that one does not keep around GP<DjVuFile> or GP<DjVuImage> for the previously visualized pages. One way to check this is to compile with -DDEBUGLVL=1 and to look for the DjVuFile destruction message "DjVuFile::~DjVuFile(): destroying...\n"
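A sketch of the kind of check meant here; PageView and showPage are hypothetical stand-ins for the viewer's own widget code, while GP<>, DjVuDocument, DjVuImage, and DjVuDocument::get_page are DjVuLibre names.

// Hypothetical viewer-side code; only the DjVuLibre types are real.
#include "DjVuDocument.h"
#include "DjVuImage.h"
#include "GSmartPointer.h"

class PageView {
  GP<DjVuImage> current;  // the only long-lived page reference we keep
public:
  void showPage(const GP<DjVuDocument> &doc, int pageno) {
    // Release the previous page before fetching the next one. If a
    // stray GP<DjVuFile> or GP<DjVuImage> survives this point, the
    // "DjVuFile::~DjVuFile(): destroying..." message (visible with
    // -DDEBUGLVL=1) never appears for that page, and its decoded
    // data stays pinned in memory.
    current = 0;
    current = doc->get_page(pageno);
    // ... paint from 'current' ...
  }
};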
Hi Janusz,
Here is my massif visualisation when going over the first 50 pages of
your document (fit width, pressing space bar). It shows some growing
memory consumption because we record all the text (allocated in
new_block). But that's nothing like yours. I wonder what's going on.
Leon
I reproduced the problem on Windows 10; however, instead of a crash I got a nice error message "Out of memory. Cannot decode page 290", repeated for page 291. Perhaps the message could be supplemented with some additional useful information?
I reproduced the problem on another installation of Windows 10, on a computer with more RAM. I got "Out of memory" for page 284 despite the fact that there were still several GB of memory free! Moreover, I had twice increased both the pixel cache and the decoded page cache. So instead of, or besides, the memory leak, we have a bug which should be easier to diagnose. What about extending the "out of memory" message with information about what kind of memory is insufficient?
Last edit: Janusz 2020-08-27
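On extending the message: a minimal sketch of a more informative report, assuming the failure surfaces as std::bad_alloc; decodeWithDiagnostics, decodePage, and the cache-size arguments are hypothetical, not djview APIs (the real error path would more likely go through DjVuLibre's GException).

#include <cstddef>
#include <cstdio>
#include <new>
#include <string>

void decodePage(int) { throw std::bad_alloc(); }  // stub for the real decoder

std::string decodeWithDiagnostics(int pageno,
                                  std::size_t pixelCacheBytes,
                                  std::size_t decodedCacheBytes) {
  try {
    decodePage(pageno);
    return {};
  } catch (const std::bad_alloc &) {
    // Report which page failed and what the caches were allowed to hold,
    // so "out of memory" at 50% RAM usage becomes diagnosable.
    char buf[160];
    std::snprintf(buf, sizeof(buf),
                  "Out of memory. Cannot decode page %d (pixel cache "
                  "limit: %zu bytes, decoded page cache limit: %zu bytes).",
                  pageno, pixelCacheBytes, decodedCacheBytes);
    return buf;
  }
}

int main() {
  std::printf("%s\n",
              decodeWithDiagnostics(284, std::size_t(10) << 20,
                                    std::size_t(10) << 20).c_str());
  return 0;
}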
Please let me know exactly which release you're using (which installer).
Hoping that the new one fixes the problem (which I otherwise cannot see).
I am now trying to go over your file page by page, and this seems a lot more reasonable.
I find that having the text at the character level eats a lot more memory than I would like, but I am reaching page 289 with 289MB of memory usage...
I got "out of memory" error for the page 282 on a 4GB RAM laptop, earlier it was for 200 (on the same computer). So the is a small improvement, but I still don't understand the problem. Why the memory demand is not constant for displaying pages one after one? Evidently some memory is not released.
I reproduced the problem from my post of 2020-08-27, but this time there is just a crash instead of the "out of memory" error.
What is interesting, and probably easier to diagnose: the program crashes several seconds after displaying the first page of https://djvu.szukajwslownikach.uw.edu.pl/linde/index.djvu (while downloading the subsequent pages to the cache?); I was unable to reproduce this on Linux.
These experiments were made on an 8GB RAM desktop (Windows 10), and the monitor showed that only about half of it was used.
As for the original problem with https://djvu.szukajwslownikach.uw.edu.pl/linde-t/01/, perhaps the character-level segmentation is the culprit? Recently I browsed several other large documents (about 700 pages each) page after page, and the memory footprint was constant, as it should be.
Last edit: Janusz 2020-11-21
This is getting very strange.
I went over the first 300 pages of https://djvu.szukajwslownikach.uw.edu.pl/linde-t/01/ on Windows 10 without problems. The max memory was about 300MB because of the character-level text. But all was fine.