Thank you very much Paul and Waldek for your kind responses. I'm glad to hear that this is a problem with Linux and SBCL rather than my Lisp programming skills :-).
I was able to change "/proc/sys/vm/max_map_count" on the machines I'm using after all, so I've got a work-around at the moment.
It is, however, fairly annoying. I'm not an expert on these things, but if Waldek is right, and there's a 256MB limit of dirty pages, that would seem hugely outdated. My programs are running into this problem when they use up to 30GB of RAM, but that seems like it should be reasonable these days. So, it would be *really* nice to have a fix for this.
On Jan 4, 2011, at 3:12 PM, Waldek Hebisch wrote:
> Paul Khuong wrote:
>> In article <BB005B48-CD0E-4B4E-88E9-6EA01125DD24@...>,
>> Benjamin Lambert <benlambert@...> wrote:
>> I'm running into the old "mprotect call failed with ENOMEM" error, which
>> "probably means that the maximum amount of separate memory mappings was
>> exceeded".
>>> I've run into a situation where I can't easily change
>>> "/proc/sys/vm/max_map_count" , and I think I've maxed out
>>> *backend-page-bytes* at 256k or 1MB or so.
>>> However, rather than a work-around, I'd like to figure out why my code
>>> requires so many memory mappings (or why it's stressing SBCL/Linux in the way
>>> it is). The code is dealing with some gnarly data: lots of strings, arrays,
>>> and lists. I'm quite sure that my code is not handling this data
>>> optimally/elegantly. But I'm not sure how to begin debugging this.
>> mmap is used to grab memory from the OS. That usage is fairly normal, so
>> I doubt that's ever an issue.
>> SBCL's usage of mprotect, on the other hand, is very idiosyncratic.
>> A generational garbage collector is based on the assumption that old
>> data (that has already been garbage collected at least once) doesn't
>> change as much as younger data. In order to exploit that assumption,
>> they need to be able to tell when and which older data have been written
>> to, and might then point to young data.
>> Language implementations these days seem to mostly instrument code with
>> software write barriers. SBCL, CMUCL and Boehm (under certain settings)
>> instead depend on the hardware MMU to detect writes: pages are write
>> protected, and writes are logged in the appropriate signal handler
>> before unprotecting the written page. Unless your code really breaks the
>> generational assumption, that's probably not too bad, since Linux will
>> merge mappings. On top of that, SBCL treats unboxed pages (that don't
>> hold pointers) specially wrt mprotect as they don't need any write
>> barrier, and tends to allocate them between regular pages, which
>> precludes merging.
>> If your strings and arrays end up in unboxed pages, that could cause the
>> problem you're observing.
>> Ideally, SBCL would be fixed; in the meantime, I can see two avenues.
>> Unboxed objects like strings and arrays could be explicitly allocated in
>> the C heap, especially if they're long-lived; a couple people have code
>> lying around to pretend that these are regular Lisp objects. Otherwise,
>> it might be possible to slightly modify the generational GC to remove
>> the write barrier and assume everything has always been written.
>> Hopefully, someone else has better ideas.
> Some time ago I was regularly hitting this problem: a few hundred
> megabytes of memory in use and running out of mappings. My program
> used a lot of relatively small arrays of 32-bit numbers. I modified
> the program to use a smaller number of bigger arrays, and that
> eliminated most of the failures.
> Naively, I would think that pointer-free data should be the
> easiest for a garbage collector. Your message seems to indicate
> that the problem is fragmentation of mappings due to unboxed pages
> between regular ones. I wonder how hard it would be for SBCL to
> keep unboxed pages together and limit interleaving between regular and
> unboxed pages.
> OTOH I think the problem is really due to the Linux kernel. Namely,
> with modern memory sizes, a program which performs relatively
> infrequent, randomly scattered writes into the old generation can
> easily exceed the limit: the default of 2^16 mappings means 256MB
> of dirty pages, which is a tiny fraction of the whole memory.
> Waldek Hebisch
> Sbcl-help mailing list