Yes, I got that information from erlang:process_info/2. The word count I gave in my earlier email was the total_heap_size of the hung process, and its message_queue_len was zero.
After I set 'fullsweep_after' to 0 on the machine where the hung processes had the larger heap, the memory consumed dropped dramatically, to about 6M. When I applied erlang:garbage_collect/1 to the hung processes, each one consumed the same amount of memory (about 10K) on both machines. So the difference seems to be related to the garbage collection mechanism of Erlang processes.
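For anyone who wants to repeat the check, here is a minimal sketch of the before/after comparison. The module and function names (heap_check, heap_before_after/1, spawn_fullsweep/1) are made up for illustration; only erlang:process_info/2, erlang:garbage_collect/1, erlang:system_info/1 and erlang:spawn_opt/2 are real BIFs. It assumes the process is still alive; process_info/2 returns undefined for a dead pid and the match would then fail.

```erlang
-module(heap_check).
-export([heap_before_after/1, spawn_fullsweep/1]).

%% Report total_heap_size (in words) before and after a forced
%% collection of Pid. Assumes Pid is alive.
heap_before_after(Pid) ->
    {total_heap_size, Before} = erlang:process_info(Pid, total_heap_size),
    true = erlang:garbage_collect(Pid),
    {total_heap_size, After} = erlang:process_info(Pid, total_heap_size),
    WordSize = erlang:system_info(wordsize),   % bytes per word, 4 or 8
    [{before_words, Before},
     {after_words, After},
     {after_bytes, After * WordSize}].

%% Spawn Fun with fullsweep_after set to 0 for that process only,
%% so every collection of it is a full sweep.
spawn_fullsweep(Fun) ->
    erlang:spawn_opt(Fun, [{fullsweep_after, 0}]).
```

Setting the option per process with spawn_opt/2 is just one way to do it; the flag can also be changed for all newly spawned processes with erlang:system_flag(fullsweep_after, 0).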
I am still curious why GC behaves differently on these two machines, but I couldn't find any tool that shows how many times, and when, GC occurred before the process hung. So what we can do for now is try to avoid the problem.
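One possibility, which cannot look back in time but can record collections from the moment it is switched on, is the garbage_collection flag of erlang:trace/3. The sketch below is only an illustration under that assumption: the module name gc_trace and its functions are hypothetical, and the event tags in the trace messages depend on the release (gc_start and gc_end in older releases such as R14B, gc_minor_start, gc_major_start and so on in later ones), so the receive clause matches any tag.

```erlang
-module(gc_trace).
-export([start/1]).

%% Turn on GC tracing for Pid and print every GC trace event with its
%% timestamp. Returns the pid of the printing process; send it 'stop'
%% to end it. Assumes Pid is alive and not already traced by another
%% process.
start(Pid) ->
    Tracer = spawn(fun loop/0),
    erlang:trace(Pid, true,
                 [garbage_collection, timestamp, {tracer, Tracer}]),
    Tracer.

loop() ->
    receive
        %% With the timestamp flag, GC events arrive as 5-tuples.
        {trace_ts, Pid, Tag, Info, Ts} ->
            io:format("~p ~p at ~p: ~p~n", [Pid, Tag, Ts, Info]),
            loop();
        stop ->
            ok
    end.
```

Usage would be something like Tracer = gc_trace:start(Pid) from a shell on the node, started before the problematic request is sent; Info carries the heap sizes reported for each collection.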
Thanks for your reply.
On Wed, Jan 30, 2013 at 1:29 AM, cao xu <email@example.com> wrote:
We've deployed a web application with yaws 1.88 (Erlang R14B) on two machines. One of its web interfaces has a problem in the business logic that can cause the yaws process handling the HTTP request to hang.
What confuses me is that those hung processes consume quite different amounts of memory on the two machines. The heap size of one hung process is 3,029,104 words (24,233,888 bytes) on one machine, but on the other it is 139,104 words (1,113,888 bytes). The application is deployed on separate servers, but the environments look the same to me, and the process hangs at the same place when the problem occurs.
I suspect it's because of the behavior of the Erlang allocators, but I can't find anything special in the system info for erts_alloc. Could someone help me find the reason? Thanks.
I assume you can remote shell into the systems?
What does erlang:process_info/2 show for the processes, especially with second arguments of backtrace, binary, dictionary, total_heap_size, and message_queue_len?
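As a rough sketch of that query: process_info/2 accepts a list of items, so all five can be fetched in one call. The module name proc_report and the function report/1 are invented for this example; the item names are the ones listed above.

```erlang
-module(proc_report).
-export([report/1]).

%% Fetch the items of interest for Pid in a single call.
%% Returns a list of {Item, Value} tuples in the order requested,
%% or 'undefined' if the process is no longer alive.
report(Pid) ->
    Items = [total_heap_size, message_queue_len,
             binary, dictionary, backtrace],
    erlang:process_info(Pid, Items).
```

From a remote shell this could be called as proc_report:report(Pid), where Pid is the hung yaws process.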
You might want to consider updating to Yaws 1.95, since 1.88 is kind of old at this point.