Re: [Ganglia-developers] Ganglia 3.1.5 beta ready for final testing

Carlo Marcelo Arenas Belon wrote:
> On Mon, Nov 30, 2009 at 08:12:34AM +0000, Daniel Pocock wrote:
>   
>> Carlo Marcelo Arenas Belon wrote:
>>     
>>> On Sun, Nov 29, 2009 at 10:57:01AM +0000, Carlo Marcelo Arenas Belon wrote:
>>>   
>>>       
>>>> On Tue, Nov 24, 2009 at 06:03:51PM -0800, Bernard Li wrote:
>>>>     
>>>>         
>>>>> Please help us test on as many OS/archs as possible, as this would go
>>>>> GA quite immediately ;-)
>>>>>       
>>>>>           
>>>> FreeBSD is not able to return any XML data through TCP/8649 (tested with
>>>> FreeBSD 8.0 amd64).
>>>>         
>>> the problem wasn't actually the TCP/8649 service but the fact that gmond
>>> was going into an infinite loop after sending the first metric update.
>>>
>>> the issue was tracked down to r2043 and a 3.1.5 development package with
>>> that patch reverted is available for testing from :
>>>
>>>   http://sajino.sajinet.com.pe/ganglia/ganglia-3.1.5.2101.tar.gz
>>>   
>>>       
>> Did you see this issue with 3.1.3 or 3.1.4?  They both contain the same  
>> patch.
>>     
>
> Both 3.1.3 and 3.1.4 should have the same problem, but haven't been able to
> test 3.1.3 since it is no longer available.  (FreeBSD 8 was just released a
> couple of days ago anyway).  3.1.4 shows the same behavior at least there
> and the "fixed" package seems to also work fine with OpenBSD 4.4 amd64,
> NetBSD 4 i386 and DragonFlyBSD 2.4.1 i386 and amd64 (after also being
> patched with r2124 to work around BUG245).
>
>   
>>>> DragonFlyBSD fails to build, but a 3.2 version of ganglia which includes
>>>> fixes for that fails with the same TCP issue as FreeBSD, so this
>>>> issue might be affecting other BSDs as well.
>>>>         
>>> confirmed also to be affecting OpenBSD (tested with OpenBSD 4.5 amd64)
>>> but considering the nature of the "fix" wouldn't be surprised if other
>>> configurations were also affected.
>>>   
>>>       
>> Are you proposing a fix or just reverting the change?
>>     
>
> Your call, even though a fix for this feature would probably be preferred, as
> there is nothing special about the BSDs for them to be affected and it might
> be that the problem is therefore more generic.
>   
It may be that this bug is revealing a more serious issue in the way 
initialisation is done, so I would prefer to know the real cause rather 
than just revert the change that forces the problem to show itself.
> At least a revert would be needed for 3.1 as this accounts for a regression
> but haven't done so either waiting for you to first revert it on trunk and
> then decide on how to proceed from there depending on how critical this
> feature was for the release.
>
>   
I agree that it is a regression, but reverting it may cause the real 
culprit to remain hidden.  I'd rather hold the release while we look 
more closely.
>> The change has been working on Linux, Solaris and Cygwin.
>>     
>
> Other than just doing a manual bisect (using git instead of svn here would
> have been useful) to find where the problem was introduced and validate that
> reverting it corrects the problem haven't done much analysis of it, but the
> fact that it broke in such a strange way (was indeed expecting the culprit
> to be somewhere else, especially considering all recent changes in the
> networking and the fact that it seemed originally to be triggered by a TCP
> request) probably points to a bigger issue which just happens to have not
> been visible on the configurations used to test Linux, Solaris and Cygwin,
> especially considering how pervasive it was (broke all BSD I had access to
> test, at least)
>   
Can you provide output from strace/truss and also a stack trace from the 
point where it is in the infinite loop?

There is a good reason for moving the daemonize code the way I did - an 
alternative would be to daemonize, but make the original process hang 
around until the daemon process has entered the main loop.
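
To make that alternative concrete, here is a minimal sketch of the idea. It is written in Python rather than gmond's C purely for brevity, it is not the gmond code, and the names daemonize_with_handshake, init and main_loop are hypothetical. It also uses a single fork where a real daemon would normally fork twice. The original process blocks on a pipe until the child has finished initialising, so it can report a startup failure through its exit status:

```python
import os


def daemonize_with_handshake(init, main_loop):
    """Fork, then keep the original process waiting until the child is ready.

    Returns 0 in the original process if the child finished init(), 1 if the
    child died first. Never returns in the child.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid > 0:
        # Original process: close our copy of the write end so that a
        # child that dies early produces EOF instead of a hang.
        os.close(write_fd)
        status = os.read(read_fd, 1)  # blocks until a byte or EOF arrives
        os.close(read_fd)
        return 0 if status == b"\x00" else 1

    # Child (the future daemon).
    exit_code = 1
    try:
        os.close(read_fd)
        os.setsid()                   # detach from the controlling terminal
        init()                        # hypothetical: config, sockets, modules
        os.write(write_fd, b"\x00")   # tell the original process we are ready
        os.close(write_fd)
        main_loop()                   # hypothetical: the daemon's event loop
        exit_code = 0
    finally:
        # Leave without running the parent's cleanup handlers or returning
        # into the caller's code.
        os._exit(exit_code)
```

The original process would typically pass the returned value straight to sys.exit(), so an init script sees a non-zero status when initialisation fails instead of a daemon that silently disappears after the fork.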
