I have been working with Chromium for a couple of weeks now. My current task is to get the SPEC Viewperf benchmarks running in a multi-server Chromium environment on Windows 2000. I have had a fair bit of success, but I have run into some problems, and I would appreciate any help you can give in the following areas:
1. Mothership issues
In my setup the mothership crashes whenever the crappfaker has completed. I think this is because crappfaker never sends a "quit" request to the mothership.
Question: Should crappfaker send a "quit" request to the mothership?
I have tried adding code to do this, and it prevents the mothership from crashing, which seems good. That raises the next question, though: what is the correct way to terminate a mothership? I want to tell a mothership to exit programmatically (from a shell script or Python), but I don't see any mothership method that does this. Any suggestions?
2. CRSERVER and GMlib
I notice that when I run Chromium over TCP/IP, the CRSERVER process exits once crappfaker and the application have completed. However, when using GM/Myrinet, the CRSERVER process hangs around "forever", displaying the last frame. Has anyone else seen this? Any suggestions?
3. General Instability
I've noticed marked instability in the system: various components (crserver, mothership, GM communications) crash at random, with no rhyme or reason that I can see. This might be due to the way I'm using Chromium, which I suspect is unusual. Do you have any comments on the general stability of the system?
Here's what I'm doing: I've created a configuration file that takes various parameters, including the startdir, the application and its arguments, the tiling geometry, and the window geometry, and configures the mothership with that information. Around that I've wrapped scripts that emulate the Viewperf scripts, but in a Chromium way (i.e., with tiling, etc.).
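To illustrate the tiling-geometry part of such a configuration, here is a minimal Python sketch that splits a window into a grid of per-server tiles. The function name `tile_layout` and the (x, y, width, height) convention are assumptions for illustration; whether y counts from the top or the bottom, and how the tiles are then handed to the mothership, depend on your configuration file and are not shown.

```python
def tile_layout(win_width, win_height, columns, rows):
    """Hypothetical helper: split a win_width x win_height window into
    columns x rows tiles, one per server.

    Returns (x, y, width, height) tuples in row-major order. When the size
    does not divide evenly, the leftover pixels go to the last column/row so
    the tiles cover the whole window with no gaps or overlaps.
    """
    if columns < 1 or rows < 1:
        raise ValueError("need at least one column and one row")
    base_w, base_h = win_width // columns, win_height // rows
    tiles = []
    for r in range(rows):
        for c in range(columns):
            x, y = c * base_w, r * base_h
            w = win_width - x if c == columns - 1 else base_w
            h = win_height - y if r == rows - 1 else base_h
            tiles.append((x, y, w, h))
    return tiles


if __name__ == "__main__":
    # Two servers side by side behind a 1024x768 window.
    print(tile_layout(1024, 768, 2, 1))  # [(0, 0, 512, 768), (512, 0, 512, 768)]
```

Giving the remainder to the last column and row keeps every tile but the edge ones the same size, which is usually what you want when the servers drive identical displays.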
Each of these scripts runs a series of benchmark programs, one after the other. The script makes sure the mothership is configured with the program name, etc., starts the CRSERVERs on the various remote nodes, and then starts crappfaker. What might be different is that I am running many different configurations/programs, and doing it quickly (i.e., the end of each Chromium session is separated from the beginning of the next by only a few seconds). Is anybody else doing anything like this?
I've never seen the mothership "crash". It's written in Python, so if it's crashing, there must be a problem with your Python setup. I don't think it's right to send a quit message, since we want the mothership to keep running between jobs. You can just run "resetms" between jobs to bring it back to a virgin state.
The non-quitting behavior of GMlib is a known bug. In WireGL, we initiated a TCP/IP connection *first* to do some handshaking, and then set up a GM connection; we left the TCP/IP connection open so we could tell when the client had died. That isn't possible in Chromium, because GM is connectionless and all connections are brokered through the mothership. I think the right thing to do eventually will be to use the mothership as a "death monitor" as well, although this is pretty low priority.
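The "death monitor" idea could look roughly like the following Python sketch: keep the application's TCP connection to the mothership open for the life of the application, and treat end-of-file or a reset on that connection as the client having died. This is a hypothetical design, not Chromium code; the `DeathMonitor` class and its callback are invented names, and what to do on a death (for example, telling the crservers to exit) is left to the callback.

```python
import select
import socket


class DeathMonitor:
    """Hypothetical sketch: watch client TCP connections and report any
    whose peer has gone away (clean close or connection reset)."""

    def __init__(self, on_death):
        self.on_death = on_death  # callback(name), called once per dead client
        self.conns = {}           # socket -> client name

    def watch(self, sock, name):
        self.conns[sock] = name

    def poll(self, timeout=0.0):
        """Check the watched sockets once and return the names that died."""
        if not self.conns:
            return []
        readable, _, _ = select.select(list(self.conns), [], [], timeout)
        dead = []
        for sock in readable:
            try:
                data = sock.recv(4096)
            except OSError:       # e.g. 'Connection reset by peer'
                data = b""
            if not data:          # EOF or reset: the client is gone
                name = self.conns.pop(sock)
                sock.close()
                dead.append(name)
                self.on_death(name)
            # A real mothership would pass non-empty data to its request
            # handler; this sketch only cares about disconnects.
        return dead


if __name__ == "__main__":
    app_end, ms_end = socket.socketpair()  # stand-in for an app connection
    monitor = DeathMonitor(lambda name: print(name, "died"))
    monitor.watch(ms_end, "crappfaker")
    app_end.close()                        # the "application" exits
    monitor.poll(timeout=1.0)              # prints: crappfaker died
```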
As for instability, many people work with Chromium day to day, and I haven't heard complaints about anything other than a memory leak, which is being plugged as I write this.
MOTHERSHIP EXCEPTION! TERRIBLE! ...and bails
isn't that a mothership crash? i don't see it on linux ever, but it's very common for me on win2k. i never bothered to investigate since i almost never use windows.
-d
Well, OK, it's a caught exception, not a crash, but still annoying. I *have* seen this happen to some users on certain Windows installations, although it has never happened to me.
Since I can't reproduce it, I can't speak to the details of the problem, although it may have something to do with the way Python networking handles dropped connections.
I see this problem 100% of the time on my Windows 2000 cluster, using either TCP/IP or Myrinet GM as a network transport. When the crappfaker exits (successfully) I get the following mothership exception traceback, after which the mothership is dead. (It's this behavior that I termed a crash.)
Replying (200): "Bye"
Processing mothership request: ""
MOTHERSHIP EXCEPTION! TERRIBLE!
Traceback (most recent call last):
  File "U:\cr/mothership/server/mothership.py", line 353, in Go
    self.ProcessRequest( self.wrappers[sock] )
  File "U:\cr/mothership/server/mothership.py", line 758, in ProcessRequest
    self.ClientError( sock_wrapper, SockWrapper.NOTHINGTOSAY, "Request was empty?" )
  File "U:\cr/mothership/server/mothership.py", line 375, in ClientError
    sock_wrapper.Reply( code, msg )
  File "U:\cr/mothership/server/mothership.py", line 245, in Reply
    self.Send( tosend )
  File "U:\cr/mothership/server/mothership.py", line 239, in Send
    self.sock.send( str + "\n" )
error: (104, 'Connection reset by peer')
(Incidentally, this is with the Python 2.1 supplied with the Cygwin tools.)
It appears the mothership is having a problem when the appfaker connection is dropped, and it might have something to do with that last "empty" message.
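The traceback suggests that the mothership read an empty string from a socket whose peer had already gone away, treated it as an empty request, and then died while trying to send an error reply to that dead socket. A server-side way to tolerate this is to treat an empty read as a disconnect and to catch errors from the reply itself. The Python sketch below assumes one complete request per `recv()` and a "code message" reply line; the names `reply` and `service` are hypothetical and are not the actual mothership.py functions.

```python
import socket


def reply(sock, code, msg):
    """Send one reply line; return False instead of raising if the peer is gone."""
    try:
        sock.sendall(("%d %s\n" % (code, msg)).encode("ascii"))
        return True
    except OSError:  # ConnectionResetError, BrokenPipeError, ...
        return False


def service(sock, clients, handle_request):
    """Handle one readable client socket.

    An empty read means the client closed its end, so the client is dropped
    quietly instead of being sent an error it can no longer receive.
    Simplification: assumes each recv() returns one whole request line.
    """
    try:
        data = sock.recv(4096)
    except OSError:
        data = b""
    if not data:
        clients.discard(sock)
        sock.close()
        return
    code, msg = handle_request(data.decode("ascii", "replace").strip())
    if not reply(sock, code, msg):
        clients.discard(sock)
        sock.close()


if __name__ == "__main__":
    ms_end, app_end = socket.socketpair()
    live = {ms_end}
    app_end.sendall(b"ping\n")
    service(ms_end, live, lambda req: (200, "pong"))
    print(app_end.recv(64))   # b'200 pong\n'
    app_end.close()           # client exits without saying goodbye
    service(ms_end, live, lambda req: (200, "pong"))
    print(len(live))          # 0: dropped quietly, no exception
```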
What I did to correct it was to add a call to crMothershipDisconnect() as the last thing crappfaker.exe does before it exits.
This leaves the mothership "running", of course. That leaves one of my other original questions -- how do you tell the mothership to go away?
As to the stability question...
We're running Chromium in what might be an atypical way. Here's what we do:
for( i = 1 to 30 )
{
    Select configuration(i)
    Start mothership with configuration(i)
    Start crservers(1...number-of-tiles)
    Start appfaker
    Cleanup (kill leftover crservers)
}
What we have found is that if you do this quickly (as a script does), you're guaranteed to run into problems: random crashes of the mothership (OK, exceptions), crserver, or appfaker. What we've done to improve stability is to wait liberally between the various steps to give the system a chance to settle down. That seems to have helped enormously.
I've been following the discussion about the memory leak, but that doesn't seem to have anything to do with what we're seeing.
Finally, the GM issue...
OK, I can live with things the way they are indefinitely. I'm just glad somebody else has seen the same behavior. GM is new to us, so I was afraid that we were doing something terribly wrong.
Thanks for your help.