Re: [Etherboot-discuss] Debugging something real together
From: Marty C. <md...@et...> - 2006-06-10 09:02:09
On Jun 10, 2006, at 3:31 AM, Carl K wrote:

>> So, what do you think we should do next (and why)? :)

> either convert the tcpdump text into something ethereal can parse, or
> run the 2 tests again using ethereal to do the sniffing and post those
> logs so that I don't have to figure out what is going on.

Hi Carl, thanks for writing!

I hope you understand that part of the intent of this exercise is so
that we _do_ have to figure out what is going on :) The journey truly
is part of the reward in this case :) Fixing the bug is a side-effect :)

I'm happy to run the tests that you propose, but I want folks to
understand that good debugging requires effort and care, and figuring
things out is an important part of the game. Understanding why an
undesired behavior is occurring is as important as eliminating it,
since it is possible to make things work without understanding the
cause of the symptom.

That said, Ethereal is a very cool and useful tool for this sort of
thing, but at this stage, we can do some useful analysis with what we
have.

> for instance, the first line of the 2 logs end with a different
> number:
>
> 13:30:58.858667 0:c0:49:63:45:8d Broadcast 88a2 64:
> 13:42:16.706926 0:c0:49:63:45:8d Broadcast 88a2 60:
>
> I have no idea what the last byte (or bit 3) means, so I have no
> idea if
> it is important.

You could, of course, just do

    $ man tcpdump

and see that the 64: and 60: are packet lengths.

> Also, what are the chances of getting timestamps on the serial debug
> output so that they can be paired up with the trafic logs? (which may
> also help with future debugging efforts)

I doubt this would be easy to do, since the server is adding the
timestamps. It's a good idea though. Note that there is a sequence
number in the packets that aids in identifying them as well.

> In case it isn't obvious that I really have no clue what is going
> on....
> does this represent a broadcast :
> TX id 0 at f6e2000+40
> TX id 0 complete

Yes.
Have you looked at the source code? A quick:

    $ cd gpxe-0.5/src; grep -R "TX id" .

would show the few files (src/drivers/net/rtl8139.c being the
interesting one) where this string occurs.

> and response from the server:
> RX packet at offset 0+428
> RX packet at offset 42c+6e

Well, I believe the client is receiving all broadcast packets, and
since the debugging code is in the driver's polling routine, it's
seeing that it received multiple broadcast packets. I suspect that not
all of them are for us.

> Here is my wild guess: the responce from the server is correct, but
> gPXE
> thinks it is invalid - like it thinks the packet is spoofed, malformed
> or something else it isn't interested in.

Sounds plausible. To debug further, we need to figure out what gPXE
thinks of the received packets, which means finding where in the code
the packet is received, and how it is evaluated.

Now, we can add debugging output to Etherboot to make our lives easier,
and since this is Open Source, we have the ability to have the code
help figure out what's wrong.

OK, that's a great start, Carl!

Who else has some ideas about this? This is an "open universe"
exercise. You can look at the source, suggest tools, patches,
techniques, or anything else you think is useful.

So, what shall we do next (and why)?

/ Marty /
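
For anyone following along, the two tcpdump header lines quoted earlier
in this message can be picked apart with a few lines of Python. This is
only a rough sketch: the field layout (timestamp, source, destination,
ethertype, length) is an assumption inferred from those two lines and
from the man page saying the trailing number is the packet length, and
the names LINE_RE and parse_line are made up for this illustration.
Other tcpdump versions and options print differently.

```python
import re

# Assumed layout of the quoted link-level header lines:
#   <timestamp> <source> <destination> <ethertype> <length>:
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<src>\S+)\s+(?P<dst>\S+)\s+"
    r"(?P<ethertype>[0-9a-fA-F]{4})\s+"
    r"(?P<length>\d+):"
)


def parse_line(line):
    """Return the header fields as a dict, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # The trailing number is the packet length, printed in decimal.
    fields["length"] = int(fields["length"])
    return fields


# The first line of each of the two logs, copied from the message above.
first_run = parse_line("13:30:58.858667 0:c0:49:63:45:8d Broadcast 88a2 64:")
second_run = parse_line("13:42:16.706926 0:c0:49:63:45:8d Broadcast 88a2 60:")

# Same sender, destination and ethertype; only the length differs.
print(first_run["src"], first_run["ethertype"], first_run["length"])
print(second_run["src"], second_run["ethertype"], second_run["length"])
```

Comparing the parsed fields side by side makes it easier to see that
the two runs differ only in packet length on that first line, which is
a smaller question than "what does the last number mean".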