Thread: [Etherboot-developers] [RFC] Multicast reception....
|
From: <ebi...@ln...> - 2002-06-01 05:55:40
|
When booting large numbers of clients, the reception of boot images (if you only have one tftp server) has been shown to be a bottleneck. To address that I am implementing a reliable multicast download protocol in etherboot. Transmitting multicast packets is trivial and I have completed the implementation in 10 lines of code. Receiving multicast packets is more interesting.

Implementing IGMP and filtering for the multicast addresses we are waiting for looks like about the same amount of work as ARP. My understanding of the guts of NICs is limited, so please check me. I believe NICs have a hardware filter that allows them to receive just broadcast packets and packets to their own MAC address. So to receive multicast packets I need to open up or disable the filter altogether. The simplest solution appears to be disabling the hardware filter, going into promiscuous mode, and then replacing the hardware filter with a software filter in await_reply.

Does anyone know if there is any communication between NICs and switches about what a NIC is listening for that would make promiscuous mode a bad state to put NICs into?

Eric
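The software filter Eric proposes for await_reply could look roughly like this in C. This is a minimal sketch under stated assumptions: accept_frame and its parameters are hypothetical names, not Etherboot's actual identifiers, and the multicast MAC to match is assumed to be known in advance.

```c
#include <stdint.h>
#include <string.h>

/* Rough sketch of the software filter idea: with the NIC opened up
 * (promiscuous or all-multicast mode), keep only frames we care
 * about.  accept_frame() and its parameters are hypothetical names,
 * not Etherboot's real identifiers. */
static int accept_frame(const uint8_t dest[6],
                        const uint8_t our_mac[6],
                        const uint8_t mcast_mac[6])
{
    static const uint8_t broadcast[6] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (memcmp(dest, our_mac, 6) == 0)   return 1; /* unicast to us     */
    if (memcmp(dest, broadcast, 6) == 0) return 1; /* broadcast         */
    if (memcmp(dest, mcast_mac, 6) == 0) return 1; /* our mcast address */
    return 0;                           /* drop everything else we see */
}
```

await_reply() would call something like this on every received frame before handing it to the protocol layers, restoring in software the selectivity the disabled hardware filter used to provide.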
|
From: Donald J C. <dj...@ci...> - 2002-06-01 20:35:13
|
"Eric W. Biederman" wrote:
...
> My understanding of the guts of NICs is limited, so please check me. I
> believe NICs have a hardware filter that allows them to receive just
> broadcast packets and packets to their own MAC address. So to
> receive multicast packets I need to open up or disable the filter
> altogether. The simplest solution appears to be disabling the hardware
> filter, going into promiscuous mode, and then replacing the hardware
> filter with a software filter in await_reply.

I thought at least some NICs have support for monitoring some number of multicast addresses, in addition to broadcast and the local MAC. I'm not an expert on NICs, but I think it might be worthwhile to check the Linux drivers for some of the newer NICs and see if there is anything obvious there as far as multicast support.

> Does anyone know if there is any communication between NICs and
> switches about what a NIC is listening for that would make promiscuous
> mode a bad state to put NICs into?

Other than IGMP, I am not aware of any communication along these lines. I think the main problem with promiscuous mode is the additional CPU overhead if there is a lot of unwanted traffic. Since you are doing this at boot time, it probably is not a concern, especially if all of the clients are booting at the same time, in which case there will be very little "unwanted" traffic.

-Don
--
Don Christensen       Senior Software Development Engineer
dj...@ci...           Cisco Systems, Santa Cruz, CA
"It was a new day yesterday, but it's an old day now."
|
From: <ebi...@ln...> - 2002-06-01 21:14:48
|
Donald J Christensen <dj...@ci...> writes:
> "Eric W. Biederman" wrote:
> ...
> > My understanding of the guts of NICs is limited, so please check me. I
> > believe NICs have a hardware filter that allows them to receive just
> > broadcast packets and packets to their own MAC address. So to
> > receive multicast packets I need to open up or disable the filter
> > altogether. The simplest solution appears to be disabling the hardware
> > filter, going into promiscuous mode, and then replacing the hardware
> > filter with a software filter in await_reply.
>
> I thought at least some NICs have support for monitoring some number
> of multicast addresses, in addition to broadcast and the local MAC.
> I'm not an expert on NICs, but I think it might be worthwhile to
> check the Linux drivers for some of the newer NICs and see if there
> is anything obvious there as far as multicast support.

I have seen some support, but unless there is a compelling reason to enable it, the complexity and the few percentage points of performance gained are not worth it.

> > Does anyone know if there is any communication between NICs and
> > switches about what a NIC is listening for that would make promiscuous
> > mode a bad state to put NICs into?
>
> Other than IGMP, I am not aware of any communication along these lines.
> I think the main problem with promiscuous mode is the additional CPU
> overhead if there is a lot of unwanted traffic. Since you are doing
> this at boot time, it probably is not a concern, especially if all of
> the clients are booting at the same time, in which case there will be
> very little "unwanted" traffic.

Right. My research seems to back this up. And this probably explains why layer 2 switches have such a hard time with multicast traffic. All they have enough information to do is broadcast it. So a switch has to climb up to layer 3 and start interacting with IGMP to even see where the multicast traffic should go. Which makes switches a very interesting tradeoff.

When you are running through a switch you generally don't have much extraneous network traffic except broadcast traffic, so I don't see a lot of problems. The real question is for people with ISA NICs, where the bandwidth to memory is less than the network bandwidth, whether losing a little extra performance is a problem. Because I'd like to enable promiscuous mode unconditionally.

Working at Cisco, do you know why some switches lose 90% of their multicast traffic? I haven't had a chance to investigate this one personally, but I have heard it reported of the Cisco gige switches among others. With enough clients (11+) multicast is still a win even at 90% packet loss, but it would be nice to not need so many retransmissions.

Eric
|
From: Marty C. <ma...@et...> - 2002-06-01 21:45:13
|
On Fri, 31 May 2002 23:55:36 -0600 Eric W. Biederman
<ebi...@ln...> wrote:
>Transmitting multicast packets is trivial and I have completed the
>implementation in 10 lines of code. Receiving multicast packets is
>more interesting.
This might be somewhat more complicated than it first appears. Although one
can make a trivial change at the "kernel" level, every driver may have to
be changed in order to accommodate this mode of operation. I seem to
recall specifically turning off multicast reception in a number of drivers.
Some of the older cards we now support may not even reliably be able to
function in this mode. Then there's the testing of cards in this new
mode. Retransmits and -DCONGESTED may need to be revisited as well.
Although it sounds like a really cool thing, I advise caution, lots of
testing, maybe even a revival of the odd-numbered-release branch (maybe
re-initialized to current production state first or something) to find
out how well multicast will work. We also need people who are willing to
test it in busy networks.
So, my first reaction is "go for it", but carefully ;)
Marty
--
Try: http://rom-o-matic.net/ to make Etherboot images instantly.
Name: Marty Connor
US Mail: Entity Cyber, Inc.; P.O. Box 391827;
Cambridge, MA 02139; USA
Voice: (617) 491-6935; Fax: (617) 491-7046
Email: ma...@et...
Web: http://www.etherboot.org/
|
|
From: <ebi...@ln...> - 2002-06-02 14:20:46
|
Marty Connor <ma...@et...> writes:
> On Fri, 31 May 2002 23:55:36 -0600 Eric W. Biederman
> <ebi...@ln...> wrote:
> >Transmitting multicast packets is trivial and I have completed the
> >implementation in 10 lines of code. Receiving multicast packets is
> >more interesting.
>
> This might be somewhat more complicated than it first appears. Although one
> can make a trivial change at the "kernel" level, every driver may have to
> be changed in order to accommodate this mode of operation. I seem to
> recall specifically turning off multicast reception in a number of drivers.

Do you recall any specific problems, or performance issues, for why you did that?

> Some of the older cards we now support may not even reliably be able to
> function in this mode. Then there's the testing of cards in this new
> mode. Retransmits and -DCONGESTED may need to be revisited as well.

Cards won't operate reliably if you ask them to receive all packets? Hmm. Skimming the kernel, it has both the concept of interfaces in promiscuous mode, as well as the concept of interfaces in just multicast reception mode. And it seems receiving all multicast packets is widely implemented, and generally easy to set up.

> Although it sounds like a really cool thing, I advise caution, lots of
> testing, maybe even a revival of the odd-numbered-release branch (maybe
> re-initialized to current production state first or something) to find
> out how well multicast will work. We also need people who are willing to
> test it in busy networks.

I think I can find a busy network or two.

> So, my first reaction is "go for it", but carefully ;)

Given the warning I will start with -DALLMULTI so the driver changes can be enabled/disabled. For some cards doubtless this will be promiscuous mode. But there is no point in asking for more than I need.

Eric
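A compile-time -DALLMULTI switch like the one Eric describes might gate the driver change roughly as follows. This is only a sketch: the flag names are made up for illustration and are not Etherboot's actual interface.

```c
/* Hypothetical receive-mode flags, loosely modelled on what NIC
 * drivers expose; these names are illustrative, not Etherboot's. */
#define RX_UNICAST   0x1   /* frames to our own MAC  */
#define RX_BROADCAST 0x2   /* broadcast frames       */
#define RX_ALLMULTI  0x4   /* all multicast frames   */
#define RX_PROMISC   0x8   /* everything on the wire */

static unsigned rx_mode(void)
{
    unsigned mode = RX_UNICAST | RX_BROADCAST; /* today's behaviour */
#ifdef ALLMULTI
    mode |= RX_ALLMULTI; /* compile-time opt-in during the transition */
#endif
    return mode;
}
```

A driver whose hardware lacks an all-multicast filter would presumably fall back to RX_PROMISC instead, which matches Eric's "for some cards doubtless this will be promiscuous mode".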
|
From: Anselm M. H. <an...@ho...> - 2002-06-02 17:24:45
|
> multicast reception mode. And it seems receiving all multicast
> packets is widely implemented, and generally easy to set up.

Another question is how to handle that multicast data.

> I think I can find a busy network or two.

Me, too. A good dozen clients on a 10MB BNC can be.....making you brew lots of coffee. Especially when users start flood pinging (there have to be winblows-machines on the cable) or transferring gigabytes to the old novell server.

> Given the warning I will start with -DALLMULTI so the driver changes
> can be enabled/disabled. For some cards doubtless this will be
> promiscuous mode.

You surely are able to enable/disable promiscuous mode on the fly, aren't you? So let's just assume - if it is needed at all - that it is only activated while retrieving a file (so while one process is nearly continuously listening on the wire).

> But there is no point in asking for more than I need.

Please see RFC2090 (TFTP multicast). It offers you quite a lot of what you need. It can be downloaded from http://www.faqs.org/rfcs/rfc2090.html I'm afraid googling didn't turn up any server for that. Perhaps someone needs to write one.

Just to describe the additional needs for clients:

- They have to implement listening on a multicast address arbitrarily given by the tftp server

- They have to keep track of which packets were ok and which were not... 1 bit to be stored for each 512 bytes of image data to be downloaded. As the block# is a 16bit word, we would need 8k of RAM for just the case of a 32MB file to be transferred.

- They have to be able to differentiate between TFTP and TFTP-MCAST files to be downloaded... Imagine a 1.2kB motd file to be retrieved versus a 1934kB kernel image. In the first case, initiating an mcast request makes not so much sense, does it?

So let's assume the multicast ability should be transparent to the rest of the prom and should only be programmed inside tftp_getfile (sorry, no idea right now what that function is called. Must be something like that) so that e.g. the os loader only asks for "/vmlinuz" and does not have any interest in the mode by which the file comes to the local memory. Do you think it's a good idea to have the client always try multicast if that's enabled (remember: The server could differentiate by the criterion of [files smaller than 10kB -> no multicast], but per specification, multicast servers listen on port 1758 instead of 69)? We could of course replace the old tftp daemon by one that supports multicasting and on the fly decides for us. Or we could say all filenames beginning with e.g. the "{" character will be stripped and requested by multicast... or both...

In any case, it's a lot of work. And it could improve a lot the look&feel, which is what I'm always interested in.

Anselm
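The bookkeeping Anselm describes, one "received" bit per 512-byte block and 8k of RAM for the 32MB worst case, can be sketched like this (the names are illustrative, not from any existing client):

```c
#include <stdint.h>

/* One "received" bit per 512-byte block.  With 16-bit block numbers
 * there are at most 65536 blocks, so the whole map is 8 KB, which is
 * the 32 MB worst case described above.  Names are illustrative. */
#define MCAST_BLOCK_SIZE 512
#define MCAST_MAX_BLOCKS 65536          /* 16-bit block numbers */

static uint8_t block_map[MCAST_MAX_BLOCKS / 8]; /* 8192 bytes */

static void mark_received(uint16_t block)
{
    block_map[block >> 3] |= (uint8_t)(1 << (block & 7));
}

static int is_received(uint16_t block)
{
    return (block_map[block >> 3] >> (block & 7)) & 1;
}
```

The download loop would call mark_received() for each data packet that arrives and, once the sender finishes a pass, scan the map for zero bits to decide which blocks still need to be requested.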
|
From: <ebi...@ln...> - 2002-06-02 18:55:38
|
"Anselm Martin Hoffmeister" <an...@ho...> writes:
> > multicast reception mode. And it seems receiving all multicast
> > packets is widely implemented, and generally easy to set up.
>
> Another question is how to handle that multicast data.
>
> > I think I can find a busy network or two.
>
> Me, too. A good dozen clients on a 10MB BNC can be.....making you brew lots
> of coffee. Especially when users start flood pinging (there have to be
> winblows-machines on the cable) or transferring gigabytes to the old novell
> server.
>
> > Given the warning I will start with -DALLMULTI so the driver changes
> > can be enabled/disabled. For some cards doubtless this will be
> > promiscuous mode.
>
> You surely are able to enable/disable promiscuous mode on the fly, aren't you?
> So let's just assume - if it is needed at all - that it is only activated
> while retrieving a file (so while one process is nearly continuously
> listening on the wire).

An important maintenance feature is to keep the drivers as simple as possible. So unless I can confirm there are real world problems with enabling multicast reception all of the time, -DALLMULTI will be just an option during development.

> > But there is no point in asking for more than I need.
>
> Please see RFC2090 (TFTP multicast). It offers you quite a lot of what you
> need. It can be downloaded from http://www.faqs.org/rfcs/rfc2090.html

It's a good RFC, but it has several problems.
- It doesn't fix TFTP's streaming problem. That is, the server has challenges keeping the network busy.
- It requires all clients to be registered. My design goal is 10,000 clients. If something goes wrong, 10,000 timeouts can be a real problem.
- It doesn't handle large files. (A requirement for using multicast transfers for other purposes).

I already have a tested protocol that is a little simpler on the server end but is about equivalent on the client end.

> I'm afraid googling didn't turn up any server for that. Perhaps someone
> needs to write one.

atftp almost implements it, but I really don't like its use of threads.

> Just to describe the additional needs for clients:
>
> - They have to implement listening on a multicast address arbitrarily given
> by the tftp server
>
> - They have to keep track of which packets were ok and which were not... 1
> bit to be stored for each 512 bytes of image data to be downloaded. As the
> block# is a 16bit word, we would need 8k of RAM for just the case of a 32MB
> file to be transferred.
>
> - They have to be able to differentiate between TFTP and TFTP-MCAST files to
> be downloaded... Imagine a 1.2kB motd file to be retrieved versus a 1934kB
> kernel image. In the first case, initiating an mcast request makes not so
> much sense, does it?

- They have to handle receiving packets in random order.

> So let's assume the multicast ability should be transparent to the rest of
> the prom and should only be programmed inside tftp_getfile (sorry, no idea
> right now what that function is called. Must be something like that) so that
> e.g. the os loader only asks for "/vmlinuz" and does not have any interest
> in the mode by which the file comes to the local memory. Do you think it's a
> good idea to have the client always try multicast if that's enabled
> (remember: The server could differentiate by the criterion of [files smaller
> than 10kB -> no multicast], but per specification, multicast servers listen
> on port 1758 instead of 69)?

There are only 2 cases where multicast is not a win.
1) Very few clients want the file, and lots of people are listening on the multicast ip (it is being broadcast across a switch).
2) You are in an environment where multicast traffic is much worse than unicast traffic.

> We could of course replace the old tftp daemon by
> one that supports multicasting and on the fly decides for us. Or we could
> say all filenames beginning with e.g. the "{" character will be stripped and
> requested by multicast... or both...

Or we can put a URL in the file name field, my preference. And then we can compile in the multicast client, the nfs client, and the classic tftp client, and let the dhcp server decide.

> In any case, it's a lot of work. And it could improve a lot the look&feel,
> which is what I'm always interested in.

It is some work, but except for research and protocol scrutiny, the coding isn't that hard. The only really hard bit is the work of stabilization.

Eric
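Eric's "URL in the file name field" idea could dispatch on a scheme prefix roughly like this. The scheme strings, including one for the multicast protocol, are assumptions for illustration, as is the function name.

```c
#include <string.h>

typedef enum { PROTO_TFTP, PROTO_MCAST, PROTO_NFS } proto_t;

/* Pick a download client from a scheme prefix in the filename field,
 * falling back to classic tftp for a bare path.  The scheme names,
 * including "mcast://", are illustrative assumptions. */
static proto_t proto_for(const char *name, const char **path)
{
    if (strncmp(name, "mcast://", 8) == 0) { *path = name + 8; return PROTO_MCAST; }
    if (strncmp(name, "nfs://", 6) == 0)   { *path = name + 6; return PROTO_NFS; }
    if (strncmp(name, "tftp://", 7) == 0)  { *path = name + 7; return PROTO_TFTP; }
    *path = name;                          /* no scheme: classic tftp */
    return PROTO_TFTP;
}
```

The point of the design is that the dhcp server hands out the filename, so the choice of transfer protocol stays in the server's configuration rather than in the client's build options.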
|
From: Anselm M. H. <an...@ho...> - 2002-06-02 19:18:51
|
[...] URL support

Nice idea. Could be implemented right away, without longer testing, I think.

[...] another protocol than tftp
[...] 10000 clients

So you are thinking on some other scale. Is your protocol RFC'ed? Tested? I'm interested in that protocol, as well as in the implementation. Tell me more....

Anselm
|
From: <ebi...@ln...> - 2002-06-02 19:27:43
|
"Anselm Martin Hoffmeister" <an...@ho...> writes:
> [...] URL support
>
> Nice idea. Could be implemented right away, without longer testing, I think.
>
> [...] another protocol than tftp
> [...] 10000 clients
>
> So you are thinking on some other scale. Is your protocol RFC'ed? Tested?

It has undergone some testing, and it handled 240 clients just fine. There was some usage of TCP that is being converted to UDP, and in doing so it is being made more robust. The testing for the network booting side will happen as soon as I have a functioning implementation in etherboot. So the protocol is still undergoing refinement.

> I'm interested in that protocol, as well as in the implementation. Tell me
> more....

As I get the code completed/committed. Barring problems, that should be later this week.

Eric
|
From: <ebi...@ln...> - 2002-06-09 19:11:08
|
"Anselm Martin Hoffmeister" <an...@ho...> writes:
> > To address that I am implementing a reliable multicast
> > download protocol in etherboot.
>
> How far have you got? I'm just working on something similar (not to grab the
> prize from you, it was your idea) and would like to save work.
I have a functioning client. I still need to test multiple clients at
once and implement support in more NICs but the core work is done.
> > Transmitting multicast packets is trivial and I have completed the
> > implementation in 10 lines of code. Receiving multicast packets is
> > more interesting.
>
> On hardware level, especially. This would have to be implemented for each
> card type on its own, if possible using the hardware filter.
Right. Since most cards implement a filter for multicast packets, opening
up that far, instead of to everything that promiscuous mode allows, looks fine.
> > Implementing IGMP and filtering for multicast addresses we are waiting
> > for looks like about the same amount of work as ARP.
>
> I was not concerned with arp. Let's assume - just for now - that we need no
> IGMP, as any client hangs on the same subnet as the multicasting server,
> which anyway sends its packets out to the first hardware network (doesn't
> it?)
Nope, switches need IGMP. Besides, I already have it implemented. :)
> > My understanding of the guts of NICs is limited, so please check me. I
> > believe NICs have a hardware filter that allows them to receive just
> > broadcast packets and packets to their own MAC address. So to
> > receive multicast packets I need to open up or disable the filter
> > altogether. The simplest solution appears to be disabling the hardware
> > filter, going into promiscuous mode, and then replacing the hardware
> > filter with a software filter in await_reply.
>
> That's quite right. But afaik it's not that much more work to implement the
> hardware filter with some cards (e.g. via-rhine seems to have one that is
> programmable easily), so one {you, me, we} should intend any code changes to
> implement the hardware filter as well - if the NIC supports it.
>
> As I learned from the packet driver specifications (somewhere at
> crynwr.com), there are different states a card can be in, listed (missing
> one? not sure)
> 1/ accept no input packets (not interesting for us)
> 2/ accept packets for our MAC-address
> 3/ like 2/ plus accept broadcast packets (standard etherboot mode?)
> 4/ like 3/ plus accept multicast packets to a list of mc. addresses
> 5/ like 3/ plus accept all multicast packets
> 6/ accept any packets travelling by
>
> As you might have read already, multicast packets are addressed to special
> ethernet addresses (00:50:5f:xx:yy:zz or so, no docs at hand), leaving away
> the top 8 bit of the Multicast IP address. So in any case we need a filter
> to differentiate between multicast packets to 235.12.45.67 and 228.12.45.67.
> BTW, the developers seem not too sure whether the topmost bit of [xx]
> always has to be zero (so that 228.12.45.67 maps to the same MAC as
> 228.140.45.67).
>
> Most (all?) cards seem to have support for mode 1, 2, 3, 6 so (in case)
> having no hardware filter is the general solution. Some cards (the more
> expensive ones like via-rhine :-) have at least support for mode 4 too,
> which is the most desirable.
And 5/ is even more common than 4/ from skimming the kernel. Only very
old cards don't seem to have it implemented. That is what I think I
would like to make the new default etherboot mode. For most
practical purposes it is what we have today, but it allows us to
receive multicast packets as well.
> > Does anyone know if there is any communication between NICs and
> > switches about what a NIC is listening for that would make promiscuous
> > mode a bad state to put NICs into?
>
> Switches are dumb, aren't they? They forward (afaik) multicast and broadcast
> packets to any device connected. If they don't work with multicast, you're
> lost of course. My switch works fine (test yours with issuing a "route
> add -net 224.0.0.0 netmask 240.0.0.0 dev eth0" on two PCs on the switch and
> pinging to 224.0.0.1 - every packet should be dup'ed, no matter which host
> you ping from). As I see it, no packet may have a multicast address as
> originator - that would break the concepts of retransmission on that OSI
> level etc pp. So the switch will treat packets to the multicast MAC
> addresses like these where it doesn't know the port of the destination -
> just dump it to all ports.
> That is not always what you want, as too many multicast packets will fill
> the bandwidth even of parts of your structured network where they are not
> needed. That's what routers are for!
But there are lots of varying levels of intelligence in switches. So a
smarter switch will sneak up to layer 3 to do IGMP and see where
multicast packets should go.
> What I wanted to find out:
> Did you in the meantime implement the card driver multicast code
> portion?
For the eepro100 yes. More to come.
> Let's make a standard for it, e.g. a function in the driver code for
> "listen_for_multicast (1=on, 0=off)" or even better the chance to listen for
> some addresses only... what the specific card driver makes out of it, no
> matter, a software filter is neccessary anyway and should be implemented
> *outside* the drivers (as it would be the same code in all of that
> drivers).
Unless someone can show me that listening for all multicast packets as
they come along is decidedly bad, I'm just going to turn it on by
default, and have a compile time option during the transition period.
Etherboot doesn't need super high performance drivers, just drivers
that are good enough. And if a card really cares I guess it can spy
on the IGMP table. But I would be really surprised if that mattered.
> Let me know if you have - specifically - drivers for ne2k-isa (ns8390.[ch])
> and via-rhine (via-rhine.c) as these are the cards I'm working with.
Grep through the Linux drivers for ALLMULTI; it looks like a single
additional outb in most cases.
> AMD-home-pna stands on this list as the next, as that is the card vmware
> simulates and can be tested more easily - rebooting is more convenient, and you
> don't need the second screen and keyboard. If you have any of those, less
> work for me, more honor for you - else I will take off my gloves and grab
> right into the dustiest code.
I will see if I can get my code checked in the next couple of hours so
you can see where I have gone. No promises until tomorrow though.
Eric
|
|
From: Donald J C. <dj...@ci...> - 2002-06-10 16:46:31
|
"Eric W. Biederman" wrote:
>
> "Anselm Martin Hoffmeister" <an...@ho...> writes:
...
> > As you might have read already, multicast packets are addressed to special
> > ethernet addresses (00:50:5f:xx:yy:zz or so, no docs at hand), leaving away
> > the top 8 bit of the Multicast IP address.
...

Just a slight correction here. The least significant bit of the most significant byte of the MAC address needs to be a one for multicast. I.e., 01.00.5e.xx.yy.zz is typically used for addresses managed by IGMP.

-Don
--
Don Christensen       Senior Software Development Engineer
dj...@ci...           Cisco Systems, Santa Cruz, CA
"It was a new day yesterday, but it's an old day now."
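The mapping Don corrects can be written out concretely: the fixed 01:00:5e prefix followed by the low 23 bits of the IPv4 group address. The function name below is made up for illustration; the arithmetic follows the mapping as described, and shows why two groups can collide on one MAC.

```c
#include <stdint.h>

/* IPv4 multicast group -> Ethernet multicast MAC, per the mapping Don
 * describes: the fixed prefix 01:00:5e followed by the low 23 bits of
 * the group address.  Because only 23 of the 28 group bits survive,
 * e.g. 228.12.45.67 and 228.140.45.67 land on the same MAC, which is
 * why a software filter on the full IP address is still needed. */
static void mcast_mac(uint32_t group /* host byte order */, uint8_t mac[6])
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f; /* top bit of the 23 forced to 0 */
    mac[4] = (group >> 8) & 0xff;
    mac[5] = group & 0xff;
}
```

The low bit of the first MAC byte (the 1 in 0x01) is the Ethernet group/multicast bit Don points out.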
|
From: <ebi...@ln...> - 2002-06-10 17:24:42
|
Donald J Christensen <dj...@ci...> writes:
> "Eric W. Biederman" wrote:
> >
> > "Anselm Martin Hoffmeister" <an...@ho...> writes:
> ...
> > > As you might have read already, multicast packets are addressed to special
> > > ethernet addresses (00:50:5f:xx:yy:zz or so, no docs at hand), leaving away
> > > the top 8 bit of the Multicast IP address.
> ...
>
> Just a slight correction here. The least significant bit of the most
> significant byte of the MAC address needs to be a one for multicast.
> I.e., 01.00.5e.xx.yy.zz is typically used for addresses managed by IGMP.

Right, the code is already doing this correctly. But thanks for the catch.

Eric
|
From: Anselm M. H. <an...@ho...> - 2002-06-11 14:26:07
|
> Right, the code is already doing this correctly. But thanks for the catch.

You wrote about making your code public. Did you? Where can I get it; is the CVS always the up-to-date source (I never worked with CVS before)?

As minor changes to some (sooner or later most/all) hardware drivers now arise (though only triggered when MULTICAST is enabled), perhaps 5.0.7-rc should stay rc for some days, so that these changes can make their way into the next release. The opposite argument would be to release stable and old-fashioned, giving multicast out to the public with a delay. That's not what I (my two cents, you know, thanks to Peter Billson for explaining that phrase to me) like; the more public, the better the testing. We don't force anyone to enable that configuration switch.

To Eric: What about the standard protocol (or less-standard?) you want to have for multicast? Do you have documentation at hand? Otherwise I would like to start adding tftp-mcast support, as my quickly hacked daemon for that protocol at least runs stably in low-load testing with two clients. Perhaps if it is ready I could have a mass-test (ok, 15 clients is not much, but better than nothing) the weekend after next at my "private testing laboratory"; until then I should have made a release out of it, an announce will follow. But of course, if you have a better protocol at hand, please let me know.

An*getting CVS to work right now*selm
|
From: <ebi...@ln...> - 2002-06-11 18:37:00
|
Anselm Martin Hoffmeister <an...@ho...> writes:
> > Right, the code is already doing this correctly. But thanks for the catch.
>
> You wrote about making your code public. Did you? Where can I get it; is the
> CVS always the up-to-date source (I never worked with CVS before)?

I'm getting there. I'm so busy working on it that I haven't had a free moment to do that yet. Hopefully I can get that done later today.

> As minor changes to some (sooner or later most/all) hardware drivers now
> arise (though only triggered when MULTICAST is enabled), perhaps 5.0.7-rc
> should stay rc for some days, so that these changes can make their way into
> the next release. The opposite argument would be to release stable and
> old-fashioned, giving multicast out to the public with a delay. That's not
> what I (my two cents, you know, thanks to Peter Billson for explaining that
> phrase to me) like; the more public, the better the testing. We don't force
> anyone to enable that configuration switch.

There are pieces that could make 5.0.7-rc, but I have enough changes in other parts of the code, with hard drive booting etc., that I'd rather push it to 5.0.8.

> To Eric: What about the standard protocol (or less-standard?) you want to
> have for multicast? Do you have documentation at hand?

Yes, I have one, but I'm not going to fix anything in stone until I get some successful large scale testing. My biggest hold up is that someone let the test environment at work get into a sorry state, and I have been rebuilding it.

> Otherwise I would like to start adding tftp-mcast support, as my quickly
> hacked daemon for that protocol at least runs stably in low-load testing
> with two clients. Perhaps if it is ready I could have a mass-test (ok, 15
> clients is not much, but better than nothing) the weekend after next at my
> "private testing laboratory"; until then I should have made a release out of
> it, an announce will follow. But of course, if you have a better protocol at
> hand, please let me know.

Given that, I will definitely see what I can push into the 5.1 tree in cvs.

> An*getting CVS to work right now*selm

Good luck on that. CVS works as a pretty good distribution mechanism. If you prefer patches I can go that route too.

Eric
|
From: <ebi...@ln...> - 2002-06-12 06:12:41
|
Anselm Martin Hoffmeister <an...@ho...> writes:
> What about the standard protocol (or less-standard?) you want to have for
> multicast? Do you have documentation at hand?

You can scan the code for some more details, but here is the basic protocol I am using. The target is multicast data on local networks, so I have the ttl set to 1 on all of my nodes. The security is approximately equal to tftp.

There are 3 types of packets (DATA, NACK, REQ). There are 2 channels, multicast and unicast. There are 2 kinds of clients, master/non-master. The protocol handles both small and terabyte sized files; a variable length binary encoding is used for the numbers to keep the overhead down.

The DATA packet is transmitted over the multicast channel. It contains:
  transaction number, total file size, block size, packet number, block size bytes worth of data

The NACK packet is transferred unicast to the server. It contains, from the beginning of the file, pairs of:
  received packets, requested packets

The REQ packet is transferred unicast to the clients. It contains:
  transaction number, total file size, block size

Unicast is used for the NACK and REQ packets because they aren't broadcast to everyone on a pure layer 2 switch, and are a little more likely to get through. Additionally, in any direction there is now only one type of packet per ip address, port number pair.

  server->REQ->client   unicast
  server<-NACK<-client  unicast
  server->DATA->client  multicast

The client: At startup it first listens on the multicast channel, and if it finds data it downloads it; otherwise, after a timeout, the client sends a NACK to the server to get the download started. After receiving data, if the client has not received everything, it waits for the transmission to restart; after the appropriate timeout it sends a NACK, doubles the possible timeout interval, and waits again. The exponential backoff of the clients should keep the network quiet when the clients are running and the network is busy. When all of the data has been received, if the client has transmitted a NACK, it should transmit an additional NACK to the master consisting of just the byte 0, to indicate it is going away. The final empty NACK is an optimization to tell the server the client is gone. If the packet doesn't make it, oh well.

Another optimization involves the server sending a REQ to the client. In that case the client forgets its timeout and sends a NACK immediately to the server. This allows the server to pick a ``master'' client and pick on it until that client has all of its data, ensuring some level of fairness.

The server: Starts up and listens for NACKs. For every NACK it gets, it adds the sending machine to its list of known clients, unless it is the special disconnect NACK, in which case it removes the client. Looking at the data from the NACKs, the server decides to send some data. Generally all of the clients' requested data is sent, but on large files it can be beneficial to send only as much as the server can easily cache. After the data is transmitted the server picks on a known client and sends a REQ. If the client doesn't respond within the server's timeout, it picks on another client, sends a REQ, and forgets the previous client even existed.

Comments: By doing all of the control packets over udp, and having no explicit acknowledgement that the data even arrived, some interesting things result.
1) Minimum network packet count (except in the case of a slow server, which all of the clients NACK).
2) Multiple policies can be implemented by both the client and the server.
3) By tracking which packets of the entire transmission have arrived, and delaying the NACKs, full network bandwidth can be achieved. As opposed to TFTP, which is limited by the round trip time.

> Else I would like to start adding tftp-mcast support as my quickly hacked
> daemon for that protocol at least runs, on low-load-testing with two clients
> stably. Perhaps if ready I could have a mass-test (ok, 15 clients is not
> much, but better than nothing) the after-next weekend at my "private testing
> laboratory", until then I should have made a release out of it, announce
> will follow. But of course, if you have a better protocol at hand, please
> let me know.

Everything is now in CVS. Take whichever one you prefer. It wouldn't be evil to have both in etherboot, but I would be surprised if the experimental multicast tftp had any advantages except better documentation.

Eric
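The NACK body described above, pairs of received/requested counts starting from the beginning of the file, amounts to a run-length encoding of the per-packet bitmap. A sketch under stated assumptions: build_nack is a made-up name, and the wire format's variable-length number encoding and transaction header are omitted.

```c
#include <stdint.h>

/* Sketch of building a SLAM-style NACK body: walk the per-packet
 * "got it" map from the start of the file and emit alternating run
 * lengths (received, requested, received, ...).  The protocol's
 * variable-length number encoding and transaction header are left
 * out; build_nack() is a hypothetical name. */
static int build_nack(const uint8_t *got, int npackets,
                      unsigned *runs, int max_runs)
{
    int i = 0, n = 0;

    while (i < npackets && n < max_runs) {
        int start = i;
        uint8_t want = (n % 2 == 0); /* even runs count received packets */

        while (i < npackets && got[i] == want)
            i++;
        runs[n++] = (unsigned)(i - start); /* may be 0, e.g. packet 0 lost */
    }
    return n; /* number of run lengths written */
}
```

Batching and delaying these NACKs, rather than acknowledging every packet, is what lets the transfer run at full bandwidth instead of being round-trip limited the way TFTP is.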
|
From: <ebi...@ln...> - 2002-06-12 07:06:21
|
Other points that occur to me.

On testing: I have already tested on a 15 node test cluster, and the code will be regularly tested and used on this cluster. In production it will be regularly used for installs on various small (4-64) node clusters. And in a month I can start testing on a 1000 node cluster as it is built up.

Currently the protocol is going by the name SLAM: Scalable Local Area Multicast. Plus that is what it does to your network when it is transmitting data :) SLAM seems to be close to one of those simple points where you gain power because you are so simple.

The biggest advantage of the unicast channel for the clients is that a sysadmin only has to worry about a bad server going crazy with multicast data, not the clients.

The biggest downside at the moment is that there is only a GPL'd client and not an open source server for SLAM. (The best I could arrange). But with the protocol details out there and a relatively simple protocol, I can't imagine writing a server will be too hard.

Eric