Thread: Re: [Etherboot-developers] currticks function
From: <ke...@us...> - 2002-07-01 10:19:16
|
>With the above implementation the bootp process works fine. However, I
>have *many* problems with the tftp process. For example, the client
>starts downloading the kernel image after at least 2-3 requests. For the
>first requests (at least) the downloading fails due to timeouts. Many
>times, the kernel download (even if it starts) stops before finishing
>successfully (because etherboot does not "read" a TFTP_DATA packet
>from the tftpd). Playing around with different values in the currticks
>function (especially with the value 182), I realized that the behaviour
>of the system *changes*. I didn't find any documentation on the currticks()
>function and I don't know what value it has to return each
>time it is called, the period, the resolution etc. Just from a comment that
>I found in the cs89x0.c driver, I thought that it has a period of 55ms. Is
>this true ? Is the tftp process so sensitive to time functions such as
>currticks ? Thanks a lot.

currticks started with the standard PC BIOS implementation. Later Eric Biederman wrote a BIOSless implementation, which is the one you are trying to mimic.

currticks returns a counter that's incremented 18.2 times per second, so one tick is about 55 ms. The exact value is not very sensitive; it's only used for rough timing. However, it must increase monotonically, i.e. not go backwards, or the timeout logic will get very confused. You should check that this property holds for your implementation.

BTW, are you sure your arithmetic is being done in sufficient precision? You don't show the declaration of value; it should be a uint64_t. Maybe you don't even need all the bits in the low half of the timer and can do your arithmetic with 32 bit variables.

Also consider the possibility you have tftp server problems.

When you have this alternate implementation working, please submit it for inclusion (#ifdefed of course). Thanks.
|
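To make the 18.2 ticks per second requirement concrete, here is a minimal sketch of the conversion from a free-running hardware counter to currticks-style ticks. It assumes a counter that advances once every 2 ns (the figure mentioned elsewhere in this thread); the names `TIMER_HZ`, `TICKS_PER_10S` and `raw_to_ticks` are made up for illustration and are not part of Etherboot.

```c
#include <stdint.h>

/* Assumed hardware: a free-running counter that advances once every 2 ns.
 * These names are illustrative only, not Etherboot identifiers. */
#define TIMER_HZ       500000000ULL  /* 1 s / 2 ns = 5e8 counts per second */
#define TICKS_PER_10S  182ULL        /* 18.2 ticks/s expressed as an integer */

/* Convert a raw counter value to 18.2 Hz ticks using 64-bit arithmetic.
 * The product raw * 182 stays below 2^64 as long as raw < 2^56, so a
 * 55-bit counter cannot overflow the intermediate result. Integer
 * division truncates, so the result never decreases while raw never
 * decreases, which is the monotonic property asked for above. */
static unsigned long raw_to_ticks(uint64_t raw)
{
    return (unsigned long)((raw * TICKS_PER_10S) / (10ULL * TIMER_HZ));
}
```

With these assumptions, one second of counter time (500000000 counts) maps to 18 ticks and ten seconds map to 182 ticks. If the raw counter itself can wrap, the wrap has to be handled before this conversion.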
|
From: <ke...@us...> - 2002-07-01 15:27:16
|
>tftpd) and the problem persists. However, the behaviour of the system
>changes whenever I modify the currticks() function. :( Any idea ?

Sorry no. You're the one closest to the hardware. But I suggest you don't get hung up on the behaviour of the currticks constant. Collect more evidence, snoop on packets, put in printf statements, etc.

Is the Infineon an x86 architecture or are you the first to port it to a different architecture? If so, watch out for latent structure alignment and byte order bugs. It should be interesting.
|
|
From: <ebi...@ln...> - 2002-07-01 17:50:26
|
ke...@us... (Ken Yap) writes:

> >tftpd) and the problem persists. However, the behaviour of the system
> >changes whenever I modify the currticks() function. :( Any idea ?
>
> Sorry no. You're the one closest to the hardware. But I suggest you
> don't get hung up on the behaviour of the currticks constant. Collect
> more evidence, snoop on packets, put in printf statements, etc.
>
> Is the Infineon an x86 architecture or are you the first to port it to a
> different architecture? If so, watch out for latent structure alignments
> and byte order bugs. It should be interesting.

The TriCore is a 32bit embedded RISC processor competing with the ARM.
http://www.infineon.com/cgi/ecrm.dll/ecrm/scripts/prod_ov.jsp?oid=30926&cat_oid=-8362

It is little endian, and there are some alignment constraints.

So congratulations on the first non-x86 port are in order.

Beyond this, it could be that there is a driver bug, drivers being more susceptible than the core to differences in the hardware.

With respect to currticks, you will run into problems if it rolls over or overflows, though it looks like you are probably o.k.

The load_timer2 logic is also suspect. I don't suppose you have an x86 compatible timer, do you?

Beyond that, my best suggestion is to use tcpdump and get a packet trace from another machine on the same network segment. That, and to verify that the tftp transfer works from another client machine plugged into the same network port.

If you are timing out, I would suggest defining CONGESTED to see if retransmissions help. It could be just that your network loses some packets.

Alternatively, it could be a driver bug where it drops packets being either transmitted or received.

My best guess is that playing with the timeout simply varies when another problem is detected.

Anyway, if you can pinpoint where in a tftp transfer the code is failing we may be able to point you in some productive directions.

Another possibility is that the image you are loading is overwriting part of etherboot.

Eric
|
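The rollover concern can be illustrated with a wrap-tolerant deadline check. This is only a sketch under the assumption of a 32-bit tick counter; `tick_expired` is a hypothetical helper, and Etherboot's own timeout code may be written differently.

```c
#include <stdint.h>

/* Hypothetical helper, not an Etherboot function.
 * Returns nonzero once 'now' has reached or passed 'deadline'.
 * Unsigned subtraction is performed modulo 2^32, so the result stays
 * correct across one wrap of the counter, provided the interval being
 * timed is shorter than 2^31 ticks. A plain 'now >= deadline' test
 * would misfire when the deadline has wrapped but 'now' has not. */
static int tick_expired(uint32_t now, uint32_t deadline)
{
    return (uint32_t)(now - deadline) < 0x80000000u;
}
```

A caller would compute `deadline = start + timeout` once (letting it wrap naturally) and then poll `tick_expired(current, deadline)`. The check still relies on the counter never stepping backwards.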
|
From: Fotis A. <fa...@te...> - 2002-07-02 10:06:43
|
Eric W. Biederman wrote:

>>So congratulations on the first non-x86 port are in order.

Actually, it is not a port of the whole of etherboot to TriCore :) We tried to "hack" the code in order to get a working version of etherboot for our needs. Our main task is to port the Linux kernel to the TriCore architecture. Now we want a fast way to download the kernel and debug it, because the JTAG is very slow. Thus, we have ported only the parts of etherboot that deal with ELF and the cs89x0 driver (because our development board has this chip).

I noticed that there is a mistake in the cs89x0.h file. You have defined

#define TX_AFTER_ALL 0x0060 /* Tx packet after all bytes copied */

but I think that it should be

#define TX_AFTER_ALL 0x00c0

>>Beyond this, it could be that there is a driver bug. Drivers being
>>more susceptible than the core to differences in the hardware.

Why did you decide to use polling instead of interrupts? The cs89x0 driver uses polling, but I didn't check the rest of the drivers.

>>With respect to currticks you will run into problems if it rolls over.
>>overflows. Though it looks like you are probably o.k.

Exactly.

>>The load_timer2 logic is also suspect. I don't suppose you
>>have an x86 compatible timer do you?

The original cs89x0 driver from the etherboot distribution does not use load_timer2, so we don't use it. For currticks we use a 55-bit timer with a period of 2ns.

>>Beyond that my best suggestion is to use tcpdump and get a packet
>>trace from another machine on the same network segment. That and to
>>verify that the tftp transfer works from another client machine,
>>plugged into the same network port.

All the transactions over the network seem to be fine. The problem is that sometimes the tftp process of etherboot does not "read" the data of one block, and the tftp server fails because it never receives an ACK. Note that the block at which the process fails is not the same every time.

>>If you are timing out I would suggest defining CONGESTED and see if
>>retransmissions help. It could be just that your network loses
>>some packets.

I tried to use -DCONGESTED but the result remains the same. I also tried connecting *only* the PC that runs the tftpd server and the development board with the TriCore through a hub, but the problem persists.

>>Alternatively it could be a driver bug where it either drops packets
>>being transmitted or received.

How can I check this?

>>My best guess is that playing with the timeout simply varies when
>>another problem is detected.
>>
>>Anyway if you can pinpoint where in a tftp transfer the code is
>>failing we may be able to point you in some productive directions.

It just does not recognize a block of data and the server times out. Nothing more! :(

>>Another possibility is that the image you are loading is overwriting
>>part of etherboot.

No. Because the GNU tools that we use do not support the PIC option, the executables that are produced are not relocatable. Thus, we build the etherboot binary so that it is loaded near the end of the SDRAM, while the kernel image has been built for an address at the start of the SDRAM. If the tftp transfers the whole kernel, Linux works fine (we use a serial console for debugging purposes). I'm sure that there are no such conflicts.

Fotis Andritsopoulos

--
"Whom ever Controls your Perception of Reality Controls You"
|
|
From: <ebi...@ln...> - 2002-07-02 20:04:47
|
Fotis Andritsopoulos <fa...@te...> writes:

> Eric W. Biederman wrote:
>
> >>So congratulations on the first non-x86 port are in order.
> >>
> Actually, it is not a port of the whole etherboot to TriCore :) We tried to
> "hack" the code in order to get a working version of etherboot for our
> needs. Our main task is to port the Linux kernel for the TriCore
> architecture. Now we want a fast way to download the kernel and debug it because
> the JTAG is very slow. Thus, we have ported only the part of the etherboot that
> refers to the ELF and the cs89x0 driver (because our development board has this
> chip). I noticed that there is a mistake in the cs89x0.h file. You have defined

Cool, so you should generate a kernel image with valid physical addresses... This drives me nuts about the x86 and alpha ports.

> >>Beyond that my best suggestion is to use tcpdump and get a packet
> >>trace from another machine on the same network segment. That and to
> >>verify that the tftp transfer works from another client machine,
> >> plugged into the same network port.
> All the transactions over the network seem to be fine. The problem is that
> sometimes the tftp process of the etherboot does not "read" the data of one
> block and the tftp server fails, because it never receives an ACK. Notice that
> the block that the process fails is not the same for all the times.
>
> >>If you are timing out I would suggest defining CONGESTED and see if
> >>retransmissions help. It could be just that your network loses
> >>some packets.
> >>
> I tried to use -DCONGESTED but the result remains the same. Also, I tried to
> connect through a hub *only* the PC that runs the tftpd server and the
> development board with the tricore but the problem persists.

Hmm. At the protocol level, a proper server will retransmit the DATA packet until an ack is seen or until a maximum retry count is reached. A proper client (-DCONGESTED) will retransmit the ack until the next data packet is seen or until a maximum retry count is reached.

> >>Alternatively it could be a driver bug where it either drops packets
> >>being transmitted or received.
> >>
> How can I checked this ?

My best suggestion is to watch the network traffic and see what the actual failure mode is.

> >>My best guess is that playing with the timeout simply varies when
> >>another problem is detected.
> >>
> >>Anyway if you can pinpoint where in a tftp transfer the code is
> >>failing we may be able to point you in some productive directions.
> >>
> It just not recognize a block of data and the server timeouts. Nothing more! :(

So occasionally it just times out? Hmm. I would really try this with another tftp client and verify that this isn't a server bug.

Do you see retransmits from either the client or the server?

> >>Another possibility is that the image you are loading is overwriting
> >> part of etherboot.
> No. Because the GNU tools that we use are not support the PIC option, the
> executables that are produced are not rellocatable. Thus, we build the etherboot
> binary in such way and it is loaded "near at the end" of the SDRAM while the
> kernel image has been built for an address at the start of the SDRAM. If the
> tftp transfers the whole kernel the Linux works fine (we use a serial console
> for debugging purposes). I'm sure that there are no such conflicts.

Sounds good. Now that possibility can be ruled out :)

Eric
|
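The client side of the retransmission rule described above can be sketched as a small decision function. This is an illustration under assumptions, not Etherboot's tftp code: the enum and `classify_data` are hypothetical names, and `last_acked` is assumed to start at 0 before the first DATA block arrives.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum tftp_action {
    TFTP_ACCEPT,  /* next block in sequence: store it, ACK it, advance */
    TFTP_REACK,   /* duplicate of the last block: our ACK was probably
                     lost, so send the same ACK again */
    TFTP_IGNORE   /* anything else: drop it and keep waiting */
};

/* Decide what to do with a DATA packet carrying block number 'received'
 * when 'last_acked' is the last block we acknowledged. The cast keeps
 * the comparison correct when the 16-bit block number wraps. */
static enum tftp_action classify_data(uint16_t last_acked, uint16_t received)
{
    if (received == (uint16_t)(last_acked + 1))
        return TFTP_ACCEPT;
    if (received == last_acked)
        return TFTP_REACK;
    return TFTP_IGNORE;
}
```

When no usable packet arrives before the timeout, the caller would resend the last ACK and count the attempt against a maximum retry limit, which matches the behaviour expected from a client built with -DCONGESTED as described in the message.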
|
From: Fotis A. <fa...@te...> - 2002-07-02 09:40:36
|
Ken Yap wrote:

>more evidence, snoop on packets, put in printf statements, etc.

I'm using ethereal to sniff the network and everything looks fine. The only strange thing that I noticed is that sometimes, in the tftp function, the variable prevblock still refers to the previous session while the variable block refers to the current one. For example:

a) the tftp starts downloading the image and stops after some transactions (e.g. at [block] 100, so [prevblock] = 100)
b) the tftp session restarts and sends a new RRQ message ([block] = 1)

[prevblock] then still has the value 100, while [block] has the value zero or one. When the tftp function checks whether [prevblock+1] is equal to [block], the check fails. I think that the variables [block] and [prevblock] should be stored in the tftp structure in order to have separate values per tftp session. (Am I wrong?) However, I understand that this is not the "main" cause of the timeouts that I previously described.

>Is the Infineon an x86 architecture or are you the first to port it to a
>different architecture? If so, watch out for latent structure alignments
>and byte order bugs. It should be interesting.

No, it is not an x86 architecture. However, it is mysterious, because etherboot does download the image, but only after 5-6 retries...

Fotis Andritsopoulos

--
"Whom ever Controls your Perception of Reality Controls You"
|
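The per-session idea suggested above could look roughly like the following. This is a sketch only: the field names follow the variables named in the message, while the struct and both functions are hypothetical and do not reflect how Etherboot actually stores this state.

```c
#include <stdint.h>

/* Hypothetical per-transfer state; illustrative, not Etherboot's layout. */
struct tftp_session {
    uint16_t block;      /* block number of the packet just received */
    uint16_t prevblock;  /* last block accepted in this session */
};

/* Call whenever a new RRQ is sent, so that nothing from an aborted
 * transfer (such as prevblock == 100) leaks into the new one. */
static void tftp_session_start(struct tftp_session *s)
{
    s->block = 0;
    s->prevblock = 0;
}

/* Accept a block only if it directly follows the previous one.
 * Returns 1 when accepted, 0 when the block is out of sequence. */
static int tftp_session_accept(struct tftp_session *s, uint16_t block)
{
    s->block = block;
    if (block != (uint16_t)(s->prevblock + 1))
        return 0;
    s->prevblock = block;
    return 1;
}
```

The key point is the reset in `tftp_session_start`: with it, block 1 of a restarted transfer passes the `prevblock + 1` check regardless of where the previous attempt stopped.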
|
From: Fotis A. <fa...@te...> - 2002-07-03 08:54:21
|
>
>
>So occasionally it just times out?
>Hmm. I would really try this with another tftp client and verify that
>this isn't a server bug.
>
>Do you see retransmits from either the client or the server
>
The retransmissions come from the tftpd server, because
etherboot does not send an ACK. However, I solved the problem in an
informal way. I set a breakpoint at the point where the tftp function reads the
nic.packet struct
tr = (struct tftp_t *)&nic.packet[ETH_HLEN];
I realized that even while it waits for a DATA block (or an OACK), the tr
struct receives packets with a broadcast destination MAC address (in the tftp
function). The tftp function then checks such a packet for an OACK or DATA
field and fails. It is not very clear to me whether packets with broadcast
addresses should be seen at this point. So, at the beginning of the tftp function
I reconfigure the cs89x0 chip to process only packets addressed to its
individual MAC address. Therefore, in the driver I use
#define DEF_RX_ACCEPT (RX_IA_ACCEPT | RX_BROADCAST_ACCEPT | RX_OK_ACCEPT)
and in the tftp function I use the
#define DEF_RX_ACCEPT_AFTER (RX_IA_ACCEPT | RX_OK_ACCEPT)
to reconfigure the chip, so that every packet that is processed
has the MAC address of the development board as its destination.
The problem is solved, but I don't think that this is the right way.
Fotis Andritsopoulos
--
"Whom ever Controls your Perception of Reality Controls You"
|
|
From: <ebi...@ln...> - 2002-07-03 09:12:38
|
Fotis Andritsopoulos <fa...@te...> writes:

> >So occasionally it just times out?
> >Hmm. I would really try this with another tftp client and verify that
> >this isn't a server bug.
> >
> >Do you see retransmits from either the client or the server
> >
>
> The retransmissions are occured by the tftpd server because the etherboot does
> not send an ACK. However, I solved the problem by a non-formal way. I set a
> breakpoint to the point that the tftp reads the nic.packet struct
>
> tr = (struct tftp_t *)&nic.packet[ETH_HLEN];
>
> I realized that even if it waits for a DATA block (or an OACK), the tr struct
> gets data from broadcast MAC addresses (in the tftp function). Thus, the tftp
> checks in this packet for an OACK or DATA field and it fails. It is not very
> clear to me if it is right to this point to find data with broadcast
> addresses. So, at the beginning of the tftp function I reconfigure the cs89x0
> chip to process only packets with individual MAC addresses. Therefore, in the
> driver I use the

No, it shouldn't deal with broadcast addresses. But how does the check for the appropriate tftp port fail?

We should also check the ip address and the mac address in software (we don't need the NIC to do it). But I'm curious how it got through the checks in await_reply.

In 5.1.2+ I have cleaned this up a little more, and I think I may have actually implemented the check for the mac address and the ip address. I know I noticed they were missing and implemented them on another protocol I was working on.

Hmm. I wonder if it is a bug in -DCONGESTED that it doesn't retransmit ACKs when it receives a duplicate DATA packet.

Eric
|
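A software check of the destination MAC address, as suggested above, might look like this. It is a sketch under assumptions: `frame_is_unicast_to_us` is a hypothetical helper, the frame is assumed to start with the 6-byte Ethernet destination address, and this is not the await_reply code from any Etherboot release.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6  /* length of an Ethernet MAC address in bytes */

/* Hypothetical helper, not an Etherboot function.
 * An Ethernet frame starts with the destination MAC address.
 * Returns 1 if that address equals our own MAC, and 0 for broadcast,
 * multicast or another station's unicast address. */
static int frame_is_unicast_to_us(const uint8_t *frame, const uint8_t *our_mac)
{
    return memcmp(frame, our_mac, MAC_LEN) == 0;
}
```

Applied only while waiting for TFTP replies, a check like this leaves the NIC's receive filter untouched, so broadcast replies during the BOOTP/DHCP phase can still be received.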
|
From: <ke...@us...> - 2002-07-02 10:47:29
|
>development board has this chip). I noticed that there is a mistake in
>the cs89x0.h file. You have defined
>
>#define TX_AFTER_ALL 0x0060 /* Tx packet after all bytes copied */
>
>but I think that it should be
>
>#define TX_AFTER_ALL 0x00c0

Hmm, I don't have a data sheet to double check this. Maybe Markus Gutschke, who wrote the driver, can comment.

>Why you decided to use poll instead of interrupts ? The cs89x0 drivers
>uses poll but I didn't check the rest of the drivers.

All of Etherboot uses polling. As explained in the history, this is the way the original was designed, and this design aspect has not been changed. In practice, there isn't anything wrong with polling. If you are thinking you are missing packets due to polling, all the protocols involved are synchronous, so interrupts don't help there. Polling makes the drivers easier to write and debug, especially with a system just booted from raw metal. Try debugging an asynchronous system someday. The drawback is that it makes callbacks hard.

>All the transactions over the network seem to be fine. The problem is
>that sometimes the tftp process of the etherboot does not "read" the
>data of one block and the tftp server fails, because it never receives
>an ACK. Notice that the block that the process fails is not the same for
>all the times.

According to the RFC, in this case the server is supposed to time out and resend the packet. It's not supposed to start a new session.

Things to check:

Are you sure that you are generating and checking for a unique XID in the DHCP packet? That's how the client knows if the DHCP reply is meant for it.

Have you checked that you are using the same tftp session that's offered by the server? There is some subtlety in the way the client switches the port number after receiving the server's first reply and starting the transfer; read the TFTP RFC carefully. If you reimplemented this part of the code yourself, you might have missed this subtlety.
|
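The port subtlety can be sketched as follows, based on the transfer identifier (TID) rules of the TFTP RFC: the read request goes to the well-known port 69, the server answers from a port of its own choosing, and all later packets of that transfer must use that port. The struct and function names are hypothetical and only illustrate the rule.

```c
#include <stdint.h>

#define TFTP_WELL_KNOWN_PORT 69  /* destination port of the initial RRQ */

/* Hypothetical bookkeeping for the server side of one transfer. */
struct tftp_peer {
    uint16_t server_port;  /* port to send ACKs to */
    int      locked;       /* nonzero once the server's TID is known */
};

/* Before the first reply, the only port we know is the well-known one. */
static void tftp_peer_init(struct tftp_peer *p)
{
    p->server_port = TFTP_WELL_KNOWN_PORT;
    p->locked = 0;
}

/* Returns 1 if a reply with UDP source port 'src_port' belongs to this
 * transfer. The first reply fixes the server's port for the rest of the
 * session; later packets from any other port are rejected. */
static int tftp_peer_check(struct tftp_peer *p, uint16_t src_port)
{
    if (!p->locked) {
        p->server_port = src_port;
        p->locked = 1;
        return 1;
    }
    return src_port == p->server_port;
}
```

A client that keeps sending ACKs to port 69, or that accepts DATA from a stale server port left over from an earlier attempt, would show the kind of intermittent failures discussed in this thread, although nothing here confirms that this is the cause.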
|
From: Markus G. <ma...@gu...> - 2002-07-02 15:51:25
|
Ken Yap wrote:

> Hmm, I don't have a data sheet to double check this. Maybe Markus
> Gutschke, who wrote the driver, can comment.

I left all the manuals for this chip in Germany, so I can't check on what these flags should be set to. Recent Linux kernel sources seem to agree with you, though. So, this might very well be a bug in etherboot. Ken, you should probably change this value to 0xC0 so that it is the same as the one used by the Linux kernel.

I guess the reason why we got away with the old value is that it was probably interpreted as starting to transmit the packet after the first 381 bytes. As long as we delivered the remaining bytes fast enough (or as long as transmitted packets were small) this would still work. Both assumptions are probably true for most of the data that etherboot sends.

Markus

--
Markus Gutschke
3637 Fillmore Street #106
San Francisco, CA 94123-1600
+1-415-567-8449
ma...@gu...
|
|
From: <ebi...@ln...> - 2002-07-02 19:53:44
|
ke...@us... (Ken Yap) writes:

> >Why you decided to use poll instead of interrupts ? The cs89x0 drivers
> >uses poll but I didn't check the rest of the drivers.
>
> All of Etherboot uses polling. As explained in the history, this is the
> way the original was designed, and this design aspect has not been
> changed. In practice, there isn't anything wrong with polling. If you
> are thinking you are missing packets due to polling, all the protocols
> involved are synchronous so interrupts don't help there. Polling makes
> the drivers easier to write and debug, especially with a system just
> booted from raw metal---try debugging an asynchronous system someday.
> The drawback is that it makes callbacks hard.

There is a very real advantage in initial system bring-up, in that you don't need to have interrupt mapping from the pci interrupt to system interrupt numbers. On some systems it is easy, on others it is hard; the only constant is that the necessary code varies widely from system to system.

I have had multiple occasions, when bringing up LinuxBIOS on x86 systems, where etherboot works but the kernel can't get interrupts working.

Eric
|
|
From: <ke...@us...> - 2002-07-03 09:06:43
|
>The retransmissions are occured by the tftpd server because the
>etherboot does not send an ACK. However, I solved the problem by a
>non-formal way. I set a breakpoint to the point that the tftp reads the
>nic.packet struct
>
> tr = (struct tftp_t *)&nic.packet[ETH_HLEN];
>
>I realized that even if it waits for a DATA block (or an OACK), the tr
>struct gets data from broadcast MAC addresses (in the tftp function).
>Thus, the tftp checks in this packet for an OACK or DATA field and it
>fails. It is not very clear to me if it is right to this point to find

But in this case it should just throw away the packet and wait for another packet. I don't know what your logic looks like, but maybe you should look at it again.

>data with broadcast addresses. So, at the beginning of the tftp function
>I reconfigure the cs89x0 chip to process only packets with individual
>MAC addresses. Therefore, in the driver I use the
>
>#define DEF_RX_ACCEPT (RX_IA_ACCEPT | RX_BROADCAST_ACCEPT | RX_OK_ACCEPT)
>
>and in the tftp function I use the
>
>#define DEF_RX_ACCEPT_AFTER (RX_IA_ACCEPT | RX_OK_ACCEPT)
>
>to reconfigure the chip, so as all the packet that will be processed
>will have as destination only the MAC address of the development board.
>The problem solved by I don't thing that this is the right way.

This will work but is not the ideal solution. Although I have not seen one, it is possible for bootp servers to reply by broadcast if they are not able to create raw packets or inject an entry into the ARP cache. So Etherboot has to accept broadcast packets as well.
|
|
From: Fotis A. <fa...@te...> - 2002-07-01 11:37:50
|
>>all the bits in the low half of the timer and can do your arithmetic
>>with 32 bit variables.

I have used long long for the declarations of the variables. I also checked the value that currticks() returns across a 1 second delay, produced by an assembly-implemented udelay() function, and found that the difference before and after the delay is [18].

I have tried it with various tftpd servers (2 different Windows tftpd servers and the Linux tftpd) and the problem persists. However, the behaviour of the system changes whenever I modify the currticks() function. :( Any idea ?

Fotis Andritsopoulos

>>Also consider the possibility you have tftp server problems.
>>
>>When you have this alternate implementation working, please submit it
>>for inclusion (#ifdefed of course). Thanks.

--
"Whom ever Controls your Perception of Reality Controls You"
|