etherboot-developers Mailing List for Etherboot (Page 248)


>O.k. Looking at this there is a bug in how the ISAPNP id's are
>encoded.  You have been using the pci vendor id's instead of the
>EISA/PNP vendor id's.

Ok, fixed and checked in.

ke...@us... (Ken Yap) writes:

> >In any event these are the macros the linux kernel uses to deal with
> >these numbers internally.  The isa-pnp spec has more information.
> >
> >#define ISAPNP_VENDOR(a,b,c)	(((((a)-'A'+1)&0x1f)<<2)|\
> >				((((b)-'A'+1)&0x18)>>3)|((((b)-'A'+1)&7)<<13)|\
> >				((((c)-'A'+1)&0x1f)<<8))
> >#define ISAPNP_DEVICE(x)	((((x)&0xf000)>>8)|\
> >				 (((x)&0x0f00)>>8)|\
> >				 (((x)&0x00f0)<<8)|\
> >				 (((x)&0x000f)<<8))
> >
> >The endian issues with the EISA part number drive me crazy.  I think
> >the data is normally stored big endian.  But I am confused.  For
> >purposes of synching with modules.isapnpmap from the linux kernel
> >these values produced by the macros are the values we need.
> >
> >Using these in etherboot we should be able to have readable ISA part
> >numbers as well as fitting everything snugly into 32 bits.
> >
> >The linux kernel 3c509.c driver is a good one to work against as it
> >actually implements the discovery via the isapnp id's.
> >
> >If we are going to fix the interface to the id's in 5.0.7 we need to
> >be certain we get this correct.
> 
> Ok, an interesting issue, the Linux 3c509 driver actually declares
> several vendor IDs, depending on the model, including the generic PNP
> for compatibles. Conceivably we could distinguish between the models in
> Etherboot as we know the configuration registers, though that's not very
> useful information to the sysadmin as one Linux driver will handle them
> all. A similar situation applies to the ne.c driver. So shall we just go
> for the generic ISAPNP_VENDOR('P','N','P'), with the exception of the
> 3c515 which uses TCM? In which case the vendor_id field is effectively a
> constant except for the 515. A pity but there is no ambiguity with the
> device id anyway.

As long as we are basically correct the system works.  If someone
finds it useful and they care about distinguishing between the fine
variants they can work out the details themselves.  I just want to
make certain that the id's we are passing are valid, and actually map
to the cards that we have.

The PNP vendor comes from retrofitting cards that were produced
before the isapnp spec was written.  Newer cards should have a vendor
id of their own.

> Also only a few Linux drivers actually declare this information but the
> table you supplied in a previous mail has most of the ones we care
> about.

Again this is one of those areas where if enough people care the
drivers will be updated.  Especially as the isapnp detection is much
more reliable than random port probes.

At this point I believe we have enough information to do a reasonable
job for all of our isa cards.  Getting isa working at all was going
the extra mile for me.  Now that the foundation is laid if enough
people on old low end machines care we can do some nice improvements
to the drivers.  But until then...

O.k. now back to dealing with the BIOS issues I get when I have 4GB of
ram plugged into a PC, and figuring out how to auto-detect pci-x or
pci and at which bus speed (33, 66, 100 or 133MHz) I should run the
card :)  Except for superio chips, ISA isn't even a real case for me.

Eric

>In any event these are the macros the linux kernel uses to deal with
>these numbers internally.  The isa-pnp spec has more information.
>
>#define ISAPNP_VENDOR(a,b,c)	(((((a)-'A'+1)&0x1f)<<2)|\
>				((((b)-'A'+1)&0x18)>>3)|((((b)-'A'+1)&7)<<13)|\
>				((((c)-'A'+1)&0x1f)<<8))
>#define ISAPNP_DEVICE(x)	((((x)&0xf000)>>8)|\
>				 (((x)&0x0f00)>>8)|\
>				 (((x)&0x00f0)<<8)|\
>				 (((x)&0x000f)<<8))
>
>The endian issues with the EISA part number drive me crazy.  I think
>the data is normally stored big endian.  But I am confused.  For
>purposes of synching with modules.isapnpmap from the linux kernel
>these values produced by the macros are the values we need.
>
>Using these in etherboot we should be able to have readable ISA part
>numbers as well as fitting everything snugly into 32 bits.
>
>The linux kernel 3c509.c driver is a good one to work against as it
>actually implements the discovery via the isapnp id's.
>
>If we are going to fix the interface to the id's in 5.0.7 we need to
>be certain we get this correct.

Ok, an interesting issue, the Linux 3c509 driver actually declares
several vendor IDs, depending on the model, including the generic PNP
for compatibles. Conceivably we could distinguish between the models in
Etherboot as we know the configuration registers, though that's not very
useful information to the sysadmin as one Linux driver will handle them
all. A similar situation applies to the ne.c driver. So shall we just go
for the generic ISAPNP_VENDOR('P','N','P'), with the exception of the
3c515 which uses TCM? In which case the vendor_id field is effectively a
constant except for the 515. A pity but there is no ambiguity with the
device id anyway.

Also only a few Linux drivers actually declare this information but the
table you supplied in a previous mail has most of the ones we care
about.

ke...@us... writes:

> I have checked into CVS:
>
> For 5.0.7 and 5.1.2 candidates:
>
> Changes to NIC identification scheme from variable length string to
> fixed binary structure. Added ISA IDs to drivers.

O.k. Looking at this there is a bug in how the ISAPNP id's are
encoded.  You have been using the pci vendor id's instead of the
EISA/PNP vendor id's.

The 32-bit Vendor ID is an EISA Product Identifier (ID).  This ID consists of:
·	bits[15:0] -  three character compressed ASCII EISA ID.
	Compressed ASCII is defined as 5 bits per character, "00001" = "A" ...
	"11010" = "Z". This field is assigned to each manufacturer by the EISA
	administrative agent.
·	bits[31:16] -  manufacturer specific product number and
	revision.  It is the responsibility of each vendor to select
	unique values for this field.

For the Crystal LAN the part number is CSC0007 (for the CS8900 rev B).
Cirrus Logic has a good set of data sheets online for it.

In any event these are the macros the linux kernel uses to deal with
these numbers internally.  The isa-pnp spec has more information.

#define ISAPNP_VENDOR(a,b,c)	(((((a)-'A'+1)&0x1f)<<2)|\
				((((b)-'A'+1)&0x18)>>3)|((((b)-'A'+1)&7)<<13)|\
				((((c)-'A'+1)&0x1f)<<8))
#define ISAPNP_DEVICE(x)	((((x)&0xf000)>>8)|\
				 (((x)&0x0f00)>>8)|\
				 (((x)&0x00f0)<<8)|\
				 (((x)&0x000f)<<8))

The endian issues with the EISA part number drive me crazy.  I think
the data is normally stored big endian.  But I am confused.  For
purposes of synching with modules.isapnpmap from the linux kernel
these values produced by the macros are the values we need.

Using these in etherboot we should be able to have readable ISA part
numbers as well as fitting everything snugly into 32 bits.
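
To make the encoding concrete, here is a minimal sketch that applies the two macros quoted above and decodes a vendor value back into its three letters. The helpers isapnp_vendor_letters() and isapnp_pack() are hypothetical names introduced only for illustration, and the vendor-low/device-high packing is an assumption, not something the thread specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Macros as quoted in the message above (from the Linux kernel). */
#define ISAPNP_VENDOR(a,b,c)	(((((a)-'A'+1)&0x1f)<<2)|\
				((((b)-'A'+1)&0x18)>>3)|((((b)-'A'+1)&7)<<13)|\
				((((c)-'A'+1)&0x1f)<<8))
#define ISAPNP_DEVICE(x)	((((x)&0xf000)>>8)|\
				 (((x)&0x0f00)>>8)|\
				 (((x)&0x00f0)<<8)|\
				 (((x)&0x000f)<<8))

/*
 * Hypothetical helper: recover the three letters from a value produced by
 * ISAPNP_VENDOR().  The macro output is the big-endian EISA word with its
 * two bytes swapped, so swap them back and peel off 5 bits per letter.
 */
static void isapnp_vendor_letters(uint16_t vendor, char out[4])
{
	uint16_t w;

	assert(out != 0);
	w = (uint16_t)((vendor << 8) | (vendor >> 8));
	out[0] = (char)('A' - 1 + ((w >> 10) & 0x1f));
	out[1] = (char)('A' - 1 + ((w >> 5) & 0x1f));
	out[2] = (char)('A' - 1 + (w & 0x1f));
	out[3] = '\0';
}

/* Hypothetical packing: vendor in the low half, device in the high half. */
static uint32_t isapnp_pack(uint16_t vendor, uint16_t device)
{
	return ((uint32_t)device << 16) | vendor;
}
```

With these definitions ISAPNP_VENDOR('P','N','P') evaluates to 0xd041 and ISAPNP_DEVICE(0x5090) to 0x9050, which shows the byte swap the endian remark above is about.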

The linux kernel 3c509.c driver is a good one to work against as it
actually implements the discovery via the isapnp id's.

If we are going to fix the interface to the id's in 5.0.7 we need to
be certain we get this correct.

Eric

ke...@us... (Ken Yap) writes:

> >or what my dhcpd.conf says
> >	filename "x-tftm://172.16.75.1/vmlinuz.ltsp";
> >this morning, 10:39 GMT.
> 
> Hmm, also a redundant specification for TFTM. The server should be taken
> from siaddr, not the URI.

The URL spec allows for it, and I started this trend. 
> 
> Also I want it so that by default the current syntax implies TFTP so
> that it's backward compatible. That is, if I continue to use filename
> "lts/vmlinux.ltsp" I get TFTP, no surprises.

Skimming the code tftp is the default if you don't have a url.

I guess the big concern at this point is how to make it work
with a minimum of code size.  The size of the filename in the DHCP
packet should not be a big concern.

The question is what is the natural way to extend DHCP to direct
a client to boot from other protocols.  URLs seem a natural extension,
as does extending the next-server option.

So I suggest we just test for protocol:/// in all cases.  We should require
at least the double slash.  And the triple slash probably works in
most cases.

For minimum code size probably the best solution is to have a table of
protocol names and handlers.

/* One entry per download protocol that can appear as a URL scheme. */
struct proto {
	const char *name;
	int (*load)(const char *rest,
		int (*fnc)(unsigned char *, unsigned int, unsigned int, int));
};

static struct proto protos[] = {
#ifdef DOWNLOAD_PROTO_SLAM
	{ "x-slam", url_slam },
#endif
#ifdef DOWNLOAD_PROTO_NFS
	{ "nfs", nfs },
#endif
#ifdef DOWNLOAD_PROTO_DISK
	{ "file", url_disk },
#endif
	{ 0, 0 }	/* terminator */
};

int loadkernel(const char *fname)
{
	struct proto *proto;

	for (proto = &protos[0]; proto->name; proto++) {
		int len = strlen(proto->name);

		/* Match "<name>:///" at the start of the filename.  strncmp
		 * stops at the terminating NUL, so a filename shorter than
		 * the scheme name is never read past its end. */
		if ((strncmp(fname, proto->name, len) == 0) &&
		    (strncmp(fname + len, ":///", 4) == 0)) {
			fname += len + 4;
			return proto->load(fname, load_block);
		}
	}
	/* No recognised scheme: plain TFTP stays the default. */
	return tftp(fname, load_block);
}

Given that more people are used to urls than parsing the weird dhcp
history it might be worth the little bit of extra code to parse an ip
address if that is present.
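
As a rough idea of what that extra code could look like, the sketch below splits a filename of the form scheme://a.b.c.d/path or scheme:///path. url_split() is a hypothetical helper, not existing Etherboot code; it assumes a dotted-quad address only (no DNS names, no port) and rejects a double slash that is not followed by an address or a third slash.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not Etherboot code.  Splits "scheme://[a.b.c.d]/path".
 * Returns the path (text after the third slash) or NULL if fname is not a
 * URL for this scheme.  *ip receives the server address in host byte order,
 * or 0 when the authority part is empty (the "scheme:///path" form).
 */
static const char *url_split(const char *fname, const char *scheme,
			     uint32_t *ip)
{
	size_t len;
	const char *p;
	uint32_t addr = 0;
	int octets = 0;

	assert(fname != NULL && scheme != NULL && ip != NULL);
	len = strlen(scheme);
	if (strncmp(fname, scheme, len) != 0 ||
	    strncmp(fname + len, "://", 3) != 0)
		return NULL;
	p = fname + len + 3;

	if (*p != '/') {
		/* Parse exactly four decimal octets separated by dots. */
		for (;;) {
			unsigned int val = 0;
			int digits = 0;

			while (*p >= '0' && *p <= '9' && digits < 3) {
				val = val * 10 + (unsigned int)(*p++ - '0');
				digits++;
			}
			if (digits == 0 || val > 255)
				return NULL;
			addr = (addr << 8) | val;
			if (++octets == 4)
				break;
			if (*p++ != '.')
				return NULL;
		}
		if (*p != '/')
			return NULL;
	}
	*ip = addr;
	return p + 1;
}
```

A caller would fall back to the DHCP-supplied server whenever *ip comes back as 0, which keeps the redundant address optional.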

Thoughts?

Eric

Anselm Martin Hoffmeister <an...@ho...> writes:

> Hi list,
> 
> after some days of hard coding (and several times hanging around searching 
> for typos) I finally managed to boot my first kernel image with
> TRIVIAL FILE TRANSFER PROTOCOL MULTICAST MODE
> or what my dhcpd.conf says
> 	filename "x-tftm://172.16.75.1/vmlinuz.ltsp";
> this morning, 10:39 GMT.

> BTW: Is there any *tested* and *stable* Tftp-server out there that supports 
> multicast, blocksize and tsize (as my code requires all three options)?

Look up atftp.  As a normal tftp server I have gotten good results, and I
recently noticed it supports tftp multicast.  So in theory it should work
but it hasn't been tested with any other tftp clients yet.  I just noticed
this yesterday.

> I will have to adapt a lot anyhow, as timeouts and so need a touch, but I 
> would like to keep it mostly at this skeleton. If there is nothing really 
> usable out there, I will have to hand out my own server, but it's ugly coded 
> and miles away from being "ready" - it even has no timeouts yet, but handles 
> several clients on the same file, which was most important.
> 
> Following the philosophy of "Release early, patch often...":
> Comments welcome.

Eric

>or what my dhcpd.conf says
>	filename "x-tftm://172.16.75.1/vmlinuz.ltsp";
>this morning, 10:39 GMT.

Hmm, also a redundant specification for TFTM. The server should be taken
from siaddr, not the URI.

Also I want it so that by default the current syntax implies TFTP so
that it's backward compatible. That is, if I continue to use filename
"lts/vmlinux.ltsp" I get TFTP, no surprises.

>I got the latest sources and made up a diff file against them - I'm not sure
>if all that stuff compiles, or if it breaks anything for you, but apply that 
>patch to the sources in etherboot 5.1 -not forgetting the backup option of 
>"patch" - and copy proto_tftm.c to that dir, so you get the following new 
>possibilities in "Config":
>
>-DURI_SUPPORT
>	you will need this anyway
>-DURI_SUPPORT_FILE
>	so you can use filenames like "file:/disk/0" and so on.
>-DURI_SUPPORT_TFTP
>	so "tftp:///filename" will be recognised.
>-DURI_SUPPORT_NFS
>	as well
>-DURI_SUPPORT_SLAM
>	so "x-slam://" URIs are recognised. Ask Eric how far his slam code went, you
>	probably want to set -DDOWNLOAD_PROTO_SLAM too.
>-DURI_SUPPORT_TFTM
>	so "x-tftm://server-ip/filename" can be recognised. You will also have to
>	set the -DDOWNLOAD_PROTO_TFTM option for this to work.

Please rework it so that only URI_SUPPORT needs to be defined and then
it implies URI_SUPPORT_XXX depending on whether DOWNLOAD_PROTO_XXX has
been defined. I don't want 5 additional options when 1 will do.
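
One way this could be arranged with the preprocessor is sketched below. The option names come from the quoted list; the two #define lines at the top stand in for -D flags in Config, and uri_scheme_enabled() is a hypothetical helper added only so the effect can be observed. The FILE and TFTP cases are left out because the thread does not say which DOWNLOAD_PROTO option they would follow.

```c
#include <assert.h>
#include <string.h>

/* Example build configuration; in practice these come from -D flags. */
#define URI_SUPPORT
#define DOWNLOAD_PROTO_TFTM

/* One switch turns on URI parsing; per-protocol support follows from the
 * DOWNLOAD_PROTO_XXX options that are already defined. */
#ifdef URI_SUPPORT
# ifdef DOWNLOAD_PROTO_TFTM
#  define URI_SUPPORT_TFTM
# endif
# ifdef DOWNLOAD_PROTO_SLAM
#  define URI_SUPPORT_SLAM
# endif
# ifdef DOWNLOAD_PROTO_NFS
#  define URI_SUPPORT_NFS
# endif
#endif

/* Hypothetical helper: report whether a URI scheme was compiled in. */
static int uri_scheme_enabled(const char *scheme)
{
	assert(scheme != NULL);
#ifdef URI_SUPPORT_TFTM
	if (strcmp(scheme, "x-tftm") == 0)
		return 1;
#endif
#ifdef URI_SUPPORT_SLAM
	if (strcmp(scheme, "x-slam") == 0)
		return 1;
#endif
#ifdef URI_SUPPORT_NFS
	if (strcmp(scheme, "nfs") == 0)
		return 1;
#endif
	return 0;
}
```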

>These URIs can be extended to allow specification of non-standard UDP ports 
>or another server's IP (no DNS [for now?], so no hostnames). For TFTM, 
>servername works fine (don't know if it's even required to be present !? - no
>matter, bugfixing tftm will be necessary for some time)

Please publish a specification of the URIs. Do you take the TFTP server
from only the DHCP siaddr field?  I don't want a redundant specification
of a different server when they should be using next-server in
dhcpd.conf if they want a different TFTP server.

Hi list,

after some days of hard coding (and several times hanging around searching 
for typos) I finally managed to boot my first kernel image with
TRIVIAL FILE TRANSFER PROTOCOL MULTICAST MODE
or what my dhcpd.conf says
	filename "x-tftm://172.16.75.1/vmlinuz.ltsp";
this morning, 10:39 GMT.

I got the latest sources and made up a diff file against them - I'm not sure 
if all that stuff compiles, or if it breaks anything for you, but apply that 
patch to the sources in etherboot 5.1 -not forgetting the backup option of 
"patch" - and copy proto_tftm.c to that dir, so you get the following new 
possibilities in "Config":

-DURI_SUPPORT
	you will need this anyway
-DURI_SUPPORT_FILE
	so you can use filenames like "file:/disk/0" and so on.
-DURI_SUPPORT_TFTP
	so "tftp:///filename" will be recognised.
-DURI_SUPPORT_NFS
	as well
-DURI_SUPPORT_SLAM
	so "x-slam://" URIs are recognised. Ask Eric how far his slam code went, you
	probably want to set -DDOWNLOAD_PROTO_SLAM too.
-DURI_SUPPORT_TFTM
	so "x-tftm://server-ip/filename" can be recognised. You will also have to
	set the -DDOWNLOAD_PROTO_TFTM option for this to work.

These URIs can be extended to allow specification of non-standard UDP ports 
or another server's IP (no DNS [for now?], so no hostnames). For TFTM, 
servername works fine (don't know if it's even required to be present !? - no 
matter, bugfixing tftm will be necessary for some time)

I didn't remove a "private" option from the code, as I want to use it. It was 
discussed earlier in this list and is called -DSILENTFORSPLASH....

As CVS'ing does not really work fine here, please Ken or Eric have a check of 
it and if ok take it into the 5.1 tree. Mostly it is the proto_tftm.c file 
plus some changes in main.c for the URI support and some small snippets of 
code without which etherboot5.1 wouldn't compile at all (?).

BTW: Is there any *tested* and *stable* Tftp-server out there that supports 
multicast, blocksize and tsize (as my code requires all three options)?
I will have to adapt a lot anyhow, as timeouts and so need a touch, but I 
would like to keep it mostly at this skeleton. If there is nothing really 
usable out there, I will have to hand out my own server, but it's ugly coded 
and miles away from being "ready" - it even has no timeouts yet, but handles 
several clients on the same file, which was most important.

Following the philosophy of "Release early, patch often...":
Comments welcome.

Anselm

P.S.:
As the mail went too big with the two files attached, I will upload them. You 
should - let's hope German Telekom doesn't again kill that leased line - be 
able to fetch them from
ftp://ftp@feldhaus.hn.org/anselm/ or http://feldhaus.hn.org/anselm/

Fotis Andritsopoulos <fa...@te...> writes:

> >
> >
> >So occasionally it just times out?
> >Hmm. I would really try this with another tftp client and verify that
> >this isn't a server bug.
> >
> >Do you see retransmits from either the client or the server
> >
> 
> The retransmissions come from the tftpd server because etherboot does
> not send an ACK. However, I solved the problem in an informal way. I set a
> breakpoint at the point where the tftp function reads the nic.packet struct
> 
>         tr = (struct tftp_t *)&nic.packet[ETH_HLEN];
> 
> I realized that even while it waits for a DATA block (or an OACK), the tr struct
> gets data from broadcast MAC addresses (in the tftp function). Thus, the tftp
> function checks this packet for an OACK or DATA field and it fails. It is not very
> clear to me if it is right at this point to find data with broadcast
> addresses. So, at the beginning of the tftp function I reconfigure the cs89x0
> chip to process only packets with individual MAC addresses. Therefore, in the
> driver I use the

No it shouldn't deal with broadcast addresses.  But how does the check
for the appropriate tftp port fail?  We should also check in software
the ip address, and the mac address (We don't need the NIC to do it).
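
A software check of that kind might look roughly like the following. This is only a sketch under stated assumptions: udp_frame_is_for_us() is a hypothetical function, not the await_reply code, and it assumes a plain Ethernet II frame carrying IPv4 and UDP with no VLAN tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define ETH_HLEN	14

/*
 * Hypothetical check, not the Etherboot await_reply code.  Returns 1 only
 * for an IPv4/UDP frame addressed to our unicast MAC, our IP address and
 * the UDP port our TFTP transfer is using.  IP and port are passed in
 * host byte order.
 */
static int udp_frame_is_for_us(const uint8_t *frame, size_t len,
			       const uint8_t my_mac[ETH_ALEN],
			       uint32_t my_ip, uint16_t my_port)
{
	const uint8_t *ip, *udp;
	size_t ihl;
	uint32_t dst_ip;
	uint16_t dst_port;

	assert(frame != NULL && my_mac != NULL);
	if (len < ETH_HLEN + 20 + 8)
		return 0;			/* too short */
	if (memcmp(frame, my_mac, ETH_ALEN) != 0)
		return 0;			/* broadcast or someone else */
	if (frame[12] != 0x08 || frame[13] != 0x00)
		return 0;			/* not IPv4 */

	ip = frame + ETH_HLEN;
	ihl = (size_t)(ip[0] & 0x0f) * 4;	/* IP header length in bytes */
	if ((ip[0] >> 4) != 4 || ihl < 20 || len < ETH_HLEN + ihl + 8)
		return 0;
	if (ip[9] != 17)
		return 0;			/* not UDP */
	dst_ip = ((uint32_t)ip[16] << 24) | ((uint32_t)ip[17] << 16) |
		 ((uint32_t)ip[18] << 8) | ip[19];
	if (dst_ip != my_ip)
		return 0;

	udp = ip + ihl;
	dst_port = (uint16_t)((udp[2] << 8) | udp[3]);
	return dst_port == my_port;
}
```

Because the filtering happens in software, the NIC can keep accepting broadcast frames for the protocols that need them.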

But I'm curious how it got through the checks in await_reply.

In 5.1.2+ I have cleaned this up a little more, and I think I may have
actually implemented the check for the mac address, and the ip
address.  I know I noticed they were missing and implemented them
on another protocol I was working on.

Hmm.  I wonder if that is a bug in -DCONGESTED that it doesn't
retransmit ACKs when it receives a duplicate DATA packet.

Eric

>The retransmissions come from the tftpd server because 
>etherboot does not send an ACK. However, I solved the problem in an 
>informal way. I set a breakpoint at the point where the tftp function 
>reads the nic.packet struct
>
>        tr = (struct tftp_t *)&nic.packet[ETH_HLEN];
>
>I realized that even while it waits for a DATA block (or an OACK), the tr 
>struct gets data from broadcast MAC addresses (in the tftp function). 
>Thus, the tftp function checks this packet for an OACK or DATA field and 
>it fails. It is not very clear to me if it is right at this point to find 

But in this case it should just throw away the packet and wait for
another packet.  I don't know what your logic looks like but maybe you
should look at it again.

>data with broadcast addresses. So, at the beginning of the tftp function 
>I reconfigure the cs89x0 chip to process only packets with individual 
>MAC addresses. Therefore, in the driver I use the
>
>#define DEF_RX_ACCEPT (RX_IA_ACCEPT | RX_BROADCAST_ACCEPT | RX_OK_ACCEPT)
>
>and in the tftp function I use the
>
>#define DEF_RX_ACCEPT_AFTER (RX_IA_ACCEPT | RX_OK_ACCEPT)
>
>to reconfigure the chip, so that all the packets that will be processed 
>will have as destination only the MAC address of the development board. 
>The problem is solved but I don't think that this is the right way.

This will work but is not the ideal solution. Although I have not seen
one, it is possible for bootp servers to reply by broadcast, if they are
not able to create raw packets or inject an entry into the ARP cache. So
Etherboot has to accept broadcast also.

>
>
>So occasionally it just times out?
>Hmm. I would really try this with another tftp client and verify that
>this isn't a server bug.
>
>Do you see retransmits from either the client or the server
>

The retransmissions come from the tftpd server because 
etherboot does not send an ACK. However, I solved the problem in an 
informal way. I set a breakpoint at the point where the tftp function 
reads the nic.packet struct

        tr = (struct tftp_t *)&nic.packet[ETH_HLEN];

I realized that even while it waits for a DATA block (or an OACK), the tr 
struct gets data from broadcast MAC addresses (in the tftp function). 
Thus, the tftp function checks this packet for an OACK or DATA field and 
it fails. It is not very clear to me if it is right at this point to find 
data with broadcast addresses. So, at the beginning of the tftp function 
I reconfigure the cs89x0 chip to process only packets with individual 
MAC addresses. Therefore, in the driver I use the

#define DEF_RX_ACCEPT (RX_IA_ACCEPT | RX_BROADCAST_ACCEPT | RX_OK_ACCEPT)

and in the tftp function I use the

#define DEF_RX_ACCEPT_AFTER (RX_IA_ACCEPT | RX_OK_ACCEPT)

to reconfigure the chip, so that all the packets that will be processed 
will have as destination only the MAC address of the development board. 
The problem is solved but I don't think that this is the right way.

Fotis Andritsopoulos

-- 
"Whom ever Controls your Perception of Reality Controls You"

>Guess I'll have to rework dhcpd.conf so that it can cope with both
>versions.  This is doable but it's just incredibly irritating: the text
>version was discussed on the mailing list, was implemented by me without
>any serious objections, has remained unaltered for the past month and a
>half, made it into RC1 and then gets changed at the last minute.  I'd have
>appreciated it if you'd warned that you were planning this change.

I'm sorry about that. Due to lack of time on my part I hadn't actually
looked at your code before I released RC1. I was under the impression
that the binary representation had carried the day.

There are reasons why the binary representation is better. If you do man
dhcp-eval you will find functions for extracting integer sized fields
from the packet. There are no functions for going from text to integer.
Essentially DHCP data IS binary, and it's simply unthinking prejudice to
insist on human readability. There are no humans to read the request
packets and even if you used tcpdump you would pipe it to dhcpdump for
decoding.

With the binary conversion function you can write range tests to, say,
include 3com NICs from 0x9050 to 0x9060. Such a test is very cumbersome
to write for text fields. Straight equality tests can simply test
against

	1:10:b7:90:50

as compared to

	"PCI:10b7:9050"

so that's equally easy.

Another reason is the text representation has redundancy.  Because you
chose %hx, it's lowercase but I expect someone would in the future try
to write "PCI:10B7:9050" and wonder why it doesn't work, and then we
would have to have an FAQ. With colon separated hex, it doesn't matter,
it's binary underneath.
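
For illustration, a five-byte binary ID and the range test described above could be sketched as follows. The function names, NIC_ID_LEN and NIC_BUS_PCI are hypothetical. The byte order (high byte first) is an assumption taken from the colon example in this message; another message in the thread shows the bytes of a similar ID low byte first, so the real wire format should be checked against the code.

```c
#include <assert.h>
#include <stdint.h>

#define NIC_ID_LEN	5
#define NIC_BUS_PCI	1	/* leading "1" in the colon example above */

/*
 * Hypothetical encoder.  Byte order is an assumption (high byte first);
 * the actual Etherboot layout may differ.
 */
static void nic_id_encode(uint8_t out[NIC_ID_LEN], uint8_t bus,
			  uint16_t vendor, uint16_t device)
{
	assert(out != 0);
	out[0] = bus;
	out[1] = (uint8_t)(vendor >> 8);
	out[2] = (uint8_t)(vendor & 0xff);
	out[3] = (uint8_t)(device >> 8);
	out[4] = (uint8_t)(device & 0xff);
}

/* Range test on the decoded device field: is it within [lo, hi]? */
static int nic_id_device_in_range(const uint8_t id[NIC_ID_LEN],
				  uint8_t bus, uint16_t vendor,
				  uint16_t lo, uint16_t hi)
{
	uint16_t v, d;

	assert(id != 0);
	v = (uint16_t)((id[1] << 8) | id[2]);
	d = (uint16_t)((id[3] << 8) | id[4]);
	return id[0] == bus && v == vendor && d >= lo && d <= hi;
}
```

The same comparison on the text form would need the hex digits parsed first, which is the cumbersome step the message refers to.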

On Wed, 3 Jul 2002, Ken Yap wrote:
> >Forgot to mention: I've also burned about 200 copies of the RC1 code into
> >silicon for various customers, since I was happy that RC1 worked
> >sufficiently well for my needs.  It's going to cause me and them a lot of
> >hassle if I have to replace 200 EPROMs because of a change that broke
> >backwards compatibility between RC1 and final release.
> This is always going to be an issue for versions. Again, nothing is
> official until it's non-RC. You can always keep a patch file around and
> stick to the text version.

Guess I'll have to rework dhcpd.conf so that it can cope with both
versions.  This is doable but it's just incredibly irritating: the text
version was discussed on the mailing list, was implemented by me without
any serious objections, has remained unaltered for the past month and a
half, made it into RC1 and then gets changed at the last minute.  I'd have
appreciated it if you'd warned that you were planning this change.

Michael

>Forgot to mention: I've also burned about 200 copies of the RC1 code into
>silicon for various customers, since I was happy that RC1 worked
>sufficiently well for my needs.  It's going to cause me and them a lot of
>hassle if I have to replace 200 EPROMs because of a change that broke
>backwards compatibility between RC1 and final release.

This is always going to be an issue for versions. Again, nothing is
official until it's non-RC. You can always keep a patch file around and
stick to the text version.

>A couple of issues:
>
>1. the "nic_id" struct now contains the length of the
>   etherboot-encapsulated-options packet, which does not belong in this
>   struct since it is unrelated to the NIC id.

Originally it wasn't in the structure but to make sure the structure is
well aligned, I prepended it. It also saves copying. This is only an
implementation detail. I agree it's ugly.

>2. These changes break backwards compatibility for the sake of 8 bytes
>   saved in the DHCP request packet (which is not really size-critical).
>   I know that the Mandrake distribution has already included related code
>   that depends on the old-format NIC ID scheme present in RC1.

Mandrake should not be using test distributions. Everything is subject
to change until the official release. I have no sympathy for Mandrake.

It's not packet size I care so much about as code size.

>I really think that we should revert to the old plain-text method, where
>the IDs were strings such as "PCI:1186:1300" rather than a binary sequence
>0x01,0x86,0x11,0x00,0x13.  Aside from anything else, it makes a real mess
>of dhcpd.conf files to start manipulating binary structures like this. :-(

But you'd have to do the same thing for other binary structures in
dhcpd.conf. Can't you use colon notation?

Fotis Andritsopoulos <fa...@te...> writes:

> Eric W. Biederman wrote:
> 
> >>So congratulations on the first non-x86 port are in order.
> >>
> Actually, it is not a port of the whole etherboot to TriCore :) We tried to
> "hack" the code in order to get a working version of etherboot for our
> needs. Our main task is to port the Linux kernel for the TriCore
> architecture. Now we want a fast way to download the kernel and debug it because
> the JTAG is very slow. Thus, we have ported only the part of the etherboot that
> refers to the ELF and the cs89x0 driver (because our development board has this
> chip). I noticed that there is a mistake in the cs89x0.h file. You have defined

Cool so you should generate a kernel image with valid physical
addresses...  This drives me nuts about the x86, and alpha ports.

> >>Beyond that my best suggestion is to use tcpdump and get a packet
> >>trace from another machine on the same network segment.  That and to
> >>verify that the tftp transfer works from another client machine,
> >> plugged into the same network port.
> All the transactions over the network seem to be fine. The problem is that
> sometimes the tftp process of the etherboot does not "read" the data of one
> block and the tftp server fails, because it never receives an ACK. Notice that
> the block at which the process fails is not the same every time.
> 
> >>If you are timing out I would suggest defining CONGESTED and see if
> >>retransmissions help.  It could be just that your network loses
> >>some packets.
> >>
> I tried to use -DCONGESTED but the result remains the same. Also, I tried to
> connect through a hub *only* the PC that runs the tftpd server and the
> development board with the tricore but the problem persists.

Hmm.  At the protocol level:

Until an ack is seen a proper server will retransmit the DATA packet,
or give up once a maximum retry count is reached.

Until the next data packet is seen a proper client (-DCONGESTED) will
retransmit the ack, or give up once a maximum retry count is reached.
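
The client half of that rule can be written as a small decision function. This is a sketch only: tftp_rx_step() and its types are hypothetical and are not how -DCONGESTED is implemented. It assumes block numbers start at 1 and that a timeout before the first DATA block means the request itself should be resent.

```c
#include <assert.h>

enum tftp_action { SEND_ACK, RESEND_ACK, IGNORE, GIVE_UP };

struct tftp_rx_state {
	unsigned int expected;	/* next DATA block we want (starts at 1) */
	unsigned int retries;	/* consecutive retransmissions so far */
	unsigned int max_retries;
};

/*
 * Hypothetical receive-side rule, not the Etherboot implementation.
 * Call with got_packet = 0 on a timeout, otherwise with the block number
 * of the DATA packet that arrived.
 */
static enum tftp_action tftp_rx_step(struct tftp_rx_state *st,
				     int got_packet, unsigned int block)
{
	assert(st != 0);
	if (got_packet && block == st->expected) {
		st->expected++;		/* new data: accept and acknowledge */
		st->retries = 0;
		return SEND_ACK;
	}
	if (got_packet && block + 1 != st->expected)
		return IGNORE;		/* neither new nor the last block */

	/* Timeout, or a duplicate of the last block: our ACK was probably
	 * lost, so send it again unless we have tried too often. */
	if (st->retries >= st->max_retries)
		return GIVE_UP;
	st->retries++;
	return RESEND_ACK;
}
```

A client that never returns RESEND_ACK for a duplicate DATA block would leave the server retransmitting until it gives up, which is one failure mode a packet trace can confirm or rule out.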

> >>Alternatively it could be a driver bug where it either drops packets
> >>being transmitted or received.
> >>
> How can I check this?

My best guess is watch the network traffic and see what the actual
failure mode is.

> >>My best guess is that playing with the timeout simply varies when
> >>another problem is detected.
> >>
> >>Anyway if you can pinpoint where in a tftp transfer the code is
> >>failing we may be able to point you in some productive directions.
> >>
> It just does not recognize a block of data and the server times out. Nothing more! :(

So occasionally it just times out?
Hmm. I would really try this with another tftp client and verify that
this isn't a server bug.

Do you see retransmits from either the client or the server?

> >>Another possibility is that the image you are loading is overwriting
> >> part of etherboot.
> No. Because the GNU tools that we use do not support the PIC option, the
> executables that are produced are not relocatable. Thus, we build the etherboot
> binary in such a way that it is loaded "near the end" of the SDRAM while the
> kernel image has been built for an address at the start of the SDRAM. If the
> tftp transfers the whole kernel the Linux works fine (we use a serial console
> for debugging purposes). I'm sure that there are no such conflicts.

Sounds good.  Now that possibility can be ruled out :)

Eric

ke...@us... (Ken Yap) writes:
> 
> >Why did you decide to use polling instead of interrupts? The cs89x0 driver 
> >uses polling but I didn't check the rest of the drivers.
> 
> All of Etherboot uses polling. As explained in the history, this is the
> way the original was designed, and this design aspect has not been
> changed.  In practice, there isn't anything wrong with polling. If you
> are thinking you are missing packets due to polling, all the protocols
> involved are synchronous so interrupts don't help there.  Polling makes
> the drivers easier to write and debug, especially with a system just
> booted from raw metal---try debugging an asynchronous system someday.
> The drawback is that it makes callbacks hard.

There is a very real advantage in initial system bring up in that
you don't need to have interrupt mapping from the pci interrupt
to system interrupt numbers.  On some systems it is easy, on others it
is hard; the only constant is that the necessary code varies widely from
system to system.  I have had multiple occasions when bringing up
LinuxBIOS on x86 systems where etherboot works, but the kernel can't
get interrupts working.
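
The polling model needs nothing from the platform beyond the ability to call into the driver, which a small sketch makes visible. Everything here is hypothetical (poll_fn, poll_for_packet(), struct fake_nic); a real loop would compare against a timer tick rather than count iterations.

```c
#include <assert.h>

/* Hypothetical driver hook: returns nonzero when a packet is ready. */
typedef int (*poll_fn)(void *nic);

/*
 * Busy-wait for a packet by calling the driver's poll routine, giving up
 * after max_polls attempts.  No interrupt routing is involved at all.
 */
static int poll_for_packet(poll_fn poll, void *nic, unsigned long max_polls)
{
	unsigned long i;

	assert(poll != 0);
	for (i = 0; i < max_polls; i++) {
		if (poll(nic))
			return 1;	/* packet received */
	}
	return 0;			/* timed out */
}

/* Stand-in NIC for illustration: becomes ready after a number of polls. */
struct fake_nic {
	unsigned long polls_until_ready;
};

static int fake_poll(void *nic)
{
	struct fake_nic *f = nic;

	if (f->polls_until_ready == 0)
		return 1;
	f->polls_until_ready--;
	return 0;
}
```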

Eric

On Tue, 2 Jul 2002, Michael Brown wrote:
> > I have checked into CVS:
> > For 5.0.7 and 5.1.2 candidates:
> > Changes to NIC identification scheme from variable length string to
> > fixed binary structure. Added ISA IDs to drivers.
> > Please sync your working copy with cvs update.
> A couple of issues:
> 1. the "nic_id" struct now contains the length of the
>    etherboot-encapsulated-options packet, which does not belong in this
>    struct since it is unrelated to the NIC id.
> 2. These changes break backwards compatibility for the sake of 8 bytes
>    saved in the DHCP request packet (which is not really size-critical).
>    I know that the Mandrake distribution has already included related code
>    that depends on the old-format NIC ID scheme present in RC1.
> I really think that we should revert to the old plain-text method, where
> the IDs were strings such as "PCI:1186:1300" rather than a binary sequence
> 0x01,0x86,0x11,0x00,0x13.  Aside from anything else, it makes a real mess
> of dhcpd.conf files to start manipulating binary structures like this. :-(

Forgot to mention: I've also burned about 200 copies of the RC1 code into
silicon for various customers, since I was happy that RC1 worked
sufficiently well for my needs.  It's going to cause me and them a lot of
hassle if I have to replace 200 EPROMs because of a change that broke
backwards compatibility between RC1 and final release.

Michael Brown
http://www.fensystems.co.uk
--
Fen Systems: Linux made easy for schools

On Tue, 2 Jul 2002 ke...@us... wrote:
> I have checked into CVS:
> For 5.0.7 and 5.1.2 candidates:
> Changes to NIC identification scheme from variable length string to
> fixed binary structure. Added ISA IDs to drivers.
> Please sync your working copy with cvs update.

A couple of issues:

1. the "nic_id" struct now contains the length of the
   etherboot-encapsulated-options packet, which does not belong in this
   struct since it is unrelated to the NIC id.

2. These changes break backwards compatibility for the sake of 8 bytes
   saved in the DHCP request packet (which is not really size-critical).
   I know that the Mandrake distribution has already included related code
   that depends on the old-format NIC ID scheme present in RC1.

I really think that we should revert to the old plain-text method, where
the IDs were strings such as "PCI:1186:1300" rather than a binary sequence
0x01,0x86,0x11,0x00,0x13.  Aside from anything else, it makes a real mess
of dhcpd.conf files to start manipulating binary structures like this. :-(

Michael Brown
http://www.fensystems.co.uk
--
Fen Systems: Linux made easy for schools

Ken Yap wrote:
> Hmm, I don't have a data sheet to double check this. Maybe Markus
> Gutschke, who wrote the driver, can comment.

I left all the manuals for this chip in Germany, so I can't check on what these 
flags should be set to. Recent Linux kernel sources seem to agree with you, 
though. So, this might very well be a bug in etherboot. Ken, you should probably 
change this value to 0xC0 so that it is the same as the one used by the Linux 
kernel.

I guess, the reason why we got away with the old value is that it was probably 
interpreted as starting to transmit the packet after the first 381 bytes. As 
long as we delivered the remaining bytes fast enough (or as long as transmitted 
packets were small) this would still work. Both assumptions are probably true 
for most of the data that etherboot sends.
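
A minimal sketch of that reasoning, assuming the transmit-start threshold is
a two-bit field in bits 6-7 of the command word (an assumption that is
consistent with the values in this thread, but should be checked against the
data sheet). The macro and function names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define TX_START_MASK       0x00c0u  /* assumed: two-bit field, bits 6-7 */
#define TX_AFTER_ALL_OLD    0x0060u  /* value reported in cs89x0.h */
#define TX_AFTER_ALL_FIXED  0x00c0u  /* value proposed in this thread */

/* Extract the assumed transmit-start field from a command word. */
static uint16_t tx_start_field(uint16_t txcmd)
{
    return (uint16_t)(txcmd & TX_START_MASK);
}

/* True when both field bits are set, i.e. wait for the whole frame. */
static int tx_waits_for_whole_frame(uint16_t txcmd)
{
    return tx_start_field(txcmd) == TX_START_MASK;
}

/* Worked check of the two values discussed above. */
static void tx_start_selfcheck(void)
{
    /* 0x0060 only sets the lower field bit (0x0040), so it selects an
     * earlier start threshold rather than "after all bytes". */
    assert(tx_start_field(TX_AFTER_ALL_OLD) == 0x0040u);
    assert(!tx_waits_for_whole_frame(TX_AFTER_ALL_OLD));
    assert(tx_waits_for_whole_frame(TX_AFTER_ALL_FIXED));
}
```

Under this assumption the stray bit 5 in 0x0060 lies outside the threshold
field, so what it does would depend on the chip's definition of that bit.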

Markus

-- 
Markus Gutschke
3637 Fillmore Street #106
San Francisco, CA 94123-1600
+1-415-567-8449
ma...@gu...

I have checked into CVS:

For 5.0.7 and 5.1.2 candidates:

Changes to NIC identification scheme from variable length string to
fixed binary structure. Added ISA IDs to drivers.

For 5.0.7 candidate:

Some portability and code cleanups backported from 5.1.2 candidate.

Please sync your working copy with cvs update.

>development board has this chip). I noticed that there is a mistake in 
>the cs89x0.h file. You have defined
>
>#define TX_AFTER_ALL    0x0060       /*  Tx packet after all bytes copied */
>
>but I think that it should be
>
>#define TX_AFTER_ALL    0x00c0

Hmm, I don't have a data sheet to double check this. Maybe Markus
Gutschke, who wrote the driver, can comment.

>Why you decided to use poll instead of interrupts ? The cs89x0 drivers 
>uses poll but I didn't check the rest of the drivers.

All of Etherboot uses polling. As explained in the history, this is the
way the original was designed, and this design aspect has not been
changed.  In practice, there isn't anything wrong with polling. If you
are thinking you are missing packets due to polling, all the protocols
involved are synchronous so interrupts don't help there.  Polling makes
the drivers easier to write and debug, especially with a system just
booted from raw metal---try debugging an asynchronous system someday.
The drawback is that it makes callbacks hard.
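
As a rough illustration of the polling style described above (not the actual
Etherboot code), a wait loop might look like the sketch below. The function
pointer types, the fake device and the fake clock are all hypothetical and
exist only to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver hook: returns nonzero when a packet is ready. */
typedef int (*poll_fn)(void *dev);
/* Hypothetical tick source, e.g. a free-running hardware counter. */
typedef unsigned long (*ticks_fn)(void);

/* Busy-wait until poll() reports a packet or `timeout` ticks elapse.
 * Returns 1 if a packet arrived, 0 on timeout. */
static int await_packet(poll_fn poll, void *dev,
                        ticks_fn now, unsigned long timeout)
{
    assert(poll != NULL && now != NULL);
    unsigned long start = now();
    do {
        if (poll(dev))
            return 1;
    } while (now() - start < timeout);
    return 0;
}

/* Simulated device: a packet "arrives" on the n-th poll. */
struct fake_nic { int polls_until_packet; };

static int fake_poll(void *dev)
{
    struct fake_nic *nic = dev;
    return --nic->polls_until_packet <= 0;
}

/* Simulated clock: advances by one tick on every read. */
static unsigned long fake_clock;
static unsigned long fake_ticks(void) { return fake_clock++; }
```

Because the request/response protocols involved wait for one reply at a
time, the caller simply sends a packet and then sits in a loop like this
until the answer or a timeout arrives; there is no second flow of control
to synchronise with.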

>All the transactions over the network seem to be fine. The problem is 
>that sometimes the tftp process of the etherboot does not "read" the 
>data of one block and the tftp server fails, because it never receives 
>an ACK. Notice that the block that the process fails is not the same for 
>all the times.

According to the RFC, in this case the server is supposed to timeout and
resend the packet. It's not supposed to start a new session.  Things to
check: Are you sure that you are generating and checking for a unique
XID in the DHCP packet? That's how the client knows if the DHCP reply is
meant for it. Have you checked that you are using the same tftp session
that's offered by the server? There is some subtlety in the way the
client switches the port number after receiving the ACK and starting the
transfer, read the TFTP RFC carefully.  If you reimplemented this part
of the code yourself, you might have missed this subtlety.
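
The port-switching subtlety can be sketched as follows. Per RFC 1350 the
read request goes to the well-known port 69, the server answers from a port
of its own choosing, and the client must use that port for the rest of the
transfer. The struct and function names are hypothetical, not Etherboot's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TFTP_WELL_KNOWN_PORT 69  /* the initial RRQ is sent here */

/* Hypothetical per-transfer record of the server's port. */
struct tftp_peer_example {
    uint16_t server_port;  /* where ACKs must be sent */
    int      latched;      /* nonzero once the server's port is known */
};

/* Call before sending each new RRQ. */
static void tftp_peer_init(struct tftp_peer_example *p)
{
    assert(p != NULL);
    p->server_port = TFTP_WELL_KNOWN_PORT;
    p->latched = 0;
}

/* Call for each packet received on the client's port.
 * Returns 1 if the packet belongs to this transfer, 0 if it comes
 * from a different source port and should not be treated as data. */
static int tftp_peer_accept(struct tftp_peer_example *p, uint16_t src_port)
{
    assert(p != NULL);
    if (!p->latched) {
        /* First reply: remember the port the server chose. */
        p->server_port = src_port;
        p->latched = 1;
        return 1;
    }
    return src_port == p->server_port;
}
```

An implementation that keeps sending ACKs to port 69, or that accepts
packets from a stale port left over from an earlier attempt, can appear to
work for a while and then stall.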

Eric W. Biederman wrote:

>>So congratulations on the first non-x86 port are in order.
>>
Actually, it is not a port of the whole etherboot to TriCore :) We tried 
to "hack" the code in order to get a working version of etherboot for 
our needs. Our main task is to port the Linux kernel for the TriCore 
architecture. Now we want a fast way to download the kernel and debug it 
because the JTAG is very slow. Thus, we have ported only the part of the 
etherboot that refers to the ELF and the cs89x0 driver (because our 
development board has this chip). I noticed that there is a mistake in 
the cs89x0.h file. You have defined

#define TX_AFTER_ALL    0x0060       /*  Tx packet after all bytes copied */

but I think that it should be

#define TX_AFTER_ALL    0x00c0

>>Beyond this, it could be that there is a driver bug.  Drivers being
>>more susceptible than the core to differences in the hardware.
>>
Why did you decide to use polling instead of interrupts? The cs89x0 driver 
uses polling, but I didn't check the rest of the drivers.

>>With respect to currticks you will run into problems if it rolls over
>>(overflows).  Though it looks like you are probably o.k.
>>
Exactly.
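
For reference, a common way to make tick comparisons survive a rollover is
to subtract first and compare the difference. The sketch below assumes a
free-running 32-bit counter purely for illustration (the timer described in
this thread has a different width), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result stays correct
 * across a single wrap of the counter. */
static uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}

/* Rollover-safe timeout test. */
static int timed_out(uint32_t start, uint32_t now, uint32_t timeout)
{
    return ticks_elapsed(start, now) >= timeout;
}

/* Fragile form for comparison: the deadline itself can wrap. */
static int timed_out_naive(uint32_t start, uint32_t now, uint32_t timeout)
{
    return now >= (uint32_t)(start + timeout);
}

/* Worked example close to the wrap point. */
static void rollover_demo(void)
{
    uint32_t start = 0xFFFFFFF0u;

    /* 8 ticks later, timeout of 0x20 ticks: not yet expired. */
    assert(!timed_out(start, 0xFFFFFFF8u, 0x20u));
    /* The naive form wraps its deadline to 0x10 and fires early. */
    assert(timed_out_naive(start, 0xFFFFFFF8u, 0x20u));
    /* After the counter wraps, 0x20 ticks have really elapsed. */
    assert(ticks_elapsed(start, 0x10u) == 0x20u);
    assert(timed_out(start, 0x10u, 0x20u));
}
```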

>>The load_timer2  logic is also suspect.  I don't suppose you
>>have an x86 compatible timer do you?
>>
The original cs89x0 driver from the etherboot distribution does not use 
load_timer2, so we don't use it either. For currticks we use a 55-bit timer 
with a period of 2 ns.

>>Beyond that my best suggestion is to use tcpdump and get a packet
>>trace from another machine on the same network segment.  That and to
>>verify that the tftp transfer works from another client machine,
>>plugged into the same network port. 
>>
All the transactions over the network seem to be fine. The problem is 
that sometimes the tftp process of the etherboot does not "read" the 
data of one block and the tftp server fails, because it never receives 
an ACK. Note that the block at which the process fails is not the same 
each time.

>>If you are timing out I would suggest defining CONGESTED and see if
>>retransmissions help.  It could be just that your network loses
>>some packets.
>>
I tried to use -DCONGESTED but the result remains the same. I also 
tried connecting *only* the PC that runs the tftpd server and the TriCore 
development board through a hub, but the problem persists.

>>Alternatively it could be a driver bug where it either drops packets
>>being transmitted or received.
>>
How can I check this?

>>My best guess is that playing with the timeout simply varies when
>>another problem is detected.
>>
>>Anyway if you can pinpoint where in a tftp transfer the code is
>>failing we may be able to point you in some productive directions.
>>
It just does not recognize a block of data, and the server times out. 
Nothing more! :(

>>Another possibility is that the image you are loading is overwriting
>>part of etherboot. 
>>
No. Because the GNU tools that we use do not support the PIC option, 
the executables that are produced are not relocatable. Thus, we build 
the etherboot binary so that it is loaded near the end of the SDRAM, 
while the kernel image is built for an address at the start of the 
SDRAM. If the tftp transfers the whole kernel, Linux works fine (we use 
a serial console for debugging purposes). I'm sure that there are no 
such conflicts.

Fotis Andritsopoulos

>>
>>
>>Eric
>>
>>
>>-------------------------------------------------------
>>This sf.net email is sponsored by:ThinkGeek
>>Welcome to geek heaven.
>>http://thinkgeek.com/sf
>>_______________________________________________
>>Etherboot-developers mailing list
>>Eth...@li...
>>https://lists.sourceforge.net/lists/listinfo/etherboot-developers
>>

-- 
"Whom ever Controls your Perception of Reality Controls You"

Ken Yap wrote:

>
>more evidence, snoop on packets, put in printf statements, etc.
>
I'm using ethereal to sniff the network and everything looks fine. The 
only strange thing that I noticed is that sometimes, in the tftp 
function, the variable prevblock still holds the value from the previous 
session while the variable block belongs to the current one. For example,
a) the tftp starts downloading the image and stops after some 
transactions (e.g. at [block] 100, so [prevblock] = 100)
b) the tftp session restarts and sends a new RRQ message ([block] = 1)
Now [prevblock] has the value 100 while [block] has the value zero or 
one. When the tftp checks whether [prevblock+1] is equal to [block], the 
check fails. I think that the variables [block] and [prevblock] should be 
stored in the tftp structure so that each tftp session has separate 
values. (Am I wrong?) However, I understand that this is not the "main" 
cause of the timeouts that I previously described.
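
A minimal sketch of the suggestion above, keeping the block counter in a
per-session structure that is reset whenever a new RRQ is sent. The struct
and function names are hypothetical and do not come from the Etherboot
source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-session state; one instance per tftp transfer. */
struct tftp_session_example {
    uint16_t prevblock;  /* last block accepted in this session */
};

/* Call whenever a new RRQ is sent, so that no stale counter from an
 * aborted transfer survives into the new session. */
static void tftp_session_start(struct tftp_session_example *s)
{
    assert(s != NULL);
    s->prevblock = 0;
}

/* Returns 1 if `block` is the next expected block and records it.
 * Returns 0 for duplicates or out-of-order blocks, leaving the
 * session state unchanged. */
static int tftp_session_accept(struct tftp_session_example *s,
                               uint16_t block)
{
    assert(s != NULL);
    if (block != (uint16_t)(s->prevblock + 1))
        return 0;
    s->prevblock = block;
    return 1;
}
```

In the scenario described, a session that stopped at block 100 and was not
reset would reject block 1 of the restarted transfer; calling the start
function with each RRQ avoids that.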

>Is the Infineon an x86 architecture or are you the first to port it to a
>
No, it is not an x86 architecture. However, it is mysterious, because 
etherboot does download the image, but only after 5-6 retries...

Fotis Andritsopoulos

>
>different architecture? If so, watch out for latent structure alignments
>and byte order bugs. It should be interesting.
>
>

-- 
"Whom ever Controls your Perception of Reality Controls You"

etherboot-developers — Discussion of developer issues