Thread: [Etherboot-developers] gPXE/Bootix incompatibility
From: Alex Z. <ale...@eu...> - 2010-03-24 12:02:53
|
Hi all,

We've found that gPXE clients cannot boot successfully from a Bootix server. The problem appears to be related to the cached filenames that gPXE saves and returns to the NBP in response to a PXENV_GET_CACHED_INFO call. There are 3 such filenames:

* from the last DHCP discovery (CACHED_INFO_DHCPDISCOVER)
* from the last DHCP ACK (CACHED_INFO_DHCPACK)
* from the last PXE request (CACHED_INFO_BINL)

Now, when the Bootix NBP calls PXENV_GET_CACHED_INFO it doesn't provide a buffer. As a result, gPXE returns a pointer to cached_info[CACHED_INFO_BINL] rather than copying this struct into a caller-provided buffer.

When the Bootix NBP calls PXENV_TFTP_READ_FILE, gPXE updates the filenames in cached_info[CACHED_INFO_DHCPACK] and cached_info[CACHED_INFO_BINL]. According to the comments, it does this because some Intel PXE implementation did so and, even though this is a bug, NTLDR depends on it.

The problem is that the Bootix NBP has a pointer to cached_info[CACHED_INFO_BINL] and is using it to store the name of the next file to get!

Perhaps the Bootix NBP shouldn't be doing this... but we've found that if we make PXENV_TFTP_READ_FILE only update the filename in cached_info[CACHED_INFO_DHCPACK] and leave the filename in cached_info[CACHED_INFO_BINL] untouched, then the boot succeeds. (See patch below.)

If anybody is still reading, do you know whether this is an okay way to fix the problem, ... or will it break NTLDR?

Regards,

Alex

--- pxe_preboot.c.orig	2010-03-24 10:27:50.000000000 +0000
+++ pxe_preboot.c	2010-03-24 10:28:13.000000000 +0000
@@ -111,22 +111,22 @@
  * This is a bug-for-bug compatibility hack needed in order to work
  * with Microsoft Remote Installation Services (RIS).  The filename
  * used in a call to PXENV_RESTART_TFTP or PXENV_TFTP_READ_FILE must
  * be returned as the DHCP filename in subsequent calls to
  * PXENV_GET_CACHED_INFO.
  */
 void pxe_set_cached_filename ( const unsigned char *filename ) {
 	memcpy ( cached_info[CACHED_INFO_DHCPACK].dhcphdr.file, filename,
 		 sizeof ( cached_info[CACHED_INFO_DHCPACK].dhcphdr.file ) );
-	memcpy ( cached_info[CACHED_INFO_BINL].dhcphdr.file, filename,
-		 sizeof ( cached_info[CACHED_INFO_BINL].dhcphdr.file ) );
+//	memcpy ( cached_info[CACHED_INFO_BINL].dhcphdr.file, filename,
+//		 sizeof ( cached_info[CACHED_INFO_BINL].dhcphdr.file ) );
 }
 
 /**
  * UNLOAD BASE CODE STACK
  *
  * @v None				-
  * @ret ...
  *
  */
|
From: Miller, S. <Sha...@yr...> - 2010-03-24 13:16:36
|
Good day Alex,

You had a question regarding gPXE. Are you aware that there is a gPXE mailing-list?

http://www.etherboot.org/wiki/mailinglists
http://etherboot.org/mailman/listinfo/gpxe

You mentioned that the Bootix NBP has a SEG16:OFF16 pointer to the cached packet in the PXE base code's data segment. As nearly as I can tell, this is according to spec.

You mentioned that Bootix uses this area to store the next filename to fetch. I am assuming that you mean the 128-byte filename array portion of the cached packet structure, which is intended to hold a nul-terminated string. I'm confused though... Did you mean that if Bootix wants to fetch files A and B, in that order, Bootix first writes B into the cached packet, then calls PXENV_TFTP_READ_FILE but for file A? Or did you mean that it puts A in the field, then calls PXENV_TFTP_READ_FILE for A?

- Shao Miller
|
From: Alex Z. <ale...@eu...> - 2010-03-24 14:18:53
|
Miller, Shao wrote:
>
> You mentioned that Bootix uses this area to store the next filename to
> fetch. I am assuming that you mean the 128-byte filename array portion
> of the cached packet structure, which is intended to hold a
> nul-terminated string. I'm confused though... Did you mean that if
> Bootix wants to fetch files A, B, in that order, that Bootix first
> writes B into the cached packet, then calls PXENV_TFTP_READ_FILE but for
> file A? Or did you mean that it puts A in the field, then calls
> PXENV_TFTP_READ_FILE for A?

Hi Shao,

Thanks for your reply. It looks like pxboot (the Bootix NBP) is using cached_info[CACHED_INFO_BINL].dhcphdr.file for its own purposes, and so things go awry when pxe_set_cached_filename() updates this field. In more detail:

1. At start of day, pxboot calls PXENV_GET_CACHED_INFO and gets a pointer to cached_info[CACHED_INFO_BINL].
2. pxboot then writes "pxboot" into cached_info[CACHED_INFO_BINL].dhcphdr.file.
3. pxboot then calls PXENV_TFTP_READ_FILE with filename "8DFF5B8B.opt". This fails, but has the side effect of calling pxe_set_cached_filename() to set cached_info[CACHED_INFO_DHCPACK].dhcphdr.file and cached_info[CACHED_INFO_BINL].dhcphdr.file to "8DFF5B8B.opt".
4. pxboot then calls PXENV_TFTP_READ_FILE with filename "8DFF5B8B.opt.opt". This fails.

However, if we change pxe_set_cached_filename() so that it only updates cached_info[CACHED_INFO_DHCPACK].dhcphdr.file and leaves cached_info[CACHED_INFO_BINL].dhcphdr.file alone, then the last event becomes:

4. pxboot then calls PXENV_TFTP_READ_FILE for file "pxboot.opt". This succeeds.

This leads me to think that pxboot is using its pointer to cached_info[CACHED_INFO_BINL].dhcphdr.file to save the name of the next file to get (minus the ".opt" extension).

Now, from the comments above pxenv_tftp_read_file(), it looks like the reason we are updating these two cached filenames is to emulate a bug in an Intel ROM that NTLDR relies on. However, the same comments *suggest* that we may be able to get away with only updating cached_info[CACHED_INFO_DHCPACK].dhcphdr.file. Does anyone know if this is true? I.e., is the patch I sent previously a safe way to fix the Bootix incompatibility?

Regards,

Alex
|
From: Miller, S. <Sha...@yr...> - 2010-03-24 15:12:26
|
Good day again Alex,

If it helps, here are a few commits in reverse chronological order (most recent first):

Michael Brown [Sat, 2 Feb 2008 15:59:32 +0000 (15:59 +0000)]
The INFO_DHCPACK and INFO_BINL filenames were overwritten during pxe_tftp_open() in commit 5e4e2671775b14122848368d1dbc3a26aec70d86.

Michael Brown [Tue, 22 Jan 2008 18:51:12 +0000 (18:51 +0000)]
The code comments indicated that PXENV_TFTP_READ_FILE should also overwrite the filename. Previously, only PXENV_RESTART_TFTP was mentioned. I'm not too sure why this is.

Michael Brown [Thu, 22 Nov 2007 04:43:11 +0000 (04:43 +0000)]
We can see the two memcpy()s introduced in commit 838ecba1315a484f8e08f41a3537623dfd7f1966. I would have to follow the code to find out whether the behaviour included INFO_BINL before this point.

Michael Brown [Sat, 30 Jun 2007 14:13:18 +0000 (15:13 +0100)]
The filename used in PXENV_RESTART_TFTP was used for PXENV_GET_CACHED_INFO internally starting in commit d05d8edd428efeff6c08dbd2423572de8e89ce06.

- Shao Miller
|
From: Alex Z. <ale...@eu...> - 2010-03-24 15:52:14
|
It looks like PXENV_RESTART_TFTP updated the cached filename returned by PXENV_GET_CACHED_INFO for both PacketTypes from the start. I think the point at which interaction with Bootix broke must have been when PXENV_TFTP_READ_FILE was made to update the cache as well.

Regards,

Alex

Miller, Shao wrote:
> Good day again Alex,
>
> If it helps, here're a few commits in chronological order from most
> recent to longest ago:
>
> Michael Brown [Sat, 2 Feb 2008 15:59:32 +0000 (15:59 +0000)]
> The INFO_DHCPACK and INFO_BINL filenames were overwritten during
> pxe_tftp_open() in commit 5e4e2671775b14122848368d1dbc3a26aec70d86.
>
> Michael Brown [Tue, 22 Jan 2008 18:51:12 +0000 (18:51 +0000)]
> The code comments indicated that PXENV_TFTP_READ_FILE should also
> overwrite the filename. Previously, only PXENV_RESTART_TFTP was
> mentioned. I'm not too sure why this is.
>
> Michael Brown [Thu, 22 Nov 2007 04:43:11 +0000 (04:43 +0000)]
> We can see the two memcpy()s introduced in commit
> 838ecba1315a484f8e08f41a3537623dfd7f1966. I would have to follow the
> code to find out if the behaviour included the INFO_BINL before this
> point.
>
> Michael Brown [Sat, 30 Jun 2007 14:13:18 +0000 (15:13 +0100)]
> The filename used in PXENV_RESTART_TFTP was used for
> PXENV_GET_CACHED_INFO internally starting in commit
> d05d8edd428efeff6c08dbd2423572de8e89ce06.
>
> - Shao Miller
>
|
From: Michael B. <mb...@fe...> - 2010-03-25 00:50:28
|
On Wednesday 24 Mar 2010 12:02:44 Alex Zeffertt wrote:
> Perhaps the Bootix NBP shouldn't be doing this... but we've found that if
> we make PXENV_TFTP_READ_FILE only update the filename in
> cached_info[CACHED_INFO_DHCPACK] and leave the filename in
> cached_info[CACHED_INFO_BINL] then the boot succeeds. (See patch below.)
>
> If anybody is still reading, do you know whether this is an okay way to fix
> the problem, ... or will it break NTLDR?

Nice debugging!

Unfortunately, I have no idea whether or not it will break NTLDR, and NTLDR compatibility probably has to take higher priority than Bootix compatibility. If you can verify that your change still allows a successful RIS deployment (for which you would need Windows Server 2003 R1; I believe RIS was obsoleted in 2003 R2 and replaced with WDS), then we could fairly safely apply this change.

I have Windows Server 2003 R1 media and licence keys, so I could test this for you, but I won't be able to do so any time soon, sorry.

Michael
|
From: Miller, S. <Sha...@yr...> - 2010-03-25 02:22:50
|
Regarding overwriting the CACHED_INFO_BINL filename on each pxenv_tftp_open(), pxenv_tftp_read_file(), pxenv_tftp_get_fsize() and pxenv_restart_tftp() call:

I have a slightly different setup on which I just tested removing the CACHED_INFO_BINL filename overwrite, and it appeared to work. My setup does go through the motions of gPXE -> PXELINUX -> pxechain.com -> startrom.0 -> NTLDR -> many files via TFTP -> Windows, but doesn't use MS RIS as such. I have a DHCP service as well as a ProxyDHCP response, then I use pxechain.com. For what it's worth.

Michael, I'm also curious whether you happen to recall why performing the overwrite in pxenv_restart_tftp() was not sufficient and it was added to the rest of those calls. I know it was a while ago, now. :)

- Shao Miller
|
From: Alex Z. <ale...@eu...> - 2010-03-25 10:57:37
|
Michael Brown wrote:
> On Wednesday 24 Mar 2010 12:02:44 Alex Zeffertt wrote:
>> Perhaps the Bootix NBP shouldn't be doing this... but we've found that if
>> we make PXENV_TFTP_READ_FILE only update the filename in
>> cached_info[CACHED_INFO_DHCPACK] and leave the filename in
>> cached_info[CACHED_INFO_BINL] then the boot succeeds. (See patch below.)
>>
>> If anybody is still reading, do you know whether this is an okay way to fix
>> the problem, ... or will it break NTLDR?
>
> Nice debugging!
>
> Unfortunately I have no idea whether or not it will break NTLDR, and NTLDR
> compatibility probably has to take higher priority than Bootix compatibility.
> If you can verify that your change still allows a successful RIS deployment
> (for which you would need Windows Server 2003 R1; I believe RIS was obsoleted
> in 2003 R2 and replaced with WDS), then we could fairly safely apply this
> change.
>
> I have Windows Server 2003 R1 media and licence keys, so could test this for
> you, but I won't be able to do so any time soon, sorry.
>
> Michael
>

You're right, we can't break NTLDR. Here's an alternative way to fix the problem. Now, I know this is a terrible hack... but it does guarantee(*) that non-Bootix NBPs will not be affected.

Alex

(*) sort of.
|
From: Michael B. <mb...@fe...> - 2010-03-26 14:00:56
|
On Thursday 25 Mar 2010 10:57:29 Alex Zeffertt wrote:
> You're right, we can't break NTLDR. Here's an alternative way to fix the
> problem. Now, I know this is a terrible hack... but it does guarrantee(*)
> that non-Bootix NBPs will not be affected.

:)

I implemented a similar thing back in some ancient version of Etherboot:

http://git.etherboot.org/?p=etherboot.git;a=blob;f=src/arch/i386/firmware/pcbios/hidemem.c;h=a9ae001e25992ea6fef0f16ecb939c9244179529;hb=f6f6bad3f6c42898235284d144b800456312e39b#l67

This hack was worse than yours; it involved having Etherboot edit the binary of the running NBP. Of course, it was suitably protected by an #ifdef, so all was well and good.

I've started a Win2k3 R1 install, so I can set up a RIS server to test against.

Michael
|
From: Miller, S. <Sha...@yr...> - 2010-03-26 14:07:33
|
Good day Michael,

I would be curious to see whether overwriting both the DHCPACK and BINL cached packets' filenames with "bar" at each set_cached_filename() call passes your tests, if you'd care to try.

- Shao Miller
|
From: Michael B. <mb...@fe...> - 2010-03-26 17:51:32
|
On Friday 26 Mar 2010 14:06:34 Miller, Shao wrote:
> I would be curious to see if overwriting both DHCPACK and BINL cached
> packets' filenames with "bar" at each set_cached_filename() call passes
> your tests, if you'd care to.

Tests so far:

  Unmodified gPXE: RIS seems to work (reaches graphical setup)
  Skipping pxe_set_cached_filename() completely: RIS still seems to work
  Setting both DHCPACK and BINL filenames to "bar": RIS dies before reaching the "Windows Setup" text-mode screen
  Setting both DHCPACK and BINL filenames to "": RIS dies before reaching the "Windows Setup" text-mode screen
  Setting only the DHCPACK filename to "": RIS seems to work
  Setting only the BINL filename to "": RIS dies before reaching the "Windows Setup" text-mode screen

Note that this testing involves a Windows DHCP server with no explicitly configured filename, relying on ProxyDHCP to provide the correct boot filename. In earlier tests (many years ago, when the "overwrite filename" logic was first added and tested), I think I was using an explicitly configured filename.

I'm going to try removing pxe_set_cached_filename() completely, and check that RIS proceeds right through to an installed and working Win2k3 system.

Michael
|
From: Michael B. <mb...@fe...> - 2010-03-26 18:42:57
|
On Friday 26 Mar 2010 17:51:53 Michael Brown wrote:
> I'm going to try removing pxe_set_cached_filename() completely, and check
> that RIS proceeds right through to an installed and working Win2k3 system.

Confirmed. RIS is working perfectly for me with pxe_set_cached_filename() removed.

I've pushed this change. Alex: could you check that this solves your Bootix problem?

I'd appreciate any reports of breakages caused by this change, though I don't believe we ever had anything besides RIS that we thought depended on it.

Michael
|
From: Binh T. <bt...@nc...> - 2010-03-27 17:43:03
|
Hello everyone,

Could someone let me know whether mainline gPXE already supports labels and goto in scripts? I tried defining a label as ":abc" and as "abc:", but both failed.

Thanks a lot,
Binh

__________ Information from ESET NOD32 Antivirus, version of virus signature database 4965 (20100322) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com
|
From: Miller, S. <Sha...@yr...> - 2010-03-27 18:09:54
|
Good day Binh,

You had a question regarding gPXE. Are you aware that there is a gPXE mailing-list?

http://www.etherboot.org/wiki/mailinglists
http://etherboot.org/mailman/listinfo/gpxe

Since you appear to be using the Etherboot mailing-list, you might have missed yesterday's e-mail[1] on the gPXE mailing-list, which noted that the scripting discussion on the gPXE developers' mailing-list[2] is slow going and that certain points have not yet been agreed upon by all.

Labels and 'goto' have not yet been committed to gPXE's official codebase. You could look in Stefan Hajnoczi's repository[3] or my own[4] to check out some possibilities.

- Shao Miller

[1] http://etherboot.org/pipermail/gpxe/2010-March/000735.html
[2] http://etherboot.org/pipermail/gpxe-devel/2010-March/000089.html
[3] http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=shortlog;h=refs/heads/ifgoto
[4] http://git.etherboot.org/?p=people/sha0/gpxe.git;a=shortlog;h=refs/heads/exit_if_goto_v4
|
From: Alex Z. <ale...@eu...> - 2010-03-29 08:57:21
|
Michael Brown wrote:
> On Friday 26 Mar 2010 17:51:53 Michael Brown wrote:
>> I'm going to try removing pxe_set_cached_filename() completely, and check
>> that RIS proceeds right through to an installed and working Win2k3 system.
>
> Confirmed. RIS is working perfectly for me with pxe_set_cached_filename()
> removed.
>
> I've pushed this change. Alex: could you check that this solves your Bootix
> problem?
>

Thanks for all your efforts, Michael. I can confirm that removing pxe_set_cached_filename() works with Bootix, since I've already tested a change that stubs it out.

Regards,

Alex

> I'd appreciate any reports of breakages caused by this change, though I don't
> believe we ever had anything besides RIS that we thought depended on it.
>
> Michael
>
> _______________________________________________
> Etherboot-developers mailing list
> Eth...@li...
> https://lists.sourceforge.net/lists/listinfo/etherboot-developers
>
|
From: Binh T. <bt...@nc...> - 2010-03-26 14:18:01
|
Hi,

Does anyone know what viewer I can use to read build_sys.dox in the src/doc folder?

Thanks,
Binh
|
From: Miller, S. <Sha...@yr...> - 2010-03-26 14:34:15
|
Good day Binh,

Regarding viewing gpxe/src/doc/build_sys.dox:

http://www.etherboot.org/wiki/doc#source_code_documentation
http://www.etherboot.org/share/sha0/gpxe/src/bin/doc/html/build_sys.html

- Shao Miller
|
From: Michael B. <mb...@fe...> - 2010-03-26 14:38:24
|
On Friday 26 Mar 2010 14:33:23 Miller, Shao wrote:
> In regards to viewing gpxe/src/doc/build_sys.dox:
>
> http://www.etherboot.org/wiki/doc#source_code_documentation
> http://www.etherboot.org/share/sha0/gpxe/src/bin/doc/html/build_sys.html

And also http://etherboot.org/api/build_sys.html (which is automatically updated to keep in sync with the latest git tree).

Michael
|
From: Miller, S. <Sha...@yr...> - 2010-03-26 14:46:31
|
Thanks, Michael. At last, the Wiki has been updated with this correct location. :)

http://www.etherboot.org/wiki/doc?rev=1263191914&do=diff

- Shao Miller
|
From: H. P. A. <hp...@zy...> - 2010-04-07 18:26:36
|
On 03/25/2010 03:57 AM, Alex Zeffertt wrote:
> Michael Brown wrote:
>> On Wednesday 24 Mar 2010 12:02:44 Alex Zeffertt wrote:
>>> Perhaps the Bootix NBP shouldn't be doing this... but we've found that if
>>> we make PXENV_TFTP_READ_FILE only update the filename in
>>> cached_info[CACHED_INFO_DHCPACK] and leave the filename in
>>> cached_info[CACHED_INFO_BINL] then the boot succeeds. (See patch below.)
>>>
>>> If anybody is still reading, do you know whether this is an okay way to fix
>>> the problem, ... or will it break NTLDR?
>>
>> Nice debugging!
>>
>> Unfortunately I have no idea whether or not it will break NTLDR, and NTLDR
>> compatibility probably has to take higher priority than Bootix compatibility.
>> If you can verify that your change still allows a successful RIS deployment
>> (for which you would need Windows Server 2003 R1; I believe RIS was obsoleted
>> in 2003 R2 and replaced with WDS), then we could fairly safely apply this
>> change.
>>
>> I have Windows Server 2003 R1 media and licence keys, so could test this for
>> you, but I won't be able to do so any time soon, sorry.
>>
>> Michael
>>
>
> You're right, we can't break NTLDR. Here's an alternative way to fix the
> problem. Now, I know this is a terrible hack... but it does guarrantee(*) that
> non-Bootix NBPs will not be affected.
>
> Alex
>

It would seem to me that something is fundamentally bogus if it can only be detected by recognizing the particular NBP. Rather, that seems to indicate that something was mischaracterized in how other BCs act...

	-hpa
|
From: Alex Z. <ale...@eu...> - 2010-04-08 08:18:52
|
H. Peter Anvin wrote:
> On 03/25/2010 03:57 AM, Alex Zeffertt wrote:
>> Michael Brown wrote:
>>> On Wednesday 24 Mar 2010 12:02:44 Alex Zeffertt wrote:
>>>> Perhaps the Bootix NBP shouldn't be doing this... but we've found that if
>>>> we make PXENV_TFTP_READ_FILE only update the filename in
>>>> cached_info[CACHED_INFO_DHCPACK] and leave the filename in
>>>> cached_info[CACHED_INFO_BINL] then the boot succeeds. (See patch below.)
>>>>
>>>> If anybody is still reading, do you know whether this is an okay way to fix
>>>> the problem, ... or will it break NTLDR?
>>> Nice debugging!
>>>
>>> Unfortunately I have no idea whether or not it will break NTLDR, and NTLDR
>>> compatibility probably has to take higher priority than Bootix compatibility.
>>> If you can verify that your change still allows a successful RIS deployment
>>> (for which you would need Windows Server 2003 R1; I believe RIS was obsoleted
>>> in 2003 R2 and replaced with WDS), then we could fairly safely apply this
>>> change.
>>>
>>> I have Windows Server 2003 R1 media and licence keys, so could test this for
>>> you, but I won't be able to do so any time soon, sorry.
>>>
>>> Michael
>>>
>> You're right, we can't break NTLDR. Here's an alternative way to fix the
>> problem. Now, I know this is a terrible hack... but it does guarrantee(*) that
>> non-Bootix NBPs will not be affected.
>>
>> Alex
>>
>
> It would seem to me that something is fundamentally bogus if it can only
> be detected by recognizing the particular NBP. Rather, that seems to
> indicate that something was mischaracterized in how other BCs act...
>
> 	-hpa
>

You're right that this is a hack. But it's a hack to work around problems caused by an earlier hack :-)

Thankfully my hack is not needed, since Michael found the earlier hack wasn't necessary after all, and he reverted it in commit 80d1ac7320f597b4c981dfdeb19d8e88eb85ca69.

Regards,

Alex
|