From: Nathan A B. <be...@us...> - 2023-12-11 17:11:40
|
Just wanted to chime in to say that I have been following the same process Victor described earlier in this thread for tracking contributors under the current CLA process. I am also in favor of changing to the lighter-weight process that Matt described as a potential replacement once the new governance structure goes into effect.
From: Markus H. <mar...@me...> - 2023-12-11 17:06:14
|
Dear xCAT community,

It has been a month since we met at Supercomputing '23 to announce that a consortium led by RedLine, OCF and MEGWARE is willing to take over ownership of the current xCAT project to provide continuity for existing xCAT 2 based clusters. We are still thrilled to have seen so many of you at this event showing your interest and support, both in person and online.

On the technical side, we have started to dive into the IBM-internal integration setup for compiling and testing xCAT releases. Our goal is to replicate and extend this infrastructure at RedLine, OCF and MEGWARE to provide even wider test coverage on more hardware infrastructures for future releases. At the same time, we continue to investigate the various legal aspects of the transition.

During our SC'23 meeting, we also announced that, in addition to our efforts surrounding xCAT 2, we would work with LENOVO towards making Confluent a suitable, community-backed xCAT 2 replacement for future cluster systems. For this reason, we invited LENOVO to become a full member of our consortium from the beginning. Understandably, this plan is a significant step for the Confluent team that requires thorough preparation and careful consideration, and we will share further updates as this effort takes shape.

We would also like to take this opportunity to thank the team at IBM, namely Nathan A Besaw, for their continued support during the transition.

Kind regards,

Markus Hilger
HPC Engineer
MEGWARE Computer Vertrieb und Service GmbH
Nordstraße 19, 09247 Chemnitz-Röhrsdorf, Germany
Tel: +49 3722 528-47
mar...@me...
www.megware.com
Managing directors: André Singer, Axel Auweter
District court: Chemnitz HRB 584
From: Matthew A. <ma...@oc...> - 2023-12-11 16:39:31
|
Hello,

Thank you, Samveen, for your initial questions, and Victor for your response.

We've just been discussing this on one of our weekly consortium calls today, and the necessity of some form of statement or agreement was a talking point. The consortium will continue to require a contributor's license agreement to ensure, as Victor highlights, that code contributed to the project is owned by the project from the point of submission, and that no individual can revoke the inclusion of submitted code or bring any future legal case against xCAT for the continued use of that code.

"'xCAT Community' shall mean International Business Machines Corporation and other users of xCAT. Accepted Contributions will be made available to the xCAT Community at large through sourceforge.net or other open source community. With regards to the CLA, does the definition of xCAT community work as here, or will this need updating, given the new structure of management?"

This is something that will be updated in due course to reflect how the project will be managed when the consortium officially takes ownership of the project.

"In case the agreement is changed to update this, would the previous signers have to re-sign and send the updated CLA?"

Existing CLAs will remain with IBM and will not be transferred to the consortium. The CLAs IBM holds will be stored securely within IBM and will be accessible only to IBM, for legacy purposes, should any legal issues arise with code added to the project before the consortium takes ownership. No action will be required from anyone who has previously signed an agreement, and existing agreements will not automatically enrol signers with the new management of xCAT. A new CLA will be required for any contributions made after IBM ownership ends.

"Can the CLA be made implicit, instead of explicit? Should it be (i.e., add a large disclaimer in the README, that by contributing to the project, the contributor is accepting the CLA and thus the 'Grant of Copyright License' section of the CLA)?"

We, the consortium, are keen to ensure that anyone can submit code to the project in an open and timely manner. We are assessing a different model of CLA that does not require a full legal document to be signed before submitting any code, and are looking at how the Open MPI and Kernel projects' CLAs are implemented. Under the model seen in these projects, each commit must be explicitly "signed off" as a contribution in its commit message, which will streamline the entire administrative process. The mechanism is expected to be much the same as in other projects: the legal agreement text is visible and version controlled on GitHub, and the commit message of each individual commit or pull request contains a line stating the contributor's agreement with it.

There are many administrative and legal aspects we are discussing between consortium members and lawyers to ensure that our open source philosophy for the project still meets the legal requirements and protections necessary for running a project such as xCAT.

There will be other announcements from the consortium in due course to outline some of our progress so far and to give a general update on how we are progressing.

Regards,
Matt.

Matthew Alton MBCS | Research & Development Lead
Phone: +44 (0)114 257 2200
Mobile: +44 (0)7943 594 084
Address: OCF Limited, Unit 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield S35 2PG
Website: www.ocf.co.uk

OCF Limited is a company registered in England and Wales. Registered number 4132533, VAT number GB 780 6803 14. Registered office address: OCF Limited, 5 Rotunda Business Centre, Thorncliffe Park, Chapeltown, Sheffield, S35 2PG. This message is private and confidential. If you have received this message in error, please notify us immediately and remove it from your system.

From: VICTOR HU via xCAT-user <xca...@li...>
Sent: Monday, December 11, 2023 1:53 PM
To: xCAT Users Mailing list <xca...@li...>
Cc: VICTOR HU <vh...@us...>
Subject: Re: [xcat-user] xCAT CLA status and questions

Hi Samveen,

Here are my thoughts, but others can chime in.

I understood the need for the CLA as ensuring that contributions made by the community were "given 100% to the project with no strings attached". Once a PR is submitted, it's owned by the project. Nobody could come back at a later time, say we stole their work, and cause legal issues.

But looking at other open source projects, CLAs seem pretty standard. I would suggest that we look at other projects to get ideas on how to handle it. When Softlayer was acquired by IBM, I took an interest in their open-sourced Python API, and I just went back to look... it looks like they also had a very similar CLA, but I'm not sure if this is standard IBM practice (perhaps):

https://github.com/softlayer/softlayer-python/blob/master/CONTRIBUTING.md

which links to

https://github.com/softlayer/softlayer-python/blob/master/docs/dev/cla-individual.md

But then, looking at a Kubernetes project, they also have CLAs that are similar to what xCAT has today:

https://github.com/kubernetes/community/blob/master/CLA.md

For the signed CLAs, when I was tracking them, we would accept the CLAs, store them in a safe place, and then I would add the user to the "Contributors" group in the xcat-core repo, which is set to "read-only". At least this lets the GitHub handle be mentioned and gives one easy way to know whether someone has signed. It was up to the user whether to accept membership; if they did not, they would not join and we couldn't @ them anyway. There was probably some other internal location that I used for tracking... I forgot.

Looking at other projects today, I would probably have suggested creating a CONTRIBUTORS file in the repo to keep track of GitHub handles (but I'm not sure how people feel about that).

As to where to store the CLAs, there probably needs to be a better way to do this moving forward, one that gives the maintainers transparency and access to the CLA documents if needed.

Regards,
Victor

From: Samveen Gulati via xCAT-user <xca...@li...>
Date: Saturday, December 9, 2023 at 9:04 AM
To: xca...@li... <xca...@li...>
Cc: Samveen Gulati <sa...@ya...>
Subject: [EXTERNAL] [xcat-user] xCAT CLA status and questions

Hi all,

Now that the project is starting to get back on its feet, there are a couple of legal aspects I'm hoping to get clarified:

- As of now, all contributors to xCAT were required to sign the xCAT Contributors License Agreement (the xCAT CLA), whether the individual version or the corporate version (https://github.com/xcat2/xcat-core/tree/master/docs/source/developers/license)
- One of the terms of the license states the following: "xCAT Community" shall mean International Business Machines Corporation and other users of xCAT. Accepted Contributions will be made available to the xCAT Community at large through sourceforge.net or other open source community.
- With regards to the CLA, does the definition of xCAT community work as here, or will this need updating, given the new structure of management?
- In case the agreement is changed to update this, would the previous signers have to re-sign and send the updated CLA?
- Can the CLA be made implicit, instead of explicit? Should it be (i.e., add a large disclaimer in the README, that by contributing to the project, the contributor is accepting the CLA and thus the "Grant of Copyright License" section of the CLA)?

Jarrod, Victor and Nathan, would you also chime in on how you managed tracking the CLA of first-time contributors? I ask this as there are a few PRs on GitHub by first-time contributors, and now that project activity is picking back up, I'd rather possible legal gotchas don't hit the community.

Regards,
--
Samveen S. Gulati

The best-laid schemes o' mice an' men
Gang aft agley,
An' lea'e us nought but grief an' pain,
For promis'd joy!
-- Robert Burns
(The best laid plans of mice and men often go awry, and bring nothing but grief and pain of the ..)
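The per-commit sign-off discussed in this thread could be checked mechanically before a pull request is merged. What follows is only a minimal sketch in Python, assuming the sign-off line takes the `Signed-off-by: Name <email>` form that `git commit -s` appends. The thread does not state the wording the consortium will adopt, and the names `SIGN_OFF`, `sign_offs` and `is_signed_off` are hypothetical, introduced here for illustration.

```python
import re

# Assumed trailer format, modelled on the "Signed-off-by" line that
# `git commit -s` appends; the consortium has not confirmed its wording.
SIGN_OFF = re.compile(
    r"^Signed-off-by: (?P<name>.+) <(?P<email>[^<>\s]+@[^<>\s]+)>$"
)


def sign_offs(commit_message: str) -> list[tuple[str, str]]:
    """Return a (name, email) pair for every sign-off line in the message."""
    found = []
    for line in commit_message.splitlines():
        match = SIGN_OFF.match(line.strip())
        if match:
            found.append((match.group("name"), match.group("email")))
    return found


def is_signed_off(commit_message: str) -> bool:
    """True when the commit message carries at least one sign-off line."""
    return bool(sign_offs(commit_message))


if __name__ == "__main__":
    # Hypothetical commit message used only to exercise the check.
    example = "Fix typo in docs\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
    print(is_signed_off(example))
```

A check like this only confirms that the line is present and well formed. Whether the named person actually agreed to the published agreement text remains a legal question for the consortium, not something the script can establish.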
From: VICTOR HU <vh...@us...> - 2023-12-11 16:39:19
|
Hi Matthew,

+1. I've seen that "signed-off" mechanism used in other projects as well; it would be a much lighter-weight process, IMO. I like it.

Victor
From: VICTOR HU <vh...@us...> - 2023-12-11 14:18:36
|
Hi Samveen,

Here are my thoughts, but others can chime in.

I understood the need for the CLA to be ensuring that contributions from the community were “given 100% to the project with no strings attached”. Once a PR is submitted, it is owned by the project, so nobody can come back later, claim that we stole their work, and cause legal issues. Looking at other open source projects, CLAs seem pretty standard.

I would suggest that we look at other projects to get ideas on how to handle it. When Softlayer was acquired by IBM, I took an interest in their open-sourced Python API, and I just went back to look: they also had a very similar CLA, though I am not sure whether this is standard IBM practice (perhaps).

https://github.com/softlayer/softlayer-python/blob/master/CONTRIBUTING.md
which links to
https://github.com/softlayer/softlayer-python/blob/master/docs/dev/cla-individual.md

But then, looking at a Kubernetes project, they also have CLAs that are similar to what xCAT has today:
https://github.com/kubernetes/community/blob/master/CLA.md

For the signed CLAs, when I was tracking them, we would accept the CLAs, store them in a safe place, and then I would add the user to the “Contributors” group in the xcat-core repo, which is set to “read-only”. At least this makes the GitHub handle mentionable and gives one easy way to know whether someone has signed. It was up to the user whether to accept the membership; if they did not, they would not join and we could not @ them anyway. There was probably some other internal location that I used for tracking, but I have forgotten it.

Looking at other projects today, I would probably have suggested creating a CONTRIBUTORS file in the repo to keep track of the GitHub handles (but I am not sure how people feel about that).

As to where to store the CLAs, there probably needs to be a better way to do this moving forward, one that gives the maintainers transparency and access to the CLA documents if needed.

Regards, Victor |
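As a rough illustration of the CONTRIBUTORS-file idea raised in this thread, the check could be as small as the sketch below. The file format (one GitHub handle per line, `#` comments allowed) and both function names are assumptions made for this example; nothing like this exists in xcat-core today.

```python
def load_contributors(text):
    """Return the lower-cased GitHub handles listed in a CONTRIBUTORS file.

    Assumed (hypothetical) format: one handle per line, optional leading '@',
    anything after '#' is a comment.
    """
    handles = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            handles.add(line.lstrip("@").lower())
    return handles


def has_signed_cla(handle, contributors):
    """GitHub handles are case-insensitive, so compare in lower case."""
    return handle.lstrip("@").lower() in contributors


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# GitHub handles of people with a CLA on file
@example-user
Another-Dev   # individual CLA
"""

contributors = load_contributors(SAMPLE)
print(has_signed_cla("another-dev", contributors))
print(has_signed_cla("first-time-contributor", contributors))
```

A check like this only records who has signed; where the signed documents themselves are stored remains the open question from the message above.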
From: Samveen G. <sa...@ya...> - 2023-12-09 14:03:40
|
Hi all,

Now that the project is starting to get back on its feet, there are a couple of legal aspects I'm hoping to get clarified:

- As of now, all contributors to xCAT have been required to sign the xCAT Contributors License Agreement (the xCAT CLA), in either the Individual or the Corporate version (https://github.com/xcat2/xcat-core/tree/master/docs/source/developers/license)
- One of the terms of the license states the following:
  “xCAT Community” shall mean International Business Machines Corporation and other users of xCAT. Accepted Contributions will be made available to the xCAT Community at large through sourceforge.net or other open source community.
- With regard to the CLA, does the definition of xCAT Community work as it stands, or will it need updating, given the new structure of management?
- In case the agreement is changed to update this, would the previous signers have to re-sign and send the updated CLA?
- Can the CLA be made implicit instead of explicit? Should it be (i.e., add a large disclaimer in the README that by contributing to the project, the contributor is accepting the CLA and thus the "Grant of Copyright License" section of the CLA)?

Jarrod, Victor and Nathan, would you also chime in on how you managed tracking the CLA of first-time contributors? I ask because there are a few PRs on GitHub by first-time contributors, and now that project activity is picking back up, I'd rather possible legal gotchas don't hit the community.

Regards,
--
Samveen S. Gulati
The best-laid schemes o' mice an' men Gang aft agley, An' lea'e us nought but grief an' pain, For promis'd joy! -- Robert Burns (The best laid plans of mice and men often go awry, and bring nothing but grief and pain of the ..) |
From: Russell J. <arj...@gm...> - 2023-11-28 00:11:39
|
Hi devs/users,

I am having a very weird issue where "remoteshell" is failing to run on multiple different images/clusters after we performed datacenter maintenance. On the compute node side I am seeing:

  Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: trying to download postscripts...
  Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: postscripts downloaded successfully
  Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: trying to get mypostscript from <removed>...
  Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript start..: syslog
  Mon Nov 27 15:13:09 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript end...:syslog return with 0
  Mon Nov 27 15:13:09 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript start..: remoteshell

.... and it just hangs here. On the cluster manager side, I see:

  Nov 27 15:23:15 xcat8 xcat[2124]: ERR The node (compute-n2) is not ready, ignore it.

It reports this same error for all the nodes I have booted, across multiple different osimages. I do not understand: what is it looking for? How can I correct this hangup? I have tried restarting xcatd, as well as rebooting the xCAT VM. No change yet. |
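When a node hangs partway through its postscripts like this, one quick way to see which script never finished is to pair the "start" and "end" markers in the node-side log. The sketch below does that for the log format quoted in the message above; the regular expression is inferred from those six lines only, and the function name is made up for this example. It identifies the stuck script but says nothing about why it is stuck.

```python
import re

# Inferred from the quoted log lines: "postbootscript start..: NAME" and
# "postbootscript end...:NAME return with N".
EVENT = re.compile(r"post(?:boot)?script (start|end)\.*:\s*(\S+)")


def unfinished_scripts(log_text):
    """Return the scripts that logged 'start' with no later 'end', in start order."""
    running = []
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if not match:
            continue
        kind, name = match.groups()
        if kind == "start":
            running.append(name)
        elif name in running:
            running.remove(name)
    return running


LOG = """\
Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: trying to download postscripts...
Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment: postscripts downloaded successfully
Mon Nov 27 15:13:08 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript start..: syslog
Mon Nov 27 15:13:09 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript end...:syslog return with 0
Mon Nov 27 15:13:09 CST 2023 [info]: xcat.deployment.postbootscript: postbootscript start..: remoteshell
"""

print(unfinished_scripts(LOG))  # ['remoteshell']
```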
From: David M. <dma...@ee...> - 2023-11-27 15:04:19
|
To circle back on this and close the loop for the record: a large source of the problem was an interaction between the Python version on the inherited/legacy CentOS 7 system that this is being run on and the Python code that Confluent (3.8) uses. Specifically, there were issues in the confluent_selfcheck utility that needed to be fixed, e.g.:

- for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
+ for rsp in sess.read('/nodes/{args.node}/attributes/all'):
[…]
- emprint(f'There is no node named "{args.node}"')
+ emprint('There is no node named "{args.node}"')
[…]

Once a bunch of such changes (dropping the f prefix) were done, the utility was able to run and exit without problems. One of its checks verifies that web certificates are properly set up, which they were not; the Confluent system really wants to do things over HTTPS, and so their lack was breaking things (see the references to /tls/ in various places in the previous messages below).

After the above, and after re-running osdeploy and nodedeploy (along with the previous tweak to dhcpd.conf), the server PXE booted from Confluent and was able to run through things, with the non-manual subiquity installer kicking off and putting the Ubuntu 20.04 bits on the system.

The next steps are to better control what the new Ubuntu installer (subiquity) does, to get the system into a desired base state. After that, look into using Ansible to finalize things (perhaps through Confluent (3.2+) via osdeploy initialize -a).

> On Nov 15, 2023, at 11:02, David Magda <dma...@ee...> wrote:
>
> Doing a grep search for the string “nocloud-net” in /var/lib/confluent brings it up in boot.img files for the Ubuntu 20.04 and 22.04 distributions.
>
> However I do not see the “boot.img” file being fetched in the Apache logs, or any kind of fetch for the autoinstall/ stuff either:
>
> """
> 172.17.15.155 - - [14/Nov/2023:11:09:09 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe HTTP/1.1" 200 301 "-" "iPXE/1.21.1 (g988d2)"
> 172.17.15.155 - - [14/Nov/2023:11:09:09 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/kernel HTTP/1.1" 200 13680904 "-" "iPXE/1.21.1 (g988d2)"
> 172.17.15.155 - - [14/Nov/2023:11:09:10 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/initramfs/addons.cpio HTTP/1.1" 200 79360 "-" "iPXE/1.21.1 (g988d2)"
> 172.17.15.155 - - [14/Nov/2023:11:09:10 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/initramfs/site.cpio HTTP/1.1" 200 1536 "-" "iPXE/1.21.1 (g988d2)"
> 172.17.15.155 - - [14/Nov/2023:11:09:10 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/initramfs/distribution HTTP/1.1" 200 88323508 "-" "iPXE/1.21.1 (g988d2)"
> fe80::ae1f:XXff:feXX:XXYY - - [14/Nov/2023:11:09:30 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/distribution/install.iso HTTP/1.1" 200 1487339520 "-" "Wget"
> """
>
> I’m not sure how the target system is supposed to be told to use/fetch the boot.img file (doing a grep for “boot.img” in /var/lib/confluent does not bring up anything), but it does not seem to be happening. This seems to track with the console log:
>
> """
> net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
> Next server: 172.17.15.254
> Filename: http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe
> http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe... ok
> boot.ipxe : 301 bytes [script]
> boot/initramfs/addons.cpio... ok
> boot/initramfs/site.cpio... ok
> Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::XX%2]
> Connecting to [fe80::YY%2] ([fe80::YY%eno0]:80)
> install.iso 5% |* | 74.7M 0:00:17 ETA
> […]
> install.iso 100% |********************************| 1418M 0:00:00 ETA
> cp: can't stat '/custom-installation/iso-override/*': No such file or directory
> cat: can't open '/tls/*.0': No such file or directory
> cp: can't stat '/tls/*': No such file or directory
> Password was not accepted
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> curl: (77) error setting certificate verify locations:
> CAfile: /etc/ssl/certs/ca-certificates.crt
> CApath: /etc/ssl/certs
> passwd: password expiry information changed.
> Using CD-ROM mount point /cdrom/
> Identifying... [e2e9021b074342abd39d6c3842902203-2]
> Scanning disc for index files…
> """
>
> After which the install.iso seems to kick off. There is no kernel output, and the next general output is cloud-init announcing its interface (IPv6 link-local only), SSHd key generation, a “waiting for cloud-init…” message, and then the installer is started.
>
> It then detects that I have a serial console available (in addition to TTY) and asks whether I want rich mode, basic mode, or SSH. If I select SSH I am told to go to installer@172.17.15.199, which is a different IP than the one the PXE boot got (172.17.15.155).
>
> Should I perhaps manually tweak the boot.ipxe file to add the kernel parameters in for fetching cloud-init?
>
>
>> On Nov 14, 2023, at 11:52, Jarrod Johnson <jjo...@le...> wrote:
>>
>> Ultimately, it should be doing this:
>> autoinstall ds=nocloud-net;s=https://${ipv4s}/confluent-public/os/${osprofile}/autoinstall/
>>
>> Making changes as appropriate and pulling in the autoinstall in that way.
>>
>> However, the networking comes from:
>> {
>> echo "DEVICE='$DEVICE'"
>> echo "PROTO='none'"
>> echo "IPV4PROTO='none'"
>> echo "IPV4ADDR='$v4addr'"
>> echo "IPV4NETMASK='$v4nm'"
>> echo "IPV4BROADCAST='$v4nm'"
>> echo "IPV4GATEWAY='$v4gw'"
>> echo "IPV4DNS1='$dns'"
>> echo "HOSTNAME='$NODENAME'"
>> echo "DNSDOMAIN='$dnsdomain'"
>> echo "DOMAINSEARCH='$dnsdomain'"
>> } > "/run/net-$DEVICE.conf"
>>
>>
>> Something along those lines.
>>
>> At the time of failure, are you able to ssh in?
>>
>>> From: David Magda <dma...@ee...>
>>> Sent: Tuesday, November 14, 2023 11:28 AM
>>> To: xCAT Users Mailing list <xca...@li...>
>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>
>>> So is Confluent supposed to act as a cloud-init datasource?
>>>
>>> […]
>>>
>>> There exists in /var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/ an autoinstall/ directory that contains “meta-data” and “user-data” files.
>>>
>>> There’s a lot of output that flies by quite quickly, so I edited the “boot.ipxe” file to add “console=tty0 console=ttyS1,115200” so that the Lenovo webUI console could more fully see and capture the output in /var/log/confluent/console/. From there I see Confluent giving a PXE response:
>>>
>>> net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
>>> Next server: 172.17.15.254
>>> Filename: […]
>>>
>>> It then switches to link-local IPv6 (?) to fetch the ISO:
>>>
>>> Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::AAbb:Cff:feCd:dEE%2]
>>> Connecting to [fe80::EEcc:Bff:feBa:aXX%2] ([fe80::[…]%eno0]:80)
>>> install.iso 3% |* | 52.0M 0:00:26 ETA
>>> install.iso 11% |*** | 162M 0:00:15 ETA
>>> […]
>>>
>>> Cloud-init then seems to be kicked off (with only an IPv6 LL address?):
>>>
>>> [ 57.599545] cloud-init[2691]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init-local' at Tue, 14 Nov 2023 16:10:04 +0000. Up 52.98 seconds.
>>> [ 69.044787] cloud-init[2742]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init' at Tue, 14 Nov 2023 16:10:09 +0000. Up 58.09 seconds.
>>> [ 69.064878] cloud-init[2742]: ci-info: +++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
>>> [ 69.084789] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
>>> [ 69.104844] cloud-init[2742]: ci-info: | Device |   Up  |           Address            |    Mask   | Scope |     Hw-Address    |
>>> [ 69.124838] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
>>> [ 69.144756] cloud-init[2742]: ci-info: |  eno0  |  True |      fe80::ae1f:[…]/64       |     .     |  link |     ac:1f:[…]     |
>>> [ 69.164837] cloud-init[2742]: ci-info: | ens4f1 | False |              .               |     .     |   .   |     ac:1f:[…]     |
>>> […]
>>>
>>> This seems to fail / error out:
>>>
>>> [ 69.456748] cloud-init[2742]: 2023-11-14 16:10:20,895 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed
>>> [ 69.810439] cloud-init[2742]: 2023-11-14 16:10:21,661 - activators.py[WARNING]: Running ['netplan', 'apply'] resulted in stderr output:
>>> Failed to connect system bus: No such file or directory
>>> [ 69.836748] cloud-init[2742]: Falling back to a hard restart of systemd-networkd.service
>>> [ 70.170428] cloud-init[2742]: Generating public/private rsa key pair.
>>>
>>> Bunch of SSH key generation stuff, until we get to:
>>>
>>> [ 77.218133] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'modules:final' at Tue, 14 Nov 2023 16:10:28 +0000. Up 76.89 seconds.
>>> [ 77.240868] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 finished at Tue, 14 Nov 2023 16:10:29 +0000. Datasource DataSourceNone. Up 77.20 seconds
>>> [ 77.264872] cloud-init[3848]: 2023-11-14 16:10:29,068 - cc_final_message.py[WARNING]: Used fallback datasource
>>> Ubuntu 20.04.6 LTS ubuntu-server ttyS1
>>> connecting...
>>> waiting for cloud-init…
>>>
>>> After which the manual installation of Ubuntu kicks in (the installer noticed that it is (now) running in a serial console, per the “boot.ipxe” changes above, and asked if I wanted ‘rich’ or ‘basic’ mode).
>>>
>>>> On Nov 10, 2023, at 17:06, David Magda <dma...@ee...> wrote:
>>>>
>>>>
>>>> $ nodedeploy MYHOST
>>>> MYHOST: pending: ubuntu-20.04.6-x86_64-default
>>>>
>>>> I have U22.04 available already as well if testing with that is useful.
>>>>
>>>> The server in question isn’t used for anything special currently. My hope is that once I get some basic stuff going with the SuperMicro hardware we can start upgrading our Lenovo systems.
>>>>
>>>>> On Nov 10, 2023, at 14:25, Jarrod Johnson <jjo...@le...> wrote:
>>>>>
>>>>> It should use cloud-init as a matter of course, just like for the kickstart installs...
>>>>>
>>>>> What does nodedeploy <node> look like when you hit interactive? May need to look into this more directly next week...
>>>>>
>>>>>> From: David Magda <dma...@ee...>
>>>>>> Sent: Friday, November 10, 2023 2:16 PM
>>>>>> To: xCAT Users Mailing list <xca...@li...>
>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>
>>>>>> Ah, silly me: bad copy-paste.
>>>>>>
>>>>>> That command gives:
>>>>>>
>>>>>> File "/opt/confluent/bin/confluent_selfcheck", line 241
>>>>>> for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
>>>>>> ^
>>>>>> SyntaxError: invalid syntax
>>>>>>
>>>>>> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE boot parameters and the system launched into the Ubuntu 20.04 installer. The console is prompting me with a bunch of questions.
>>>>>>
>>>>>> So I think I’ve finally managed to muddle through this part of the documentation:
>>>>>>
>>>>>> https://hpc.lenovo.com/users/documentation/confluentosdeploy.html
>>>>>>
>>>>>> Is there any documentation about automating Ubuntu installs with Confluent? Does Confluent handle any cloud-init stuff (which was run during the boot process), or is there some other method to send things like partitioning and package information to Ubuntu?
>>>>>>
>>>>>>
>>>>>>> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjo...@le...> wrote:
>>>>>>>
>>>>>>> The attribute name is plural, with an s at the end: deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>>>>>>>
>>>>>>> confluent_selfcheck -n MYHOST
>>>>>>>
>>>>>>> Say anything interesting?
>>>>>>>
>>>>>>>> From: David Magda <dma...@ee...>
>>>>>>>> Sent: Friday, November 10, 2023 10:50 AM
>>>>>>>> To: xCAT Users Mailing list <xca...@li...>
>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>>
>>>>>>>> Looking in that file there was:
>>>>>>>>
>>>>>>>> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure mode, but insecure mode is disabled. Set the attribute `deployment.useinsecureprotocols` to `firmware` or `always` to enable support, or use UEFI HTTP boot with HTTPS." }
>>>>>>>>
>>>>>>>> Trying to tweak that attribute, I got:
>>>>>>>>
>>>>>>>> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>>>>>>>> Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid
>>>>>>>>
>>>>>>>> I tried using nodegroupattrib as well on a group that the host was in, and got:
>>>>>>>>
>>>>>>>> Error: Bad Request - deployment.useinsecureprotocol attribute is invalid
>>>>>>>>
>>>>>>>> I then edited the reply_dhcp4() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to remove the "return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would continue going. The log message still appears (so I know the code is getting there), but the events file now has:
>>>>>>>>
>>>>>>>> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}
>>>>>>>>
>>>>>>>> And the system is still booting xCAT (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>>>>>>>>
>>>>>>>> Not running the dhcpd at all simply has the system time out on its PXE attempt. I told Confluent about the particular IP address the system should have:
>>>>>>>>
>>>>>>>> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>>>>>>>>
>>>>>>>> And that did not help.
>>>>>>>>
>>>>>>>> Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc).
>>>>>>>>
>>>>>>>> Should I edit my dhcpd.conf and rip out things like:
>>>>>>>>
>>>>>>>> […]
>>>>>>>> if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>>>>> always-broadcast on;
>>>>>>>> filename = "…"
>>>>>>>> […]
>>>>>>>>
>>>>>>>> to try to see if that will get things going with Confluent? Or are things expected to work with all of that?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote:
>>>>>>>>>
>>>>>>>>> tail /var/log/confluent/events for a hint on why it might be ignoring the request.
>>>>>>>>>
>>>>>>>>>> From: David Magda <dm...@ee...>
>>>>>>>>>> Sent: Wednesday, November 8, 2023 2:46 PM
>>>>>>>>>> To: xCAT Users Mailing list <xca...@li...>
>>>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded.
>>>>>>>>>>
>>>>>>>>>> The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced.
>>>>>>>>>>
>>>>>>>>>> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n ubuntu-20.04.6-x86_64-default” earlier.
>>>>>>>>>>
>>>>>>>>>> $ nodeattrib MYHOST
>>>>>>>>>> MYHOST: console.method: ipmi
>>>>>>>>>> MYHOST: deployment.apiarmed: once
>>>>>>>>>> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>>>>>>>>>> MYHOST: deployment.profile:
>>>>>>>>>> MYHOST: deployment.stagedprofile:
>>>>>>>>>> MYHOST: deployment.state:
>>>>>>>>>> MYHOST: deployment.state_detail:
>>>>>>>>>> MYHOST: groups: prox,ipmi,all,everything
>>>>>>>>>> MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>>>>>>>>>> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>>>>>>>>>> MYHOST: net.ipv4_method: dhcp
>>>>>>>>>> MYHOST: secret.hardwaremanagementpassword: ********
>>>>>>>>>> MYHOST: secret.hardwaremanagementuser: ********
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>>>>>>>>>>>
>>>>>>>>>>> If dhcpd.conf is set to not send any 'filename', it's best.
>>>>>>>>>>> If you don't need a dhcp server, then you can turn it off. There's also
>>>>>>>>>>>
>>>>>>>>>>> If you have a dhcp server with a dynamic range on it, then:
>>>>>>>>>>> nodeattrib net.ipv4_method=firmwaredhcp
>>>>>>>>>>>
>>>>>>>>>>> If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node.
>>>>>>>>>>>
>>>>>>>>>>> If you have no dhcp server, then it should just do the right thing directly.
>>>>>>>>>>>
>>>>>>>>>>> If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp'; however, you then own the IPAM responsibility entirely.
>>>>>>>>>>>
>>>>>>>>>>> If your dhcp has:
>>>>>>>>>>> option gpxe.no-pxedhcp 1;
>>>>>>>>>>> Please remove that to let confluent merge an offer with an uncoordinated dhcp server.
>>>>>>>>>>>
>>>>>>>>>>> I need to do a deeper write-up on the details of the dhcp interaction: how it is now optional, how it can coexist with an unmanaged dhcp server, and how it frees the dhcp server from 'filename'.
>>>>>>>>>>>
>>>>>>>>>>>> From: David Magda
>>>>>>>>>>>> Sent: Tuesday, November 7, 2023 9:27 AM
>>>>>>>>>>>> To: xCAT Users Mailing list
>>>>>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>>>>>>
>>>>>>>>>>>> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/.
>>>>>>>>>>>>
>>>>>>>>>>>> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>>>>>>>>>>>>
>>>>>>>>>>>> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily, until I get an understanding of Confluent, so that I can revert to xCAT if need be.)
>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption, as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should formally document how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node...
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example...
>>>>>>>>>>>>>> From: David Magda
>>>>>>>>>>>>>> Sent: Thursday, October 26, 2023 11:22 AM
>>>>>>>>>>>>>> To: xCAT Users Mailing list
>>>>>>>>>>>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, there was perhaps auto-completion with regards to Confluent/Confluence.
>>>>>>>>>>>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there a document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at:
>>>>>>>>>>>>>> https://hpc.lenovo.com/users/
>>>>>>>>>>>>>> https://hpc.lenovo.com/users/documentation/
>>>>>>>>>>>>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used, so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused).
>>>>>>>>>>>> […]
>>
>
>
> _______________________________________________
> xCAT-user mailing list
> xCA...@li...
> https://lists.sourceforge.net/lists/listinfo/xcat-user |
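A side note on the confluent_selfcheck changes shown at the top of David's message: the SyntaxError quoted in the thread is what an interpreter older than Python 3.6 reports for an f-string (that the legacy system runs such an interpreter is an inference from that error, not something stated in the thread). Removing only the `f` prefix makes the file parse, but the braces are then kept as literal text rather than being filled in. The sketch below shows one backwards-compatible spelling with `str.format`; it is an illustration, not the actual patch that was applied, and the `args` object here is a stand-in for the utility's parsed arguments.

```python
from argparse import Namespace

# Stand-in for the parsed command-line arguments (node name from the thread).
args = Namespace(node="MYHOST")

# Needs Python 3.6+; on older interpreters this line alone is a SyntaxError:
#   path = f'/nodes/{args.node}/attributes/all'

# Dropping only the f prefix parses everywhere, but nothing is substituted.
literal = '/nodes/{args.node}/attributes/all'

# str.format supports attribute access inside a replacement field, so the
# same template works once the object is passed in by name.
path = '/nodes/{args.node}/attributes/all'.format(args=args)

# A positional field is an equivalent, slightly shorter alternative.
message = 'There is no node named "{0}"'.format(args.node)

print(literal)   # /nodes/{args.node}/attributes/all
print(path)      # /nodes/MYHOST/attributes/all
print(message)   # There is no node named "MYHOST"
```

With the prefix simply removed, a request would go to the literal path containing `{args.node}`, so it is worth confirming that the utility's node-specific checks still target the intended node after such an edit.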
From: Jarrod J. <jjo...@le...> - 2023-11-15 16:17:11
|
I could look live if desired. The logic comes from 'addons.cpio' that gets symlinked in from /opt. It is in the ipxe or the boot.img (boot.img is meant for remote media and whole-image http boot, while boot.ipxe should be equivalent) |
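For anyone who wants to repeat the "grep for nocloud-net" check from this thread without relying on grep's handling of binary files, here is a minimal Python sketch. It walks a directory and reports which files contain a given byte string. The function name `files_containing` and the chunk size are my own choices for illustration, and the default root is simply the /var/lib/confluent/public/os path mentioned in the thread. It only finds the string in uncompressed data, so a compressed cpio or image would not match.

```python
import os


def files_containing(root, needle, chunk_size=1 << 20):
    """Yield paths under root whose raw bytes contain needle.

    Files are read in chunks; the last len(needle)-1 bytes of each
    window are carried over so a match that straddles a chunk
    boundary is still found.
    """
    overlap = len(needle) - 1
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # isfile() follows symlinks and skips broken ones.
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    tail = b""
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        window = tail + chunk
                        if needle in window:
                            yield path
                            break
                        tail = window[-overlap:] if overlap else b""
            except OSError:
                # Unreadable file: skip it rather than abort the walk.
                continue


if __name__ == "__main__":
    # Path taken from the thread; adjust for your own layout.
    root = "/var/lib/confluent/public/os"
    if os.path.isdir(root):
        for hit in files_containing(root, b"nocloud-net"):
            print(hit)
```

Running this against a profile directory should list the same artifacts a recursive grep would, which helps confirm whether the string lives in boot.img, addons.cpio, or both.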
From: David M. <dma...@ee...> - 2023-11-15 16:03:10
|
Doing a grep search for the string “nocloud-net” in /var/lib/confluent brings it up in boot.img files for the Ubuntu 20.04 and 22.04 distributions. However, I do not see the “boot.img” file being fetched in the Apache logs, or any kind of fetch for the autoinstall/ stuff either:

"""
172.17.15.155 - - [14/Nov/2023:11:09:09 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe HTTP/1.1" 200 301 "-" "iPXE/1.21.1 (g988d2)"
172.17.15.155 - - [14/Nov/2023:11:09:09 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/kernel HTTP/1.1" 200 13680904 "-" "iPXE/1.21.1 (g988d2)"
172.17.15.155 - - [14/Nov/2023:11:09:10 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/initramfs/addons.cpio HTTP/1.1" 200 79360 "-" "iPXE/1.21.1 (g988d2)"
172.17.15.155 - - [14/Nov/2023:11:09:10 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/initramfs/site.cpio HTTP/1.1" 200 1536 "-" "iPXE/1.21.1 (g988d2)"
172.17.15.155 - - [14/Nov/2023:11:09:10 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/boot/initramfs/distribution HTTP/1.1" 200 88323508 "-" "iPXE/1.21.1 (g988d2)"
fe80::ae1f:XXff:feXX:XXYY - - [14/Nov/2023:11:09:30 -0500] "GET /confluent-public/os/ubuntu-20.04.6-x86_64-default/distribution/install.iso HTTP/1.1" 200 1487339520 "-" "Wget"
"""

I’m not sure how the target system is supposed to be told to use/fetch the boot.img file (doing a grep for “boot.img” in /var/lib/confluent does not bring up anything), but it does not seem to be happening. This seems to track with the console log:

"""
net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
Next server: 172.17.15.254
Filename: http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe
http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe... ok
boot.ipxe : 301 bytes [script]
boot/initramfs/addons.cpio... ok
boot/initramfs/site.cpio... ok
Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::XX%2]
Connecting to [fe80::YY%2] ([fe80::YY%eno0]:80)
install.iso 5% |* | 74.7M 0:00:17 ETA
[…]
install.iso 100% |********************************| 1418M 0:00:00 ETA
cp: can't stat '/custom-installation/iso-override/*': No such file or directory
cat: can't open '/tls/*.0': No such file or directory
cp: can't stat '/tls/*': No such file or directory
Password was not accepted
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (77) error setting certificate verify locations:
  CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
passwd: password expiry information changed.
Using CD-ROM mount point /cdrom/
Identifying... [e2e9021b074342abd39d6c3842902203-2]
Scanning disc for index files…
"""

After which the install.iso seems to kick off. There is no kernel output, and the next general output is cloud-init announcing its interface (IPv6 link-local only), SSHd key generation, a “waiting for cloud-init…” message, and then the installer is started. It then detects I have a serial console available (in addition to TTY) and asks whether I want rich mode, basic mode, or to use SSH. If I select SSH I am told to go to installer@172.17.15.199, which is a different IP than what the PXE boot got (172.17.15.155).

Should I perhaps manually tweak the boot.ipxe file to add the kernel parameters for fetching cloud-init?

> On Nov 14, 2023, at 11:52, Jarrod Johnson <jjo...@le...> wrote:
>
> Ultimately, it should be doing this:
> autoinstall ds=nocloud-net;s=https://${ipv4s}/confluent-public/os/${osprofile}/autoinstall/
>
> Making changes as appropriate and pulling in the autoinstall in that way.
>
> However, the networking comes from:
> {
>     echo "DEVICE='$DEVICE'"
>     echo "PROTO='none'"
>     echo "IPV4PROTO='none'"
>     echo "IPV4ADDR='$v4addr'"
>     echo "IPV4NETMASK='$v4nm'"
>     echo "IPV4BROADCAST='$v4nm'"
>     echo "IPV4GATEWAY='$v4gw'"
>     echo "IPV4DNS1='$dns'"
>     echo "HOSTNAME='$NODENAME'"
>     echo "DNSDOMAIN='$dnsdomain'"
>     echo "DOMAINSEARCH='$dnsdomain'"
> } > "/run/net-$DEVICE.conf"
>
>
> Something along those lines.
>
> At the time of failure, are you able to ssh in?
>
>> From: David Magda <dma...@ee...>
>> Sent: Tuesday, November 14, 2023 11:28 AM
>> To: xCAT Users Mailing list <xca...@li...>
>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>
>> So is Confluent supposed to act as a cloud-init datasource?
>>
>> […]
>>
>> There exists in /var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/ an autoinstall/ directory that contains “meta-data” and “user-data” files.
>>
>> There’s a lot of output that flies by quite quickly, so I edited the “boot.ipxe” file to add “console=tty0 console=ttyS1,115200” so that the Lenovo webUI console could more fully see and capture the output in /var/log/confluent/console/. From there I see Confluent giving a PXE response:
>>
>> net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
>> Next server: 172.17.15.254
>> Filename: […]
>>
>> It then switches to link-local IPv6 (?) to fetch the ISO:
>>
>> Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::AAbb:Cff:feCd:dEE%2]
>> Connecting to [fe80::EEcc:Bff:feBa:aXX%2] ([fe80::[…]%eno0]:80)
>> install.iso 3% |* | 52.0M 0:00:26 ETA
>> install.iso 11% |*** | 162M 0:00:15 ETA
>> […]
>>
>> Cloud-init then seems to be kicked off (with only an IPv6 LL address?):
>>
>> [ 57.599545] cloud-init[2691]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init-local' at Tue, 14 Nov 2023 16:10:04 +0000. Up 52.98 seconds.
>> [ 69.044787] cloud-init[2742]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init' at Tue, 14 Nov 2023 16:10:09 +0000. Up 58.09 seconds.
>> [ 69.064878] cloud-init[2742]: ci-info: +++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
>> [ 69.084789] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
>> [ 69.104844] cloud-init[2742]: ci-info: | Device |   Up  |           Address            |    Mask   | Scope |     Hw-Address    |
>> [ 69.124838] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
>> [ 69.144756] cloud-init[2742]: ci-info: |  eno0  |  True |      fe80::ae1f:[…]/64       |     .     |  link |     ac:1f:[…]     |
>> [ 69.164837] cloud-init[2742]: ci-info: | ens4f1 | False |              .               |     .     |   .   |     ac:1f:[…]     |
>> […]
>>
>> This seems to fail / error out:
>>
>> [ 69.456748] cloud-init[2742]: 2023-11-14 16:10:20,895 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed
>> [ 69.810439] cloud-init[2742]: 2023-11-14 16:10:21,661 - activators.py[WARNING]: Running ['netplan', 'apply'] resulted in stderr output:
>> Failed to connect system bus: No such file or directory
>> [ 69.836748] cloud-init[2742]: Falling back to a hard restart of systemd-networkd.service
>> [ 70.170428] cloud-init[2742]: Generating public/private rsa key pair.
>>
>> Bunch of SSH key generation stuff, until we get to:
>>
>> [ 77.218133] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'modules:final' at Tue, 14 Nov 2023 16:10:28 +0000. Up 76.89 seconds.
>> [ 77.240868] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 finished at Tue, 14 Nov 2023 16:10:29 +0000. Datasource DataSourceNone. Up 77.20 seconds
>> [ 77.264872] cloud-init[3848]: 2023-11-14 16:10:29,068 - cc_final_message.py[WARNING]: Used fallback datasource
>> Ubuntu 20.04.6 LTS ubuntu-server ttyS1
>> connecting...
>> waiting for cloud-init…
>>
>> After which the manual installation of Ubuntu kicks in (the installer noticed that it is (now) running in a serial console, per “boot.ipxe” changes above, and asked if I wanted ‘rich’ or ‘basic’ mode).
>>
>> > On Nov 10, 2023, at 17:06, David Magda <dma...@ee...> wrote:
>> >
>> >
>> > $ nodedeploy MYHOST
>> > MYHOST: pending: ubuntu-20.04.6-x86_64-default
>> >
>> > I have U22.04 available already as well if testing with that is useful.
>> >
>> > The server in question isn’t used for anything special currently. My hope is that once I get some basic stuff going with the SuperMicro hardware we can start upgrading our Lenovo systems.
>> >
>> >> On Nov 10, 2023, at 14:25, Jarrod Johnson <jjo...@le...> wrote:
>> >>
>> >> It should cloud-init as a matter of course, just like for the kickstart installs...
>> >>
>> >> What does nodedeploy <node> look like when you hit interactive? May need to look into this more directly next week...
>> >>
>> >>> From: David Magda <dma...@ee...>
>> >>> Sent: Friday, November 10, 2023 2:16 PM
>> >>> To: xCAT Users Mailing list <xca...@li...>
>> >>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>> >>>
>> >>> Ah, silly me: bad copy-paste.
>> >>>
>> >>> That command gives:
>> >>>
>> >>>   File "/opt/confluent/bin/confluent_selfcheck", line 241
>> >>>     for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
>> >>>                                                             ^
>> >>> SyntaxError: invalid syntax
>> >>>
>> >>> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE boot parameters and the system launched into the Ubuntu 20.04 installer. The console is prompting me a bunch of questions.
>> >>>
>> >>> So I think I’ve finally managed to muddle through this part of the documentation:
>> >>>
>> >>> https://hpc.lenovo.com/users/documentation/confluentosdeploy.html
>> >>>
>> >>> Is there any documentation about automating Ubuntu installs with Confluent? Does Confluent handle any cloud-init stuff (which was run during the boot process), or is there some other method to send things like partitioning and packaging information to Ubuntu?
>> >>>
>> >>>
>> >>>> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjo...@le...> wrote:
>> >>>>
>> >>>> The attribute name is plural, with s at the end. deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>> >>>>
>> >>>> confluent_selfcheck -n MYHOST
>> >>>>
>> >>>> Say anything interesting?
>> >>>>
>> >>>>> From: David Magda <dma...@ee...>
>> >>>>> Sent: Friday, November 10, 2023 10:50 AM
>> >>>>> To: xCAT Users Mailing list <xca...@li...>
>> >>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>> >>>>>
>> >>>>> Looking in that file there was:
>> >>>>>
>> >>>>> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure mode, but insecure mode is disabled. Set the attribute `deployment.useinsecureprotocols` to `firmware` or `always` to enable support, or use UEFI HTTP boot with HTTPS."}
>> >>>>>
>> >>>>> Trying to tweak that attribute, I got:
>> >>>>>
>> >>>>> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>> >>>>> Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid
>> >>>>>
>> >>>>> I tried using nodegroupattrib as well on a group that the host was in, and got:
>> >>>>>
>> >>>>> Error: Bad Request - deployment.useinsecureprotocol attribute is invalid
>> >>>>>
>> >>>>> I then edited the reply_dhcp4() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to remove the "return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would continue going. The log message still appears (so I know the code is getting there), but the events file now has:
>> >>>>>
>> >>>>> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}
>> >>>>>
>> >>>>> And the system is still booting xCAT (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>> >>>>>
>> >>>>> Not running the dhcpd at all simply has the system time out on its PXE attempt. I told Confluent about the particular IP address the system should have:
>> >>>>>
>> >>>>> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>> >>>>>
>> >>>>> And that did not help.
>> >>>>>
>> >>>>> Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc).
>> >>>>>
>> >>>>> Should I edit my dhcpd.conf and rip out things like:
>> >>>>>
>> >>>>> […]
>> >>>>> if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>> >>>>> always-broadcast on;
>> >>>>> filename = "…"
>> >>>>> […]
>> >>>>>
>> >>>>> to try to see if that will get things going with Confluent? Or are things expected to work with all of that?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>> On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote:
>> >>>>>>
>> >>>>>> tail /var/log/confluent/events for a hint on why it might be ignoring the request.
>> >>>>>>
>> >>>>>>> From: David Magda <dm...@ee...>
>> >>>>>>> Sent: Wednesday, November 8, 2023 2:46 PM
>> >>>>>>> To: xCAT Users Mailing list <xca...@li...>
>> >>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded.
>> >>>>>>>
>> >>>>>>> The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced.
>> >>>>>>>
>> >>>>>>> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n ubuntu-20.04.6-x86_64-default” earlier.
>> >>>>>>>
>> >>>>>>> $ nodeattrib MYHOST
>> >>>>>>> MYHOST: console.method: ipmi
>> >>>>>>> MYHOST: deployment.apiarmed: once
>> >>>>>>> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>> >>>>>>> MYHOST: deployment.profile:
>> >>>>>>> MYHOST: deployment.stagedprofile:
>> >>>>>>> MYHOST: deployment.state:
>> >>>>>>> MYHOST: deployment.state_detail:
>> >>>>>>> MYHOST: groups: prox,ipmi,all,everything
>> >>>>>>> MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>> >>>>>>> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>> >>>>>>> MYHOST: net.ipv4_method: dhcp
>> >>>>>>> MYHOST: secret.hardwaremanagementpassword: ********
>> >>>>>>> MYHOST: secret.hardwaremanagementuser: ********
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>> >>>>>>>>
>> >>>>>>>> If dhcpd.conf is set to not send any 'filename', it's best. If you don't need a dhcp server, then you can turn it off. There's also
>> >>>>>>>>
>> >>>>>>>> If you have a dhcp server with a dynamic range on it, then:
>> >>>>>>>> nodeattrib net.ipv4_method=firmwaredhcp
>> >>>>>>>>
>> >>>>>>>> If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node.
>> >>>>>>>>
>> >>>>>>>> If you have no dhcp server, then it should just do the right thing directly.
>> >>>>>>>>
>> >>>>>>>> If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however you own the IPAM sort of responsibility totally.
>> >>>>>>>>
>> >>>>>>>> If your dhcp has:
>> >>>>>>>> option gpxe.no-pxedhcp 1;
>> >>>>>>>> Please remove that to let confluent merge an offer with an uncoordinated dhcp server.
>> >>>>>>>>
>> >>>>>>>> I need to do a deeper write-up on the detail about dhcp interaction, how it is now optional, and how it can coexist with an unmanaged dhcp server and free the dhcp server from 'filename'
>> >>>>>>>>
>> >>>>>>>>> From: David Magda
>> >>>>>>>>> Sent: Tuesday, November 7, 2023 9:27 AM
>> >>>>>>>>> To: xCAT Users Mailing list
>> >>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>> >>>>>>>>>
>> >>>>>>>>> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/.
>> >>>>>>>>>
>> >>>>>>>>> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>> >>>>>>>>>
>> >>>>>>>>> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily until I get an understanding of Confluent so I can revert to xCAT if need be.)
>> >>>>>>>>> >> >>>>>>>>>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote: >> >>>>>>>>>> >> >>>>>>>>>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty. >> >>>>>>>>>> >> >>>>>>>>>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT. >> >>>>>>>>>> >> >>>>>>>>>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should document formally how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node... >> >>>>>>>>>> >> >>>>>>>>>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example... >> >>>>>>>>>>> From: David Magda >> >>>>>>>>>>> Sent: Thursday, October 26, 2023 11:22 AM >> >>>>>>>>>>> To: xCAT Users Mailing list >> >>>>>>>>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent >> >>>>>>>>>>> >> >>>>>>>>>>> Yes, there was perhaps auto-completion with regards Confluent/Confluence. >> >>>>>>>>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at: >> >>>>>>>>>>> https://hpc.lenovo.com/users/ >> >>>>>>>>>>> https://hpc.lenovo.com/users/documentation/ >> >>>>>>>>>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused). >> >>>>>>>>> […] > |
From: David D J. <dav...@br...> - 2023-11-15 11:53:50
|
We built this script to bring up ipoib on RHELS 9.2. Your mileage may vary....

[root@xcat02 postscripts]# more ipoib
#!/bin/bash

# Define the log function
function log {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - $1" >> /root/post.log 2>&1
    logger -t xcat "$1"
}

# Log script start
log "Starting ib-config-as-eth script"

# Find the primary interface and its IPv4 address
log "Finding the primary interface and its IPv4 address"
primary_interface=$(ip route | awk '/default/ {print $5; exit}')
primary_ip=$(ip addr show dev "$primary_interface" | awk '$1 == "inet" {gsub(/\/.*$/, "", $2); print $2}')

# Replace "172.20." with "172.25." in the primary IP address
log "Replacing IP address"
ib_ip=${primary_ip/172.20./172.25.}
ib_conname=ib0

# Find first ib interface name
ib_ifname=$(cd /sys/class/net; ls -d ib* | head -1)
[[ -n $ib_ifname ]] || exit

# Add InfiniBand connection
log "Adding InfiniBand connection $ib_ifname"
nmcli connection add type infiniband con-name "$ib_conname" ifname "$ib_ifname" transport-mode Datagram

# Set IPv4 address
log "Setting IPv4 address to $ib_ip/16"
nmcli connection modify "$ib_conname" ipv4.addresses "$ib_ip/16"

# Set IPv4 method to manual
log "Setting IPv4 method to manual"
nmcli connection modify "$ib_conname" ipv4.method manual

# Ignore IPv6
log "Ignoring IPv6"
nmcli connection modify "$ib_conname" ipv6.method ignore

# Activate the connection
log "Activating the connection"
nmcli connection up "$ib_conname"

# Log script end
log "/postscripts/ipoib script completed"
|
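The prefix swap in the script (`${primary_ip/172.20./172.25.}`) replaces the first match anywhere in the address and returns the input unchanged when nothing matches. A minimal sketch of a stricter variant follows; the function name `map_to_ib_ip` and its optional prefix arguments are hypothetical, and the default prefixes are simply the ones used in the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the posted script): map a management
# address to its IPoIB address by swapping the leading prefix.
map_to_ib_ip() {
    local primary_ip=$1
    local from_prefix=${2:-172.20.}
    local to_prefix=${3:-172.25.}

    # Refuse addresses outside the expected subnet; an unanchored
    # substitution would hand them back unchanged.
    if [[ $primary_ip != "$from_prefix"* ]]; then
        echo "map_to_ib_ip: $primary_ip does not start with $from_prefix" >&2
        return 1
    fi

    # "/#" anchors the replacement at the start of the string.
    echo "${primary_ip/#$from_prefix/$to_prefix}"
}

map_to_ib_ip 172.20.3.17    # prints 172.25.3.17
```

In a postscript this would let the run stop before `nmcli` is called with an address that was never translated.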
From: Samveen G. <sa...@ya...> - 2023-11-15 11:05:01
|
Hi Matthew,

I might be able to help, if you can post the kickstart boot log of the failure.

--
Samveen S. Gulati
http://samveen.in

The best-laid schemes o' mice an' men
Gang aft agley,
An' lea'e us nought but grief an' pain,
For promis'd joy!
-- Robert Burns
(The best laid plans of mice and men often go awry, and bring nothing but grief and pain instead of ..)

_______________________________________________
xCAT-user mailing list
xCA...@li...
https://lists.sourceforge.net/lists/listinfo/xcat-user |
From: Tomer S. <tom...@ma...> - 2023-11-15 10:53:06
|
From my experience, it works well with Red Hat 9, but the script that configures the IB interface doesn't work for me there, only on 8.5.
Hope this helps.

Regards,

Tomer Shachaf | Integration and Infrastructure Engineer | Integration and Infrastructure Division | Matrix | Mobile 054-2686841 | tom...@ma... | www.matrix.co.il |
From: Tovey, M. <Mat...@lr...> - 2023-11-15 10:10:28
|
Hi list,

Is xCAT 2.16.5 working with Red Hat 9.x OSs for CNs? The release notes say "alpha support for 9.0" - how well does that work? Does Confluent handle them better?

I've tried to install a Rocky 9.0 image on an x86_64 CN, but the kickstart file that xCAT comes up with has syntax errors. I'm wondering if there exists a patch with a kickstart file for xCAT that will work with Rocky 9.0 and/or 9.2.

Thanks,

Matt
--
Matt Tovey
Future Computing / Quantum Computing / BDAI Administration
Leibniz Supercomputing Centre
Boltzmannstr. 1
D-85748 Garching
Tel. : +49-89-35831-7864
email : mat...@lr... |
From: Jarrod J. <jjo...@le...> - 2023-11-14 16:52:27
|
Ultimately, it should be doing this:

autoinstall ds=nocloud-net;s=https://${ipv4s}/confluent-public/os/${osprofile}/autoinstall/

Making changes as appropriate and pulling in the autoinstall in that way. However, the networking comes from:

{
    echo "DEVICE='$DEVICE'"
    echo "PROTO='none'"
    echo "IPV4PROTO='none'"
    echo "IPV4ADDR='$v4addr'"
    echo "IPV4NETMASK='$v4nm'"
    echo "IPV4BROADCAST='$v4nm'"
    echo "IPV4GATEWAY='$v4gw'"
    echo "IPV4DNS1='$dns'"
    echo "HOSTNAME='$NODENAME'"
    echo "DNSDOMAIN='$dnsdomain'"
    echo "DOMAINSEARCH='$dnsdomain'"
} > "/run/net-$DEVICE.conf"

Something along those lines. At the time of failure, are you able to ssh in? |
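The `ds=nocloud-net;s=...` argument in this reply hands cloud-init a seed URL, and the NoCloud datasource requests `meta-data` and `user-data` relative to it. A minimal sketch of checking that seed by hand follows, assuming it is run from somewhere that can reach the deployment server (ideally the installer environment itself, if ssh works). The function names are hypothetical, and `-k` skips certificate verification only for the sake of a quick probe.

```shell
#!/bin/bash
# Print the two URLs the nocloud-net datasource would request for a given
# deployment server and OS profile (URL layout taken from the kernel
# argument quoted in this thread).
seed_urls() {
    local server=$1 osprofile=$2
    local base="https://${server}/confluent-public/os/${osprofile}/autoinstall/"
    printf '%s\n' "${base}meta-data" "${base}user-data"
}

# Probe each URL with curl and report which ones answer.
# Assumption: curl is available; -k is for a quick check only.
check_seed() {
    local url
    for url in $(seed_urls "$1" "$2"); do
        if curl -ksf -o /dev/null --max-time 5 "$url"; then
            echo "ok   $url"
        else
            echo "FAIL $url"
        fi
    done
}

# Example (needs network access to the deployment server):
#   check_seed 172.17.15.254 ubuntu-20.04.6-x86_64-default
```

If both URLs answer from the management node but not from the installer, that would point at the installer's network configuration rather than at the profile contents.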
From: David M. <dma...@ee...> - 2023-11-14 16:29:22
|
So is Confluent supposed to act as a cloud-init datasource?

https://cloudinit.readthedocs.io/en/22.4.2/topics/datasources.html

There exists in /var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/ an autoinstall/ directory that contains “meta-data” and “user-data” files.

There’s a lot of output that flies by quite quickly, so I edited the “boot.ipxe” file to add “console=tty0 console=ttyS1,115200” so that the Lenovo webUI console could more fully see and capture the output in /var/log/confluent/console/. From there I see Confluent giving a PXE response:

net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
Next server: 172.17.15.254
Filename: http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe

It then switches to link-local IPv6 (?) to fetch the ISO:

Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::AAbb:Cff:feCd:dEE%2]
Connecting to [fe80::EEcc:Bff:feBa:aXX%2] ([fe80::[…]%eno0]:80)
install.iso 3% |* | 52.0M 0:00:26 ETA
install.iso 11% |*** | 162M 0:00:15 ETA
[…]

Cloud-init then seems to be kicked off (with only an IPv6 LL address?):

[ 57.599545] cloud-init[2691]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init-local' at Tue, 14 Nov 2023 16:10:04 +0000. Up 52.98 seconds.
[ 69.044787] cloud-init[2742]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init' at Tue, 14 Nov 2023 16:10:09 +0000. Up 58.09 seconds.
[ 69.064878] cloud-init[2742]: ci-info: +++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
[ 69.084789] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
[ 69.104844] cloud-init[2742]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 69.124838] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
[ 69.144756] cloud-init[2742]: ci-info: | eno0 | True | fe80::ae1f:[…]/64 | . | link | ac:1f:[…] |
[ 69.164837] cloud-init[2742]: ci-info: | ens4f1 | False | . | . | . | ac:1f:[…] |
[…]

This seems to fail / error out:

[ 69.456748] cloud-init[2742]: 2023-11-14 16:10:20,895 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed
[ 69.810439] cloud-init[2742]: 2023-11-14 16:10:21,661 - activators.py[WARNING]: Running ['netplan', 'apply'] resulted in stderr output: Failed to connect system bus: No such file or directory
[ 69.836748] cloud-init[2742]: Falling back to a hard restart of systemd-networkd.service
[ 70.170428] cloud-init[2742]: Generating public/private rsa key pair.

Bunch of SSH key generation stuff, until we get to:

[ 77.218133] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'modules:final' at Tue, 14 Nov 2023 16:10:28 +0000. Up 76.89 seconds.
[ 77.240868] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 finished at Tue, 14 Nov 2023 16:10:29 +0000. Datasource DataSourceNone. Up 77.20 seconds
[ 77.264872] cloud-init[3848]: 2023-11-14 16:10:29,068 - cc_final_message.py[WARNING]: Used fallback datasource
Ubuntu 20.04.6 LTS ubuntu-server ttyS1
connecting... waiting for cloud-init…

After which the manual installation of Ubuntu kicks in (the installer noticed that it is (now) running in a serial console, per “boot.ipxe” changes above, and asked if I wanted ‘rich’ or ‘basic’ mode).
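The warnings in this log are easy to miss while the console scrolls past. A minimal sketch of a filter that pulls the datasource-related lines out of a captured console log follows; the function name `scan_cloud_init` is hypothetical, and the patterns are just the three messages that appear in the output above.

```shell
#!/bin/bash
# Read a console log on stdin and print the cloud-init lines that indicate
# the datasource could not be read or that the fallback was used.
# Returns 0 when at least one such line was found (grep's exit status).
scan_cloud_init() {
    grep -E 'Getting data from .* failed|Used fallback datasource|Datasource DataSourceNone'
}

# Example. The file name under /var/log/confluent/console/ is an assumption;
# the thread only gives the directory.
#   scan_cloud_init < /var/log/confluent/console/MYHOST
```

An empty result on a later attempt would suggest cloud-init found its seed, at which point the autoinstall contents themselves become the thing to look at.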
> On Nov 10, 2023, at 17:06, David Magda <dma...@ee...> wrote:
>
>
> $ nodedeploy MYHOST
> MYHOST: pending: ubuntu-20.04.6-x86_64-default
>
> I have U22.04 available already as well if testing with that is useful.
>
> The server in question isn’t used for anything special currently. My hope is that once I get some basic stuff going with the SuperMicro hardware we can start upgrading our Lenovo systems.
>
>> On Nov 10, 2023, at 14:25, Jarrod Johnson <jjo...@le...> wrote:
>>
>> It should cloud-init as a matter of course, just like for the kickstart installs...
>>
>> What does nodedeploy <node> look like when you hit interactive? May need to look into this more directly next week...
>>
>>> From: David Magda <dma...@ee...>
>>> Sent: Friday, November 10, 2023 2:16 PM
>>> To: xCAT Users Mailing list <xca...@li...>
>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>
>>> Ah, silly me: bad copy-paste.
>>>
>>> That command gives:
>>>
>>> File "/opt/confluent/bin/confluent_selfcheck", line 241
>>> for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
>>> ^
>>> SyntaxError: invalid syntax
>>>
>>> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE boot parameters and the system launched into the Ubuntu 20.04 installer. The console is prompting me a bunch of questions.
>>>
>>> So I think I’ve finally managed to muddle through this part of the documentation:
>>>
>>> https://hpc.lenovo.com/users/documentation/confluentosdeploy.html
>>>
>>> Is there any documentation about automating Ubuntu installs with Confluent? Does Confluent handle any cloud-init stuff (which was run during the boot process), or is there some other method to send things like partitioning and packaging information to Ubuntu?
>>>
>>>
>>>> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjo...@le...> wrote:
>>>>
>>>> The attribute name is plural, with s at the end. deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>>>>
>>>> confluent_selfcheck -n MYHOST
>>>>
>>>> Say anything interesting?
>>>>
>>>>> From: David Magda <dma...@ee...>
>>>>> Sent: Friday, November 10, 2023 10:50 AM
>>>>> To: xCAT Users Mailing list <xca...@li...>
>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>
>>>>> Looking in that file there was:
>>>>>
>>>>> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure
>>>>> mode, but insecure mode is disabled. Set the attribute
>>>>> `deployment.useinsecureprotocols` to `firmware` or `always` to enable
>>>>> support, or use UEFI HTTP boot with HTTPS." }
>>>>>
>>>>> Trying to tweak that attribute, I got:
>>>>>
>>>>> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>>>>> Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid
>>>>>
>>>>> I tried using nodegroupattrib as well on a group that the host was in, and got:
>>>>>
>>>>> Error: Bad Request - deployment.useinsecureprotocol attribute is invalid
>>>>>
>>>>> I then edited the reply_dhcp4() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to change the default check to remove the "return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would continue going. The log message still appears (so I know the code is getting there), but the events file now has:
>>>>>
>>>>> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}
>>>>>
>>>>> And the system is still booting xCat (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>>>>>
>>>>> Not running the dhcpd at all simply has the system timeout on its PXE attempt. I told Confluent about the particular IP address the system should have:
>>>>>
>>>>> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>>>>>
>>>>> And that did not help.
>>>>>
>>>>> Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc).
>>>>>
>>>>> Should I edit my dhcpd.conf and rip out things like:
>>>>>
>>>>> […]
>>>>> if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>> always-broadcast on;
>>>>> filename = "…"
>>>>> […]
>>>>>
>>>>> to try to see if that will get things going with Confluent? Or are things expected to work with all of that?
>>>>>
>>>>>
>>>>>
>>>>>> On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote:
>>>>>>
>>>>>> tail /var/log/confluent/events for a hint on why it might be ignoring the request.
>>>>>>
>>>>>>> From: David Magda <dm...@ee...>
>>>>>>> Sent: Wednesday, November 8, 2023 2:46 PM
>>>>>>> To: xCAT Users Mailing list <xca...@li...>
>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>
>>>>>>>
>>>>>>> I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded.
>>>>>>>
>>>>>>> The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced.
>>>>>>>
>>>>>>> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n ubuntu-20.04.6-x86_64-default” earlier.
>>>>>>>
>>>>>>> $ nodeattrib MYHOST
>>>>>>> MYHOST: console.method: ipmi
>>>>>>> MYHOST: deployment.apiarmed: once
>>>>>>> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>>>>>>> MYHOST: deployment.profile:
>>>>>>> MYHOST: deployment.stagedprofile:
>>>>>>> MYHOST: deployment.state:
>>>>>>> MYHOST: deployment.state_detail:
>>>>>>> MYHOST: groups: prox,ipmi,all,everything
>>>>>>> MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>>>>>>> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>>>>>>> MYHOST: net.ipv4_method: dhcp
>>>>>>> MYHOST: secret.hardwaremanagementpassword: ********
>>>>>>> MYHOST: secret.hardwaremanagementuser: ********
>>>>>>>
>>>>>>>
>>>>>>>> On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>>>>>>>>
>>>>>>>> If dhcpd.conf is set to not send any 'filename', it's best. If you don't need a dhcp server, then you can turn it off. There's also
>>>>>>>>
>>>>>>>> If you have a dhcp server with a dynamic range on it, then:
>>>>>>>> nodeattrib net.ipv4_method=firmwaredhcp
>>>>>>>>
>>>>>>>> If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node.
>>>>>>>>
>>>>>>>> If you have no dhcp server, then it should just do the right thing directly.
>>>>>>>>
>>>>>>>> If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however you own the IPAM sort of responsibility totally.
>>>>>>>>
>>>>>>>> If your dhcp has:
>>>>>>>> option gpxe.no-pxedhcp 1;
>>>>>>>> Please remove that to let confluent merge an offer with an uncoordinated dhcp server.
>>>>>>>> >>>>>>>> I need to do a deeper right up on the detail about dhcp interaction, how it is now optional, and how it can coexist with an unmanaged dhcp server and free the dhcp server from 'filename' >>>>>>>> >>>>>>>>> From: David Magda >>>>>>>>> Sent: Tuesday, November 7, 2023 9:27 AM >>>>>>>>> To: xCAT Users Mailing list >>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >>>>>>>>> >>>>>>>>> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/. >>>>>>>>> >>>>>>>>> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded. >>>>>>>>> >>>>>>>>> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily until I get an understanding of Confluent so I can revert to xCat if need-be.) >>>>>>>>> >>>>>>>>>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote: >>>>>>>>>> >>>>>>>>>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty. >>>>>>>>>> >>>>>>>>>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT. >>>>>>>>>> >>>>>>>>>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should document formally how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node... >>>>>>>>>> >>>>>>>>>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example... 
>>>>>>>>>>> From: David Magda >>>>>>>>>>> Sent: Thursday, October 26, 2023 11:22 AM >>>>>>>>>>> To: xCAT Users Mailing list >>>>>>>>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent >>>>>>>>>>> >>>>>>>>>>> Yes, there was perhaps auto-completion with regard to Confluent/Confluence. >>>>>>>>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there a document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at: >>>>>>>>>>> https://hpc.lenovo.com/users/ >>>>>>>>>>> https://hpc.lenovo.com/users/documentation/ >>>>>>>>>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused). >>>>>>>>> […] >>>>> >>> >>> >>> _______________________________________________ >>> xCAT-user mailing list >>> xCA...@li... >>> https://lists.sourceforge.net/lists/listinfo/xcat-user > > > _______________________________________________ > xCAT-user mailing list > xCA...@li... > https://lists.sourceforge.net/lists/listinfo/xcat-user |
From: David M. <dm...@ee...> - 2023-11-10 22:23:04
|
$ nodedeploy MYHOST
MYHOST: pending: ubuntu-20.04.6-x86_64-default

I have U22.04 available already as well if testing with that is useful.

The server in question isn’t used for anything special currently. My hope is that once I get some basic stuff going with the SuperMicro hardware we can start upgrading our Lenovo systems.

> On Nov 10, 2023, at 14:25, Jarrod Johnson <jjo...@le...> wrote: > > It should cloud-init as a matter of course, just like for the kickstart installs... > > What does nodedeploy <node> look like when you hit interactive? May need to look into this more directly next week... > >> From: David Magda <dma...@ee...> >> Sent: Friday, November 10, 2023 2:16 PM >> To: xCAT Users Mailing list <xca...@li...> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> Ah, silly me: bad copy-paste. >> >> That command gives: >> >> File "/opt/confluent/bin/confluent_selfcheck", line 241 >> for rsp in sess.read(f'/nodes/{args.node}/attributes/all'): >> ^ >> SyntaxError: invalid syntax >> >> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE boot parameters and the system launched into the Ubuntu 20.04 installer. The console is prompting me with a bunch of questions. >> >> So I think I’ve finally managed to muddle through this part of the documentation: >> >> https://hpc.lenovo.com/users/documentation/confluentosdeploy.html >> >> Is there any documentation about automating Ubuntu installs with Confluent? Does Confluent handle any cloud-init stuff (which was run during the boot process), or is there some other method to send things like partitioning and package information to Ubuntu? >> >> >> > On Nov 10, 2023, at 11:01, Jarrod Johnson <jjo...@le...> wrote: >> > >> > The attribute name is plural, with s at the end.
deployment.useinsecureprotocols rather than deployment.useinsecureprotocol. >> > >> > confluent_selfcheck -n MYHOST >> > >> > Say anything interesting? >> > >> >> From: David Magda <dma...@ee...> >> >> Sent: Friday, November 10, 2023 10:50 AM >> >> To: xCAT Users Mailing list <xca...@li...> >> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> >> >> Looking in that file there was: >> >> >> >> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure >> >> mode, but insecure mode is disabled. Set the attribute >> >> `deployment.useinsecureprotocols` to `firmware` or `always` to enable >> >> support, or use UEFI HTTP boot with HTTPS." } >> >> >> >> Trying to tweak that attribute, I got: >> >> >> >> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware >> >> Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid >> >> >> >> I tried using nodegroupattrib as well on a group that the host was in, and got: >> >> >> >> Error: Bad Request - deployment.useinsecureprotocol attribute is invalid >> >> >> >> I then edited the reply_dhcp4() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py, removing the "return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would keep going. The log message still appears (so I know the code is getting there), but the events file now has: >> >> >> >> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"} >> >> >> >> And the system is still booting xCAT (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted). >> >> >> >> Not running the dhcpd at all simply has the system time out on its PXE attempt. I told Confluent about the particular IP address the system should have: >> >> >> >> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21 >> >> >> >> And that did not help.
>> >> >> >> Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc). >> >> >> >> Should I edit my dhcpd.conf and rip out things like: >> >> >> >> […] >> >> if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent >> >> always-broadcast on; >> >> filename = "…" >> >> […] >> >> >> >> to try to see if that will get things going with Confluent? Or are things expected to work with all of that? >> >> >> >> >> >> >> >> > On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote: >> >> > >> >> > tail /var/log/confluent/events for a hint on why it might be ignoring the request. >> >> > >> >> >> From: David Magda <dm...@ee...> >> >> >> Sent: Wednesday, November 8, 2023 2:46 PM >> >> >> To: xCAT Users Mailing list <xca...@li...> >> >> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> >> >> >> >> >> >> >> I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded. >> >> >> >> >> >> The dhcpd.conf did have "gpxe.no-pxedhcp”, but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced. >> >> >> >> >> >> Per “lsof -i udp”, the Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n >> >> >> ubuntu-20.04.6-x86_64-default” earlier. 
>> >> >> >> >> >> $ nodeattrib MYHOST >> >> >> MYHOST: console.method: ipmi >> >> >> MYHOST: deployment.apiarmed: once >> >> >> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default >> >> >> MYHOST: deployment.profile: >> >> >> MYHOST: deployment.stagedprofile: >> >> >> MYHOST: deployment.state: >> >> >> MYHOST: deployment.state_detail: >> >> >> MYHOST: groups: prox,ipmi,all,everything >> >> >> MYHOST: hardwaremanagement.manager: MYHOST-ipmi >> >> >> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD >> >> >> MYHOST: net.ipv4_method: dhcp >> >> >> MYHOST: secret.hardwaremanagementpassword: ******** >> >> >> MYHOST: secret.hardwaremanagementuser: ******** >> >> >> >> >> >> >> >> >> > On Nov 7, 2023, at 13:40, Jarrod Johnson wrote: >> >> >> > >> >> >> > If dhcpd.conf is set to not send any 'filename', it's best. If you don't need a dhcp server, then you can turn it off. There's also >> >> >> > >> >> >> > If you have a dhcp server with a dynamic range on it, then: >> >> >> > nodeattrib net.ipv4_method=firmwaredhcp >> >> >> > >> >> >> > If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node. >> >> >> > >> >> >> > If you have no dhcp server, then it should just do the right thing directly. >> >> >> > >> >> >> > If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however you own the IPAM sort of responsibility totally. >> >> >> > >> >> >> > If your dhcp has: >> >> >> > option gpxe.no-pxedhcp 1; >> >> >> > Please remove that to let confluent merge an offer with an uncoordinated dhcp server. 
>> >> >> > >> >> >> > I need to do a deeper write-up on the detail about dhcp interaction, how it is now optional, and how it can coexist with an unmanaged dhcp server and free the dhcp server from 'filename' >> >> >> > >> >> >> >> From: David Magda >> >> >> >> Sent: Tuesday, November 7, 2023 9:27 AM >> >> >> >> To: xCAT Users Mailing list >> >> >> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> >> >> >> >> >> >> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/. >> >> >> >> >> >> >> >> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded. >> >> >> >> >> >> >> >> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily until I get an understanding of Confluent so I can revert to xCAT if need be.) >> >> >> >> >> >> >> >>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote: >> >> >> >>> >> >> >> >>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty. >> >> >> >>> >> >> >> >>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT. >> >> >> >>> >> >> >> >>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should document formally how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node...
>> >> >> >>> >> >> >> >>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example... >> >> >> >>>> From: David Magda >> >> >> >>>> Sent: Thursday, October 26, 2023 11:22 AM >> >> >> >>>> To: xCAT Users Mailing list >> >> >> >>>> Subject: [External] Re: [xcat-user] xCAT-Confluent >> >> >> >>>> >> >> >> >>>> Yes, there was perhaps auto-completion with regard to Confluent/Confluence. >> >> >> >>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there a document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at: >> >> >> >>>> https://hpc.lenovo.com/users/ >> >> >> >>>> https://hpc.lenovo.com/users/documentation/ >> >> >> >>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused). >> >> >> >> […] >> >> >> >> >> _______________________________________________ >> xCAT-user mailing list >> xCA...@li... >> https://lists.sourceforge.net/lists/listinfo/xcat-user > |
From: Jarrod J. <jjo...@le...> - 2023-11-10 19:30:52
|
It should cloud-init as a matter of course, just like for the kickstart installs...

What does nodedeploy <node> look like when you hit interactive? May need to look into this more directly next week...

________________________________
From: David Magda <dma...@ee...>
Sent: Friday, November 10, 2023 2:16 PM
To: xCAT Users Mailing list <xca...@li...>
Subject: Re: [xcat-user] [External] Re: xCAT-Confluent

Ah, silly me: bad copy-paste.

That command gives:

File "/opt/confluent/bin/confluent_selfcheck", line 241
for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
^
SyntaxError: invalid syntax

Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE boot parameters and the system launched into the Ubuntu 20.04 installer. The console is prompting me with a bunch of questions.

So I think I’ve finally managed to muddle through this part of the documentation:

https://hpc.lenovo.com/users/documentation/confluentosdeploy.html

Is there any documentation about automating Ubuntu installs with Confluent? Does Confluent handle any cloud-init stuff (which was run during the boot process), or is there some other method to send things like partitioning and package information to Ubuntu?

> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjo...@le...> wrote: > > The attribute name is plural, with s at the end. deployment.useinsecureprotocols rather than deployment.useinsecureprotocol. > > confluent_selfcheck -n MYHOST > > Say anything interesting? > >> From: David Magda <dma...@ee...> >> Sent: Friday, November 10, 2023 10:50 AM >> To: xCAT Users Mailing list <xca...@li...> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> Looking in that file there was: >> >> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure >> mode, but insecure mode is disabled.
Set the attribute >> `deployment.useinsecureprotocols` to `firmware` or `always` to enable >> support, or use UEFI HTTP boot with HTTPS." } >> >> Trying to tweak that attribute, I got: >> >> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware >> Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid >> >> I tried using nodegroupattrib as well on a group that the host was in, and got: >> >> Error: Bad Request - deployment.useinsecureprotocol attribute is invalid >> >> I then edited the reply_dhcp4(() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to change the default check to remove the “return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would continue going. The log message still appears (so I know the code is getting there), but the events file now has: >> >> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"} >> >> And the system is still booting xCat (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted). >> >> Not running the dhcpd at all simply has the system timeout on its PXE attempt. I told Confluent about the particular IP address the system should have: >> >> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21 >> >> And that did not help. >> >> Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc). >> >> Should I edit my dhcpd.conf and rip out things like: >> >> […] >> if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent >> always-broadcast on; >> filename = "…" >> […] >> >> to try to see if that will get things going with Confluent? Or are things expected to work with all of that? >> >> >> >> > On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote: >> > >> > tail /var/log/confluent/events for a hint on why it might be ignoring the request. 
>> > >> >> From: David Magda <dm...@ee...> >> >> Sent: Wednesday, November 8, 2023 2:46 PM >> >> To: xCAT Users Mailing list <xca...@li...> >> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> >> >> >> >> I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded. >> >> >> >> The dhcpd.conf did have "gpxe.no-pxedhcp”, but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced. >> >> >> >> Per “lsof -i udp”, the Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n >> >> ubuntu-20.04.6-x86_64-default” earlier. >> >> >> >> $ nodeattrib MYHOST >> >> MYHOST: console.method: ipmi >> >> MYHOST: deployment.apiarmed: once >> >> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default >> >> MYHOST: deployment.profile: >> >> MYHOST: deployment.stagedprofile: >> >> MYHOST: deployment.state: >> >> MYHOST: deployment.state_detail: >> >> MYHOST: groups: prox,ipmi,all,everything >> >> MYHOST: hardwaremanagement.manager: MYHOST-ipmi >> >> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD >> >> MYHOST: net.ipv4_method: dhcp >> >> MYHOST: secret.hardwaremanagementpassword: ******** >> >> MYHOST: secret.hardwaremanagementuser: ******** >> >> >> >> >> >> > On Nov 7, 2023, at 13:40, Jarrod Johnson wrote: >> >> > >> >> > If dhcpd.conf is set to not send any 'filename', it's best. If you don't need a dhcp server, then you can turn it off. There's also >> >> > >> >> > If you have a dhcp server with a dynamic range on it, then: >> >> > nodeattrib net.ipv4_method=firmwaredhcp >> >> > >> >> > If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node. 
>> >> > >> >> > If you have no dhcp server, then it should just do the right thing directly. >> >> > >> >> > If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however you own the IPAM sort of responsibility totally. >> >> > >> >> > If your dhcp has: >> >> > option gpxe.no-pxedhcp 1; >> >> > Please remove that to let confluent merge an offer with an uncoordinated dhcp server. >> >> > >> >> > I need to do a deeper right up on the detail about dhcp interaction, how it is now optional, and how it can coexist with an unmanaged dhcp server and free the dhcp server from 'filename' >> >> > >> >> >> From: David Magda >> >> >> Sent: Tuesday, November 7, 2023 9:27 AM >> >> >> To: xCAT Users Mailing list >> >> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent >> >> >> >> >> >> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/. >> >> >> >> >> >> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded. >> >> >> >> >> >> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily until I get an understanding of Confluent so I can revert to xCat if need-be.) >> >> >> >> >> >>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote: >> >> >>> >> >> >>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty. >> >> >>> >> >> >>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT. 
>> >> >>> >> >> >>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should document formally how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node... >> >> >>> >> >> >>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example... >> >> >>>> From: David Magda >> >> >>>> Sent: Thursday, October 26, 2023 11:22 AM >> >> >>>> To: xCAT Users Mailing list >> >> >>>> Subject: [External] Re: [xcat-user] xCAT-Confluent >> >> >>>> >> >> >>>> Yes, there was perhaps auto-completion with regards Confluent/Confluence. >> >> >>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at: >> >> >>>> https://hpc.lenovo.com/users/ >> >> >>>> https://hpc.lenovo.com/users/documentation/ >> >> >>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused). >> >> >> […] >> _______________________________________________ xCAT-user mailing list xCA...@li... https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.sourceforge.net%2Flists%2Flistinfo%2Fxcat-user&data=05%7C01%7Cjjohnson2%40lenovo.com%7C792090eb799c44203d5f08dbe221c79a%7C5c7d0b28bdf8410caa934df372b16203%7C0%7C0%7C638352407016733478%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=HnVKyN2mc6qmLTaPkQafrcs5ZZ3UV9tp%2B9xFz6jf0bE%3D&reserved=0<https://lists.sourceforge.net/lists/listinfo/xcat-user> |
From: David M. <dma...@ee...> - 2023-11-10 19:17:15
|
Ah, silly me: bad copy-paste. That command gives:

  File "/opt/confluent/bin/confluent_selfcheck", line 241
    for rsp in sess.read(f'/nodes/{args.node}/attributes/all’):
    ^
SyntaxError: invalid syntax

Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help.

I then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE boot parameters and the system launched into the Ubuntu 20.04 installer. The console is prompting me with a bunch of questions.

So I think I’ve finally managed to muddle through this part of the documentation:

https://hpc.lenovo.com/users/documentation/confluentosdeploy.html

Is there any documentation about automating Ubuntu installs with Confluent? Does Confluent handle any cloud-init stuff (which was run during the boot process), or is there some other method to send things like partitioning and package information to Ubuntu?

> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjo...@le...> wrote:
>
> The attribute name is plural, with s at the end. deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>
> confluent_selfcheck -n MYHOST
>
> Say anything interesting?
>
>> From: David Magda <dma...@ee...>
>> Sent: Friday, November 10, 2023 10:50 AM
>> To: xCAT Users Mailing list <xca...@li...>
>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>
>> Looking in that file there was:
>>
>> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure
>> mode, but insecure mode is disabled. Set the attribute
>> `deployment.useinsecureprotocols` to `firmware` or `always` to enable
>> support, or use UEFI HTTP boot with HTTPS." }
>>
>> Trying to tweak that attribute, I got:
>>
>> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>> Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid
>>
>> I tried using nodegroupattrib as well on a group that the host was in, and got:
>>
>> Error: Bad Request - deployment.useinsecureprotocol attribute is invalid
>>
>> I then edited the reply_dhcp4() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to change the default check to remove the "return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would continue going. The log message still appears (so I know the code is getting there), but the events file now has:
>>
>> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}
>>
>> And the system is still booting xCAT (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>>
>> Not running the dhcpd at all simply has the system time out on its PXE attempt. I told Confluent about the particular IP address the system should have:
>>
>> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>>
>> And that did not help.
>>
>> Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc).
>>
>> Should I edit my dhcpd.conf and rip out things like:
>>
>> […]
>> if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>> always-broadcast on;
>> filename = "…"
>> […]
>>
>> to try to see if that will get things going with Confluent? Or are things expected to work with all of that?
>>
>> > On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote:
>> >
>> > tail /var/log/confluent/events for a hint on why it might be ignoring the request.
>> >
>> >> From: David Magda <dm...@ee...>
>> >> Sent: Wednesday, November 8, 2023 2:46 PM
>> >> To: xCAT Users Mailing list <xca...@li...>
>> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>> >>
>> >> I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded.
>> >>
>> >> The dhcpd.conf did have "gpxe.no-pxedhcp", but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced.
>> >>
>> >> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n
>> >> ubuntu-20.04.6-x86_64-default” earlier.
>> >>
>> >> $ nodeattrib MYHOST
>> >> MYHOST: console.method: ipmi
>> >> MYHOST: deployment.apiarmed: once
>> >> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>> >> MYHOST: deployment.profile:
>> >> MYHOST: deployment.stagedprofile:
>> >> MYHOST: deployment.state:
>> >> MYHOST: deployment.state_detail:
>> >> MYHOST: groups: prox,ipmi,all,everything
>> >> MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>> >> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>> >> MYHOST: net.ipv4_method: dhcp
>> >> MYHOST: secret.hardwaremanagementpassword: ********
>> >> MYHOST: secret.hardwaremanagementuser: ********
>> >>
>> >> > On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>> >> >
>> >> > If dhcpd.conf is set to not send any 'filename', it's best. If you don't need a dhcp server, then you can turn it off. There's also
>> >> >
>> >> > If you have a dhcp server with a dynamic range on it, then:
>> >> > nodeattrib net.ipv4_method=firmwaredhcp
>> >> >
>> >> > If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node.
>> >> >
>> >> > If you have no dhcp server, then it should just do the right thing directly.
>> >> >
>> >> > If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however you own the IPAM sort of responsibility totally.
>> >> >
>> >> > If your dhcp has:
>> >> > option gpxe.no-pxedhcp 1;
>> >> > Please remove that to let confluent merge an offer with an uncoordinated dhcp server.
>> >> >
>> >> > I need to do a deeper write-up on the detail about dhcp interaction, how it is now optional, and how it can coexist with an unmanaged dhcp server and free the dhcp server from 'filename'
>> >> >
>> >> >> From: David Magda
>> >> >> Sent: Tuesday, November 7, 2023 9:27 AM
>> >> >> To: xCAT Users Mailing list
>> >> >> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>> >> >>
>> >> >> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/.
>> >> >>
>> >> >> However, the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>> >> >>
>> >> >> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily until I get an understanding of Confluent so I can revert to xCAT if need-be.)
>> >> >>
>> >> >>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote:
>> >> >>>
>> >> >>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty.
>> >> >>>
>> >> >>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT.
>> >> >>>
>> >> >>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should document formally how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node...
>> >> >>>
>> >> >>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example...
>> >> >>>> From: David Magda
>> >> >>>> Sent: Thursday, October 26, 2023 11:22 AM
>> >> >>>> To: xCAT Users Mailing list
>> >> >>>> Subject: [External] Re: [xcat-user] xCAT-Confluent
>> >> >>>>
>> >> >>>> Yes, there was perhaps auto-completion with regards to Confluent/Confluence.
>> >> >>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there a document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at:
>> >> >>>> https://hpc.lenovo.com/users/
>> >> >>>> https://hpc.lenovo.com/users/documentation/
>> >> >>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused).
>> >> >> […]
>> |
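One possible reading of the SyntaxError reported at the top of this message, offered as an assumption rather than a diagnosis: f-strings only parse on Python 3.6 and later, so a script containing one fails at load time under an older interpreter (for example the Python 2.7 that RHEL 7 ships as its default). The sketch below is not Confluent code; `attributes_path` and `supports_fstrings` are hypothetical names, used only to show the same path built with `str.format`, which parses on old and new interpreters alike.

```python
import sys


def attributes_path(node):
    """Build the path from the traceback without using f-string syntax."""
    return '/nodes/{0}/attributes/all'.format(node)


def supports_fstrings(version_info=sys.version_info):
    """Return True when the given interpreter version can parse f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == '__main__':
    print(attributes_path('MYHOST'))
    print(supports_fstrings())
```

Checking which interpreter the script's first line points at would confirm or rule out this guess.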
From: Jarrod J. <jjo...@le...> - 2023-11-10 16:02:53
|
Oh, and a quick tip about attribute names: try using tab completion to save typing and to help know what the attribute names can be. It's not perfect (it doesn't handle attributes with 'middle names', like you can have with the net. and power. categories), but it will generally help. |
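As a rough illustration of catching a misspelled attribute name before it is sent (a sketch only, and not how Confluent's tab completion works): the list below holds just the attribute names that appear in this thread, and `suggest_attribute` is a hypothetical helper built on Python's standard `difflib`.

```python
import difflib

# Attribute names copied from the nodeattrib output and log message in this
# thread; a real installation has many more.
KNOWN_ATTRIBUTES = [
    'console.method',
    'deployment.apiarmed',
    'deployment.pendingprofile',
    'deployment.profile',
    'deployment.stagedprofile',
    'deployment.state',
    'deployment.state_detail',
    'deployment.useinsecureprotocols',
    'hardwaremanagement.manager',
    'net.hwaddr',
    'net.ipv4_address',
    'net.ipv4_method',
]


def suggest_attribute(name, known=KNOWN_ATTRIBUTES):
    """Return name if it is known, else the closest known name, else None."""
    if name in known:
        return name
    matches = difflib.get_close_matches(name, known, n=1)
    return matches[0] if matches else None


if __name__ == '__main__':
    print(suggest_attribute('deployment.useinsecureprotocol'))
```

With the singular spelling from the failed `nodeattrib` call, the closest known name is the plural one.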
From: Jarrod J. <jjo...@le...> - 2023-11-10 16:01:35
|
The attribute name is plural, with s at the end. deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.

confluent_selfcheck -n MYHOST

Say anything interesting? |
From: David M. <dma...@ee...> - 2023-11-10 15:51:22
|
Looking in that file there was:

Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure mode, but insecure mode is disabled. Set the attribute `deployment.useinsecureprotocols` to `firmware` or `always` to enable support, or use UEFI HTTP boot with HTTPS." }

Trying to tweak that attribute, I got:

$ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid

I tried using nodegroupattrib as well on a group that the host was in, and got:

Error: Bad Request - deployment.useinsecureprotocol attribute is invalid

I then edited the reply_dhcp4() function in /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to remove the "return;" in the "if insecuremode == 'never' and not httpboot:" stanza so that it would continue going. The log message still appears (so I know the code is getting there), but the events file now has:

Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}

And the system is still booting xCAT (I have commented out "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted). Not running dhcpd at all simply has the system time out on its PXE attempt.

I told Confluent about the particular IP address the system should have:

$ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21

And that did not help. Per "lsof -i udp", Confluent is listening on (amongst many other ports) *:bootps, *:dhcpv6-server, *:pxe (etc).

Should I edit my dhcpd.conf and rip out things like:

[…]
if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
always-broadcast on;
filename = "http://172.17.15.254:80/tftpboot/xcat/xnba/nets/172.17.8.0_21";
[…]

to try to see if that will get things going with Confluent? Or are things expected to work with all of that?
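For anyone following along with the events file, here is a minimal sketch of pulling out the entries for one node and listing the attribute names that a message mentions in backticks. It assumes every line has the shape of the two lines quoted in this message (a "Mon DD HH:MM:SS" prefix followed by one JSON object with an "info" key); the function names and SAMPLE_LINES are made up for illustration and are not part of Confluent. Run against the first quoted line, it makes visible that the log spells the attribute `deployment.useinsecureprotocols`, with a trailing "s", while the nodeattrib command in this message used `deployment.useinsecureprotocol`. Whether that difference explains the "invalid" error is not confirmed anywhere in this thread.

```python
import json
import re

# Assumed line shape, taken from the two log lines quoted in this message:
# "<Mon> <DD> <HH:MM:SS> <JSON object>". Other shapes are skipped.
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) (\{.*\})\s*$")

# The two log lines quoted in this message, used as sample input.
SAMPLE_LINES = [
    'Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure mode, '
    'but insecure mode is disabled. Set the attribute `deployment.useinsecureprotocols` '
    'to `firmware` or `always` to enable support, or use UEFI HTTP boot with HTTPS." }',
    'Nov 09 09:18:34 {"info": "Offering PXE boot without address, '
    'served from 172.17.15.254 to MYHOST"}',
]


def parse_event(line):
    """Split one line into (timestamp, event dict); return None if it does not fit."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp, payload = match.groups()
    try:
        return timestamp, json.loads(payload)
    except json.JSONDecodeError:
        return None


def quoted_attributes(message):
    """Return backtick-quoted tokens that look like dotted attribute names."""
    return [name for name in re.findall(r"`([^`]+)`", message) if "." in name]


def events_for_node(lines, node):
    """Yield (timestamp, info text) for every event whose info text mentions the node."""
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        timestamp, event = parsed
        info = event.get("info", "")
        if node in info:
            yield timestamp, info


if __name__ == "__main__":
    for timestamp, info in events_for_node(SAMPLE_LINES, "MYHOST"):
        print(timestamp, quoted_attributes(info))
```

To use it on a live system, pass an open file object for /var/log/confluent/events as the first argument to events_for_node in place of SAMPLE_LINES, since the function only iterates over lines.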
> On Nov 8, 2023, at 16:19, Jarrod Johnson <jjo...@le...> wrote:
>
> tail /var/log/confluent/events for a hint on why it might be ignoring the request.
|
From: Tomer S. <tom...@ma...> - 2023-11-10 09:54:18
|
It will be greatly

Regards,
Tomer Shachaf | Integration and Infrastructure Engineer | Integration and Infrastructure Division | Matrix | Mobile 054-2686841 | tom...@ma... | www.matrix.co.il

On 24 Oct 2023, at 21:16, Ryan Novosielski via xCAT-user <xca...@li...> wrote:

Bear in mind that this is called “Confluent” (pronounced Con-FLU-ent), and not Confluence, which is a part of the Jira suite of tools (nor Apache Confluent; this namespace seems a little crowded).

--
#BlackLivesMatter
____
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - nov...@ru...
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB A555B, Newark
`'

On Oct 24, 2023, at 12:07, David Magda <dma...@ee...> wrote:

Where is the ‘community’ for Confluence gathering? Any mailing lists? Where does the code live? Bug reports and patches / pull requests?

On Sep 21, 2023, at 17:13, Jarrod Johnson <jjo...@le...> wrote:

Yes, we are committed to it being open source ongoing. I won't rule out proprietary things built on top of it, but at least in all the ways that exist today and the CLI I don't imagine any changes.

Currently, the GUI is not technically open sourced (though everyone gets the source code, but no redistribution). I do hope to at least open source our upcoming browser library that makes writing a webui with all the async behaviors a bit more trivial (which is what the next WebUI will be written with).

[…]

From: Don Avart <da...@re...>
Sent: Thursday, September 21, 2023 1:05 PM
To: xCAT Users Mailing list <xca...@li...>
Subject: Re: [xcat-user] [External] Re: Announcement: xCAT Project End-Of-Life planned for December 1, 2023

I couldn’t agree more with Brian’s sentiment about xCAT. We, RedLine, have been xCAT users, integrators and occasional contributors since the end of IBM’s CSM. We’ve deployed it on numerous vendor platforms and it just works. As a small business in the greater HPC marketplace we have many customers that rely on xCAT and we will need to work with them to identify an alternative should xCAT be discontinued.

I’ve reached out to the IBM team as well as Jarrod from Lenovo and others in the community. I am very interested in putting together a plan that would continue to provide an open source option that is platform agnostic.

With respect to Jarrod’s comments about using Confluent as a starting point for future development of xCAT, there are a number of considerations. Here are a few.

• Is Lenovo committed to keeping Confluent open-source?
• Is Lenovo open to integration of features/capabilities of non-Lenovo vendors?
• Governance. Who controls changes to the code base and future development directions?
• Does xCAT remain its own project and share code with Confluent, or do they become one project?

There are definitely other considerations, but I just wanted to get a few thoughts out there.

My opinion is that Jarrod’s idea is one that should be given significant thought and debate. xCAT2 was, according to everything I’ve read, a complete rewrite of the original xCAT. Therefore, adopting Confluent as the next version is not a bridge too far, in my opinion. I also can’t speak to the original intentions of IBM when xCAT2 was released with respect to multi-vendor support. I can say that as a member of the xCAT community I would like to see the project continue as open source and vendor agnostic.

I would really like to hear from anyone that is interested in keeping the project alive. I’m hopeful that we can reach a solution as a community.

Best Regards,
----
Don Avart
CTO
RedLine Performance Solutions, LLC
(703) 634-5686
da...@re...

On Sep 21, 2023, at 10:59 AM, Jarrod Johnson <jjo...@le...> wrote:

There are at least some options I've heard discussed, if anyone has feedback:

- Someone to take over the xCAT 2.x codebase as-is, adding some missing stuff like Ubuntu 20+ support, RHEL9, etc. I don't know that anyone has volunteered to go all in on all that exactly yet.
- Try to establish a community around confluent (potentially as 'xCAT 3'). This may suggest some sort of rebranding and/or governance changes, but basically starting from confluent instead of xCAT 2 for the xCAT-like experience. Not precisely xCAT-like but was designed "by one of the designers of xCAT 2" with a lot of sensibilities preserved. Given that there's not much in the way of 'backwards compatibility', I'm cautious about the 'xCAT 3' branding, and while I would be a consistent contributor across xCAT 2.0 through 2.8 and then confluent, it would technically be a change from IBM to Lenovo contributions, which I could see being a challenge.
- The current default trajectory is an archived project and people having to decide for themselves what to do next (the only 'all-in-one' options that I know to be cross-platform are Bright and Confluent; if just OS deployment, then I commonly see Foreman used for diskful, with Warewulf being an option for mostly diskless scenarios).

Obviously, I like Confluent best, but of course I would.

From: Brian Joiner <mar...@gm...>
Sent: Thursday, September 21, 2023 9:57 AM
To: xca...@li... <xca...@li...>
Subject: [External] Re: [xcat-user] Announcement: xCAT Project End-Of-Life planned for December 1, 2023

This is the saddest thing I've heard in some time. I've had the chance to support customers with Bright, HP cluster manager, and xCAT. xCAT was by far the best.

Thank you for all your work, I hope that a transition can happen!

Thanks,
Brian J

On 9/1/23 11:49 AM, Nathan A Besaw via xCAT-user wrote:

Mark Gurevich, Peter Wong, and I have been the primary xCAT maintainers for the past few years. This year, we have moved on to new roles unrelated to xCAT and can no longer continue to support the project. As a result, we plan to archive the project on December 1, 2023. xCAT 2.16.5, released on March 7, 2023, is our final planned release.

We would consider transitioning responsibility for the project to a new group of maintainers if members of the xCAT community can develop a viable proposal for future maintenance.

Thank you all for your support of the project over the past 20+ years.

_______________________________________________
xCAT-user mailing list
xCA...@li...
https://lists.sourceforge.net/lists/listinfo/xcat-user
|
From: Jarrod J. <jjo...@le...> - 2023-11-08 21:19:24
|
tail /var/log/confluent/events for a hint on why it might be ignoring the request.

________________________________
From: David Magda <dm...@ee...>
Sent: Wednesday, November 8, 2023 2:46 PM
To: xCAT Users Mailing list <xca...@li...>
Subject: Re: [xcat-user] [External] Re: xCAT-Confluent

I did a “service dhcpd stop” and a “service confluent restart”, and the SuperMicro did not receive any reply to the DHCP/PXE packets it was sending out. I then did a “service dhcpd start” and the “xcat/genesis” file was loaded.

The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and restarting did not change any behaviour. I noticed that “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being referenced.

Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not sure why it is not answering. I had run a “nodedeploy MYHOST -n ubuntu-20.04.6-x86_64-default” earlier.

$ nodeattrib MYHOST
MYHOST: console.method: ipmi
MYHOST: deployment.apiarmed: once
MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
MYHOST: deployment.profile:
MYHOST: deployment.stagedprofile:
MYHOST: deployment.state:
MYHOST: deployment.state_detail:
MYHOST: groups: prox,ipmi,all,everything
MYHOST: hardwaremanagement.manager: MYHOST-ipmi
MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
MYHOST: net.ipv4_method: dhcp
MYHOST: secret.hardwaremanagementpassword: ********
MYHOST: secret.hardwaremanagementuser: ********

> On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>
> If dhcpd.conf is set to not send any 'filename', it's best. If you don't need a dhcp server, then you can turn it off. There's also
>
> If you have a dhcp server with a dynamic range on it, then:
> nodeattrib net.ipv4_method=firmwaredhcp
>
> If you have a dhcp server with static reservations, you could either have dhcp continue, or disallow dhcp for the confluent node.
>
> If you have no dhcp server, then it should just do the right thing directly.
>
> If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however you own the IPAM sort of responsibility totally.
>
> If your dhcp has:
> option gpxe.no-pxedhcp 1;
> Please remove that to let confluent merge an offer with an uncoordinated dhcp server.
>
> I need to do a deeper write-up on the detail about dhcp interaction, how it is now optional, and how it can coexist with an unmanaged dhcp server and free the dhcp server from 'filename'
>
>> From: David Magda
>> Sent: Tuesday, November 7, 2023 9:27 AM
>> To: xCAT Users Mailing list
>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>
>> After running the first few commands, I have /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, distribution}/ubuntu* present, along with genesis-x86_64/.
>>
>> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>>
>> Do I need to tweak the dhcpd.conf just for the test system I’m playing with, or should a completely new dhcpd.conf file be put in place for using Confluent? (Moving the current one out of the way, perhaps temporarily until I get an understanding of Confluent so I can revert to xCAT if need be.)
>>
>>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote:
>>>
>>> I will say that EL7 hasn't been tested and thus we haven't pushed updates since 3.8.0, but 3.8.0 should be plenty.
>>>
>>> The confluent you have going is already enough to start examining OS deployment profiles. If you would like to, you can use commands like osdeploy initialize and osdeploy import and even imgutil build, and it won't mess with xCAT.
>>>
>>> When you get to nodedeploy, that is the time when you have to start planning around potential disruption as xCAT and confluent might fight over who gets to deploy a system, and that can be confusing. We should document formally how to mask a node from xCAT ('!*NOIP*' in mac table) to let one kick the tires with a node...
>>>
>>> I can help look at a few people kicking tires, certainly seems worthy of documentation or video example...
>>>> From: David Magda
>>>> Sent: Thursday, October 26, 2023 11:22 AM
>>>> To: xCAT Users Mailing list
>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent
>>>>
>>>> Yes, there was perhaps auto-completion with regard to Confluent/Confluence.
>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) installation on RHEL 7 that I inherited; if one wants to fully move from xCAT to Confluent, is there a document on how to ‘extract’ oneself from xCAT? I don’t see anything that jumps out at:
>>>> https://hpc.lenovo.com/users/
>>>> https://hpc.lenovo.com/users/documentation/
>>>> Should I simply abandon the previous installation and do a fresh install? While there is some documentation, the system leans towards being heavily vendor-used so people completely new to it have a steep learning curve (xCAT is/was also challenging to get into since it was fairly vendor-focused).
>> […]
>> _______________________________________________
xCAT-user mailing list
xCA...@li...
https://lists.sourceforge.net/lists/listinfo/xcat-user
|