From: Jarrod J. <jjo...@le...> - 2024-01-25 12:52:51
Anything in /var/log/confluent/stderr or /var/log/confluent/trace?

Also, I would be tempted to see whether 'confluent_selfcheck' has any suggestions.

You can also ssh into the node during that phase to confirm what it is doing while it is seemingly hung, e.g. by looking at the output of 'ps axf'.

________________________________
From: David Magda <dma...@ee...>
Sent: Wednesday, January 24, 2024 9:37 PM
To: xCA...@li... <xCA...@li...>
Subject: [External] [xcat-user] Ansible and Confluent

Hello,

I'm trying to get Ansible working with Confluent 3.8.0. (Using an older version due to legacy OS reasons.)

In /var/lib/confluent/public/os/ I created a new profile called ubuntu-22.04.3-x86_64-test1/, and this seems to work just fine: I took the provided "autoinstall/user-data" file, added some partition stanzas, some packages, etc.

Once I sorted out a 'basic' automated Ubuntu install I tried creating an "ansible/post.d/01-packages.yaml" file within the profile directory with the following contents:

"""
- name: install chrony
  apt:
    pkg:
      - chrony
"""

The Ubuntu (subiquity) installer seems to 'hang' at:

"""
start: subiquity/Late/run/command_1: /custom-installation/post.sh
"""

which probably corresponds to this part of the "user-data" file:

"""
late-commands:
  - chroot /target apt-get -y -q purge snapd modemmanager
  - /custom-installation/post.sh
"""

When the 'hang' occurs the following starts filling up the "/var/log/httpd/ssl_access_log" file of the Confluent/xcat server:

"""
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
"""

When I force a restart of the system/VM, it can boot off the disk, and goes through the regular start-up process, including a bunch of cloud-init stuff. Though after it runs "/etc/confluent/firstboot.sh", the "ssl_access_log" file once again starts filling with the "remoteconfig/status" stuff per above.

Renaming "ansible/" to "ansible_off/" seems to make the problem go away.

Similar behaviour with Ubuntu 20.04.

I'm wondering what's going on with the 'hang' when "post.sh" is executed, and the flooding after "firstboot.sh".

Regards,
David

_______________________________________________
xCAT-user mailing list
xCA...@li...
https://lists.sourceforge.net/lists/listinfo/xcat-user
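
To put a number on the "remoteconfig/status" polling described above, the access log can be tallied per client and per second. The following is only a minimal sketch in Python: the regular expression assumes the log layout of the lines quoted in the message (not the actual Apache LogFormat on the server), and the names LINE_RE, STATUS_PATH and count_status_polls are made up for illustration and are not part of Confluent.

```python
import re
from collections import Counter

# Field layout assumed from the sample lines quoted in the message:
# client, two placeholder fields, [timestamp], "METHOD path protocol", status, size
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Request path seen repeating in the quoted log excerpt.
STATUS_PATH = "/confluent-api/self/remoteconfig/status"


def count_status_polls(lines):
    """Count remoteconfig/status requests per (client, timestamp) pair.

    Lines that do not match the assumed layout, or that request a
    different path, are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("path") == STATUS_PATH:
            counts[(match.group("client"), match.group("ts"))] += 1
    return counts
```

Passing an open file handle for "/var/log/httpd/ssl_access_log" to count_status_polls() and printing the result of .most_common(10) would show which node is polling and how many requests land within a single second, which may help tell a tight retry loop apart from ordinary status checks.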