#61 Hung "ls -l" via vglrun on CentOS/SL 6.2 x86_64

closed-wont-fix
DRC
VirtualGL (44)
5
2015-01-29
2012-10-12
Anonymous
No

Running a command like:
ls -l /lib64/libc.so.6
via vglrun causes a hang on CentOS/SL 6.2 x86_64 (with latest updates).
Reproducible: 100%
Environment:
CentOS/Scientific Linux 6.2 x86_64 (base release and with latest updates)
SELinux disabled
vglusers restrictions not enabled
VirtualGL v.2.3.2 x86_64
Nvidia Drivers 295.20 x86_64
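
A minimal sketch for scripting the reproduction so that a hang does not block the terminal. It assumes GNU coreutils `timeout` is available; the helper name `hangs_within` and the 10-second limit in the usage note are illustrative and not part of VirtualGL.

```shell
# Hypothetical helper: return 0 if the command is still running after the
# given number of seconds (i.e. it appears to hang), non-zero otherwise.
hangs_within() {
    secs=$1
    shift
    # GNU timeout exits with status 124 when it had to stop the command.
    timeout "$secs" "$@" >/dev/null 2>&1
    [ $? -eq 124 ]
}
```

Possible usage on an affected machine: `hangs_within 10 vglrun ls -l /lib64/libc.so.6 && echo "hang reproduced"`.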

Discussion

  • DRC

    DRC - 2012-10-12

    Why are you trying to use 'ls -l' with vglrun? vglrun should only be used with 3D applications. If your application is launched via a script that invokes 'ls -l', then you will need to edit the script per Chapter 13 of the VirtualGL User's Guide.

    Preloading VirtualGL into an application causes libGL to be dynamically loaded at a much earlier point in the application's execution (or, in the case of 'ls', libGL is dynamically loaded into an application that normally wouldn't load it at all.) It should be possible to modify VirtualGL to load libGL later in the execution, but such would entail some extensive modifications to VirtualGL, and it hasn't yet presented enough of an issue to justify the time/money required to make said modifications.

     
  • Anonymous

    Anonymous - 2012-10-15

    I run ANSYS CFX with vglrun. On CentOS/SL/RH 5 x86_64 there are no problems with launching via vglrun. The problem appears only on new systems running CentOS/SL 6.2 x86_64. The hang was traced to an 'ls -l' call in the CFX launcher (written in bash + perl).

     
  • Anonymous

    Anonymous - 2012-10-15

    === on Scientific Linux v.5.8 (works) ===
    vglrun +tr +v ls -l /lib64/libc.so.6
    [VGL] dlopen (filename=libc.so.6 flag=1 retval=0x2b0c1d0919a8)
    [VGL] dlopen (filename=NULL flag=257 retval=0x2b0c1cd42000)
    [VGL] dlopen (filename=libselinux.so.1 flag=1 retval=0x2b0c1d0914e0)
    [VGL] dlopen (filename=NULL flag=1 retval=0x2b0c1cd42000)
    [VGL] dlopen (filename=/lib64/libc.so.6 flag=1 retval=0x2b0c1d0919a8)
    [VGL] dlopen (filename=/lib64/libdl.so.2 flag=1[VGL] NOTICE: Replacing dlopen("/lib64/libdl.so.2") with dlopen("libdlfaker.so")
    retval=0x2b0c1cd431a0)
    [VGL] dlopen (filename=/lib64/libpthread.so.0 flag=1 retval=0x2b0c1d092838)
    [VGL] dlopen (filename=libnvidia-tls.so.295.20 flag=1 retval=0x2b0c1d3abe90)
    [VGL] dlopen (filename=NULL flag=1 retval=0x2b0c1cd42000)
    [VGL] dlopen (filename=libc.so.6 flag=1 retval=0x2b0c1d0919a8)
    lrwxrwxrwx 1 root root 11 Aug 21 10:15 /lib64/libc.so.6 -> libc-2.5.so

    === on Scientific Linux v.6.2 (hangs) ===
    vglrun +tr +v ls -l /lib64/libc.so.6
    [VGL] dlopen (filename=libc.so.6 flag=1 retval=0x7f87e46bf4c8)
    [VGL] dlopen (filename=NULL flag=257 retval=0x3fd9821188)
    [VGL] dlopen (filename=libselinux.so.1 flag=1 retval=0x7f87e46c0018)
    [VGL] dlopen (filename=NULL flag=1 retval=0x3fd9821188)
    [VGL] dlopen (filename=/lib64/libc.so.6 flag=1 retval=0x7f87e46bf4c8)
    [VGL] dlopen (filename=/lib64/libdl.so.2 flag=1[VGL] NOTICE: Replacing dlopen("/lib64/libdl.so.2") with dlopen("libdlfaker.so")
    retval=0x7f87e4a02660)
    [VGL] dlopen (filename=/lib64/libpthread.so.0 flag=1 retval=0x7f87e46be4c8)
    [VGL] dlopen (filename=libnvidia-tls.so.295.20 flag=1 retval=0x7f87e46bc5a8)
    [VGL] dlopen (filename=NULL flag=1 retval=0x3fd9821188)
    [VGL] dlopen (filename=libc.so.6 flag=1 retval=0x7f87e46bf4c8)
    <<<hung>>>

     
  • Anonymous

    Anonymous - 2012-10-15

    OK. It is not a very good solution (there is no problem on CentOS/SL/RH v5), but...
    Editing the problematic part of the script as described in Chapter 13 of the VirtualGL User's Guide (save/unset/restore LD_PRELOAD) solves the problem on SL 6.2.
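
The save/unset/restore pattern can be sketched as a small shell function. This is an illustration only: the name `run_without_preload` is hypothetical, and the actual CFX launcher code is not shown in this thread.

```shell
# Hypothetical helper: run one command with LD_PRELOAD cleared, then put
# back whatever value vglrun had set, so later 3D programs are still faked.
run_without_preload() {
    saved_preload="${LD_PRELOAD-}"   # save (empty string if it was unset)
    unset LD_PRELOAD                 # the helper command runs without the faker
    "$@"
    status=$?
    if [ -n "$saved_preload" ]; then # restore only if there was a value
        LD_PRELOAD="$saved_preload"
        export LD_PRELOAD
    fi
    return "$status"
}
```

Inside a launcher script, the problematic call would then become something like `run_without_preload ls -l /lib64/libc.so.6`.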

     
  • Anonymous

    Anonymous - 2012-10-15
    • status: open --> closed
     
  • DRC

    DRC - 2012-10-15

    OK, thanks for the info. We'll consider it worked around for now, but I will likely need to consider modifying VGL at some point in the future to delay the loading of libGL, since the nVidia version seems to occasionally cause problems like this when it is loaded early in the execution of some applications.

     
  • DRC

    DRC - 2012-10-31
    • status: closed --> closed-wont-fix
     
  • DRC

    DRC - 2012-10-31

    Actually, further investigation reveals that delaying the loading of libGL won't work, because librrfaker has to act as a substitute libGL in some cases (specifically, with programs that use dlopen() to access libGL.) Thus, librrfaker.so has to link directly with libGL, and there doesn't seem to be any way around the specific issue described in this bug. I will document the workaround.

     
  • DRC

    DRC - 2012-10-31

    Can you provide the path of the script that you edited and specifics regarding where you added the save/restore LD_PRELOAD code? This is so I can add an appropriate entry in the Application Recipes section of the User's Guide.

     
  • DRC

    DRC - 2013-02-15

    More research by myself and others in the VirtualGL community has revealed that this issue appears to be due to SELinux file attributes that exist on certain system DSOs that /bin/ls loads. Why those attributes cause issues with LD_PRELOAD is unknown, but the attributes are not removed when disabling SELinux, so they have to be removed manually. One user reported that simply removing the SELinux attributes on libc was sufficient:

    attr -S -r selinux /usr/lib64/libc.so
    attr -S -r selinux /lib64/libc.so.6

    That unfortunately didn't work for me. In my case, I had to systematically remove the attributes from all files on the system:

    cd /
    sudo find . -mount | while IFS= read -r file; do sudo attr -S -r selinux "$file"; done

    (NOTE: these attributes can be restored simply by re-enabling SELinux and rebooting.)

    After that, 'vglrun ls -l' worked.
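
Before removing anything, it may help to see which of the DSOs that /bin/ls loads still carry the attribute. The following is a read-only sketch under stated assumptions: the function names are hypothetical, it relies on `ldd` and on the same `attr` tool used above (with `-g` to read instead of `-r` to remove), and "unlabeled" simply means the read failed, which can also happen if `attr` is missing or access is denied.

```shell
# Print the absolute paths found in ldd-style output read from stdin.
dso_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }'
}

# Hypothetical report: for each DSO a binary loads (default /bin/ls), say
# whether the "selinux" attribute in the security namespace can be read.
report_selinux_attrs() {
    ldd "${1:-/bin/ls}" | dso_paths | while IFS= read -r dso; do
        if attr -S -q -g selinux "$dso" >/dev/null 2>&1; then
            echo "labeled:   $dso"
        else
            echo "unlabeled: $dso"
        fi
    done
}
```

Calling `report_selinux_attrs` with no argument inspects /bin/ls; another binary can be passed as the first argument.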

    Hoping someone in the community has further insight. It would be nice to get this working without disabling SELinux.

     
  • Anonymous

    Anonymous - 2013-03-07

    Yes!!! Thanks!!! Removing the SELinux attributes on libc solved the problem in my case! :)
    attr -S -r selinux /usr/lib64/libc.so
    attr -S -r selinux /lib64/libc.so.6

     
  • Kyle Brenneman

    Kyle Brenneman - 2013-05-22

    There's another hang that can happen with some older versions of libselinux, regardless of any SELinux attributes. This hang will happen if you run an executable that links against libselinux (which ls does), and if librrfaker links against a library that loads libselinux using dlopen (which Nvidia's libGL does). During process termination, it hangs when a destructor function in libselinux tries to access a __thread variable.

    I've only tested libselinux 2.0.91, but I think it's fixed as far back as 2.0.85.

     
  • DRC

    DRC - 2015-01-29

    Reports are that upgrading to nVidia driver 340.65 fixes this issue. I have been unable to verify that, because my only installations of RHEL 6 are on a laptop with an AMD GPU and on an old machine that can't run any nVidia driver later than 304.

     
