Thread: [clisp-list] FFI feature proposal: versioned symbols.

[clisp-list] FFI feature proposal: versioned symbols.

From: Kaz K. <kky...@gm...> - 2006-11-29 08:09:46

In glibc, there is a dlvsym() function for retrieving versioned symbols.

This is very useful if you are writing an application targeting the
ABI of a shared library, to ensure that old versions of your program will
use the right ABI even in new versions of that library (as long as
they don't drop support for that ABI).

Concrete example. Right now, I'm using the CLISP FFI to call
"__xstat64" in "libc.so.6". But "__xstat64" refers to the latest and
greatest ABI for that symbol which a given installation of "libc.so.6"
exports. That latest and greatest function could, for instance, think
that the struct stat object I'm giving it is bigger than it really is,
and write beyond its end.

What I want is to access the "GLIBC_2.2" version of that symbol. This
might appear in the "nm" listing of the library as
"__xstat64@@GLIBC_2.2".

You can't get to this symbol with the dlsym API. And the @@ will
change to @ anyway if GLIBC_2.2 is no longer the default version for
that symbol.

The way you ask for this symbol is dlvsym(handle, "__xstat64", "GLIBC_2.2").

And so, what if there was a (:SYMVER ...) option in the CLISP FFI
whereby you could specify the version, thereby causing it to use
dlvsym? (Rationale for the name: derived from the .symver GNU
assembler directive for defining versioned symbols).

(def-call-out __xstat64
  (:library "libc.so.6")
  (:symver "GLIBC_2.2")
  ... etc)
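At the C level, the proposed :SYMVER option would presumably come down to choosing between dlvsym() and dlsym(). The sketch below assumes glibc (dlvsym() is a GNU extension, so _GNU_SOURCE is required, and glibc older than 2.34 needs -ldl). The helper name ffi_lookup is hypothetical and not part of CLISP.

```c
#define _GNU_SOURCE            /* dlvsym() is a GNU extension */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, not part of CLISP: resolve NAME in the library
   HANDLE. With a VERSION, ask for exactly that version via dlvsym() and
   return NULL if the library does not export it; without a VERSION,
   behave like plain dlsym(). */
void *ffi_lookup(void *handle, const char *name, const char *version)
{
    if (version != NULL)
        return dlvsym(handle, name, version);
    return dlsym(handle, name);
}
```

For example, ffi_lookup(dlopen("libc.so.6", RTLD_LAZY), "__xstat64", "GLIBC_2.2") corresponds to the form above; version names differ between architectures, so a given name may simply not be found. This sketch deliberately has no fallback: a missing version yields NULL instead of silently binding to the newest ABI.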

Now you can be quite confident that even though you tested the code
with, say, glibc-2.3.4, it will still run if you upgrade to glibc 2.5.
That is to say, run as well as any of the compiled C programs linked
to the library.

Comments?

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Sam S. <sd...@gn...> - 2006-11-29 14:20:35

Kaz Kylheku wrote:
> In glibc, there is a dlvsym() function for retrieving versioned symbols.

how about other libc implementations? woe32?

> (def-call-out __xstat64
>   (:library "libc.so.6")
>   (:symver "GLIBC_2.2")
>   ... etc)

I would prefer (:library "libc.so.6" "GLIBC_2.2")
this way we will not have to check that :library is given for each 
:symver and also this will give a good symver default via default-library.

the only problem I see here is that we were thinking about switching to 
libltdl - does it support this versioning?

Sam.

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Bruno H. <br...@cl...> - 2006-11-29 15:15:39

Kaz Kylheku wrote:

> This is very useful if you are writing an application targeting the
> ABI of a shared library, to ensure that old versions of your program will
> use the right ABI even in new versions of that library (as long as
> they don't drop support for that ABI).
>
> Concrete example. Right now, I'm using the CLISP FFI to call
> "__xstat64" in "libc.so.6". But "__xstat64" refers to the latest and
> greatest ABI for that symbol which a given installation of "libc.so.6"
> exports. That latest and greatest function could, for instance, think
> that the struct stat object I'm giving it is bigger than it really is,
> and write beyond its end.

Yup, this is a problem, because we have hardcoded in
modules/bindings/glibc/linux.lisp definitions like this:

(def-c-struct stat
  (st_dev dev_t)
  (__pad1 ushort)
  (st_ino ino_t)
  (st_mode mode_t)
  (st_nlink nlink_t)
  (st_uid uid_t)
  (st_gid gid_t)
  (st_rdev dev_t)
  (__pad2 ushort)
  (st_size off_t)
  (st_blksize ulong)
  (st_blocks ulong)
  (st_atime time_t)
  (__unused1 ulong)
  (st_mtime time_t)
  (__unused2 ulong)
  (st_ctime time_t)
  (__unused3 ulong)
  (__unused4 ulong)
  (__unused5 ulong))

That is, we have extracted the 'struct stat' of a particular glibc version
and therefore also need the __xstat function of that particular ABI.

But I disagree with the approach. The C library (more precisely its
header files and the symbol versioning in libc.so) shields the usual C
programmer from such problems. I think clisp should get on the same level,
and use the solution that the C library maintainers propose. Otherwise we
have to track closely the glibc versions and update the def-c-struct
definitions manually in the future.

Concretely this means one of the two following approaches:
 a) Generate the (def-c-struct stat ...) form at compile time,
    for example by having a C program like this:

       #include <sys/types.h>
       #include <sys/stat.h>
       #include <stddef.h>
       #include <stdio.h>
       int main (void)
       {
         printf("%zu\n", offsetof (struct stat, st_dev));
         ...
         printf("%zu\n", offsetof (struct stat, st_ctime));
         printf("%zu\n", sizeof (((struct stat *) 0)->st_dev));
         ...
         printf("%zu\n", sizeof (((struct stat *) 0)->st_ctime));
         return 0;
       }

    and a bit of Lisp code that infers where the gaps between the
    fields are, based on these offset and size numbers.
 b) Add a new primitive (def-c-partial-struct ...) to the FFI that causes
    this gap computation to occur in the FFI, based on C snippets emitted
    by the .lisp -> .c compiler. (I call it "partial" because the field
    definitions are not complete. The C definition of the struct can have
    additional fields that are not visible from Lisp.)

Bruno

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Sam S. <sd...@gn...> - 2006-11-29 15:33:46

Bruno Haible wrote:
> Concretely this means one of the two following approaches:
>  a) Generate the (def-c-struct stat ...) form at compile time,
>     for example by having a C program like this:
> 
>        #include <sys/types.h>
>        #include <sys/stat.h>
>        #include <stddef.h>
>        #include <stdio.h>
>        int main (void)
>        {
>          printf("%zu\n", offsetof (struct stat, st_dev));
>          ...
>          printf("%zu\n", offsetof (struct stat, st_ctime));
>          printf("%zu\n", sizeof (((struct stat *) 0)->st_dev));
>          ...
>          printf("%zu\n", sizeof (((struct stat *) 0)->st_ctime));
>          return 0;
>        }
> 
>     and a bit of Lisp code that infers where the gaps between the
>     fields are, based on these offset and size numbers.
>  b) Add a new primitive (def-c-partial-struct ...) to the FFI that causes
>     this gap computation to occur in the FFI, based on C snippets emitted
>     by the .lisp -> .c compiler. (I call it "partial" because the field
>     definitions are not complete. The C definition of the struct can have
>     additional fields that are not visible from Lisp.)

yes, this is what we need.
we already have def-c-const that eliminates the need to copy the actual 
values of #defined symbols into lisp.
it would be nice to access C structures automatically too.

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Kaz K. <kky...@gm...> - 2006-11-29 15:34:50

On 11/29/06, Sam Steingold <sd...@gn...> wrote:
> Kaz Kylheku wrote:
> > In glibc, there is a dlvsym() function for retrieving versioned symbols.
>
> how about other libc implementations? woe32?

It's an ELF feature. Windows DLLs don't have versioned symbols.

Microsoft's idea of symbol versioning is:

- write a broken or inadequate function, and then when we figure out
what it should really do, add an Ex to its name and keep the old one.

- Put a size field (pardon me, ``dwSize'') into every structure whose
representation might change (by getting more fields at the end). The
library function checks the size field to determine which ABI is being
used. (This is actually not unreasonable; objects should describe
themselves, like they do in Lisp, right? But it has limitations: the
structure can't exist in two versions that have the same size.)

- The good old trick of leaving reserved fields in a structure, so that
today's client application allocates it just as big as tomorrow's
application will.

- Miscellaneous other hacks, like the Winsock initialization with a
version field before you can do any socket work.
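The size-field scheme from the second item can be sketched in C. The two structure layouts and get_info() are hypothetical, loosely modeled on the dwSize/cbSize convention rather than on any particular Windows API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "version 1" and "version 2" of a library structure.
   Version 2 only appends a field, so the common prefix stays compatible. */
struct info_v1 {
    size_t size;        /* caller sets this to sizeof its own struct */
    int    flags;
};

struct info_v2 {
    size_t size;
    int    flags;
    long   extra;       /* added in version 2 */
};

/* Hypothetical library function: reads the leading size field to decide
   which ABI the caller was compiled against, and never writes past the
   end of the caller's structure. Returns the version, or -1. */
int get_info(void *buf)
{
    size_t size;
    memcpy(&size, buf, sizeof size);

    if (size == sizeof(struct info_v2)) {
        struct info_v2 *p = buf;
        p->flags = 1;
        p->extra = 42;
        return 2;
    }
    if (size == sizeof(struct info_v1)) {
        struct info_v1 *p = buf;   /* old caller: fill the old fields only */
        p->flags = 1;
        return 1;
    }
    return -1;                     /* unknown size */
}
```

This also shows the limitation mentioned above: if two versions ever had the same sizeof, get_info() could not tell them apart.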

> > (def-call-out __xstat64
> >   (:library "libc.so.6")
> >   (:symver "GLIBC_2.2")
> >   ... etc)
>
> I would prefer (:library "libc.so.6" "GLIBC_2.2")

Or maybe have a property on it like (:library "libc.so.6" :symver "GLIBC_2.2").

> this way we will not have to check that :library is given for each
> :symver and also this will give a good symver default via default-library.
>
> the only problem I see here is that we were thinking about switching to
> libltdl - does it support this versioning?

[ ... google ...] Apparently libltdl's API has no dlvsym wrapper. But
that could be patched. The versioning is a feature of the underlying
ELF object format, plus the toolchain and the libdl.so API.

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Bruno H. <br...@cl...> - 2006-11-29 15:15:41

Sam Steingold asked:
> > In glibc, there is a dlvsym() function for retrieving versioned symbols.
> 
> how about other libc implementations? woe32?

Only glibc has dlvsym(). Woe32 uses #defines in the include files to address
this problem.

> this will give a good symver default via default-library.

How do you mean this? A library does not have a "default symbol version".
For example glibc-2.4 has symbols

  openat@@GLIBC_2.4
  open@@GLIBC_2.0

but no symbol

  open@@GLIBC_2.4

If you want the address of the open() function and you don't know its
specific version, you cannot use dlvsym(); you must use dlsym(handle,"open").

> the only problem I see here is that we were thinking about switching to 
> libltdl - does it support this versioning?

No. Probably because it's not as useful as Kaz thinks (see the other mail).

Bruno

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Sam S. <sd...@gn...> - 2006-11-29 15:28:59

Bruno Haible wrote:
> Sam Steingold asked:
>>> In glibc, there is a dlvsym() function for retrieving versioned symbols.
>> how about other libc implementations? woe32?
> 
> Only glibc has dlvsym(). Woe32 uses #defines in the include files to address
> this problem.
> 
>> this will give a good symver default via default-library.
> 
> How do you mean this? A library does not have a "default symbol version".
> For example glibc-2.4 has symbols
> 
>   openat@@GLIBC_2.4
>   open@@GLIBC_2.0
> 
> but no symbol
> 
>   open@@GLIBC_2.4
> 
> If you want the address of the open() function and you don't know its
> specific version, you cannot use dlvsym(); you must use dlsym(handle,"open").

sym = dlvsym(lib,"foo","ver");
if (sym == NULL) sym = dlsym(lib,"foo");
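Those two lines wrapped as a self-contained helper. A sketch assuming glibc (_GNU_SOURCE for dlvsym(); -ldl on glibc older than 2.34); the name lookup_with_fallback is made up for illustration.

```c
#define _GNU_SOURCE            /* dlvsym() is a GNU extension */
#include <dlfcn.h>
#include <stddef.h>

/* Prefer NAME at VERSION; if the library has no such versioned symbol
   (or no VERSION is given), fall back to what plain dlsym() returns. */
void *lookup_with_fallback(void *lib, const char *name, const char *version)
{
    void *sym = NULL;
    if (version != NULL)
        sym = dlvsym(lib, name, version);
    if (sym == NULL)
        sym = dlsym(lib, name);
    return sym;
}
```

The trade-off: when the requested version is missing, this quietly binds to whatever dlsym() yields, which is exactly the ABI drift the original proposal wants to rule out. A caller that cares about the ABI may prefer to treat a failed dlvsym() as an error.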

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Kaz K. <kky...@gm...> - 2006-11-29 17:54:04

On 11/29/06, Bruno Haible <br...@cl...> wrote:
> Sam Steingold asked:
> > > In glibc, there is a dlvsym() function for retrieving versioned symbols.
> >
> > how about other libc implementations? woe32?
>
> Only glibc has dlvsym(). Woe32 uses #defines in the include files to address
> this problem.

This problem is so nontrivial that Microsoft developed COM (out of
DCE) to address it. In the base non-COM libraries, versioning is done
with size fields and other hacks, like I mentioned in my other e-mail.
But the primary way by which versioning problems are addressed in the MS
environment is by writing and using COM DLLs. Interfaces are
versioned, and tied to a 128-bit GUID. The library location problem is
also solved by GUIDs.

To locate a library, a client passes a ``class ID'' (CLSID) GUID to an
API function. That API pulls out the path name from the registry and
loads the library. Next, the application asks for an interface, using
an ``interface ID''  (IID). If the object supports that interface, it
returns a pointer (which happens to point to data that is binary
compatible with the way the Microsoft compiler compiles C++ abstract
base classes).
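A toy C model of the interface-ID idea: each interface version is its own table of function pointers, and the object hands out a table only for IDs it recognizes. This is not the real COM API (no CoCreateInstance, registry, or reference counting); strings stand in for 128-bit IIDs, and all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface IDs; real COM IIDs are 128-bit GUIDs. */
#define IID_COUNTER_V1 "counter-v1"
#define IID_COUNTER_V2 "counter-v2"

/* Version 2 extends version 1 but gets a new ID instead of changing
   the layout that version-1 clients rely on. */
struct counter_v1 { int (*get)(void); };
struct counter_v2 { int (*get)(void); void (*reset)(void); };

static int value = 5;
static int  counter_get(void)   { return value; }
static void counter_reset(void) { value = 0; }

static const struct counter_v1 v1_table = { counter_get };
static const struct counter_v2 v2_table = { counter_get, counter_reset };

/* QueryInterface-style lookup: the table for IID, or NULL if this
   object does not implement that interface version. */
const void *query_interface(const char *iid)
{
    if (strcmp(iid, IID_COUNTER_V1) == 0)
        return &v1_table;
    if (strcmp(iid, IID_COUNTER_V2) == 0)
        return &v2_table;
    return NULL;
}
```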

> A library does not have a "default symbol version".
> For example glibc-2.4 has symbols
>
>  openat@@GLIBC_2.4
>  open@@GLIBC_2.0
>
> but no symbol
>
>  open@@GLIBC_2.4
>
> If you want the address of the open() function and you don't know its
> specific version, you cannot use dlvsym(); you must use dlsym(handle,"open").

Correct. But if you are writing FFI stuff, you know exactly what is
versioned and what isn't. You know that as of a particular version of
the library, openat was introduced as a versioned symbol. So you
target that symbol, using the version that you want, and declare that
your program needs that version of the library or later.

In the case of open, since it's not a versioned symbol, you just target open.

So what happens if glibc 2.6 comes out and needs to introduce a
versioned open? How will your program run?

What will happen is that they will make open into an alias:

  open -> old_open

So old clients will be redirected to old_open and continue to run.
There will be a new_open function, and some versioned symbol which
will alias to that function.

  open@@GLIBC_2.6 -> new_open

So why use versioned symbols and not just keep repeating the aliasing
trick? Because the old unversioned symbol can only alias to one
version. You have to pick a single function, like old_open, and map
open to that. And that's what you're stuck with. Versioning allows
open to refer to different things depending on who is asking.

Newly compiled programs would be linked to the new open, requesting
version GLIBC_2.6. The dynamic linker, ld.so, uses the equivalent of
dlvsym to grab "open" at version "GLIBC_2.6". For old clients, there
is no version request, so it grabs "open" via the equivalent of dlsym,
which resolves to the same address as old_open. When glibc 2.7 comes
out, there can be an open@@GLIBC_2.7, as well as an open@GLIBC_2.6.
The double @@ indicates that that's the default version that is
selected by newly linked programs.
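A toy table modeling the open/old_open/new_open scenario just described, at its "glibc 2.7" stage. It only encodes the rules as stated in this message; the real ld.so and dlsym() resolution rules are more involved, and every name here is hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*impl_fn)(void);

/* Three hypothetical implementations of "open" inside one library. */
static int old_open(void)    { return 1; }
static int new_open(void)    { return 2; }
static int newest_open(void) { return 3; }

/* One exported definition. VERSION == NULL is the unversioned alias that
   old clients bind to; IS_DEFAULT marks the "@@" entry. */
struct versym {
    const char *name;
    const char *version;
    int is_default;
    impl_fn fn;
};

/*   open            -> old_open
     open@GLIBC_2.6  -> new_open
     open@@GLIBC_2.7 -> newest_open */
static const struct versym table[] = {
    { "open", NULL,        0, old_open    },
    { "open", "GLIBC_2.6", 0, new_open    },
    { "open", "GLIBC_2.7", 1, newest_open },
};
static const size_t table_len = sizeof table / sizeof table[0];

/* Versioned request: exact match only (dlvsym-like).
   Unversioned request: the unversioned alias, as the message describes. */
impl_fn resolve(const char *name, const char *version)
{
    for (size_t i = 0; i < table_len; i++) {
        const struct versym *s = &table[i];
        if (strcmp(s->name, name) != 0)
            continue;
        if (version == NULL && s->version == NULL)
            return s->fn;
        if (version != NULL && s->version != NULL
            && strcmp(s->version, version) == 0)
            return s->fn;
    }
    return NULL;
}

/* The version a newly linked program would record: the "@@" entry. */
const char *default_version(const char *name)
{
    for (size_t i = 0; i < table_len; i++)
        if (strcmp(table[i].name, name) == 0 && table[i].is_default)
            return table[i].version;
    return NULL;
}
```

Old clients, programs linked against 2.6, and programs linked against 2.7 each get a different function under the same name "open", which a single unversioned alias could not provide.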

> > the only problem I see here is that we were thinking about switching to
> > libltdl - does it support this versioning?
>
> No. Probably because it's not as useful as Kaz thinks (see the other mail).

Versioned symbols are the reason you can upgrade a GNU/Linux
system at all without everything going haywire, as it did in the
a.out days.

At work here, I've been able to take a vendor's embedded distro
(targeting MIPS) running glibc 2.3.4, and run their binaries under a
glibc-2.5 that I compiled.

I literally copied the new glibc over top of the root filesystem, and
by golly, it booted.

That is thanks, in part, to careful ABI versioning.

We can only speculate about the reason why libtool's library doesn't
have dlvsym (yet!).  Maybe that project is more focused on different
problems. If a library with versioned symbols is linked using libtool,
all that versioning stuff still works. Libtool helps with issues
related to finding the library at link time and run time.

Maybe nobody is using dlopen() on versioned interfaces with libraries
that are used in conjunction with libtool.

Typically libraries that are designed for run-time use solve their
versioning problems in other ways, one big reason being portability.

If you're designing a platform-independent ``plugin'', then you can't
just say ``we will use ELF symbol versioning to deal with versioning
issues, and too bad those of you who are on non-ELF platforms''.

But in the case of glibc, we are run-time linking to a library which
is not normally used this way. That project has decided to deal with
versioning problems by using the versioning facilities in ELF. Which
is fine, since it gets to define the platform, basically. That means
that if you want to target that library, you have to play by its
rules.

glibc is not even linked using libtool, so it would be stupid to use
libtool's API to access it. If dlopen("libc.so.6") doesn't work, you
have a big problem that libtool won't help you with.

Re: [clisp-list] FFI feature proposal: versioned symbols.

From: Kaz K. <kky...@gm...> - 2006-11-29 17:36:58

On 11/29/06, Bruno Haible <br...@cl...> wrote:
> Yup, this is a problem, because we have hardcoded in
> modules/bindings/glibc/linux.lisp definitions like this:

[ snip ]

> That is, we have extracted the 'struct stat' of a particular glibc version
> and therefore also need the __xstat function of that particular ABI.
>
> But I disagree with the approach. The C library (more precisely its
> header files and the symbol versioning in libc.so) shields the usual C
> programmer from such problems.

Yes, and it does that using the same approach. Only, of course, you
can re-compile and re-link the C program, which extracts the
particular version at that time. In the FFI, you're doing it by hand.

> I think clisp should get on the same level,
> and use the solution that the C library maintainers propose.

That /is/ the solution that they propose: use versioned symbols to
target a stable ABI. The extraction of the interface and selection of
symbols is hidden in the toolchain, that's all.

> Otherwise we
> have to track closely the glibc versions and update the def-c-struct
> definitions manually in the future.

That's an issue within CLISP. Don't confuse that with what should or
should not be available to users of CLISP through the FFI interface.
If CLISP wants to solve its __xstat issue in some other way, that's
fine.

In my application, I don't mind maintaining DEF-C-STRUCT definitions by hand.

Since CLISP is a complex app which has to be compiled anyway, the
considerations are different.

But it's nice to be able to package a CLISP program which is nothing
but .lisp files that are fed into CLISP.

> Concretely this means one of the two following approaches:
>  a) Generate the (def-c-struct stat ...) form at compile time,
>     for example by having a C program like this:
>
>        #include <sys/types.h>
>        #include <sys/stat.h>
>        #include <stddef.h>
>        #include <stdio.h>
>        int main (void)
>        {
>          printf("%zu\n", offsetof (struct stat, st_dev));
>          ...
>          printf("%zu\n", offsetof (struct stat, st_ctime));
>          printf("%zu\n", sizeof (((struct stat *) 0)->st_dev));
>          ...
>          printf("%zu\n", sizeof (((struct stat *) 0)->st_ctime));
>          return 0;
>        }

The problem is that if this C program were to actually call stat(), it
would call an appropriately versioned symbol, and so you could use the
binary version of this program with a newer version of glibc, where it
would continue to produce the same output. Yet if it were to be
recompiled, its output might change.

Heck, the program doesn't even tell you that you really need to call
some version of __xstat64, which takes an extra parameter.

>     and a bit of Lisp code that infers where the gaps between the
>     fields are, based on these offset and size numbers.

Right. You also need the size of the entire structure.

>  b) Add a new primitive (def-c-partial-struct ...) to the FFI that causes
>     this gap computation to occur in the FFI, based on C snippets emitted
>     by the .lisp -> .c compiler. (I call it "partial" because the field
>     definitions are not complete. The C definition of the struct can have
>     additional fields that are not visible from Lisp.)

;; proposed syntax

(def-partial-c-struct x
  (:size <whatever>)      ;; extracted using sizeof
  (y uint32 :offset 24)   ;; y member at offset 24
  ...)

So now CLISP will allocate a number of bytes equal to :size for the
structure, and only do conversions on the defined fields, ignoring the
gaps on the way :IN, and setting them to zero on the way :OUT.

If any field extends beyond the limit specified by :SIZE, the compiler
for the form can signal an error.
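The :SIZE/:OFFSET semantics just described can be sketched in C: a descriptor holding the total size and only the known fields, the bounds check the compiler could perform, and the :OUT direction that zeroes the gaps. All names are hypothetical, and field values are assumed to be already converted to their C byte representation.

```c
#include <stddef.h>
#include <string.h>

/* A field the Lisp side knows about, with its explicit :OFFSET. */
struct partial_field {
    const char *name;
    size_t offset;
    size_t width;          /* bytes, e.g. 4 for uint32 */
};

/* A "partial" struct: total :SIZE plus the known fields only. */
struct partial_struct {
    size_t size;
    const struct partial_field *fields;
    size_t nfields;
};

/* Returns 0 if every field fits inside SIZE, -1 otherwise
   (written to avoid overflow in offset + width). */
int partial_struct_check(const struct partial_struct *ps)
{
    for (size_t i = 0; i < ps->nfields; i++) {
        const struct partial_field *f = &ps->fields[i];
        if (f->width > ps->size || f->offset > ps->size - f->width)
            return -1;
    }
    return 0;
}

/* :OUT direction: zero all SIZE bytes (gaps included), then copy each
   known field's bytes from VALUES[i] into place. */
void partial_struct_out(const struct partial_struct *ps,
                        const void *const *values, unsigned char *buf)
{
    memset(buf, 0, ps->size);
    for (size_t i = 0; i < ps->nfields; i++)
        memcpy(buf + ps->fields[i].offset, values[i], ps->fields[i].width);
}
```

The :IN direction would be the mirror image: copy each known field out of the buffer and ignore everything else.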

This partial struct idea is definitely very good and worth
implementing. But it does not solve the ABI versioning problem. It
solves the API extraction problem, by reducing manual labor.

You need both approaches. Use the partial struct hack to get only the
fields you are interested in, so you don't have to keep revising the
definition when new fields are added, which you are not even
interested in. And then use the versioned symbols to lock in on the
ABI which corresponds to the extracted API.