From: Joseph K. <jk...@us...> - 2012-12-26 05:43:27
Hello All,

The proposed `elftc_symbol_table*()` and `elftc_string_table*()`
functions are convenience APIs for managing ELF symbol tables [1] and
string tables [2], respectively.

These functions would be useful in implementing as(1), and could also
help reduce code duplication in our source tree.

I have added reference documentation for these proposed functions to
the source tree in changesets [2819] and [2821].

Your review of and comments on these APIs would be appreciated.

[1]: http://sourceforge.net/apps/trac/elftoolchain/browser/trunk/libelftc/elftc_symbol_table_create.3
[2]: http://sourceforge.net/apps/trac/elftoolchain/browser/trunk/libelftc/elftc_string_table_create.3

Regards,
Joseph Koshy <jk...@us...>
From: Kai W. <kai...@gm...> - 2012-12-27 02:30:13
On Wed, Dec 26, 2012 at 6:42 AM, Joseph Koshy <jk...@us...> wrote:
> Hello All,
>
> The proposed `elftc_symbol_table*()` and `elftc_string_table*()`
> functions are convenience APIs for managing ELF symbol tables [1] and
> string tables [2], respectively.
>
> These functions would be useful in implementing as(1), and could also
> help reduce code duplication in our source tree.
>
> I have added reference documentation for these proposed functions to
> the source tree in changesets [2819] and [2821].
>
> Your review & comments of these APIs would be appreciated.

Hi,

I just briefly read the manual pages. The APIs look really good in
general. Besides as(1), I think they can be used in elfcopy(1), ld(1)
and other tools.

Some questions:

* What is the nested symbol table used for? Could you give an example?

* Is it possible to customize elftc_symbol_table_to_image?

  For example, ld(1) sometimes searches for symbols by name and symbol
  version. To achieve that, ld(1) stores the symbol name in the form
  "symbol@version", "atoi@FBSD_1.0" for instance. When
  elftc_symbol_table_to_image is called, ld(1) will want the string
  "atoi" in the string table, not "atoi@FBSD_1.0".

* How does elftc_symbol_table_to_image know where to find symbol size,
  value, shndx information?

  Suppose I define a struct:

    struct _MySymbol {
            Elftc_Symbol sym_base;
            uint64_t sym_size;
            uint64_t sym_value;
            uint64_t sym_shndx;
    };

  How does elftc_symbol_table_to_image know it should use sym_size,
  sym_value etc.? Or does it just return an array of GElf_Sym and let
  the application fill in the values?

* Is it possible to provide a sort API? e.g.

    elftc_symbol_table_sort(Elftc_Symbol_Table *table,
        int (*cmp)(Elftc_Symbol *s1, Elftc_Symbol *s2))

* Is it possible to provide a "replace" API? e.g.

    elftc_symbol_table_replace(Elftc_Symbol_Table *table,
        Elftc_Symbol *s1, Elftc_Symbol *s2)

  This API can be used, for example, during symbol resolution in ld(1).
  When the application knows that symbol s1 exists in the symbol table,
  it wants to replace s1 with s2 and expects s2 to take the same
  position in the symbol table as s1.

* What kind of internal data structures are you going to use to
  implement the symbol table and string table?

I will probably have more questions later when I read the string table
API manual page.

Thanks again for working on these symbol table APIs! We have needed
these for a long time.

Thanks,
Kai
From: Joseph K. <jk...@us...> - 2012-12-27 07:39:35
> Some questions:
>
> * What is the nested symbol table used for? Could you give an example?

as(1) has a macro facility (.altmacro) which supports 'local'
identifiers. When the macro definition is used, such local names expand
to unique IDs, different for each use of the macro. These local names
go out of scope when the macro ends, and can shadow symbols already
seen in the assembler source.

> * Is it possible to customize the elftc_symbol_table_to_image?
>
> For example, ld(1) sometimes searches symbols by name and symbol version.
> To achieve that, ld(1) stores symbol name in the form "symbol@version",
> "atoi@FBSD_1.0" for instance. When elftc_symbol_table_to_image is called,
> ld(1) will want the string "atoi" in the string table, not "atoi@FBSD_1.0".

The iterate() API is intended for this kind of transformation.

For example, the application could use the iterate() API to generate
a copy of the symbol table with the names transformed, and then use
elftc_symbol_table_to_image() on the transformed table:

    int
    myfn(Elftc_Symbol *entry, void *cookie)
    {
            Elftc_Elf_Symbol *elfsym = (Elftc_Elf_Symbol *) entry;
            Elftc_Symbol_Table *newtable = (Elftc_Symbol_Table *) cookie;

            ... insert a transformed symbol into 'newtable' ;

            return (ELFTC_ITERATE_CONTINUE);
    }

    newtable = elftc_symbol_table_create(...);
    status = elftc_symbol_iterate(oldtable, myfn, newtable);
    if (status != ELFTC_ITERATE_SUCCESS)
            error("...");
    else
            elf_section_image = elftc_symbol_table_to_image(newtable,
                &nentries, &strtab);

Alternatively, we could add a 'transformfn()' parameter to the
elftc_symbol_table_to_image() API. If non-null, the transformfn() would
be called for each entry, prior to the entry's (ELF) in-memory image
being created. This function could effect an in-place transformation
of the symbol.
The revised prototype would then look like:

    GElf_Sym *
    elftc_symbol_table_to_image(Elftc_Symbol_Table *table, size_t *nentries,
        int (*transformfn)(Elftc_Elf_Symbol *sym, void *cookie),
        void *cookie, Elftc_String_Table **strtab);

However, this way seems less general than the first.

> * How does elftc_symbol_table_to_image know where to find symbol size, value,
> shndx information?
>
> Suppose I define a struct:
>
> struct _MySymbol {
> Elftc_Symbol sym_base;
> uint64_t sym_size;
> uint64_t sym_value;
> uint64_t sym_shndx;
> }
>
> How does elftc_symbol_table_to_image know it should use sym_size,
> sym_value etc? Or it just return an array of GElf_Sym and let
> the application to fill in the value?

The _to_image() function only works with subtypes of
"Elftc_Elf_Symbol". This type is:

    typedef struct _Elftc_Elf_Symbol {
            Elftc_Symbol sym_base;
            GElf_Sym sym_elf;
            .. other fields and flags, e.g., controlling sort order
               that haven't been finalized yet. ..
    } Elftc_Elf_Symbol;

I hadn't provided the definition of "Elftc_Elf_Symbol" in the manual
page, apologies.

> * Is it possible to provide a sort API? e.g.
> elftc_symbol_table_sort(Elftc_Symbol_Table *table,
> int (*cmp)(Elftc_Symbol *s1, Elftc_Symbol *s2))

Good point. For tables with entries that have some kind of ordering
associated with them, we could also have a 'step()' API:

    Elftc_Symbol *
    elftc_symbol_table_step(Elftc_Symbol_Table *table,
        Elftc_Symbol *sym, int stepdirection);

where 'stepdirection' would be one of ELFTC_STEP_NEXT |
ELFTC_STEP_PREVIOUS.

There is another API that could be useful during disassembly:

    Elftc_Elf_Symbol *
    elftc_symbol_table_lookup_value(Elftc_Symbol_Table *table,
        uint64_t value, off_t *offset, int searchflags);

This would return the symbol 'closest' to the specified value.
'searchflags' could be ELFTC_SEARCH_FORWARD | ELFTC_SEARCH_BACKWARD.

> * Is it possible to provide a "replace" API?
> e.g.
> elftc_symbol_table_replace(Elftc_Symbol_Table *table,
> Elftc_Symbol *s1, Elftc_Symbol *s2)
> This API can be used when, for example, symbol resolving in ld(1).
> When application knows symbol s1 exists in the symbol table, it wants
> to replace s1 with s2 and expects that s2 will have the same
> position in the symbol table as s1.

Could you clarify what the difference between a replace API and a
'delete(s1)', 'insert(s2)' sequence would be? Do you need 's2' to be
associated with the same (ELF) symbol table index as 's1'?

> * What kind of internal data structures are you going to use to
> implement symbol table and string table?

I was thinking of some kind of hash table for name lookups for basic
symbol tables that have no concept of a sort order. For
"Elftc_Elf_Symbol" entries, we would need additional fields for dealing
with the ordering of symbols. Suggestions welcome.

Regards,
Joseph Koshy
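The name transformation discussed in this message (a table that stores "atoi@FBSD_1.0" but must emit "atoi") can be sketched with an iterate-style callback. The following is only a minimal, self-contained illustration of the callback-plus-cookie pattern: every `demo_*` name, the fixed 64-byte name buffers, and the return codes are hypothetical stand-ins, not part of libelftc or of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from libelftc. */
enum { DEMO_ITERATE_CONTINUE, DEMO_ITERATE_STOP };

struct demo_symbol {
	const char *name;		/* may be of the form "symbol@version" */
};

struct demo_names {
	char	(*out)[64];		/* caller-supplied array of name buffers */
	size_t	count;			/* names written so far */
	size_t	capacity;		/* number of buffers in 'out' */
};

/*
 * Copy 'name' up to, but not including, the first '@' into 'buf'.
 * Returns 0 on success, -1 if the base name does not fit.
 */
static int
demo_strip_version(const char *name, char *buf, size_t bufsize)
{
	size_t len;

	assert(name != NULL && buf != NULL);
	len = strcspn(name, "@");
	if (len >= bufsize)
		return (-1);
	memcpy(buf, name, len);
	buf[len] = '\0';
	return (0);
}

/* Callback in the style of myfn(): invoked once per symbol. */
static int
demo_collect(struct demo_symbol *sym, void *cookie)
{
	struct demo_names *names = cookie;

	if (names->count >= names->capacity ||
	    demo_strip_version(sym->name, names->out[names->count],
	    sizeof(names->out[0])) != 0)
		return (DEMO_ITERATE_STOP);
	names->count++;
	return (DEMO_ITERATE_CONTINUE);
}

/*
 * Minimal iterate(): visit each symbol in order.  Returns 0 if every
 * symbol was visited, -1 if the callback stopped the walk early.
 */
static int
demo_iterate(struct demo_symbol *syms, size_t nsyms,
    int (*fn)(struct demo_symbol *, void *), void *cookie)
{
	size_t i;

	for (i = 0; i < nsyms; i++)
		if (fn(&syms[i], cookie) != DEMO_ITERATE_CONTINUE)
			return (-1);
	return (0);
}
```

A caller would pass an array of `demo_symbol` entries together with a `demo_names` cookie to `demo_iterate()`. The stored names stay untouched; only the collected copies lose their version suffix.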
From: Kai W. <kai...@gm...> - 2012-12-29 07:34:47
On Thu, Dec 27, 2012 at 01:09:18PM +0530, Joseph Koshy wrote:
> > Some questions:
> >
> > * What is the nested symbol table used for? Could you give an example?
>
> as(1) has a macro facility (.altmacro) which supports 'local' identifiers.
> When the macro definition is used, such local names expand to unique
> IDs, different for each use of the macro. These local names go out of
> scope when the macro ends, and can shadow symbols already seen in the
> assembler source.

I see.

> > * Is it possible to customize the elftc_symbol_table_to_image?
> >
> > For example, ld(1) sometimes searches symbols by name and symbol version.
> > To achieve that, ld(1) stores symbol name in the form "symbol@version",
> > "atoi@FBSD_1.0" for instance. When elftc_symbol_table_to_image is called,
> > ld(1) will want the string "atoi" in the string table, not "atoi@FBSD_1.0".
>
> The iterate() API is intended for this kind of transformation.
>
> For example, the application could use the iterate() API to generate
> a copy of the symbol table with the names transformed, and then use
> elftc_symbol_table_to_image() on the transformed table:

I see. My only concern is that for ld(1) the symbol table can be really
huge, so creating another copy of the symbol table might be expensive.

> Alternatively, we could add a 'transformfn()' parameter to the
> elftc_symbol_table_to_image() API. If non-null, the transformfn() would
> be called for each entry, prior to the entry's (ELF) in-memory image
> being created. This function could effect an in-place transformation
> of the symbol.
>
> The revised prototype would then look like:
>
> GElf_Sym *
> elftc_symbol_table_to_image(Elftc_Symbol_Table *table, size_t *nentries,
> int (*transformfn)(Elftc_Elf_Symbol *sym, void *cookie),
> void *cookie, Elftc_String_Table **strtab);
>
> However, this way seems less general than the first.
How about we keep the current elftc_symbol_table_to_image as is, and
add another, more generic transform API:

    void *
    elftc_symbol_table_to_image_generic(Elftc_Symbol_Table *table,
        size_t *nentries,
        int (*transformfn)(Elftc_Elf_Symbol *sym, void *cookie, void *entry),
        size_t entsize, void *cookie, Elftc_String_Table **strtab);

Rationale:
* The API returns an untyped buffer.
* `entsize' specifies the entry size of the returned buffer.
* `entry' points to the entry buffer that the application-provided
  `transformfn' should fill in. (The API advances the `entry' pointer
  internally, by adding `entsize'.)

The idea is that elftc_symbol_table_to_image_generic can return an
array of Elf32_Sym/Elf64_Sym instead of GElf_Sym. The returned array
can be assigned to the `d_buf' field of an Elf_Data descriptor
directly, thus avoiding one more memory copy (compared to using
gelf_update_sym() on the returned GElf_Sym array).

Also, ld(1) can use this API to transform symbols with customized
names, as I mentioned in my previous mail.

> > * Is it possible to provide a sort API? e.g.
> > elftc_symbol_table_sort(Elftc_Symbol_Table *table,
> > int (*cmp)(Elftc_Symbol *s1, Elftc_Symbol *s2))
>
> Good point. For tables with entries that have some kind of ordering
> associated with them, we could also have a 'step()' API:
>
> Elftc_Symbol *
> elftc_symbol_table_step(Elftc_Symbol_Table *table,
> Elftc_Symbol *sym, int stepdirection);
>
> where 'stepdirection' would be one of ELFTC_STEP_NEXT | ELFTC_STEP_PREVIOUS.

See my comments below.

> There is another API that could be useful during disassembly:
>
> Elftc_Elf_Symbol *
> elftc_symbol_table_lookup_value(Elftc_Symbol_Table *table,
> uint64_t value, off_t *offset, int searchflags);
>
> This would return the symbol 'closest' to the specified value.
> 'searchflags' could be ELFTC_SEARCH_FORWARD | ELFTC_SEARCH_BACKWARD.

Good idea. This can be used by tools like addr2line(1) and ld(1) as
well.

> > * Is it possible to provide a "replace" API?
> > e.g.
> > elftc_symbol_table_replace(Elftc_Symbol_Table *table,
> > Elftc_Symbol *s1, Elftc_Symbol *s2)
>
> > This API can be used when, for example, symbol resolving in ld(1).
> > When application knows symbol s1 exists in the symbol table, it wants
> > to replace s1 with s2 and expects that s2 will have the same
> > position in the symbol table as s1.
>
> Could you clarify what the difference between a replace API and a
> 'delete(s1)', 'insert(s2)' sequence would be? Do you need 's2' to be
> associated with the same (ELF) symbol table index as 's1'?

The `replace' API depends on the implementation, see below. And you're
right, I want 's2' to have the same symbol table index as 's1'.

> > * What kind of internal data structures are you going to use to
> > implement symbol table and string table?
>
> I was thinking of some kind of hash table for name lookups for basic
> symbol tables that have no concept of a sort order. For "Elftc_Elf_Symbol"
> entries, we would need additional fields for dealing with ordering
> of symbols. Suggestions welcome.

I was thinking the symbol table is both a doubly linked list and a
hash table. The table is iterated in insertion order, or in sort order
if the `sort' API has been called. The `replace' API makes sure symbol
`s2' has the same position in the linked list as symbol `s1', and thus
the same symbol index. This is useful for ld(1).

If it's implemented as above, could the `step' API be used with both
sorted and unsorted tables?

Thanks,
Kai
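The difference between `replace' and a delete-then-insert sequence can be seen in a small sketch of the linked-list half of the structure described above. This is an illustration under stated assumptions, not the libelftc implementation: the `demo_*` names are hypothetical, the hash table used for name lookups is omitted, and list position stands in for the eventual ELF symbol index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; the name-lookup hash table is left out. */
struct demo_sym {
	const char	*name;
	struct demo_sym	*prev;
	struct demo_sym	*next;
};

struct demo_table {
	struct demo_sym	*head;
	struct demo_sym	*tail;
	size_t		count;
};

/* Append 's' at the tail, so iteration follows insertion order. */
static void
demo_append(struct demo_table *t, struct demo_sym *s)
{
	s->prev = t->tail;
	s->next = NULL;
	if (t->tail != NULL)
		t->tail->next = s;
	else
		t->head = s;
	t->tail = s;
	t->count++;
}

/*
 * Put 's2' into the list position held by 's1' and unlink 's1'.
 * A delete(s1) followed by an append of 's2' would instead move the
 * new symbol to the tail and shift the later symbols up by one.
 */
static void
demo_replace(struct demo_table *t, struct demo_sym *s1, struct demo_sym *s2)
{
	assert(s1 != s2);
	s2->prev = s1->prev;
	s2->next = s1->next;
	if (s1->prev != NULL)
		s1->prev->next = s2;
	else
		t->head = s2;
	if (s1->next != NULL)
		s1->next->prev = s2;
	else
		t->tail = s2;
	s1->prev = s1->next = NULL;
}

/*
 * Position of 's' in iteration order.  Returns t->count if 's' is not
 * in the table.
 */
static size_t
demo_index(const struct demo_table *t, const struct demo_sym *s)
{
	const struct demo_sym *p;
	size_t i = 0;

	for (p = t->head; p != NULL && p != s; p = p->next)
		i++;
	return (i);
}
```

Because both neighbors are relinked in place, the replacement costs O(1) and the positions of all other symbols are unchanged. The same `prev`/`next` links would also serve a step-style walk in either direction.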
From: Kai W. <kai...@gm...> - 2012-12-29 07:41:57
On Sat, Dec 29, 2012 at 08:34:26AM +0100, Kai Wang wrote:
> I see. My only concern is that for ld(1) the symbol table can be
> really huge, create another copy of symbol table might be expensive.
>
> ...
>
> How about, we keep the current elftc_symbol_table_to_image as is, and add another
> more generic transform API:
>
> void *
> elftc_symbol_table_to_image_generic(Elftc_Symbol_Table *table, size_t *nentries,
> int (*transformfn)(Elftc_Elf_Symbol *sym, void *cookie, void *entry),
> size_t entsize, void *cookie, Elftc_String_Table **strtab);
>
> Rationale:
> * API returns untyped buffer.
> * `entsize' specifies the entry size of the returned buffer.
> * `entry' points to the entry buffer where application provided `transformfn'
> should fill in. (The API advance the `entry' pointer internally, by adding
> `entsize')
>
> The idea is that elftc_symbol_table_to_image_generic can return an array of
> Elf32_Sym/Elf64_Sym instead of GElf_Sym. The returned array can be assigned to
> the `d_buf' field of an Elf_Data descriptor directly, thus avoid one more memory
> copy. (comparing to using gelf_update_sym() on returned GElf_Sym array)
>
> Also, ld(1) can use this API to transform symbol with customized name,
> as I mentioned in my previous mail.

Please ignore the above comment. On second thought, you're right:
ld(1) can use the `iterate' API to create the Elf32_Sym/Elf64_Sym
table directly, so none of the above is needed.

Thanks,
Kai
From: Joseph K. <jk...@us...> - 2012-12-29 09:45:53
Thank you for your review comments.

kw> I see. My only concern is that for ld(1) the symbol table can be
kw> really huge, create another copy of symbol table might be
kw> expensive.

You are right. The proposed *_to_image() API is flawed, because it
will lead to unnecessary memory allocation and data copying when used
with libelf.

kw> On a second thought, you're right, ld(1) can use the `iterate' API
kw> to create the Elf32_Sym/Elf64_Sym table directly, thus all of the
kw> above is not needed.

Agreed. We would also need an `elftc_symbol_table_count()` API to help
the application size its buffers.

If we are implementing a _step() API, the application can step through
the symbol table, filling its buffer and discarding the symbols it does
not want. (If we are supporting *_step() for unsorted tables too, then
the *_iterate() API would be redundant.)

kw> The `replace' API depends on the implementation, see below. And you're
kw> right, I want 's2' to have the same symbol table index as 's1'.

Ok.

kw> I was thinking the symbol table is both a doubly linked list and
kw> a hash table. The table is iterated in insertion order, or sort
kw> order if the `sort' API has been called. The `replace' API make
kw> sure symbol `s2' has the same position in the linked list as
kw> symbol `s1', thus the same symbol index. This is useful for ld(1).

kw> If it's implemented as above, the `step' API can be used in sorted
kw> and unsorted tables?

Yes, that would be a good way to implement it, given the desired
behaviors for the _step() and _replace() APIs.

I will update the manual page to reflect our current understanding of
the API shortly.

Regards,
Joseph Koshy
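The count-then-step usage described in this message might look roughly like the sketch below: the application asks for the entry count, sizes a buffer for the worst case, then steps through the table and copies out only the symbols it wants. Everything here is an assumption made for illustration. The `demo_*` names, the array-backed table, the `wanted` flag, and the convention that a NULL cursor starts the walk are hypothetical and are not taken from the libelftc manual pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical array-backed table; a real one would hide its layout. */
struct demo_sym {
	uint64_t	value;
	int		wanted;		/* application-defined filter flag */
};

struct demo_table {
	struct demo_sym	*syms;
	size_t		count;
};

/* Counterpart of the proposed count() call: number of entries. */
static size_t
demo_count(const struct demo_table *t)
{
	return (t->count);
}

/*
 * Forward step: a NULL 'cur' yields the first entry (an assumed
 * convention); NULL is returned once the walk passes the last entry.
 */
static struct demo_sym *
demo_step_next(struct demo_table *t, struct demo_sym *cur)
{
	size_t i;

	if (t->count == 0)
		return (NULL);
	if (cur == NULL)
		return (&t->syms[0]);
	i = (size_t)(cur - t->syms) + 1;
	return (i < t->count ? &t->syms[i] : NULL);
}

/*
 * Size a buffer from the count, then step through the table, keeping
 * the values of wanted symbols and discarding the rest.  The caller
 * frees the returned buffer; '*nfilled' receives the entries kept.
 */
static uint64_t *
demo_fill(struct demo_table *t, size_t *nfilled)
{
	struct demo_sym *s;
	uint64_t *buf;
	size_t n = 0;

	assert(nfilled != NULL);
	*nfilled = 0;
	buf = calloc(demo_count(t) != 0 ? demo_count(t) : 1, sizeof(*buf));
	if (buf == NULL)
		return (NULL);
	for (s = demo_step_next(t, NULL); s != NULL; s = demo_step_next(t, s))
		if (s->wanted)
			buf[n++] = s->value;
	*nfilled = n;
	return (buf);
}
```

The buffer is allocated once and written once, which is the property that made this approach preferable to building a second symbol table first.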