From: damien d. <dj...@cm...> - 2012-02-17 01:46:39
|
This is my first couple of days with mingw (and full-blown C). The focus below is upon a function getline, but several of the questions relate to improving my understanding of the mingw environment.

In attempting to move a Unix/Linux/.. application to Windows, the mingw linker cannot find the getline function.

The first question is:

It is all very well to know the default paths to libraries, but if you have no idea in which library a function resides, knowing paths is not particularly helpful until you identify the library. In fact, knowing the default libraries is possibly more useful.

How does one determine the default libraries searched by the linker?

The second question is:

How does one identify a library which contains a particular function?

The third question is:

Is getline a fully functional part of mingw? All discussion on this list seemed to stop in 2007.

The fourth question is:

In the libraries under '...mingw/lib' there are three possible candidates for containing getline. I do not understand why there are three and the relevance of each (each of them responds to: objdump -t .../lib/libgettext... | grep getline).

libgettextpo.a
libgettextpo.dll.a
libgettextlib.dll.a

But so far, using any of these three (with the -l option), the linker does not find getline.

Which raises the question - do any of these three contain a functioning getline?

Regards
|
From: djd <dj...@cm...> - 2012-02-17 08:38:03
|
Eli Zaretskii <eliz@gn...> - 2012-02-17 08:15
> Date: Fri, 17 Feb 2012 12:30:22 +1100 (EST)
> From: "damien dunlop"<djd@...>
>
> The focus below is upon a function getline, but several of the
> questions relate to improving my understanding of the mingw
> environment.

Thanks to Eli Zaretskii for a complete and comprehensive answer to the questions raised in the original post. I know it takes time just to do the typing, without having to think as well.

Regards
|
From: Eli Z. <el...@gn...> - 2012-02-17 08:15:24
|
> Date: Fri, 17 Feb 2012 12:30:22 +1100 (EST)
> From: "damien dunlop" <dj...@cm...>
>
> The focus below is upon a function getline, but several of the
> questions relate to improving my understanding of the mingw
> environment.
>
> In attempting to move a Unix/Linux/.. application to Windows,
> the mingw linker cannot find the getline function.
>
> The first question is:
>
> It is all very well to know the default paths to libraries, but
> if you have no idea in which library a function resides, knowing
> paths is not particularly helpful until you identify the library.
> In fact, knowing the default libraries is possibly more useful.
>
> How does one determine the default libraries searched by the linker?

"gcc -dumpspecs" will show you something like this:

    *link:
    %{mwindows:--subsystem windows} %{mconsole:--subsystem console} %{shared: %{mdll: %eshared and mdll are not compatible}} %{shared: --shared} %{mdll:--dll} %{static:-Bstatic} %{!static:-Bdynamic} %{shared|mdll: -e _DllMainCRTStartup@12}

    *lib:
    %{pg:-lgmon} %{mwindows:-lgdi32 -lcomdlg32} -luser32 -lkernel32 -ladvapi32 -lshell32

    *libgcc:
    %{mthreads:-lmingwthrd} -lmingw32 -lgcc -lmoldname -lmingwex -lmsvcrt

This shows the libraries the linker will search, some of them conditioned by some non-default link command-line switch. For example, -lgdi32 and -lcomdlg32 will only be searched if you give the "-mwindows" switch to gcc. (The details of the syntax of the specs file are documented in the GCC manual.)

If you don't want to second-guess the specs conditions, invoke the actual link command in your project, but add the -v switch, and GCC will display all the libraries it actually looks into.

> How does one identify a library which contains a particular function.

I can suggest several methods:

 . For functions that come with the default libraries searched by the MinGW linker, just search the header files. Something like this:

     fgrep -Rw getline /path/to/include --include="*.h"

   (This tells me that the only `getline' I have is in a C++ header file, which means MinGW doesn't have this function in C. Hardly surprising, since `getline' is a Linux-ism.)

 . For libraries not searched by the linker by default, I use `nm', like this:

     nm -A /path/to/lib/lib*.a | fgrep FUNCTION

   If one of the lines this produces shows the function name with the "T" symbol, like this:

     libregex.dll.a:d000015.o:00000000 T _regexec

   that means the function (`regexec' in this case, remove one underscore to know its C name) is in that library (libregex in this case). Then linking with -lregex will allow the linker to find the function `regexec'. ("T" stands for "text", i.e. code. It must be an upper case T.)

 . If all else fails, google for "getline for MinGW Windows" or some such.

> Is getline a fully functional part of mingw?

See above: I'm quite sure it isn't. You can find one in gnulib, I think. Or write your own, it shouldn't be too hard.

> In the libraries under '...mingw/lib' there are three possible
> candidates for containing getline. I do not understand why there are
> three and the relevance of each (each of them respond to:
> objdump -t .../lib/libgettext... | grep getline).
>
> libgettextpo.a
> libgettextpo.dll.a
> libgettextlib.dll.a

The first one is a static library of libgettextpo, the second one is an import library for the corresponding DLL, the third one is an import library for libgettextlib's DLL.

> But so far, using any of these three (with the -l option), the
> linker does not find getline.

Try the nm command above, and you will see that getline is indeed not there.

> Which raises the question - do any of these three contain
> a functioning getline?

No.
|
From: Keith M. <kei...@us...> - 2012-02-17 21:04:52
|
On 17/02/12 08:15, Eli Zaretskii wrote:
> . For functions that come with the default libraries searched by the
>   MinGW linker, just search the header files. Something like this:
>
>     fgrep -Rw getline /path/to/include --include="*.h"
>
>   (This tells me that the only `getline' I have is in a C++ header
>   file, which means MinGW doesn't have this function in C. Hardly
>   surprising, since `getline' is a Linux-ism.)

It used to be so. However, as of IEEE 1003.1-2008, it has become a POSIX standard.

>> Is getline a fully functional part of mingw?

Not at present, but since it is now a POSIX standard, we might consider adding it to libmingwex.a.

> See above: I'm quite sure it isn't. You can find one in gnulib, I
> think.

I hesitate to suggest gnulib, simply because IMO most gnulib modules bring far too much ancillary (and often unnecessary) baggage along with them. That's an entirely personal opinion, of course, and you may disagree, in which case you may wish to consider it. One caveat: beware possible licensing issues; it is LGPL, I think.

> Or write your own, it shouldn't be too hard.

Fairly trivial, I think; tentative implementation at

  http://bit.ly/ADZ4PP

may strengthen the case for eventual inclusion in (a future release of) libmingwex.a.

-- 
Regards,
Keith.
|
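To give a feel for how small the "write your own" route can be, here is a minimal sketch of a getline-style reader built on fgetc() and realloc(). It is not the implementation behind the link above, and it is not what libmingwex.a ships; the names sketch_getdelim and sketch_getline, the starting size of 128 bytes, and the doubling policy are assumptions made for illustration only. It deliberately omits any guard against the buffer size overflowing.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical name; follows the POSIX getdelim() contract only loosely. */
ssize_t sketch_getdelim(char **linebuf, size_t *len, int delim, FILE *stream)
{
    size_t used = 0;
    int c;

    if (linebuf == NULL || len == NULL) {
        errno = EINVAL;
        return -1;
    }
    if (*linebuf == NULL)
        *len = 0;                 /* no buffer yet, so no capacity either */

    while ((c = fgetc(stream)) != EOF) {
        /* Make room for this byte plus the terminating NUL. */
        if (used + 2 > *len) {
            size_t newlen = (*len == 0) ? 128 : *len * 2;  /* assumed policy */
            char *tmp = realloc(*linebuf, newlen);
            if (tmp == NULL) {
                errno = ENOMEM;
                return -1;        /* *linebuf and *len are left as they were */
            }
            *linebuf = tmp;
            *len = newlen;
        }
        (*linebuf)[used++] = (char)c;
        if (c == delim)
            break;
    }

    if (c == EOF && ferror(stream))
        return -1;                /* read error */
    if (used == 0)
        return -1;                /* end of file before any byte was stored */

    (*linebuf)[used] = '\0';
    return (ssize_t)used;
}

/* getline() is getdelim() with a newline delimiter. */
ssize_t sketch_getline(char **linebuf, size_t *len, FILE *stream)
{
    return sketch_getdelim(linebuf, len, '\n', stream);
}
```

A caller starts with a NULL buffer and a zero length, calls sketch_getline() in a loop until it returns -1, and free()s the buffer once at the end.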
From: Albrecht S. <vms...@go...> - 2012-02-18 12:16:21
|
On 17.02.2012 16:42, Keith Marshall wrote:

>>> Is getline a fully functional part of mingw?
>
> Not at present, but since it is now a POSIX standard, we might consider
> adding it to libmingwex.a.
...
>> Or write your own, it shouldn't be too hard.
>
> Fairly trivial, I think; tentative implementation at
>
> http://bit.ly/ADZ4PP
>
> may strengthen the case for eventual inclusion in (a future release of)
> libmingwex.a.

I see a few issues with the proposed code. Posting for discussion...

The stream argument shouldn't be checked for NULL (and return EINVAL), but leave this up to fgetc() later or return EBADF instead, which fgetc() would do. The standard says "For the conditions under which the getdelim() and getline() functions shall fail and may fail, refer to fgetc", and this would be one of them.

The way realloc() is called will lead to a memory leak if realloc() returns NULL. This will probably only happen if realloc has been called many times before, and the buffer is already very long. The previous linebuf pointer would be lost, and *len would have been incremented already. A cleaner solution would be to keep the previous pointer and the buffer size intact (so the buffer could at least be free()'d).

The standard doesn't say anything about what should be done in this case, but I'd expect the buffer pointer (and size) to be valid as of the time when realloc() fails.

Besides that there would be no way to retrieve the data that has already been read. IMHO it would be useful to zero-terminate the buffer also in case of failure. However, this would still not guarantee that the data can be correctly retrieved, since zero bytes within the data are allowed. Again, the standard doesn't say much about this case, maybe this should be regarded as undefined, but...

Thoughts?

-- 
Regards,
Albrecht
|
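The realloc() concern raised above is a general C pitfall, and can be shown in isolation. The helper below is a hypothetical sketch (grow_buffer is not a name from the proposed patch): it assigns realloc()'s result to a temporary first, so that on failure the caller's pointer and recorded size are both still valid.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: ensure *buf can hold at least `need` bytes.
 * Returns 0 on success, -1 on failure.
 * On failure *buf and *cap are unchanged, so the caller can still free(*buf). */
int grow_buffer(char **buf, size_t *cap, size_t need)
{
    char *tmp;

    if (need <= *cap)
        return 0;               /* already large enough, nothing to do */

    /* Leaky form, for contrast:
     *     *buf = realloc(*buf, need);
     * If realloc() returns NULL, the only pointer to the old block is lost. */
    tmp = realloc(*buf, need);
    if (tmp == NULL) {
        errno = ENOMEM;
        return -1;              /* old block is still reachable through *buf */
    }

    *buf = tmp;                 /* commit pointer and size together */
    *cap = need;
    return 0;
}
```

Updating the size only after the allocation has succeeded also avoids the second symptom mentioned above, where the length has already been incremented by the time the failure is noticed.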
From: Keith M. <kei...@us...> - 2012-02-18 20:14:42
|
On 18/02/12 12:15, Albrecht Schlosser wrote:
> I see a few issues with the proposed code. Posting for discussion...

Thanks; this is precisely the sort of constructive dialogue I hoped to encourage. Comments attached directly to the ticket might have been preferred, but no big deal; we can easily add a cross reference to this mail thread.

> The stream argument shouldn't be checked for NULL (and return EINVAL),

You're right. I added the stream argument to the original checks on linebuf and len, as an afterthought. Blessed with hindsight, I realise that I should have checked it separately...

> but leave this up to fgetc() later

...but not like this, I think...

> or return EBADF instead,

...but rather this, explicitly.

> which fgetc() would do.

Unfortunately, I don't think it would. At least, running under wine, (and with native code compiled for Linux):

  int x = fgetc( (FILE *)(NULL) );

doesn't return at all -- it aborts with a segmentation fault. Sure, one might argue that, if it's okay for fgetc() to abort in this ugly manner, when handed a NULL stream reference, it should be good enough for getline() and getdelim() too, but to me, it just seems to smack of laziness and carelessness.

> The standard says "For the conditions under which the getdelim() and
> getline() functions shall fail and may fail, refer to fgetc", and
> this would be one of them.

It would, if fgetc() did actually fail gracefully, with an appropriate assignment to errno, in this circumstance.

> The way realloc() is called will lead to a memory leak if realloc()
> returns NULL.

Indeed it will; thanks for pointing this out.

> This will probably only happen if realloc has been called
> many times before, and the buffer is already very long. The previous
> linebuf pointer would be lost, and *len would have been incremented
> already. A cleaner solution would be to keep the previous pointer and
> the buffer size intact (so the buffer could at least be free()'d).
>
> The standard doesn't say anything about what should be done in this
> case, but I'd expect the buffer pointer (and size) to be valid as of
> the time when realloc() fails.

I've adjusted it, so the original pointer and length will be retained, while the function will immediately return -1; caller will still have access to the abandoned buffer, and its length, and will be expected to take responsibility for clean up.

> Besides that there would be no way to retrieve the data that has
> already been read. IMHO it would be useful to zero-terminate the
> buffer also in case of failure. However, this would still not
> guarantee that the data can be correctly retrieved, since zero bytes
> within the data are allowed. Again, the standard doesn't say much
> about this case, maybe this should be regarded as undefined, but...

I didn't bother to NUL terminate it, but with the second patch the abandoned buffer remains accessible to the caller, which could supply the terminator if desired; I don't know how useful that might be.

-- 
Regards,
Keith.
|
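The outcome of the exchange above (EINVAL for the buffer arguments, an explicit EBADF for a NULL stream, rather than letting fgetc() crash) might look roughly like the following. This is a sketch of the idea only; check_getline_args is a hypothetical name, and the actual patch may structure the checks differently.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical pre-check for a getline()/getdelim() style function.
 * Returns 0 if the arguments are usable, otherwise -1 with errno set. */
int check_getline_args(char **linebuf, size_t *len, FILE *stream)
{
    /* Nowhere to store the buffer pointer or its size: invalid argument. */
    if (linebuf == NULL || len == NULL) {
        errno = EINVAL;
        return -1;
    }

    /* Checked separately: a NULL stream is reported as a bad file
     * descriptor instead of being passed on to fgetc(). */
    if (stream == NULL) {
        errno = EBADF;
        return -1;
    }

    return 0;
}
```

Keeping the two tests apart lets the caller tell a programming error in the buffer arguments from an unusable stream.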
From: Albrecht S. <vms...@go...> - 2012-02-22 13:07:37
|
On 18.02.2012 21:14, Keith Marshall wrote:

Sorry for the late reply...

I've seen the modified code, and all looks good for me now.

> I've adjusted it, so the original pointer and length will be retained,
> while the function will immediately return -1; caller will still have
> access to the abandoned buffer, and its length, and will be expected to
> take responsibility for clean up.

Yep, that's what I would've expected.

>> Besides that there would be no way to retrieve the data that has
>> already been read. IMHO it would be useful to zero-terminate the
>> buffer also in case of failure. However, this would still not
>> guarantee that the data can be correctly retrieved, since zero bytes
>> within the data are allowed. Again, the standard doesn't say much
>> about this case, maybe this should be regarded as undefined, but...
>
> I didn't bother to NUL terminate it, but with the second patch the
> abandoned buffer remains accessible to the caller, which could supply
> the terminator if desired; I don't know how useful that might be.

My concern was (also) about the buffer contents. Since there is no count returned if the function fails, the user (caller) can never know how many valid bytes have been put into the buffer before the function failed. The only way to have access to *valid* data *only* would be if the read data bytes had been NUL terminated inside the function. Therefore that's what I proposed.

Having said this, this would work only if the data can't contain valid zero bytes, but that's what the standard mentions anyway.

So, thanks for fixing it so far. I don't know if NUL terminating the buffer on failure should be done, so please feel free to do what you think is appropriate.

-- 
Regards,
Albrecht
|
From: Keith M. <kei...@us...> - 2012-02-24 20:54:04
|
On 22/02/12 13:06, Albrecht Schlosser wrote:
>> I didn't bother to NUL terminate it, but with the second patch
>> the abandoned buffer remains accessible to the caller, which
>> could supply the terminator if desired; I don't know how useful
>> that might be.
>
> My concern was (also) about the buffer contents. Since there is no
> count returned if the function fails, the user (caller) can never
> know how many valid bytes have been put into the buffer before the
> function failed. The only way to have access to *valid* data *only*
> would be if the read data bytes had been NUL terminated inside the
> function. Therefore that's what I proposed.

But, POSIX offers you no guarantee that there will be any valid data in the buffer. A conforming application must assume that there isn't, so why introduce a non-standard feature which might encourage any expectation otherwise? Certainly not portable, so not wise, IMO.

> Having said this, this would work only if the data can't contain
> valid zero bytes, but that's what the standard mentions anyway.

Exactly so. Even if I did NUL terminate the abandoned buffer, there is no way to tell if the first NUL present is the one I just appended; POSIX explicitly allows the buffer to contain an unspecified number of NULs, at any arbitrary position *prior* to end-of-data. It's only when the end of the valid data is marked by the specified delimiter character followed by NUL, that an application can determine that it has retrieved all valid content, and that combination will not be present in the case under discussion, (in which realloc() has failed, and thus getline() or getdelim() will have returned (ssize_t)(-1) with errno == ENOMEM).

> So, thanks for fixing it so far. I don't know if NUL terminating the
> buffer on failure should be done, so ...

I think it would be of little value, so ...

> please feel free to do what you think is appropriate.

... I will leave well alone, for now.

-- 
Regards,
Keith.
|
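The point about embedded NULs can be made concrete with a small sketch. Both helper names below are hypothetical, and the example assumes a successful call, where the returned count is known and the buffer is NUL terminated after the data. It shows why the count, not strlen(), defines how much data was read, and hence why a lone terminator appended on failure would not let the caller recover the valid length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of NUL bytes among the first `count` bytes,
 * where `count` is what a successful getline()-style call returned. */
size_t embedded_nuls(const char *buf, size_t count)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < count; i++)
        if (buf[i] == '\0')
            n++;
    return n;
}

/* Nonzero when strlen() stops short of the data actually read,
 * i.e. when the first NUL in the buffer is not the terminator. */
int strlen_is_misleading(const char *buf, size_t count)
{
    return strlen(buf) < count;
}
```

For a line holding the six bytes 'a', 'b', NUL, 'c', 'd', newline, embedded_nuls() reports one NUL inside the data and strlen() sees only the first two bytes.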
From: Albrecht S. <vms...@go...> - 2012-02-25 01:08:24
|
On 24.02.2012 14:09, Keith Marshall wrote:

> POSIX offers you no guarantee that there will be any valid data
> in the buffer. A conforming application must assume that there isn't,
> so why introduce a non-standard feature which might encourage any
> expectation otherwise? Certainly not portable, so not wise, IMO.

Yep, I think you're right, I must have missed something when I read the standard before. I thought that it stated that a NUL byte was always appended, but it says:

"The characters read, including any delimiter, shall be stored in the string pointed to by the lineptr argument, and a terminating NUL added when the delimiter or end of file is encountered."

So it explicitly states that the terminating NUL will *only* be added when the delimiter or EOF is encountered.

> ... I will leave well alone, for now.

Right so, sorry for the additional noise...

-- 
Regards,
Albrecht
|
From: Keith M. <kei...@us...> - 2012-02-25 22:18:37
Attachments:
getline.eoverflow.diff
|
On 22/02/12 13:06, Albrecht Schlosser wrote:
> I've seen the modified code, and all looks good for me now.

Okay, having addressed Albrecht's earlier concerns, I thought I would add one final tweak, (patch attached), to implement this (optional) POSIX error reporting feature:

> These functions may fail if:
>
> [EOVERFLOW]
> More than {SSIZE_MAX} characters were read without encountering the
> delimiter character.

Unfortunately, there's a problem with this: MinGW's errno.h has no definition for EOVERFLOW, and msvcrt.dll provides no description, (as may be retrieved by strerror(3)), which would be appropriate to its meaning, ("value too big for data type"), and thus would hint at an appropriate choice of value to assign.

A brief google search reveals a number of suggestions, as adopted by other projects:

- Define it as zero, ("no error"), so effectively ignoring it; definitely isn't appropriate, in this case.

- Map it to E2BIG, ("argument list too long"), or to EFBIG, ("file too large"); (these suggestions are from the mingw-w64 ML, with the latter apparently preferred, yet again, neither seems entirely appropriate in the current context).

- Snippets on the google search page itself hint that msvcr100.dll, or gnulib, (or both), may assign a value of 132 to EOVERFLOW; to comply with our licensing requirements, I am not prepared to inspect code associated with either of these sources, to confirm this.

In the context of a MinGW implementation of getline(3), I think we have three options here:

1) Since POSIX doesn't actually require the EOVERFLOW failure condition, we could simply decline to implement it.

2) We could choose an arbitrary, and currently unused errno value for EOVERFLOW, (perhaps even adopting 132), and define it, recognising that msvcrt.dll's strerror(3) will report it as "unknown error", (so client applications may need to handle it explicitly).

3) We could have the implementation return an alternative errno code for this failure case, (ERANGE -- "result too large" -- perhaps), and *document* this non-standard usage in our getline(3) manpage.

Any thoughts?

-- 
Regards,
Keith.
|
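For illustration, option 2 might be sketched as below. The value 132 is only the number floated in the message above and is not confirmed here as what any runtime uses; SKETCH_SSIZE_MAX and check_count are hypothetical names, and the sketch assumes ssize_t is a signed type of the same width as size_t.

```c
#include <errno.h>
#include <stddef.h>

/* Option 2: supply a value only where <errno.h> lacks one.
 * 132 is an assumption taken from the discussion, not a verified constant. */
#ifndef EOVERFLOW
#define EOVERFLOW 132
#endif

/* Stand-in for SSIZE_MAX, assuming ssize_t is as wide as size_t. */
#define SKETCH_SSIZE_MAX ((size_t)-1 / 2)

/* Hypothetical check: can `count` bytes be reported through an ssize_t?
 * Returns 0 if so, otherwise -1 with errno set to EOVERFLOW. */
int check_count(size_t count)
{
    if (count > SKETCH_SSIZE_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    return 0;
}
```

Because the #ifndef guard leaves any existing definition alone, the same source would build unchanged on a platform whose errno.h already defines EOVERFLOW.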
From: Albrecht S. <vms...@go...> - 2012-02-28 14:28:33
|
On 25.02.2012 23:18, Keith Marshall wrote:

> ... I thought I would
> add one final tweak, (patch attached), to implement this (optional)
> POSIX error reporting feature:
>
>> These functions may fail if:
>>
>> [EOVERFLOW]
>> More than {SSIZE_MAX} characters were read without encountering the
>> delimiter character.
>
> Unfortunately, there's a problem with this: MinGW's errno.h has no
> definition for EOVERFLOW, and msvcrt.dll provides no description, (as
> may be retrieved by strerror(3)), which would be appropriate to its
> meaning, ("value too big for data type"), and thus would hint at an
> appropriate choice of value to assign.
...
> In the context of a MinGW implementation of getline(3), I think we have
> three options here:
>
> 1) Since POSIX doesn't actually require the EOVERFLOW failure condition,
> we could simply decline to implement it.
>
> 2) We could choose an arbitrary, and currently unused errno value for
> EOVERFLOW, (perhaps even adopting 132), and define it, recognising
> that msvcrt.dll's strerror(3) will report it as "unknown error",
> (so client applications may need to handle it explicitly).
>
> 3) We could have the implementation return an alternative errno code
> for this failure case, (ERANGE -- "result too large" -- perhaps),
> and *document* this non-standard usage in our getline(3) manpage.
>
> Any thoughts?

My favourite would be 2), with one additional note that once MS would assign a meaning to errno 132, we'd get a /wrong/ strerror output. But *if* this happened, we could still use another value for EOVERFLOW.

WRT 1) What does it mean, *not* to implement it? What would happen, if there are more than SSIZE_MAX bytes in the input data w/o the given delimiter? Would we simply wrap to 0 (bad, but not fatal), or would we encounter a crash, because we'd request a maybe negative value when calling realloc(), or what would happen? Sorry, I could try to check this myself, but my lack of time forbids this for now. I think that we'd need to check carefully the [un]signedness of values and variables in this context...

WRT 3) I tend to read man pages from different sources, so I could be missing this non-standard implementation detail, and maybe others would as well.

Just my 2 ct.

-- 
Regards,
Albrecht
|
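One way to avoid the wrap-around worried about above is to clamp the buffer growth before calling realloc(). The following is a hedged sketch only: next_capacity, the starting size of 128, and the doubling policy are assumptions for illustration, and SKETCH_SSIZE_MAX again assumes ssize_t is as wide as size_t. All arithmetic is done in unsigned size_t, so no negative value can reach realloc().

```c
#include <stddef.h>

/* Stand-in for SSIZE_MAX, assuming ssize_t is as wide as size_t. */
#define SKETCH_SSIZE_MAX ((size_t)-1 / 2)

/* Hypothetical helper: choose the next buffer size for a growing line buffer.
 * Returns 0 when the buffer cannot grow any further within the limit;
 * the caller would then fail instead of calling realloc(). */
size_t next_capacity(size_t cap)
{
    if (cap == 0)
        return 128;                 /* assumed initial size */
    if (cap >= SKETCH_SSIZE_MAX)
        return 0;                   /* already at the limit */
    if (cap > SKETCH_SSIZE_MAX / 2)
        return SKETCH_SSIZE_MAX;    /* clamp rather than let cap * 2 wrap */
    return cap * 2;
}
```

A zero return gives the caller a single place to decide which errno value to report, whichever of the three options is eventually chosen.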
From: Eli Z. <el...@gn...> - 2012-02-17 21:14:27
|
> Date: Fri, 17 Feb 2012 15:42:45 +0000
> From: Keith Marshall <kei...@us...>
>
> > See above: I'm quite sure it isn't. You can find one in gnulib, I
> > think.
>
> I hesitate to suggest gnulib, simply because IMO most gnulib modules
> bring far too much ancillary (and often unnecessary) baggage along with
> them.

As it happens, I very much agree. But I looked at its code before suggesting it, and it didn't seem to be the case here, although I didn't invest too much time in finding that out.

In any case, the OP will have to make up his own mind on this.
|
From: Keith M. <kei...@us...> - 2012-02-17 21:59:26
|
On 17/02/12 21:13, Eli Zaretskii wrote:
>> I hesitate to suggest gnulib, simply because IMO most gnulib modules
>> bring far too much ancillary (and often unnecessary) baggage along with
>> them.
>
> As it happens, I very much agree. But I looked at its code before
> suggesting it, and it didn't seem to be the case here, although I
> didn't invest too much time in finding that out.

Fair enough. I looked only at the POSIX specification...

  http://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html

...before crafting my own tentative implementation; this allows me to offer it as an MIT licensed contribution to libmingwex.a, without any possibility of contamination from GPL/LGPL or other incompatibly licensed code. Like you, I didn't invest an inordinate amount of time in the exercise.

-- 
Regards,
Keith.
|
From: Keith M. <kei...@us...> - 2012-02-17 21:04:53
|
On 17/02/12 08:15, Eli Zaretskii wrote:
>> In the libraries under '...mingw/lib' there are three possible
>> candidates for containing getline. I do not understand why there are
>> three and the relevance of each (each of them respond to:
>> objdump -t .../lib/libgettext... | grep getline).
>>
>> libgettextpo.a
>> libgettextpo.dll.a
>> libgettextlib.dll.a
>
> ...
>
>> But so far, using any of these three (with the -l option), the
>> linker does not find getline.
>
> Try the nm command above, and you will see that getline is indeed not
> there.
>
>> Which raises the question - do any of these three contain
>> a functioning getline?
>
> No.

Nor should it be expected that any would; getline() is a stdio input routine, while gettext provides the GNU national language translation API. The two are conceptually unrelated.

-- 
Regards,
Keith.
|
From: djd <dj...@cm...> - 2012-02-18 01:23:38
|
On 18/02/2012 8:04 AM, Keith Marshall wrote:
>>> But so far, using any of these three (with the -l option), the
>>> linker does not find getline.
>> Try the nm command above, and you will see that getline is indeed not
>> there.
>>
>>> Which raises the question - do any of these three contain
>>> a functioning getline?
>> No.
> Nor should it be expected that any would; getline() is a stdio input
> routine, while gettext provides the GNU national language translation
> API. The two are conceptually unrelated.
>

Where can one find a description of the intent of the various libraries, especially the mingw libraries, so that conceptual relationships become apparent?

In the example of

libgettextpo.a
libgettextpo.dll.a
libgettextlib.dll.a

the names imply something to do with text, and getline has something to do with text - so there was a little logic behind the example.

Regards
|
From: Eli Z. <el...@gn...> - 2012-02-18 08:20:39
|
> Date: Sat, 18 Feb 2012 12:23:23 +1100
> From: djd <dj...@cm...>
>
> Where can one find a description of the intent of the various libraries,
> especially the mingw
> libraries so that conceptual relationships become apparent
>
> In the example of
>
> libgettextpo.a
> libgettextpo.dll.a
> libgettextlib.dll.a
>
> The names imply something to do with text, and getline has something to do
> with text - so there was a little logic behind the example.

Didn't you yourself install these packages? They are not part of Windows, so someone, presumably you, should have installed them at some point. That someone surely saw some kind of description or README file on the site (presumably, the MinGW site) from which the packages were downloaded, and decided to download them _because_ the description said something about the contents of each package. So how come you are asking about packages you yourself installed?

Assuming that you just forgot all that information, I think looking into the `contrib' subdirectory of where you installed the various MinGW packages will reveal many README files that answer your questions in this regard.

(To give you a specific answer about those specific files: Libraries matching the wildcard "*gettext*" come from the GNU gettext package. If you don't know what GNU gettext is about, google it.)
|
From: djd <dj...@cm...> - 2012-02-18 13:08:43
|
On 18/02/2012 7:18 PM, Eli Zaretskii wrote:
>> Date: Sat, 18 Feb 2012 12:23:23 +1100
>> From: djd<dj...@cm...>
>>
>> Where can one find a description of the intent of the various libraries,
>> especially the mingw
>> libraries so that conceptual relationships become apparent
>>
>> In the example of
>>
>> libgettextpo.a
>> libgettextpo.dll.a
>> libgettextlib.dll.a
>>
>> The names imply something to do with text, and getline has something to do
>> with text - so there was a little logic behind the example.
> Didn't you yourself install these packages?

I installed the standard SourceForge mingw distribution.

In addition to getline, regcomp seemed unavailable. I did find a mingw-regex and mingw-gnurx on SourceForge and installed both of those, solving the regcomp problem. I possibly did look for packages that might contain getline, but remember not finding any.

I have also looked through the directory I use for mingw downloads. There is nothing related to libgettext*, which tends to confirm I did not download or install those.

My conclusion is that either they came with the original mingw distribution, or Alzheimer's has got to me.

I have since downloaded a getline.c and compiled it OK.

Regards