From: Keith M. <kei...@us...> - 2006-03-23 01:55:25
|
I've posted a man-1.6-beta-1 source snapshot on the project files page:

  https://sourceforge.net/project/showfiles.php?group_id=2435&package_id=82724&release_id=403917

I've successfully built it as a set of native MinGW applications, for English locales only, configuring in MSYS with:

  ./configure --prefix=/mingw --sysconfdir=/etc

(prefix will default to /usr/local, if not overridden; sysconfdir is where the man.conf file goes; I think I've made the configure script smart enough to convert POSIX style paths to native Win32, but I do have

  ac_default_prefix=`cd /usr/local; pwd -W`

in my config.site, so I may have overlooked prefix).

I've also been able to successfully deploy man pages in other languages, e.g. by adding `--enable-languages=en,fr,de' to the configure options, but I have not yet figured out how to enable the message catalogues for other languages; `--enable-nls' is overridden on my MinGW installation, because the nl_types.h and langinfo.h headers are missing (and presumably also the catopen() and catgets() functions). If anyone knows of a usable implementation of these, I'd appreciate a pointer -- Google turned up nothing useful for any sensible query I could think of.

Note that, in spite of Yongwei's earlier suggestion that -Tascii might be a more suitable choice of output encoding than latin1 for most users, I haven't found this to be the case -- I use code page 850 and export LESSCHARSET=latin1, and find -Tlatin1 gives much better results, using `man groff_char' as a yardstick. Therefore, I've left the default setting as it is in the standard UNIX package; if you think ASCII will work better for you, then add `--with-nroff="nroff -Tascii -mandoc"' and `--with-neqn="neqn -Tascii"' to the configure options. (Do note that you need to have groff installed *before* you attempt to configure man, so you should have the groff_char man page, which includes a table showing how the entire available character set will be displayed).

If anyone would like to give this a try, and report back on any problems, I'd appreciate the feedback -- particularly any tips on getting the NLS stuff to work. (BTW, I do realise that the README files need a good overhaul -- the only reference to Win32 seems to relate to an ancient Cygwin release. Perhaps a README.MinGW is called for).

Regards,
Keith.
|
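The configure options quoted in the message above can be collected into one command line. What follows is only a sketch: `configure_cmd` is a hypothetical helper, not part of the man package, and it uses nothing beyond the options named in the message (`--prefix`, `--sysconfdir`, `--with-nroff`, `--with-neqn`). The device name is passed through unchecked.

```sh
#!/bin/sh
# Sketch only: configure_cmd is a hypothetical helper, not part of man.
# It prints a configure command line built from the options quoted in
# the message above; it does not run configure itself.
configure_cmd() {
    device=${1:-latin1}     # latin1 is the default kept by the snapshot
    cmd='./configure --prefix=/mingw --sysconfdir=/etc'
    if [ "$device" != latin1 ]; then
        # Override the formatter commands only when a non-default
        # output device is requested.
        cmd="$cmd --with-nroff=\"nroff -T$device -mandoc\""
        cmd="$cmd --with-neqn=\"neqn -T$device\""
    fi
    printf '%s\n' "$cmd"
}

configure_cmd           # default build, Latin1 output
configure_cmd ascii     # plain ASCII output instead
```

Printing the command rather than running it keeps the sketch harmless on a machine where groff is not yet installed, which the message notes must be in place before configuring.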
From: Chris S. <ir0...@gm...> - 2006-03-23 05:27:53
|
Hey,

> I've successfully built it as a set of native MinGW applications, for
> English locales only, configuring in MSYS with:
>
> ./configure --prefix=/mingw --sysconfdir=/etc

I managed to cleanly compile the sources. I created a 'build' subdirectory and executed:

  ../configure --prefix=/mingw --sysconfdir=/etc

However, after doing a make / make install, I attempted to do a 'man man' and got 'No manual entry for man'. I also tried 'man -M /mingw/man man' with the same result.

I tried adding:

> ac_default_prefix=`cd /usr/local; pwd -W`

to my config.site to see if it would make a difference, but I got the same result.

Is there something I missed?

Chris

--
Chris Sutcliffe
http://ir0nh34d.blogspot.com
http://emergedesktop.org
|
From: Keith M. <kei...@us...> - 2006-03-23 23:10:36
|
Hi Chris,

Thanks for the feedback...

On Thursday 23 March 2006 5:27 am, Chris Sutcliffe wrote:
> > I've successfully built it as a set of native MinGW applications, for
> > English locales only, configuring in MSYS with:
> >
> > ./configure --prefix=/mingw --sysconfdir=/etc
>
> I managed to cleanly compile the sources. I created a 'build'
> subdirectory and executed:
>
> ../configure --prefix=/mingw --sysconfdir=/etc

This is much the same as the way I build it myself; I too prefer to keep source and build directories separate.

> However, after doing a make / make install, I attempted to do a 'man
> man' and got 'No manual entry for man'. I also tried 'man -M
> /mingw/man man' with the same result.
>
> I tried adding:
>
> ac_default_prefix=`cd /usr/local; pwd -W`
>
> to my config.site to see if it would make a difference, but I got the
> same result.

I'd expect that; the ac_default_prefix only takes effect when you omit the --prefix option entirely. Normally, that's what I do, and I then see the native Win32 equivalent of /usr/local (which is d:/msys/1.0/local in my case) propagating into my configuration. Sorry for any confusion; I didn't mean to imply that this definition in config.site would in any way affect the behaviour of configure when --prefix *is* specified.

> Is there something I missed?

Nothing obvious. What do the commands

  $ which man
  $ man -w
  $ man -w man
  $ man -d man

tell you? Does

  $ man /mingw/man/man1/man.1

display the man page? Does the content of /etc/man.conf look sane? Are the configured paths what you would expect? Likewise, for the generated header file paths.h, in the src subdirectory of your build tree?

Regards,
Keith.
|
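The checks requested in the message above can be gathered into one report. This is a minimal sketch, assuming a POSIX shell; `run_check` and `diagnose_man` are hypothetical names, and the paths are the ones quoted in the thread.

```sh
#!/bin/sh
# Sketch only: run_check and diagnose_man are hypothetical helpers.

# Print a label for the command, run it, and report a non-zero exit
# status instead of aborting, so that every check gets a chance to run.
run_check() {
    printf '== %s\n' "$*"
    "$@" 2>&1 || printf '(exit status %s)\n' "$?"
}

# Run the diagnostics listed in the message above, in the same order.
diagnose_man() {
    if ! command -v man >/dev/null 2>&1; then
        echo 'man: not found in PATH'
        return 1
    fi
    run_check which man
    run_check man -w
    run_check man -w man
    run_check man -d man
    run_check man /mingw/man/man1/man.1   # path as installed with --prefix=/mingw
    run_check cat /etc/man.conf           # location set by --sysconfdir=/etc
}
```

After sourcing the file, `diagnose_man > report.txt 2>&1` would collect everything into a single file that can be pasted into a reply.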
From: Wu Y. <ad...@sh...> - 2006-03-23 16:54:46
|
Keith Marshall wrote:
> Note that, in spite of Yongwei's earlier suggestion that -Tascii might be
> a more suitable choice of output encoding than latin1 for most users, I
> haven't found this to be the case -- I use code page 850 and export
> LESSCHARSET=latin1, and find -Tlatin1 gives much better results, using
> `man groff_char' as a yardstick. Therefore, I've left the default
> setting as it is in the standard UNIX package; if you think ASCII will
> work better for you, then add `--with-nroff="nroff -Tascii -mandoc"' and
> `--with-neqn="neqn -Tascii"' to the configure options. (Do note that you
> need to have groff installed *before* you attempt to configure man, so
> you should have the groff_char man page, which includes a table showing
> how the entire available character set will be displayed).

How about people using CP 932/936/950/..., or even people using CP 1252 directly (as is the case with rxvt; does it work with your `man'?)

If the program cannot decide the output according to the environment `smartly', I would still suggest using plain ASCII.

Best regards,
Yongwei
|
From: Keith M. <kei...@us...> - 2006-03-24 00:17:12
|
On Thursday 23 March 2006 4:51 pm, Wu Yongwei wrote:
> Keith Marshall wrote:
> > Note that, in spite of Yongwei's earlier suggestion that -Tascii
> > might be a more suitable choice of output encoding than latin1 for
> > most users, I haven't found this to be the case -- I use code page
> > 850 and export LESSCHARSET=latin1, and find -Tlatin1 gives much
> > better results, using `man groff_char' as a yardstick. Therefore,
> > I've left the default setting as it is in the standard UNIX package;
> > if you think ASCII will work better for you, then add
> > `--with-nroff="nroff -Tascii -mandoc"' and `--with-neqn="neqn
> > -Tascii"' to the configure options. (Do note that you need to have
> > groff installed *before* you attempt to configure man, so you should
> > have the groff_char man page, which includes a table showing how the
> > entire available character set will be displayed).
>
> How about people using CP 932/936/950/..., or even people using
> directly CP 1252 (it is the case of rxvt; does it work with your `man')?

I've no idea. I assume that, in such cases, -Tascii *will* be the better choice; in some cases, -Tutf8 may also be worth considering. (OTOH, the -Tnippon, which appears in the JNROFF entry in man.conf, should be avoided; it is not supported by any standard groff distribution, such as groff-1.19.2-mingwPORT, and, AIUI, only works with Debian Linux pre-1.18 groff packages; I believe even recent Debian distributions no longer support it, and it is likely that man too will forget about it in the near future).

> If the program cannot decide the output according to the environment
> `smartly', I would still suggest using plain ASCII.

Yongwei, I didn't mean to imply any criticism of your recommendation to adopt -Tascii; there are certainly many people for whom it will be the right choice.

But equally, there are also many for whom -Tlatin1 may be a better choice; (I would suggest that this is likely to be true for the majority of the population of Western Europe, the entire American continent, Africa and Australasia, who *can* use cp850 or similar; even if their default would be cp437, cp850 is certainly a viable alternative).

My intent with the above statement was to point out that I have opted to retain the default configuration which Federico uses in the official man distribution, and to indicate the configure options which may be used to override that default, for those users who know in advance that -Tascii will be the better choice for them.

While this MinGW snapshot for man is currently a fork of Federico's official distribution, we two have engaged in some dialogue prior to my posting of it. I believe that we both hope to one day merge the MinGW/Win32 support back into the official distribution; to facilitate that, I want to keep the same defaults as he has established.

For those who wish to experiment, the --with-nroff and --with-eqn options provide a mechanism for specifying an alternative initial configuration. It is worth mentioning that this configuration can be changed later, without rebuilding, simply by editing the installed man.conf file -- look for the NROFF and EQN configuration records, and substitute any alternative options.

Another option worthy of comment here may be the --with-pager option to configure, or the PAGER record in man.conf. By default, this is set to `--with-pager="less -is"', (with a fully qualified path name in man.conf). Using `--with-pager="less -irs"' as an alternative seems to produce the same effect, with cp850, as exporting `LESSCHARSET=latin1', obviating the need to set the environment variable.

And furthermore, for the experimenters, I would suggest, as you did before, that `man groff_char' provides a good yardstick for assessment of alternative configuration options.

Regards,
Keith.
|
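Changing the output device after installation, as described in the message above, comes down to editing a couple of records in man.conf. The sketch below is illustrative only: `set_device` is a hypothetical helper, and it assumes each record begins with its keyword in column one and that the device currently reads `-Tlatin1`. The sample file contents are an assumed layout, not copied from a real installation. Since the thread mentions both an EQN record and a `--with-neqn` option, the helper covers NROFF, EQN and NEQN.

```sh
#!/bin/sh
# Sketch only: set_device is a hypothetical helper, not part of man.
# Assumes records start with their keyword in column one and that the
# current device is spelled -Tlatin1.
set_device() {
    conf=$1
    device=$2
    # Rewrite only the formatter records; everything else is copied as-is.
    sed -e "/^NROFF[[:space:]]/s/-Tlatin1/-T$device/" \
        -e "/^EQN[[:space:]]/s/-Tlatin1/-T$device/" \
        -e "/^NEQN[[:space:]]/s/-Tlatin1/-T$device/" \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}

# Demonstration on a throwaway file; paths and layout are illustrative.
demo=${TMPDIR:-/tmp}/man.conf.demo.$$
printf '%s\n' \
    'NROFF /mingw/bin/nroff -Tlatin1 -mandoc' \
    'NEQN /mingw/bin/neqn -Tlatin1' \
    'PAGER /mingw/bin/less -is' > "$demo"
set_device "$demo" ascii
cat "$demo"
rm -f "$demo"
```

Matching on the literal `-Tlatin1` rather than any `-T` option is deliberate: a record that already names a different device is left alone.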
From: Keith M. <kei...@us...> - 2006-03-25 00:36:53
|
On Thursday 23 March 2006 5:27 am, Chris Sutcliffe wrote:
> I managed to cleanly compile the sources. I created a 'build'
> subdirectory and executed:
>
> ../configure --prefix=/mingw --sysconfdir=/etc
>
> However, after doing a make / make install, I attempted to do a 'man
> man' and got 'No manual entry for man'. I also tried 'man -M
> /mingw/man man' with the same result.

After some investigation, and a brief dialogue with Chris, we've identified that this failure occurred because he overlooked this note, in my initial posting:

> ... Do note that you need to have groff installed *before* you attempt
> to configure man ...

The reason that this produced a man which appeared to work, but was unable to find an installed man page, is explained as follows:--

- Man is capable of operating in two distinct modes: its most common mode reads *unformatted* man page sources, stored as troff source files, and formats them "just in time" for display; the alternative mode requires preformatted man pages, which are simply displayed "as is".

- To format the raw troff sources, either for display or for the purpose of creating a preformatted page file, man invokes `nroff'; this isn't provided with the man package; it is provided by installing groff.

- When searching for pages to display, man "walks" the MANPATH, much like the shell "walks" the PATH to find executables. For each MANDIR component of MANPATH, it first looks for a preformatted page file, in the directory `$MANDIR/catN', (where `N' represents the MANSECT number); only if the preformatted page is *not* found, does it then look for the unformatted page source, in `$MANDIR/manN'.

- By configuring man, with no groff/nroff command installed, Chris generated a configuration in which the search for unformatted man pages was disabled, but the search for preformatted pages was still available.

- The man package provides only unformatted man pages, which in Chris' configuration, were installed into /mingw/man/man[158]; the corresponding /mingw/man/cat[158] directories, where the preformatted pages would be stored, were neither created, nor populated.

- When Chris invoked the `man man' command, it looked only for /mingw/man/cat*/man.*; because nothing matched, and because the search for unformatted man pages was disabled, causing it to ignore the /mingw/man/man1/man.1 file which *did* exist, it gave up and reported "No manual entry for man".

BTW, while helping Chris to identify his problem, I've noticed that I have omitted the canonicalisation of `prefix' from the configure script. This results in the `apropos' and `whatis' paths being built into the man executable in MSYS-POSIX format. This will still work, provided the MSYS shell can be found in the PATH when man is invoked; to make it portable to other (UNIXy) shells, configuring with

  ./configure --prefix=`cd /mingw; pwd -W` --sysconfdir=/etc

is required; (the `sysconfdir' *is* canonicalised, as are all other paths resolved during configuration).

Regards,
Keith.
|
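The lookup order described in the list above can be imitated in a few lines of shell. This is a sketch of the behaviour as the message explains it, not a transcription of man's source: `find_page` is a hypothetical function, MANPATH is assumed to be colon-separated in POSIX style, and the third argument stands in for "nroff was found at configure time".

```sh
#!/bin/sh
# Sketch only: find_page NAME SECTION HAVE_NROFF (hypothetical helper).
# For each MANPATH component, try the preformatted page in catN first;
# fall back to the unformatted source in manN only if HAVE_NROFF is yes.
find_page() {
    name=$1
    sect=$2
    have_nroff=$3
    old_ifs=$IFS
    IFS=:
    for mandir in $MANPATH; do
        IFS=$old_ifs
        for f in "$mandir/cat$sect/$name".*; do
            [ -f "$f" ] && { echo "$f"; return 0; }
        done
        if [ "$have_nroff" = yes ]; then
            for f in "$mandir/man$sect/$name".*; do
                [ -f "$f" ] && { echo "$f"; return 0; }
            done
        fi
    done
    IFS=$old_ifs
    echo "No manual entry for $name" >&2
    return 1
}

# Demonstration: a tree holding only the unformatted man1/man.1.
demo=${TMPDIR:-/tmp}/mandemo.$$
mkdir -p "$demo/man1" && : > "$demo/man1/man.1"
MANPATH=$demo
find_page man 1 yes                    # source page is found
find_page man 1 no || echo 'lookup failed: no cat1 page, source search disabled'
rm -rf "$demo"
```

The second call reproduces the reported symptom: the source page exists, but with the unformatted search switched off and no `cat1` directory, nothing matches.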
From: Chris S. <ir0...@gm...> - 2006-03-26 01:46:32
|
Hey All,

> BTW, while helping Chris to identify his problem, I've noticed that I
> have omitted the canonicalisation of `prefix', from the configure script.
> This results in the `apropos' and `whatis' paths being built into the man
> executable in MSYS-POSIX format. This will still work, provided the MSYS
> shell can be found in the PATH, when man is invoked; to make it portable
> to other (UNIXy) shells, configuring with
>
> ./configure --prefix=`cd /mingw; pwd -W` --sysconfdir=/etc
>
> is required; (the `sysconfdir' *is* canonicalised, as are all other paths
> resolved during configuration).

It works like a charm now. With Keith's help man is now up and running perfectly on my system.

Thanx Keith!

Chris

--
Chris Sutcliffe
http://ir0nh34d.blogspot.com
http://emergedesktop.org
|
From: Keith M. <kei...@us...> - 2006-03-25 01:07:30
|
On Thursday 23 March 2006 4:51 pm, Wu Yongwei wrote:
[re: choice between ASCII and Latin1 encoding for man output]
> How about people using CP 932/936/950/..., or even people using
> directly CP 1252 (it is the case of rxvt; does it work with your `man')?
>
> If the program cannot decide the output according to the environment
> `smartly', I would still suggest using plain ASCII.

In my ~/.profile, I have:

  cmd.exe //c chcp 850
  export LESSCHARSET=latin1

With this set up, `setlocale( LC_CTYPE, NULL )', (in a C program -- it should really be LC_MESSAGES, except that Windoze doesn't support it), returns

  English_United Kingdom.1252

and Latin1 is, IMO, a *much* better choice than ASCII. If I change the configuration, such that I use `chcp 1252', the groff_char output remains indistinguishable from the chcp 850 case; Latin1 is still the better choice. However, with `chcp 437', ASCII definitely becomes the better choice, but it's inferior to cp850/1252 with Latin1. I haven't tried the effect of other code page selections.

It's very much a case of "horses for courses" here; each user must exercise his/her own judgement on what works best with the particular code page in use. I don't see any benefit in changing the package default, when the user has complete freedom to choose an alternative in any case, and at any time.

Regards,
Keith.
|
From: Aaron W. L. <aar...@aa...> - 2006-03-25 01:37:28
|
Would it be possible for man to have a setting to select the groff output device based on the Win32 system locale? |
From: Keith M. <kei...@us...> - 2006-03-25 20:22:24
|
On Saturday 25 March 2006 1:37 am, Aaron W. LaFramboise wrote:
> Would it be possible for man to have a setting to select the groff
> output device based on the Win32 system locale?

Of course it should be possible; in software, virtually anything is achievable, where there is a will to implement it :-)

I'm not sure, however, if it is *desirable*. At present, the selection is made by an appropriate setting in a configuration file. The user is free to change this at any time, to suit his/her own preference. If we attempt to heuristically set it, based on our view of the locale, then we deny the user the freedom to make the choice; we *impose* a choice with which he/she may not be happy, (and to quote Bruno Haible, writing on the groff list recently, any heuristic algorithm is effectively "broken by design").

IMO, the present implementation is perfectly satisfactory. The configure script defaults to Latin1, but provides options to allow the user to select an alternative. This controls the *initial* configuration, which is written to the man.conf file. If the user finds that an alternative choice may be better, it is a simple matter to edit man.conf, and to experiment until a more suitable alternative is identified.

Regards,
Keith.
|
From: Aaron W. L. <aar...@aa...> - 2006-03-26 02:38:48
|
Keith Marshall wrote:
> IMO, the present implementation is perfectly satisfactory. The configure
> script defaults to Latin1, but provides options to allow the user to
> select an alternative.

You're right, of course. However, I don't think the benefit of having configuration that works out of the box, without prompting or adjustment, can be overstated. Imagine the beginner's pain if every utility in a Linux distro or Cygwin install asked for just one configuration detail before it would operate properly. Also, I have a considerable amount of empathy for international users who continually have to adjust each and every bit of software they use to get it to operate properly.

I was thinking that, through the Windows API, we should know--with relative confidence--exactly what encoding is being used, and if we know all output devices groff supports, we should be able to do the right thing for every case that groff (or Windows) supports. Rather than subverting the usual -Txxx setting, I was thinking of an additional option that could be set /instead of/ that to cause automatic deduction, which would still allow adjustment the 'old way.'

Anyway, it was just a thought. Obviously the onus is on me to write the code.

Is there any particular reason that /man/ does not seem to attempt to use the value of the LANG environment variable to determine encoding, despite the fact it uses it to determine which language pages to use?
|
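The automatic deduction proposed above would amount to a lookup from code page to groff output device. The sketch below is only an illustration: `device_for_codepage` is a hypothetical function, and its table records nothing more than the observations reported earlier in this thread (Latin1 preferred with code pages 850 and 1252, ASCII with 437, ASCII assumed for the others). Treating code page 65001 (Windows' UTF-8 code page) as a candidate for `-Tutf8` is an added assumption, prompted by the remark that `-Tutf8' may be worth considering.

```sh
#!/bin/sh
# Sketch only: device_for_codepage is a hypothetical helper. The table
# reflects observations reported in this thread, not tested behaviour
# for every code page.
device_for_codepage() {
    case $1 in
        850|1252) echo latin1 ;;   # reported to give the best groff_char output
        65001)    echo utf8 ;;     # assumption: UTF-8 console
        *)        echo ascii ;;    # 437 and anything untested: the safe fallback
    esac
}

for cp in 850 1252 437 936 65001; do
    printf 'code page %s -> -T%s\n' "$cp" "$(device_for_codepage "$cp")"
done
```

Keeping ASCII as the fallback means an unrecognised code page degrades to plain output rather than to mis-rendered characters, and an explicit `-Txxx` setting in man.conf could still take precedence over the lookup.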
From: Wu Y. <ad...@sh...> - 2006-03-26 12:42:12
|
Keith Marshall wrote:
> On Thursday 23 March 2006 4:51 pm, Wu Yongwei wrote:
> [re: choice between ASCII and Latin1 encoding for man output]
>
>>How about people using CP 932/936/950/..., or even people using
>>directly CP 1252 (it is the case of rxvt; does it work with your `man')?
>>
>>If the program cannot decide the output according to the environment
>>`smartly', I would still suggest using plain ASCII.
>
>
> In my ~/.profile, I have:
>
> cmd.exe //c chcp 850
> export LESSCHARSET=latin1
>
> With this set up, `setlocale( LC_CTYPE, NULL )', (in a C program -- it
> should really be LC_MESSAGES, except that Windoze doesn't support it),
> returns
>
> English_United Kingdom.1252

The wording is a little misleading. `setlocale( LC_CTYPE, NULL )' will always return the same thing regardless of your CHCP result (it is only changed when you change your Regional Setting in the Control Panel).

>
> and Latin1 is, IMO, a *much* better choice than ASCII. If I change the
> configuration, such that I use `chcp 1252', the groff_char output remains
> indistinguishable from the chcp 850 case; Latin1 is still the better
> choice. However, with `chcp 437', ASCII definitely becomes the better
> choice, but it's inferior to cp850/1252 with Latin1. I haven't tried the
> effect of other code page selections.

I am still a little confused: what magic has groff done here? I mean, the `ø' as in `Rømer' is 0x9B in CP850, but it is 0xF8 in Latin1.--How can it be correct simultaneously?

(Can't I have a CP850 console and a Latin1 rxvt run man simultaneously?)

[snipped]

Best regards,
Yongwei
|
From: Keith M. <kei...@us...> - 2006-03-26 13:12:17
|
On Sunday 26 March 2006 1:38 pm, Wu Yongwei wrote:
> > In my ~/.profile, I have:
> >
> > cmd.exe //c chcp 850
> > export LESSCHARSET=latin1
> >
> > With this set up, `setlocale( LC_CTYPE, NULL )', (in a C program --
> > it should really be LC_MESSAGES, except that Windoze doesn't support
> > it), returns
> >
> > English_United Kingdom.1252
>
> The wording is a little misleading. `setlocale( LC_CTYPE, NULL )' will
> always return the same thing regardless of your CHCP result (it is only
> changed when you change your Regional Setting in the Control Panel).

This seems odd, and I certainly wasn't aware of it; the implication is that MS `setlocale()' is broken, and rather worthless. Nonetheless, even after an explicit `cmd.exe //c chcp 1252', `-Tlatin1' still yields identical output from `man groff_char', as when cp850 was invoked, and this is superior to the output from `-Tascii'. OTOH, after an explicit `cmd.exe //c chcp 437', the `-Tascii' output is preferable, although of inferior quality to the previous settings.

> > and Latin1 is, IMO, a *much* better choice than ASCII. If I change
> > the configuration, such that I use `chcp 1252', the groff_char output
> > remains indistinguishable from the chcp 850 case; Latin1 is still the
> > better choice. However, with `chcp 437', ASCII definitely becomes
> > the better choice, but it's inferior to cp850/1252 with Latin1. I
> > haven't tried the effect of other code page selections.
>
> I am still a little confused: what magic groff has done here? I mean,
> the `ø' as in `Rømer' is 0x9B in CP850, but it is 0xF8 in Latin1.--How
> can it be correct simultaneously?

I don't know, but I suspect that the magic occurs in `less' rather than in groff. But, having zero experience of internationalisation issues, I don't consider myself qualified to give a definitive answer.

> (Can't I have a CP850 console and a Latin1 rxvt run man simultaneously?)

Again, I simply don't know, and since I *always* run MSYS shell in a Win32 console, I haven't evaluated the behaviour of `man' under rxvt. If I can find a few spare minutes at work -- I run GNU/Linux exclusively on my own box -- I'll take a look.

Regards,
Keith.
|
From: Wu Y. <ad...@sh...> - 2006-03-26 14:04:20
|
Keith Marshall wrote:
>>I am still a little confused: what magic groff has done here? I mean,
>>the `ø' as in `Rømer' is 0x9B in CP850, but it is 0xF8 in Latin1.--How
>>can it be correct simultaneously?
>
> I don't know, but I suspect that the magic rather occurs in `less', than
> in groff. But, having zero experience of internationalisation issues, I
> don't consider myself qualified to give a definitive answer.

Puzzled. I checked and found out that LESSCHARSET is only used to classify characters as normal, control, or binary. No conversion is mentioned anywhere in the man page of less. And I verified that with LESSCHARSET set to latin1, my version of less cannot display a Latin1 file correctly on a CP850 console. Can you do it with your less? If not: can you suggest any common tools (like less/ls/cp) whose man pages include non-ASCII characters? I want to check.

Best regards,
Yongwei
|
From: Wu Y. <ad...@sh...> - 2006-03-26 14:12:44
|
Keith Marshall wrote:
> On Sunday 26 March 2006 1:38 pm, Wu Yongwei wrote:
>
>>> In my ~/.profile, I have:
>>>
>>> cmd.exe //c chcp 850
>>> export LESSCHARSET=latin1
>>>
>>> With this set up, `setlocale( LC_CTYPE, NULL )', (in a C program --
>>>it should really be LC_MESSAGES, except that Windoze doesn't support
>>>it), returns
>>>
>>> English_United Kingdom.1252
>>
>>The wording is a little misleading. `setlocale( LC_CTYPE, NULL )' will
>>always return the same thing regardless of your CHCP result (it is only
>>changed when you change your Regional Setting in the Control Panel).
>
> This seems odd, and I certainly wasn't aware of it; the implication is
> that MS `setlocale()' is broken, and rather worthless.

In the Microsoft world, Console is something very special, and needs some special APIs to support. AFAIK, setlocale is only useful for GUI applications, and for CUI applications you need APIs as shown here:

  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/console_functions.asp

For our current discussion the most relevant is GetConsoleOutputCP.

Best regards,
Yongwei
|
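From a shell, without calling the console API directly, the active code page can be read from the output of the `chcp` command already used in this thread. A minimal sketch, assuming the report ends with the code page number, as in the English form `Active code page: 850`; localized wording may differ, so only the trailing digits are extracted. `codepage_from_chcp` is a hypothetical name.

```sh
#!/bin/sh
# Sketch only: codepage_from_chcp is a hypothetical helper.
# Extract the last run of digits from a line of chcp output, ignoring
# any trailing punctuation or carriage return.
codepage_from_chcp() {
    printf '%s\n' "$1" |
        sed -n 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1/p'
}

codepage_from_chcp 'Active code page: 850'
```

In an MSYS shell the argument would come from `cmd.exe //c chcp`, for example `codepage_from_chcp "$(cmd.exe //c chcp)"`; whether that value always agrees with what GetConsoleOutputCP returns has not been verified here.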
From: Wu Y. <ad...@sh...> - 2006-03-26 13:14:47
|
Keith Marshall wrote:
> On Saturday 25 March 2006 1:37 am, Aaron W. LaFramboise wrote:
>
>>Would it be possible for man to have a setting to select the groff
>>output device based on the Win32 system locale?
>
>
> Of course it should be possible; in software, virtually anything is
> achievable, where there is a will to implement it :-)
>
> I'm not sure, however, if it is *desirable*. At present, the selection
> is made by an appropriate setting in a configuration file. The user is
> free to change this at any time, to suit his/her own preference. If we

How many users of `man' have ever changed their configuration file? How should they know which bit of it to change? Do we really require users to spend their time on `man man'?

> attempt to heuristically set it, based on our view of the locale, then we
> deny the user the freedom to make the choice; we *impose* a choice with
> which he/she may not be happy, (and to quote Bruno Haible, writing on the
> groff list recently, any heuristic algorithm is effectively "broken by
> design").

As I have recently been studying ease-of-use issues (required by my job), I cannot help casting a really doubtful eye at the value of this `freedom'. Who really needs it? People simply need something that works, not to study the guts of their tools. I vaguely remember somebody saying something to this effect: the abundance of options shows the inability of the software creator to understand its target users.--Sure, this is against the spirit of Unix; but Unix is not meant to be, nor is, easy to use; though as a programmer I appreciate it very much. Oh, did I forget to mention that studies show that experienced developers are significantly different from most other people in that they are much better at systematic thinking, which about 75% of the whole population are not good at (so grannies never learn to fix computer problems on their PCs)?

On the technical side, the suggestion Aaron made was really not heuristic. It is purely deterministic and not broken at all. And he did not intend to force it: he said `have a setting', and I would like to add that it should be the default setting.

>
> IMO, the present implementation is perfectly satisfactory. The configure
> script defaults to Latin1, but provides options to allow the user to
> select an alternative. This controls the *initial* configuration, which
> is written to the man.conf file. If the user finds that an alternative
> choice may be better, it is a simple matter to edit man.conf, and to
> experiment until a more suitable alternative is identified.

Who needs `man' most? I would suppose the new users. Do you expect them to build their tools (chicken or egg first)? How many people have the default console code page set to 850? Do you think it is good to make a choice that does not work out of the box for most users, most of whom lack the ability/interest to fix a tool they simply want to use to view some documentation?

I would make the opposite choice to yours: make ASCII (the safer one) the default. People that have higher needs (like you) generally are able to fix the problem themselves.

Sorry that I am more and more thinking like a commercial software vendor.

Best regards,
Yongwei
|
From: Earnie B. <ea...@us...> - 2006-03-26 21:04:28
|
Quoting Wu Yongwei <ad...@sh...>:
>
> As recently I have been studying the ease-of-use issues (required by
> my job), I cannot help casting a really doubtful eye at the value of
> this `freedom'. Who really needs it?

A good case study for this would be the configurability of Cygwin vs MSYS. One of the hacks I did with MSYS was to choose a set of standards to base the way MSYS should behave for everyone. Take a look at the Cygwin users list at the requests for help in configuring Cygwin to one's liking vs those same questions on the MinGW and MSYS lists.

Often configuration options happen because one set of people like the look and feel of a product this way instead of that way. With open source, developers are encouraged to provide patches which can be configured on or off, with a set standard being the default.

As for which character set to use for man; I'm inclined to agree with Keith that the chosen standard ("old way") that is common for any configuration of man should be the standard we present for this version of man.

Earnie Boyd
http://shop.siebunlimited.com
|