#1344 Doctools nroff/groff output not supported by tcltk-man2html

closed-fixed
doctools (48)
9
2013-01-29
2013-01-18
Twylite
No

This is related to Tcl bug 3600058 "Doctools nroff/groff output not supported by tcltk-man2html" (https://sourceforge.net/tracker/index.php?func=detail&aid=3600058&group_id=10894&atid=110894). Most of the cases listed here stem from invalid input or from doctools generating invalid nroff. Valid nroff that the Tcl tool does not support is being addressed in the Tcl bug.

Required action: most of the issues below have proposed fixes, and those fixes have been implemented (or are being implemented) in a patch. What I mostly need is a YES/NO on each proposed fix (and, for a NO, a suggested alternative approach).

Issue #1: Rendering of Copyright is inappropriate for files in the public domain

c_get_copyright in doctools/mpformats/_common.tcl currently renders public domain as "Copyright (c) Public domain". Suggested fix is to handle "public domain" as a special case that doesn't get a "Copyright (c)" prefix.
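A minimal sketch of the proposed special case, written in Python rather than the Tcl of _common.tcl. The function name render_copyright is hypothetical, and matching only the exact phrase "public domain" (case-insensitively) is an assumption about how such statements are written:

```python
def render_copyright(statement: str) -> str:
    """Render one copyright statement for the manpage footer.

    Hypothetical stand-in for c_get_copyright: a public-domain
    statement is emitted as-is instead of being prefixed.
    """
    text = statement.strip()
    if text.lower() == "public domain":
        # No "Copyright (c)" prefix: nothing is being claimed.
        return text
    return "Copyright (c) " + text
```

For example, render_copyright("Public domain") yields "Public domain", while render_copyright("2013 Twylite") yields "Copyright (c) 2013 Twylite".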

Issue #2: Some Copyright statements include the word "copyright"

Invalid input problem; examples are amazon-s3/S3.man and amazon-s3/xsxp.man. Option A is to strip a leading "copyright" and/or "(c)" from the copyright parameter. Option B is to fix the .man pages, but I'm hesitant to do so because (i) this should probably require the copyright holder's approval; and (ii) if any authors maintain their own upstream repositories, the change should be committed there. The proposed fix is A (as a workaround), plus asking the authors to approve or make the change in the source.
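Option A could look roughly like the following Python sketch (the regex and the name strip_copyright_prefix are illustrative assumptions, not the code in the patch). It removes any run of leading "Copyright" words and "(c)" markers so that the formatter's own "Copyright (c)" prefix is not doubled:

```python
import re

# Any run of leading "Copyright" words and "(c)" markers, matched
# case-insensitively, plus the whitespace after them. The \b keeps
# words such as "Copyrighted" from being stripped.
_PREFIX = re.compile(r"^(?:\s*(?:copyright\b|\(c\)))+\s*", re.IGNORECASE)


def strip_copyright_prefix(statement: str) -> str:
    """Option A workaround: drop redundant 'Copyright' / '(c)' prefixes."""
    return _PREFIX.sub("", statement, count=1)
```

With a hypothetical input, strip_copyright_prefix("Copyright (c) 2013 Jane Doe") returns "2013 Jane Doe"; statements without such a prefix pass through unchanged.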

Issue #3: Doctools can generate nroff output where non-directive lines start with a period (invalid nroff)

See for example snit/snitfaq.man, where [example] blocks may contain widget names like '.text'. Suggested fix is that the nroff formatter (mpformats/_nroff.tcl) should quote leading "." on non-directive lines.
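A sketch of the quoting, in Python for illustration. It assumes the conventional roff escape: prefixing a text line with the zero-width character \& stops a leading "." (or "'", the no-break control character) from being read as a request. The ticket does not say which escape _nroff.tcl actually uses, and the function names are hypothetical:

```python
def escape_text_line(line: str) -> str:
    r"""Make one non-directive line safe for nroff.

    A line starting with "." or "'" is parsed as a request, so the
    zero-width escape \& is prepended to force it to be plain text.
    """
    if line.startswith((".", "'")):
        return "\\&" + line
    return line


def escape_example(block: str) -> str:
    """Apply escape_text_line to every line of an [example] body."""
    return "\n".join(escape_text_line(line) for line in block.split("\n"))
```

Only lines the formatter emits as text should go through this; its own directive lines (.TP, .SH and so on) must be left alone.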

Issue #4: Doctools can still generate nroff output where non-directive lines start with a period

Parts of the nroff output are explicitly handled by mpformats/fmt.nroff, in particular the output of keywords. If a keyword starts with a period (e.g. .dtx in docstrip/docstrip.man), it is sorted to the front of the keyword list and results in invalid nroff output. Suggested fix is to explicitly check for and quote a leading period in the keyword line (in fmt.nroff).
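The following Python sketch shows why the problem appears and how the check could work. It assumes the keywords are sorted and joined with ", " (the separator is an assumption, as is the name keyword_line). Because "." (0x2E) precedes digits and letters in ASCII, a keyword such as ".dtx" becomes the first thing on the line:

```python
def keyword_line(keywords: list[str]) -> str:
    r"""Build the keyword line, quoting a leading period.

    Sorting puts ".dtx" first, so the emitted line itself starts with
    "." and would be read as an nroff request; prefix the zero-width
    escape \& in that case (and likewise for "'").
    """
    line = ", ".join(sorted(keywords))
    if line.startswith((".", "'")):
        line = "\\&" + line
    return line
```

For example, keyword_line(["docstrip", ".dtx", "literate programming"]) produces the line \&.dtx, docstrip, literate programming.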

Issue #5: Bad output from ldap/ldap.man

This turns out to be a bad input problem - the .man file has "[call ::ldap::searchInit]" in the middle of a paragraph, instead of "[cmd ::ldap::searchInit]".

Issue #6: Bad output from pop3/pop3.man

This turns out to be a bad input problem - the .man file has a trailing period on the line "[opt_def -retr-mode ...].".

Issue #7: Spaces in man page names

struct/graph1.man, struct/matrix1.man, and struct/struct_tree1.man all have [manpage_begin] statements in which the name contains a space: {struct::graph v1} {struct::matrix v1} and {struct::tree v1} respectively. Proposed fix is to use underscore (_) instead of space, giving struct::graph_v1, struct::matrix_v1, and struct::tree_v1.
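The rename, and a check that would catch new cases, can be sketched as follows (Python for illustration; both function names are hypothetical):

```python
import re


def manpage_name_ok(name: str) -> bool:
    """True if a [manpage_begin] name contains no whitespace."""
    return re.search(r"\s", name) is None


def normalize_manpage_name(name: str) -> str:
    """Proposed fix: join whitespace-separated parts with "_"."""
    return "_".join(name.split())
```

Applied to the three names above, this gives struct::graph_v1, struct::matrix_v1 and struct::tree_v1, each of which passes manpage_name_ok.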

Issue #8: Inappropriate copyright statements

A number of files contain copyright statements in formats that may not have appropriate legal effect.

No date:
expander: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) William H. Duquette, http://www.wjduquette.com/expand
smtpd: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) Pat Thoyts <patthoyts@users.sourceforge.net>
Invalid date:
namespacex: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) 200? Neil Madden (http://wiki.tcl.tk/12790)
namespacex: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) 200? Various (http://wiki.tcl.tk/1489)
soundex: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) ????, Algorithm: Donald E. Knuth
Weird date:
OpenTcp: CATEGORY: merge-copyrights: unrecognised date format: Copyright (c) 1996-7 Sun Microsystems, Inc.

I'm really not sure how to handle these. The best option may be to ask the authors to fix their copyright statements. In the meantime, these authors won't (necessarily) be reflected in the summary of copyright holders that is added to the table of contents.
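One possible way to flag these automatically is a small set of regexes, sketched here in Python. They operate on the rendered "Copyright (c) ..." lines quoted in the messages above. The categories mirror the groups listed (no date, invalid date, weird date), but the exact rules, such as requiring four-digit years, are assumptions; this is not the merge-copyrights logic in tcltk-man2html:

```python
import re

# Hypothetical classifier; the rules below are illustrative assumptions.
PREFIX = r"^Copyright \(c\) "
YEAR = r"\d{4}"
RANGE = rf"{YEAR}(?:\s*-\s*{YEAR})?"  # 2003 or 2003-2005

# One or more years/ranges separated by commas, then a holder.
GOOD = re.compile(rf"{PREFIX}{RANGE}(?:\s*,\s*{RANGE})*,?\s+\S")
# A year containing a "?" placeholder, e.g. "200?" or "????".
PLACEHOLDER = re.compile(rf"{PREFIX}[\d?]*\?")
# An abbreviated range such as "1996-7".
ABBREVIATED = re.compile(rf"{PREFIX}{YEAR}\s*-\s*\d{{1,3}}\b")
# No digit anywhere after the prefix.
NO_DATE = re.compile(rf"{PREFIX}\D*$")


def classify_copyright(line: str) -> str:
    """Return 'ok', 'invalid date', 'weird date', 'no date' or 'unrecognised'."""
    if GOOD.match(line):
        return "ok"
    if PLACEHOLDER.match(line):
        return "invalid date"
    if ABBREVIATED.match(line):
        return "weird date"
    if NO_DATE.match(line):
        return "no date"
    return "unrecognised"
```

On the lines above, the Duquette and Thoyts statements come out as "no date", the namespacex and soundex ones as "invalid date", and the Sun Microsystems one as "weird date".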

Issue #9: Multiple definitions for coroutine, doctools::idx, interp, tie and try

Tcllib packages define man pages with the same names as the Tcl core commands coroutine, interp and try. Proposed fix is to add a "tcllib" prefix to the Tcllib pages and fix up references.

Within Tcllib there are multiple man pages providing definitions of doctools::idx and tie. Proposed fix is to add a version suffix (as for graph/matrix/tree) and fix up references.
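Before renaming, the collisions can be found mechanically. A Python sketch, assuming a mapping from .man file path to [manpage_begin] name is available; the function name and the file paths in the example are hypothetical:

```python
from collections import defaultdict


def find_name_clashes(pages: dict[str, str], core_names: set[str]):
    """Report manpage names that clash with Tcl core pages or each other.

    pages maps a .man file path to its [manpage_begin] name. Returns
    (names shared with the core, {name: [paths]} for names used by
    more than one Tcllib page).
    """
    by_name = defaultdict(list)
    for path, name in pages.items():
        by_name[name].append(path)
    with_core = sorted(name for name in by_name if name in core_names)
    within_tcllib = {
        name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1
    }
    return with_core, within_tcllib
```

With illustrative input such as {"coroutine/coroutine.man": "coroutine", "tie/tie.man": "tie", "tie/tie_std.man": "tie"} and core names {"coroutine", "interp", "try"}, it reports coroutine as a core clash and tie as an internal duplicate.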

Issue #10: Uninvestigated

Once the above issues are resolved there remain three more types of error when converting .man --(doctools)--> .n --(tcltk-man2html)--> .html:

A. struct_list: make-manpage-section: ignoring .TP after .TP
B. treeql: EXAMPLES: output-directive: unrecognized format directive: ... TODO ...
C. calculus.n: reference error: Out of place end-quote: {<PRE>Dy'' + ky = 0
x = 0: y = 1
x = L: y = 0</PRE>}

The causes have not yet been investigated.

Discussion

  • Twylite
    2013-01-18

    Proposed fixes pushed to branch bug-3601370-td.

    In combination with the Tcl tcltk-man2html fixes it looks like the only remaining issues are invalid copyright statements:

    expander: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) William H. Duquette, http://www.wjduquette.com/expand
    namespacex: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) 200? Neil Madden (http://wiki.tcl.tk/12790)
    namespacex: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) 200? Various (http://wiki.tcl.tk/1489)
    smtpd: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) Pat Thoyts <patthoyts@users.sourceforge.net>
    soundex: COPYRIGHT: merge-copyrights: unrecognised format: Copyright (c) ????, Algorithm: Donald E. Knuth

    I think it would be a good idea for the doctools checker to include a check that the manpage name does not contain spaces, and to warn if the copyright statement has an inappropriate format.

     
    • priority: 5 --> 9
     
  • New revision on the branch: [6606f5686a].

    Added a check to manpage_begin that rejects spaces in the title. The message catalogs are extended with a new warning, 'mptitle', for spaces in the manpage title. The French catalog contains the English text and needs a translation.

    For the copyright check I would need a nice regexp, or two.

    Even without that check, and with the French translation of my change still missing, I see no trouble merging this into the tcllib-1-15-rc branch and working on it there.

     
    • status: open --> closed-fixed
     
  • Merged into the release branch now. Bumped version to 1.4.14. Closing. Reopen, or create a new bug, for the missing French translation of the new warning message and/or the regexes to check for issues with copyright lines.