Menu

#120 Rudimentary commit message linting

open
nobody
None
2020-01-15
2019-09-26
Anonymous
No

Originally created by: dscho

To make it easier for new contributors, we should probably have our own "linter" for commit messages, i.e. validate that at least some of the conventions are met:

  • the first line should not exceed 76 characters
  • the first line should be followed by an empty line
  • if the first line starts with a prefix (e.g. tests:), it should continue in lower-case
  • the last line should be a Signed-off-by: line (our DCO check does this already, but it managed to baffle one long-time contributor already because its error message was not clear enough: the Signed-off-by: line was indented)
  • the body of the commit message should not be empty.
  • the body of the commit message should not consist of a hyperlink, without any other explanation.

There is also the convention that the commit message should talk in the imperative tone (as opposed to the past tense, as many other projects prefer). I am not sure that there is an easy way to test for that.

If any of these conventions are not met, GitGitGadget should talk about that, in a way that helps the contributors (e.g. suggest amending the commit, or using an interactive rebase if it is not the latest commit, force-push to update the PR, etc).

Discussion

  • Anonymous

    Anonymous - 2019-10-15

    Originally posted by: webstech

    There are some github apps for commit linting but they do not appear to match these requirements. Not sure if they can be extended to include these requirements but a config file should be able to control what is needed for a generic linter. Just wondering if this issue is better addressed with a github action responding to a pull request or a push to a pull request versus a specific PR comment. There is also the question of whether it would not be a good commit hook for git in general.

     
  • Anonymous

    Anonymous - 2019-10-15

    Originally posted by: dscho

    Just wondering if this issue is better addressed with a github action responding to a pull request or a push to a pull request versus a specific PR comment.

    I would be in favor, but that GitHub Action needs to be in master, which is controlled by the Git maintainer, not by GitGitGadget.

    There is also the question of whether it would not be a good commit hook for git in general.

    Sure. But there is an entire ecosystem of npm packages out there that Unix shell scripts typically cannot access (read: the common Git hooks), but GitGitGadget can.

     
  • Anonymous

    Anonymous - 2020-01-09

    Originally posted by: webstech

    I have an initial POC/WIP at https://github.com/webstech/gitgitgadget/commit/888c4f53429f6f5801c89f6e7319e9f24a380713. The message linting is done in a separate class with methods to perform one or more checks based on complexity. Just want to make sure you agree on the basic design before proceeding further. As always, suggestions are welcome.

     
  • Anonymous

    Anonymous - 2020-01-09

    Originally posted by: dscho

    I have an initial POC/WIP at webstech@888c4f5.

    Very nice! I left a couple of comments there.

    The message linting is done in a separate class with methods to perform one or more checks based on complexity.

    That makes a total lot of sense!

    Just want to make sure you agree on the basic design before proceeding further. As always, suggestions are welcome.

    I love it.

    At some stage, I think we will want to make the tests a bit more DRY and maybe split up ci-helper.tests.ts to allow for more parallel testing (or does Jest already allow parallelizing tests within a single script?), but that can wait for later, it is not pressing yet, I would think.

     
  • Anonymous

    Anonymous - 2020-01-10

    Originally posted by: webstech

    Very nice! I left a couple of comments there.

    Thanks for the input.

    At some stage, I think we will want to make the tests a bit more DRY and maybe split up ci-helper.tests.ts to allow for more parallel testing (or does Jest already allow parallelizing tests within a single script?), but that can wait for later, it is not pressing yet, I would think.

    I agree on splitting up ci-helper.test.ts and reducing the DRY of the pieces. The q&d of getting tests in place makes it easy to end up with bloated test code. I am thinking about this (and possible defaults mentioned in your comments on the WIP). This is post this change.

    I have pushed an update to the WIP. If you have no major issues (or in spite of, to make them more public), I will open a PR after doing some code/doc cleanup.

    Just trying to get the linting framework agreed to and in place. It would help if you can prioritize the initial list above (you are more familiar with the concerns/issues from the git list).

    After that we can all compete to implement new linting asap. \<insert laughing emoji here/>

    One linting request is for imperative - would that mean a negative dictionary for possessive nouns? Sounds like this could be a separate project (if one is not already out there).

     
  • Anonymous

    Anonymous - 2020-01-10

    Originally posted by: dscho

    I have pushed an update to the WIP. If you have no major issues (or in spite of, to make them more public), I will open a PR after doing some code/doc cleanup.

    Could you open the PR right away? That would make it much easier for me to keep track of changes. It does not have to be perfect, of course, if you want to, you can even mark it as a Draft PR.

    After that we can all compete to implement new linting asap. <insert laughing="" emoji="" here=""></insert>

    🤣

    One linting request is for imperative - would that mean a negative dictionary for possessive nouns? Sounds like this could be a separate project (if one is not already out there).

    I think that people already tried to work on this, see e.g. https://github.com/spencermountain/compromise/issues/499. It also looks as if https://www.npmjs.com/package/node-nlp should have something to help with that task, but I did not really find anything.

     
  • Anonymous

    Anonymous - 2020-01-12

    Originally posted by: webstech

    I guess https://github.com/clns/node-commit-msg would be a much more pragmatic route to take ;-)

    Java required but no source for the parser? Didn't really look at what the parser is supposed to do.

     
  • Anonymous

    Anonymous - 2020-01-12

    Originally posted by: dscho

    I guess https://github.com/clns/node-commit-msg would be a much more pragmatic route to take ;-)

    Java required but no source for the parser?

    Oy, sorry, I failed to see that it is not implemented in Javascript :-(

    OTOH I wonder whether parser.jar is simply a pre-compiled version of the Stanford Parser, as mentioned in the npm package description:

    • Detection of non-imperative verbs in subject, eg. "Fixes bug" or "Fixed bug" instead of "Fix bug" (error | configurable) - using the Stanford Parser with a custom trained model for commit messages

    I found a couple other npm modules to validate commit messages, but they seem to follow the "Conventional Commits" guidelines (which disagree with Git's own requirements for commit messages).

    In any case, this is not relevant for the initial commit linting ;-)

     
  • Anonymous

    Anonymous - 2020-01-15

    Originally posted by: dscho

    • Detection of non-imperative verbs in subject, eg. "Fixes bug" or "Fixed bug" instead of "Fix bug" (error | configurable) - using the Stanford Parser with a custom trained model for commit messages

    I found a couple other npm modules to validate commit messages, but they seem to follow the "Conventional Commits" guidelines (which disagree with Git's own requirements for commit messages).

    I experimented with wink-pos-tagger:

    var posTagger = require( 'wink-pos-tagger' );
    
    // Create an instance of the pos tagger.
    var tagger = posTagger();
    
    // Tag the sentence using the tag sentence api.
    console.log(tagger.tagSentence( 'env: have the thing go away. Or else' ));
    

    outputs

    [ { value: 'env',
        tag: 'word',
        normal: 'env',
        pos: 'NNP',
        lemma: 'env' },
      { value: ':', tag: 'punctuation', normal: ':', pos: ':' },
      { value: 'have',
        tag: 'word',
        normal: 'have',
        pos: 'VBP',
        lemma: 'have' },
      { value: 'the', tag: 'word', normal: 'the', pos: 'DT' },
      { value: 'thing',
        tag: 'word',
        normal: 'thing',
        pos: 'NN',
        lemma: 'thing' },
      { value: 'go', tag: 'word', normal: 'go', pos: 'NN', lemma: 'go' },
      { value: 'away', tag: 'word', normal: 'away', pos: 'RB' },
      { value: '.', tag: 'punctuation', normal: '.', pos: '.' },
      { value: 'Or', tag: 'word', normal: 'or', pos: 'CC' },
      { value: 'else', tag: 'word', normal: 'else', pos: 'RB' } ]
    

    Note the tag VBP, which according to https://github.com/trinker/tagger#interpreting-tags means "verb, present tense, not 3rd person singular".

    Replacing the word have with has turns this tag into VBZ which means "verb, present tense, 3rd person singular".

    It might be possible to come up with a set of simple rules what VB* tags we want to allow in the first line of the commit message. My idea was to experiment with this a bit by using Git's own commit history and just generating some statistics next, like, which tags occur how often, then see if e.g. "VBD, VBG and VBN are not okay" would result in any false positives (I am more concerned with suppressing false positives than with preventing false negatives).

     

Log in to post a comment.