gitgitgadget / Tickets / #120 Rudimentary commit message linting

Anonymous - 2019-10-15

Originally posted by: webstech

There are some github apps for commit linting but they do not appear to match these requirements. Not sure if they can be extended to include these requirements but a config file should be able to control what is needed for a generic linter. Just wondering if this issue is better addressed with a github action responding to a pull request or a push to a pull request versus a specific PR comment. There is also the question of whether it would not be a good commit hook for git in general.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2019-10-15

Originally posted by: dscho

Just wondering if this issue is better addressed with a github action responding to a pull request or a push to a pull request versus a specific PR comment.

I would be in favor, but that GitHub Action needs to be in master, which is controlled by the Git maintainer, not by GitGitGadget.

There is also the question of whether it would not be a good commit hook for git in general.

Sure. But there is an entire ecosystem of npm packages out there that Unix shell scripts typically cannot access (read: the common Git hooks), but GitGitGadget can.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-09

Originally posted by: webstech

I have an initial POC/WIP at https://github.com/webstech/gitgitgadget/commit/888c4f53429f6f5801c89f6e7319e9f24a380713. The message linting is done in a separate class with methods to perform one or more checks based on complexity. Just want to make sure you agree on the basic design before proceeding further. As always, suggestions are welcome.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-09

Originally posted by: dscho

I have an initial POC/WIP at webstech@888c4f5.

Very nice! I left a couple of comments there.

The message linting is done in a separate class with methods to perform one or more checks based on complexity.

That makes a total lot of sense!

Just want to make sure you agree on the basic design before proceeding further. As always, suggestions are welcome.

I love it.

At some stage, I think we will want to make the tests a bit more DRY and maybe split up ci-helper.tests.ts to allow for more parallel testing (or does Jest already allow parallelizing tests within a single script?), but that can wait for later, it is not pressing yet, I would think.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-10

Originally posted by: webstech

Very nice! I left a couple of comments there.

Thanks for the input.

At some stage, I think we will want to make the tests a bit more DRY and maybe split up ci-helper.tests.ts to allow for more parallel testing (or does Jest already allow parallelizing tests within a single script?), but that can wait for later, it is not pressing yet, I would think.

I agree on splitting up ci-helper.test.ts and reducing the DRY of the pieces. The q&d of getting tests in place makes it easy to end up with bloated test code. I am thinking about this (and possible defaults mentioned in your comments on the WIP). This is post this change.

I have pushed an update to the WIP. If you have no major issues (or in spite of, to make them more public), I will open a PR after doing some code/doc cleanup.

Just trying to get the linting framework agreed to and in place. It would help if you can prioritize the initial list above (you are more familiar with the concerns/issues from the git list).

After that we can all compete to implement new linting asap. \<insert laughing emoji here/>

One linting request is for imperative - would that mean a negative dictionary for possessive nouns? Sounds like this could be a separate project (if one is not already out there).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-10

Originally posted by: dscho

I have pushed an update to the WIP. If you have no major issues (or in spite of, to make them more public), I will open a PR after doing some code/doc cleanup.

Could you open the PR right away? That would make it much easier for me to keep track of changes. It does not have to be perfect, of course, if you want to, you can even mark it as a Draft PR.

After that we can all compete to implement new linting asap. <insert laughing="" emoji="" here=""></insert>

🤣

One linting request is for imperative - would that mean a negative dictionary for possessive nouns? Sounds like this could be a separate project (if one is not already out there).

I think that people already tried to work on this, see e.g. https://github.com/spencermountain/compromise/issues/499. It also looks as if https://www.npmjs.com/package/node-nlp should have something to help with that task, but I did not really find anything.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-10

Originally posted by: dscho

One linting request is for imperative - would that mean a negative dictionary for possessive nouns? Sounds like this could be a separate project (if one is not already out there).

I think that people already tried to work on this, see e.g. [spencermountain/compromise#499](https://github.com/spencermountain/compromise/issues/499). It also looks as if https://www.npmjs.com/package/node-nlp should have something to help with that task, but I did not really find anything.

I guess https://github.com/clns/node-commit-msg would be a much more pragmatic route to take ;-)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-12

Originally posted by: webstech

I guess https://github.com/clns/node-commit-msg would be a much more pragmatic route to take ;-)

Java required but no source for the parser? Didn't really look at what the parser is supposed to do.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-12

Originally posted by: dscho

I guess https://github.com/clns/node-commit-msg would be a much more pragmatic route to take ;-)

Java required but no source for the parser?

Oy, sorry, I failed to see that it is not implemented in Javascript :-(

OTOH I wonder whether parser.jar is simply a pre-compiled version of the Stanford Parser, as mentioned in the npm package description:

Detection of non-imperative verbs in subject, eg. "Fixes bug" or "Fixed bug" instead of "Fix bug" (error | configurable) - using the Stanford Parser with a custom trained model for commit messages

I found a couple other npm modules to validate commit messages, but they seem to follow the "Conventional Commits" guidelines (which disagree with Git's own requirements for commit messages).

In any case, this is not relevant for the initial commit linting ;-)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2020-01-15

Originally posted by: dscho

Detection of non-imperative verbs in subject, eg. "Fixes bug" or "Fixed bug" instead of "Fix bug" (error | configurable) - using the Stanford Parser with a custom trained model for commit messages

I found a couple other npm modules to validate commit messages, but they seem to follow the "Conventional Commits" guidelines (which disagree with Git's own requirements for commit messages).

I experimented with wink-pos-tagger:

var posTagger = require( 'wink-pos-tagger' ); // Create an instance of the pos tagger. var tagger = posTagger(); // Tag the sentence using the tag sentence api. console.log(tagger.tagSentence( 'env: have the thing go away. Or else' ));

outputs

[ { value: 'env', tag: 'word', normal: 'env', pos: 'NNP', lemma: 'env' }, { value: ':', tag: 'punctuation', normal: ':', pos: ':' }, { value: 'have', tag: 'word', normal: 'have', pos: 'VBP', lemma: 'have' }, { value: 'the', tag: 'word', normal: 'the', pos: 'DT' }, { value: 'thing', tag: 'word', normal: 'thing', pos: 'NN', lemma: 'thing' }, { value: 'go', tag: 'word', normal: 'go', pos: 'NN', lemma: 'go' }, { value: 'away', tag: 'word', normal: 'away', pos: 'RB' }, { value: '.', tag: 'punctuation', normal: '.', pos: '.' }, { value: 'Or', tag: 'word', normal: 'or', pos: 'CC' }, { value: 'else', tag: 'word', normal: 'else', pos: 'RB' } ]

Note the tag VBP, which according to https://github.com/trinker/tagger#interpreting-tags means "verb, present tense, not 3rd person singular".

Replacing the word have with has turns this tag into VBZ which means "verb, present tense, 3rd person singular".

It might be possible to come up with a set of simple rules what VB* tags we want to allow in the first line of the commit message. My idea was to experiment with this a bit by using Git's own commit history and just generating some statistics next, like, which tags occur how often, then see if e.g. "VBD, VBG and VBN are not okay" would result in any false positives (I am more concerned with suppressing false positives than with preventing false negatives).
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Rudimentary commit message linting

Sending GitHub PRs to the Git mailing list

Milestone

Searches

Help

#120 Rudimentary commit message linting

Discussion