Stanford CoreNLP v4.5.10 - Remove Patterns and Lucene, add Semgrex and Ssurgeon features

Remove Patterns

  • Older versions of Lucene have a security issue (https://github.com/advisories/GHSA-g643-xq6w-r67c). Unfortunately, Lucene V9.12 is not compatible with Java 8, so we remove Lucene from this release.
  • The only project in CoreNLP that uses Lucene is the patterns directory. We remove it, perhaps temporarily. If you use the patterns project, please file an issue and we will include it in a future Java 11 compatible release. (We are aware of at least one group that used the project, back in 2020.)

Semgrex and Ssurgeon upgrades

  • A :: uniq operator at the end of a Semgrex expression makes results unique across a set of node values
  • A <> search in Semgrex means connected either as parent or child. This simplifies expressions where the direction of the connection doesn't matter
  • Ssurgeon EditNode now supports -removemorphofeatures to remove one or more features without removing all features
  • Ssurgeon SplitWord now allows for exact word splitting, not just regex based splitting
  • Ssurgeon MergeNodes can now merge multiple nodes at once, not just two
  • Add an Ssurgeon SetPhraseHead operation, which gives a connected phrase in a dependency graph a different head, possibly updating the relations between the children as well. Useful for changing the head of a proper noun phrase, for example
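
To illustrate the two Semgrex additions, here is a minimal Python sketch of what "connected either as parent or child" and "unique across a set of node values" could mean on a toy dependency graph. It does not use the CoreNLP API or Semgrex syntax. The graph, the function names (connected, match_connected, uniq_by), and the choice to keep the first match per key are all assumptions made for illustration; the actual implementation may differ.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy dependency graph for "Mary saw John".
# Each edge is a (governor, dependent) pair of word indices.
WORDS: Dict[int, str] = {1: "Mary", 2: "saw", 3: "John"}
EDGES: Set[Tuple[int, int]] = {(2, 1), (2, 3)}


def connected(a: int, b: int) -> bool:
    """True if a is the parent of b or b is the parent of a (direction ignored)."""
    return (a, b) in EDGES or (b, a) in EDGES


def match_connected(word: str) -> List[Tuple[int, int]]:
    """All (node, neighbor) pairs where node has the given word and the two are connected."""
    return [
        (a, b)
        for a in sorted(WORDS)
        for b in sorted(WORDS)
        if a != b and WORDS[a] == word and connected(a, b)
    ]


def uniq_by(matches: Sequence[Tuple[int, ...]],
            positions: Sequence[int]) -> List[Tuple[int, ...]]:
    """Keep the first match for each distinct tuple of words at the given positions.

    Keeping the first match is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for m in matches:
        key = tuple(WORDS[m[p]] for p in positions)
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept


if __name__ == "__main__":
    # "saw" governs both other words; "Mary" is only a dependent.
    print(match_connected("saw"))
    print(match_connected("Mary"))
    # Deduplicate on the first node's word only.
    print(uniq_by(match_connected("saw"), [0]))
```

In this sketch, a direction-agnostic test replaces two separate parent and child checks, and deduplicating on a subset of the matched nodes collapses matches that differ only in nodes outside that subset.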

Hopefully minor interface changes

  • We move VariableStrings from trees/tregex and semgraph/semgrex into util, since it turns out there were two copies of this code in the codebase. This may break serialized tregex outputs, if such a thing exists.
Source: README.md, updated 2025-06-07