Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2024-07-25 | 9.9 kB | |
Release 0.14.0 source code.tar.gz | 2024-07-25 | 2.2 MB | |
Release 0.14.0 source code.zip | 2024-07-25 | 2.4 MB | |
Totals: 3 Items | | 4.6 MB | 0 |
This release adds major new support for biomedical text analytics! It adds improved biomedical NER and a state-of-the-art model for biomedical entity linking. Other new features include (1) support for parameter-efficient fine-tuning and (2) various new datasets, bug fixes and enhancements! We also removed a few dependencies, so Flair should install faster and take up less space!
Biomedical NER and Entity Linking
With Flair 0.14.0, you can now detect and normalize biomedical entities in text.
For example, to analyze the sentence "We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome", use this code snippet:
:::python
from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.data import Sentence
# A sentence from biomedical literature
sentence = Sentence("We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome.")
# Tag named entities in the text
ner_tagger = Classifier.load("hunflair2")
ner_tagger.predict(sentence)
# Normalize disease names
disease_linker = EntityMentionLinker.load("disease-linker")
disease_linker.predict(sentence)
# Normalize gene names
gene_linker = EntityMentionLinker.load("gene-linker")
gene_linker.predict(sentence)
# Iterate over predicted entities and print
for label in sentence.get_labels():
    print(label)
This should print out:
:::console
Span[5:6]: "IFNAR2" → Gene (1.0)
Span[5:6]: "IFNAR2" → 3455/name=IFNAR2
Span[7:8]: "POLG" → Gene (1.0)
Span[7:8]: "POLG" → 5428/name=POLG
Span[9:11]: "long-COVID syndrome" → Disease (1.0)
Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
The printout shows that:
- "IFNAR2" is a gene. Further, it is recognized as gene 3455 ("interferon alpha and beta receptor subunit 2") in the NCBI database.
- "POLG" is a gene. Further, it is recognized as gene 5428 ("DNA polymerase gamma, catalytic subunit") in the NCBI database.
- "long-COVID syndrome" is a disease. Further, it is uniquely linked to "Post-Acute COVID-19 Syndrome" in the MESH database.
Big thanks to @sg-wbi @WangXII @mariosaenger @helpmefindaname for all their work:
- Entity Mention Linker by @helpmefindaname in https://github.com/flairNLP/flair/pull/3388
- Support for biomedical datasets with multiple entity types by @WangXII in https://github.com/flairNLP/flair/pull/3387
- Update documentation for Hunflair2 release by @mariosaenger in https://github.com/flairNLP/flair/pull/3410
- Improve nel tutorial by @helpmefindaname in https://github.com/flairNLP/flair/pull/3369
- Incorporate hunflair2 docs to docpage by @helpmefindaname in https://github.com/flairNLP/flair/pull/3442
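To work with the linked identifiers programmatically instead of printing them, the linker output can be split into a database identifier and a canonical name. The following is a minimal standalone sketch, not part of Flair: it assumes that linked values follow the "<identifier>/name=<name>" pattern seen in the printout above, and LinkedEntity and parse_linked_value are hypothetical helper names introduced for illustration.

```python
from typing import NamedTuple


class LinkedEntity(NamedTuple):
    identifier: str  # database identifier, e.g. "3455" or "MESH:D000094024"
    name: str        # canonical name reported by the linker


def parse_linked_value(value: str) -> LinkedEntity:
    """Split a value such as '3455/name=IFNAR2' into identifier and name.

    Assumption: the value has the form '<identifier>/name=<name>', as in the
    printout above. Values without that separator raise a ValueError.
    """
    identifier, separator, name = value.partition("/name=")
    if not separator:
        raise ValueError(f"unexpected linked value: {value!r}")
    return LinkedEntity(identifier, name)


# Values copied from the printout above
for value in ["3455/name=IFNAR2", "MESH:D000094024/name=Post-Acute COVID-19 Syndrome"]:
    entity = parse_linked_value(value)
    print(f"{entity.identifier} -> {entity.name}")
```

In a real pipeline the input strings would presumably come from the value of each label returned by sentence.get_labels(); check the actual label format of your Flair version before relying on this pattern.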
Parameter-Efficient Fine-Tuning
Flair 0.14.0 also adds support for parameter-efficient fine-tuning (PEFT) through the peft library.
For instance, to fine-tune a BERT model on the TREC question classification task using LoRA, use the following snippet:
:::python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
# Note: you need to install peft to use this feature!
from peft import LoraConfig, TaskType
# Get corpus and make label dictionary
corpus: Corpus = TREC_6()
label_type = "question_class"
label_dict = corpus.make_label_dictionary(label_type=label_type)
# Define embeddings with LoRA fine-tuning
document_embeddings = TransformerDocumentEmbeddings(
    "bert-base-uncased",
    fine_tune=True,
    # set LoRA config
    peft_config=LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,
        inference_mode=False,
    ),
)
# define model
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)
# train model
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    learning_rate=5.0e-4,
    mini_batch_size=4,
    max_epochs=1,
)
Big thanks to @janpf for this new feature!
- Add PEFT training and explicit kwarg passthrough by @janpf in https://github.com/flairNLP/flair/pull/3480
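LoRA keeps the pretrained weights frozen and trains two small low-rank matrices for each adapted weight matrix, which is why it updates far fewer parameters than full fine-tuning. The back-of-the-envelope sketch below illustrates the saving for a single square weight matrix. The hidden size of 768 (bert-base-uncased) and the rank of 8 are assumptions chosen for illustration, and both helper functions are hypothetical names, not part of Flair or peft.

```python
def full_trainable_parameters(d_out: int, d_in: int) -> int:
    """Parameters updated when a d_out x d_in weight matrix is fully fine-tuned."""
    return d_out * d_in


def lora_trainable_parameters(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


hidden_size = 768  # assumed: hidden size of bert-base-uncased
rank = 8           # assumed LoRA rank, chosen for illustration

full = full_trainable_parameters(hidden_size, hidden_size)
lora = lora_trainable_parameters(hidden_size, hidden_size, rank)
print(f"full fine-tuning: {full} parameters")
print(f"LoRA (rank {rank}): {lora} parameters ({lora / full:.1%} of full)")
```

The actual number of trainable parameters in the snippet above depends on which modules the LoraConfig targets, so treat these figures as an order-of-magnitude guide only.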
Smaller Library
We've removed dependencies such as gensim from the core package, since they increased the size of the Flair library and caused some compatibility and maintenance issues. This means the core package is now smaller and faster to install.
Install as always with:
:::console
pip install flair
Certain features, such as training a model that uses classic word embeddings, still require gensim. For this use case, install with:
:::console
pip install flair[word-embeddings]
Or just install gensim separately.
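Because gensim is now optional, a script that relies on classic word embeddings may want to check for it up front and fail with a helpful message. The sketch below uses only the Python standard library; is_installed is a hypothetical helper name, and the check is an illustration rather than how Flair itself detects the dependency.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("gensim"):
    print("gensim is available: classic word embeddings can be used")
else:
    print("gensim is missing: run 'pip install flair[word-embeddings]' or 'pip install gensim'")
```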
Big thanks to @helpmefindaname for this new feature!
- Make gensim optional by @helpmefindaname in https://github.com/flairNLP/flair/pull/3493
- Update models for v0.14.0 by @alanakbik in https://github.com/flairNLP/flair/pull/3505
- Relax version constraint for konoha by @himkt in https://github.com/flairNLP/flair/pull/3394
- Dependencies maintainance updates by @helpmefindaname in https://github.com/flairNLP/flair/pull/3402
- Make janome optional by @himkt in https://github.com/flairNLP/flair/pull/3405
- Bump min. version of bpemb by @stefan-it in https://github.com/flairNLP/flair/pull/3468
Other Improvements
New Features and Improvements
- Speed up euclidean distance calculation by @sheldon-roberts in https://github.com/flairNLP/flair/pull/3485
- Add DataTriples which act just like DataPairs by @janpf in https://github.com/flairNLP/flair/pull/3481
- Add random seed parameter to dataset splitting and downsampling for better reproducibility by @MattGPT-ai in https://github.com/flairNLP/flair/pull/3475
- Allow cpu device even if gpu available by @drbh in https://github.com/flairNLP/flair/pull/3417
- Add prediction label type for span classifier by @helpmefindaname in https://github.com/flairNLP/flair/pull/3432
- Character embeddings store their embedding name too by @helpmefindaname in https://github.com/flairNLP/flair/pull/3477
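A seed parameter for splitting and downsampling helps reproducibility because the same seed always yields the same partition. The snippet below is a generic standard-library illustration of that idea, not Flair's implementation; split_with_seed and its parameters are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_with_seed(items: Sequence[T], fraction: float, seed: int) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items with a private seeded generator and split it in two."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(10))
first_a, rest_a = split_with_seed(data, fraction=0.8, seed=42)
first_b, rest_b = split_with_seed(data, fraction=0.8, seed=42)
print(first_a == first_b and rest_a == rest_b)  # same seed, same split
```

Using a local random.Random instance rather than the module-level functions keeps the split independent of any other code that draws random numbers.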
Bug Fixes
- TextPairRegressor: Fix data point iteration by @ya0guang in https://github.com/flairNLP/flair/pull/3413
- TextPairRegressor: Fix GPU memory leak by @MattGPT-ai in https://github.com/flairNLP/flair/pull/3490
- TextRegressor: Fix label_name bug by @sheldon-roberts in https://github.com/flairNLP/flair/pull/3491
- SequenceTagger: Fix _all_scores_for_token in ViterbiDecoder by @mauryaland in https://github.com/flairNLP/flair/pull/3455
- SentenceSplitter: Fix linking of sentences by @mariosaenger in https://github.com/flairNLP/flair/pull/3397
- SentenceSplitter: Fix case where split was performed on special characters by @helpmefindaname in https://github.com/flairNLP/flair/pull/3404
- Classifier: Fix loading by moving error message to main load function by @alanakbik in https://github.com/flairNLP/flair/pull/3504
- Trainer: Fix edge case by loading best model at end, even when there is no final evaluation by @helpmefindaname in https://github.com/flairNLP/flair/pull/3470
- TransformerEmbeddings: Fix special tokens by not replacing replace_additional_special_tokens by @helpmefindaname in https://github.com/flairNLP/flair/pull/3451
- Unit tests: Fix double data_folder in unit test by @ya0guang in https://github.com/flairNLP/flair/pull/3412
New Datasets
- Add revision support for all Universal Dependencies datasets by @stefan-it in https://github.com/flairNLP/flair/pull/3420
- NER_ESTONIAN_NOISY: Support for Estonian NER dataset with noise by @teresaloeffelhardt in https://github.com/flairNLP/flair/pull/3463
- MASAKHA_POS: Support for two new languages by @stefan-it in https://github.com/flairNLP/flair/pull/3421
- UD_BAVARIAN_MAIBAAM: Add support for new Bavarian MaiBaam UD by @stefan-it in https://github.com/flairNLP/flair/pull/3426
Documentation
- Minor readme fixes by @stefan-it in https://github.com/flairNLP/flair/pull/3424
- Fix typo transformer-embeddings.md by @abhisheklomsh in https://github.com/flairNLP/flair/pull/3500
- Fix typo in how-model-training-works.md by @abhisheklomsh in https://github.com/flairNLP/flair/pull/3499
Build Management
- Fix black and ruff by @stefan-it in https://github.com/flairNLP/flair/pull/3423
- Remove zappr yaml by @helpmefindaname in https://github.com/flairNLP/flair/pull/3435
- Fix tests package being incorrectly included in builds by @asumagic in https://github.com/flairNLP/flair/pull/3440
New Contributors
- @ya0guang made their first contribution in https://github.com/flairNLP/flair/pull/3413
- @drbh made their first contribution in https://github.com/flairNLP/flair/pull/3417
- @asumagic made their first contribution in https://github.com/flairNLP/flair/pull/3440
- @MattGPT-ai made their first contribution in https://github.com/flairNLP/flair/pull/3475
- @janpf made their first contribution in https://github.com/flairNLP/flair/pull/3481
- @sheldon-roberts made their first contribution in https://github.com/flairNLP/flair/pull/3485
- @abhisheklomsh made their first contribution in https://github.com/flairNLP/flair/pull/3500
- @teresaloeffelhardt made their first contribution in https://github.com/flairNLP/flair/pull/3463
Full Changelog: https://github.com/flairNLP/flair/compare/v0.13.1...v0.14.0