flair - Browse /v0.14.0 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.md	2024-07-25	9.9 kB	0
Release 0.14.0 source code.tar.gz	2024-07-25	2.2 MB	0
Release 0.14.0 source code.zip	2024-07-25	2.4 MB	0
Totals: 3 Items		4.6 MB	0

This release adds major new support for biomedical text analytics! It adds improved biomedical NER and a state-of-the-art model for biomedical entity linking. Other new features include (1) support for parameter-efficient fine-tuning and (2) various new datasets, bug fixes and enhancements! We also removed a few dependencies, so Flair should install faster and take up less space!

Biomedical NER and Entity Linking

With Flair 0.14.0, you can now detect and normalize biomedical entities in text.

For example, to analyze the sentence "We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome", use this code snippet:

:::python
from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.data import Sentence

# A sentence from biomedical literature
sentence = Sentence("We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome.")

# Tag named entities in the text
ner_tagger = Classifier.load("hunflair2")
ner_tagger.predict(sentence)

# Normalize gene names
gene_linker = EntityMentionLinker.load("gene-linker")
gene_linker.predict(sentence)

# Normalize disease names
disease_linker = EntityMentionLinker.load("disease-linker")
disease_linker.predict(sentence)

# Iterate over predicted entities and print
for label in sentence.get_labels():
    print(label)

This should print out:

:::console
Span[5:6]: "IFNAR2" → Gene (1.0)
Span[5:6]: "IFNAR2" → 3455/name=IFNAR2

Span[7:8]: "POLG" → Gene (1.0)
Span[7:8]: "POLG" → 5428/name=POLG

Span[9:11]: "long-COVID syndrome" → Disease (1.0)
Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
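Each linking label in this printout follows a regular layout: the span's token offsets, its surface text, a concept identifier (prefixed with a namespace such as MESH: for the disease, unprefixed for the genes) and the canonical name after name=. If you want to post-process such output as text, a minimal sketch like the one below can parse it. It assumes exactly the printed layout shown here. LINK_LINE, LinkedMention and parse_link_line are hypothetical names, not part of Flair, and in real code you would normally read the Label objects returned by sentence.get_labels() instead. The loop also checks that Span[start:end] is a half-open token range, so Span[9:11] covers tokens 9 and 10.

```python
import re
from typing import NamedTuple, Optional

# Matches one printed *linking* label, e.g.
#   Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome
# The layout is inferred from the printout above; it is not a documented Flair format.
LINK_LINE = re.compile(
    r'^Span\[(?P<start>\d+):(?P<end>\d+)\]: "(?P<text>[^"]*)" → '
    r"(?P<concept>[^/]+)/name=(?P<name>.+)$"
)


class LinkedMention(NamedTuple):
    start: int        # index of the first token (inclusive)
    end: int          # index after the last token (exclusive)
    text: str         # surface form in the sentence
    database: str     # namespace prefix such as "MESH"; empty when none is printed
    identifier: str   # concept ID within that namespace
    name: str         # canonical name of the linked concept


def parse_link_line(line: str) -> Optional[LinkedMention]:
    """Parse one printed linking label; return None for other lines (e.g. NER tags)."""
    match = LINK_LINE.match(line.strip())
    if match is None:
        return None
    # Gene IDs are printed without a prefix ("3455"), disease IDs with one ("MESH:D000094024").
    database, _, identifier = match["concept"].rpartition(":")
    return LinkedMention(
        int(match["start"]), int(match["end"]), match["text"], database, identifier, match["name"]
    )


# Whitespace tokenization reproduces the token indices printed above for this sentence.
tokens = "We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome .".split()
printed = [
    'Span[5:6]: "IFNAR2" → Gene (1.0)',
    'Span[5:6]: "IFNAR2" → 3455/name=IFNAR2',
    'Span[7:8]: "POLG" → 5428/name=POLG',
    'Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome',
]
for line in printed:
    mention = parse_link_line(line)
    if mention is None:
        continue  # an NER label such as "Gene (1.0)", not a link
    # Span[start:end] is a half-open range over the sentence's tokens.
    assert " ".join(tokens[mention.start:mention.end]) == mention.text
    # Per the release notes, the unprefixed IDs here are NCBI gene IDs.
    print(mention.text, "->", mention.database or "NCBI Gene", mention.identifier, mention.name)
```

Using rpartition(":") keeps unprefixed gene IDs intact: it returns an empty namespace instead of failing.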

The printout shows that:

"IFNAR2" is a gene. Further, it is recognized as gene 3455 ("interferon alpha and beta receptor subunit 2") in the NCBI database.
"POLG" is a gene. Further, it is recognized as gene 5428 ("DNA polymerase gamma, catalytic subunit") in the NCBI database.
"long-COVID syndrome" is a disease. Further, it is uniquely linked to "Post-Acute COVID-19 Syndrome" in the MESH database.

Big thanks to @sg-wbi @WangXII @mariosaenger @helpmefindaname for all their work:

Entity Mention Linker by @helpmefindaname in https://github.com/flairNLP/flair/pull/3388
Support for biomedical datasets with multiple entity types by @WangXII in https://github.com/flairNLP/flair/pull/3387
Update documentation for Hunflair2 release by @mariosaenger in https://github.com/flairNLP/flair/pull/3410
Improve nel tutorial by @helpmefindaname in https://github.com/flairNLP/flair/pull/3369
Incorporate hunflair2 docs to docpage by @helpmefindaname in https://github.com/flairNLP/flair/pull/3442

Parameter-Efficient Fine-Tuning

Flair 0.14.0 also adds support for parameter-efficient fine-tuning (PEFT) via Hugging Face's peft library.

For instance, to fine-tune a BERT model on the TREC question classification task using LoRA, use the following snippet:

:::python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# Note: you need to install peft to use this feature!
from peft import LoraConfig, TaskType

# Get corpus and make label dictionary
corpus: Corpus = TREC_6()
label_type = "question_class"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Define embeddings with LoRA fine-tuning
document_embeddings = TransformerDocumentEmbeddings(
    "bert-base-uncased",
    fine_tune=True,
    # set LoRA config
    peft_config=LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,
        inference_mode=False,
    ),
)

# define model
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# train model
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    learning_rate=5.0e-4,
    mini_batch_size=4,
    max_epochs=1,
)
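LoRA keeps a pretrained weight matrix W (d × k) frozen and learns a low-rank update B A, with A of shape r × k and B of shape d × r, so an adapted layer computes h = W x + (alpha / r) B A x. In the standard LoRA initialization B starts at zero, so the adapted layer initially behaves exactly like the pretrained one. The pure-Python sketch below illustrates that forward pass and the resulting parameter savings; it is not how peft implements LoRA. The numbers rest on assumptions: since the snippet above does not set r, we assume peft's default rank of 8, and we assume LoRA is applied to BERT-base's query and value projections (hidden size 768, 12 layers). Check LoraConfig in your installed peft version, as defaults may differ.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    W (d x k) stays frozen; only A (r x k) and B (d x r) would be trained.
    """
    base = matvec(w, x)               # output of the frozen pretrained layer
    update = matvec(b, matvec(a, x))  # low-rank path: project down to r, then back up to d
    scale = alpha / r
    return [h0 + scale * u for h0, u in zip(base, update)]


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of one LoRA adapter: A has r*k entries, B has d*r."""
    return r * (d + k)


# Assumption: BERT-base has 12 layers with 768 x 768 query and value projections,
# and a rank-8 LoRA adapter is attached to both projections in every layer.
layers, hidden, rank = 12, 768, 8
adapted_matrices = layers * 2
full = adapted_matrices * hidden * hidden
lora = adapted_matrices * lora_trainable_params(hidden, hidden, rank)
print(f"frozen weights in adapted matrices: {full:,}")
print(f"trainable LoRA parameters: {lora:,} ({lora / full:.2%})")
```

Under these assumptions the script reports 14,155,776 frozen weights in the adapted matrices versus 294,912 trainable LoRA parameters, about 2.08%. The classification head and any other trainable parts of the model are not counted.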

Big thanks to @janpf for this new feature!

Add PEFT training and explicit kwarg passthrough by @janpf in https://github.com/flairNLP/flair/pull/3480

Smaller Library

We've removed dependencies such as gensim from the core package, since they increased the size of the Flair library and caused compatibility and maintenance issues. As a result, the core package is now smaller and faster to install.

Install as always with:

:::console
pip install flair

Some features still require gensim, for example training a model that uses classic word embeddings. In that case, install Flair with the word-embeddings extra:

:::console
pip install "flair[word-embeddings]"

Or just install gensim separately.
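Since gensim is now optional, code that relies on classic word embeddings may want to check for it up front and fail with a helpful message. The following is a minimal sketch using only the standard library. has_optional_dependency is a hypothetical helper, not part of Flair, and Flair's own handling of a missing gensim package may look different.

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Return True if a top-level module can be imported, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


if has_optional_dependency("gensim"):
    print("gensim found: models with classic word embeddings can be trained.")
else:
    print('gensim missing: run  pip install "flair[word-embeddings]"  or  pip install gensim')
```

Using find_spec avoids paying gensim's import cost just to test for its presence.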

Big thanks to @helpmefindaname for this new feature!

Make gensim optional by @helpmefindaname in https://github.com/flairNLP/flair/pull/3493
Update models for v0.14.0 by @alanakbik in https://github.com/flairNLP/flair/pull/3505
Relax version constraint for konoha by @himkt in https://github.com/flairNLP/flair/pull/3394
Dependencies maintenance updates by @helpmefindaname in https://github.com/flairNLP/flair/pull/3402
Make janome optional by @himkt in https://github.com/flairNLP/flair/pull/3405
Bump min. version of bpemb by @stefan-it in https://github.com/flairNLP/flair/pull/3468

Other Improvements

New Features and Improvements

Speed up Euclidean distance calculation by @sheldon-roberts in https://github.com/flairNLP/flair/pull/3485
Add DataTriples which act just like DataPairs by @janpf in https://github.com/flairNLP/flair/pull/3481
Add random seed parameter to dataset splitting and downsampling for better reproducibility by @MattGPT-ai in https://github.com/flairNLP/flair/pull/3475
Allow CPU device even if GPU is available by @drbh in https://github.com/flairNLP/flair/pull/3417
Add prediction label type for span classifier by @helpmefindaname in https://github.com/flairNLP/flair/pull/3432
Character embeddings store their embedding name too by @helpmefindaname in https://github.com/flairNLP/flair/pull/3477
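The random-seed option for dataset splitting and downsampling (PR 3475) targets reproducibility: the same seed should yield the same split on every run. The sketch below illustrates the idea with Python's random module. seeded_split and seeded_downsample are hypothetical helpers, not Flair functions, and the name of the seed parameter Flair actually exposes is not given in these notes.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seeded_split(items: Sequence[T], dev_fraction: float, seed: int) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a seeded RNG and split it into (train, dev)."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


def seeded_downsample(items: Sequence[T], fraction: float, seed: int) -> List[T]:
    """Keep a seeded random subset containing `fraction` of the items."""
    rng = random.Random(seed)
    return rng.sample(list(items), int(len(items) * fraction))


train_a, dev_a = seeded_split(range(100), dev_fraction=0.1, seed=42)
train_b, dev_b = seeded_split(range(100), dev_fraction=0.1, seed=42)
print(dev_a == dev_b)  # same seed, same split
```

Using a dedicated random.Random instance rather than the module-level functions keeps the split reproducible even if other code consumes random numbers in between.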

Bug Fixes

TextPairRegressor: Fix data point iteration by @ya0guang in https://github.com/flairNLP/flair/pull/3413
TextPairRegressor: Fix GPU memory leak by @MattGPT-ai in https://github.com/flairNLP/flair/pull/3490
TextRegressor: Fix label_name bug by @sheldon-roberts in https://github.com/flairNLP/flair/pull/3491
SequenceTagger: Fix _all_scores_for_token in ViterbiDecoder by @mauryaland in https://github.com/flairNLP/flair/pull/3455
SentenceSplitter: Fix linking of sentences by @mariosaenger in https://github.com/flairNLP/flair/pull/3397
SentenceSplitter: Fix case where split was performed on special characters by @helpmefindaname in https://github.com/flairNLP/flair/pull/3404
Classifier: Fix loading by moving error message to main load function by @alanakbik in https://github.com/flairNLP/flair/pull/3504
Trainer: Fix edge case by loading best model at end, even when there is no final evaluation by @helpmefindaname in https://github.com/flairNLP/flair/pull/3470
TransformerEmbeddings: Fix special tokens by not replacing replace_additional_special_tokens by @helpmefindaname in https://github.com/flairNLP/flair/pull/3451
Unit tests: Fix double data_folder in unit test by @ya0guang in https://github.com/flairNLP/flair/pull/3412

New Datasets

Add revision support for all Universal Dependencies datasets by @stefan-it in https://github.com/flairNLP/flair/pull/3420
NER_ESTONIAN_NOISY: Support for Estonian NER dataset with noise by @teresaloeffelhardt in https://github.com/flairNLP/flair/pull/3463
MASAKHA_POS: Support for two new languages by @stefan-it in https://github.com/flairNLP/flair/pull/3421
UD_BAVARIAN_MAIBAAM: Add support for new Bavarian MaiBaam UD by @stefan-it in https://github.com/flairNLP/flair/pull/3426

Documentation

Minor readme fixes by @stefan-it in https://github.com/flairNLP/flair/pull/3424
Fix typo in transformer-embeddings.md by @abhisheklomsh in https://github.com/flairNLP/flair/pull/3500
Fix typo in how-model-training-works.md by @abhisheklomsh in https://github.com/flairNLP/flair/pull/3499

Build Management

Fix black and ruff by @stefan-it in https://github.com/flairNLP/flair/pull/3423
Remove zappr yaml by @helpmefindaname in https://github.com/flairNLP/flair/pull/3435
Fix tests package being incorrectly included in builds by @asumagic in https://github.com/flairNLP/flair/pull/3440

New Contributors

@ya0guang made their first contribution in https://github.com/flairNLP/flair/pull/3413
@drbh made their first contribution in https://github.com/flairNLP/flair/pull/3417
@asumagic made their first contribution in https://github.com/flairNLP/flair/pull/3440
@MattGPT-ai made their first contribution in https://github.com/flairNLP/flair/pull/3475
@janpf made their first contribution in https://github.com/flairNLP/flair/pull/3481
@sheldon-roberts made their first contribution in https://github.com/flairNLP/flair/pull/3485
@abhisheklomsh made their first contribution in https://github.com/flairNLP/flair/pull/3500
@teresaloeffelhardt made their first contribution in https://github.com/flairNLP/flair/pull/3463

Full Changelog: https://github.com/flairNLP/flair/compare/v0.13.1...v0.14.0

Source: README.md, updated 2024-07-25

flair Files

A very simple framework for state-of-the-art NLP
