This release adds major new support for biomedical text analytics! It adds improved biomedical NER and a state-of-the-art model for biomedical entity linking. Other new features include (1) support for parameter-efficient fine-tuning and (2) various new datasets, bug fixes and enhancements! We also removed a few dependencies, so Flair should install faster and take up less space!

Biomedical NER and Entity Linking

With Flair 0.14.0, you can now detect and normalize biomedical entities in text.

For example, to analyze the sentence "We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome", use this code snippet:

:::python
from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.data import Sentence

# A sentence from biomedical literature
sentence = Sentence("We correlate genetic variants in IFNAR2 and POLG with long-COVID syndrome.")

# Tag named entities in the text
ner_tagger = Classifier.load("hunflair2")
ner_tagger.predict(sentence)

# Normalize gene names
gene_linker = EntityMentionLinker.load("gene-linker")
gene_linker.predict(sentence)

# Normalize disease names
disease_linker = EntityMentionLinker.load("disease-linker")
disease_linker.predict(sentence)

# Iterate over predicted entities and print
for label in sentence.get_labels():
    print(label)

This should print out:

:::console
Span[5:6]: "IFNAR2" → Gene (1.0)
Span[5:6]: "IFNAR2" → 3455/name=IFNAR2

Span[7:8]: "POLG" → Gene (1.0)
Span[7:8]: "POLG" → 5428/name=POLG

Span[9:11]: "long-COVID syndrome" → Disease (1.0)
Span[9:11]: "long-COVID syndrome" → MESH:D000094024/name=Post-Acute COVID-19 Syndrome

The printout shows that:

  • "IFNAR2" is a gene. Further, it is recognized as gene 3455 ("interferon alpha and beta receptor subunit 2") in the NCBI database.

  • "POLG" is a gene. Further, it is recognized as gene 5428 ("DNA polymerase gamma, catalytic subunit") in the NCBI database.

  • "long-COVID syndrome" is a disease. Further, it is uniquely linked to "Post-Acute COVID-19 Syndrome" in the MESH database.
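The linker labels in the printout follow an `<identifier>/name=<canonical name>` pattern. As a minimal sketch, assuming this pattern also holds for other labels (it is inferred from the output above, not from a documented format), a small helper can split such a value into its two parts. The names `LinkedConcept` and `split_link_label` are hypothetical and not part of Flair.

```python
from typing import NamedTuple


class LinkedConcept(NamedTuple):
    identifier: str  # e.g. "3455" or "MESH:D000094024"
    name: str        # e.g. "IFNAR2"


def split_link_label(value: str) -> LinkedConcept:
    """Split '<identifier>/name=<name>' into its two parts.

    Hypothetical helper: the format is an assumption based on the
    printout above, not a documented Flair contract.
    """
    identifier, separator, name = value.partition("/name=")
    if not separator:
        raise ValueError(f"unexpected label format: {value!r}")
    return LinkedConcept(identifier, name)


# The two kinds of link values that appear in the printout above
for value in [
    "3455/name=IFNAR2",
    "MESH:D000094024/name=Post-Acute COVID-19 Syndrome",
]:
    concept = split_link_label(value)
    print(concept.identifier, "->", concept.name)
```

Raising an error on values without the `/name=` separator keeps NER labels such as "Gene" or "Disease" from being silently treated as database links.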

Big thanks to @sg-wbi @WangXII @mariosaenger @helpmefindaname for all their work:

  • Entity Mention Linker by @helpmefindaname in https://github.com/flairNLP/flair/pull/3388

  • Support for biomedical datasets with multiple entity types by @WangXII in https://github.com/flairNLP/flair/pull/3387

  • Update documentation for Hunflair2 release by @mariosaenger in https://github.com/flairNLP/flair/pull/3410

  • Improve nel tutorial by @helpmefindaname in https://github.com/flairNLP/flair/pull/3369

  • Incorporate hunflair2 docs to docpage by @helpmefindaname in https://github.com/flairNLP/flair/pull/3442

Parameter-Efficient Fine-Tuning

Flair 0.14.0 also adds support for parameter-efficient fine-tuning (PEFT) through the peft library.

For instance, to fine-tune a BERT model on the TREC question classification task using LoRA, use the following snippet:

:::python
from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# Note: you need to install peft to use this feature!
from peft import LoraConfig, TaskType

# Get corpus and make label dictionary
corpus: Corpus = TREC_6()
label_type = "question_class"
label_dict = corpus.make_label_dictionary(label_type=label_type)

# Define embeddings with LoRA fine-tuning
document_embeddings = TransformerDocumentEmbeddings(
    "bert-base-uncased",
    fine_tune=True,
    # set LoRA config
    peft_config=LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,
        inference_mode=False,
    ),
)

# define model
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict, label_type=label_type)

# train model
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune(
    "resources/taggers/question-classification-with-transformer",
    learning_rate=5.0e-4,
    mini_batch_size=4,
    max_epochs=1,
)

Big thanks to @janpf for this new feature!

  • Add PEFT training and explicit kwarg passthrough by @janpf in https://github.com/flairNLP/flair/pull/3480
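To give a rough sense of why LoRA counts as parameter-efficient, the sketch below compares the number of LoRA parameters with the number of weights in the matrices they adapt. It is plain arithmetic, not Flair or peft code. The figures are assumptions: BERT-base dimensions (12 layers, hidden size 768), a LoRA rank of 8, and adapters on the query and value projections of each layer. The settings actually used depend on the `LoraConfig` and on the installed peft version.

```python
def lora_parameter_count(d_in: int, d_out: int, rank: int, n_matrices: int) -> int:
    """LoRA adds two low-rank factors per adapted weight matrix:
    A with shape (rank, d_in) and B with shape (d_out, rank)."""
    return n_matrices * rank * (d_in + d_out)


# Assumed values for illustration (see the lead-in above)
HIDDEN_SIZE = 768        # BERT-base hidden size
NUM_LAYERS = 12          # BERT-base encoder layers
RANK = 8                 # assumed LoRA rank
ADAPTED_PER_LAYER = 2    # assumed: query and value projections

n_matrices = NUM_LAYERS * ADAPTED_PER_LAYER
lora_params = lora_parameter_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK, n_matrices)

# Weights (without biases) of the same matrices when fine-tuned directly
dense_params = n_matrices * HIDDEN_SIZE * HIDDEN_SIZE

print(f"LoRA parameters:  {lora_params:,}")
print(f"Dense parameters: {dense_params:,}")
print(f"Ratio: {lora_params / dense_params:.2%}")
```

Under these assumptions, the adapters hold about 2% as many parameters as the matrices they modify, and the rest of the transformer stays frozen.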

Smaller Library

We've removed dependencies such as gensim from the core package, since they increased the size of the Flair library and caused some compatibility and maintenance issues. This means the core package is now smaller and faster to install.

Install as always with:

:::console
pip install flair

Certain features, such as training a model that uses classic word embeddings, still require gensim. For this use case, install with:

:::console
pip install "flair[word-embeddings]"

Or just install gensim separately.
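Since gensim is now optional, code that relies on classic word embeddings may want to check for it before use. The following is a minimal sketch using only the Python standard library. The helper name `is_installed` is hypothetical, and this is not how Flair itself performs the check.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("gensim"):
    print("gensim found: classic word embeddings should be usable")
else:
    print('gensim missing: install it with pip install "flair[word-embeddings]"')
```

Checking with `find_spec` avoids the cost of importing gensim just to learn whether it is present.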

Big thanks to @helpmefindaname for this new feature!

  • Make gensim optional by @helpmefindaname in https://github.com/flairNLP/flair/pull/3493

  • Update models for v0.14.0 by @alanakbik in https://github.com/flairNLP/flair/pull/3505

  • Relax version constraint for konoha by @himkt in https://github.com/flairNLP/flair/pull/3394

  • Dependencies maintainance updates by @helpmefindaname in https://github.com/flairNLP/flair/pull/3402

  • Make janome optional by @himkt in https://github.com/flairNLP/flair/pull/3405

  • Bump min. version of bpemb by @stefan-it in https://github.com/flairNLP/flair/pull/3468

Other Improvements

New Features and Improvements

Bug Fixes

New Datasets

Documentation

Build Management

New Contributors

Full Changelog: https://github.com/flairNLP/flair/compare/v0.13.1...v0.14.0

Source: README.md, updated 2024-07-25