Drug name recognition and normalisation/grounding to DrugBank IDs and standard names.

The package provides two taggers:
1. DrugTagger - CRF-based, with a DrugBank presence feature (see the feature set for details).
2. DrugnameGazetteer - gazetteer/dictionary-based. The dictionary was created from the DrugBank.ca database.
Both taggers include grounding/normalisation to DrugBank IDs and standard names.
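
As an illustration of the dictionary-based approach with grounding, here is a minimal Python sketch. It is not the package's Java code: the function name `tag`, the `max_len` parameter, the longest-match strategy, and the tiny dictionary are all assumptions made for the example, and the real dictionary is built from DrugBank.

```python
# Hypothetical miniature dictionary: lower-cased surface form -> (DrugBank ID, standard name).
# Entries are illustrative only; verify any ID against DrugBank before relying on it.
GAZETTEER = {
    "aspirin": ("DB00945", "Acetylsalicylic acid"),
    "acetylsalicylic acid": ("DB00945", "Acetylsalicylic acid"),
    "ibuprofen": ("DB01050", "Ibuprofen"),
}


def tag(tokens, gazetteer=GAZETTEER, max_len=3):
    """Return (start, end, drugbank_id, standard_name) tuples for dictionary matches.

    Spans are token offsets with an exclusive end. Longest match wins (an assumption).
    """
    annotations = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so multi-word names are preferred.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in gazetteer:
                match = (i, i + n) + gazetteer[key]
                break
        if match:
            annotations.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return annotations


if __name__ == "__main__":
    sentence = "Patients received Aspirin and acetylsalicylic acid daily".split()
    for annotation in tag(sentence):
        print(annotation)
```

Each match carries both the DrugBank ID and the standard name, so different surface forms of the same drug are grounded to one entry.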

Feature set:
Word, Word-1, Word+1, Word-1_Word, Word_Word+1, DrugBankPresence, POS
The DrugBankPresence feature indicates whether the drug name is present in DrugBank.
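
The feature set can be sketched for a single token as follows. This is a Python illustration under stated assumptions, not the package's implementation: the function name `token_features`, the `<BOS>`/`<EOS>` padding symbols, and the single-token, lower-cased DrugBank lookup are hypothetical choices.

```python
def token_features(tokens, pos_tags, i, drugbank_names):
    """Build the seven listed features for the token at position i.

    drugbank_names is assumed to be a set of lower-cased drug names.
    """
    word = tokens[i]
    # Padding symbols at sentence boundaries are an assumption of this sketch.
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    return {
        "Word": word,
        "Word-1": prev_word,
        "Word+1": next_word,
        "Word-1_Word": prev_word + "_" + word,
        "Word_Word+1": word + "_" + next_word,
        "DrugBankPresence": word.lower() in drugbank_names,
        "POS": pos_tags[i],
    }


if __name__ == "__main__":
    tokens = ["Take", "ibuprofen", "daily"]
    pos_tags = ["VB", "NN", "RB"]
    print(token_features(tokens, pos_tags, 1, {"ibuprofen"}))
```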

Using CONLL-Evaluation:
processed 32065 tokens with 3656 phrases; found: 3251 phrases; correct: 2786.
accuracy: 95.25%; precision: 85.70%; recall: 76.20%; FB1: 80.67
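
The phrase-level precision, recall and FB1 above follow from the reported counts. A short Python check, using the standard definitions (the function name `prf` is just for this example):

```python
def prf(gold, found, correct):
    """Phrase-level precision, recall and F1 from raw counts."""
    precision = correct / found   # correct phrases among those the tagger found
    recall = correct / gold       # correct phrases among those in the gold standard
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = prf(gold=3656, found=3251, correct=2786)
    print(f"precision: {100 * p:.2f}%; recall: {100 * r:.2f}%; FB1: {100 * f:.2f}")
    # prints: precision: 85.70%; recall: 76.20%; FB1: 80.67
```

The token-level accuracy (95.25%) cannot be recomputed this way, since it depends on per-token counts that are not listed here.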


Using GATE Corpus Benchmark:
Strict: P: 0.65 R: 0.73 F1: 0.69
Lenient: P: 0.74 R: 0.84 F1: 0.78
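
The difference between the two rows is how partially overlapping annotations are counted: strict scoring treats them as wrong, lenient scoring treats them as correct. The Python sketch below illustrates that distinction on token-offset spans. It is a simplification and not GATE's code: the function name `evaluate`, the two-pass matching, and the example spans are assumptions.

```python
def evaluate(gold, predicted, lenient=False):
    """Score predicted (start, end) spans against gold spans; end is exclusive."""
    gold_left = list(gold)
    unmatched = []
    matched = 0
    # First pass: exact span matches count in both modes.
    for span in predicted:
        if span in gold_left:
            gold_left.remove(span)
            matched += 1
        else:
            unmatched.append(span)
    # Second pass (lenient only): a partial overlap with an unused gold span also counts.
    if lenient:
        for span in unmatched:
            for g in gold_left:
                if span[0] < g[1] and g[0] < span[1]:
                    gold_left.remove(g)
                    matched += 1
                    break
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 2), (5, 6), (9, 11)]
    predicted = [(0, 2), (5, 7), (20, 21)]  # one exact, one partial, one spurious
    print("Strict: ", evaluate(gold, predicted))
    print("Lenient:", evaluate(gold, predicted, lenient=True))
```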

For details on how to reproduce the evaluation, see the README.

To use the standalone version for tagging, download DrugExtractionStandalone.tar.gz from Files.



Additional Project Details

User Interface: Console/Terminal
Programming Language: Java
Related Categories: Java Bio-Informatics Software, Java Linguistics Software, Java Machine Learning Software
Registered: 2015-06-10