textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after.
Features
- Access and extend spaCy's core functionality for working with one or many documents through convenient methods and custom extensions
- Load prepared datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments
- Clean, normalize, and explore raw text before processing it with spaCy
- Extract structured information from processed documents, including n-grams, entities, acronyms, keyterms, and SVO triples
- Compare strings and sequences using a variety of similarity metrics
- Tokenize and vectorize documents, then train, interpret, and visualize topic models
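
To make a few of the features above concrete, here is a minimal, standard-library-only sketch of three of the listed tasks: cleaning raw text, extracting word n-grams, and comparing token sequences with a similarity metric. It deliberately does not call textacy or spaCy; the function names (`normalize_text`, `word_ngrams`, `jaccard_similarity`) and the `_URL_` placeholder are hypothetical and chosen only for illustration, and textacy's own implementations are more complete and operate on spaCy documents.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def normalize_text(text: str) -> str:
    """Lowercase, replace URLs with a placeholder, and collapse whitespace.

    Hypothetical helper for illustration; not part of textacy.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", "_URL_", text)
    return re.sub(r"\s+", " ", text).strip()


def word_ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of word tokens found in `text`."""
    tokens = re.findall(r"\w+", text)  # naive tokenizer, not spaCy's
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


raw = "  The quick brown fox.\n\nThe quick red fox: https://example.com  "
clean = normalize_text(raw)
print(clean)

# Count bigrams and show the most frequent one
bigram_counts = Counter(word_ngrams(clean, 2))
print(bigram_counts.most_common(1))

# Compare two short phrases by their word sets
print(jaccard_similarity("the quick brown fox".split(),
                         "the quick red fox".split()))
```

In practice the cleaning step would run before spaCy processes the text, while n-gram extraction and similarity comparison would run on the processed documents.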
Categories
Natural Language Processing (NLP)
License
Apache License V2.0