Showing 31 open source projects for "word"

  • 1
    spaCy

    Industrial-strength Natural Language Processing (NLP)

    ...Since its inception it was designed to be used for real-world applications: building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with an accuracy within 1% of the best available. It's blazing fast, easy to install and comes with a simple and productive API.
    Downloads: 116 This Week
  • 2
    spacy-transformers

    Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

    spaCy supports a number of transfer and multi-task learning workflows that can often help improve your pipeline’s efficiency or accuracy. Transfer learning refers to techniques such as word vector tables and language model pretraining. These techniques can be used to import knowledge from raw text into your pipeline, so that your models are able to generalize better from your annotated examples. You can convert word vectors from popular tools like FastText and Gensim, or you can load in any pretrained transformer model if you install spacy-transformers. ...
    Downloads: 33 This Week
  • 3
    DocTR

    Library for OCR-related tasks powered by Deep Learning

    ...End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.
    Downloads: 9 This Week
  • 4
    rich

    Rich is a Python library for rich text and beautiful formatting

    ...Rich can be installed in the Python REPL, so that any data structures will be pretty printed and highlighted. As you might expect, this will print "Hello World!" to the terminal. Note that unlike the builtin print function, Rich will word-wrap your text to fit within the terminal width.
    Downloads: 2 This Week
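To make the word-wrapping remark concrete, here is a minimal standard-library sketch of wrapping text to the terminal width. It does not use Rich and is not Rich's implementation; the helper name `wrap_to_terminal` is hypothetical and chosen only for illustration.

```python
import shutil
import textwrap


def wrap_to_terminal(text, width=None):
    """Word-wrap text so no line exceeds the given (or detected) width.

    Hypothetical helper for illustration; Rich does this internally
    with its own logic when it prints.
    """
    if width is None:
        # Fall back to 80 columns when the terminal size is unknown.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    # Break only at whitespace so hyphenated words stay intact.
    return textwrap.fill(text, width=width, break_on_hyphens=False)


if __name__ == "__main__":
    message = "Rich will word-wrap your text to fit within the terminal width."
    print(message)                        # builtin print: one long line
    print(wrap_to_terminal(message, 20))  # wrapped to 20 columns
```

The builtin `print` emits the string as a single line and leaves wrapping to the terminal, whereas the helper breaks it at word boundaries first.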
  • 5
    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
    Downloads: 8 This Week
  • 6
    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of...
    Downloads: 6 This Week
  • 7
    csv2odf

    csv2odf can convert csv data to formatted spreadsheets and documents.

    ...It is a command-line tool, and you can automate the generation of reports by using scripts and cron. It can be used to create spreadsheets and documents for LibreOffice, OpenOffice, and Microsoft Office Excel and Word. It is open source (GPL v3) and cross-platform; it can run on most operating systems that can run Python (Python is required). More details, example files, and an online manual are at http://csv2odf.sf.net.
    Downloads: 4 This Week
  • 8
    bridgex

    Convert files like docx, xlsx, pptx, html, and more to Markdown

    ...
    - Support for multiple input formats.
    - Lightweight editing prior to saving.

    Supported Formats 📂

    Bridgex supports conversion of the following file formats:
    - PDF (.pdf)
    - Word (.docx)
    - PowerPoint (.pptx)
    - Excel (.xlsx, .xls, .csv)
    - Outlook Messages (.msg)
    - Text (.txt, .text)
    - Markdown (.md, .markdown)
    - JSON (.json, .jsonl)
    - XML (.xml)
    - RSS/Atom (.rss, .atom)
    - HTML/MHTML (.html, .htm, .mhtml)
    - ePub (.epub)
    - Compressed files (.zip)
    - Jupyter Notebooks (.ipynb)
    - Other formats supported by Markitdown

    Bridgex is not an IDE, text editor, Markdown editor, or document viewer.
    Downloads: 5 This Week
  • 9
    GluCat: Clifford algebra templates

    Calculation with Clifford algebras: C++ library and Python module

    GluCat is a generic library of C++ templates that implement universal Clifford algebras over the field of real numbers. The PyClical extension module for Python gives users an easy Python scripting interface for calculations in Clifford algebras. The name PyClical is an homage to Pertti Lounesto's CLICAL.
    Downloads: 0 This Week
  • 10
    OWASP Mobile Application Security

    Manual for mobile app security testing and reverse engineering

    ...MAS Advocates are industry adopters of the OWASP MASVS and MASTG who have invested a significant and consistent amount of resources to push the project forward by providing consistent high-impact contributions and continuously spreading the word.
    Downloads: 16 This Week
  • 11
    Docx2PDF The Converter [I.S,A]

    Docx-2-PDF: The Converter [Improved.Simplified.Alternative]

    'Docx-2-PDF Converter' is a desktop application developed using Python 3.11.4 and other add-on libraries. It converts Word (.docx) files into PDF files and has two modes: 1) Single file - Convert one Word (.docx) file into a PDF file. 2) From Directory/Folder - Convert the Word (.docx) files in a directory or folder into PDF files. Compatible only with Windows.
    Downloads: 0 This Week
  • 12
    dirsearch

    Web path scanner

    ...Unlike other tools, dirsearch only replaces the %EXT% keyword with the extensions from the -e flag. For wordlists without %EXT% (like SecLists), the -f | --force-extensions switch is required to append extensions, as well as a trailing /, to every word in the wordlist. To use multiple wordlists, separate them with commas, for example: wordlist1.txt,wordlist2.txt. Default values for dirsearch flags can be edited in the configuration file, default.conf. The thread number (-t | --threads) is the number of separate brute-force processes, so the bigger the thread number, the faster dirsearch runs. ...
    Downloads: 7 This Week
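The extension handling described for dirsearch can be sketched roughly as follows. This is a simplified illustration, not dirsearch's actual code: the function name `expand_wordlist` is hypothetical, and the exact set of paths generated under forced extensions (here: the bare word, the word with each extension, and the word with a trailing slash) is an assumption based on the description.

```python
def expand_wordlist(words, extensions, force_extensions=False):
    """Expand wordlist entries into candidate paths.

    Hypothetical sketch: entries containing %EXT% are expanded once per
    extension; other entries are left alone unless force_extensions is set.
    """
    paths = []
    for word in words:
        if "%EXT%" in word:
            # Replace the keyword with each extension (the -e flag values).
            paths.extend(word.replace("%EXT%", ext) for ext in extensions)
        elif force_extensions:
            # Assumed behavior of -f: keep the word, add each extension,
            # and add a trailing slash variant.
            paths.append(word)
            paths.extend(f"{word}.{ext}" for ext in extensions)
            paths.append(f"{word}/")
        else:
            paths.append(word)
    return paths


if __name__ == "__main__":
    words = ["index.%EXT%", "admin"]
    print(expand_wordlist(words, ["php", "html"]))
    print(expand_wordlist(words, ["php", "html"], force_extensions=True))
```

Without the force switch, a plain entry such as `admin` yields only itself, which is why wordlists lacking `%EXT%` need the switch to get any extension coverage.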
  • 13
    Paperless-ng

    A supercharged version of paperless, scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...
    Downloads: 0 This Week
  • 14
    fastNLP

    fastNLP: A Modularized and Extensible NLP Framework

    ...Various convenient NLP tools, such as embedding loading (including ELMo and BERT), intermediate data caching, etc. Provides a variety of neural network components and recurrence models (covering tasks such as Chinese word segmentation, named entity recognition, syntactic analysis, text classification, text matching, metaphor resolution, summarization, etc.). Trainer provides a variety of built-in Callback functions to facilitate experiment recording, exception capture, etc. Automatic download of some datasets and pre-trained models.
    Downloads: 0 This Week
  • 15
    SentEval

    A python tool for evaluating the quality of sentence embeddings

    ...It defines a simple interface—provide an encoder function from sentences to vectors—and then runs consistent training/evaluation loops for tasks like sentiment, entailment, paraphrase, and semantic textual similarity. The suite also contains linguistic probing tasks that illuminate what properties embeddings capture, such as tense, word order, or syntactic structure. Datasets are wrapped with unified preprocessing and metrics so results are comparable across papers and implementations. Because the interface is minimal, researchers can plug in encoders from any framework or language model and obtain a broad evaluation with little glue code. SentEval helped establish common baselines and reporting conventions in the sentence-representation community, reducing friction when comparing new methods.
    Downloads: 0 This Week
  • 16
    PyTorch Natural Language Processing

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    ...Now that you've set up your pipeline, you may want to ensure that some functions run deterministically. Wrap any code that's random with fork_rng and you'll be good to go. Now that you've computed your vocabulary, you may want to make use of pre-trained word vectors to set your embeddings.
    Downloads: 2 This Week
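The idea behind wrapping random code so it runs deterministically can be illustrated with a standard-library analogue. This is not the library's `fork_rng`; the context manager `fork_random_state` below is a hypothetical stand-in that only handles Python's `random` module, whereas a real implementation would presumably also cover other generators such as NumPy's or PyTorch's.

```python
import random
from contextlib import contextmanager


@contextmanager
def fork_random_state(seed=None):
    """Run a block with its own RNG state, then restore the outer state.

    Hypothetical stdlib-only sketch of the "fork the RNG" pattern.
    """
    saved = random.getstate()      # remember the caller's RNG state
    if seed is not None:
        random.seed(seed)          # make the block reproducible
    try:
        yield
    finally:
        random.setstate(saved)     # the caller's random stream is unaffected


if __name__ == "__main__":
    with fork_random_state(seed=123):
        first = random.random()
    with fork_random_state(seed=123):
        second = random.random()
    print(first == second)  # the seeded block yields the same value each time
```

Because the outer state is restored on exit, code outside the block draws the same random numbers it would have drawn had the block never run.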
  • 17
    ULIX TxT Editor

    The only Word Processor designed to create and modify .htaccess files

    Full version available for Python. No updates needed for UTE-11 on Linux. *** IMPORTANT... PLEASE READ - September 24, 2022 *** The UTE-11 patch for Windows will be released on October 20, 2022. A patch for the fault that causes the program to fail on opening is being tested now. Sorry for the long wait on this patch; it took longer than expected to isolate the root of the problem. Sorry for the inconvenience caused by our coding. The Vampnerd Group.
    Downloads: 0 This Week
  • 18
    aeneas

    Automagically synchronize audio and text (aka forced alignment)

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment). aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
    Downloads: 7 This Week
  • 19
    node2vec

    Learn continuous vector embeddings for nodes in a graph using biased R

    ...It allows researchers and practitioners to apply node2vec to various graph datasets and evaluate embedding quality on downstream tasks. By bridging ideas from graph theory and word embedding models, this project demonstrates how graph-based machine learning can be made efficient and flexible.
    Downloads: 5 This Week
  • 20
    PROJECT MOVED TO https://github.com/paulhtremblay/rtf2xml The script rtf2xml faithfully converts Microsoft's RTF format to structured XML. Developers can make further transformations using standard XML tools, or use the stylesheets provided to convert to sdocbook or TEI.
    Downloads: 0 This Week
  • 21

    inFolder

    A personal wiki created by your directory structure

    A graphical text editor used to maintain a collection of pages, whose content is created by the user, but whose hierarchical structure is dictated by the directory structure.
    Downloads: 0 This Week
  • 22
    TextBlob

    TextBlob is a Python library for processing textual data

    ...It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both. Supports word inflection (pluralization and singularization) and lemmatization, as well as spelling correction. Add new models or languages through extensions. Also, it comes with a WordNet integration. If you only intend to use TextBlob’s default models (no model overrides), you can pass the lite argument. This downloads only those corpora needed for basic functionality. ...
    Downloads: 0 This Week
  • 23
    Voikko

    Library of linguistic tools

    Voikko is a spell checking, grammar checking, morphological analysis and hyphenation system. Spell checkers are available for multiple languages; the other features are available for Finnish only.
    Downloads: 9 This Week
  • 24
    gEcrit
    gEcrit is a Python-oriented source code editor. It tries to keep the interface as clean as possible and the menus simple. It offers the common features a Python programmer might need, including an interactive Python shell.
    Downloads: 0 This Week
  • 25
    A collection of open source libraries and tools that provide solutions for common problems in processing Arabic text, especially in web applications. It covers text normalization, phrase segmentation, text indexing, stop word lists, and common spelling mistakes.
    Downloads: 3 This Week