Search Results for "text processing" - Page 8

Showing 346 open source projects for "text processing"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 1
    Synonyms

    Synonyms

    Chinese synonyms, chat robot, intelligent question and answer toolkit

    Chinese Synonyms for natural language processing and understanding. Better Chinese synonyms, chatbot, intelligent question and answer toolkit. synonymsCan be used for many tasks in natural language understanding, text alignment, recommendation algorithms, similarity calculation, semantic shifting, keyword extraction, concept extraction, automatic summarization, search engines, etc.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    fastNLP

    fastNLP

    fastNLP: A Modularized and Extensible NLP Framework

    ...Provide a variety of neural network components and recurrence models (covering tasks such as Chinese word segmentation, named entity recognition, syntactic analysis, text classification, text matching, metaphor resolution, summarization, etc.). Trainer provides a variety of built-in Callback functions to facilitate experiment recording, exception capture, etc. Automatic download of some datasets and pre-trained models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    htmleditor.py

    A Python based HTML and CSS Editor

    Requires PyQt >= 5.2 QsciScintilla >= 2.8 Python >=3.4
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    CC-Net

    CC-Net

    Tools to download and cleanup Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    GluonNLP

    GluonNLP

    NLP made easy

    GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that helps you load the text data, process the text data, and train models. To facilitate both the engineers and researchers, we provide command-line-toolkits for downloading and processing the NLP datasets. Gluon NLP makes it easy to evaluate and train word embeddings. Here are examples to evaluate the pre-trained embeddings included in the Gluon NLP toolkit as well as example scripts for training embeddings on custom datasets. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    DeText

    DeText

    A Deep Neural Text Understanding Framework

    DeText is a Deep Text understanding framework for NLP-related ranking, classification, and language generation tasks. It leverages semantic matching using deep neural networks to understand member intents in search and recommender systems. As a general NLP framework, DeText can be applied to many tasks, including search & recommendation ranking, multi-class classification and query understanding tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    NLP-Models-Tensorflow

    NLP-Models-Tensorflow

    Gathers machine learning and Tensorflow deep learning models for NLP

    NLP-Models-Tensorflow is a collection of natural language processing model implementations built using the TensorFlow deep learning framework. The repository provides numerous examples of neural network architectures used in modern NLP research and applications, including text classification, language modeling, machine translation, and sentiment analysis. Each model implementation is designed to illustrate how common NLP architectures operate, such as recurrent neural networks, convolutional models for text processing, and transformer-style attention mechanisms. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Delta ML

    Delta ML

    Deep learning based natural language and speech processing platform

    ...It helps you to train, develop, and deploy NLP and/or speech models. Use configuration files to easily tune parameters and network structures. What you see in training is what you get in serving: all data processing and features extraction are integrated into a model graph. Text classification, named entity recognition, question and answering, text summarization, etc. Uniform I/O interfaces and no changes for new models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    NLP Best Practices

    NLP Best Practices

    Natural Language Processing Best Practices & Examples

    In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms which use language models pretrained on large text corpora.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10

    FastoCloud PRO

    IPTV/NVR/CCTV/Video cloud https://fastocloud.com

    IPTV/Video cloud Features: Cross-platform (Linux, MacOSX, FreeBSD, Raspbian/Armbian) GPU/CPU Encode/Decode/Post Processing Stream statistics CCTV Adaptive hls streams Load balancing Temporary urls HLS push EPG scanning Subtitles to text conversions AD insertion Logo overlay Video effects Relays Timeshifts Catchups Playlists Restream/Transcode from online streaming services like Youtube, Twitch Mozaic Many Outputs Physical Inputs Streaming Protocols File Formats Presets Vods/Series server-side support Pay per view channels Channels on demand HTTP Live Streaming (HLS) server-side support Public API, client server communication via JSON RPC Protocol gzip compression Deep learning video analysis Supported deep learning frameworks: Tensorflow NCSDK Caffe ML Hardware:
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    cocoNLP

    cocoNLP

    A Chinese information extraction tool

    cocoNLP is a lightweight natural-language processing toolkit geared toward practical information extraction from raw text, especially for Chinese and mixed Chinese–English content. Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats, comments, and user-generated content. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    PyTorch Natural Language Processing

    PyTorch Natural Language Processing

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    PyTorch-NLP is a library for Natural Language Processing (NLP) in Python. It’s built with the very latest research in mind, and was designed from day one to support rapid prototyping. PyTorch-NLP comes with pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules and text encoders. It’s open-source software, released under the BSD3 license. With your batch in hand, you can use PyTorch to develop and train your model using gradient descent.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Dragonfire

    Dragonfire

    The open-source virtual assistant for Ubuntu based Linux distributions

    Dragonfire is the open-source virtual assistant project for Ubuntu-based Linux distributions. Her main objective is to serve as a command and control interface to the helmet user. So that you will be able to give orders just by using your voice commands and your eye movements. That makes the helmet handsfree. We are planning to ship Dragonfire as a preinstalled software package on DragonOS Linux Distribution. DragonOS will be a Linux distribution specially designed for the helmet. It will...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Texar

    Texar

    Toolkit for Machine Learning, Natural Language Processing

    Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides a library of easy-to-use ML modules and functionalities for composing whatever models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar was originally developed and is actively contributed by Petuum and CMU in collaboration with other institutes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    gpt2-client

    gpt2-client

    Easy-to-use TensorFlow Wrapper for GPT-2 117M, 345M, 774M, etc.

    GPT-2 is a Natural Language Processing model developed by OpenAI for text generation. It is the successor to the GPT (Generative Pre-trained Transformer) model trained on 40GB of text from the internet. It features a Transformer model that was brought to light by the Attention Is All You Need paper in 2017. The model has 4 versions - 124M, 345M, 774M, and 1558M - that differ in terms of the amount of training data fed to it and the number of parameters they contain. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Budou

    Budou

    Budou is an auto organizer tool for beautiful line breaking in CJK

    Budou is a Python library developed by Google to improve web typography for CJK (Chinese, Japanese, Korean) languages by producing semantically meaningful line breaks. Unlike English, CJK scripts lack spaces or hyphenation cues, often resulting in awkward or unreadable text wrapping on web pages. Budou addresses this issue by segmenting sentences into logical lexical chunks and wrapping each chunk in non-breaking HTML <span> tags. These spans can be styled with CSS to ensure smooth, visually coherent line breaks without splitting words or phrases. The tool supports multiple segmentation backends, including Google Cloud Natural Language API, MeCab, and TinySegmenter, enabling flexibility for both cloud-based and offline processing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    MatchZoo

    MatchZoo

    Facilitating the design, comparison and sharing of deep text models

    The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With the unified data processing pipeline, simplified model configuration and automatic hyper-parameters tunning features equipped, MatchZoo is flexible and easy to use. Preprocess your input data in three lines of code, keep track parameters to be passed into the model. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    nonechucks

    nonechucks

    Deal with bad samples in your dataset dynamically

    ...What if you have a dataset of 1000s of images, out of which a few dozen images are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to OCRize, and then run a language detector on the resulting text, because you want only the ones that are in English? Or maybe you have an AlternateIndexSampler, and you want to be able to move to dataset[6] after dataset[4] fails while attempting to load! PyTorch's data processing module expects you to rid your dataset of any unwanted or invalid samples before you feed them into its pipeline, and provides no easy way to define a "fallback policy" in case such samples are encountered during dataset iteration.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    bsed

    bsed

    Simple SQL-like syntax on top of Perl text processing

    bsed is a stream editor that offers a simple SQL-like syntax for text processing tasks. Designed to replace basic uses of tools like sed, grep, AWK, and Perl, it allows users to perform complex text manipulations with intuitive commands.​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    Safe Harbor Deidentification

    Safe Harbor Deidentification for medical documents

    Phalanx - Deidentify Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators which write output of text offsets. It uses the Safe Harbor deidentification method.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    lazynlp

    lazynlp

    Library to scrape and clean web pages to create massive datasets

    LazyNLP is a lightweight tool for collecting and curating large-scale text datasets for machine learning and NLP applications with minimal manual effort.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    NeuroNER

    NeuroNER

    Named-entity recognition using neural networks

    Named-entity recognition (NER) aims at identifying entities of interest in the text, such as location, organization and temporal expression. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also be used as features for machine learning systems for other natural language processing tasks. Leverages the state-of-the-art prediction capabilities of neural networks (a.k.a. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    TextRank

    TextRank

    TextRank implementation for Python 3

    TextRank is an implementation of the TextRank algorithm for extractive text summarization and keyword extraction, inspired by Google’s PageRank.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...
    Leader badge
    Downloads: 9 This Week
    Last Update:
    See Project
  • 25
    Diffuse
    Diffuse is a graphical tool for comparing and merging text files. It can retrieve files for comparison from Bazaar, CVS, Darcs, Git, Mercurial, Monotone, RCS, Subversion, and SVK repositories.
    Leader badge
    Downloads: 131 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB