Showing 100 open source projects for "word analysis"

View related business solutions
  • Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud Icon
    Managed MySQL, PostgreSQL, and SQL Databases on Google Cloud

    Get back to your application and leave the database to us. Cloud SQL automatically handles backups, replication, and scaling.

    Cloud SQL is a fully managed relational database for MySQL, PostgreSQL, and SQL Server. We handle patching, backups, replication, encryption, and failover—so you can focus on your app. Migrate from on-prem or other clouds with free Database Migration Service. IDC found customers achieved 246% ROI. New customers get $300 in credits plus a 30-day free trial.
    Try Cloud SQL Free
  • Build on Google Cloud with $300 in Free Credit Icon
    Build on Google Cloud with $300 in Free Credit

    New to Google Cloud? Get $300 in free credit to explore Compute Engine, BigQuery, Cloud Run, Vertex AI, and 150+ other products.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query exabytes in BigQuery, or build AI apps with Vertex AI and Gemini. Once your credits are used, keep building with 20+ products with free monthly usage, including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. Sign up to start building right away.
    Start Free Trial
  • 1
    IK Analysis for Elasticsearch

    IK Analysis for Elasticsearch

    A plugin that integrates Lucene IK analyzer into elasticsearch

    IK Analyzer is an open source, lightweight Chinese word segmentation toolkit developed based on java language. Since the release of version 1.0 in December 2006, IKAnalyzer has launched 4 major versions. Initially, it was a Chinese word segmentation component based on the open source project Luence as the main application, combined with dictionary word segmentation and grammar analysis algorithms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    HanLP

    HanLP

    Han Language Processing

    ...Built on TensorFlow 2.0, it was designed to advance state-of-the-art deep learning techniques and popularize the application of natural language processing in both academia and industry. HanLP is capable of lexical analysis (Chinese word segmentation, part-of-speech tagging, named entity recognition), syntax analysis, text classification, and sentiment analysis. It comes with pretrained models for numerous languages including Chinese and English. It offers efficient performance, clear structure and customizable features, with plenty more amazing features to look forward to on the roadmap.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    natural

    natural

    General natural language facilities for node

    "Natural" is a general natural language facility for nodejs. It offers a broad range of functionalities for natural language processing. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported. It’s still in the early stages, so we’re very interested in bug reports, contributions and the like. Note that many algorithms from Rob Ellis’s node-nltools are being merged into this project and will be maintained from here...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    WeChatMsg

    WeChatMsg

    Project aimed at extracting, exporting, and analyzing chat records

    WeChatMsg repository hosts an open-source project aimed at extracting, exporting, and analyzing chat records from the WeChat messaging platform. It provides tools that read local WeChat database files and allow users to convert chat data into readable formats such as HTML, Word, and CSV, making it possible to inspect conversations outside the mobile app environment. Beyond simple export, the project includes mechanisms for analyzing chat histories and generating annual reports or visual...
    Downloads: 88 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    Underthesea

    Underthesea

    Underthesea - Vietnamese NLP Toolkit

    Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    tidytext

    tidytext

    Text mining using tidy tools

    tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    h2oGPT

    h2oGPT

    Private chat with local GPT with document, images, video, etc.

    h2oGPT is an open-source platform that allows users to interact with local GPT models in a completely private environment. It supports a variety of document types, including PDFs, Word files, images, video frames, and even audio, enabling users to query and analyze their documents or engage in a private chat with AI. The platform is designed to be secure and offline, ensuring that all data remains private and under the user's control. h2oGPT supports several AI models, including oLLaMa and Mixtral, making it a flexible tool for anyone needing advanced document analysis and AI-driven conversation in a secure, local setup.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Python 100 Days

    Python 100 Days

    Python - From Novice to Master in 100 Days

    ...It starts with foundational syntax, control flow, data structures, and functions, then advances through object-oriented programming, file I/O, exceptions, and modules. The middle sections focus on real-world Python applications, including working with CSV, Excel, Word, PowerPoint, PDFs, images, email/SMS, and regular expressions. The curriculum expands into databases and SQL, Linux essentials, web fundamentals, and a substantial Practical Django track that covers ORM, sessions, RESTful APIs, caching with Redis, asynchronous tasks with Celery, authentication, testing, and deployment. Data analysis and visualization receive dedicated coverage via NumPy, pandas, matplotlib, seaborn, and pyecharts, followed by an applied machine learning track with kNN, trees, Bayes, regression, clustering, ensembles, and neural networks.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    spaCy models

    spaCy models

    Models for the spaCy Natural Language Processing (NLP) library

    spaCy is designed to help you do real work, to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry...
    Downloads: 7 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    Stanza

    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    flair

    flair

    A very simple framework for state-of-the-art NLP

    ...A powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are dynamically generated by reading functional.py, you can add custom functions at will, and liberate the pasteboard. Support for markdown tables output by GPT. If the output contains a formula, it will be displayed in tex form and rendered form at the same time, which is convenient for copying and reading.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13

    AQUAD

    Software for Analysis of Qualitative Data

    AQUAD Eight is available in English, German, and Spanish with separate modules for the analysis of texts, audios, videos, graphic files, and a complementary module for exploratory statistical analysis with R. The modules follow the coding paradigm of qualitative analysis, offering for text analysis also functions for sequence analysis (Objective Hermeneutics) and word based quantitative analysis.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    TEXminer

    TEXminer

    Text Mining Classification for Texts in ASCII, Unicode and PDF Format.

    TEXminer uses generic Text Mining Methods to analyze Unicode Files as plain Text or PDF. The Text Database can be saved in XML where the orginal Text, the Sentence and Word Lists and additional Parameters (e.g. Abbreviations) are stored. TEXminer allows Language Detection by Letter Frequency Analysis, finding important Words by Cooccurrence Analysis, Determination of Central Expressions, Thematic Text Classification (also Semantic Groups) Fingerprint Comparison and Word Frequency. Because TEXminer is not disigned to have a Reference Corpus, Thematic Model Statistics uses Language Models (lexicons) to have Background Knowledge about certain Languages (English, German, French, Spanish, Italian, Russian), which are derived from Decaleon Project. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    An extremely simple program for analyzing sentence readability and detecting repetitive word use in DOCX files. Allows custom word lists in TXT format and outputs an XLSX file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    pyLogos

    Qualitative content analysis software.

    pyLogos is a program to support text content analysis. Documents (imported from txt and docx files) are stored in a database, and may have marked text segments associated with codes. It is possible to retrieve these segments in various ways, generate word clouds, tabulate frequency of codes and words, among other outputs. pyLogos é um programa de apoio à análise de conteúdo de textos.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    candy-doc

    candy-doc

    An advance document editor

    Welcome to Candy Doc Editor: Redefining Advanced Document Editing Candy Doc Editor is a cutting-edge document editor designed to elevate your productivity and creativity. Packed with powerful features and a sleek, user-friendly interface, it provides everything you need to craft dynamic and professional documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    AnyTXT Searcher

    AnyTXT Searcher

    A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

    AnyTXT Searcher is a powerful file full-text search engine, a desktop search application for fast document retrieval. Just like a local disk Google search engine, much faster than Windows Search, it is your ideal desktop file content full-text search engine. It has a powerful document parsing engine built in, which extracts the text of commonly used file formats without installing any other software, and combines the built-in high-speed indexing system to store the metadata of the...
    Leader badge
    Downloads: 2,358 This Week
    Last Update:
    See Project
  • 19
    This is an implementation of brazilian portuguese metaphone. Its importance is on strings lookups like names, streets names, words (aspell) etc, made as simple as possible. There is a very large text analysis field to explore with it.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Universal Sentence Encoder

    Universal Sentence Encoder

    Encoder of greater-than-word length text trained on a variety of data

    The Universal Sentence Encoder (USE) is a pre-trained deep learning model designed to encode sentences into fixed-length embeddings for use in various natural language processing (NLP) tasks. It leverages Transformer and Deep Averaging Network (DAN) architectures to generate embeddings that capture the semantic meaning of sentences. The model is designed for tasks like sentiment analysis, semantic textual similarity, and clustering, and provides high-quality sentence representations in a...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    pdf combiner merger converter splitter

    pdf combiner merger converter splitter

    PDF Combiner is a user-friendly, GUI-based tool built in

    PDF Combiner is a user-friendly open source free to use, GUI-based tool for combining, pdf to excel, pdf to word, image to pdf, zip, unzip annotate and splitting PDF files. It is easy to use, supports multiple file insert and delete and process, and allows you to adjust the order of files before combining.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    funNLP

    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    ...It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). It also includes links to academic papers, open-source model implementations, and practical utilities like word segmentation or text cleaning scripts. The project is highly community-oriented, frequently updated with contributions and new resources, and it’s widely used in both academic and applied NLP research. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    werpy

    Python package for fast Word Error Rate (WER) calculation and analysis

    ...Moreover, the summary function furnishes a comprehensive breakdown of the computed outcomes, facilitating a swift and detailed analysis of individual errors.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    Polish to English translation aid

    Polish to English translation aid. Automatic dictionary presentation.

    Program displays morphological analysis and English word equivalents to enable a reader with little knowledge of Polish besides grammatical principles to read or translate Polish text. The program is in a prototype stage. The present mode of use is to take a Polish text and annotate with word by word morphological analysis and English equivalents for each possible Polish lemma.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Mycat2

    Mycat2

    MySQL Proxy using Java NIO based on Sharding SQL, Calcite

    Open source intermediate for the MySQL database network protocol written in Java language, open source of GPLv3 protocol. Custom Calcite distributed query engine, edit SQL to relationship algebra expression, rule optimization engine, and cost optimization engine, generate physical execution plan, support logic view. Any cross-currant cross-table join query, support cross-currant cross-table non-associated query, support cross-currant cross-table association query, support cross-currant...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
MongoDB Logo MongoDB