Showing 22 open source projects for "topic modeling"

  • 1
    BERTopic

    Leveraging BERT and c-TF-IDF to create easily interpretable topics

    BERTopic is a topic modeling technique that leverages transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions. BERTopic supports guided, supervised, semi-supervised, manual, long-document, hierarchical, class-based, dynamic, and online topic modeling. It even supports visualizations similar to LDAvis!
    Downloads: 1 This Week
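As a rough illustration of the class-based TF-IDF (c-TF-IDF) idea mentioned in the BERTopic entry, the sketch below scores each term per cluster as its in-cluster frequency times log(1 + A / f), where A is the average token count per cluster and f is the term's frequency over all clusters. This follows the weighting as it is commonly described for c-TF-IDF. The function name `c_tf_idf` and the toy data are assumptions made for illustration, and this is not BERTopic's own implementation or API.

```python
import math
from collections import Counter


def c_tf_idf(classes):
    """Score terms per class; `classes` maps a label to a non-empty token list.

    Hypothetical helper for illustration only (not part of BERTopic).
    """
    counts = {label: Counter(tokens) for label, tokens in classes.items()}

    # Frequency of each term across all classes.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)

    # Average number of tokens per class (the "A" in the formula).
    avg_tokens = sum(total.values()) / len(classes)

    scores = {}
    for label, class_counts in counts.items():
        n_tokens = sum(class_counts.values())
        scores[label] = {
            term: (freq / n_tokens) * math.log(1 + avg_tokens / total[term])
            for term, freq in class_counts.items()
        }
    return scores


# Toy clusters: "team" occurs in both, so it is down-weighted in each.
clusters = {
    "sports": ["ball", "goal", "team", "ball"],
    "cooking": ["pan", "salt", "team", "pan"],
}
scores = c_tf_idf(clusters)
top_sports = max(scores["sports"], key=scores["sports"].get)
print(top_sports)
```

Terms that are frequent inside one cluster but rare elsewhere score highest, which is what makes the resulting topic descriptions readable.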
  • 2
    MathModel

    Mathematical Modeling for Graduate Students, Mathematical Modeling

    ...The repository is structured by topic and resource type so that users can more easily find templates, solved problems, and methodological notes. It also includes auxiliary educational materials like references, recommended textbooks, and guidebooks on mathematical modeling theory.
    Downloads: 0 This Week
  • 3
    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 2 This Week
  • 4
    NLP

    Open source NLP guide with models, methods, and real use cases

    ...It covers core NLP concepts such as text representation, feature extraction, and model evaluation, alongside hands-on implementations using tools like Word2Vec, TF-IDF, and FastText. It also introduces topic modeling with LDA, keyword extraction techniques, and document similarity methods. NLP extends into real-world applications, including sentiment analysis and text classification, helping readers connect concepts to use cases. Designed for accessibility, the project evolves over time, allowing updates and improvements as NLP techniques advance. ...
    Downloads: 1 This Week
  • 5
    PaperAI

    Semantic search and workflows for medical/scientific papers

    PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.
    Downloads: 0 This Week
  • 6
    Personal Security Checklist

    A compiled checklist of 300+ tips for protecting digital security

    Personal Security Checklist is a comprehensive, plain-language checklist for improving personal digital security and privacy across devices, accounts, and everyday workflows. It’s organized so that complete beginners can make quick, high-impact changes, while advanced users can dig into deeper hardening steps. The guidance spans topics like passwords, 2FA, device encryption, browser hygiene, network safety, backups, and incident response planning. Each section breaks recommendations into...
    Downloads: 0 This Week
  • 7
    ktrain

    ktrain is a Python library that makes deep learning and AI more accessible

    ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. ktrain is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines...
    Downloads: 0 This Week
  • 8
    Algorithms Math Models

    MATLAB implementations of algorithms

    ...The codebase is organized into topic folders (e.g., HeuristicAlgorithm, IntegerProgramming, NeuralNetwork, TimeSeries) and includes dozens of worked examples and links to textbook/source materials that the author used to assemble the collection.
    Downloads: 0 This Week
  • 9
    codeforces-go

    Codeforces solutions in Go

    ...Reference links or book chapters (good material); template code (may contain comments and usage instructions); template supplements (extra code for common question types, modeling tips, etc.); related topic links (template questions, classic questions, thinking-conversion questions, etc.). The main goal of this stage is to improve the ability to observe problems. Solving constructive problems trains this skill in a targeted way. Choose constructive problems (tag: constructive algorithms) whose difficulty ranges from your own rating to rating+200, and solve them in descending order of the number of people who have solved them.
    Downloads: 2 This Week
  • 10
    Texthero

    Text preprocessing, representation and visualization from zero to hero

    Texthero is a Python package for working with text data efficiently. It gives NLP developers a tool to quickly understand any text-based dataset and provides a solid pipeline to clean and represent text data, from zero to hero.
    Downloads: 0 This Week
  • 11
    Ad-papers

    Papers on Computational Advertising

    The Ad-papers repository is a curated collection of influential research papers focused on the fields of advertising technology, recommendation systems, and applied machine learning in online platforms. The repository organizes academic and industry papers that explore how machine learning algorithms can be used to improve ad targeting, user modeling, click-through rate prediction, and personalized recommendation systems. These papers represent key developments in large-scale industrial machine learning systems used by digital advertising platforms. The repository categorizes papers by topic and provides links to research publications, allowing readers to easily explore the evolution of machine learning techniques in advertising and recommendation domains. ...
    Downloads: 3 This Week
  • 12

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus, compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on...
    Downloads: 28 This Week
  • 13
    WEMax

    Work Efficiency Maximize Framework: Analyze work and line processes

    (Work from 2013.) WEMax proposes a framework for the continuous performance improvement of manufacturing lines. The WEMax framework consists of two main activities: 1) assembly work process improvement, including time and motion study, and 2) improvement of line balance efficiency. Although there have been numerous studies on this topic, most of them deal with partial issues rather than the continuous performance improvement of the whole assembly line, which this paper addresses. To...
    Downloads: 0 This Week
  • 14
    DMTK

    Microsoft Distributed Machine Learning Toolkit

    ...This architecture allows developers to build machine learning systems capable of processing massive datasets and training complex models with reduced infrastructure requirements. DMTK also includes several specialized algorithms and systems, such as LightLDA for large-scale topic modeling and distributed implementations of word embedding techniques used in natural language processing.
    Downloads: 0 This Week
  • 15

    Reviz-it

    Software tools to re-tell stories in a better way and expand them

    ... - Use the inspiring word clouds to rephrase the story in an original way, then expand it. Enrich with various text mining algorithms to retrieve automatically the different ways the same thing is said in a given context (series of publications on same topic or from same organization for example): latent semantic analysis, topic modeling, rule-based text mining, etc. This allows rewriting a text with the specific 'style' of a corpus.
    Downloads: 0 This Week
  • 16
    Twitter Research Data Collector
    Collects tweets through the Twitter Streaming API according to different search criteria and saves them in CSV and ARFF (WEKA) file formats.
    Downloads: 0 This Week
  • 17

    jLDADMM

    A Java package for the LDA and DMM topic models

    The Java package jLDADMM provides alternative choices for topic modeling on normal or short texts. It implements the Latent Dirichlet Allocation (LDA) topic model and the one-topic-per-document Dirichlet Multinomial Mixture (DMM) model (i.e., mixture of unigrams), both using collapsed Gibbs sampling. In addition, jLDADMM supplies a document clustering evaluation to compare topic models. See the usage of jLDADMM on its website at http://jldadmm.sourceforge.net/
    Downloads: 0 This Week
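To make the "collapsed Gibbs sampling" in the jLDADMM entry concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA. It is written in Python rather than Java and is not jLDADMM's code. The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are all assumptions for illustration. Each token's topic is resampled from a distribution proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), with the token's own count removed first.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids.

    Hypothetical helper for illustration only (not jLDADMM's API).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current token from all counts.
                k = assignments[d][n]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assignments, doc_topic, topic_word


# Toy corpus over a vocabulary of 6 word ids.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1], [5, 3, 4, 4]]
z, n_dk, n_kw = lda_gibbs(toy_docs, n_topics=2, vocab_size=6)
print(n_dk)
```

The DMM variant named in the entry differs in that one topic is sampled per document rather than per token, which is why it suits short texts.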
  • 18

    RedLDA

    Redundancy Aware LDA Gibbs Sampler

    Redundancy-Aware Topic Modeling. Copy-paste redundancy and data duplication are prevalent in many corpora. This redundancy has a negative impact on the quality of text mining, and of topic modeling in particular. This software package implements Red-LDA, a novel variant of Latent Dirichlet Allocation (LDA) topic modeling that takes into account the inherent redundancy of corpora when modeling content.
    Downloads: 0 This Week
  • 19

    TextProcessor

    A Java package to preprocess text datasets for subsequent text analysis

    The TextProcessor Java package is a text processing toolkit that provides frequently used text processing functions such as stemming, removing stop words, generating a term vocabulary, and calculating the term-document frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset in LDA and LIBSVM formats for downstream procedures such as classification or clustering. The toolkit is also...
    Downloads: 0 This Week
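The preprocessing steps listed in the TextProcessor entry (stop-word removal, vocabulary generation, term-document frequency matrix, LIBSVM output) can be sketched as follows. This is a minimal Python illustration, not TextProcessor's Java code. The helper names `build_term_doc` and `to_libsvm`, the tiny stop-word list, and the whitespace tokenizer are assumptions, and stemming is left out for brevity.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the toolkit's list).
STOP_WORDS = {"the", "a", "of", "and", "is"}


def build_term_doc(docs, stop_words=STOP_WORDS):
    """Return (vocabulary, matrix) where matrix[d][t] is a term count."""
    tokenized = [
        [tok for tok in doc.lower().split() if tok not in stop_words]
        for doc in docs
    ]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}

    matrix = [[0] * len(vocab) for _ in docs]
    for row, doc in zip(matrix, tokenized):
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
    return vocab, matrix


def to_libsvm(matrix, labels):
    """Render rows as LIBSVM lines: '<label> <index>:<value> ...'.

    Feature indices are 1-based and zero entries are omitted.
    """
    lines = []
    for label, row in zip(labels, matrix):
        feats = " ".join(f"{i + 1}:{c}" for i, c in enumerate(row) if c)
        lines.append(f"{label} {feats}".rstrip())
    return lines


vocab, matrix = build_term_doc(["the cat sat", "the cat and the dog"])
for line in to_libsvm(matrix, labels=[1, 0]):
    print(line)
```

The sparse `index:value` layout keeps feature files small when most terms are absent from most documents.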
  • 20

    Topic Model Alignment

    Aligns two LDA topic models

    This script aligns two topic models produced by MALLET (http://mallet.cs.umass.edu/). Reciprocal topic pairs are reported with their Jensen-Shannon (JS) divergence. A pair (i,j) is reciprocal when the distance between topic i from the first model (M1) and topic j from the second model (M2) is minimal over all pairs (i,k) for k in M2 and over all pairs (l,j) for l in M1, i.e., each topic is the other's best match.
    Downloads: 0 This Week
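The reciprocal-pair rule in the Topic Model Alignment entry can be sketched in a few lines. This is an illustrative Python version, not the project's script. It assumes each topic is given as a probability distribution over a shared vocabulary, and the names `js_divergence` and `reciprocal_pairs` are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs) between two distributions."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def reciprocal_pairs(m1, m2):
    """Return (i, j, distance) where i and j are each other's best match."""
    dist = [[js_divergence(p, q) for q in m2] for p in m1]

    # Closest topic in M2 for every topic in M1, and vice versa.
    best_for_i = [
        min(range(len(m2)), key=lambda j: dist[i][j])
        for i in range(len(m1))
    ]
    best_for_j = [
        min(range(len(m1)), key=lambda i: dist[i][j])
        for j in range(len(m2))
    ]

    return [
        (i, j, dist[i][j])
        for i, j in enumerate(best_for_i)
        if best_for_j[j] == i
    ]


# Two toy models with two topics each over a three-word vocabulary.
model_1 = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
model_2 = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
for i, j, d in reciprocal_pairs(model_1, model_2):
    print(i, j, round(d, 4))
```

Topics without a mutual best match are left unpaired, so the two models do not need the same number of topics.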
  • 21
    A graphical tool to discover topics from collections of text documents.
    Downloads: 0 This Week
  • 22
    K3Studio is a universal workbench for 2D/3D modeling, visualization, and simulation. Its main focus is the simulation and visualization of automata networks, but it can be used for diagram drawing, flowcharting, presentation, as a CAD, GIS,...
    Downloads: 0 This Week