Showing 352 open source projects for "data analysis"

  • 1
    Kohonen neural network library is a set of classes and functions for designing, training and using Kohonen networks (self-organizing maps), an AI algorithm that is a useful tool for data mining and knowledge discovery (http://knnl.sf.net).
    Downloads: 0 This Week
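    A self-organizing map can be sketched compactly. The following is a minimal pure-Python illustration, not this library's API: the function names (train_som, best_matching_unit), the 1-D chain of units, the Gaussian neighbourhood and the linear decay schedules are all assumptions chosen for brevity. Each step finds the best-matching unit for a sample and pulls it, and its grid neighbours, toward that sample.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (Euclidean) to sample x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_units, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D self-organizing map (a chain of n_units neurons) on `data`.

    data: list of equal-length numeric vectors. Returns the unit weight vectors.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every unit with a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    if radius0 is None:
        radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink linearly over time.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood based on grid distance |j - bmu|.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights
```

    Because every update moves a weight only part of the way toward a sample, the trained weights stay inside the range spanned by the data.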
  • 2
    GUI Ant-Miner is a tool for extracting classification rules from data. It is an updated version of a data mining algorithm called Ant-Miner (Ant Colony-based Data Miner).
    Downloads: 0 This Week
  • 3
    Spark Python Notebooks

    Apache Spark & Python (pySpark) tutorials for Big Data Analysis

    Spark Python Notebooks is a curated collection of example Jupyter notebooks designed to help developers and data engineers learn Apache Spark using Python in an interactive environment. Rather than only providing static code files, this project uses notebooks to teach practical data processing workflows, exposing users to real Spark programming patterns like working with RDDs, DataFrames, and distributed computations. These notebooks often demonstrate how to transform, analyze, and visualize...
    Downloads: 0 This Week
  • 4
    Twitter Research Data Collector
    Collects tweets through the Twitter Streaming API according to different search criteria and saves them in CSV and ARFF (WEKA) file formats.
    Downloads: 0 This Week
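    For readers unfamiliar with the ARFF format mentioned above, here is a hedged sketch of writing collected tweets as a WEKA ARFF file. The attribute layout (tweet_id, user, text), the relation name and the helper names are hypothetical, not this tool's actual output, and the backslash escaping follows the usual WEKA convention as we understand it; check it against your WEKA version.

```python
def arff_quote(value):
    """Quote a string value for ARFF, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
                    .replace("'", "\\'")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r")
                    .replace("\t", "\\t"))
    return "'" + escaped + "'"


def tweets_to_arff(tweets, relation="tweets"):
    """Serialise (tweet_id, user, text) tuples as an ARFF document string."""
    lines = [f"@RELATION {relation}", "",
             "@ATTRIBUTE tweet_id NUMERIC",
             "@ATTRIBUTE user STRING",
             "@ATTRIBUTE text STRING",
             "", "@DATA"]
    for tweet_id, user, text in tweets:
        # One comma-separated row per tweet, strings single-quoted.
        lines.append(f"{tweet_id},{arff_quote(user)},{arff_quote(text)}")
    return "\n".join(lines) + "\n"
```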
  • 5
    Mass-based dissimilarity

    A data dependent dissimilarity measure based on mass estimation.

    This software calculates the mass-based dissimilarity matrix for data mining algorithms that rely on a distance measure. Reference: Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure. KDD 2016. http://dx.doi.org/10.1145/2939672.2939779. The source code, presentation slides and poster are attached under "Files". The presentation video from KDD 2016 is published at https://youtu.be/eotD_-SuEoo. Since this software is licensed...
    Downloads: 0 This Week
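    The core idea from the cited KDD 2016 paper, as we understand it, is that the dissimilarity between x and y is the probability mass (the fraction of the data) in the smallest region that covers both points, averaged over an ensemble of random partitioning trees built on small subsamples. The sketch below is a simplified illustration, not the project's code: the iForest-style random axis-parallel splits, the height limit of 8 and all function names are assumptions.

```python
import random


def build_tree(sample, rng, depth=0, max_depth=8):
    """Random axis-parallel partitioning tree (iForest-style); None marks a leaf."""
    if len(sample) <= 1 or depth >= max_depth:
        return None
    # Only attributes that still vary inside this node can be split.
    dims = [d for d in range(len(sample[0]))
            if min(p[d] for p in sample) < max(p[d] for p in sample)]
    if not dims:
        return None
    d = rng.choice(dims)
    split = rng.uniform(min(p[d] for p in sample), max(p[d] for p in sample))
    left = [p for p in sample if p[d] < split]
    right = [p for p in sample if p[d] >= split]
    return (d, split,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))


def build_forest(data, n_trees=100, sample_size=16, seed=0):
    """Build n_trees trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    return [build_tree(rng.sample(data, psi), rng) for _ in range(n_trees)]


def region_mass(tree, x, y, data):
    """Fraction of `data` in the deepest tree node that contains both x and y."""
    inside, node = data, tree
    while node is not None:
        d, split, left, right = node
        x_left = x[d] < split
        if x_left != (y[d] < split):
            break  # x and y are separated here, so this node is R(x, y)
        inside = [p for p in inside if (p[d] < split) == x_left]
        node = left if x_left else right
    return len(inside) / len(data)


def mass_dissimilarity(forest, x, y, data):
    """Average region mass over all trees (smaller means more similar)."""
    return sum(region_mass(t, x, y, data) for t in forest) / len(forest)
```

    Two points that stay together deep into the trees share a region holding little of the data and get a low value; points split near the root fall back to a large region, up to the whole data set (value 1).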
  • 6

    BioC

    We describe a simple XML format to share text documents and annotations

    A minimalist approach to sharing text documents and data annotations, allowing a large number of different annotations to be represented. Project files contain:
    - simple code to hold/read/write data and perform sample processing
    - BioC-formatted corpora
    - BioC tools that work with BioC corpora
    BioC goals:
    - simplicity
    - interoperability
    - broad use
    - reuse
    There should be little investment required to learn to use a format or a software module to process that format. We are...
    Downloads: 0 This Week
  • 7
    Open Cezeri Library

    Effective Linear Algebra and Computer Vision Library with JAVA

    OCL stands for Open Cezeri Library (yet another linear algebra and matrix library). It enables rapid coding with MATLAB-like ease of use. To learn the library, try the test examples in OpenCezeriLibrary\test\test. It was originally developed at the El-Cezeri laboratory of Siirt University to establish a generic framework of reusable components and software tools for machine vision, machine learning, AI and robotic applications. Currently, it includes the following main concepts: 1-...
    Downloads: 0 This Week
  • 8

    graphnet

    AI project using a graph technique on written text.

    Experiments using a data sequencing technique on English language sentences.
    Downloads: 0 This Week
  • 9
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
  • 10

    Accelerated Feature Extraction Tool

    A fast, GPU-accelerated feature extraction tool for speech analysis

    A fast feature extraction tool for speech analysis and processing. It incorporates standard MFCC, PLP, and TRAPS features. The tool is specially designed to process very large audio data sets. It uses GPU acceleration if a compatible GPU is available (CUDA as well as OpenCL; NVIDIA, AMD, and Intel GPUs are supported). When no compatible GPU is present, it falls back to CPU SSE intrinsics.
    Downloads: 0 This Week
  • 11

    Natural Language Analysis with Ngrams

    NLP tool for statistical analysis of words, sentences, and documents

    The goal of this project is to provide an NLP tool that gives statistical analysis results based on Google Ngram data. At present, it is just a NetBeans project without a final JAR. There will also be a GitHub version for anyone who wishes to contribute. In future versions, users will be able to convert a single word to numerical data, compare two words and get the comparison data, and do the same for sentences, paragraphs and documents. ...
    Downloads: 0 This Week
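    Statistical n-gram analysis starts from counting word sequences. This minimal sketch counts n-grams in raw text and reports a relative frequency; it does not read the Google Ngram files the project is based on, and its naive whitespace tokenizer and function names are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Lower-case, split on whitespace, and count n-gram frequencies."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


def relative_frequency(counts, gram):
    """Share of all counted n-grams that equal `gram` (0.0 for empty counts)."""
    total = sum(counts.values())
    return counts[gram] / total if total else 0.0
```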
  • 12
    VADER

    Lexicon and rule-based sentiment analysis tool

    VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool designed for analyzing the sentiment of text, particularly in social media and short text formats. It is optimized for quick and accurate analysis of positive, negative, and neutral sentiments.
    Downloads: 1 This Week
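    To illustrate what a lexicon and rule-based scorer does, here is a toy in the same spirit. It is not VADER and does not use VADER's lexicon: the word valences, the three-token negation window and the negation factor are hypothetical. The only piece borrowed from VADER is the normalization x / sqrt(x² + α) with α = 15, which maps the summed valence into (-1, 1) like VADER's compound score.

```python
import math

# Hypothetical toy lexicon (word -> valence); VADER ships its own, much larger one.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1, "love": 3.2}
NEGATIONS = {"not", "never", "no"}
NEGATION_FACTOR = -0.74  # hypothetical: flips and damps a negated word's valence


def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def compound_score(text):
    """Sum lexicon valences with a simple negation rule, then normalize."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok, 0.0)
        # Rule: a negation among the three preceding tokens modifies the valence.
        if valence and any(t in NEGATIONS for t in tokens[max(0, i - 3):i]):
            valence *= NEGATION_FACTOR
        total += valence
    return normalize(total)
```

    For real use, the vaderSentiment package exposes SentimentIntensityAnalyzer().polarity_scores(text), which returns neg, neu, pos and compound scores.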
  • 13
    Text Expander, Inverse summarizer

    Expand text, inverse summarizer

    NOTE: works with Java Development Kit (JDK) 1.7 only. This program is a data miner and a knowledge miner. It does exactly the opposite of what text summarizers do: a text summarizer produces a shortened text from an input text, whereas an inverse summarizer takes a shortened input (or a similar or identical text) and performs the process in reverse, producing an expanded text. It can be used with any text or notes that have knowledge gaps. It is a great aid to any creative...
    Downloads: 0 This Week
  • 14

    Chordalysis

    Log-linear analysis (data modelling) for high-dimensional data

    Project moved to https://github.com/fpetitjean/Chordalysis
    Log-linear analysis is a statistical method used to capture multi-way relationships between variables. However, due to its exponential nature, previous approaches could not scale up to more than a dozen variables. We present here Chordalysis, a log-linear analysis method for big data. Chordalysis exploits recent discoveries in graph theory by representing complex models as compositions of triangular structures, also known as chordal graphs. ...
    Downloads: 0 This Week
  • 15
    MODLEM

    Rule-based, WEKA-compatible machine learning algorithm

    This project is a WEKA (Waikato Environment for Knowledge Analysis) compatible implementation of MODLEM, a machine learning algorithm that induces a minimal set of rules. These rules can be used as a classifier (in ML terms). It is a sequential covering algorithm designed to handle numeric data without discretization. Nominal and numeric attributes are treated in the same way: the attribute space is searched to find the best rule condition during rule induction. ...
    Downloads: 16 This Week
  • 16

    KMeansAniX

    Animation of k-means clustering using the X Window System

    Open source animation of k-means clustering in the X Window System using the C++ libplotter library. Supports Linux, Mac, and BSD. Includes common initialization methods such as Forgy, MacQueen, random, and angular. Sample videos are available through the Files tab above. The SVN repo is accessible through the Code tab above. Requires a C++ compiler, libplot-dev, and libncurses5-dev; on Mac, use MacPorts plotutils +x11 instead of libplot-dev.
    Downloads: 0 This Week
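    The Forgy initialization listed above picks k distinct data points as the starting centroids; Lloyd's iterations then alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below shows that loop in plain Python without the animation; it is not KMeansAniX's code, and the function name and empty-cluster rule (keep the old centroid) are assumptions.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with Forgy initialisation (k distinct data points as seeds)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assignment
```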
  • 17

    Persica-A new Persian corpus for NLP

    This project presents a new corpus for news text analysis in Persian

    Despite the surge in text data, the lack of a multi-purpose text corpus is a serious bottleneck in text mining and natural language processing, especially for Persian. This project presents a new corpus for news article analysis in Persian called Persica. News analysis includes news classification, topic discovery and classification, category classification and many more procedures. Working with news has special requirements, first of all a valid and reliable corpus on which to perform experiments. ...
    Downloads: 2 This Week
  • 18
    Graphical Grammar Studio

    A user-friendly grammar tool for natural language processing tasks

    Full documentation with tutorials is included in the download package. Graphical Grammar Studio is a tool for applying grammars that behave as word acceptors/consumers and annotators. GGS grammars can be used to find and annotate sequences of words that satisfy certain conditions in a given input. Its purpose is to create NLP tools such as phrase chunkers, named entity finders, pronoun co-reference solvers, etc. A grammar is represented by a state machine which can be visualized, edited...
    Downloads: 0 This Week
  • 19

    SocialModeler

    A set of tools for analyzing open source social media

    SocialModeler leverages natural language processing and statistical text analysis approaches to quickly analyze and explore social media data (e.g. news articles or blogs). It uses an application-based user interface for configuration and analysis.
    Downloads: 0 This Week
  • 20
    TextBlob

    TextBlob is a Python library for processing textual data

    Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both. Supports word inflection (pluralization and singularization) and lemmatization,...
    Downloads: 0 This Week
  • 21
    Unsupervised TXT classifier

    Classify any two TXT documents, no training required - JAVA

    ...This extracts a relevant structure for both documents (and thus avoids over-training); the structures are then compared using vector-space analysis to give the degree to which one document belongs to the other (and thus avoids the shortage of information). This method can be used to create user-defined classes by merging texts of certain categories and then calculating the relevant distances between the documents, but this is not necessary.
    Downloads: 0 This Week
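    The vector-space comparison mentioned above is commonly done with cosine similarity between term-frequency vectors. This sketch shows only that comparison step; the project's structure-extraction step is not reproduced, and the tokenization and function names are illustrative assumptions.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lower-cased, edge punctuation stripped)."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0 = no shared terms)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)
```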
  • 22
    LExAu: Learning Expectations Autonomously. Library for online, data-driven statistical machine learning.
    Downloads: 0 This Week
  • 23
    feed4weka is an open library that enriches Weka (http://www.cs.waikato.ac.nz/ml/weka/), an open source project for data analysis. It integrates new classification and clustering algorithms, and adds co-clustering and outlier detection frameworks.
    Downloads: 0 This Week
  • 24

    EMGU Face Recognition

    Using EMGU to perform Principal Component Analysis (PCA)

    ...Face recognition has always been a popular subject in image processing, and this article builds upon the good work by Sergio Andrés Gutiérrez Rojas and his original article (CodeProject). Face recognition is popular not only because of its real-world applications but also because of the common use of principal component analysis (PCA). PCA is well suited to recognising statistical patterns in data. Face recognition is also popular because a user can apply a method easily and see whether it is working without needing to know too much about how the process works.
    Downloads: 0 This Week
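    Eigenfaces-style recognition flattens each face image into a vector and projects it onto the leading principal components. As a hedged, library-free illustration (not EMGU's API), this sketch finds only the first principal axis by power iteration on the covariance matrix and projects a point onto it; a real system would use a proper eigensolver and keep several components. The function names are hypothetical.

```python
import math


def first_principal_component(rows, iters=200):
    """Mean vector and leading principal axis of `rows` (needs >= 2 rows)."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    centered = [[r[d] - means[d] for d in range(dim)] for r in rows]
    # Sample covariance matrix, dim x dim.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0] * dim  # starting guess; must not be orthogonal to the top axis
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v
        v = [x / norm for x in w]
    return means, v


def project(row, means, axis):
    """Coordinate of `row` along `axis` after centering (its PCA score)."""
    return sum((x - m) * a for x, m, a in zip(row, means, axis))
```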
  • 25
    JProGraM (PRObabilistic GRAphical Models in Java) is a statistical machine learning library. It supports statistical modeling and data analysis along three main directions: (1) probabilistic graphical models (Bayesian networks, Markov random fields, dependency networks, hybrid random fields); (2) parametric, semiparametric, and nonparametric density estimation (Gaussian models, nonparanormal estimators, Parzen windows, Nadaraya-Watson estimator); (3) generative models for random networks (small-world, scale-free, exponential random graphs, Fiedler random fields), subgraph sampling algorithms (random walk, snowball, etc.), and spectral decomposition.
    Downloads: 0 This Week
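    Two of the estimators listed under (2) have short closed forms. With a kernel K and bandwidth h, the Nadaraya-Watson estimate is m(x) = Σ K((x - x_i)/h) y_i / Σ K((x - x_i)/h), and the Parzen-window density is f(x) = Σ K((x - x_i)/h) / (n h). The sketch below uses a Gaussian kernel, which is our assumption; it is plain Python with hypothetical function names, not JProGraM's Java API.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def nadaraya_watson(x, xs, ys, bandwidth):
    """Kernel-weighted average of ys, weighting sample i by K((x - x_i) / h)."""
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def parzen_density(x, xs, bandwidth):
    """Parzen-window (kernel) density estimate at x."""
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in xs) / (len(xs) * bandwidth)
```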