Showing 652 open source projects for "data"

View related business solutions
  • Build AI Apps with Gemini 3 on Vertex AI Icon
    Build AI Apps with Gemini 3 on Vertex AI

    Access Google’s most capable multimodal models. Train, test, and deploy AI with 200+ foundation models on one platform.

    Vertex AI gives developers access to Gemini 3—Google’s most advanced reasoning and coding model—plus 200+ foundation models including Claude, Llama, and Gemma. Build generative AI apps with Vertex AI Studio, customize with fine-tuning, and deploy to production with enterprise-grade MLOps. New customers get $300 in free credits.
    Try Vertex AI Free
  • Cut Cloud Costs with Google Compute Engine Icon
    Cut Cloud Costs with Google Compute Engine

    Save up to 91% with Spot VMs and get automatic sustained-use discounts. One free VM per month, plus $300 in credits.

    Save on compute costs with Compute Engine. Reduce your batch jobs and workload bill 60-91% with Spot VMs. Compute Engine's committed use offers customers up to 70% savings through sustained use discounts. Plus, you get one free e2-micro VM monthly and $300 credit to start.
    Try Compute Engine
  • 1
    Tangent

    Tangent

    Source-to-source debuggable derivatives in pure Python

    Existing libraries implement automatic differentiation by tracing a program's execution (at runtime, like PyTorch) or by staging out a dynamic data-flow graph and then differentiating the graph (ahead-of-time, like TensorFlow). In contrast, Tangent performs ahead-of-time autodiff on the Python source code itself, and produces Python source code as its output. Tangent fills a unique location in the space of machine learning tools. As a result, you can finally read your automatic derivative code just like the rest of your program. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    AI learning

    AI learning

    AiLearning, data analysis plus machine learning practice

    We actively respond to the Research Open Source Initiative (DOCX) . Open source today is not just open source, but datasets, models, tutorials, and experimental records. We are also exploring other categories of open source solutions and protocols. I hope you will understand this initiative, combine this initiative with your own interests, and do what you can. Everyone's tiny contributions, together, are the entire open source ecosystem. We are iBooker, a large open-source community,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    WikiSQL

    WikiSQL

    A large annotated semantic parsing corpus for developing NL interfaces

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is the dataset released along with our work Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. Regarding tokenization and Stanza, when WikiSQL was written 3-years ago, it relied on Stanza, a CoreNLP python wrapper that has since been deprecated. If you'd still like to use the tokenizer, please use the docker image. We do not anticipate switching...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    auto_ml

    auto_ml

    Automated machine learning for analytics & production

    ...Here's an example that includes serializing and loading the trained model, then getting predictions on single dictionaries, roughly the process you'd likely follow to deploy the trained model. Before you go any further, try running the code. Load up some data (either a DataFrame, or a list of dictionaries, where each dictionary is a row of data). Make a column_descriptions dictionary that tells us which attribute name in each row represents the value we’re trying to predict. Pass all that into auto_ml, and see what happens! You can pass in your own function to perform feature engineering on the data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    Seq2seq Chatbot for Keras

    Seq2seq Chatbot for Keras

    This repository contains a new generative model of chatbot

    ...The trained model available here used a small dataset composed of ~8K pairs of context (the last two utterances of the dialogue up to the current point) and respective response. The data were collected from dialogues of English courses online. This trained model can be fine-tuned using a closed-domain dataset to real-world applications. The canonical seq2seq model became popular in neural machine translation, a task that has different prior probability distributions for the words belonging to the input and output sequences since the input and output utterances are written in different languages. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    ...Of course you may specify JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R Scripts. We have also created plugins for more statistical functions, and Big Data Analytics with Microsoft Azure HDInsights (Spark Server) with Livy.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7

    PyDaMelo

    Python-compatible Data mining elementary objects

    An attempt at offering machine learning and data mining algorithms at the finest grain we are able to, easy to combine together through Python scripting to glue together the Lego-like bricks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Retrieval-Based Conversational Model

    Retrieval-Based Conversational Model

    Dual LSTM Encoder for Dialog Response Generation

    ...The core idea is to embed both the conversation context and potential replies into vector representations, then score how well each candidate fits the current dialogue, choosing the best match accordingly. Designed to work with datasets like the Ubuntu Dialogue Corpus, this codebase includes data preparation, model training, and evaluation components for building and assessing dialog models that can handle multi-turn conversations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    C++ library for working with OWL ontologies
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit for Your Google Cloud Projects Icon
    $300 in Free Credit for Your Google Cloud Projects

    Build, test, and explore on Google Cloud with $300 in free credit. No hidden charges. No surprise bills.

    Launch your next project with $300 in free Google Cloud credit—no hidden charges. Test, build, and deploy without risk. Use your credit across the Google Cloud platform to find what works best for your needs. After your credits are used, continue building with free monthly usage products. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 10

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. ...
    Leader badge
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Feed-forward neural network for python
    ffnet is a fast and easy-to-use feed-forward neural network training solution for python. Many nice features are implemented: arbitrary network connectivity, automatic data normalization, very efficient training tools, network export to fortran code. Now ffnet has also a GUI called ffnetui.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    ExSTraCS

    ExSTraCS

    Extended Supervised Tracking and Classifying System

    This advanced machine learning algorithm is a Michigan-style learning classifier system (LCS) developed to specialize in classification, prediction, data mining, and knowledge discovery tasks. Michigan-style LCS algorithms constitute a unique class of algorithms that distribute learned patterns over a collaborative population of of individually interpretable IF:THEN rules, allowing them to flexibly and effectively describe complex and diverse problem spaces. ExSTraCS was primarily developed to address problems in epidemiological data mining to identify complex patterns relating predictive attributes in noisy datasets to disease phenotypes of interest. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Neural Libs

    Neural Libs

    Neural network library for developers

    ...The project also includes examples of the use of neural networks as function approximation and time series prediction. Includes a special program makes it easy to test neural network based on training data and the optimization of the network.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    VADER

    VADER

    Lexicon and rule-based sentiment analysis tool

    VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool designed for analyzing the sentiment of text, particularly in social media and short text formats. It is optimized for quick and accurate analysis of positive, negative, and neutral sentiments.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Hydroponic Automation Platform (HAPI)

    Hydroponic Automation Platform (HAPI)

    Technologies for automating food production on various scales

    The Hydroponic Automation Platform Initiative (HAPI) develops and provides hardware and software components for automating food production using hydroponic, aquaponics, and precision agriculture techniques. High-yield production in urban settings is one of the primary goals. Artifacts include hardware design (mainly Arduino-based), firmware, management software and reporting modules.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    FineSplice

    FineSplice

    Enhanced splice junction detection and estimation from RNA-Seq data

    FineSplice is a Python wrapper to TopHat2 geared towards a reliable identification of expressed exon junctions from RNA-Seq data, at enhanced detection precision with small loss in sensitivity. Following alignment with TopHat2 using known transcript annotations, FineSplice takes as input the resulting BAM file and outputs a confident set of expressed splice junctions with the corresponding read counts. Potential false positives arising from spurious alignments are filtered out via a semi-supervised anomaly detection strategy based on logistic regression. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17

    ProximityForest

    Efficient Approximate Nearest Neighbors for General Metric Spaces

    A proximity forest is a data structure that allows for efficient computation of approximate nearest neighbors of arbitrary data elements in a metric space. See: O'Hara and Draper, "Are You Using the Right Approximate Nearest Neighbor Algorithm?", WACV 2013 (best student paper award). One application of a ProximityForest is given in the following CVPR publication: Stephen O'Hara and Bruce A.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Fast Artificial Neural Network Library is a free open source neural network library, which implements multilayer artificial neural networks in C with support for both fully connected and sparsely connected networks. Cross-platform execution in both fixed and floating point are supported. It includes a framework for easy handling of training data sets. It is easy to use, versatile, well documented, and fast. Bindings to more than 15 programming languages are available. An easy to read introduction article and a reference manual accompanies the library with examples and recommendations on how to use the library. Several graphical user interfaces are also available for the library.
    Downloads: 74 This Week
    Last Update:
    See Project
  • 19
    TextBlob

    TextBlob

    TextBlob is a Python library for processing textual data

    Simple, Pythonic, text processing, Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both. Supports word inflection (pluralization and singularization) and lemmatization,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition. full installation and usage instructions given at http://sourceforge.net/p/rnnl/wiki/Home/
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    A High-Order Multi-Variate Approximation Scheme for Arbitrary Data Sets, C implementation of the method described in http://web.mit.edu/qiqi/www/paper/interpolation.pdf, with Python and Fortran interfaces.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    pyIRDG

    pyIRDG

    IMDb Relational Dataset Generator

    pyIRDG is a program written in Python to generate relational datasets in Prolog format. It uses data from the Internet Movie Database in combination with IMDbPY as backend. A graphical user interface written in pyQt allows the user to link multiple entities together as model for the generation process. The big four entities are Title, Person, Company and Character. Many attributes can be chosen for adding to the output .pl file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Open Exchange (OpEx)

    Open Exchange (OpEx)

    The open source Algorithmic Trading System

    OpEx is an application suite that includes the main building blocks of commercial electronic trading systems. All OpEx applications run on distributed system architectures.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    Bamboo Engine

    Game framework on top of Python, Panda3D and Twisted

    Bamboo intends to be a complete end-to-end game framework for client/server applications using Twisted for data exchange, Panda3D for rendering and coded in Python. Support for PyPy/CPython may be considered at a later point. An Extreme/Agile Development model is in use to allow for emergent design (IE: changing requirements). Release is updated whenever a feature is added and all tests pass cleanly 100%
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    ViAmI-Server

    ViAmI-Server

    Pattern recognition for ADL events

    This software uses computer vision algorithms for mining sequence data from telemonitoring data with CBRs. We propose an approach which treats the detection of changes in behavior detected with a sensor/video fusion, which occur at radically different time-scales, through a CBR in two levels: low and high level. The system is always updating the database with the daily data.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB