Showing 1214 open source projects for "python data analysis"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Go from Code to Production URL in Seconds Icon
    Go from Code to Production URL in Seconds

    Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try it free
  • 1
    lgo

    lgo

    Interactive Go programming with Jupyter

    ...The project provides a Jupyter kernel for the Go programming language, allowing developers to write and execute Go code interactively in notebook cells similar to how Python is used in data science workflows. This environment combines the strong performance and concurrency features of the Go language with the exploratory and iterative style of notebook-based programming. Developers can execute code snippets, visualize results, and experiment with Go programs in a step-by-step manner without compiling full programs manually. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    maskrcnn-benchmark

    maskrcnn-benchmark

    Fast, modular reference implementation of Instance Segmentation

    Mask R-CNN Benchmark is a PyTorch-based framework that provides high-performance implementations of object detection, instance segmentation, and keypoint detection models. Originally built to benchmark Mask R-CNN and related models, it offers a clean, modular design to train and evaluate detection systems efficiently on standard datasets like COCO. The framework integrates critical components—region proposal networks (RPNs), RoIAlign layers, mask heads, and backbone architectures such as...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    RoboSat

    RoboSat

    Semantic segmentation on aerial and satellite imagery

    RoboSat is an end-to-end pipeline written in Python 3 for feature extraction from aerial and satellite imagery. Features can be anything visually distinguishable in the imagery for example: buildings, parking lots, roads, or cars.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Rainbow

    Rainbow

    Rainbow: Combining Improvements in Deep Reinforcement Learning

    Combining improvements in deep reinforcement learning. Results and pretrained models can be found in the releases. Data-efficient Rainbow can be run using several options (note that the "unbounded" memory is implemented here in practice by manually setting the memory capacity to be the same as the maximum number of timesteps).
    Downloads: 0 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    TGAN

    TGAN

    Generative adversarial training for generating synthetic tabular data

    We are happy to announce that our new model for synthetic data called CTGAN is open-sourced. The new model is simpler and gives better performance on many datasets. TGAN is a tabular data synthesizer. It can generate fully synthetic data from real data. Currently, TGAN can generate numerical columns and categorical columns. TGAN has been developed and runs on Python 3.5, 3.6 and 3.7. Also, although it is not strictly required, the usage of a virtualenv is highly recommended in order to avoid interfering with other software installed in the system where TGAN is run. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    spark-ml-source-analysis

    spark-ml-source-analysis

    Spark ml algorithm principle analysis and specific source code

    spark-ml-source-analysis is a technical repository that analyzes the internal implementation of machine learning algorithms within Apache Spark’s MLlib library. The project aims to help developers and data scientists understand how distributed machine learning algorithms are implemented and optimized inside the Spark ecosystem. Instead of providing a runnable software system, the repository focuses on explaining algorithm principles and examining the underlying source code used in Spark’s machine learning package. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Machine Learning Mindmap

    Machine Learning Mindmap

    A mindmap summarising Machine Learning concepts

    ...The project organizes a wide range of machine learning topics into an interconnected diagram that helps learners understand how concepts relate to one another across the broader field of artificial intelligence. The mind map covers fundamental areas such as data preprocessing, statistical analysis, supervised learning, unsupervised learning, reinforcement learning, and deep learning architectures. By arranging these concepts visually, the repository allows students and practitioners to quickly explore the relationships between algorithms, techniques, and modeling approaches used in modern machine learning workflows. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    TEXT2DATA

    TEXT2DATA

    Text Analytics Platform

    Bring Text Analytics Platform that uses NLP (Natural Language Processing) and Machine Learning to your work environment. Extract essential information from your text documents and let Artificial Intelligence save your time. Get detailed and agile reports on your unstructured data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    CakeChat

    CakeChat

    CakeChat: Emotional Generative Dialog System

    CakeChat is a backend for chatbots that are able to express emotions via conversations. The code is flexible and allows to condition model's responses by an arbitrary categorical variable. For example, you can train your own persona-based neural conversational model or create an emotional chatting machine. Hierarchical Recurrent Encoder-Decoder (HRED) architecture for handling deep dialog context. Multilayer RNN with GRU cells. The first layer of the utterance-level encoder is always...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    xLearn

    xLearn

    High performance, easy-to-use, and scalable machine learning (ML)

    xLearn is a high-performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), all of which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data. Many real-world datasets deal with high dimensional sparse feature vectors like a recommendation system where the number of categories and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    automl-gs

    automl-gs

    Provide an input CSV and a target field to predict, generate a model

    Give an input CSV file and a target field you want to predict to automl-gs, and get a trained high-performing machine learning or deep learning model plus native Python code pipelines allowing you to integrate that model into any prediction workflow. No black box: you can see exactly how the data is processed, and how the model is constructed, and you can make tweaks as necessary. automl-gs is an AutoML tool which, unlike Microsoft's NNI, Uber's Ludwig, and TPOT, offers a zero code/model definition interface to getting an optimized model and data transformation pipeline in multiple popular ML/DL frameworks, with minimal Python dependencies (pandas + scikit-learn + your framework of choice). automl-gs is designed for citizen data scientists and engineers without a deep statistical background under the philosophy that you don't need to know any modern data preprocessing and machine learning engineering techniques to create a powerful prediction workflow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    MUSE

    MUSE

    A library for Multilingual Unsupervised or Supervised word Embeddings

    MUSE is a framework for learning multilingual word embeddings that live in a shared space, enabling bilingual lexicon induction, cross-lingual retrieval, and zero-shot transfer. It supports both supervised alignment with seed dictionaries and unsupervised alignment that starts without parallel data by using adversarial initialization followed by Procrustes refinement. The code can align pre-trained monolingual embeddings (such as fastText) across dozens of languages and provides standardized...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    DS-Take-Home

    DS-Take-Home

    Solution to the book A Collection of Data Science Take-Home Challenge

    DS-Take-Home is a repository that provides practical solutions to a series of real-world data science challenges inspired by the book A Collection of Data Science Take-Home Challenges. The project is designed as a learning resource where aspiring data scientists can study how typical industry-style take-home assignments are solved using data analysis and machine learning techniques. Each challenge is implemented in a separate Jupyter notebook that walks through the process of analyzing datasets, performing exploratory data analysis, building predictive models, and interpreting results. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Zabbix-in-Telegram

    Zabbix-in-Telegram

    Zabbix Notifications with graphs in Telegram

    Zabbix Notifications with graphs in Telegram.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    lazynlp

    lazynlp

    Library to scrape and clean web pages to create massive datasets

    LazyNLP is a lightweight tool for collecting and curating large-scale text datasets for machine learning and NLP applications with minimal manual effort.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Papers with Code

    Papers with Code

    List of different papers for coding

    pwc is an open-source repository that compiles machine learning and artificial intelligence research papers together with their corresponding implementation code. The project functions as a curated dataset linking academic publications with practical software implementations, allowing researchers and engineers to quickly locate code that reproduces published results. The repository organizes information such as paper titles, conferences, and links to code implementations so that users can...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Tensorpack

    Tensorpack

    A Neural Net Training Interface on TensorFlow, with focus on speed

    ...Uses TensorFlow in the efficient way with no extra overhead. On common CNNs, it runs training 1.2~5x faster than the equivalent Keras code. Your training can probably gets faster if written with Tensorpack. Scalable data-parallel multi-GPU / distributed training strategy is off-the-shelf to use. Squeeze the best data loading performance of Python with tensorpack.dataflow. Symbolic programming (e.g. tf.data) does not offer the data processing flexibility needed in research. Tensorpack squeezes the most performance out of pure Python with various auto parallelization strategies. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Learn_Data_Science_in_3_Months

    Learn_Data_Science_in_3_Months

    This is the Curriculum for "Learn Data Science in 3 Months"

    This project lays out a 12-week plan to go from basics to a portfolio-ready understanding of data science. It breaks the journey into clear stages: Python fundamentals, data wrangling, visualization, statistics, machine learning, and end-to-end projects. The schedule mixes learning and doing, encouraging you to build small deliverables each week—like notebooks, dashboards, and model demos—to reinforce skills. It also includes suggestions for datasets and problem domains so you aren’t stuck wondering what to analyze next. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Finetune Transformer LM

    Finetune Transformer LM

    Code for "Improving Language Understanding by Generative Pre-Training"

    ...It documents that runs are non-deterministic due to certain GPU operations and reports a median accuracy over multiple trials that is slightly below the single-run result in the paper, reflecting expected variance in practice. The project ships lightweight training, data, and analysis scripts, keeping the footprint small while making the experimental pipeline transparent. It is provided as archived, research-grade code intended for replication and study rather than continuous development.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    LabelImg

    LabelImg

    Graphical image annotation tool and label object bounding boxes

    LabelImg is a graphical image annotation tool. It is written in Python and uses Qt for its graphical interface. Annotations are saved as XML files in PASCAL VOC format, the format used by ImageNet. Besides, it also supports YOLO and CreateML formats. Linux/Ubuntu/Mac requires at least Python 2.6 and has been tested with PyQt 4.8. However, Python 3 or above and PyQt5 are strongly recommended. Virtualenv can avoid a lot of the QT / Python version issues. Build and launch using the...
    Downloads: 93 This Week
    Last Update:
    See Project
  • 21
    Market Reporter

    Market Reporter

    Automatic Generation of Brief Summaries of Time-Series Data

    Market Reporter automatically generates short comments that describe time series data of stock prices, FX rates, etc. This is an implementation of Murakami et al. This tool stores data to Amazon S3. Ask the manager to give you AmazonS3FullAccess and issue a credential file. For details, please read AWS Identity and Access Management. Install Docker and Docker Compose. Edit envs/docker-compose.yaml according to your environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Dynamic Routing Between Capsules

    Dynamic Routing Between Capsules

    A PyTorch implementation of the NIPS 2017 paper

    Dynamic Routing Between Capsules is a PyTorch implementation of the Capsule Network architecture originally proposed to address limitations in traditional convolutional neural networks. Capsule networks aim to improve how neural models represent spatial hierarchies and relationships between objects within images. Instead of scalar neuron activations, capsules output vectors that encode both the presence of features and their spatial properties such as orientation or pose. The repository...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    TensorImage

    Image classification library for easily training and deploying models

    (Visit our github repository at https://github.com/TensorImage/tensorimage for more information) TensorImage is and open source package for image classification. It has a wide range of data augmentation operations that can be performed over training data to prevent overfitting and increase testing accuracy. TensorImage is easy to use and manage as all files, trained models and data are organized within a workspace directory, which you can change at any time in the configuration file,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    OpenSeq2Seq

    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    ...The toolkit includes ready-made models for neural machine translation, automatic speech recognition, speech synthesis, language modeling, and additional NLP tasks such as sentiment analysis. It supports multi-GPU and multi-node data-parallel training, and integrates with Horovod to scale out across large GPU clusters. Mixed-precision support (float16) is optimized for NVIDIA Volta and Turing GPUs, allowing significant speedups and memory savings without sacrificing model quality. The project comes with configuration-driven training scripts, documentation, and examples that demonstrate how to set up pipelines for tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Skater

    Skater

    Python library for model interpretation/explanations

    Skater is a unified framework to enable Model Interpretation for all forms of the model to help one build an Interpretable machine learning system often needed for real-world use-cases(** we are actively working towards to enabling faithful interpretability for all forms models). It is an open-source python library designed to demystify the learned structures of a black box model both globally(inference on the basis of a complete data set) and locally(inference about an individual prediction). The concept of model interpretability in the field of machine learning is still new, largely subjective, and, at times, controversial. Model interpretation is the ability to explain and validate the decisions of a predictive model to enable fairness, accountability, and transparency in algorithmic decision-making. ...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB