Showing 178 open source projects for "python data analysis"

  • 1
    DocStrange

    Extract and convert data from any document: images, PDFs, Word docs

    DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files,...
    Downloads: 0 This Week
  • 2
    local-llm

    Run LLMs locally on Cloud Workstations

    local-llm is a development framework that enables developers to run large language models locally within Google Cloud Workstations or standard environments without requiring GPU hardware. It focuses on making generative AI development more accessible by leveraging quantized models and CPU-based execution, eliminating the dependency on expensive GPU infrastructure. The repository includes tools, Docker configurations, and command-line utilities that simplify the process of downloading,...
    Downloads: 1 This Week
  • 3
    LIDA

    Automatic Generation of Visualizations and Infographics using LLMs

    LIDA is an open-source library developed to automate the process of creating data visualizations and infographics using large language models. The system treats visualizations as executable code and uses AI to generate, modify, and interpret that code in order to transform raw datasets into meaningful charts and graphical explanations. Instead of requiring users to manually explore datasets and write plotting scripts, LIDA analyzes the data and automatically proposes visualization goals and...
    Downloads: 0 This Week
  • 4
    Automated Interpretability

    Code for the paper "Language models can explain neurons in language models"

    The automated-interpretability repository implements tools and pipelines for automatically generating, simulating, and scoring explanations of neuron (or latent feature) behavior in neural networks. Instead of relying purely on manual, ad hoc interpretability probing, this repo aims to scale interpretability by using algorithmic methods that produce candidate explanations and assess their quality. It includes a “neuron explainer” component that, given a target neuron or latent feature,...
    Downloads: 0 This Week
  • 5
    ToRA

    Tool-integrated Reasoning LLM Agents

    ToRA is an open-source framework developed by Microsoft for building tool-integrated reasoning agents powered by large language models. The project focuses on improving the ability of AI systems to solve complex mathematical and analytical problems by combining natural language reasoning with external computational tools. Instead of relying solely on text generation, the system dynamically invokes tools such as symbolic solvers or programming libraries when deeper computation is required....
    Downloads: 0 This Week
  • 6
    Ailice

    AIlice is a fully autonomous, general-purpose AI agent

    AIlice is an open-source autonomous AI agent framework built to function as a general-purpose assistant that can plan, decompose, and execute complex tasks through a structured multi-agent architecture. The project presents itself as a standalone assistant powered by open-source language models, with an internal design that treats user requests almost like executable programs rather than simple chat prompts. Its core IACT architecture allows the system to break large goals into smaller...
    Downloads: 0 This Week
  • 7
    YAYI

    Repo for YaYi Chinese LLMs based on LLaMA 2 & BLOOM

    YAYI is an open-source large language model project developed to provide a multilingual conversational AI system capable of performing a wide variety of natural language processing tasks. The model is trained on diverse datasets covering multiple languages and domains so that it can support applications ranging from dialogue systems to text analysis and knowledge retrieval. The architecture is based on transformer-style language models optimized for conversational understanding and...
    Downloads: 0 This Week
  • 8
    BIG-bench

    Beyond the Imitation Game collaborative benchmark for measuring

    BIG-bench (Beyond the Imitation Game Benchmark) is a large, collaborative benchmark suite designed to probe the capabilities and limitations of large language models across hundreds of diverse tasks. Rather than focusing on a single metric or domain, it aggregates many hand-authored tasks that test reasoning, commonsense, math, linguistics, ethics, and creativity. Tasks are intentionally heterogeneous: some are multiple-choice with exact scoring, others are free-form generation judged by...
    Downloads: 0 This Week
  • 9
    autollm

    Ship RAG based LLM web apps in seconds

    ...The framework also includes built-in readers for multiple content sources such as PDFs, DOCX files, notebooks, websites, and other document types, which helps shorten the time between raw data and a working knowledge application.
    Downloads: 0 This Week
  • 10
    Chinese-LLaMA-Alpaca 2

    Chinese LLaMA-2 & Alpaca-2 Large Model Phase II Project

    This project is based on Llama-2, the commercially usable large model released by Meta. It is the second phase of the Chinese LLaMA & Alpaca large model project. The Chinese LLaMA-2 base model and the Alpaca-2 instruction-tuned model are open-sourced. These models expand and optimize the Chinese vocabulary of the original Llama-2, use large-scale Chinese data for incremental pre-training, and further improve the basic semantics and instruction understanding of...
    Downloads: 0 This Week
  • 11
    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model inference, making your pipeline execution 10x faster. ...
    Downloads: 1 This Week
  • 12
    LLaMA-MoE

    Building Mixture-of-Experts from LLaMA with Continual Pre-training

    LLaMA-MoE is an open-source project that builds mixture-of-experts language models from LLaMA through expert partitioning and continual pre-training. The repository is centered on making MoE research more accessible by offering smaller and more affordable models with only about 3.0 to 3.5 billion activated parameters, which helps reduce deployment and experimentation costs. Its architecture works by splitting LLaMA feed-forward networks into sparse experts and adding gating mechanisms so...
    Downloads: 4 This Week
  • 13
    DB-GPT-Hub

    A repository that contains models, datasets, and fine-tuning

    DB-GPT-Hub is an open-source repository that provides datasets, models, and training tools designed to improve large language models for database interaction tasks, particularly Text-to-SQL. The project serves as a specialized extension of the broader DB-GPT ecosystem, focusing on the preparation and evaluation of models capable of translating natural language questions into structured database queries. It offers a modular framework that supports data preparation, model fine-tuning,...
    Downloads: 4 This Week
  • 14
    Alpaca-CoT

    We unified the interfaces of instruction-tuning data

    Alpaca-CoT is an open research project focused on improving reasoning capabilities in language models through chain-of-thought training data. The project builds upon the Alpaca instruction-tuning approach by introducing datasets and methods that encourage models to produce intermediate reasoning steps when solving problems. Instead of generating answers directly, the model learns to produce logical reasoning sequences that lead to the final solution. This chain-of-thought supervision helps...
    Downloads: 0 This Week
  • 15
    RAGs

    Build ChatGPT over your data, all with natural language

    RAGs is an open-source application designed to simplify the creation of retrieval-augmented generation pipelines through an interactive interface. Built with Streamlit and powered by the LlamaIndex ecosystem, the tool allows users to construct AI assistants that answer questions using their own data sources. Instead of requiring extensive programming knowledge, the application allows users to configure and build a RAG system using natural language instructions. The system automatically...
    Downloads: 0 This Week
  • 16
    Autolabel

    Label, clean and enrich text datasets with LLMs

    Autolabel is a Python library to label, clean, and enrich datasets with Large Language Models (LLMs). Autolabel data for NLP tasks such as classification, question answering, named entity recognition, entity matching, and more. Seamlessly use commercial and open-source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google, and more.
    Downloads: 8 This Week
  • 17
    Chinese Llama 2 7B

    The first Chinese LLaMA2 model in the open source community

    Chinese Llama 2 7B is an open-source large language model adapted from the LLaMA-2 architecture and optimized for Chinese and bilingual Chinese-English applications. The project provides a version of LLaMA-2 that has been further trained on Chinese data so it can better understand and generate text in Chinese while maintaining compatibility with the original model ecosystem. In addition to the model weights, the repository also includes supervised fine-tuning datasets and training resources...
    Downloads: 0 This Week
  • 18
    LangChain Apps on Production with Jina

    LangChain Apps in Production with Jina & FastAPI

    Jina is an open-source framework for building scalable multimodal AI apps in production. LangChain is another open-source framework for building applications powered by LLMs. langchain-serve helps you deploy your LangChain apps on Jina AI Cloud in a matter of seconds. You can benefit from the scalability and serverless architecture of the cloud without sacrificing the ease and convenience of local development. And if you prefer, you can also deploy your LangChain apps on your own...
    Downloads: 0 This Week
  • 19
    ThoughtSource

    A central, open resource for data and tools

    ThoughtSource is a central, open resource and community centered on data and tools for chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and medical practice.
    Downloads: 0 This Week
  • 20
    Chinese-LLaMA-Alpaca-2 v2.0

    Chinese LLaMA & Alpaca large language model + local CPU/GPU training

    This project has open-sourced the Chinese LLaMA model and the Alpaca large model with instruction fine-tuning to further promote the open research of large models in the Chinese NLP community. Based on the original LLaMA, these models expand the Chinese vocabulary and use Chinese data for secondary pre-training, which further improves the basic semantic understanding of Chinese. At the same time, the Chinese Alpaca model further uses Chinese instruction data for fine-tuning, which...
    Downloads: 0 This Week
  • 21
    aqueduct LLM

    Aqueduct allows you to run LLM and ML workloads on any infrastructure

    Aqueduct is an open-source MLOps framework that allows you to define and deploy machine learning and LLM workloads on any cloud infrastructure. You write code in vanilla Python, run that code on any cloud infrastructure you'd like to use, and gain visibility into the execution and performance of your models and predictions. Aqueduct's Python-native API allows you to define ML tasks in regular Python code. You can connect Aqueduct to your existing...
    Downloads: 1 This Week
  • 22
    langchain-prefect

    Tools for using LangChain with Prefect

    Large Language Models (LLMs) are interesting and useful, and building apps that use them responsibly feels like a no-brainer. Tools like LangChain make it easier to build apps using LLMs. We need to know details about how our apps work, even when we want to use tools with convenient abstractions that may obfuscate those details. Prefect is built to help data people build, run, and observe event-driven workflows wherever they want. It provides a framework for creating deployments on a whole...
    Downloads: 0 This Week
  • 23
    ChatGenTitle

    A paper title generation model fine-tuned on the LLaMA model

    ChatGenTitle: A paper title generation model fine-tuned on the LLaMA model using information from millions of arXiv papers.
    Downloads: 0 This Week
  • 24
    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    We introduce a language modeling approach for text-to-speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, which is hundreds of times larger than existing systems....
    Downloads: 2 This Week
  • 25
    Alpa

    Training and serving large-scale neural networks

    Alpa is a system for training and serving large-scale neural networks. Scaling neural networks to hundreds of billions of parameters has enabled dramatic breakthroughs such as GPT-3, but training and serving these large-scale neural networks require complicated distributed system techniques. Alpa aims to automate large-scale distributed training and serving with just a few lines of code.
    Downloads: 23 This Week