Search Results for "ai data analyst" - Page 16

Showing 464 open source projects for "ai data analyst"

  • 1
    AI Explainability 360

    Interpretability and explainability of data and machine learning models

    The AI Explainability 360 toolkit is an open-source library that supports the interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics. The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. ...
    Downloads: 0 This Week
  • 2
    LangChain Apps on Production with Jina

    Langchain Apps on Production with Jina & FastAPI

    Jina is an open-source framework for building scalable multi-modal AI apps in production. LangChain is another open-source framework for building applications powered by LLMs. langchain-serve helps you deploy your LangChain apps on Jina AI Cloud in a matter of seconds. You can benefit from the scalability and serverless architecture of the cloud without sacrificing the ease and convenience of local development.
    Downloads: 0 This Week
  • 3
    xTuring

    Easily build, customize and control your own LLMs

    xTuring is open-source AI personalization software. It provides a simple interface for personalizing LLMs to your own data and application, with fast, efficient fine-tuning of models such as LLaMA, GPT-J, Galactica, and more, making it easy to build, customize, and control your own LLMs.
    Downloads: 0 This Week
  • 4
    Lightning-Hydra-Template

    PyTorch Lightning + Hydra. A very user-friendly template

    ...A collection of best practices for efficient workflow and reproducibility. Thoroughly commented: you can use this repo as a reference and educational resource. Not suited to data engineering: the template configuration setup is not designed for building data processing pipelines that depend on each other. PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research; think of it as a framework for organizing your PyTorch code. Hydra is a framework for elegantly configuring complex applications. ...
    Downloads: 0 This Week
  • 5
    Consistency Models

    Official repo for consistency models

    consistency_models is the repository for Consistency Models, a new family of generative models introduced by OpenAI that aim to generate high-quality samples by mapping noise directly into data — circumventing the need for lengthy diffusion chains. It builds on and extends diffusion model frameworks (e.g. based on the guided-diffusion codebase), adding techniques like consistency distillation and consistency training to enable fast, often one-step, sample generation. The repo is implemented...
    Downloads: 0 This Week
  • 6
    macara

    A converter for seamless transformation of files, data, and media ...

    This application consolidates various scripts, including an AI feature (rembg), into a singular platform. The design of this software is evolutionary, allowing for the seamless integration of additional scripts, menus, or windows as needed. Serving as a versatile tool, it facilitates efficient file management, especially when handling a substantial volume of images, whether sorting by name or other attributes. These scripts are crafted to complement generative art AI technologies like Dall-e...
    Downloads: 1 This Week
  • 7
    AI-powered enterprise search engine

    An open-source, AI-powered enterprise search engine designed to help organizations quickly locate and retrieve information scattered across multiple internal tools, documents, and communication platforms. It lets users search across sources such as Slack, Confluence, Jira, Google Drive, and other enterprise systems, consolidating fragmented knowledge into a single, unified search experience.
    Downloads: 0 This Week
  • 8
    PromethAI

    Open-source framework that gives you AI Agents

    PromethAI-Backend is a backend framework for AI-driven automation and knowledge extraction. It is designed to integrate with large language models (LLMs) to provide AI-enhanced workflows, including content generation, summarization, and data analysis.
    Downloads: 0 This Week
  • 9
    Chinese-LLaMA-Alpaca-2 v2.0

    Chinese LLaMA & Alpaca large language model + local CPU/GPU training

    This project has open-sourced the Chinese LLaMA model and the Alpaca large model with instruction fine-tuning to further promote the open research of large models in the Chinese NLP community. Based on the original LLaMA, these models expand the Chinese vocabulary and use Chinese data for secondary pre-training, which further improves the basic semantic understanding of Chinese. At the same time, the Chinese Alpaca model further uses Chinese instruction data for fine-tuning, which...
    Downloads: 0 This Week
  • 10
    AI-Agent-Host

    The AI Agent Host is a module-based development environment.

    The AI Agent Host provides a seamless interface for managing and querying data, visualizing results, and coding in real-time. The AI Agent Host is built specifically for LangChain, a framework dedicated to developing applications powered by language models. LangChain recognizes that the most powerful and distinctive applications go beyond simply utilizing a language model and strive to be data-aware and agentic.
    Downloads: 0 This Week
  • 11
    Lightning Flash

    Flash enables you to easily configure and run complex AI recipes

    Your PyTorch AI Factory: Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains. In a nutshell, Flash is the production-grade research framework you always dreamed of but didn't have time to build. All data loading in Flash is performed via a from_* classmethod on a DataModule. Which DataModule to use and which from_* methods are available depend on the task you want to perform.
    Downloads: 1 This Week
  • 12
    ThoughtSource

    A central, open resource for data and tools

    ThoughtSource is a central, open resource and community centered on data and tools for chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and medical practice.
    Downloads: 0 This Week
  • 13
    scraper-with-chatgpt
    It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.
    Downloads: 0 This Week
  • 14
    Metaseq

    Repo for external large-scale work

    Metaseq is a flexible, high-performance framework for training and serving large-scale sequence models, such as language models, translation systems, and instruction-tuned LLMs. Built on top of PyTorch, it provides distributed training, model sharding, mixed-precision computation, and memory-efficient checkpointing to support models with hundreds of billions of parameters. The framework was used internally at Meta to train models like OPT (Open Pre-trained Transformer) and serves as a...
    Downloads: 0 This Week
  • 15
    fastMRI

    A large open dataset + tools to speed up MRI scans using ML

    fastMRI is a large-scale collaborative research project by Facebook AI Research (FAIR) and NYU Langone Health that explores how deep learning can accelerate magnetic resonance imaging (MRI) acquisition without compromising image quality. By enabling reconstruction of high-fidelity MR images from significantly fewer measurements, fastMRI aims to make MRI scanning faster, cheaper, and more accessible in clinical settings.
    Downloads: 0 This Week
  • 16
    langchain-prefect

    Tools for using Langchain with Prefect

    Large Language Models (LLMs) are interesting and useful; building apps that use them responsibly feels like a no-brainer. Tools like Langchain make it easier to build apps using LLMs. We need to know details about how our apps work, even when we want to use tools with convenient abstractions that may obfuscate those details. Prefect is built to help data people build, run, and observe event-driven workflows wherever they want. It provides a framework for creating deployments on a whole...
    Downloads: 0 This Week
  • 17
    PRM800K

    800,000 step-level correctness labels on LLM solutions to MATH problems

    PRM800K is a process supervision dataset accompanying the paper Let’s Verify Step by Step, providing 800,000 step-level correctness labels on model-generated solutions to problems from the MATH dataset. The repository releases the raw labels and the labeler instructions used in two project phases, enabling researchers to study how human raters graded intermediate reasoning. Data are stored as newline-delimited JSONL files tracked with Git LFS, where each line is a full solution sample that...
    Downloads: 1 This Week
  • 18
    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech, which is hundreds of times larger than existing systems....
    Downloads: 0 This Week
  • 19
    data-science-on-gcp

    Source code accompanying book: Data Science on the GCP

    ...The repository is organized into multiple directories that reflect real-world pipelines, such as ingesting data, running SQL-based analytics, streaming data processing, using Spark and Dataproc, applying BigQuery ML, and deploying models with Vertex AI. It emphasizes practical, production-oriented workflows rather than isolated examples, showing how different Google Cloud services interact to form cohesive pipelines.
    Downloads: 0 This Week
  • 20
    The Art of Programming

    A collection of practical tips can be found at the bottom of this page

    The Art of Programming (Second Edition) is a curated collection of programming problems and solutions originally derived from the Microsoft 100 Interview Questions blog series, later refined into a long-running tutorial and ultimately a published book. Created by July, the series began in 2010 and has since evolved into an in-depth exploration of algorithmic thinking, data structures, and coding interview preparation. The repository brings together 42 classic programming problems from the...
    Downloads: 5 This Week
  • 21
    HealthFusion

    AI Disease Detection System

    HealthFusion has identified a critical business problem, which is the lack of accessibility and timely detection of multiple diseases. The traditional approach of detecting diseases is time-consuming, expensive, and not accessible to everyone, especially in remote areas. This problem can lead to delayed diagnosis and treatment, which can have serious consequences for patients. The proposed solution, HealthFusion, is novel and practical as it offers a comprehensive solution to detect...
    Downloads: 0 This Week
  • 22
    CPT

    CPT: A Pre-Trained Unbalanced Transformer

    A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation. We replace the old BERT vocabulary with a larger one of size 51271 built from the training data, in which we 1) add 6800+ missing Chinese characters (most of them traditional Chinese characters); 2) remove redundant tokens (e.g. Chinese character tokens with the ## prefix); 3) add some English tokens to reduce OOV. Position embeddings: we extend max_position_embeddings from 512 to 1024. We...
    Downloads: 0 This Week
  • 23
    TextBox

    A text generation library with pre-trained language models

    TextBox 2.0 is an up-to-date text generation library based on Python and PyTorch focusing on building a unified and standardized pipeline for applying pre-trained language models to text generation. From a task perspective, we consider 13 common text generation tasks such as translation, story generation, and style transfer, and their corresponding 83 widely-used datasets. From a model perspective, we incorporate 47 pre-trained language models/modules covering the categories of general,...
    Downloads: 0 This Week
  • 24
    Shennina

    Automating Host Exploitation with AI

    ...Shennina is integrated with Metasploit and Nmap for performing the attacks, as well as being integrated with an in-house Command-and-Control Server for exfiltrating data from compromised machines automatically. Shennina scans a set of input targets for available network services, uses its AI engine to identify recommended exploits for the attacks, and then attempts to test and attack the targets. If the attack succeeds, Shennina proceeds with the post-exploitation phase. The AI engine is initially trained against live targets to learn reliable exploits against remote services. ...
    Downloads: 0 This Week
  • 25
    BCI

    BCI: Breast Cancer Immunohistochemical Image Generation

    Breast Cancer Immunohistochemical Image Generation through Pyramid Pix2pix. We have released the trained model on BCI and LLVIP datasets. We host a competition for breast cancer immunohistochemistry image generation on Grand Challenge. Project pix2pix provides a Python script to generate pix2pix training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene; these can be pairs {HE, IHC}. Then we can learn to translate A (HE images)...
    Downloads: 0 This Week