Page 4 | input free download

Showing 506 open source projects for "input"

1

gpt-engineer

Full stack AI software engineer

gpt-engineer is an open-source platform designed to help developers automate the software development process using natural language. The platform allows users to specify software requirements in plain language, and the AI generates and executes the corresponding code. It can also handle improvements and iterative development, giving users more control over the software they’re building. Built with a terminal-based interface, gpt-engineer is customizable, enabling developers to experiment...

Downloads: 6 This Week

Last Update: 2024-06-06
2

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Input content includes websites, PDFs, and YouTube videos, as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on the programmatic and bespoke generation of engaging, conversational transcripts and audio from a multitude of multi-modal sources, enabling customization and scale.

Downloads: 2 This Week

Last Update: 2024-11-16
3

Spark TTS

Spark-TTS Inference Code

Spark TTS is an open-source, PyTorch-based text-to-speech inference system that leverages large language models to produce highly natural, intelligible speech from text input. It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster. The project supports zero-shot voice cloning, meaning it can imitate a new speaker’s voice without dedicated training for that specific voice, and works across languages, including English and Chinese, even in cross-lingual code-switching scenarios. ...

Downloads: 3 This Week

Last Update: 2026-02-04
4

Transformer Debugger

Tool for exploring and debugging transformer model behaviors

...TDB allows users to intervene directly in the forward pass of a model and observe how such interventions change predictions, making it possible to answer questions like why a token was selected or why an attention head focused on a certain input. It automatically identifies and explains the most influential components, highlights activation patterns, and maps relationships across circuits within the model. The tool includes both a React-based neuron viewer for exploring model components and a backend activation server for running inferences and serving data.

Downloads: 3 This Week

Last Update: 1 day ago
5

AI YouTube Shorts Generator

A Python tool that uses GPT-4, FFmpeg, and OpenCV

AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies subtitle overlays, producing a polished short video without manual editing. The tool streamlines multiple steps of the tedious short-form video workflow: highlight detection, clipping, subtitle generation, cropping to vertical 9:16 format, and final rendering — reducing hours of editing to a mostly automated pipeline. ...

Downloads: 8 This Week

Last Update: 2026-02-05
6

GELab-Zero

GUI Exploration Lab. One of the best GUI agent solutions

GELab-Zero is an open-source “GUI Agent” framework aiming to automate interactions with graphical user interfaces (GUIs), combining both the agent model and all supporting infrastructure — including inference, input orchestration, and GUI automation logic — in a plug-and-play package that runs locally, without cloud dependencies. The idea is to let developers or users harness an AI agent that can simulate clicking, typing, reading UI elements, and interacting with apps in a human-like way via the GUI, which can enable tasks like automated testing, scriptable workflows, or even autonomous usage of GUI-based applications. ...

Downloads: 0 This Week

Last Update: 2026-01-23
7

TensorFlow Datasets

TFDS is a collection of datasets ready to use with TensorFlow,

TensorFlow Datasets is a collection of datasets ready to use with TensorFlow or other Python ML frameworks, such as JAX. All datasets are exposed as tf.data.Datasets, enabling easy-to-use and high-performance input pipelines. To get started, see the guide and our list of datasets.

Downloads: 0 This Week

Last Update: 2025-05-28
8

yq JSON

Command-line YAML, XML, TOML processor

Before using yq, you also have to install its dependency, jq. See the jq installation instructions for details and directions specific to your platform. On macOS, yq is also available on Homebrew: use brew install python-yq.

Downloads: 0 This Week

Last Update: 2024-04-27
9

pycm

Multi-class confusion matrix library in Python

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the Swiss Army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers.
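
To make the "input data vectors" idea concrete, here is a minimal sketch in plain Python of how a multi-class confusion matrix and two statistics can be derived from an actual vector and a predicted vector. It does not use PyCM itself; the function names `confusion_matrix`, `overall_accuracy`, and `per_class_recall` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs into a nested dict.

    Hypothetical helper, not part of PyCM. Rows are actual classes,
    columns are predicted classes.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in classes} for a in classes}


def overall_accuracy(matrix):
    """Overall statistic: fraction of samples on the diagonal."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """Class statistic: diagonal count divided by the row total."""
    recall = {}
    for c, row in matrix.items():
        row_total = sum(row.values())
        recall[c] = row[c] / row_total if row_total else 0.0
    return recall


actual = [2, 0, 2, 2, 0, 1]
predicted = [0, 0, 2, 2, 0, 2]
cm = confusion_matrix(actual, predicted)
print(cm)
print(overall_accuracy(cm))
print(per_class_recall(cm))
```

The same matrix could instead be supplied directly as a nested mapping, which is the "direct matrix" input style the description mentions.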

Downloads: 0 This Week

Last Update: 2025-10-14
10

voila

Voilà turns Jupyter notebooks into standalone web applications

From notebooks to standalone web applications and dashboards. Voilà allows you to convert a Jupyter Notebook into an interactive dashboard that allows you to share your work with others. It is secure and customizable, giving you control over what your readers experience. Unlike the usual HTML-converted notebooks, each user connecting to the Voilà tornado application gets a dedicated Jupyter kernel which can execute the callbacks to changes in Jupyter interactive widgets. To render the bqplot...

Downloads: 1 This Week

Last Update: 2025-08-25
11

TorchMetrics

Machine learning metrics for distributed, scalable PyTorch application

...Metric arithmetic. Similar to torch.nn, most metrics have both a module-based and a functional version. The functional versions are simple Python functions that take torch.tensors as input and return the corresponding metric as a torch.tensor.
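
The difference between a functional and a module-based metric can be sketched without PyTorch. In the illustration below, plain Python lists stand in for tensors, and the `accuracy` function and `Accuracy` class are hypothetical stand-ins written for this example rather than TorchMetrics code. The class mirrors the update/compute/reset pattern that stateful metrics typically follow.

```python
def accuracy(preds, target):
    """Functional form: stateless, one call in, one value out."""
    if len(preds) != len(target):
        raise ValueError("preds and target must have the same length")
    if not target:
        return 0.0
    return sum(p == t for p, t in zip(preds, target)) / len(target)


class Accuracy:
    """Module-style form: accumulates counts across batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear the accumulated state, e.g. at the start of an epoch.
        self.correct = 0
        self.total = 0

    def update(self, preds, target):
        # Fold one batch into the running counts.
        if len(preds) != len(target):
            raise ValueError("preds and target must have the same length")
        self.correct += sum(p == t for p, t in zip(preds, target))
        self.total += len(target)

    def compute(self):
        # Reduce the accumulated counts to a single value.
        return self.correct / self.total if self.total else 0.0


batches = [([1, 0, 1], [1, 1, 1]), ([0, 0], [0, 0])]

metric = Accuracy()
for preds, target in batches:
    metric.update(preds, target)

print(accuracy(*batches[0]))  # accuracy of the first batch only
print(metric.compute())       # accuracy over all batches seen so far
```

The functional form suits one-off evaluation of a single batch, while the stateful form is convenient when a metric has to be aggregated over many batches.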

Downloads: 1 This Week

Last Update: 2025-09-03
12

Eigent

The Open Source Cowork Desktop to Unlock Your Exceptional Productivity

Eigent is an open-source cowork desktop application designed to help you build, manage, and deploy a custom AI workforce. It enables multiple specialized AI agents to collaborate in parallel, turning complex workflows into automated, end-to-end tasks. Built on the CAMEL-AI multi-agent framework, Eigent emphasizes productivity, flexibility, and transparent system design. You can run Eigent fully locally for maximum privacy and data control, or choose a cloud-connected experience for quick...

Downloads: 3 This Week

Last Update: 6 days ago
13

GLM-4-Voice

GLM-4-Voice | End-to-End Chinese-English Conversational Model

GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. GLM-4-Voice builds upon the bilingual strengths of the GLM architecture, supporting both Chinese and English, and is designed to handle long-form conversations with context retention. ...

Downloads: 3 This Week

Last Update: 1 day ago
14

Depth Anything 3

Recovering the Visual Space from Any Views

Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity.

Downloads: 2 This Week

Last Update: 2026-02-05
15

MAI-UI

Real-World Centric Foundation GUI Agents

...Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. Unlike traditional UI frameworks, MAI-UI emphasizes realistic deployment by supporting agent–user interaction (clarifying ambiguous instructions), integration with external tool APIs using MCP calls, and a device–cloud collaboration mechanism that dynamically routes computation to on-device or cloud models based on task state and privacy constraints.

Downloads: 4 This Week

Last Update: 2026-02-10
16

DeepSeek VL2

Mixture-of-Experts Vision-Language Models for Advanced Multimodal

DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to...

Downloads: 4 This Week

Last Update: 2025-10-03
17

Qwen-VL

Chat & pretrained large vision language model

Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...

Downloads: 5 This Week

Last Update: 2025-09-23
18

Gemini Fullstack LangGraph Quickstart

Get started w/ building Fullstack Agents using Gemini 2.5 & LangGraph

...The project features a React (Vite) frontend and a LangGraph/FastAPI backend designed to work together seamlessly for real-time research and reasoning tasks. The backend agent dynamically generates search queries based on user input, retrieves information via the Google Search API, and performs reflective reasoning to identify knowledge gaps. It then iteratively refines its search until it produces a comprehensive, well-cited answer synthesized by the Gemini model. The repository provides both a browser-based chat interface and a command-line script (cli_research.py) for executing research queries directly. ...

Downloads: 4 This Week

Last Update: 1 day ago
19

Paper2Slides

From Paper to Presentation in One Click

...It is designed to replace the repetitive work of turning dense technical documents into presentation-friendly structure by extracting key points, figures, and data into a coherent visual narrative. The system supports multiple input formats, so you can process PDFs and common office documents rather than being locked to a single file type. It uses an extraction approach intended to capture critical insights comprehensively, including important visuals and data points that often get missed in naive summarization. A major focus is traceability: generated slide content is designed to remain linked back to the source material so you can verify accuracy and reduce information drift. ...

Downloads: 3 This Week

Last Update: 2026-01-29
20

Mesh R-CNN

Code for Mesh R-CNN, ICCV 2019

...Unlike voxel-based or point-based approaches, Mesh R-CNN uses a differentiable mesh representation, allowing it to efficiently refine surface geometry while maintaining high spatial detail. The system combines 2D detection from Mask R-CNN with 3D reasoning modules that output full mesh reconstructions aligned with the input image. It has been evaluated on datasets such as Pix3D, where it demonstrates state-of-the-art performance in reconstructing real-world object geometry.

Downloads: 3 This Week

Last Update: 7 days ago
21

Ling-V2

Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI

Ling-V2 is an open-source family of Mixture-of-Experts (MoE) large language models developed by the InclusionAI research organization with the goal of combining state-of-the-art performance, efficiency, and openness for next-generation AI applications. It introduces highly sparse architectures where only a fraction of the model’s parameters are activated per input token, enabling models like Ling-mini-2.0 to achieve reasoning and instruction-following capabilities on par with much larger dense models while remaining significantly more computationally efficient. Trained on more than 20 trillion tokens of high-quality data and enhanced through multi-stage supervised fine-tuning and reinforcement learning, Ling-V2’s models demonstrate strong general reasoning, mathematical problem-solving, coding understanding, and knowledge-intensive task performance.
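
The idea that only a fraction of parameters is active per input token can be illustrated with a toy top-k router. This is a generic sketch of sparse Mixture-of-Experts gating, not Ling-V2's actual routing scheme, which the description does not specify. The experts are simple scalar functions, the gate logits are hard-coded rather than produced by a learned layer, and all names (`route_top_k`, `moe_forward`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_logits, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Four toy experts; with k=2 only two of them run for this token.
experts = [
    lambda x: x + 1,
    lambda x: 2 * x,
    lambda x: -x,
    lambda x: x * x,
]
gate_logits = [0.1, 2.0, -1.0, 1.5]  # assumed values for illustration

routes = route_top_k(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)
print(routes)
print(output)
```

With four experts and k=2, half of the expert functions are skipped entirely for this input, which is the source of the compute savings that sparse architectures aim for.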

Downloads: 2 This Week

Last Update: 2026-02-12
22

TFX

TFX is an end-to-end platform for deploying production ML pipelines

...Both the components themselves and the integrations with orchestration systems can be extended. TFX components interact with an ML Metadata backend that keeps a record of component runs, input and output artifacts, and runtime configuration. This metadata backend enables advanced functionality like experiment tracking or warm starting/resuming ML models from previous runs.

Downloads: 0 This Week

Last Update: 2024-12-11
23

MNE-Python

Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python

...MNE-Python is an open-source Python package for exploring, visualizing, and analyzing human neurophysiological data such as MEG, EEG, sEEG, ECoG, and more. It includes modules for data input/output, preprocessing, visualization, source estimation, time-frequency analysis, connectivity analysis, machine learning, statistics, and more.

Downloads: 0 This Week

Last Update: 2025-11-21
24

imodelsX

Interpretable prompting and models for NLP

Interpretable prompting and models for NLP (using large language models). Generate a prompt that explains patterns in data. Explain the difference between two distributions. Find a natural-language prompt using input-gradients. Fit a better linear model using an LLM to extract embeddings. Fit better decision trees using an LLM to expand features. Finetune a single linear layer on top of LLM embeddings. Use these just like a scikit-learn model. During training, they fit better features via LLMs, but at test time they are extremely fast and completely transparent.

Downloads: 0 This Week

Last Update: 2025-08-25
25

Speakr

Speakr is a personal, self-hosted web application

Speakr is an open-source, real-time text-to-speech (TTS) web application that allows users to convert written text into natural-sounding speech in just a few clicks. It provides a clean, user-friendly interface where users can input text, choose a voice style or language, and immediately hear the output, making it ideal for accessibility, content creation, and learning applications. Behind the scenes, Speakr leverages modern TTS engines and streaming audio technologies to deliver smooth and responsive speech generation without noticeable delay. The project is built with extensibility in mind, enabling developers to add custom voices, integrate additional languages, and tailor the backend for different hardware or cloud environments. ...

Downloads: 1 This Week

Last Update: 2026-02-10