Amazon Polly
Amazon Polly is a service that turns text into lifelike speech, letting you create applications that talk and build entirely new categories of speech-enabled products. Polly's Text-to-Speech (TTS) service uses deep learning to synthesize natural-sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries.
In addition to Standard TTS voices, Amazon Polly offers Neural Text-to-Speech (NTTS) voices that deliver advanced improvements in speech quality through a new machine learning approach. Polly’s Neural TTS technology also supports two speaking styles that allow you to better match the delivery style of the speaker to the application: a Newscaster reading style that is tailored to news narration use cases, and a Conversational speaking style that is ideal for two-way communication like telephony applications.
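As a rough illustration of how a speaking style might be selected, the sketch below builds the parameters for a Polly `SynthesizeSpeech` request using the neural engine and the SSML `<amazon:domain>` tag. The helper name `build_polly_request`, the `STYLE_DOMAINS` mapping, and the default voice `Matthew` are assumptions made for this example; which voices and regions support a given style should be checked against the Polly documentation.

```python
from xml.sax.saxutils import escape

# Assumed mapping from the style names used above to SSML domain names.
STYLE_DOMAINS = {"newscaster": "news", "conversational": "conversational"}


def build_polly_request(text, style=None, voice_id="Matthew"):
    """Build keyword arguments for a neural SynthesizeSpeech call.

    `style` is None for the voice's default delivery, or a key of
    STYLE_DOMAINS. The voice must support the chosen style.
    """
    body = escape(text)  # escape &, <, > so the text is valid inside SSML
    if style is not None:
        domain = STYLE_DOMAINS[style]
        body = f'<amazon:domain name="{domain}">{body}</amazon:domain>'
    return {
        "Engine": "neural",
        "OutputFormat": "mp3",
        "TextType": "ssml",
        "VoiceId": voice_id,
        "Text": f"<speak>{body}</speak>",
    }


request = build_polly_request("Markets rose & fell today.", style="newscaster")
print(request["Text"])
```

With AWS credentials configured, these parameters could be passed to boto3 as `boto3.client("polly").synthesize_speech(**request)`; the audio is then read from the `AudioStream` field of the response.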
ModelScope
This model is based on a multi-stage text-to-video diffusion model that takes a text description as input and returns a video matching that description. Only English input is supported.
The text-to-video diffusion model consists of three sub-networks: a text feature extractor, a diffusion model that maps text features to the video latent space, and a network that maps the video latent space to the video visual space. The model has about 1.7 billion parameters in total. The diffusion model adopts the Unet3D structure and generates video by iteratively denoising a video that starts as pure Gaussian noise.
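The control flow of the three stages can be pictured with a toy sketch. It is not the actual model: every function here (`encode_text`, `predict_noise`, `denoise`, `decode`, `text_to_video`) is a hypothetical stand-in, the "noise predictor" is a trivial rule rather than a trained Unet3D, and the sizes are tiny. It only shows text features conditioning an iterative denoising loop that starts from Gaussian noise, followed by a mapping from latent values to pixel values.

```python
import random


def encode_text(prompt, dim=4):
    """Stage 1 (stand-in): map a prompt to a fixed-length feature vector."""
    rng = random.Random(prompt)  # seeded by the prompt, so it is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def predict_noise(latent, text_features):
    """Stand-in for the Unet3D: treat the offset from the text-conditioned
    target as the noise still present in each frame."""
    return [[v - f for v, f in zip(frame, text_features)] for frame in latent]


def denoise(text_features, num_frames=3, steps=10, seed=0):
    """Stage 2 (stand-in): start from Gaussian noise and remove a share of
    the predicted noise at every step."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in text_features]
              for _ in range(num_frames)]
    for remaining in range(steps, 0, -1):
        noise = predict_noise(latent, text_features)
        latent = [[v - n / remaining for v, n in zip(frame, frame_noise)]
                  for frame, frame_noise in zip(latent, noise)]
    return latent


def decode(latent):
    """Stage 3 (stand-in): map latent values in [-1, 1] to 0..255 pixels."""
    return [[round(255 * (max(-1.0, min(1.0, v)) + 1.0) / 2.0) for v in frame]
            for frame in latent]


def text_to_video(prompt, num_frames=3, steps=10):
    features = encode_text(prompt)
    latent = denoise(features, num_frames=num_frames, steps=steps)
    return decode(latent)


video = text_to_video("a panda eating bamboo")
print(len(video), "frames of", len(video[0]), "pixels each")
```

In the real model the latent is a multi-dimensional video tensor and the noise predictor is a learned network; the loop structure is the only part this sketch is meant to convey.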
VidAU
VidAU is an AI-powered video ad generation platform that lets users create high-converting video ads, UGC-style content, and product promo videos in minutes, with no filming, crews, or editing skills required. Its toolbox includes URL-to-video, image-to-video, text-to-video, AI avatar generation, AI script writing, voiceover and text-to-speech in 50+ languages, subtitle removal and translation, watermark removal, and smart video remix/editing. VidAU auto-adjusts formats and aspect ratios for TikTok, Reels, YouTube Shorts, and social media feeds. It offers over 300 customizable AI avatars and 500+ proven ad templates, incorporates GPT-4o-powered scriptwriting, and predicts engaging hooks every few seconds to boost watch time and conversions. It also tracks progress in real time, supports batch creation and preview, and applies brand logos, fonts, colors, and voice alignment for tailored campaigns.