text processing free download

pyVideoTrans

Translate a video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...

Downloads: 27 This Week

Last Update: 2026-05-09

Live Transcribe Speech Engine

Live Transcribe is an Android application

Live Transcribe Speech Engine provides on-device speech recognition components that power real-time transcription for accessibility and everyday voice-first experiences. Its design prioritizes latency and robustness in noisy, far-field environments, enabling continuous transcription with low delay on mobile hardware. The engine manages audio front-end processing—such as noise suppression and voice activity detection—before feeding audio into compact, accurate acoustic and language models....

Downloads: 0 This Week

Last Update: 2025-10-10

TIES

A smart search engine for medical documents

TIES (Text Information Extraction System) is a clinical text search engine that uses natural language processing techniques to extract medical concepts from free-text clinical reports. It provides secure, de-identified access to this information and has built-in collaboration tools and honest broker functionality. It is licensed for academic use under the BSD license.

1 Review

Downloads: 0 This Week

Last Update: 2019-09-09

Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on...

Downloads: 9 This Week

Last Update: 2019-03-05

Stemmer Gujarati

Offline stemmer for Gujarati, which is one of 22 Indian languages.

...There has been a lot of significant work on developing and evaluating stemmers for non-Indian languages, but little or none for Indian languages, especially Gujarati. The code of this stemmer is based on an algorithm designed under the guidance of Prof. Nikita Desai, India. It takes a .txt input file containing Gujarati text encoded as UTF-8 and removes nonessential stop words. After processing the remaining words, it writes an output file containing all stem words plus other details.

Downloads: 0 This Week

Last Update: 2015-04-05

Lingala NLP

This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.

Downloads: 0 This Week

Last Update: 2014-11-13

LinqYedict

Translate Chinese to English

Translates Chinese to English using CEDICT (Cantonese dictionary). Demonstrates the speed of C# and LINQ. Copy Chinese text from any browser or application to the Windows clipboard to see the translation.

Downloads: 0 This Week

Last Update: 2015-11-21

Search Results for "text processing"

Showing 7 open source projects for "text processing"


