natural language processing free download

Awesome Fraud Detection Research Papers

A curated list of data mining papers about fraud detection

A curated list of data mining papers about fraud detection from several conferences.

Downloads: 2 This Week

Last Update: 2026-01-05

Diffgram

Training data (data labeling, annotation, workflow) for all data types

...Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.

Downloads: 3 This Week

Last Update: 2024-10-14

Riemann

A network event stream processing system, in Clojure

Riemann aggregates events from your servers and applications with a powerful stream processing language. Send an email for every exception in your app. Track the latency distribution of your web app. See the top processes on any host, by memory and CPU. Combine statistics from every Riak node in your cluster and forward to Graphite. Track user activity from second to second. Riemann streams are just functions which accept an event.

Downloads: 8 This Week

Last Update: 2025-05-26
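
The idea that "streams are just functions which accept an event" can be illustrated with a small sketch. Riemann itself is configured in Clojure; the Python below is only an analogy, and the names `where`, `collect`, and the event fields are assumptions made for illustration, not Riemann's API.

```python
from typing import Callable, Dict, List

# An event is modeled as a plain dict; a stream is any callable that accepts one.
Event = Dict[str, object]
Stream = Callable[[Event], None]


def where(predicate: Callable[[Event], bool], *children: Stream) -> Stream:
    """Build a stream that forwards an event to its children only if predicate(event) is true."""
    def stream(event: Event) -> None:
        if predicate(event):
            for child in children:
                child(event)
    return stream


def collect(into: List[Event]) -> Stream:
    """Build a stream that appends each event to a list (a stand-in for email or Graphite output)."""
    def stream(event: Event) -> None:
        into.append(event)
    return stream


alerts: List[Event] = []
# Compose streams: keep only critical events, then hand them to the collector.
pipeline = where(lambda e: e.get("state") == "critical", collect(alerts))

for event in [
    {"host": "web1", "service": "api", "state": "ok"},
    {"host": "web2", "service": "api", "state": "critical"},
]:
    pipeline(event)
```

Because every stage is an ordinary function, filtering, aggregation, and forwarding stages compose by simple nesting.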

Data Formulator

Create rich visualizations with AI

To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification. Doing so requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data...

Downloads: 2 This Week

Last Update: 2026-05-28

Numaflow

Kubernetes-native platform to run massively parallel data/streaming

Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.

Downloads: 2 This Week

Last Update: 2026-05-28

Siddhi Core Libraries

Stream Processing and Complex Event Processing Engine

...Agile development experience with SQL-like query language and graphical drag-and-drop editor supporting event simulation. Lightweight runtime that can natively run on Kubernetes, Docker, VM, or bare metal, and be embedded in any Java or Python application. Scalable and highly available distributed event processing on Kubernetes, with NATS Streaming and Siddhi Kubernetes Operator.

Downloads: 0 This Week

Last Update: 2025-03-05

101-0250-00

ETH course - Solving PDEs in parallel on GPUs

This course aims to cover state-of-the-art methods in modern parallel Graphics Processing Unit (GPU) computing, supercomputing and code development with applications to natural sciences and engineering.

Downloads: 1 This Week

Last Update: 2026-01-05

Benthos

Fancy stream processing made operationally mundane

Benthos is a high-performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and is ready to drop into your pipeline as a static binary, Docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and...

Downloads: 10 This Week

Last Update: 4 days ago

AI Data Science Team

An AI-powered data science team of agents

...It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. The project includes ready-to-use applications that showcase these agents in action, such as an exploratory data analysis copilot that generates reports, a pandas data analyst that combines wrangling and plotting, and SQL database agents that can query business databases and output results directly.

Downloads: 1 This Week

Last Update: 2026-01-26

Gridap.jl

Grid-based approximation of partial differential equations in Julia

...One can implement new FE spaces, new reference elements, use external mesh generators, linear solvers, post-processing tools, etc. See, e.g., the list of available Gridap plugins.

Downloads: 4 This Week

Last Update: 2026-05-31

Searchkick

Intelligent search made easy

Searchkick brings powerful, production-ready search to Rails by mapping Active Record models into Elasticsearch with sensible defaults and easy customization. It supports language analyzers, stemming, synonyms, misspelling tolerance, and highlighting so search results feel natural to end users. Indexing is model-centric: you declare what fields to index, add computed fields, and trigger reindexing via callbacks or background jobs, with options for zero-downtime rolling reindexes. On the query side, a simple API covers relevance tuning, boosting, filtering, faceting/aggregations, and pagination, while still allowing direct access to advanced Elasticsearch features when needed. ...

Downloads: 0 This Week

Last Update: 2026-06-05

Pytente

A Computational Tool for Patent Analysis and Retrieval

Pytente is an advanced solution for automating the collection, storage, and processing of bibliographic patent data. The tool was designed to simplify the collection of large volumes of data from open-access repositories. Pytente ensures that the information is stored in a structured way and that duplicate records are validated and removed. Among the various features the tool provides, notable ones include the customized extraction of subsets of...

Downloads: 0 This Week

Last Update: 2025-11-03

Catbird Linux

Linux for content creation, web scraping, coding, and data analysis.

Catbird Linux is a USB-pluggable Live Linux operating system built for media creation, web scraping, and software coding. It is the daily driver you want for retrieving data, making videos or podcasts, and building software tools to automate repetitive tasks. It is ready for work in the Python, Lua, and Go languages, with numerous packages for web scraping or downloading data via API calls. Using Catbird Linux, it is possible to accomplish in-depth stock market analysis, track weather...

Downloads: 9 This Week

Last Update: 2025-08-29

Quick 2d Plot

Program for live 2d graphical representation of data streams

Quick2dPlot, or q2d for short, is an open source, minimalistic plotting program designed for live 2d graphical representation of data streams. The program may be useful for plotting the output of a user's application programs, especially when the user wants to see one or more plots during calculations or a data acquisition process. The program is command-driven and uses no widgets. Q2d is written in C and uses the SDL2 library for plotting. Currently...

Downloads: 0 This Week

Last Update: 2024-09-03

text-dedup

All-in-one text de-duplication

text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...

Downloads: 0 This Week

Last Update: 2025-04-08
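
The MinHash and Jaccard-threshold ideas named in the text-dedup description can be sketched in a few lines. This is a minimal standard-library illustration and not text-dedup's API: the function names, the word-trigram shingling, and the use of salted BLAKE2b hashes as stand-ins for random permutations are all assumptions made for the example.

```python
import hashlib
from typing import List, Set


def shingles(text: str, n: int = 3) -> Set[str]:
    """Split text into overlapping word n-grams (always returns at least one shingle)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(max(1, len(tokens) - n + 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items: Set[str], num_perm: int = 128) -> List[int]:
    """For each of num_perm salted hash functions, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"

exact = jaccard(shingles(doc_a), shingles(doc_b))
approx = estimate_jaccard(
    minhash_signature(shingles(doc_a)),
    minhash_signature(shingles(doc_b)),
)
# A pair would be flagged as near-duplicate when the estimate exceeds a chosen threshold.
is_near_duplicate = approx >= 0.7
```

Comparing fixed-length signatures instead of full shingle sets is what keeps memory use low; a production system would typically add locality-sensitive hashing so that not every pair of documents has to be compared.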

SentimentAnalysis-Rick&Morty

Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is a fascinating application of data analysis that extracts relevant information from related writings in different linguistic contexts. Within natural language processing, sentiment analysis and classification therefore stand out as a key application supported by text mining. ...

Downloads: 0 This Week

Last Update: 2023-07-12

Transducers.jl

Efficient transducers for Julia

Transducers are transformations of a "sequence" of inputs that can be composed very efficiently. The interface used by transducers naturally describes a wide range of processes that are expressible as a succession of steps. Furthermore, transducers can be defined without specifying the details of the input and output (collections, streams, channels, etc.) and therefore achieve full reusability. Transducers were introduced by Rich Hickey, the creator of the Clojure language. His Strange Loop...

Downloads: 0 This Week

Last Update: 2023-11-09
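
The composition idea in the Transducers.jl description can be sketched as follows. Transducers.jl is a Julia package; this Python version is only an analogy, and the helper names `mapping`, `filtering`, `compose`, and `transduce` are assumptions made for illustration rather than the package's API.

```python
from functools import reduce
from typing import Any, Callable, Iterable

# A reducing step takes (accumulator, item) and returns a new accumulator.
Reducer = Callable[[Any, Any], Any]
# A transducer turns one reducing step into another, independent of the data source.
Transducer = Callable[[Reducer], Reducer]


def mapping(f: Callable[[Any], Any]) -> Transducer:
    """Transducer that applies f to each item before passing it on."""
    def transducer(step: Reducer) -> Reducer:
        def new_step(acc: Any, x: Any) -> Any:
            return step(acc, f(x))
        return new_step
    return transducer


def filtering(pred: Callable[[Any], bool]) -> Transducer:
    """Transducer that passes on only the items satisfying pred."""
    def transducer(step: Reducer) -> Reducer:
        def new_step(acc: Any, x: Any) -> Any:
            return step(acc, x) if pred(x) else acc
        return new_step
    return transducer


def compose(*xforms: Transducer) -> Transducer:
    """Compose transducers so that items flow through them left to right."""
    def composed(step: Reducer) -> Reducer:
        for xf in reversed(xforms):
            step = xf(step)
        return step
    return composed


def transduce(xform: Transducer, step: Reducer, init: Any, items: Iterable[Any]) -> Any:
    """Reduce items with the step function transformed by xform."""
    return reduce(xform(step), items, init)


def append(acc: list, x: Any) -> list:
    acc.append(x)
    return acc


# Keep even numbers, then square them; no intermediate collection is built.
even_squares = compose(filtering(lambda x: x % 2 == 0), mapping(lambda x: x * x))

as_list = transduce(even_squares, append, [], range(6))
as_sum = transduce(even_squares, lambda acc, x: acc + x, 0, range(6))
```

The same `even_squares` transformation is reused with two different outputs (a list and a running sum), which is the reusability the description refers to.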

Wooey

A Django app that creates automatic web UIs for Python scripts

...Enable the easy wrapping of any program in simple Python instead of having to use a language specific to existing tools such as Galaxy. Enable fellow lab members with no command line experience to utilize Python scripts. Autodocument workflows for data analysis (simple model saving).

Downloads: 0 This Week

Last Update: 2022-10-05

SZT-bigdata

SZT‑bigdata is an open source project

SZT‑bigdata is an open-source project that analyzes real Shenzhen metro (subway) card usage data using big‑data frameworks such as Spark, Hadoop, Hive, Kafka, Flink, ClickHouse, HBase, and Elasticsearch. It aims to explore transit passenger flow patterns and system optimization using a variety of Scala-based technologies.

Downloads: 1 This Week

Last Update: 2025-08-04

Strategems

Quantitative systematic trading strategy development and backtesting

...Given the highly iterative nature of event-driven trading strategy development, Julia's high-performance design (particularly in the context of loops) and straightforward syntax would seem to make it a natural fit as a language for systematic strategy research and development. While this package remains early in development, with time the hope is to be able to rapidly implement a trading idea, construct a historical backtest, analyze its results, optimize over a given parameter set, and visualize all of this with great detail.

Downloads: 0 This Week

Last Update: 2023-11-24

Deep Learning with PyTorch

Latest techniques in deep learning and representation learning

This course concerns the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition. The prerequisites include DS-GA 1001 Intro to Data Science or a graduate-level machine learning course. To follow the exercises, you will need a laptop with Miniconda (a minimal version of Anaconda) and several Python packages installed. The following instructions work as is for Mac or Ubuntu Linux users; Windows users would need to install and work in the Git BASH terminal. ...

Downloads: 0 This Week

Last Update: 2021-10-12

nonechucks

Deal with bad samples in your dataset dynamically

...What if you have a dataset of 1000s of images, out of which a few dozen images are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to OCRize, and then run a language detector on the resulting text, because you want only the ones that are in English? Or maybe you have an AlternateIndexSampler, and you want to be able to move to dataset[6] after dataset[4] fails while attempting to load! PyTorch's data processing module expects you to rid your dataset of any unwanted or invalid samples before you feed them into its pipeline, and provides no easy way to define a "fallback policy" in case such samples are encountered during dataset iteration.

Downloads: 0 This Week

Last Update: 2023-06-12
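
The "fallback policy" described above, where loading simply moves on after a sample fails, can be sketched without any framework. This is a plain-Python illustration of the idea and not the nonechucks or PyTorch API; `ToyDataset` and `iter_valid` are hypothetical names introduced for the example.

```python
from typing import Any, Iterable, Iterator, List, Optional


class ToyDataset:
    """Toy indexable dataset in which None marks a corrupted sample."""

    def __init__(self, items: List[Optional[str]]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> str:
        item = self._items[index]
        if item is None:
            raise ValueError(f"sample {index} is corrupted")
        return item


def iter_valid(dataset: Any, indices: Iterable[int]) -> Iterator[Any]:
    """Yield dataset[i] for each index, skipping every index whose loading raises."""
    for index in indices:
        try:
            yield dataset[index]
        except Exception:
            # Fallback policy for this sketch: drop the bad sample and move on.
            continue


dataset = ToyDataset(["a", "b", None, "d", None, "f", "g"])
alternate_indices = range(0, len(dataset), 2)  # visits 0, 2, 4, 6
loaded = list(iter_valid(dataset, alternate_indices))
```

Here indices 2 and 4 fail, so iteration continues to index 6 instead of aborting. Catching every `Exception` keeps the sketch short; real code would usually catch only the errors that indicate a bad sample.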

SPar: Stream Parallelism in Multi-Cores

An Embedded C++ Domain-Specific Language

SPar is an internal C++ Domain-Specific Language (DSL) suitable for modeling and implementing classical stream parallel patterns. The DSL uses standard C++ attributes to introduce annotations that tag the notable components of stream parallel applications: stream sources and stream processing stages. The latest version can be downloaded from the SVN repository using the following command: svn checkout svn://svn.code.sf.net/p/spar-dsl-compiler/svn/ spar

Downloads: 0 This Week

Last Update: 2018-04-01

AI learning

AiLearning, data analysis plus machine learning practice

We actively respond to the Research Open Source Initiative (DOCX). Open source today is not just open source code, but also datasets, models, tutorials, and experimental records. We are also exploring other categories of open source solutions and protocols. We hope you will understand this initiative, combine it with your own interests, and do what you can. Everyone's tiny contributions, together, are the entire open source ecosystem. We are iBooker, a large open-source community,...

Downloads: 0 This Week

Last Update: 2022-02-18

PRADA

PRADA: Pipeline for RNA-Sequencing Data Analysis

Massively parallel sequencing of cDNA reverse transcribed from RNA (RNASeq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. PRADA currently supports 7 modules to process and identify...

Downloads: 0 This Week

Last Update: 2016-02-16

Search Results for "natural language processing"

Showing 38 open source projects for "natural language processing"
