Showing 29 open source projects for "language processing"

  • 1
    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
  • 2
    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 5 This Week
  • 3
    Riemann

    A network event stream processing system, in Clojure

    Riemann aggregates events from your servers and applications with a powerful stream processing language. Send an email for every exception in your app. Track the latency distribution of your web app. See the top processes on any host, by memory and CPU. Combine statistics from every Riak node in your cluster and forward to Graphite. Track user activity from second to second. Riemann streams are just functions which accept an event.
    Downloads: 3 This Week
  • 4
    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 4 This Week
  • 5
    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    ...Agile development experience with SQL-like query language and graphical drag-and-drop editor supporting event simulation. Lightweight runtime that can natively run on Kubernetes, Docker, VM, or bare metal, and embedded in any Java or Python application. Scalable, and highly available distributed event processing on Kubernetes, with NATS Streaming and Siddhi Kubernetes Operator.
    Downloads: 3 This Week
  • 6
    Benthos

    Fancy stream processing made operationally mundane

    Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and...
    Downloads: 24 This Week
  • 7
    Gridap.jl

    Grid-based approximation of partial differential equations in Julia

    ...One can implement new FE spaces, new reference elements, use external mesh generators, linear solvers, post-processing tools, etc. See, e.g., the list of available Gridap plugins.
    Downloads: 7 This Week
  • 8
    Data Formulator

    Create rich visualizations with AI

    To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals. To achieve this, analysts need not only proficiency in data transformation and visualization tools but also efforts to manage the branching history consisting of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data...
    Downloads: 0 This Week
  • 9

    Pytente

    A Computational Tool for Patent Analysis and Retrieval

    Pytente is an advanced solution for automating the collection, storage, and processing of bibliographic patent data. The tool was designed to simplify the collection of large volumes of data from open-access repositories. Pytente ensures structured storage of the information, as well as validation and removal of duplicate records. Among the many features the tool provides, notable ones include the customized extraction of subsets of...
    Downloads: 0 This Week
  • 10
    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
  • 11
    SentimentAnalysis-Rick&Morty

    Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

    The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is a fascinating application of data analysis that extracts relevant information from related writings in different linguistic contexts. Within natural language processing, sentiment analysis and classification stand out as a key application supported by text mining. By extracting information from textual data, it becomes possible to identify and comprehend the sentiments and emotions conveyed. ...
    Downloads: 0 This Week
  • 12
    Transducers.jl

    Efficient transducers for Julia

    Transducers are transformations of a "sequence" of inputs that can be composed very efficiently. The interface used by transducers naturally describes a wide range of processes that can be expressed as a succession of steps. Furthermore, transducers can be defined without specifying the details of the input and output (collections, streams, channels, etc.) and therefore achieve full reusability. Transducers were introduced by Rich Hickey, the creator of the Clojure language. His Strange Loop...
    Downloads: 6 This Week
  • 13
    Wooey

    A Django app that creates automatic web UIs for Python scripts

    ...Enable the easy wrapping of any program in simple Python instead of having to use language specific to existing tools such as Galaxy. Enable fellow lab members with no command line experience to utilize Python scripts. Autodocument workflows for data analysis (simple model saving).
    Downloads: 0 This Week
  • 14
    SZT-bigdata

    SZT‑bigdata is an open source project

    SZT‑bigdata is an open-source project analyzing real Shenzhen metro (subway) card usage data using big‑data frameworks like Spark, Hadoop, Hive, Kafka, Flink, ClickHouse, HBase, and Elasticsearch. It aims to explore transit passenger flow patterns and system optimization using a variety of Scala-based technologies.
    Downloads: 1 This Week
  • 15
    nonechucks

    Deal with bad samples in your dataset dynamically

    ...What if you have a dataset of 1000s of images, out of which a few dozen images are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to OCRize, and then run a language detector on the resulting text, because you want only the ones that are in English? Or maybe you have an AlternateIndexSampler, and you want to be able to move to dataset[6] after dataset[4] fails while attempting to load! PyTorch's data processing module expects you to rid your dataset of any unwanted or invalid samples before you feed them into its pipeline, and provides no easy way to define a "fallback policy" in case such samples are encountered during dataset iteration.
    Downloads: 0 This Week
  • 16
    SPar: Stream Parallelism in Multi-Cores

    An Embedded C++ Domain-Specific Language

    SPar is an internal C++ Domain-Specific Language (DSL) suitable for modeling and implementing classical stream parallel patterns. The DSL uses standard C++ attributes to introduce annotations that tag the notable components of stream parallel applications: stream sources and stream processing stages. The latest version can be checked out from the SVN repository with the following command: svn checkout svn://svn.code.sf.net/p/spar-dsl-compiler/svn/ spar
    Downloads: 0 This Week
  • 17
    AI learning

    AiLearning, data analysis plus machine learning practice

    We actively respond to the Research Open Source Initiative (DOCX). Open source today is not just open source code, but also datasets, models, tutorials, and experimental records. We are also exploring other categories of open source solutions and protocols. I hope you will understand this initiative, combine it with your own interests, and do what you can. Everyone's tiny contributions, together, make up the entire open source ecosystem. We are iBooker, a large open-source community,...
    Downloads: 0 This Week
  • 18

    PRADA

    PRADA: Pipeline for RNA-Sequencing Data Analysis

    Massively parallel sequencing of cDNA reverse transcribed from RNA (RNASeq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. PRADA currently supports 7 modules to process and identify...
    Downloads: 0 This Week
  • 19
    TeleScope

    XML Data Stream Broker/Replicator

    TeleScope is an efficient XML data stream broker, replicator, and simple event processing (SEP) platform for intensive loads, written in C for the Fedora 17-18, Slackware 13-14, and Red Hat Enterprise Linux 6 (RHEL-6) Linux distributions. The platform is intended to operate on single number/word values and is not meant to be deployed for full-text XML stream analysis. TeleScope has an internal query language with a set of standard logical operators that allows relatively complex query expressions to be constructed. ...
    Downloads: 1 This Week
  • 20

    Larch: Data Analysis for X-ray Spectra

    Data Processing and Analysis for X-ray Spectroscopy and More

    Larch is a scientific data processing language that is designed to be easy to use for novices and complete enough for advanced data processing and analysis. Larch provides a wide range of functionality for dealing with arrays of scientific data, and basic tools to make it easy to use and organize complex data. Larch has been primarily developed for dealing with x-ray spectroscopic and scattering data, especially the kind of data collected at modern synchrotrons and x-ray sources. ...
    Downloads: 0 This Week
  • 21
    Cascalog

    Data processing on Hadoop without the hassle

    Cascalog is a powerful Clojure (and Java) data processing and querying library built atop Hadoop (via Cascading), providing a high-level, Datalog-inspired abstraction for both big data processing and local computation. Cascalog is hosted at Clojars, and some of its dependencies are hosted at Conjars. Both Clojars and Conjars are Maven repositories that are easy to use with Maven or Leiningen. The Cascalog website contains more information and links to various articles and tutorials. The best way to get...
    Downloads: 0 This Week
  • 22
    Graphical Grammar Studio

    A user-friendly grammar tool for natural language processing tasks

    Full documentation with tutorials is included in the download package. Graphical Grammar Studio is a tool for applying grammars that behave as word acceptors/consumers and annotators. GGS grammars can be used to find and annotate sequences of words that satisfy certain conditions in a given input. Its purpose is to create NLP tools such as phrase chunkers, named entity finders, pronoun co-reference solvers, etc. A grammar is represented by a state machine which can be visualized, edited...
    Downloads: 0 This Week
  • 23
    q pipeline manager

    q: integrated platform for pipeline configuration and management

    The q utility is a platform for creating and managing data analysis pipelines. It expands the value of your existing job scheduler - either Grid Engine or TORQUE PBS - through numerous functions that help you organize, submit, monitor, manage and share your informatics work. Data processing pipelines require high-level organization and parallelization of work to optimize resource utilization and decrease the time to results. q (from queue) allows complex job sequences to be efficiently...
    Downloads: 0 This Week
  • 24
    Pear3DEngine

    Pear3DEngine is a modern and modular 3D development framework

    Pear3DEngine is a modern and modular 3D development framework that lets you create professional games, simulations and more. You are free to develop your program in C++, XML, or Lua and to publish it as open source software or sell it as a commercial program. Internally, the rendering engine uses OpenGL or, optionally, DirectX. The planned editor supports software development on Linux, Windows and possibly Mac OS X. DirectX 9 and therefore Windows XP are currently not supported and support is not...
    Downloads: 0 This Week
  • 25
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It includes implementations of some NLP algorithms, some flexible APIs, several user-friendly annotation interfaces, and the Sanchay Query Language for language resources.
    Downloads: 0 This Week