Showing 28 open source projects for "python text"

  • 1
    HyperTools

    A Python toolbox for gaining geometric insights

    ...Support for lists of Numpy arrays, Pandas dataframes, text or (mixed) lists. Applying topic models and other text vectorization methods to text data. HyperTools is designed to facilitate dimensionality reduction-based visual explorations of high-dimensional data. The basic pipeline is to feed in a high-dimensional dataset (or a series of high-dimensional datasets) and, in a single function call, reduce the dimensionality of the dataset(s) and create a plot.
    Downloads: 1 This Week
  • 2
    AutoGluon

    AutoGluon: AutoML for Image, Text, and Tabular Data

    AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning,...
    Downloads: 0 This Week
  • 3
    Sweetviz

    Visualize and compare datasets, target values and associations

    Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA (Exploratory Data Analysis) with just two lines of code. Output is a fully self-contained HTML application. The system is built around quickly visualizing target values and comparing datasets. Its goal is to help quick analysis of target characteristics, training vs testing data, and other such data characterization tasks.
    Downloads: 0 This Week
  • 4
    NBInclude.jl

    import code from IJulia Jupyter notebooks into Julia programs

    NBInclude is a package for the Julia language that allows you to include and execute IJulia (Julia-language Jupyter) notebook files just as you would include an ordinary Julia file. The goal of this package is to make notebook files just as easy to incorporate into Julia programs as ordinary Julia (.jl) files, giving you the advantages of a notebook (integrated code, formatted text, equations, graphics, and other results) while retaining the modularity and re-usability of .jl files.
    Downloads: 0 This Week
  • 5
    Briefer

    Dashboards and notebooks in a single place

    Briefer is an open-source collaborative data platform that brings notebooks, dashboards, and interactive data apps into a unified workspace that combines the flexibility of code with the simplicity of visual exploration. It’s designed so technical users can write Markdown, SQL, and Python side by side for data analysis, visualization, and reporting, while non-technical viewers can interact with results through inputs, dropdowns, and date pickers without writing any code. Users work in a Notion-style interface where they can build, organize, and share pages that contain executable code blocks, charts, text explanations, and interactive elements within the same document, enabling rich data storytelling and reproducible analytics. ...
    Downloads: 0 This Week
  • 6
    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that is as good as or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....
    Downloads: 0 This Week
  • 7
    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...
    Downloads: 0 This Week
  • 8
    Obsidian Visual Skills Pack

    Generate Canvas, Excalidraw, and Mermaid diagrams from text

    LLM-TLDR is a Python-based tool designed to dramatically reduce the amount of code a large language model needs to read by extracting the essential structure and context from a codebase and presenting only the most relevant parts to the model. Traditional approaches often dump entire files into a model’s context, which quickly exceeds token limits; LLM-TLDR instead indexes project structure, traces dependencies, and summarizes code in a way that preserves semantic relevance while shrinking...
    Downloads: 0 This Week
  • 9
    Asymptote

    2D & 3D TeX-Aware Vector Graphics Language

    Asymptote is a powerful descriptive vector graphics language for technical drawing, inspired by MetaPost but with an improved C++-like syntax. Asymptote provides for figures the same high-quality typesetting that LaTeX does for scientific text.
    Downloads: 368 This Week
  • 10
    File Sorter for Photographers

    Organize files/images from a csv or xlsx file.

    A user-friendly application to efficiently sort all types of files from a source folder into a destination folder based on a list of filenames provided in an Excel or CSV file.
    Downloads: 0 This Week
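
    The workflow described for File Sorter for Photographers can be sketched in a few lines of Python. This is not the application's own code: the function names `read_filenames` and `sort_files` are hypothetical, the list is assumed to be a header-less CSV with one filename per row in the first column, and files are copied rather than moved. Excel (.xlsx) input is left out because it would need a third-party reader.

```python
import csv
import shutil
from pathlib import Path


def read_filenames(csv_path, column=0):
    """Return the non-empty filenames found in one column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [
            row[column].strip()
            for row in csv.reader(handle)
            if len(row) > column and row[column].strip()
        ]


def sort_files(source_dir, dest_dir, filenames):
    """Copy each listed file from source_dir to dest_dir.

    Returns two lists: the names that were copied and the names that
    were not found in the source folder.
    """
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in filenames:
        candidate = source / name
        if candidate.is_file():
            # copy2 also preserves timestamps, which photographers often rely on
            shutil.copy2(candidate, dest / candidate.name)
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing
```

    Returning the missing names separately makes it easy to report which entries in the list had no matching file.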
  • 11
    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage.
    Downloads: 0 This Week
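
    To illustrate the MinHash idea mentioned above, here is a minimal standard-library sketch. It does not use or reproduce the text-dedup API: the helper names (`shingles`, `minhash_signature`, `estimated_jaccard`), the 3-word shingle size, and the 64 hash functions are all assumptions chosen for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(item, seed):
    """64-bit hash of item; the seed selects one of many hash functions."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=64):
    """Keep the minimum hash value per seeded hash function."""
    if not items:
        return []
    return [min(_hash(item, seed) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

    Two documents whose signatures agree in most positions are likely near-duplicates. At scale, signatures are usually bucketed (for example with locality-sensitive hashing) so that not every pair has to be compared.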
  • 12
    PipeRider

    Code review for data in dbt

    PipeRider automatically compares your data to highlight the difference in impacted downstream dbt models so you can merge your Pull Requests with confidence. PipeRider can profile your dbt models and obtain information such as basic data composition, quantiles, histograms, text length, top categories, and more. PipeRider can integrate with dbt metrics and present the time-series data of metrics in the report. PipeRider generates a static HTML report each time it runs, which can be viewed...
    Downloads: 0 This Week
  • 13
    SentimentAnalysis-Rick&Morty

    Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

    ...In this end-of-degree project, we use Python to analyze and classify the dialogue of characters in the English-language television series "Rick and Morty". The objective is to identify and categorize the feelings and emotions expressed in the text, comparing the human perception of the characters' personalities with the results obtained using natural language processing techniques.
    Downloads: 0 This Week
  • 14
    Visdom

    A tool for creating, organizing, and sharing data visualizations

    A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy. Visdom aims to facilitate visualization of (remote) data with an emphasis on supporting scientific experimentation. Broadcast visualizations of plots, images, and text for yourself and your collaborators. Organize your visualization space programmatically or through the UI to create dashboards for live data, inspect results of experiments, or debug experimental code. Visdom has...
    Downloads: 0 This Week
  • 15
    DataMelt

    Computation and Visualization environment

    DataMelt (or "DMelt") is an environment for numeric computation, data analysis, computational statistics, and data visualization. This Java multiplatform program is integrated with several scripting languages such as Jython (Python), Groovy, JRuby, BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, solving systems of linear and differential equations. Linear, non-linear...
    Downloads: 10 This Week
  • 16
    DataStation Community Edition

    App to easily query, script, and visualize data from every database

    DataStation is an open-source data IDE for developers. It allows you to easily build graphs and tables with data pulled from SQL databases, logging databases, metrics databases, HTTP servers, and all kinds of text and binary files. Need to join or munge data? Write embedded scripts as needed in languages like Python, JavaScript, R or SQL. All in one application. Build reports with graphs, charts and tables. Script against data. Cross-platform: Windows, macOS, and Linux. Easily fetch your data, wherever it is: 18 SQL and non-SQL databases, files, HTTP server. ...
    Downloads: 0 This Week
  • 17
    Data Science at the Command Line

    Data science at the command line

    ...To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, useful whether you work with Windows, macOS, or Linux. You’ll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you’re comfortable processing data with Python or R, you’ll learn how to greatly improve your data science workflow by leveraging the command line’s power.
    Downloads: 1 This Week
  • 18
    nonechucks

    Deal with bad samples in your dataset dynamically

    nonechucks is a library that provides wrappers for PyTorch's datasets, samplers and transforms to allow for dropping unwanted or invalid samples dynamically. What if you have a dataset of 1000s of images, out of which a few dozen images are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to OCRize, and then run a language detector on the resulting text, because you want only the ones that are in English? Or maybe you...
    Downloads: 0 This Week
  • 19

    paralline

    Big Data tool

    Paralline executes a Python function (or lambda function) or a script over each line of huge text files in parallel processes, and aggregates the results into a list.
    Downloads: 0 This Week
  • 20
    Spark Notebook

    Interactive and Reactive Data Science using Scala and Spark

    Spark Notebook is an interactive web-based computational notebook designed to make working with Apache Spark more productive, exploratory, and expressive. It allows developers, data scientists, and analysts to write, run, and visualize Spark code in cells that support multiple languages such as Scala, Python, and SQL, all within the same notebook. Users can interleave runnable code, rich text markup, visualizations, equations, and results, enabling reproducible research and exploratory data analysis workflows. Because it runs on top of Spark’s distributed engine, it can scale from running locally on a laptop to executing on clusters with large datasets without changing user workflow. ...
    Downloads: 0 This Week
  • 21
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is free, open-source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features.
    Downloads: 0 This Week
  • 22
    Rodeo

    A data science IDE for Python

    A data science IDE for Python. Rodeo is an open-source Python IDE built by the folks at yhat. It is a lightweight, intuitive development environment that is customizable to its core. It serves as a personal home base for exploring and interpreting data, is aimed at data scientists, and answers the question, "Is there anything like RStudio for Python?"...
    Downloads: 1 This Week
  • 23

    CorNetMap

    A tool for Gene Expression Correlation Network

    Capabilities of CorNetMap: 1. Reads data from tab-delimited text files and can be used to analyze any data set, not only gene expression. 2. Handles both two-dimensional and multidimensional data analysis. 3. Calculates Pearson correlation, and cross-correlation for analyzing data with a phase difference. 4. Generates a correlation heat map and draws a network map. 5. Saves correlation data as a text file. How to use and documentation:...
    Downloads: 0 This Week
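
    The Pearson correlation step listed above can be sketched with the standard library alone. This is not CorNetMap's code: the input layout (one series per row, a name followed by tab-separated values) and the function names are assumptions made for this example.

```python
import csv
import io
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def read_tab_delimited(text):
    """Parse rows of 'name<TAB>v1<TAB>v2...' into {name: [floats]}."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: [float(value) for value in row[1:]] for row in rows if row}


def correlation_matrix(series):
    """Pairwise Pearson correlations as a nested dict keyed by series name."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}
```

    The nested dict returned by `correlation_matrix` holds the values a heat map would display, and pairs above a chosen threshold could serve as edges in a network map.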
  • 24

    CSV*Loader for Oracle

    Simplified CSV turbo loader to Oracle

    Tired of writing control files? No problem! CSV*Loader generates the control file for SQL*Loader. Too slow? No problem! CSV*Loader's turbo mode may load data into your Oracle database 10x faster than your good old Perl::DBI script.
    Downloads: 0 This Week
  • 25
    Plotmeister is a data exploration tool. It parses your ASCII data and generates a simple (text-based) table format. You can modify this table and eventually create nice-looking figures.
    Downloads: 0 This Week