Showing 29 open source projects for "pdf data mining"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Let your crypto work for you

    Put idle assets to work with competitive interest rates, borrow without selling, and trade with precision. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 45 This Week
    Last Update:
    See Project
  • 3
    Weka

    Weka

    Machine learning software to solve data mining problems

    Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
    Leader badge
    Downloads: 10,533 This Week
    Last Update:
    See Project
  • 4
    Smile

    Smile

    Statistical machine intelligence and learning engine

    Smile is a fast and comprehensive machine learning engine. With advanced data structures and algorithms, Smile delivers the state-of-art performance. Compared to this third-party benchmark, Smile outperforms R, Python, Spark, H2O, xgboost significantly. Smile is a couple of times faster than the closest competitor. The memory usage is also very efficient. If we can train advanced machine learning models on a PC, why buy a cluster? Write applications quickly in Java, Scala, or any JVM...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Your monitoring isn't a stack. It's a pile. Fix that. Icon
    Your monitoring isn't a stack. It's a pile. Fix that.

    Errors, performance, logs, uptime. One install, one invoice, one UI.

    Replace Datadog, New Relic, and Sentry without adding three more dashboards.
    Free 30 days.
  • 5
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    ...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GeoDMA

    GeoDMA

    Geographic feature extraction and data mining

    GeoDMA is a plugin for TerraView software, used for geographical data mining. With a single image, the user can perform segmentation, attributes extraction, normalization and classification.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    stkpp

    stkpp

    C++ Statistical ToolKit

    ...At a convenience, we propose the source packages on sourceforge. The library offers a dense set of (mostly) template classes in C++ and is suitable for projects ranging from small one-off projects to complete data mining application suites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    UnBBayes

    UnBBayes

    Framework & GUI for Bayes Nets and other probabilistic models.

    UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports Bayesian networks, influence diagrams, MSBN, OOBN, HBN, MEBN/PR-OWL, PRM, structure, parameter and incremental learning. Please, visit our wiki (https://sourceforge.net/p/unbbayes/wiki/Home/) for more information. Check out the license section (https://sourceforge.net/p/unbbayes/wiki/License/) for our licensing policy.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 9
    ADAMS

    ADAMS

    ADAMS is a workflow engine for building complex knowledge workflows.

    ADAMS is a flexible workflow engine aimed at quickly building and maintaining data-driven, reactive workflows, easily integrated into business processes. Instead of placing operators on a canvas and manually connecting them, a tree structure and flow control operators determine how data is processed (sequentially/parallel). This allows rapid development and easy maintenance of large workflows, with hundreds or thousands of operators. Operators include machine learning (WEKA, MOA, MEKA)...
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 10
    Karate Club

    Karate Club

    An API Oriented Open-source Python Framework for Unsupervised Learning

    Karate Club is an unsupervised machine learning extension library for NetworkX. Karate Club consists of state-of-the-art methods to do unsupervised learning on graph-structured data. To put it simply it is a Swiss Army knife for small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods. Implemented methods cover a wide range of network science (NetSci, Complenet), data mining (ICDM, CIKM, KDD), artificial intelligence (AAAI, IJCAI) and machine learning (NeurIPS, ICML, ICLR) conferences, workshops, and pieces from prominent journals.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Pattern

    Pattern

    Web mining module for Python, with tools for scraping

    Pattern is an open-source Python library that provides tools for web mining, natural language processing, machine learning, and network analysis. The project integrates multiple capabilities into a single framework that allows developers to collect, process, and analyze textual data from the web. It includes modules for web scraping and crawling that can retrieve information from sources such as social media platforms, search engines, and online knowledge bases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    OmicSelector

    OmicSelector

    Feature selection and deep learning modeling for omic biomarker study

    OmicSelector is an environment, Docker-based web application, and R package for biomarker signature selection (feature selection) from high-throughput experiments and others. It was initially developed for miRNA-seq (small RNA, smRNA-seq; hence the name was miRNAselector), RNA-seq and qPCR, but can be applied for every problem where numeric features should be selected to counteract overfitting of the models. Using our tool, you can choose features, like miRNAs, with the most significant...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Isolation Similarity

    Isolation Similarity

    aNNE similarity based on Isolation Kernel

    Demo of using aNNE similarity for DBSCAN. Written by Xiaoyu Qin, Monash University, March 2019, version 1.0 This software is under GNU General Public License version 3.0 (GPLv3) This code is a demo of method described by the following publication: Qin, X., Ting, K.M., Zhu, Y. and Lee, V.C., 2019, July. Nearest-neighbour-induced isolation similarity and its impact on density-based clustering. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    AI Cheatsheets

    AI Cheatsheets

    Essential Cheat Sheets for deep learning and machine learning research

    cheatsheets-ai is an open-source repository that collects essential cheat sheets covering many tools and concepts used in machine learning, deep learning, and data science. The project aims to provide quick-reference materials that help engineers, researchers, and students review key techniques and frameworks without reading extensive documentation. It compiles cheat sheets for widely used libraries and technologies such as TensorFlow, Keras, NumPy, Pandas, Scikit-learn, Matplotlib, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    xLearn

    xLearn

    High performance, easy-to-use, and scalable machine learning (ML)

    xLearn is a high-performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), all of which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data. Many real-world datasets deal with high dimensional sparse feature vectors like a recommendation system where the number of categories and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    TEXT2DATA

    TEXT2DATA

    Text Analytics Platform

    Bring Text Analytics Platform that uses NLP (Natural Language Processing) and Machine Learning to your work environment. Extract essential information from your text documents and let Artificial Intelligence save your time. Get detailed and agile reports on your unstructured data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Siamese and triplet learning

    Siamese and triplet learning

    Siamese and triplet networks with online triplet mining in PyTorch

    ...The repository demonstrates how to train these models using contrastive loss and triplet loss functions, which encourage embeddings of similar samples to be close while pushing dissimilar samples farther apart. It includes data loaders, training scripts, neural network architectures, and evaluation metrics that allow researchers to experiment with different embedding learning strategies. The project also implements online pair and triplet mining techniques to efficiently generate training examples during model training.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Bolt ML

    Bolt ML

    10x faster matrix and vector operations

    Bolt is an open-source research project focused on accelerating machine learning and data mining workloads through efficient vector compression and approximate computation techniques. The core idea behind Bolt is to compress large collections of dense numeric vectors and perform mathematical operations directly on the compressed representations instead of decompressing them first. This approach significantly reduces both memory usage and computational overhead when working with high-dimensional data commonly used in machine learning systems. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    SPAWNN

    SPatial Analysis With self-organizing Neural Networks

    The SPAWNN toolkit is an innovative toolkit for spatial analysis with self-organizing neural networks which is particularily useful for spatial analysis, visualization and geographical data mining. To run the toolkit, simply download and execute (double-click) the jar-file. Please cite: - Hagenauer, J., & Helbich, M. (2016). SPAWNN: A Toolkit for SPatial Analysis With Self-Organizing Neural Networks. Transactions in GIS, 20(5), 755-775. Other related publications: - Hagenauer, J. (2016). Weighted merge context for clustering and quantizing spatial data with self-organizing neural networks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    PyDaMelo

    Python-compatible Data mining elementary objects

    An attempt at offering machine learning and data mining algorithms at the finest grain we are able to, easy to combine together through Python scripting to glue together the Lego-like bricks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    All future developments will be implemented in the new MATLAB toolbox SciXMiner, please visit https://sourceforge.net/projects/scixminer/ to download the newest version. The former Matlab toolbox Gait-CAD was designed for the visualization and analysis of time series and features with a special focus to data mining problems including classification, regression, and clustering.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Mass-based dissimilarity

    Mass-based dissimilarity

    A data dependent dissimilarity measure based on mass estimation.

    This software calculates the mass-based dissimilarity matrix for data mining algorithms relying on a distance measure. References: Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure. KDD 2016 http://dx.doi.org/10.1145/2939672.2939779 The source code, presentation slide and poster are attached under "Files". The presentation video in KDD 2016 is published on https://youtu.be/eotD_-SuEoo .
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    ExSTraCS

    ExSTraCS

    Extended Supervised Tracking and Classifying System

    This advanced machine learning algorithm is a Michigan-style learning classifier system (LCS) developed to specialize in classification, prediction, data mining, and knowledge discovery tasks. Michigan-style LCS algorithms constitute a unique class of algorithms that distribute learned patterns over a collaborative population of of individually interpretable IF:THEN rules, allowing them to flexibly and effectively describe complex and diverse problem spaces. ExSTraCS was primarily developed to address problems in epidemiological data mining to identify complex patterns relating predictive attributes in noisy datasets to disease phenotypes of interest. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    This site contains four packages of Mass and mass-based density estimation. 1. The first package is about the basic mass estimation (including one-dimensional mass estimation and Half-Space Tree based multi-dimensional mass estimation). This packages contains the necessary codes to run on MATLAB. 2. The second package includes source and object files of DEMass-DBSCAN to be used with the WEKA system. 3. The third package DEMassBayes includes the source and object files of a Bayesian...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Stochastico is an implementation of stochastic discrimination for pattern recognition, predictive modeling and data mining applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next