Search Results for "java text mining preprocessing"

Showing 57 open source projects for "java text mining preprocessing"

  • 1
    Dawarich

    Self-hostable alternative to Google Timeline

    Dawarich is a self-hostable web application for recording and visualizing your location history, positioned as an alternative to Google Timeline.
    Downloads: 1 This Week
  • 2
    Weka

    Machine learning software to solve data mining problems

    Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.
    Downloads: 11,449 This Week
  • 3
    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 2 This Week
  • 4
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 1 This Week
  • 5
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 1 This Week
  • 6
    DataMelt

    Computation and Visualization environment

    DataMelt (or "DMelt") is an environment for numeric computation, data analysis, computational statistics, and data visualization. This Java multiplatform program is integrated with several scripting languages such as Jython (Python), Groovy, JRuby, BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, solving systems of linear and differential equations. Linear, non-linear...
    Downloads: 3 This Week
  • 7
    Lingua

    The most accurate natural language detection library for Java

    Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
    Downloads: 0 This Week
  • 8
    The Lemur Project

    Search engine and data mining applications and ClueWeb datasets.

    The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
    Downloads: 32 This Week
  • 9
    libpostal

    A C library for parsing/normalizing street addresses around the world

    libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open geo data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services,...
    Downloads: 0 This Week
  • 10
    DynaQ

    Innovative text document search. http://dynaq.opendfki.de for details.

    The goal of DynaQ is to develop an inquiry system for exploring the personal information space, supporting you with the 'orienteering' search paradigm. DynaQ is a (desktop) search engine with enhanced functionality for file, email, and blog search. See our GitLab homepage for source code and documentation: http://dynaq.opendfki.de
    Downloads: 0 This Week
  • 11
    RapidMiner -- Data Mining, ETL, OLAP, BI
    ETL, data warehousing, data mining, OLAP, business intelligence (BI) in Java. 500+ modules: extract, transform, load (ETL), data mining, data analysis + Weka, statistical forecasting, preprocessing, validation, visualization, OLAP, business intelligence.
    Downloads: 3 This Week
  • 12

    VecText

    Converting text to a structured representation

    VecText is an application that converts raw text to a structured format suitable for various data mining software. The application is written in Perl, an interpreted programming language. Part of the functionality is provided by external modules (e.g., Lingua::Stem::Snowball for stemming). The graphical user interface lets users work with the software without specialized technical skills or knowledge of a particular programming language, the names of libraries and their functions, etc. ...
    Downloads: 0 This Week
  • 13
    @Note2

    @Note2 - A workbench for Biomedical Text Mining

    Biomedical Text Mining (BioTM) provides valuable approaches to the automated curation of scientific literature.
    Downloads: 0 This Week
  • 14
    DSTK - Data Science TooKit 3

    Data and Text Mining Software for Everyone

    DSTK - Data Science Toolkit 3 is a set of data and text mining software that follows the CRISP-DM model. DSTK offers data understanding using statistical and text analysis, data preparation using normalization and text processing, and modeling and evaluation for machine learning and algorithms. It is based on the older version of DSTK at https://sourceforge.net/projects/dstk2/ DSTK Engine is like R. DSTK ScriptWriter offers a GUI for writing DSTK scripts. DSTK Studio offers an SPSS Statistics-like GUI...
    Downloads: 0 This Week
  • 15

    JSentiWordNet

    A wrapper for the famous SentiWordNet, a resource for opinion mining

    This project aims to provide a wrapper around SentiWordNet, a lexical resource for opinion mining. As defined by the authors: SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. You can find additional information about the creation of SentiWordNet here: http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf SentiWordNet (available here: https://drive.google.com/open?id=0B0ChLbwT19XcOVZFdm5wNXA5ODg) is a text file with a...
    Downloads: 0 This Week
  • 16
    GNAT

    GNAT recognizes gene names in text and maps them to NCBI Entrez Gene

    GNAT is a BioNLP/text mining tool to recognize and identify gene/protein names in natural language text. It will detect mentions of genes in text, such as PubMed/Medline abstracts, and disambiguate them to remove false positives and map them to the correct entry in the NCBI Entrez Gene database by gene ID. March 2017: We started to upload GNAT output on Medline. See files/results/medline/.
    Downloads: 0 This Week
  • 17
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is free, open-source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad solution workbench, including text analysis and predictive analytics features. Of course you may specify...
    Downloads: 0 This Week
  • 18

    sgmweka

    Weka wrapper for the SGM toolkit for text classification and modeling.

    Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. ...
    Downloads: 17 This Week
  • 19
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
  • 20
    Jbowl is a Java library intended to provide an API for developing text mining applications. It provides facilities for text analysis, as well as for building, evaluating, and applying various supervised and unsupervised text mining models.
    Downloads: 0 This Week
  • 21
    Stemmer Gujarati

    Offline stemmer for Gujarati, which is one of 22 Indian languages.

    This is a Gujarati stemmer in Java. Stemming is a process in which affixes are removed from a word to obtain its root (stem). It relates morphological variants of a word to their common root. For example, the word "પ્રતિઉપયોગી" has the stem "ઉપયોગ". Stemmers are language-specific tools. The design of a stemming algorithm requires a significant level of linguistic expertise. There has been a lot of significant work on the development and evaluation of stemmers for non-Indian languages, but very less...
    Downloads: 0 This Week
  • 22
    Cenobi

    cost estimation and management accounting, using neural networks

    Cenobi is designed for management accountants, not (only) for statisticians and data mining experts. Carefully arranged default settings make sure you can concentrate on Cenobi's many accounting features rather than worrying about setting up artificial neural networks or genetic algorithms, which are the main machine learning tools under Cenobi's hood. Cenobi's main benefits are: - ease of use - Utilizing artificial neural networks to estimate cost relationships, Cenobi is able to...
    Downloads: 0 This Week
  • 23

    webtextanalysis

    Mining knowledge from text data

    This project aims to implement the following text mining techniques in Java: text language detection, keyword and keyphrase extraction, text classification, text clustering, single- or multiple-document summarization, and plagiarism detection.
    Downloads: 0 This Week
  • 24
    Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.
    Downloads: 0 This Week
  • 25

    TML - Text Mining Library for LSA & CMM

    TML is a Java Library for LSA and extracting Concept Maps from text

    TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
    Downloads: 0 This Week