Search Results for "java text mining preprocessing"

Showing 54 open source projects for "java text mining preprocessing"

1

Dawarich

Self-hostable alternative to Google Timeline

Dawarich is a self-hostable web application for recording, storing, and visualizing your location history, offered as an alternative to Google Timeline.

Downloads: 3 This Week

Last Update: 3 days ago
See Project
2

ant4docbook

ANT4DOCBOOK is an ANT task for DOCBOOK

ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
3

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 20 This Week

Last Update: 2025-10-25
See Project
4

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 8 This Week

Last Update: 2025-11-01
See Project
5

DataMelt

Computation and Visualization environment

DataMelt (or "DMelt") is an environment for numeric computation, data analysis, computational statistics, and data visualization. This Java multiplatform program is integrated with several scripting languages such as Jython (Python), Groovy, JRuby, BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, solving systems of linear and differential equations. Linear, non-linear...

4 Reviews

Downloads: 3 This Week

Last Update: 2023-04-21
See Project
6

Weka

Machine learning software to solve data mining problems

Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code.

51 Reviews

Downloads: 13,346 This Week

Last Update: 2023-09-25
See Project
7

Lingua

The most accurate natural language detection library for Java

Its task is simple: It tells you which language some provided textual data is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.

Downloads: 0 This Week

Last Update: 2024-09-14
See Project
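
The Lingua entry above describes language detection as a preprocessing step, for example to route e-mails by language. The sketch below shows only where such a step sits in a pipeline. It does not use Lingua and does not reproduce its algorithm: the class `LanguageRoutingSketch`, the tiny stop-word lists, and the queue names are hypothetical, and the stop-word counting is a deliberately naive stand-in for a real detector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class LanguageRoutingSketch {
    // Hypothetical, very small stop-word lists; a real detector uses far richer models.
    static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();
    static {
        STOP_WORDS.put("en", Set.of("the", "is", "on", "and", "of", "to"));
        STOP_WORDS.put("de", Set.of("der", "die", "das", "ist", "und", "dem"));
    }

    // Returns the language whose stop words occur most often, or "unknown" if none occur.
    static String detect(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        long bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOP_WORDS.entrySet()) {
            long hits = Arrays.stream(tokens).filter(entry.getValue()::contains).count();
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Preprocessing step: choose a (hypothetical) support queue from the detected language.
    static String route(String email) {
        switch (detect(email)) {
            case "en": return "support-english";
            case "de": return "support-german";
            default:   return "support-general";
        }
    }

    public static void main(String[] args) {
        System.out.println(route("The cat is on the mat"));
        System.out.println(route("Der Hund ist in dem Haus"));
    }
}
```

In a real pipeline, `detect` would be replaced by a call into a dedicated detection library, while `route` and everything downstream could stay unchanged.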
8

The Lemur Project

Search engine and data mining applications and ClueWeb datasets.

The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.

20 Reviews

Downloads: 6 This Week

Last Update: 2023-04-11
See Project
9

libpostal

A C library for parsing/normalizing street addresses around the world

libpostal is a C library for parsing/normalizing street addresses around the world, powered by statistical NLP and open geo data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services,...

Downloads: 2 This Week

Last Update: 2022-05-02
See Project
10

DynaQ

Innovative text document search. http://dynaq.opendfki.de for details.

The goal of DynaQ is to develop an inquiry system for exploring your personal information space, supporting the search paradigm known as 'orienteering'. DynaQ is a (desktop) search engine with enhanced functionality for file, email, and blog search. See our GitLab homepage for source code and documentation: http://dynaq.opendfki.de

Downloads: 0 This Week

Last Update: 2021-08-05
See Project
11

RapidMiner -- Data Mining, ETL, OLAP, BI

ETL, data warehousing, data mining, OLAP, business intelligence (BI) in Java. 500+ modules: extract, transform, load (ETL), data mining, data analysis + Weka, statistical forecasting, preprocessing, validation, visualization, OLAP, business intelligence.

Downloads: 13 This Week

Last Update: 2020-08-06
See Project
12

@Note2

@Note2 - A workbench for Biomedical Text Mining

Biomedical Text Mining (BioTM) provides valuable approaches to the automated curation of scientific literature.

1 Review

Downloads: 1 This Week

Last Update: 2019-05-13
See Project
13

DSTK - Data Science Toolkit 3

Data and Text Mining Software for Everyone

DSTK - Data Science Toolkit 3 is a set of data and text mining tools that follows the CRISP-DM model. DSTK offers data understanding using statistical and text analysis, data preparation using normalization and text processing, and modeling and evaluation for machine learning and algorithms. It is based on the older version of DSTK at https://sourceforge.net/projects/dstk2/ DSTK Engine is like R. DSTK ScriptWriter offers a GUI for writing DSTK scripts. DSTK Studio offers an SPSS Statistics-like GUI...

Downloads: 0 This Week

Last Update: 2019-06-07
See Project
14

JSentiWordNet

A wrapper for the famous SentiWordNet, a resource for opinion mining

This project aims to provide a wrapper around SentiWordNet, a lexical resource for opinion mining. As defined by the authors: SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, objectivity. You can find additional information about the creation of SentiWordNet here: http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf SentiWordNet (available here: https://drive.google.com/open?id=0B0ChLbwT19XcOVZFdm5wNXA5ODg) is a text file with a...

Downloads: 0 This Week

Last Update: 2018-07-25
See Project
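
The three scores mentioned in the JSentiWordNet entry above can be modeled with a small value class. This is a sketch, not the wrapper's API: the class `SentiScores` and its method names are hypothetical. It assumes that the three scores of a synset sum to 1, so objectivity can be derived from the other two.

```java
final class SentiScores {
    final double positivity;
    final double negativity;

    SentiScores(double positivity, double negativity) {
        // Assumption: each score lies in [0, 1] and the three scores sum to 1.
        if (positivity < 0 || negativity < 0 || positivity + negativity > 1.0) {
            throw new IllegalArgumentException("scores must be non-negative and sum to at most 1");
        }
        this.positivity = positivity;
        this.negativity = negativity;
    }

    // Objectivity is whatever remains after positivity and negativity.
    double objectivity() {
        return 1.0 - (positivity + negativity);
    }

    // Label of the strictly largest score; ties fall back to "objective".
    String dominant() {
        double objectivity = objectivity();
        if (positivity > negativity && positivity > objectivity) return "positive";
        if (negativity > positivity && negativity > objectivity) return "negative";
        return "objective";
    }

    public static void main(String[] args) {
        SentiScores scores = new SentiScores(0.25, 0.125);
        System.out.println(scores.objectivity()); // 0.625
        System.out.println(scores.dominant());    // objective
    }
}
```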
15

GNAT

GNAT recognizes gene names in text and maps them to NCBI Entrez Gene

GNAT is a BioNLP/text mining tool to recognize and identify gene/protein names in natural language text. It will detect mentions of genes in text, such as PubMed/Medline abstracts, and disambiguate them to remove false positives and map them to the correct entry in the NCBI Entrez Gene database by gene ID. March 2017: We started to upload GNAT output on Medline. See files/results/medline/.

Downloads: 1 This Week

Last Update: 2017-12-14
See Project
16

DSTK - DataScience ToolKit

DSTK - DataScience ToolKit for All of Us

DSTK - DataScience ToolKit is free, open source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features. Of course you may specify...

Downloads: 0 This Week

Last Update: 2018-05-08
See Project
17

sgmweka

Weka wrapper for the SGM toolkit for text classification and modeling.

Weka wrapper for the SGM toolkit for text classification and modeling. Provides Sparse Generative Models for scalable and accurate text classification and modeling for use in high-speed and large-scale text mining. Has lower time complexity of classification than comparable software due to inference based on sparse model representation and use of an inverted index. The provided .zip file is in the Weka package format, giving access to text classification. ...

Downloads: 38 This Week

Last Update: 2016-06-23
See Project
18

Java Data Mining Package

The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).

Downloads: 0 This Week

Last Update: 2015-08-19
See Project
19

Jbowl

Jbowl is a Java library that provides an API for developing text mining applications. It offers facilities for text analysis, as well as for building, evaluating, and applying various supervised and unsupervised text mining models.

Downloads: 0 This Week

Last Update: 2016-07-05
See Project
20

Stemmer Gujarati

Offline stemmer for Gujarati, one of the 22 scheduled languages of India.

This is a Gujarati stemmer in Java. Stemming is the process of removing affixes from a word to recover its root (stem), so that morphological variants map to a common root. For example, the word "પ્રતિઉપયોગી" has the stem "ઉપયોગ". Stemmers are language-specific tools, and designing a stemming algorithm requires a significant level of linguistic expertise. There has been a lot of significant work on the development and evaluation of stemmers for non-Indian languages, but very little...

Downloads: 0 This Week

Last Update: 2015-04-05
See Project
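
The affix removal described in the Stemmer Gujarati entry above can be illustrated with a minimal sketch. This is not the project's algorithm: the class `AffixStemmer`, the affix lists, and the minimum stem length are hypothetical, and the Gujarati affixes were picked only to fit the example word quoted in the description. A real stemmer needs linguistically motivated rules.

```java
import java.util.List;

final class AffixStemmer {
    private final List<String> prefixes;
    private final List<String> suffixes;
    private final int minStemLength;

    AffixStemmer(List<String> prefixes, List<String> suffixes, int minStemLength) {
        this.prefixes = prefixes;
        this.suffixes = suffixes;
        this.minStemLength = minStemLength;
    }

    // Strips at most one prefix and then at most one suffix, preferring the longest match.
    String stem(String word) {
        String result = word;
        String prefix = longestMatch(result, prefixes, true);
        if (prefix != null) {
            result = result.substring(prefix.length());
        }
        String suffix = longestMatch(result, suffixes, false);
        if (suffix != null) {
            result = result.substring(0, result.length() - suffix.length());
        }
        return result;
    }

    // Longest affix that matches and still leaves at least minStemLength characters.
    private String longestMatch(String word, List<String> affixes, boolean atStart) {
        String best = null;
        for (String affix : affixes) {
            boolean matches = atStart ? word.startsWith(affix) : word.endsWith(affix);
            boolean leavesStem = word.length() - affix.length() >= minStemLength;
            if (matches && leavesStem && (best == null || affix.length() > best.length())) {
                best = affix;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical English affix lists, used only to show the mechanics.
        AffixStemmer english = new AffixStemmer(List.of("un"), List.of("ful", "ing"), 2);
        System.out.println(english.stem("unhelpful")); // help

        // Hypothetical Gujarati affixes: one prefix and the vowel sign U+0AC0 as a suffix.
        AffixStemmer gujarati = new AffixStemmer(List.of("પ્રતિ"), List.of("\u0AC0"), 2);
        System.out.println(gujarati.stem("પ્રતિઉપયોગી"));
    }
}
```

The minimum stem length guards against over-stripping: a word that consists only of an affix is returned unchanged.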
21

Cenobi

Cost estimation and management accounting, using neural networks

Cenobi is designed for management accountants, not (only) for statisticians and data mining experts. Carefully arranged default settings make sure you can concentrate on Cenobi's many accounting features rather than worrying about setting up artificial neural networks or genetic algorithms, which are the main machine learning tools under Cenobi's hood. Cenobi's main benefits are: - ease of use - Utilizing artificial neural networks to estimate cost relationships, Cenobi is able to...

Downloads: 0 This Week

Last Update: 2016-11-29
See Project
22

webtextanalysis

Mining knowledge from text data

This project aims to implement the following text mining techniques in Java: text language detection, keyword and keyphrase extraction, text classification, text clustering, single- or multiple-document summarization, and plagiarism detection.

Downloads: 0 This Week

Last Update: 2016-03-07
See Project
23

ONDEX Suite

Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.

Downloads: 1 This Week

Last Update: 2019-05-15
See Project
24

TML - Text Mining Library for LSA & CMM

TML is a Java Library for LSA and extracting Concept Maps from text

TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml

3 Reviews

Downloads: 1 This Week

Last Update: 2013-08-05
See Project
25

TextProcessor

A Java package to preprocess text datasets for subsequent text analysis

The TextProcessor Java package is a text processing toolkit that provides frequently used text processing functions such as stemming, stop-word removal, term vocabulary generation, and calculation of the term-document frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset in LDA and LIBSVM formats for downstream procedures such as classification or clustering. ...

Downloads: 0 This Week

Last Update: 2015-11-23
See Project
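
The preprocessing steps named in the TextProcessor entry above (stop-word removal, vocabulary generation, term-document counts, LIBSVM output) can be sketched with the Java standard library alone. This is not TextProcessor's API: the class `TermDocSketch`, its method names, and the stop-word list are hypothetical. The output uses the sparse LIBSVM layout `label index:value ...`, with 1-based indices in ascending order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class TermDocSketch {
    // Hypothetical stop-word list; real lists are much longer.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "is", "to", "in");

    // Lowercases, splits on non-letter/non-digit characters, and drops stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assigns each distinct term a 1-based index in order of first appearance.
    static Map<String, Integer> buildVocabulary(List<String> documents) {
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        for (String document : documents) {
            for (String token : tokenize(document)) {
                vocabulary.putIfAbsent(token, vocabulary.size() + 1);
            }
        }
        return vocabulary;
    }

    // One sparse row of the term-document matrix in LIBSVM format.
    static String toLibSvmLine(int label, String document, Map<String, Integer> vocabulary) {
        TreeMap<Integer, Integer> counts = new TreeMap<>(); // sorted by term index
        for (String token : tokenize(document)) {
            Integer index = vocabulary.get(token);
            if (index != null) {
                counts.merge(index, 1, Integer::sum);
            }
        }
        StringBuilder line = new StringBuilder(Integer.toString(label));
        counts.forEach((index, count) -> line.append(' ').append(index).append(':').append(count));
        return line.toString();
    }

    public static void main(String[] args) {
        List<String> documents = List.of("The cat and the dog", "A dog is a dog");
        Map<String, Integer> vocabulary = buildVocabulary(documents);
        System.out.println(toLibSvmLine(0, documents.get(0), vocabulary)); // 0 1:1 2:1
        System.out.println(toLibSvmLine(1, documents.get(1), vocabulary)); // 1 2:2
    }
}
```

Stemming is left out here to keep the sketch short; it would be applied to each token inside `tokenize` before the stop-word check or the vocabulary lookup.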




© 2025 Slashdot Media. All Rights Reserved.
