TEXminer - Browse /Update2023 at SourceForge.net


Name	Modified	Size	Downloads / Week
readmeEN.txt	2025-03-25	3.4 kB	0
TEXminer_2023a.zip	2025-03-25	30.3 MB	0
nlpp_Sessons.zip	2025-03-23	15.4 kB	0
TEXminer_2023.zip	2023-10-21	30.2 MB	0
VocagramITA.zip	2023-10-21	251.5 kB	0
TEXminerHelp.htm	2023-10-16	43.0 kB	0
lpa_Sessons.zip	2023-06-28	9.2 kB	0
TextMiningData.zip	2023-06-26	80.2 MB	0
ZipfTest.zip	2023-06-26	1.1 MB	0
tmp_Sessions.zip	2023-06-24	3.1 kB	0
nlp_Sessons.zip	2023-06-22	2.2 kB	0
tmb_Sessons.zip	2023-06-20	8.4 kB	0
Totals: 12 Items		142.2 MB	0

User Manual for TEXminer 1.0 by gearwheelsoft (beta, Mar 2025)


TEXminer analyzes texts in Unicode format.
Before importing, save your text in Unicode/UTF-8 format so that all characters are read correctly, or import it from a PDF file.
The text database can be saved as XML; it stores the original text, the sentence and word lists, and additional parameters (e.g. abbreviations).
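A minimal Python sketch of how such a text database could be built: the text is split into sentences without breaking at known abbreviations, a word list with counts is derived, and everything is written to XML. The element names (TextDatabase, Sentences, Words, Abbreviations) and the sample abbreviation list are hypothetical; TEXminer's actual XML schema and segmentation rules are not documented here.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample list; TEXminer keeps its own (extensible) list per language.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "dr.", "mr.", "mrs."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences.

    A sentence ends at a whitespace-separated token ending in '.', '!' or '?',
    unless the lower-cased token is in the abbreviation list.
    """
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


def build_database(text, abbreviations=ABBREVIATIONS):
    """Build a (hypothetical) XML tree holding the original text, the sentence
    list, the word list with counts, and the abbreviations used."""
    root = ET.Element("TextDatabase")
    ET.SubElement(root, "OriginalText").text = text

    sentence_list = ET.SubElement(root, "Sentences")
    word_counts = Counter()
    for sentence in split_sentences(text, abbreviations):
        ET.SubElement(sentence_list, "Sentence").text = sentence
        # \w is Unicode-aware in Python 3, so non-Latin scripts are tokenized too
        word_counts.update(re.findall(r"\w+", sentence.lower()))

    word_list = ET.SubElement(root, "Words")
    for word, count in sorted(word_counts.items()):
        ET.SubElement(word_list, "Word", count=str(count)).text = word

    abbreviation_list = ET.SubElement(root, "Abbreviations")
    for abbreviation in sorted(abbreviations):
        ET.SubElement(abbreviation_list, "Abbreviation").text = abbreviation

    return ET.ElementTree(root)
```

To read a UTF-8 text file and save the result, one could call open("input.txt", encoding="utf-8").read() and pass the string to build_database(...), then call .write("db.xml", encoding="utf-8", xml_declaration=True) on the returned tree.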
Most Functions are universal for all Languages:
- Letter Frequency Analysis (19 Languages - extensible)
- Cooccurrence Analysis of Word-Pairs (universal)
- Determination of Central Expressions (universal)
- Thematic Model Statistics (6 Languages - fixed data)
- Database Similarity Analysis (Fingerprint Comparison - dependent on Thematic Model)
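Letter frequency analysis, as a minimal Python sketch: count the letters of a text (case-insensitively, ignoring digits, punctuation and whitespace) and compare two frequency profiles with a simple distance. The distance measure is an assumption for illustration; how TEXminer compares a text against its stored per-language letter frequency data is not described here.

```python
from collections import Counter


def letter_frequencies(text):
    """Return {letter: relative frequency}, most frequent first.

    Letters are lower-cased; digits, punctuation and whitespace are ignored.
    str.isalpha() is Unicode-aware, so e.g. Greek or Cyrillic letters count too.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    total = len(letters)
    return {ch: n / total for ch, n in Counter(letters).most_common()}


def profile_distance(p, q):
    """Sum of absolute differences between two frequency profiles.

    0.0 means identical profiles; 2.0 means the texts share no letters.
    """
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Comparing a text's profile against reference profiles for several languages and picking the smallest distance is one simple way to guess the language of a text.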
The Thematic Models also include Semantic Groups, which have been extended (2015).
The Thematic Models for Technical Terms have been extended (2015).
The Thematic Models for 1st additional Standard Vocabulary have been extended (2015).
The Thematic Models for 2nd additional Standard Vocabulary have been extended (2017).
The Thematic Models for 3rd additional Standard Vocabulary have been extended (2018).
The Italian Model has been added (2023).
The Word Frequency Ratio was added (2025).


------------
Key Features
------------

- Generic Processing of Unicode/UTF-8-encoded Texts
- Letter Frequency Analysis
- Generation of a Text Database using Abbreviation Lists and Stop-Word Lists in 19 Languages:
  Bulgarian, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian,
  Italian, Norwegian, Polish, Portuguese, Russian, Romanian, Spanish, Swedish and Turkish
- Other Languages can be processed by creating new Entries or Lists:
  the Letter Frequency Data, Abbreviation Lists and Stop-Word Lists are extensible
- Searching for Words
- Calculation of Cooccurrences
- Determination of Central Expressions
- Calculation of Thematic Model Statistics using Thematic Language Models 
  containing up to 65572 entries in 6 Languages: English, German, French, Spanish, Italian and Russian
- Similarity Analysis (Comparison of Database Fingerprints)
- Compare Word Frequency Ratio (so far English only)
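The cooccurrence calculation and the fingerprint comparison listed above can be sketched in Python as follows. This assumes that two words cooccur when they appear in the same sentence, and that a database fingerprint is the number of words per thematic category, compared by cosine similarity. The stop-word list and the thematic_model mapping are hypothetical samples; TEXminer's Thematic Language Models and its exact similarity measure may differ.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample stop-word list (TEXminer ships one per language).
STOP_WORDS = {"a", "an", "and", "is", "of", "the"}


def cooccurrences(sentences, stop_words=STOP_WORDS):
    """Count, for each unordered pair of distinct non-stop words, the number
    of sentences in which both words occur. Pairs are stored alphabetically."""
    pairs = Counter()
    for sentence in sentences:
        # a set, so a word repeated within one sentence is counted once
        words = {w for w in re.findall(r"\w+", sentence.lower()) if w not in stop_words}
        pairs.update(combinations(sorted(words), 2))
    return pairs


def fingerprint(words, thematic_model):
    """Count words per thematic category; words not in the model are ignored.
    thematic_model is a hypothetical {word: category} mapping."""
    return Counter(thematic_model[w] for w in words if w in thematic_model)


def cosine_similarity(fp1, fp2):
    """Cosine of the angle between two category-count vectors (0.0 to 1.0)."""
    dot = sum(count * fp2.get(category, 0) for category, count in fp1.items())
    norm1 = math.sqrt(sum(c * c for c in fp1.values()))
    norm2 = math.sqrt(sum(c * c for c in fp2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty fingerprint is treated as dissimilar to everything
    return dot / (norm1 * norm2)
```

Counting each pair at most once per sentence keeps repeated words from inflating the pair counts. Cosine similarity ignores overall text length, so a short and a long text covering the same topics still compare as similar.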


------------
Installation
------------

- extract the Project ZIP file to a new folder
- open the VB.NET Project with MS Visual Basic 2010 or higher (the Express Edition also works), or start the EXE file in the bin/Debug directory (.NET Framework required)
for optional Database Serialization:
- install SQLite for .NET2010 (or other SQLite bundle/wrapper appropriate for your Visual Studio Version):
  * Win32: sqlite-netFx40-setup-bundle-x86-2010-1.0.91.0.exe and the .NET Wrapper SQLite-1.0.66.0-setup.exe (System.Data.SQLite)
  * Win64: sqlite-netFx40-setup-bundle-x64-2010-1.0.91.0 and the .NET Wrapper SQLite-1.0.66.0-setup.exe (System.Data.SQLite)
- if you upgrade to a higher .NET Version, download the appropriate Setup Bundle from the web site "sqlite.org"
  (known problem: SQLite may not work; use XML Serialization as the default)
- the PDF Import Function uses the Open Source PDF Software "iTextSharp" (see bin/iTextSharp directory for more Information)


------------------
Use of the Program
------------------

see TEXminerHelp.htm in the bin/Debug directory, or select Help - HTML Help from the menu of the Main Window


gearwheelsoft2
Mar 2025

Source: readmeEN.txt, updated 2025-03-25

TEXminer Files

Text Mining Classification for Texts in ASCII, Unicode and PDF Format.

