Search Results for "data science"

Showing 30 open source projects for "data science" in the Linguistics category

1

TEI LingSIG

Production space for the TEI Linguistics SIG

This used to be the experimentation and production space for the Special Interest Group (SIG) of the Text Encoding Initiative (TEI) called "TEI for Linguists", LingSIG for short. Currently, this is a storage place for documents produced by the SIG. Use https://github.com/LingSIG to access the current production space.

Downloads: 3 This Week

Last Update: 2026-06-17
2

SPPAS

SPPAS - the automatic annotation and analysis of speech

SPPAS is a scientific computer software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage, in Aix-en-Provence, France. It is available for free, with open source code, and there is simply no other package that linguists can use as easily for the automatic annotation of speech, the analysis of any kind of annotated data, and the conversion of annotated files. SPPAS can automatically produce speech annotations from a recorded speech sound and its orthographic...

Downloads: 5 This Week

Last Update: 2026-04-06
3

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source cross-platform Unicode- and XML-based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE-compliant web portal (GWT-based) with built-in access control. The latest version of TXM can be downloaded from http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en
TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...
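To illustrate what two of the tools named above compute, here is a minimal Python sketch of a keyword-in-context (KWIC) concordance and a frequency list over a whitespace-tokenized text. The `tokenize`, `kwic` and `frequency_list` names and the naive tokenizer are assumptions for illustration; this is not TXM's API or its CQP-based implementation.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def kwic(tokens: list[str], keyword: str, width: int = 2) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    # Most frequent first; ties keep first-seen order.
    return Counter(tokens).most_common()


if __name__ == "__main__":
    toks = tokenize("the cat sat on the mat and the dog sat too")
    for left, kw, right in kwic(toks, "sat"):
        print(f"{left:>12} [{kw}] {right}")
    print(frequency_list(toks)[:3])  # [('the', 3), ('sat', 2), ('cat', 1)]
```

Real concordancers index the corpus once instead of scanning it per query; the linear scan here only keeps the idea visible.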

Downloads: 4 This Week

Last Update: 2024-12-09
4

MITRE Annotation Toolkit

A toolkit for managing and manipulating text annotations

The MITRE Annotation Toolkit (MAT) is a suite of tools which can be used for automated and human tagging of annotations. Annotation is a process, used mostly by researchers in natural language processing, of enhancing documents with information about the various phrase types the documents contain. MAT supports both UI interaction and command-line interaction, and provides various levels of control over the overall annotation process. It can be customized for specific tasks (e.g.,...

Downloads: 0 This Week

Last Update: 2023-04-19
5

concordia

Powerful search library, best suited for computer-aided translation

Concordia is named after the Roman goddess of agreement. It is a concordance searcher, a tool for translators who need their translations to "agree" with one standard. Concordia is a C++ library for fast text lookup in large corpora. It uses an index stored in RAM, which takes up approximately 600MB of memory for a corpus of 2 million sentences. It is based on the idea of a suffix array, enhanced by other auxiliary data structures. The results are stunning: Concordia is able to do simple substring...
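The core suffix-array idea mentioned above can be sketched in Python: sort all suffix start positions, then binary-search for the block of suffixes that begin with the query. This is a naive construction for illustration only, assuming a plain character string; it is not Concordia's C++ index, whose auxiliary structures are not described here.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions where `pattern` occurs in `text`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                 # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Because all suffixes sharing a prefix are contiguous in sorted order, each lookup costs two binary searches rather than a scan of the corpus.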

Downloads: 0 This Week

Last Update: 2019-02-28
6

KhmerText

Open data for a Khmer language corpus and lexicographic data that can be used for the development of free language tools for Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.

4 Reviews

Downloads: 82 This Week

Last Update: 2018-05-17
7

Free Dictionaries

Free translating dictionaries. Source format: TEI-P5 XML. Delivery formats: DICT, Stardict, etc. The dictionaries may include information on the pronunciation, etymology and such, in a platform-independent format. Access: web/plugins/standalone.
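As a rough sketch of reading TEI-P5 source dictionaries, this Python snippet maps each headword to its translation equivalents using the standard library's ElementTree. The element layout (`form/orth` for the headword, `sense/cit[@type='trans']/quote` for translations) is an assumption modeled on common TEI dictionary encoding, and the sample entry is invented; real files may nest or label elements differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample entry, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <entry>
    <form><orth>cat</orth><pron>kæt</pron></form>
    <sense><cit type="trans"><quote>chat</quote></cit></sense>
    <sense><cit type="trans"><quote>matou</quote></cit></sense>
  </entry>
</body></text></TEI>"""


def read_entries(xml_text: str) -> dict[str, list[str]]:
    """Map each headword (<orth>) to its translation equivalents (<quote>)."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for entry in root.iterfind(".//tei:entry", TEI_NS):
        orth = entry.find("tei:form/tei:orth", TEI_NS)
        if orth is None or not orth.text:
            continue  # skip entries without a headword
        quotes = [
            q.text
            for q in entry.iterfind(".//tei:cit[@type='trans']/tei:quote", TEI_NS)
            if q.text
        ]
        result.setdefault(orth.text, []).extend(quotes)
    return result


if __name__ == "__main__":
    print(read_entries(SAMPLE))  # {'cat': ['chat', 'matou']}
```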

Downloads: 61 This Week

Last Update: 2018-03-29
8

rcqp

R interface to the Corpus Query Protocol

Implements the Corpus Query Protocol as a package for the R statistical environment. It allows users to query linguistic corpora and manipulate the data as native R objects. It is based on the CWB (Corpus Workbench) software.

Downloads: 0 This Week

Last Update: 2018-03-13
9

Colloquium QDA

A free and open source qualitative ethnographic interview coding tool.

Colloquium QDA is a tool for custom coding and analyzing qualitative ethnographic interviews. To run, make sure you first have JRE 8 or later installed (http://www.oracle.com/technetwork/java/javase/downloads/). Colloquium QDA is an open source cross-platform Java Swing app utilizing an embedded Java DB with Lucene integrated search.

Downloads: 0 This Week

Last Update: 2017-01-23
10

BioC

We describe a simple XML format to share text documents and annotations

A minimalist approach to sharing text documents and data annotations. It allows a large number of different annotations to be represented. Project files contain:
- simple code to hold/read/write data and perform sample processing
- BioC-formatted corpora
- BioC tools that work with BioC corpora

BioC goals:
- simplicity
- interoperability
- broad use
- reuse

There should be little investment required to learn to use a format or a software module to process that format. We are...
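A minimal Python sketch of reading annotations from a BioC-style XML collection with the standard library follows. The element names (`collection`, `document`, `passage`, `annotation`, `infon`, `location`) follow the BioC layout as we understand it and should be checked against the BioC DTD; the sample document, the `Annotation` class and `read_annotations` are hypothetical, not the project's own code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Invented sample collection, for illustration only.
SAMPLE = """<collection><source>demo</source><date>2016</date><key>demo.key</key>
<document><id>d1</id>
  <passage><offset>0</offset><text>BRCA1 is a human gene.</text>
    <annotation id="a1"><infon key="type">Gene</infon>
      <location offset="0" length="5"/><text>BRCA1</text></annotation>
  </passage>
</document></collection>"""


@dataclass
class Annotation:
    doc_id: str
    ann_id: str
    ann_type: str
    offset: int
    length: int
    text: str


def read_annotations(xml_text: str) -> list[Annotation]:
    """Flatten every annotation location in the collection into a list."""
    root = ET.fromstring(xml_text)
    out = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id", default="")
        for ann in doc.iter("annotation"):
            # Only the "type" infon is read; other infon keys are ignored here.
            ann_type = next(
                (i.text or "" for i in ann.findall("infon") if i.get("key") == "type"),
                "",
            )
            for loc in ann.findall("location"):
                out.append(Annotation(
                    doc_id, ann.get("id", ""), ann_type,
                    int(loc.get("offset", "0")), int(loc.get("length", "0")),
                    ann.findtext("text", default=""),
                ))
    return out


if __name__ == "__main__":
    for a in read_annotations(SAMPLE):
        print(a)
```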

Downloads: 3 This Week

Last Update: 2016-08-08
11

Pacx

Platform for Annotated Corpora in XML: an integrated tool for corpus linguists, built on Eclipse, Vex, Subversive, etc., for creating and editing transcriptions and annotations, querying, managing version-controlled data, and building a shippable corpus.

Downloads: 2 This Week

Last Update: 2014-03-15
12

Perstem

Perstem is a Persian (Farsi) stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger. Inflectional morphemes are separated or removed from their stems. Perstem can also tokenize and transliterate between various character set encodings and romanizations.
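Perstem's actual rules are not described here. As a hedged illustration of separating inflectional suffixes from stems, this Python sketch strips the longest match from a tiny, hypothetical suffix list (plural ها/های, comparative تر, superlative ترین) and also drops the zero-width non-joiner (U+200C) often written between a Persian stem and its suffix. It is a toy, not Perstem's analyzer.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, often written between stem and suffix

# Hypothetical, deliberately tiny suffix list, longest first.
SUFFIXES = sorted([
    "\u0647\u0627",              # ها   plural
    "\u0647\u0627\u06cc",        # های  plural + ezafe
    "\u062a\u0631",              # تر   comparative
    "\u062a\u0631\u06cc\u0646",  # ترین superlative
], key=len, reverse=True)


def strip_suffix(word: str, min_stem: int = 2) -> tuple[str, str]:
    """Split `word` into (stem, suffix); suffix is "" if nothing matched."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            stem = word[: -len(suf)]
            # Drop a trailing ZWNJ left between stem and suffix.
            return stem.rstrip(ZWNJ), suf
    return word, ""


if __name__ == "__main__":
    # کتاب‌ها "books" -> stem کتاب "book"
    print(strip_suffix("\u06a9\u062a\u0627\u0628" + ZWNJ + "\u0647\u0627"))
```

A real stemmer needs exception lists, since blind suffix stripping will mangle words that merely end in the same letters.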

1 Review

Downloads: 0 This Week

Last Update: 2016-11-23
13

TML - Text Mining Library for LSA & CMM

TML is a Java Library for LSA and extracting Concept Maps from text

TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml

3 Reviews

Downloads: 0 This Week

Last Update: 2013-08-05
14

Hermes Natural Language Processing

A repository of software, documentation and data for NLP

Hermes is a repository of software, documentation and data for NLP. I am currently adding corpora extracted from Wikipedia (mostly in Romance languages).

Downloads: 1 This Week

Last Update: 2013-04-26
15

ValiTerms

Validation of terms in corpus

ValiTerms is a tool that helps validate terms in a corpus. It finds their occurrences and lets terminologists decide whether each term is relevant. ValiTerms is developed at LIPN (http://www-lipn.univ-paris13.fr), RCLN team. Please consult the wiki for installation and usage instructions.
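ValiTerms' interface and storage format are not described on this page. The general workflow (locate each candidate term in the corpus, then record a terminologist's accept/reject decision) can be sketched in Python as follows; `find_term_occurrences`, `review` and the callback-style decision function are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import Callable


def find_term_occurrences(tokens: list[str], terms: list[str]) -> dict[str, list[int]]:
    """Map each (possibly multiword) candidate term to its token start positions.

    Matching is case-insensitive; terms with no occurrences are omitted.
    """
    lowered = [t.lower() for t in tokens]
    occurrences: dict[str, list[int]] = defaultdict(list)
    for term in terms:
        parts = term.lower().split()
        if not parts:
            continue  # ignore empty candidate terms
        n = len(parts)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == parts:
                occurrences[term].append(i)
    return dict(occurrences)


def review(occurrences: dict[str, list[int]],
           decide: Callable[[str, list[int]], bool]) -> dict[str, bool]:
    """Apply a terminologist's decision (True = relevant) to each term found."""
    return {term: decide(term, positions) for term, positions in occurrences.items()}


if __name__ == "__main__":
    toks = "The suffix array and the suffix tree differ".split()
    occ = find_term_occurrences(toks, ["suffix array", "suffix tree", "trie"])
    print(occ)  # {'suffix array': [1], 'suffix tree': [5]}
```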

Downloads: 0 This Week

Last Update: 2015-10-06
16

EyeMap - Eye Movement Data Analyzer

EyeMap is a visualization and analysis tool for eye movement data recorded during text reading. It can process Unicode, proportional/non-proportional, and spaced/unspaced reading materials, so it supports various languages and experimental methods.

1 Review

Downloads: 4 This Week

Last Update: 2013-08-10
17

CRFSharp

CRFSharp is a .NET (C#) implementation of Conditional Random Fields

CRFSharp (aka CRF#) is a .NET (C#) implementation of Conditional Random Fields, a machine learning algorithm for learning from labeled sequences of examples. It is widely used in Natural Language Processing (NLP) tasks, for example word breaking, POS tagging, named entity recognition, query chunking and so on. CRF#'s main algorithm is the same as that of CRF++, written by Taku Kudo. It estimates model parameters using L-BFGS. Moreover, it has many significant improvements over CRF++, such as totally...
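Since CRF# is described as following CRF++, here is a sketch of how a CRF++-style feature template such as `U01:%x[-1,0]/%x[0,0]` expands into a feature string: `%x[row,col]` picks column `col` of the token `row` positions away from the current one. CRFSharp's own template syntax is not shown on this page, so this assumes it matches CRF++; the padding markers `_B-1`, `_B+1` for out-of-range rows are also an assumption mirroring CRF++'s convention.

```python
import re

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template: str, rows: list[list[str]], pos: int) -> str:
    """Expand a CRF++-style unigram template at token position `pos`.

    rows: one list of columns per token (e.g. [word, POS tag]).
    """
    def substitute(m: re.Match) -> str:
        offset, col = int(m.group(1)), int(m.group(2))
        r = pos + offset
        if r < 0:
            return f"_B{r}"                   # before the start, e.g. _B-1
        if r >= len(rows):
            return f"_B+{r - len(rows) + 1}"  # past the end, e.g. _B+1
        return rows[r][col]

    return MACRO.sub(substitute, template)


if __name__ == "__main__":
    rows = [["He", "PRP"], ["reckons", "VBZ"], ["the", "DT"]]
    print(expand_template("U01:%x[-1,0]/%x[0,0]", rows, 0))  # U01:_B-1/He
```

Each expanded string becomes one binary feature; the model learns a weight per (feature, label) pair.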

Downloads: 0 This Week

Last Update: 2015-08-03
18

BioEvent

This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read BioNLP shared task format (http://2011.bionlp-st.org/) and i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters for an event in a text. The method is based on SVM but other ML algorithms can be adopted. The method details are explained in the...

Downloads: 0 This Week

Last Update: 2013-04-25
19

Parentheses Classifier

The Parenthesis Classifier takes the contents of a set of parentheses and classifies it into one of several categories. It includes a parenthesized-data extractor and the classifier.
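The classifier's categories are not listed here, so only the extraction step is sketched. This Python function (a hypothetical name, not the project's code) returns the contents of each top-level parenthesized group, keeping nested parentheses inside their enclosing group.

```python
def extract_parenthesized(text: str) -> list[str]:
    """Return the contents of each top-level (...) group, handling nesting.

    Unmatched ')' characters are ignored; an unclosed '(' yields nothing.
    """
    groups: list[str] = []
    depth, start = 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # contents begin after the outermost '('
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups


if __name__ == "__main__":
    print(extract_parenthesized("a (b (c)) d (e)"))  # ['b (c)', 'e']
```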

Downloads: 0 This Week

Last Update: 2013-04-15
20

ELIA (eye-tracking for psycholinguistics)

ELIA (Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.

1 Review

Downloads: 0 This Week

Last Update: 2013-04-24
21

BD-1

BD-1 is a configurable database manager designed to provide efficient search and natural representations of annotated text, storing key-value pairs, triples, or n-tuples of text or binary data. It runs memory-resident or from disk.
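BD-1's data model and API are not documented on this page. The general idea of storing n-tuples with indexes that support pattern search can be sketched as a toy memory-resident store in Python, with one index per tuple position and `None` as a wildcard. `TupleStore` is a hypothetical name, and disk-backed storage is omitted.

```python
from collections import defaultdict
from typing import Optional


class TupleStore:
    """Toy in-memory n-tuple store with one value index per tuple position."""

    def __init__(self, arity: int):
        self.arity = arity
        self.tuples: list[tuple] = []
        # index[position][value] -> set of row ids holding `value` at `position`
        self.index = [defaultdict(set) for _ in range(arity)]

    def add(self, *values) -> None:
        if len(values) != self.arity:
            raise ValueError(f"expected {self.arity} values, got {len(values)}")
        row = len(self.tuples)
        self.tuples.append(values)
        for pos, v in enumerate(values):
            self.index[pos][v].add(row)

    def query(self, *pattern: Optional[object]) -> list[tuple]:
        """Return matching tuples in insertion order; None is a wildcard."""
        if len(pattern) != self.arity:
            raise ValueError("pattern length must equal arity")
        rows: Optional[set] = None
        for pos, v in enumerate(pattern):
            if v is None:
                continue
            hits = self.index[pos].get(v, set())  # .get avoids creating keys
            rows = set(hits) if rows is None else rows & hits
        if rows is None:  # pattern was all wildcards
            rows = set(range(len(self.tuples)))
        return [self.tuples[r] for r in sorted(rows)]


if __name__ == "__main__":
    s = TupleStore(3)
    s.add("cat", "isa", "noun")
    s.add("run", "isa", "verb")
    print(s.query(None, "isa", "verb"))  # [('run', 'isa', 'verb')]
```

Intersecting per-position row sets keeps each lookup proportional to the number of candidate rows rather than the whole store.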

Downloads: 0 This Week

Last Update: 2013-04-11
22

KAF2Tiger2

KAF2Tiger2 is a KAF (KYOTO annotation format) to <tiger2/> (Tiger2 XML) converter.

Downloads: 0 This Week

Last Update: 2013-04-19
23

crf decoder

CRF decoder is a simplified version of CRF++ used only for decoding sequential data. It removes the training component and its corresponding code from CRF++, which makes CRF decoder more readable and understandable for beginners.
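Decoding a linear-chain CRF is typically done with the Viterbi algorithm. The page does not show CRF decoder's own code, so this is a minimal Python sketch that assumes per-position label scores (emissions) and label-to-label transition scores have already been computed from the model's feature weights; that step is omitted.

```python
def viterbi(emissions: list[list[float]], transitions: list[list[float]]) -> list[int]:
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions[t][y]: score of label y at position t (e.g. summed feature weights).
    transitions[a][b]: score of moving from label a to label b.
    Scores are additive (log-space), so the path with the largest sum wins.
    """
    if not emissions:
        return []
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptr: list[list[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for y in range(n_labels):
            # Best previous label for reaching label y at position t.
            best_prev = max(range(n_labels), key=lambda a: score[a] + transitions[a][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            ptrs.append(best_prev)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda y: score[y])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    # A strong penalty on 0 -> 1 transitions pushes the path to stay in label 1.
    print(viterbi([[2, 0], [0, 1], [0, 2]], [[0, -5], [0, 0]]))  # [1, 1, 1]
```

The dynamic program keeps only the best score per label at each step, so decoding is O(T * L^2) instead of enumerating all L^T paths.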

Downloads: 0 This Week

Last Update: 2014-06-10
24

CTexT Alignment Interface Pro

Aligns parallel data at the sentence level and automatically creates .tmx files for use with Autshumato ITE
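The alignment step itself is not described here, but writing already-aligned sentence pairs to a .tmx file can be sketched with Python's ElementTree. This assumes the TMX 1.4 layout (`header`, then `body/tu/tuv/seg`, with `xml:lang` on each `tuv`); the `creationtool` values are placeholders, not what CTexT Alignment Interface Pro writes.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_tmx(pairs: list[tuple[str, str]], src_lang: str, tgt_lang: str) -> str:
    """Serialize aligned sentence pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-aligner",  # hypothetical placeholder
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, seg_text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = seg_text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([("Good morning.", "Goeiemôre.")], "en", "af"))
```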

Downloads: 0 This Week

Last Update: 2015-02-24
25

Varro

The Varro toolkit is a system for identifying frequently recurring unordered subtrees in semi-structured data. It is mostly for linguistics but also has applications in semi-structured data mining.
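A key ingredient in finding recurring unordered subtrees is a canonical form that ignores sibling order. The Python sketch below uses one to count complete subtrees (a node plus all its descendants) across a forest. This is a deliberate simplification: Varro's actual mining of more general subtree patterns is not described here, and the sketch assumes node labels contain no parentheses.

```python
from collections import Counter

# A tree is (label, [child trees]); sibling order is treated as irrelevant.
Tree = tuple


def canonical(tree: Tree, counts: Counter) -> str:
    """Return an order-independent string for `tree`, counting every subtree."""
    label, children = tree
    # Sorting the children's canonical strings makes sibling order irrelevant.
    kids = sorted(canonical(c, counts) for c in children)
    form = "(" + label + "".join(kids) + ")"
    counts[form] += 1
    return form


def recurring_subtrees(forest: list[Tree], min_count: int = 2) -> dict[str, int]:
    """Map canonical forms of complete subtrees to counts, keeping frequent ones."""
    counts: Counter = Counter()
    for tree in forest:
        canonical(tree, counts)
    return {form: n for form, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    t1 = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VB", [])])])
    t2 = ("S", [("VP", [("VB", [])]), ("NP", [("NN", []), ("DT", [])])])
    print(recurring_subtrees([t1, t2]))
```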

Downloads: 0 This Week

Last Update: 2015-06-04

© 2026 Slashdot Media. All Rights Reserved.
