Showing 45 open source projects for "corpus analysis"

  • 1

    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 0 This Week
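    The "similarity retrieval" idea in the description above can be illustrated with a minimal TF-IDF and cosine-similarity sketch. It uses only the Python standard library and is not gensim's API; the function names (`build_tfidf_index`, `tfidf_vector`, `cosine`, `rank`) and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Weight each known term by term frequency times inverse document frequency."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def build_tfidf_index(docs):
    """Return (idf, vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [tfidf_vector(doc, idf) for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, idf, vectors):
    """Return document indices ordered from most to least similar."""
    query = tfidf_vector(query_tokens, idf)
    scores = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "human machine interface".split(),
    "graph of trees".split(),
    "graph minors survey".split(),
]
idf, vectors = build_tfidf_index(docs)
ranking = rank("graph trees".split(), idf, vectors)
print(ranking)
```

    A library built for large corpora would typically stream documents and keep the index on disk rather than holding every vector in memory as this sketch does.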
  • 2

    Syllabic Verse Analysis (SylVA)

    Syllabifies and scans syllabic verse texts for metrical annotation

    The tool syllabifies and scans texts written in syllabic verse for metrical corpus annotation. It is designed for Old French and Old Occitan and exports the results in PAULA format suitable for the ANNIS platform (http://corpus-tools.org/annis/). It was first used in the preparation of the metrical treebank containing the Old Occitan Boeci text (cf. Rainsford and Scrivner 2014); development continued for use with the Old Gallo-Romance Corpus (http://www.ogr-corpus.org).
    Downloads: 0 This Week
  • 3

    TXM

    Unicode-XML-TEI text/corpus analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful...
    Downloads: 11 This Week
  • 4

    TEXminer

    Text Mining Classification for Texts in ASCII, Unicode and PDF Format.

    TEXminer uses generic Text Mining Methods to analyze Unicode Files as plain Text or PDF. The Text Database can be saved in XML where the original Text, the Sentence and Word Lists and additional Parameters (e.g. Abbreviations) are stored. TEXminer allows Language Detection by Letter Frequency Analysis, finding important Words by Cooccurrence Analysis, Determination of Central Expressions, Thematic Text Classification (also Semantic Groups) and Fingerprint Comparison. Because TEXminer...
    Downloads: 5 This Week
  • 5

    modnlp

    Modular Suite of NLP Tools

    modnlp aims to provide a modular architecture and tools for natural language processing written (mainly) in Java. It provides an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with (XML-based) handling of metadata; tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, sample classification tools, and evaluation modules; a suite of corpus management...
    Downloads: 2 This Week
  • 6

    modnlp-plugins

    External plugins for modnlp/teccli

    This is a general project for modnlp/teccli plugins, with a focus on text visualization.
    Downloads: 0 This Week
  • 7

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University. It can be used for corpus analysis and comparison in terms of several linguistic characteristics, such as frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
    Downloads: 0 This Week
  • 8
    This is an application generator for conflation algorithms in Perl. The system supports generating Perl source code for a stemmer from a rule file, running a stemmer supported by the system, and parsing a corpus file.
    Downloads: 0 This Week
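    The idea of deriving a stemmer from a rule file can be sketched as follows. This is not the project's code: it builds a Python function instead of generating Perl source, and the `suffix -> replacement` rule-file syntax, the names `parse_rules` and `make_stemmer`, and the sample rules (borrowed from step 1a of the Porter stemmer) are all assumptions made for illustration.

```python
def parse_rules(rule_text):
    """Parse lines of the (hypothetical) form 'suffix -> replacement'."""
    rules = []
    for line in rule_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        suffix, replacement = (part.strip() for part in line.split("->"))
        rules.append((suffix, replacement))
    return rules


def make_stemmer(rules):
    """Build a stemmer that applies the longest matching suffix rule."""
    ordered = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)

    def stem(word):
        for suffix, replacement in ordered:
            if word.endswith(suffix):
                return word[: len(word) - len(suffix)] + replacement
        return word  # no rule applies

    return stem


# Sample rule file: suffix rewrites as in Porter step 1a.
RULE_FILE = """
# plural handling
sses -> ss
ies -> i
ss -> ss
s ->
"""

stem = make_stemmer(parse_rules(RULE_FILE))
print([stem(w) for w in ["caresses", "ponies", "caress", "cats"]])
```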
  • 9
    iramuteq
    IRAMUTEQ: R interface for multidimensional analyses of texts and questionnaires (Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires). Data-processing software for text corpora or for individuals/characters-type data. In particular, it supports "ALCESTE"-type analyses.
    Downloads: 587 This Week
  • 10

    Korean Analyzer Rhino

    Parsing Korean words by morpheme and part-of-speech

    RHINO parses Korean words by morpheme and part of speech. Its dictionaries are based on the Korean Modern Tagged Corpus (on the scale of 12 million phrases), which was created by the Korean government, so it handles many cases of stems and endings. The newly developed Dynamic Dictionary Technology lets words react to their context; in effect, it is a programmed database. For more information, see the files in the help folder.
    Downloads: 12 This Week
  • 11

    jieba

    Stuttering Chinese word segmentation

    "Jieba" Chinese word segmentation aims to be the best Python Chinese word segmentation component. Four word segmentation modes are supported. Precise mode tries to cut the sentence most precisely and is suitable for text analysis. Full mode scans out all the words that can be formed in the sentence; it is very fast but cannot resolve ambiguity. Search engine mode builds on precise mode by splitting long words again to improve recall, which is suitable...
    Downloads: 1 This Week
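    The full mode described above (emit every dictionary word that can be formed in the sentence, without resolving ambiguity) can be sketched with a toy dictionary. This is an illustration under stated assumptions and not jieba's implementation or API: the function name `full_mode_cut` and the six-word dictionary are hypothetical, whereas the real library works from a large frequency dictionary.

```python
def full_mode_cut(sentence, dictionary):
    """Return every dictionary word found in the sentence, left to right.

    A single character is emitted only when no dictionary word covers it,
    so overlapping candidates are all kept and ambiguity is not resolved.
    """
    max_len = max(len(word) for word in dictionary)
    words = []
    last_end = -1  # index of the last character covered so far
    for start in range(len(sentence)):
        found = False
        upper = min(len(sentence), start + max_len)
        for end in range(start + 2, upper + 1):
            candidate = sentence[start:end]
            if candidate in dictionary:
                found = True
                last_end = max(last_end, end - 1)
                words.append(candidate)
        if not found and start > last_end:
            # Uncovered character: emit it on its own.
            words.append(sentence[start])
            last_end = start
    return words


# Toy dictionary (hypothetical, for illustration only).
TOY_DICT = {"来到", "北京", "清华", "清华大学", "华大", "大学"}

segments = full_mode_cut("我来到北京清华大学", TOY_DICT)
print("/".join(segments))
```

    Note that overlapping candidates such as "清华", "清华大学", "华大" and "大学" are all reported, which is the unresolved ambiguity the description mentions.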
  • 12

    concordia

    Powerful search library, best suited for computer-aided translation

    Concordia - Roman goddess of agreement. Concordance searcher - tool for translators who need their translations to "agree" with one standard. Concordia is a C++ library for fast text lookup in large corpora. It uses a RAM stored index, which takes up approximately 600MB of memory for a corpus of 2 million sentences. It is based on the idea of a suffix array, enhanced by the presence of other auxiliary data structures. The effects are stunning - Concordia is able to do simple substring...
    Downloads: 0 This Week
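    The suffix-array idea mentioned in the description can be sketched in a few lines. This is a naive illustration and not Concordia's C++ implementation or its auxiliary data structures; the names `build_suffix_array` and `find_all` are assumptions, and the simple sort-based construction would be too slow and memory-hungry for a real corpus.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return sorted start positions of every occurrence of pattern."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the k-th smallest suffix.
        pos = suffix_array[k]
        return text[pos:pos + m]

    # Lower bound: first suffix whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


text = "the cat sat on the mat"
sa = build_suffix_array(text)
print(find_all(text, sa, "at"))
print(find_all(text, sa, "the"))
```

    Because all suffixes that start with the pattern are adjacent in the sorted order, two binary searches are enough to locate every occurrence.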
  • 13

    Corpus Toolkit

    A text management tool for linguistic purposes...

    Downloads: 0 This Week
  • 14
    Yet another corpus manager. Allows HTTP access to annotated text corpora; the client does not need to install any special software to access the server (any browser with JavaScript support will do).
    Downloads: 0 This Week
  • 15
    Open data for a Khmer language corpus and lexicographic data that can be used to develop free language tools for the Khmer language, such as automatic translators, dictionaries, linguistic analysis tools, etc.
    Downloads: 60 This Week
  • 16
    **CODE MOVED TO GITHUB: https://github.com/bitextor ** Bitextor is an application created to generate translation memories using multilingual websites as a corpus source. It downloads an entire website and applies a set of heuristics (based mainly on HTML tag structure and text block length) to find bitexts.
    Downloads: 0 This Week
  • 17

    SmartMuseum

    Software for working with a corpus of everyday life history sources

    Everyday life history is attracting growing interest because of the increasing number of historical sources relating to ordinary people. Analyzing such sources requires treating them as interrelated. Evaluating these relations leads to meaningful results for different groups of information consumers, from professional historians and experts in related humanities disciplines to members of the public interested in everyday community life. Corpora of everyday life history sources are being...
    Downloads: 0 This Week
  • 18

    Reviz-it

    Software tools to re-tell stories in a better way and expand them

    ... to find which ones are inspiring. - Use the inspiring word clouds to rephrase the story in an original way, then expand it. Enrich with various text mining algorithms to automatically retrieve the different ways the same thing is said in a given context (a series of publications on the same topic or from the same organization, for example): latent semantic analysis, topic modeling, rule-based text mining, etc. This allows rewriting a text in the specific 'style' of a corpus.
    Downloads: 0 This Week
  • 19

    Projet sumtec

    Cleaning and preparation of interview transcription corpora

    Scripts written as part of the SUMTEC project to prepare transcription corpora for use in RQDA and IRAMUTEQ. http://www.msh-lorraine.fr/index.php?id=623 The project contains 3 Perl programs. The goal is to take unstructured interview transcriptions and structure them as an XML tree. The benefit is that, in the end, speaking turns can be identified and the interviewees' speech separated from the interviewers'.
    Downloads: 0 This Week
  • 20

    Natural Language Analysis with Ngrams

    NLP tool for statistical analysis of words, sentences, documents

    The goal of this project is an NLP tool that gives statistical analysis results based on Google Ngram data. For now, it is just a NetBeans project without a final JAR. There will also be a GitHub version for anyone who wishes to contribute. In future versions, the user will be able to convert a single word to numerical data, compare two words and get the comparison data, and do the same for sentences, paragraphs and documents. I...
    Downloads: 0 This Week
  • 21

    Persica-A new Persian corpus for NLP

    This project presents a new corpus for news text analysis in Persian

    The lack of a multi-application text corpus, despite the surge in text data, is a serious bottleneck in text mining and natural language processing, especially for the Persian language. This project presents a new corpus for news article analysis in Persian called Persica. News analysis includes news classification, topic discovery and classification, category classification and many more procedures. Dealing with news has special requirements and first of all a valid and reliable corpus to perform...
    Downloads: 6 This Week
  • 22
    TextTools
    TextTools is a freeware corpus linguistics tool developed in Python to aid in research. This program analyzes user-created corpora and displays information about word (token) frequency, n-grams, clusters, collocations, keyword in context (KWIC), and keyness. TextTools is designed to be user-friendly and intuitive and will run natively on Mac OS X.
    Downloads: 0 This Week
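    Three of the measures listed in the description (word frequency, n-grams and keyword in context) can be sketched with the Python standard library. This is not TextTools' own code; the function names, the whitespace tokenizer and the sample sentence are assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Very rough tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


tokens = tokenize("The cat sat on the mat and the cat slept")
frequencies = Counter(tokens)
bigrams = ngram_counts(tokens, 2)
concordance = kwic(tokens, "cat", window=2)

print(frequencies.most_common(2))
for left, key, right in concordance:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>10} [{key}] {right}")
```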
  • 23

    BM_news

    News Report Corpus for Recommendation

    Authors: Juan Cigarrán, Jose Luis Martinez, Angel Castellanos, David Hernandez-Aranda, Ana García-Serrano. The corpus contains 5360 news reports from 14 different Spanish newspapers covering different topics and scopes. The corpus also includes a three-level (news, subcategory, category) modelling, based on the approach presented in: Castellanos, A., Cigarrán, J., Garcia-Serrano, A. Content-based Recommendation: Experimentation, Analysis and Evaluation in a Case Study. Conferencia de la...
    Downloads: 0 This Week
  • 24

    Transml

    Phrase-based Statistical Machine Translation system for English Language

    This software translates English to Malayalam and vice versa. Statistical Machine Translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. SMT is a corpus-based approach, in which a massive parallel corpus is required to train the SMT systems.
    Downloads: 0 This Week
  • 25

    Khmer Automatic Translation

    Khmer-English-Khmer Automatic Translation

    The project attempts to develop a hybrid, high-quality English-Khmer-English automatic translation system built on a parallel corpus, based on statistical analysis and enhanced with part-of-speech analysis.
    Downloads: 0 This Week