Search Results for "corpus linguistics" - Page 2

Showing 41 open source projects for "corpus linguistics"

View related business solutions
  • Top-Rated Free CRM Software Icon
    Top-Rated Free CRM Software

    216,000+ customers in over 135 countries grow their businesses with HubSpot

    HubSpot is an AI-powered customer platform with all the software, integrations, and resources you need to connect your marketing, sales, and customer service. HubSpot's connected platform enables you to grow your business faster by focusing on what matters most: your customers.
  • Multi-Site Network and Cloud Connectivity for Businesses Icon
    Multi-Site Network and Cloud Connectivity for Businesses

    Internet connectivity without complexity

    As your users rely more and more on Cloud and Internet-based technologies, reliable internet connectivity becomes more and more important to your business. With Bigleaf’s proven SD-WAN architecture, groundbreaking AI, and DDoS attack mitigation, you can finally deliver the reliable internet connectivity your business needs without the limitations of traditional networking platforms. Bigleaf’s Cloud Access Network and plug-and-play router allow for limitless control to and from anywhere your traffic needs to go. Bigleaf’s self-driving AI automatically identifies and adapts to any changing circuit conditions and traffic needs—addressing issues before they impact your users. Bigleaf puts you in the driver’s seat of every complaint and support call with full-path traffic and network performance data, delivered as actionable insights, reports, and alerts.
  • 1
    Platform for Annotated Corpora in XML Integrated tool for corpus linguists built on Eclipse, Vex, Subversive, etc. for creating and editing transcriptions and annotations, querying, managing version controlled data, and building a shippable corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    TF-IDF.jar is a Java Archive file to measure TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus (c) get the TF-IDF of each document in the corpus (d) get each term with their frequency (no. of presence), term frequency (TF) and TF-IDF in every document
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Various tools for creating annotated parallel corpora including pre-trained tagging and parsing models for various languages, sentence alignment tools and word alignment tools. Uplug also includes a web-based interface for interactive sentence and word alignment and scripts for indexing and querying parallel corpora using the Corpus Work Bench CWB. Download 'uplug-main' first and then add other packages.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    ValiTerms

    ValiTerms

    Validation of terms in corpus

    ValiTerms is a tool that helps the validation of terms in corpus. It finds their occurrences and allows terminologists to choose if a term is relevant or not. ValiTerms is developed at LIPN (http://www-lipn.univ-paris13.fr), RCLN team. Please consult the wiki for instructions about installation and usage.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Manage your IT department more effectively Icon
    Manage your IT department more effectively

    Streamline your business from end to end with ConnectWise PSA

    ConnectWise PSA (formerly Manage) allows you to stop working in separate systems, and helps you build a more profitable business. No more duplicate data entries, inefficient employees, manual invoices, and the inability to accurately track client service issues. Get a behind the scenes look into the award-winning PSA that automates processes for each area of business: sales, help desk, support, finance, and HR.
  • 5
    Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Australian National Corpus

    Australian National Corpus

    An ongoing project to collate and provide access to language data

    Includes • Scripts for the program/ code developed • High level architecture diagrams • Install guides for developers • Links to end user documentation on the AusNC website Note: The BSD license applies to customised plug-ins, scripts and ingest programs developed by the AusNC project team. Additional open source, 3rd party software products used by the AusNC solution are referenced on our SF wiki space.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7

    WebSynonymExtractor

    a synonym extractor based on web-corpora and a multilingual translator

    This project is an approach for synonym extraction and extending WordNet by the so found synonyms. The python application is realised as a kind of pipe that starts with a web-corpus-reader which is followed by several workers (tokenizers, lemmatizers, ...) and finally completed by a result writer. In contrast to the state of the art approaches, this implementation is based on single words found in the web used as a corpus and translated to other languages. If translations of different...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    CRFSharp

    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    ... encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. For example, in machine with 64GB, CRF# encodes model with more than 4.5 hundred million features quickly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gain insights and build data-powered applications Icon
    Gain insights and build data-powered applications

    Your unified business intelligence platform. Self-service. Governed. Embedded.

    Chat with your business data with Looker. More than just a modern business intelligence platform, you can turn to Looker for self-service or governed BI, build your own custom applications with trusted metrics, or even bring Looker modeling to your existing BI environment.
  • 10
    Ug is a program to generate pseudo-realistic usernames using a categorised BNC (British National Corpus) wordlist. It combines words using pre-defined strategies such as adjective-noun or adjective-conjunction-adjective.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    PyAnnotation is a Python Library to access and manipulate linguistically annotated corpus files. Supported file formats are Kura XML, Elan XML and Toolbox files. A Corpus Reader API is provided to support statistical analysis within the NLTK.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    A database of linguistic annotation of medical text (from MEDLINE), including corpora used with ABGene, BioCreative I and II, and the MedPost training corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Clipsyll is a collection of scripts and programs for dowloading, codifying, analysing (using NLTK) CLIPS, the largest Italian corpus of spoken language. It includes a syllabification module based on the SSP: http://sourceforge.net/projects/silly
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    An Internet Corpus Linguistics analysis tool focusing on lexical variation over time and by geographical location. Initially the project will analyze word frequencies over time in the Linux Kernel Mailing List.
    Downloads: 0 This Week
    Last Update:
    See Project