Deploy pre-built tools that crawl websites, extract structured data, and feed your applications. Reliable web data without maintaining scrapers.
Automate web data collection with cloud tools that handle anti-bot measures, browser rendering, and data transformation out of the box. Extract content from any website, push to vector databases for RAG workflows, or pipe directly into your apps via API. Schedule runs, set up webhooks, and connect to your existing stack. Free tier available, then scale as you need to.
Explore 10,000+ tools
Financial reporting cloud-based software.
For companies looking to automate their consolidation and financial statement function
The software is cloud based and automates complexities around consolidating and reporting for groups with multiple year ends, currencies and ERP systems with a slice and dice approach to reporting. While retaining the structure, control and validation needed in a financial reporting tool, we’ve managed to keep things flexible.
The Parenthesis Classifier takes the contents of a set of parentheses and classifies it into one of several categories. It includes a parenthesized-data extractor and the classifier.
HanNanum is a Korean Morphological Analyzer and POS Tagger. A plug-in component-based architecture is adapted to the new Java version for flexible use. You can find the work flow for morphological analysis, POS tagging, noun extraction, etc.
Contact:
kschoi@kaist.ac.kr
hjjeong@world.kaist.ac.kr
A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.
jWords is a port of WORDS (by William Whitaker, a free latin-to-english dictionary program written in Ada), to Java. Besides the dictionary will be translated to the German language.
WQuery is a domain-specific query language designed to process WordNet-like lexical databases. It may be used as a standalone application or as an API to a lexical database in Java based systems.
ELIA(Eyegaze Language Integration Analysis) supports the analysis of eye-tracking data for studies in language processing. ELIA eases early analysis of data to enable iterative development of experiments in response to spoken language.
The Simple Semantic Classifier classifies short chunks of natural language text into broad semantic classes that correspond to the OBO ontologies provided as input.
Total Network Visibility for Network Engineers and IT Managers
Network monitoring and troubleshooting is hard. TotalView makes it easy.
This means every device on your network, and every interface on every device is automatically analyzed for performance, errors, QoS, and configuration.
CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.
Java program to create a (potentially multilingual) glossary of the unique words in any given Lojban text.
Note that the Sourceforge page for this was superceded by the Bitbucket repository: https://bitbucket.org/pretoriusjf/vlastezba/overview
Any further updates will be made there.
A linguistic tool to aid in the study of Linguistics/Phonology, specifically distinctive features of possible language sounds. Comprised of both a Visual C++ .NET version as well as a Java based web applet version. The C++ version has all but been ab
A free Spanish - English Translator for Linux. It will translate a phrase (via internet) or single word (built-in dictionary.) Has capability to learn new words and is smart enough to find plural and feminine words. Written using Python/GTK under GPL
LexBase is a configurable lexical database manager. It reads lexical and semantic information from WordNet, allows flexible querying of the database, and supports programmatic addition and deletion of terms, word senses, and relations.
BD-1 is a configurable database manager designed to provide efficient search and natural representations of annotated text, storing key-value pairs, triples, or n-tuples of text or binary data. It runs memory-resident or from disk.
The Kyoto FST Decoder is a general decoding engine for Weighted Finite State Transducers. It features flexible XML-based configuration, beam-search decoding, and is able to output separate weights for weight tuning.
Migrado a GithUB EN: https://github.com/christiangda/es-ve
La mejor opción para verificar y corregir la gramática de tus documentos de LibreOffice escritos en español. La extensión incluye: Corrector Ortográfico, Tesauro de Sinónimos y Separación Silábica.
¡Hecho en Venezuela!
Sus principales características son:
Más DE 87.000 lemas y sus respectivas conjugaciones.
Contiene el Lemario actualizado de RAE (Real Academia Española)
Términos financieros e informáticos...
NooJ is used by linguists to describe linguistic phenomena and apply the formalized morphological, syntactic or semantic rules to corpora . It is used by non linguists in fields like psychology, sociology, history, literature studies as well.
PyAnnotation is a Python Library to access and manipulate linguistically annotated corpus files. Supported file formats are Kura XML, Elan XML and Toolbox files. A Corpus Reader API is provided to support statistical analysis within the NLTK.
Editor for formal grammars. Attempts to be universal – customizable for any grammatical formalism and any syntax. Provides features such as syntax checking and highlighting, transformations (refactoring) and advanced rule editor.