Atera all-in-one platform IT management software with AI agents
Ideal for internal IT departments or managed service providers (MSPs)
Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
Learn More
Cloud tools for web scraping and data extraction
Deploy pre-built tools that crawl websites, extract structured data, and feed your applications. Reliable web data without maintaining scrapers.
Automate web data collection with cloud tools that handle anti-bot measures, browser rendering, and data transformation out of the box. Extract content from any website, push to vector databases for RAG workflows, or pipe directly into your apps via API. Schedule runs, set up webhooks, and connect to your existing stack. Free tier available, then scale as you need to.
************************************************************
THIS PROJECT IS MOVED.
See http://khcoder.net/en for the latest & greatest.
You can download this tool from the new home.
See you there!
************************************************************
ElixirFM is a high-level implementation of Functional Arabic Morphology. The core of ElixirFM is written in Haskell, while interfaces in Python and Perl support lexicon editing and other interactions.
http://github.com/otakar-smrz/elixir-fm
Encode Arabic provides tools for encoding and decoding Arabic in Haskell, Python, Perl, or LaTeX. Interprets the ArabTeX notation to generate original orthography or phonetic transcription. Supports Buckwalter and other romanizations. Converts legacy byte encodings into Unicode.
http://github.com/otakar-smrz/encode-arabic
Part-of-speech tagging is the task of assigning symbols from a particular set to words in a natural language text. ACOPOST implements and extends well-known machine learning techniques and provides a uniform environment for testing.
This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
Perstem is a Persian (Farsi) stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger. Inflexional morphemes are separated or removed from their stems. Perstem can also tokenize and transliterate between various character set encodings and romanizations.
ValiTerms is a tool that helps the validation of terms in corpus. It finds their occurrences and allows terminologists to choose if a term is relevant or not. ValiTerms is developed at LIPN (http://www-lipn.univ-paris13.fr), RCLN team.
Please consult the wiki for instructions about installation and usage.
Various tools for creating annotated parallel corpora including pre-trained tagging and parsing models for various languages, sentence alignment tools and word alignment tools.
Uplug also includes a web-based interface for interactive sentence and word alignment and scripts for indexing and querying parallel corpora using the Corpus Work Bench CWB.
Download 'uplug-main' first and then add other packages.
Axe Credit Portal - ACP- is axefinance’s future-proof AI-driven solution to digitalize the loan process from KYC to servicing, available as a locally hosted or cloud-based software.
Banks, lending institutions
Founded in 2004, axefinance is a global market-leading software provider focused on credit risk automation for lenders looking to provide an efficient, competitive, and seamless omnichannel financing journey for all client segments (FI, Retail, Commercial, and Corporate.)
This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read BioNLP shared task format (http://2011.bionlp-st.org/) and i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters for an event in a text.
The method is based on SVM but other ML algorithms can be adopted. The method details are explained in the...
The Parenthesis Classifier takes the contents of a set of parentheses and classifies it into one of several categories. It includes a parenthesized-data extractor and the classifier.
The Simple Semantic Classifier classifies short chunks of natural language text into broad semantic classes that correspond to the OBO ontologies provided as input.
This program reads each of 270,000 entries of the BÍN database of ICELANDIC lemmata and all their forms. It assigns one of hundreds of morphological paradigms to each entry. It won a special award in the Þú átt orðið competition (www.ordid.is)
Varamozhi is a free English-Malayalam transliteration library. It can transliterate Malayalam text between Malayalam and English scripts. Varamozhi takes as the input, the mapping between a Malayalam font and a transliteration scheme; outputs functions i
Based on the Buckwalter Morphological Analyzer (Version 1.0) for doing Arabic stemming and POS tagging. Includes a rewrite of the original Perl script, with better documentation and more flexible options, and a C++ interface (usable as a library or app).
Stance is a perl script for generating random sentences in Dutch, which can be used as translation exercises for students of Dutch. In its finished version, it should be able to generate only gramatically correct sentences.