webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content over HTTP and extract relevant information. webStraktor features a scripting language that facilitates the collection, extraction, and storage of information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
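To give a rough idea of what combining XPath-style selection with regular expressions looks like, here is a minimal Python sketch. It is not webStraktor's scripting language, whose syntax is not shown here. The sample page, the `extract_items` function, and the field names are all hypothetical, and the sketch assumes well-formed markup so that the standard library's limited XPath support can parse it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; a real scraper would fetch it over HTTP.
SAMPLE_PAGE = """<html><body>
<div class="item"><h2>Blue Widget</h2><span class="price">Price: 12.50 EUR</span><img src="/img/blue.png" /></div>
<div class="item"><h2>Red Widget</h2><span class="price">Price: 8.00 EUR</span><img src="/img/red.png" /></div>
</body></html>"""

# Regular expression that pulls a decimal amount out of free text.
PRICE_RE = re.compile(r"(\d+\.\d{2})")


def extract_items(page):
    """Select item blocks with XPath-style paths, then refine fields with a regex."""
    root = ET.fromstring(page)
    items = []
    # ElementTree supports a subset of XPath, including attribute predicates.
    for div in root.findall(".//div[@class='item']"):
        price_text = div.findtext("span[@class='price']") or ""
        match = PRICE_RE.search(price_text)
        image = div.find("img")
        items.append({
            "title": div.findtext("h2"),
            "price": float(match.group(1)) if match else None,
            "image": image.get("src") if image is not None else None,
        })
    return items


if __name__ == "__main__":
    for item in extract_items(SAMPLE_PAGE):
        print(item)
```

Real-world HTML is often not well-formed XML, so a production scraper would likely need a more tolerant parser than the one assumed here.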
vbullmin is a data-mining bot for vBulletin boards. It can retrieve all forums, topics, posts, and users from a vBulletin board and export this data using the phpBB2 database schema. It is intended as a sample for machine learning and uses patterns to extract the data.