Name | Modified | Size | Downloads / Week |
---|---|---|---|
Release R01 | 2014-04-20 | ||
README-20140420-R01.txt | 2014-04-20 | 2.6 kB | |
Totals: 2 Items | | 2.6 kB | 0 |

RELEASE NOTE

Version : webStraktor Release 1.0
Date    : 20-April-2014

Summary:

webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content via the HTTP protocol and extract relevant information.

webStraktor features a scripting language to facilitate the collection, extraction and storage of information available on the web, including images. The scripting language uses elements of Regular Expression and XPath syntax. It has a small instruction set and a syntax that is easy to master.

The standard webStraktor output format is XML-based, encoded in ASCII, UTF-8 or ISO-8859-1 (Latin-1).

webStraktor relies on the Apache HttpClient for retrieving content via the HTTP protocol. It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through the predominant types of proxy servers.

webStraktor extends the functionality of web crawlers, web spiders and web bots by integrating scraping and crawling capabilities, and it provides exhaustive logging and tracing information.

Components:

- The webStraktor crawler and script interpreter.
- The webStraktor GUI builder, a Java Swing based IDE (Integrated Development Environment).
- The webStraktor monitor, a Java Swing application for displaying webStraktor tracing information in real time.

Release history:

20/04/2014 - First release

Distribution:

The WSQLC distribution comprises all required software, apart from a Java SDK and Apache Ant.
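
The workflow outlined in the summary (honouring robots rules, selecting nodes with a path expression, refining values with a regular expression, writing XML) can be sketched roughly as follows. This is a Python illustration using only the standard library, not the webStraktor scripting language, whose actual syntax is described in the user manual. The sample page, the robots rules, the user agent name and the output element names are all hypothetical, and the page is an inline string rather than content fetched over HTTP.

```python
import re
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical inputs standing in for content retrieved over HTTP.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

PAGE = """<html><body>
<div class="item"><h2>Blue widget</h2><span>Price: 12.50 EUR</span><img src="/img/blue.png"/></div>
<div class="item"><h2>Red widget</h2><span>Price: 7.00 EUR</span><img src="/img/red.png"/></div>
</body></html>"""


def allowed(robots_txt, user_agent, url):
    """Check a URL against Robots Exclusion Protocol rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def extract(page):
    """Select nodes with an XPath-style path, refine with a regex, build XML."""
    root = ET.fromstring(page)  # assumes well-formed markup
    out = ET.Element("items")
    for div in root.findall(".//div[@class='item']"):
        item = ET.SubElement(out, "item")
        ET.SubElement(item, "name").text = div.findtext("h2")
        # The regex isolates the numeric part of the price text.
        match = re.search(r"(\d+\.\d+)", div.findtext("span", default=""))
        ET.SubElement(item, "price").text = match.group(1) if match else ""
        ET.SubElement(item, "image").text = div.find("img").get("src")
    return out


if __name__ == "__main__":
    if allowed(ROBOTS_TXT, "example-bot", "http://example.com/catalog.html"):
        print(ET.tostring(extract(PAGE), encoding="unicode"))
```

Real-world HTML is rarely well-formed XML, so a production scraper would need a tolerant HTML parser in place of `ET.fromstring`; the sketch only shows how the path expression and the regular expression divide the work.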

Installation instructions and user manual:

Download the latest version of the webStraktor User Manual (webStraktor Manual 20140420-R01.pdf).

Notices:

Copyright (c) 2014 - webStraktor

webStraktor is free software. Permission is granted to copy, distribute and/or modify this software under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA to obtain the GNU General Public License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

GNU General Public License: www.gnu.org/copyleft/gpl.html

Contact details for copyright holder: webstraktor@gmail.com