File Date Author Commit
 Properties 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 res 2007-10-24 japienaar [r2] * Included license information
 AboutForm.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 AboutForm.resx 2007-10-19 japienaar [r1] Initial commit.
 App.ico 2007-10-19 japienaar [r1] Initial commit.
 AssemblyInfo.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 BrowseForFolder.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 COPYING 2007-10-24 japienaar [r2] * Included license information
 Crawler.csproj 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 Crawler.csproj.user 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 Crawler.sln 2007-10-19 japienaar [r1] Initial commit.
 Crawler_TemporaryKey.pfx 2007-10-19 japienaar [r1] Initial commit.
 Documentation.WebCrawler.UserManual.PJC.1.0.0.1.2007-10-26.doc 2007-12-12 japienaar [r4] Updated README.txt and added documentation.
 FileTypeForm.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 FileTypeForm.resx 2007-10-19 japienaar [r1] Initial commit.
 InformationDlg.Designer.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 InformationDlg.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 InformationDlg.resx 2007-10-19 japienaar [r1] Initial commit.
 MainForm.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 MainForm.resx 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 README.txt 2007-12-12 japienaar [r4] Updated README.txt and added documentation.
 Schedule.Designer.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 Schedule.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 Schedule.resx 2007-10-19 japienaar [r1] Initial commit.
 Settings.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 SettingsForm.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 SettingsForm.resx 2007-10-19 japienaar [r1] Initial commit.
 SortTree.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...
 Util.cs 2007-12-11 japienaar [r3] Bug fixes and added method to compare the struc...

Read Me

NLP WebCrawler
==============

Description
-----------
A WebCrawler for Natural Language Processing. It searches the web for monolingual text (in a specified language) and for bilingual parallel text. The WebCrawler was adapted from a project by Hatem Mostafa (http://www.codeproject.com/cs/internet/Crawler.asp).

The n-gram files, used for language identification, are not included in the distribution, but are available on request (http://www.ctext.co.za/).
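The README does not say how the n-gram files are used internally. One common approach to language identification is rank-based character n-gram profiling with an "out-of-place" distance, and the sketch below illustrates that general technique in Python. It is not the project's C# implementation: the function names, the profile size, and the toy sample sentences are all assumptions made for illustration. Real profiles would be built from much larger corpora.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams (lengths 1..n_max), best first."""
    # Lowercase and mark word boundaries with underscores (a common convention).
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending frequency, then alphabetically so the ranking is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from lang_profile get a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def identify(text, profiles):
    """Return the key of the language profile closest to the text's own profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data (hypothetical); far too small for real use.
EN_SAMPLE = "the quick brown fox jumps over the lazy dog and the cat sat on the mat"
AF_SAMPLE = "die vinnige bruin jakkals spring oor die lui hond en die kat sit op die mat"

profiles = {
    "en": ngram_profile(EN_SAMPLE),
    "af": ngram_profile(AF_SAMPLE),
}

if __name__ == "__main__":
    print(identify("the dog sat on the mat", profiles))
```

In a setup like this, each n-gram file would hold one precomputed language profile, and a downloaded page would be kept only if its closest profile matched the requested language.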

Requires
--------
* IKVM.NET (http://www.ikvm.net/)
* PDFBox, a Java PDF library (http://www.pdfbox.org/)