##############################################################################
###																		   ###
###						 Autshumato Text Anonymiser						   ###
###								Version: 2.0							   ###
###																		   ###
##############################################################################

                         Autshumato Text Anonymiser 
==============================================================================
The Autshumato Text Anonymiser is a tool for anonymising text corpora. It identifies entities that may convey confidential information and replaces them with randomly selected entities of the same type or category. The Autshumato Text Anonymiser currently supports the anonymisation of corpora in the eleven official South African languages (Afrikaans, English, IsiNdebele, IsiXhosa, IsiZulu, Sepedi, Sesotho, Setswana, Siswati, Tshivenda and Xitsonga). The anonymiser can also be adapted for other languages.

The Autshumato Text Anonymiser can identify and anonymise the following entities:
 • Proper names and surnames
 • Geographical names (countries, cities, streets, etc.)
 • Company names
 • Dates and times
 • Amounts (monetary, percentages, other)
 • E-mail and website addresses
 • Telephone numbers
 • ID numbers
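
The replacement step described above can be illustrated with a minimal Python sketch. It is not the tool's own implementation (the tool is written in Perl). It assumes the entities have already been identified and tagged. The category labels and the replacement lists are hypothetical and only serve the example.

```python
import random

# Hypothetical replacement pools, one per entity category (illustrative only).
REPLACEMENTS = {
    "PERSON": ["Thandi", "Pieter", "Lerato"],
    "CITY": ["Durban", "Polokwane", "Kimberley"],
}

def anonymise(tagged_tokens, rng=None):
    """Replace each tagged entity with a random entity of the same category.

    tagged_tokens is a list of (token, category) pairs; category is None
    for ordinary words. Entities in unknown categories are left unchanged.
    """
    rng = rng or random.Random()
    result = []
    for token, category in tagged_tokens:
        pool = REPLACEMENTS.get(category)
        if pool:
            # Prefer a different value so the original does not survive by chance.
            candidates = [p for p in pool if p != token] or pool
            result.append(rng.choice(candidates))
        else:
            result.append(token)
    return result

sentence = [("Sipho", "PERSON"), ("lives", None), ("in", None),
            ("Pretoria", "CITY"), (".", None)]
print(" ".join(anonymise(sentence, random.Random(0))))
```

Passing a seeded random.Random makes a run repeatable, which can help when comparing outputs. Omit it for a different selection on every run.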

Version: 2.0

Platform: Windows (tested on Windows 7).

License: GNU GPL version 2 (or any later version). The license can be found in the "GNU General Public License 2.txt" file.


							 Install and Run  
==============================================================================
To install this program, first download the "AutshumatoTextAnonymiser.X.X.X.zip" file from SourceForge.net.
Save the zip file in a folder and extract it.
To run the program, open the folder where the AutshumatoTextAnonymiser.X.X.X.zip file was extracted and double-click AutshumatoTextAnonymiser.exe.


								Input Format  
==============================================================================
For the program to function properly, the input file must comply with the following criteria:
 • The input file must be a .txt file with UTF-8 encoding.
 • For optimal functioning of the program the input file should contain one sentence per line.
 • The input file must be tokenised (there should be spaces between words and punctuation).
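
The criteria above can be checked before a file is loaded. The following Python sketch is an illustration only and is not part of the Autshumato Text Anonymiser. The function names are hypothetical, and the tokenisation rule (every character that is neither a word character nor whitespace counts as punctuation) is a simplifying assumption. A rule this naive also splits e-mail addresses, decimal amounts and dates, so a real tokeniser would treat those cases separately. The one-sentence-per-line recommendation is not checked.

```python
import re

# Assumption: any character that is not a word character or whitespace
# is punctuation that should stand alone as a token.
_PUNCT = re.compile(r"([^\w\s])")

def tokenise_line(line):
    """Put spaces around punctuation and collapse runs of whitespace."""
    spaced = _PUNCT.sub(r" \1 ", line)
    return " ".join(spaced.split())

def check_input(raw):
    """Return a list of problems found in the raw bytes of an input file."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A line is tokenised if tokenising it again changes nothing.
        if " ".join(line.split()) != tokenise_line(line):
            problems.append(f"line {number}: punctuation is not separated by spaces")
    return problems

print(tokenise_line("Hello, world!"))
print(check_input("Sipho lives in Pretoria.\n".encode("utf-8")))
```

To check a file on disk, read it in binary mode and pass the bytes to check_input. An empty list means no problems were found under the assumptions above.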


								Anonymising a File  
==============================================================================
Follow these steps to anonymise a file:
 • Select the language of the input file from the "Select input file language" drop-down box.
 • Select at least one of the parsers (Address, Amount, Dates, NE).
 • Select the input file by clicking on the "Browse" button next to "Choose an input file".
 • If you only want to classify the document, tick the "Classify only" checkbox. No anonymisation will then be done.
 • Click on the "Start" button to begin classification and/or anonymisation. When the program is finished processing the document, the classified and/or anonymised text will be displayed.
 • Click on the "Save" button to save the classified and/or anonymised text to a text file.
 • Click on the "Exit" button to leave the program.


								Building Requirements  
==============================================================================
The following are needed to build the program:
 • Lexicons (word lists) in the languages of the text that will be anonymised
 • Perl 5.14.2 or later (download at http://www.perl.com)
 • The following Perl modules (download at http://www.cpan.org or via Perl Package Manager):
	- ActivePerl::Config (License: Activestate Community Edition Software License)
	- AutoLoader
	- base
	- Cairo (License: GNU Lesser General Public License 2.1)
	- Carp
	- constant
	- Data::Dumper
	- DynaLoader 
	- Encode 
	- Exporter 
	- File::Basename 
	- File::Spec 
	- File::Spec::Functions 
	- File::Spec::Unix 
	- File::Spec::Win32 
	- Getopt::Std 
	- Glib (License: GNU Lesser General Public License)
	- Gtk2 (License: GNU Lesser General Public License)
	- Gtk2::Builder::Simple (License: GNU Lesser General Public License)
	- Gtk2::SourceView2 
	- IO 
	- Locale::Maketext::Simple (License: MIT license)
	- Log::Message 
	- Log::Message::Simple 
	- Module::Load
	- Pango (License: GNU Lesser General Public License)
	- Params::Check 
	- Scalar::Util 
	- Tie::Handle 
	- Tie::Hash 
	- Tie::StdHandle 
	- Win32 
	- XSLoader 

* Unless a license is specified for a module, it is licensed under the GNU GPL 1 / Artistic License. All licenses are distributed with the program.
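
Before building, it can be useful to confirm that the required Perl modules are installed. The Python sketch below is an illustration under stated assumptions and is not part of the project. It relies on the standard Perl idiom "perl -MModule::Name -e 1", which exits with a non-zero status when the module cannot be loaded. The helper names are hypothetical, and perl is assumed to be on the PATH (otherwise subprocess.run raises FileNotFoundError).

```python
import subprocess

def perl_check_command(module):
    """Command that exits with status 0 only if Perl can load the module."""
    return ["perl", f"-M{module}", "-e", "1"]

def missing_modules(modules, run=subprocess.run):
    """Return the modules from the given list that Perl fails to load."""
    missing = []
    for module in modules:
        result = run(perl_check_command(module), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing

# Example call (needs perl on the PATH), using a few modules from the list:
# print(missing_modules(["Gtk2", "Cairo", "Encode", "File::Spec"]))
```

The run parameter exists only so that the check can be exercised without a Perl installation, for example by passing a stand-in function.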
	
								Contact Information
==============================================================================
For further enquiries regarding this tool, contact Wildrich Fourie (wildrich.fourie@nwu.ac.za). 
You can also visit the project website at: http://autshumato.sourceforge.net


##############################################################################
Source: ReadMe.txt, updated 2012-11-02