InteractiveShell

Kostia

Text-Analysis Jython Interactive Shell

You can use the Text-Analysis API through an interactive Jython shell. Jython is an implementation of the Python language for the Java platform: it lets you combine Python's concise, elegant syntax with the reuse of any Java class.

First download and install Jython from http://www.jython.org/ (don't forget to set the JYTHON_HOME environment variable). Then open a command line and start the Text-Analysis shell (for automatic code completion we suggest DreamPie):

$ cd /home/user/Text-Analysis/bin
$ ./shell.sh 
Welcome to the Text-Analysis console!
type help() for commands
>>>

(Note that the first time you start the shell, the message "sys-package-mgr: processing modified jar" appears several times.)

Typing help() shows some tips about the special commands and how to start the demos:

>>> help()
parse_url(url)
    Parse the text in the body tag in a web page.
read_file(filename)
    Read the content of file as a string.
LanguageDetectorDemo().start()
    Start the LanguageDetector demo.
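The help text only says that parse_url keeps the text inside the body tag; the shell's actual implementation is not shown here. As a rough illustration of the idea, the following sketch uses only the Python standard library. The names BodyTextExtractor and body_text are hypothetical and not part of Text-Analysis, and skipping <script> and <style> content is an extra assumption. The import fallback lets it run under both Jython 2.7 and CPython 3.

```python
try:
    from html.parser import HTMLParser  # CPython 3
except ImportError:
    from HTMLParser import HTMLParser  # Jython 2.7 / CPython 2


class BodyTextExtractor(HTMLParser):
    """Hypothetical helper: collects text that appears inside <body>."""

    def __init__(self):
        HTMLParser.__init__(self)  # explicit call: HTMLParser is an old-style class in Python 2
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside <script> or <style> (assumption: ignore them)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'body':
            self.in_body = True
        elif tag in ('script', 'style'):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == 'body':
            self.in_body = False
        elif tag in ('script', 'style') and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text found in the body outside scripts/styles.
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def body_text(html):
    """Return the body text of an HTML string, chunks joined by single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return ' '.join(parser.chunks)
```

Inside the shell you would normally just call parse_url(url); a helper like this is only useful when you already have the HTML as a string.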

For example, let us start the Language Detector demo:

>>> LanguageDetectorDemo().start()

The following text:

        Fehler beim Programmieren sind unvermeidlich. Schätzungen zufolge besitzt der Source eines
        jeden Programms mindestens alle 1000 Zeilen einen Entwicklungsfehler. Dazu kommt, dass
        Fehler während der Programmausführung ihren Ursprung nicht im Programm selbst haben
        müssen. Fehlerquellen wie plötzlich abbrechende Datenbankverbindungen oder ein nicht mehr
        adressierbares Dateisystem addieren sich zu den Programmierfehlern. Das ergibt in Summe
        ein Fehlerpotenzial, dem man bei der Entwicklung oder bei der Refaktorisierung einer
        Software Beachtung schenken muss.

is written in german

The following text:

        La consigne est venue de Bercy, fin novembre, discrète mais ferme. Le ministre 
        de l'économie et des finances, Thierry Breton, veut solder les comptes de 
        Bernard Tapie et du Crédit lyonnais et mettre un terme au feuilleton judiciaire 
        qui les oppose depuis une décennie. Quitte à opposer son véto à tout pourvoi en 
        cassation contre l'arrêt de la cour d'appel de Paris qui, le 30 septembre, a 
        donné raison à l'homme d'affaires, lui octroyant 135 millions d'euros de 
        dommages et intérêts l'Etat devant en assurer le règlement.

is written in french

Now try yourself...
First declare an instance of the language detector:
    ld = LanguageDetector()
Then type:
    ld.detect(parse_url('http://<your url>'))
to detect the language of a URL of your choice or:
    ld.getAvailableLanguages()
for a list of the available languages.

>>>

As the demo suggests, you can also experiment on your own, for example by detecting the language of the page "www.focus.de":

>>> ld = LanguageDetector()
>>> ld.detect(parse_url('http://www.focus.de'))
GERMAN (reliability: 0,6219)

You can also write your own Jython scripts, but in this case the Text-Analysis toolkit must be in the classpath. Read bin/set_env.sh or bin/set_env.bat to see which jars and folders make up the classpath. For example, bin/set_env.sh includes the following locations:

export CLASSPATH=$TEXT_ANALYSIS_HOME/lib/*:$TEXT_ANALYSIS_HOME/resource:$TEXT_ANALYSIS_HOME/ext/*:$CLASSPATH
export JYTHONPATH=$TEXT_ANALYSIS_HOME/bin

On *nix systems, don't forget to load the file with the source command (that is, source set_env.sh); otherwise the variables above will not be exported to the current bash session.
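If you start your own Jython scripts from another program instead of from a terminal where set_env.sh has been sourced, the child process needs the same variables. The following is a minimal sketch under these assumptions: you pass the installation directory as home, a jython launcher is on the PATH, and the function names text_analysis_env and run_script are hypothetical. It reproduces the entries listed in bin/set_env.sh, except that an empty existing CLASSPATH is not appended; the separator defaults to the platform's path separator (":" on *nix, ";" on Windows).

```python
import os
import subprocess


def text_analysis_env(home, base_env=None, sep=os.pathsep):
    """Return a copy of base_env with CLASSPATH and JYTHONPATH set as in bin/set_env.sh."""
    env = dict(os.environ if base_env is None else base_env)
    entries = [
        home + '/lib/*',     # all jars in lib/ (Java classpath wildcard)
        home + '/resource',  # resource folder
        home + '/ext/*',     # all jars in ext/
    ]
    existing = env.get('CLASSPATH', '')
    if existing:
        entries.append(existing)  # keep any classpath that was already set
    env['CLASSPATH'] = sep.join(entries)
    env['JYTHONPATH'] = home + '/bin'
    return env


def run_script(script, home):
    """Run a Jython script in a child process with the Text-Analysis classpath."""
    # Assumption: a 'jython' launcher is available on the PATH.
    return subprocess.call(['jython', script], env=text_analysis_env(home))
```

For example, run_script('my_script.py', '/home/user/Text-Analysis') would run a hypothetical my_script.py with the toolkit on the classpath, without touching the environment of the calling process.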
