=============================
= GRAMLAB CORPUS MANAGER =
= INSTALLATION INSTRUCTIONS =
=============================
== Why would I want to use GramLab CorpusManager ? ==
GramLab CorpusManager is a corpus factory with which you can:
- collect and upload various kinds of documents (PDF, DOC, TXT, ...)
- apply encoding normalization and conversion (thanks to Tika)
- search your collection (Solr-enabled search engine) to build a coherent set of documents
- consolidate the result in a single TEILite UTF-8 file for grammar building purposes.
Technically speaking, GramLab CorpusManager is an ETL (Extract, Transform, Load) tool available as a webapp that includes the Solr indexing engine and the Tika document converter.
Still to be developed:
- auto-detect document language
- crawl the web to add more documents to your corpus.
=
= Windows Instructions =
=
== Prerequisites : What do I need to check before installing ? ==
As GramLab CorpusManager is a web app, you need to have a web server available.
These installation guidelines assume you have a Tomcat server (version 6 or above) installed on your own computer. For any other configuration, please adapt these instructions to your own setup. You will need administrator rights in order to perform a few operations.
After downloading Tomcat, unzip it and launch the .exe installer. Once the installation is complete, the following folder will have been created, where X.X.X is your Tomcat version:
C:\Program Files\Apache Software Foundation\apache-tomcat-X.X.X
Note: Please make sure your Tomcat configuration file (C:\Program Files\Apache Software Foundation\apache-tomcat-X.X.X\conf\server.xml) includes the following line:
URIEncoding="UTF-8"
as in this sample configuration:
<Connector port="8080"
...
URIEncoding="UTF-8"
/>
== GramLab CorpusManager installation instructions ==
1. Download the latest version of GramLab CorpusManager: https://sourceforge.net/projects/gramlab/files/GlabCorpus%20Manager/
You should now see two packages in your Downloads directory: GLabCorpus.war and GramlabSolr.zip
2. Create your own workspace <glabdir> wherever you want, for instance C:\Users\<user>\GLabCorpus,
and unzip GramlabSolr.zip in this directory.
You should now see a "GramlabSolr" directory in your <glabdir>, containing the following files:
solr.war
solr.xml
GramlabSolr.xml
Your uploaded documents and corpora will be stored there.
3. Tell Tomcat about your <glabdir>\GramlabSolr
Edit <glabdir>\GramlabSolr\GramlabSolr.xml as follows, replacing <glabdir> with the full path of your workspace:
<?xml version="1.0" encoding="utf-8"?>
<Context path="/GramlabSolr"
docBase="<glabdir>\GramlabSolr\solr.war"
debug="0"
crossContext="true">
<Environment name="solr/home"
type="java.lang.String"
value="<glabdir>\GramlabSolr"
override="true"/>
</Context>
Then copy this file into the following Tomcat directory:
C:\Program Files\Apache Software Foundation\apache-tomcat-X.X.X\conf\Catalina\localhost
4. Define an environment variable GLABCORPUS_HOME that points to your workspace.
Go to Control Panel > System > Advanced system settings, and then select the Advanced tab.
At the bottom of this tab, click the Environment Variables button.
Add a new System Variable:
Variable : GLABCORPUS_HOME
Value : <glabdir>
5. Copy GLabCorpus.war to Tomcat's "webapps" directory:
C:\Program Files\Apache Software Foundation\apache-tomcat-X.X.X\webapps
The first time you launch Tomcat, this file is unpacked automatically.
6. Start Tomcat
Start Tomcat by double-clicking the "startup.bat" file:
C:\Program Files\Apache Software Foundation\apache-tomcat-X.X.X\bin\startup.bat
A console should open and print a message starting with "INFO: Server startup".
You can create a shortcut to this file and place it in a convenient location, such as your Desktop.
Note: It is safer to use the "shutdown.bat" file to stop Tomcat.
7. Launch the GramLab CorpusManager app by pasting this URL into your browser:
localhost:8080/GLabCorpus/index.html
You can now start playing with the GramLab CorpusManager app.
Feedback is warmly welcome. Please use the GramLab wiki on SourceForge.
=
= MacOS Instructions =
=
== Prerequisites : What do I need to check before installing ? ==
As GramLab CorpusManager is a web app, you need to have a web server available.
These installation guidelines assume you have a Tomcat server (version 6 or above) installed on your own computer. For any other configuration, please adapt these instructions to your own setup.
We suggest you use the following tutorial for Tomcat installation and configuration: http://www.sgariepy.com/wiki/doku.php/web:tomcat:installation_70
If you follow it, Tomcat will be installed in /usr/local/ and accessible as /usr/local/tomcat/, whatever version you downloaded.
Note 1: It is safer to use the following commands to start and stop Tomcat:
$ /usr/local/tomcat/bin/startup.sh
$ /usr/local/tomcat/bin/shutdown.sh
Note 2: Please make sure your Tomcat configuration file (/usr/local/tomcat/conf/server.xml) includes the following line:
URIEncoding="UTF-8"
as in this sample configuration:
<Connector port="8080"
...
URIEncoding="UTF-8"
/>
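The server.xml check above can be scripted. The sketch below is only an illustration: the function name check_uri_encoding is hypothetical, and it performs a plain text search, so it cannot tell an active Connector from a commented-out one.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Tomcat server.xml mentions
# URIEncoding="UTF-8". This is a plain text search, not an XML parse.
check_uri_encoding() {
    conf_file=$1
    if [ ! -f "$conf_file" ]; then
        echo "missing: $conf_file"
        return 2
    fi
    if grep -q 'URIEncoding="UTF-8"' "$conf_file"; then
        echo "ok: URIEncoding is set to UTF-8"
    else
        echo "warning: URIEncoding not found in $conf_file"
        return 1
    fi
}
```

With the layout described above, it could be called as:
$ check_uri_encoding /usr/local/tomcat/conf/server.xml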
== GramLab CorpusManager installation instructions ==
1. Download the latest version of GramLab CorpusManager: https://sourceforge.net/projects/gramlab/files/GlabCorpus%20Manager/
You should now see two packages in your Downloads directory: GLabCorpus.war and GramlabSolr.zip
2. Create your own workspace directory <glabdir> and unzip GramlabSolr.zip in this directory.
$ mkdir myglabdir
$ cp GramlabSolr.zip myglabdir
$ cd myglabdir
$ unzip GramlabSolr.zip
You should now see a "GramlabSolr" directory in your <glabdir>, containing the following files:
solr.war
solr.xml
GramlabSolr.xml
Your uploaded documents and corpora will be stored there.
3. Tell Tomcat about your <glabdir>/GramlabSolr
Edit GramlabSolr.xml as follows. Don't forget to replace <user> with your own username and <glabdir> with your workspace directory (created in step 2).
<?xml version="1.0" encoding="utf-8"?>
<Context path="/GramlabSolr" docBase="/Users/<user>/<glabdir>/GramlabSolr/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/Users/<user>/<glabdir>/GramlabSolr" override="true"/>
</Context>
Then copy this file into Tomcat's configuration tree:
$ cp GramlabSolr/GramlabSolr.xml /usr/local/tomcat/conf/Catalina/localhost
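Editing the context file by hand is error-prone, so it can also be generated. The following is a minimal sketch, not part of the GramLab distribution: write_solr_context is a hypothetical name, and the function assumes the workspace path contains no XML special characters (&, <, >, ").

```shell
#!/bin/sh
# Hypothetical helper: write a Tomcat context file for the Solr webapp.
# $1 = absolute path of the workspace (<glabdir>), $2 = output file.
# Assumes the path contains no XML special characters (& < > ").
write_solr_context() {
    solr_home="$1/GramlabSolr"
    cat > "$2" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<Context path="/GramlabSolr" docBase="$solr_home/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="$solr_home" override="true"/>
</Context>
EOF
}
```

For a workspace named myglabdir in your home directory, a call might look like:
$ write_solr_context "$HOME/myglabdir" GramlabSolr/GramlabSolr.xml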
4. Define an environment variable $GLABCORPUS_HOME that points to your workspace.
Add the following line to your .profile file (it should be in your home directory; if not, just create it first):
export GLABCORPUS_HOME="$HOME/<glabdir>"
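Since .profile is typically read only when a new login shell starts, it can be worth confirming that the variable is visible in the shell from which Tomcat will be started. A small sketch, with check_glab_home as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: check that GLABCORPUS_HOME is set and that it
# contains the GramlabSolr directory unpacked in step 2.
check_glab_home() {
    if [ -z "${GLABCORPUS_HOME:-}" ]; then
        echo "GLABCORPUS_HOME is not set"
        return 1
    fi
    if [ ! -d "$GLABCORPUS_HOME/GramlabSolr" ]; then
        echo "no GramlabSolr directory under $GLABCORPUS_HOME"
        return 1
    fi
    echo "ok: GLABCORPUS_HOME=$GLABCORPUS_HOME"
}
```

Running check_glab_home in a fresh terminal before starting Tomcat should print the "ok" line if the previous steps were followed.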
5. Copy GLabCorpus.war to Tomcat's "webapps" directory:
$ cp GLabCorpus.war /usr/local/tomcat/webapps/
The first time you launch Tomcat, this file is unpacked automatically.
6. Start Tomcat
$ /usr/local/tomcat/bin/startup.sh
7. Launch the GramLab CorpusManager app by pasting this URL into your browser:
localhost:8080/GLabCorpus/index.html
You can now start playing with the GramLab CorpusManager app.
Feedback is warmly welcome. Please use the GramLab wiki on SourceForge.
=
= Linux Instructions =
=
== Prerequisites : What do I need to check before installing ? ==
As GramLab CorpusManager is a web app, you need to have a web server available.
These installation guidelines assume you have a Tomcat server (version 6 or above) installed on your own computer. For any other configuration, please adapt these instructions to your own setup.
On most Linux distributions, dedicated packages are available to install Tomcat. For example:
Ubuntu : $> sudo apt-get install openjdk-6-jdk tomcat6 tomcat6-admin
RedHat : $> yum -y install java tomcat6 tomcat6-webapps tomcat6-admin-webapps
For more detailed information, please refer to the official Tomcat documentation.
If you choose to download the standalone package, then you should already be familiar with Tomcat.
== GramLab CorpusManager installation instructions ==
1. Download the latest version of GramLab CorpusManager: https://sourceforge.net/projects/gramlab/files/GlabCorpus%20Manager/
You should now see two packages in your Downloads directory: GLabCorpus.war and GramlabSolr.zip
2. Create your own workspace <glabdir> and unzip GramlabSolr.zip in this directory.
$> mkdir glabdir
$> cp GramlabSolr.zip glabdir
$> cd glabdir
$> unzip GramlabSolr.zip
You should now see a "GramlabSolr" directory in your <glabdir>, containing the following files:
solr.war
solr.xml
GramlabSolr.xml
Your uploaded documents and corpora will be stored there.
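To confirm that the archive unpacked as expected, the three files listed above can be checked from the shell. This is a minimal sketch; check_solr_dir is a hypothetical helper name, and it takes the workspace path as its only argument.

```shell
#!/bin/sh
# Hypothetical helper: verify that the workspace given as $1 contains
# the three files expected after unzipping GramlabSolr.zip.
check_solr_dir() {
    solr_dir="$1/GramlabSolr"
    status=0
    for f in solr.war solr.xml GramlabSolr.xml; do
        if [ ! -f "$solr_dir/$f" ]; then
            echo "missing: $solr_dir/$f"
            status=1
        fi
    done
    return $status
}
```

The function prints one line per missing file and prints nothing when all three are present.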
3. Tell Tomcat about your <glabdir>/GramlabSolr
Edit GramlabSolr.xml as follows, replacing /pathtoglabdir/<glabdir> with the full path of your workspace:
<?xml version="1.0" encoding="utf-8"?>
<Context path="/GramlabSolr" docBase="/pathtoglabdir/<glabdir>/GramlabSolr/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/pathtoglabdir/<glabdir>/GramlabSolr" override="true"/>
</Context>
Then copy this file into Tomcat's configuration tree (adapt /usr/local/tomcat to your own Tomcat installation directory):
$> cp GramlabSolr/GramlabSolr.xml /usr/local/tomcat/conf/Catalina/localhost
4. Define an environment variable $GLABCORPUS_HOME that points to your workspace.
Add the following line to your .profile file (it should be in your home directory; if not, just create it first):
export GLABCORPUS_HOME="/pathtoglabdir/glabdir"
5. Copy GLabCorpus.war to Tomcat's "webapps" directory:
$> cp GLabCorpus.war /usr/local/tomcat/webapps/
The first time you launch Tomcat, this file is deployed automatically.
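Whether the deployment has taken place can be read from the webapps directory itself. The sketch below assumes Tomcat's default behaviour of unpacking a WAR into a directory of the same name; deployment_state is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: describe the deployment state of GLabCorpus.war
# in the Tomcat webapps directory given as $1.
deployment_state() {
    webapps=$1
    if [ -d "$webapps/GLabCorpus" ]; then
        echo "deployed"
    elif [ -f "$webapps/GLabCorpus.war" ]; then
        echo "copied, not yet unpacked"
    else
        echo "not copied"
    fi
}
```

With the path used in this step, the call would be:
$> deployment_state /usr/local/tomcat/webapps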
6. Start Tomcat
If you installed Tomcat using yum or apt, a service command will do the trick:
$> service tomcat6 start
To stop Tomcat, use the following command:
$> service tomcat6 stop
7. Launch the GramLab CorpusManager app by pasting this URL into your browser:
localhost:8080/GLabCorpus/index.html
You can now start playing with the GramLab CorpusManager app.
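If the page does not load, the HTTP status returned for the URL gives a first hint about what went wrong. The sketch below is an illustration only: describe_status is a hypothetical helper, and the interpretations are rough guesses, not messages produced by GramLab or Tomcat.

```shell
#!/bin/sh
# Hypothetical helper: turn the HTTP status code returned for the
# application URL into a short diagnostic.
describe_status() {
    case $1 in
        200) echo "application is up" ;;
        404) echo "Tomcat answers, but the page was not found" ;;
        000) echo "no answer: Tomcat is probably not running" ;;
        *)   echo "unexpected HTTP status $1" ;;
    esac
}
```

Assuming curl is installed, the status code can be obtained and interpreted in one line (curl reports 000 when it gets no response at all):
$> describe_status "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/GLabCorpus/index.html)"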
Feedback is warmly welcome. Please use the GramLab wiki on SourceForge.