LAITOR - LITERATURE ASSISTANT FOR IDENTIFICATION OF TERMS CO-OCCURRENCES AND RELATIONSHIPS
README
1. INTRODUCTION
This file guides you through the configuration and installation of LAITOR. Both are straightforward tasks that take only a few minutes.
2. SYSTEM REQUIREMENTS
2.1. LINUX
LAITOR was developed on the LINUX platform (Ubuntu 7.1 distribution); however, it is expected to work without problems on other distributions.
2.2. PHP
LAITOR is an open-source script developed in PHP (version 5.3.2). Therefore, you need PHP installed locally on your machine. PHP is installed by default on most LINUX systems; if it is not, you will need to install and configure it yourself. To do this, follow the installation instructions on the PHP website at http://www.php.net . You may need the assistance of your system administrator for this purpose.
IMPORTANT: Because some LAITOR analyses have long execution times, you must increase the PHP resource limits. To do so, open your php.ini configuration file, locate the section "Resource Limits" and raise the following limits to the values below:
;;;;;;;;;;;;;;;;;;;
; Resource Limits ;
;;;;;;;;;;;;;;;;;;;
max_execution_time = 300 ; Maximum execution time of each script, in seconds
max_input_time = 600 ; Maximum amount of time each script may spend parsing request data
memory_limit = 10000M ; Maximum amount of memory a script may consume (PHP's default is 128MB)
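To confirm that the new limits are actually in effect, a short script such as the following can be run with the same PHP binary that will run LAITOR. This is only a sketch: the function name laitor_resource_limits is hypothetical and not part of the LAITOR distribution. A reported value of 0 for max_execution_time means "no limit".

```php
<?php
// Hypothetical helper (not part of LAITOR): collect the effective values of
// the three resource limits discussed above for the running PHP binary.
function laitor_resource_limits()
{
    $limits = array();
    foreach (array('max_execution_time', 'max_input_time', 'memory_limit') as $name) {
        $limits[$name] = ini_get($name);
    }
    return $limits;
}

// Show which php.ini was loaded, since the command-line PHP may use a
// different file than the web server.
$ini = php_ini_loaded_file();
echo 'Loaded php.ini: ', ($ini === false ? '(none)' : $ini), PHP_EOL;

foreach (laitor_resource_limits() as $name => $value) {
    echo $name, ' = ', $value, PHP_EOL;
}
```

If the printed values differ from those you set, the edited php.ini is probably not the one this PHP binary loads.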
2.3. MySQL
LAITOR uses MySQL as its database query system (MySQL version 5.0.45 was used for LAITOR's development). MySQL is installed by default on many LINUX systems; otherwise, follow the installation guide at the MySQL website http://www.mysql.com .
2.4. Versions of the software used for development
LAITOR has been developed and evaluated under LINUX with the software versions mentioned above (PHP 5.3.2 and MySQL 5.0.45). We expect the program to perform well with later releases. Please report any bugs you find, along with the specific versions of the software used.
3. CONFIGURATIONS
LAITOR's distribution package contains a configuration file named 'config_laitor.php'. The settings required to use LAITOR must be entered there and are described below:
3.1. MySQL configuration
In this section, provide the details for accessing MySQL on the server on which LAITOR is installed.
3.1.1. MySQL server ($server)
Replace the value "localhost" with the name of the server on which MySQL is installed.
Alternatively, use the commented lines below it if you want to connect through a port number (e.g. "hostname:port") or a path to a local socket (e.g. ":/path/to/socket") for the localhost. To do so, remove the comment characters "//" from the desired line and comment out the other '$server' assignment lines.
3.1.2. MySQL user ($user)
Replace the value "user_name" with the name of the MySQL user assigned to the LAITOR script.
3.1.3. MySQL password ($pass)
Replace the value "password" with the password of the MySQL user assigned to the LAITOR script.
3.1.4. MySQL database name ($db)
Replace the value "database" with the name of the MySQL database assigned to the LAITOR script.
3.1.5. MySQL Table gene names ($table_genes)
Replace the value "gene_names" with the name of the table into which the gene names will be loaded.
3.1.6. MySQL Table biointeractions ($table_biointeractions)
Replace the value "biointeraction_names" with the name of the table into which the biointeraction terms will be loaded.
3.1.7. MySQL Table concepts ($table_concepts)
Replace the value "concept_names" with the name of the table into which the concept names will be loaded.
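As an illustration, the MySQL part of config_laitor.php might look roughly like the sketch below. The variable names and placeholder values are those listed in items 3.1.1 to 3.1.7; the layout, the comments and the port number 3306 are assumptions, and the file shipped with LAITOR may be organized differently.

```php
<?php
// Sketch of the MySQL section of config_laitor.php, using the placeholder
// values named in section 3.1. The real file may be laid out differently.

$server = 'localhost';
// $server = 'hostname:3306';      // alternative: host and port (3306 is only an example)
// $server = ':/path/to/socket';   // alternative: path to a local socket

$user = 'user_name';   // MySQL user assigned to LAITOR
$pass = 'password';    // password of that user
$db   = 'database';    // database used by LAITOR

// Tables into which the dictionaries are loaded.
$table_genes           = 'gene_names';
$table_biointeractions = 'biointeraction_names';
$table_concepts        = 'concept_names';
```

Only one '$server' line should be active at a time.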
3.2. Dictionaries
In this section, enter the names of the files containing the dictionaries used by LAITOR in the following variables.
IMPORTANT: So that LAITOR can locate the dictionaries, the files must be placed in the "db_files" directory.
3.2.1. Gene names dictionary ($table_genes_dump)
This dictionary is the text file whose content is automatically loaded into the MySQL gene names table by the LAITOR installation script. Set the variable "$table_genes_dump" to the full name of the file to be loaded (this file MUST be located in the "db_files" subdirectory). By default we provide the gene table for Viridiplantae as cited in the LAITOR publication. If you want to load your own dictionary, replace the value "gene_info_viridiplantae_syn.dump" with your own file name.
To extend LAITOR's functionality to a broader range of organisms, LAITOR is distributed with a pre-computed list of 505,402 gene dictionaries for all the organisms in the NCBI Taxonomy Database that have gene records deposited in the NCBI Gene Database. Simply enter the Taxonomy identifier of your preferred organism followed by ".dictionary" (e.g. 9606.dictionary for Homo sapiens, or 3702.dictionary for Arabidopsis thaliana) and the corresponding gene name dictionary will be loaded as LAITOR's default dictionary.
To look up the NCBI Taxonomy ID of your preferred organism, please check http://www.ncbi.nlm.nih.gov/Taxonomy .
IMPORTANT: place your customized gene name dictionary in the "db_files" directory.
3.2.2. Biointeractions dictionary ($table_biointeractions_dump)
As with the dictionary above, replace the value "biointeractions.dump" with the name of your own biointeraction dictionary. By default, we provide the biointeraction.dump dictionary mentioned in the LAITOR publication.
IMPORTANT: place your biointeraction dictionary in the "db_files" directory.
3.2.3. Concepts dictionary ($table_concepts_dump)
Replace the value "stimuli.dump" with the name of your own concepts dictionary. By default, we provide the "stimuli.dump" dictionary mentioned in the LAITOR publication.
In addition to the concepts dictionary mentioned above, we distribute as optional LAITOR dictionaries a compiled list of the NCBI's MeSH tree structures (available at http://www.nlm.nih.gov/mesh/trees.html), in which every sub-heading category is parsed as a LAITOR concepts dictionary. To configure one of them as LAITOR's default, set $table_concepts_dump to the MeSH sub-heading identifier followed by ".concepts" (e.g. A10.concepts for "Tissues", or C02.concepts for "Virus Diseases").
A list describing the MeSH concepts dictionaries is available in the mesh_description.txt file within the folder "docs".
IMPORTANT: place your customized concepts dictionary in the "db_files" directory.
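The three dictionary variables of section 3.2 can be summarized with the sketch below, which also checks that the named files are present in "db_files" before installation. The variable names and default file names come from this README; the helper function missing_dictionaries is hypothetical and is not part of LAITOR.

```php
<?php
// Dictionary file names as described in section 3.2 (file names only;
// the files themselves must be in the db_files directory).
$table_genes_dump           = 'gene_info_viridiplantae_syn.dump'; // or e.g. '9606.dictionary'
$table_biointeractions_dump = 'biointeractions.dump';
$table_concepts_dump        = 'stimuli.dump';                     // or e.g. 'A10.concepts'

// Hypothetical helper (not part of LAITOR): return the configured file
// names that are not found in the given directory.
function missing_dictionaries(array $files, $dir)
{
    $missing = array();
    foreach ($files as $file) {
        if (!is_file($dir . DIRECTORY_SEPARATOR . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Assumes the script is run from the LAITOR working directory.
$missing = missing_dictionaries(
    array($table_genes_dump, $table_biointeractions_dump, $table_concepts_dump),
    'db_files'
);
echo $missing
    ? 'Missing in db_files: ' . implode(', ', $missing) . PHP_EOL
    : 'All dictionary files found.' . PHP_EOL;
```

Running such a check before "php install_laitor.php install" helps catch a misspelled file name early.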
4. INSTALLATION
After configuring all the variables in the config_laitor.php script, you just need to run the script install_laitor.php in your working directory.
4.1. Install
To install LAITOR, simply type: php install_laitor.php install
4.2. Remove
To remove LAITOR, simply type: php install_laitor.php remove
5. RUNNING LAITOR
5.1. Preparing input (NLPROT Output)
As input, LAITOR takes the TEXT format output of a tagging analysis performed by the software NLPROT. To produce it, set the flag "-f" (format) to "txt" (text output) on NLPROT's command line.
Example of NLPROT command line to create a LAITOR readable format:
~$ nlprot -i input -f txt -s off -o input.nlprot
NLPROT Documentation can be found at http://cubic.bioc.columbia.edu/services/nlprot/
5.2. Command line
LAITOR needs a simple command line to run. The command line receives the following values (flags):
-i [String] INPUT: the name of the NLPROT tagged file on which to perform the co-occurrence analysis;
-c [String] CONCEPTS DICTIONARY: if you would like to search for terms from a customized concepts dictionary different from the one initially configured in the MySQL database, give a file path here. LAITOR will then use the content of this file as the concepts dictionary;
-t [Integer] TYPE OF CO-OCCURRENCES: the type of co-occurrences LAITOR should search for, as explained in the LAITOR publication; see the next section for an explanation of the co-occurrence types.
Example of command line:
~$ php laitor.php -i ./examples/input_example -c ./db_files/stimuli.dump -t 1
NOTE: in this example, the concepts dictionary (-c) is located at "./db_files/stimuli.dump". If you omit this flag or set it to "null", LAITOR uses the concepts dictionary preloaded in MySQL as the default. You can also load a different concepts dictionary (e.g. one of the distributed MeSH dictionaries) from the ./db_files directory on the command line (see item 3.2.3).
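When several inputs have to be analyzed, the command line can be assembled programmatically. The following sketch only builds and prints the command strings; the flags -i, -c and -t are the ones documented above, while the function name build_laitor_command is hypothetical and not part of LAITOR.

```php
<?php
// Hypothetical helper (not part of LAITOR): build a laitor.php command line.
// $concepts is optional; when omitted, the -c flag is left out so that
// LAITOR falls back to the concepts dictionary preloaded in MySQL.
function build_laitor_command($input, $type, $concepts = null)
{
    $cmd = 'php laitor.php -i ' . escapeshellarg($input);
    if ($concepts !== null) {
        $cmd .= ' -c ' . escapeshellarg($concepts);
    }
    return $cmd . ' -t ' . (int) $type;
}

// Print one command per input file; extend the list as needed.
foreach (array('./examples/input_example') as $input) {
    echo build_laitor_command($input, 1, './db_files/stimuli.dump'), PHP_EOL;
}
```

Quoting each path with escapeshellarg keeps file names containing spaces from being split by the shell.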
6. OUTPUT
LAITOR generates six output files for a simple analysis, which are explained below:
6.1. Syntactic analysis (*.laitor)
In this file, every protein name tagged in a given abstract and validated against the gene name dictionary is listed below the PMID of the article ("tagged proteins").
The abstract is divided into sentences ("Syntactic analysis"), and the co-occurrence pairs of every type are placed immediately below the sentence from which they were extracted. These co-occurrence lines always begin with the tag "INT_X", where "X" is the type of co-occurrence, from 1 to 3, extracted at the sentence level.
Moreover, when the user has selected co-occurrence type 4 on the command line, LAITOR creates a list of the "protein pairs co-occurring in the whole abstract" after the last line of each article.
The co-occurrence report lines are TAB delimited and contain the following fields:
1: Co-occurrence type;
2: Pair 1;
3: *Biointeractions terms ("|" delimited);
4: Pair 2;
5: Pubmed ID;
6: *Line;
7: *Concepts;
NOTE: Fields marked with "*" are not applicable to the co-occurrences of type 4, as the whole abstract is considered for this type of co-occurrence.
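For downstream processing, a report line can be split into its seven fields as sketched below. The field order is the one listed above; the key names, the padding of short lines and the example line are assumptions made for illustration, and the example line is made up rather than real LAITOR output.

```php
<?php
// Sketch only: split one TAB-delimited co-occurrence report line into
// named fields. The key names are chosen here for illustration.
function parse_cooccurrence_line($line)
{
    $keys  = array('type', 'pair1', 'biointeractions', 'pair2', 'pmid', 'line', 'concepts');
    $parts = explode("\t", rtrim($line, "\r\n"));
    // Assumption: fields may be empty or absent for type 4, so short lines
    // are padded with empty strings and extra columns are ignored.
    $parts  = array_pad(array_slice($parts, 0, count($keys)), count($keys), '');
    $record = array_combine($keys, $parts);
    // Biointeraction terms are "|" delimited; an empty field gives an empty list.
    $record['biointeractions'] = $record['biointeractions'] === ''
        ? array()
        : explode('|', $record['biointeractions']);
    return $record;
}

// Made-up example line, not real LAITOR output.
$example = "INT_1\tGENE_A\tbinds|activates\tGENE_B\t12345678\t3\tcold";
$record  = parse_cooccurrence_line($example);
echo $record['pair1'], ' -- ', $record['pair2'],
     ' (', implode(', ', $record['biointeractions']), ')', PHP_EOL;
```

The concepts field is left as a raw string because its internal delimiter is not specified here.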
6.2. Co-occurrences (*.co)
This file contains all co-occurrence report lines.
6.3. Co-occurrences in HTML format (*_output.html)
This file contains all co-occurrence report lines in HTML format.
6.4. Co-occurrences per term (*_report_per_gene.html)
This file groups, in HTML format, the co-occurrences identified for each term filtered in the analysis.
6.5. Medusa network (*_medusa.txt)
Network of the extracted co-occurrences, for graphical representation in 2D with the program EMBL Medusa.
6.6. Arena3d network (*_arena.txt)
Network of the extracted co-occurrences, for graphical representation in 3D with the program EMBL Arena3D.
7. Third-party software for output visualization
7.1. EMBL Medusa: http://coot.embl.de/medusa/
7.2. EMBL Arena3D: http://arena3d.org
8. Contact
Adriano Barbosa-Silva
Postdoctoral Fellow
Computational Biology and Data Mining Group
Max-Delbrueck-Center for Molecular Medicine
Robert-Roessle-Str. 10
D-13125 Berlin
Tel: +49 30 9406 4307
Fax: +49 30 9406 4240
Web: http://cbdm.mdc-berlin.de/adriano
END OF README