TechnicalManual14

Aleksi Kallio

Technical manual for Chipster 1.4

Note! This is an unmaintained archive site.
The wiki has been moved to GitHub, and current documentation is available at https://github.com/chipster/chipster/wiki

The manual covers Chipster platform version 1.4 and older. It gives instructions for setting up your own Chipster server, adding your own tools to Chipster, and more. For the user manual, please see https://extras.csc.fi/biosciences/chipster-manual/.

Introduction

Chipster is a versatile data analysis platform with an intuitive graphical user interface. Version 1.4 of the platform has mainly been used for microarray and proteomics data.

In the basic setup, Chipster is a client-server system. The Chipster server can be run on a single server computer or even on a laptop. The server actually consists of multiple independent services, so it can be scaled across a cluster of servers to distribute the computational and data transfer load.

The system consists of compute, authentication, management and logging services, and of message and file brokers, which act as communication channels between the components.

System installation

System installation in Linux

These are instructions for installation using the automatic tools provided in the installation package.

0) Requirements

The following software needs to be installed:

The following TCP ports need to be open in the firewall:

  • 61616 for message broker service
  • 8080 for file broker service
  • 8081 for webstart service (optional)
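To check from another machine that the ports above are actually reachable through the firewall, you can simply try to open a TCP connection to each of them. Below is a minimal Java sketch of such a check using the standard java.net.Socket API. The class PortCheck and its method are hypothetical helpers, not part of the Chipster distribution, and a successful connection only shows that something is listening on the port, not that it is a Chipster service.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether the Chipster broker ports accept TCP connections. */
final class PortCheck {
    // Ports listed above; 8081 is only needed if the bundled Web Start server is used.
    static final int MESSAGE_BROKER_PORT = 61616;
    static final int FILE_BROKER_PORT = 8080;
    static final int WEBSTART_PORT = 8081;

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, blocked by a firewall, or timed out.
            return false;
        }
    }
}
```

For example, PortCheck.isReachable("chipster.example.com", PortCheck.MESSAGE_BROKER_PORT, 2000) would test the message broker port of a (hypothetical) server chipster.example.com.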

1) Downloading and extracting

Installation packages can be obtained from http://chipster.sourceforge.net/downloads.shtml.

After downloading, extract the tar archive. It contains the directory "chipster", in which each component has its own subdirectory. The directory can be placed anywhere, but /opt/chipster is the usual location.

Downloading and extraction can be done easily on command line (we use version 1.4.7 here):

cd /opt
wget http://www.nic.funet.fi/pub/sci/molbio/chipster/dist/versions/1.4.7/chipster-1.4.7.tar.gz
tar -xzf chipster-1.4.7.tar.gz

2) Installing external tools

No external tools are needed to start the server environment, but for the microarray analysis tools to work, R and a collection of libraries are needed. You can skip this step if you just want to get the system running first.

If you have installed R to the default location /opt/chipster/tools/R-2.9.0, you can install the R libraries needed by Chipster directly with the setup tool. Otherwise, first update comp/conf/environment.xml with the correct location of the R binary. Then run (as root if needed):

./setup.sh

For more information on the setup tool, see the Setup tool section.

The setup tool will print out instructions for carrying out the remaining installation steps for additional tools and databases.

3) Configuring Chipster services

To configure the Chipster services, run the following two scripts. Both scripts will ask for confirmation before writing changes to files. Defaults should be fine for a local installation.

./configure.sh
./genpasswd.sh

configure.sh configures all the components, and genpasswd.sh generates secure passwords that the server components use to authenticate to each other.

4) Starting and stopping services

To start all the Chipster services, run:

./chipster start

In addition to start, you can also use stop, restart, and status.

5) Testing installation

To start the client using Java Web Start, go to the Web Start address you specified when running configure.sh. The default address is:

http://hostname-of-this-machine:8081

Note! The Java Web Start server (Jetty) is not bundled with the backported 1.1.x versions. With those versions you have to set up your own web server to serve the Web Start files.

To start the client locally (on the same machine as the services), run:

./client/bin/chipster-client

The default username/password is chipster/chipster. Users can be added by editing the users file at auth/security/users. Chipster also supports several more advanced authentication providers.

6) Starting services at boot time

The steps needed for making services start at boot time are somewhat system dependent. In most Linux systems two steps are needed:

  • Make a link from /etc/init.d/ to the executable of the service, for example /etc/init.d/chipster-auth -> /opt/chipster/auth/bin/chipster-auth.
  • Make links from /etc/rcX.d to the link in /etc/init.d to define the runlevels at which the service is started (typically 3).

You can also control Chipster as a single service:

  • Make a link from /etc/init.d/ to the Chipster service script chipster/chipster.

On Red Hat Linux, chkconfig can create the runlevel links, and you can control services with service <service_name> start | stop | status | console.

Note that the brokers must be started before the other components. This is taken care of for you if you use the single service option.

Tool installation in Linux

One of the key ideas behind Chipster is to take the high quality tools in the relevant field of data analysis and integrate them. For the end user, this is great. Unfortunately, for the person installing the system the situation is not that optimal. We really wish that a substantial amount of quality data analysis algorithms were available in some clean, platform independent format, so that we could simply distribute them just like we distribute Chipster itself. But that is not the reality, at least not yet. So we have to face the facts and install the different analysis applications the way their original authors intended. Fortunately, most of Chipster's analysis functionality is based on R and Bioconductor, which already makes life a lot easier.

What have the Chipster developers done to help you with this?

  • We have ready-made environments available for some platforms
  • We have specified exactly what is needed to run Chipster (see chipster/comp/conf/environment.xml)
  • We do not require any of the external dependencies to be present; without them you just won't be able to use some of the tools
  • We have written a little tool that does most of the installation automatically (setup.sh)
  • We try to keep dependencies to a minimum
  • And we have tried to document everything

And of course we are constantly improving the support for external applications. Feedback and suggestions are always welcome.

What do "tools" mean?

By external applications we mean the computational environment needed to run the Chipster compute service. Chipster itself is plain Java and does not depend on any external applications other than the Java Runtime Environment. We do package Chipster with Tanuki Software's free Java Service Wrapper for convenience, but using the wrapper is not required. So, without the external applications in place your compute service will boot up, but it will not be able to run any analysis jobs successfully.

External dependencies can be divided into three layers.

  1. OS level packages
  2. external applications and databases (R and others)
  3. R packages

Level 1 contains a collection of operating system packages that are required for the applications at levels 2 and 3 to work. Naturally, level 1 is OS specific, so the packages are installed into OS specific locations using OS specific tools (typically apt-get or yum). Levels 2 and 3 are contained in the Chipster tools directory. The most important application at level 2 is R, as it hosts most of the analysis functionality and is also the basis for level 3. The rest of the external applications are explained in more detail in the next section. There are also some simple databases, i.e. plain files, that reside at level 2. The R specific level 3 consists mostly of CRAN and Bioconductor packages, with some additional third party packages. They are installed using the standard R installation methods and are located in chipster/tools/R-<version>/library. The setup tool should handle level 3 automatically without trouble.

The Chipster tool directory, or tool home, is the place to store all external dependencies (except for OS packages). By default it is /opt/chipster/tools (since version 1.4.0; earlier versions had a non-standardised approach). Analysis scripts have access to the tool directory path via a variable, so that they can access external applications and databases. If you change the tool home, you need to configure it in chipster/comp/conf/runtimes.xml.

Basic installation with setup.sh

It is advisable to install R first. After that, check that the R installation directory matches the one in chipster/comp/conf/environment.xml and edit the file if needed.

The setup tool should help you through the rest of the installation. It can be started with the following command (as root, if needed):

./setup.sh

For more information on using the tool, please see Setup tool documentation.

With the help of the output from the tool and this document, you should be able to get the rest of the external applications installed. Should you run into trouble, don't hesitate to contact the Chipster development team.

More information on external applications

Chipster relies on external programs other than R in some areas, such as promoter analysis, multiple sequence alignment and phylogenetic analysis. The authoritative list of external applications is contained in the setup tool, but we also maintain a more user friendly copy here. If the two conflict, this list is the outdated one.

When all the external applications are installed, the Chipster tool directory should look like the following.

/opt/chipster/tools/
  ClusterBuster
  weeder
  R-<version>

ClusterBuster

ClusterBuster is needed in promoter analysis for searching known transcription factor binding sites. It uses the Jaspar database, which needs to be downloaded separately.

Manual installation:

Weeder

Weeder is used in promoter analysis for inferring common elements in the promoter sequences, i.e., for finding possible unknown transcription factor binding sites. The promoter analysis tools also need the promoter sequences in a suitable format, and these need to be downloaded separately.

Manual installation:

Muscle (old)

Needed only for previous versions of Chipster (1.4 and earlier)!

Muscle is used for sequence alignment.

ClustalW (old)

Needed only for previous versions of Chipster (1.4 and earlier)!

ClustalW is used for phylogenetic analysis.

  • Download ClustalW v. 2.0.4 from ftp://ftp.ebi.ac.uk/pub/software/clustalw2
  • Install it into the subdirectory clustal in the Chipster tool directory

RAxML (old)

Needed only for previous versions of Chipster (1.4 and earlier)!

RAxML is used for phylogenetic analysis.

Client installation in Linux

Client installs automatically with Java Web Start.

Installation in Mac OS X

Chipster client is fully Mac OS X compatible and supported on Mac platforms. It installs automatically with Java Web Start.

Chipster server has experimental support for Mac OS X from version 1.4.7.

Installation in Windows

Chipster client is fully Windows compatible and supported on Windows platforms. It installs automatically with Java Web Start.

Chipster server has experimental support for Windows. As the bioinformatics tool environment is Unix oriented, a complete installation on Windows requires significant effort.

System administration

Chipster architecture

In short, the Chipster architecture is very flexible. The Chipster environment is based on a message oriented architecture (also called message passing architecture or message oriented middleware architecture). Components are connected using a message broker (ActiveMQ). This results in a loosely coupled distributed system. Chipster is designed around the idea of broadcast, which allows components to be unaware of each other. The system also does not depend on the protocol used for communication.

The Chipster environment consists of the following components:

  • message broker (1 to many)
  • file broker (1 to many)
  • authenticator (1)
  • compute server (1 to many)
  • client (many)

All components can be added and removed on the fly. When multiple instances of the same component are running, no extra configuration is needed, because, for example, multiple analysers (compute servers) can function without being aware of each other. This allows the system administrator to add analyser components on the fly when extra processing power is needed, for example during large courses. Currently there can be only one authenticator.

One of the key ideas in designing the Chipster architecture was to carefully consider where each bit of the system's state is managed. The Chipster client follows the fat client paradigm, in which the client is functionally rich. This decision was made to keep the server environment simple and lightweight, to reduce the number of messages, to distribute processing load (especially data visualisation) to the clients, and to allow an improved user experience, as the client application is mostly independent of the server components. As most of the relevant state has the same lifecycle as one client session, managing state on the client side is also logically a good solution.

Server components explained

The message broker (ActiveMQ) acts as the central point of the system, passing messages between components. ActiveMQ supports broker redundancy for improving scalability and reliability, so multiple brokers can be used simultaneously.

The file broker distributes files to the other components, acting as a supplement to the message broker. File distribution is based on a pull mechanism, where components that need data retrieve the files from the file broker themselves. This way compute servers and clients can be behind firewalls. Using a separate file broker also allows compute servers to use minimal disk space, as files are cached at the file broker.
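The pull model can be illustrated with a few lines of Java. The sketch below assumes that the consumer has already received an HTTP URL for the file (the example configuration file in the Configuration system section gives the file broker an http:// URL on port 8080) and simply downloads it with the JDK's java.net.http.HttpClient (Java 11 or later). The class FilePull is hypothetical, and the actual URL layout used by the Chipster file broker is not described here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch of the pull model: the consumer fetches a file from the file broker. */
final class FilePull {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /**
     * Downloads the resource at fileUrl. In Chipster the URL would arrive in a message;
     * the path layout on the file broker is an assumption left open here.
     */
    static byte[] pull(URI fileUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(fileUrl).GET().build();
        HttpResponse<byte[]> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("file broker returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Because the consumer always opens the connection, a compute server or client behind a firewall only needs outbound access to the file broker port.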

The authenticator processes requests from clients. Each request is examined, and if a valid session exists for that client, the request is allowed to continue. Otherwise the user is asked to authenticate, and after a successful authentication a new session is created. The authenticator supports many types of authentication sources (Unix passwd, JAAS, LDAP...) and can use them simultaneously. Server components authenticate to the broker using server specific keys and are allowed to communicate directly without going through the authenticator. The authenticator is a separate component so that it can be deployed inside the intranet, as it might need access to sensitive information such as user databases.

The compute service listens for computation requests. When a client initiates a new task, all compute services with free resources reply, and the client decides which service gets to process the task. This way there is no single point of failure in distributing tasks to the server environment, and compute services can easily be modified on the fly. Compute services must have identical tool descriptions, but tools can be enabled and disabled per compute service, allowing different tool selections on different servers.
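The client-side choice between the replying compute services can be sketched as follows. The manual does not specify which criterion the client uses, so picking the service that reports the most free job slots is only an assumed policy for illustration; the Offer record and the JobDispatch class are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of how a client could pick one of the compute services that replied. */
final class JobDispatch {
    /** A reply from one compute service; freeSlots is how many more jobs it would accept. */
    record Offer(String serviceId, int freeSlots) {}

    /**
     * Chooses the offer with the most free slots (an assumed policy).
     * An empty result means no compute service can take the job right now.
     */
    static Optional<Offer> choose(List<Offer> offers) {
        return offers.stream()
                .filter(offer -> offer.freeSlots() > 0)
                .max(Comparator.comparingInt(Offer::freeSlots));
    }
}
```

Since every compute service answers independently, removing one of them only shrinks the list of offers; no central dispatcher has to be reconfigured.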

Simple server installation

The simple way to install Chipster environment is to deploy all components to a single server and distribute clients by using Java Web Start.

All server components run inside their own directories, so having them on a single server does not require any special arrangements. The message broker and file broker listen on their respective ports, and the other components connect to them over the local loopback interface.

Advanced server installation

A good guideline for an advanced installation is to dedicate an untrusted server to the message broker and file broker components, as they are the only components that have open server ports. That server should not be inside the organisation's firewall, i.e., it should be in a DMZ network. To secure user credentials, the authenticator should be installed separately on a strongly protected machine.

It is possible to deploy multiple compute servers. All of them should have the same tool descriptions, but the enabled tools can be selected per server. It is also possible to configure maximum job counts. If you have many nodes available but they are also used for purposes other than Chipster, it is recommended to deploy compute servers on as many nodes as possible but to limit the per-server job count, so that Chipster does not hog all the resources. If there are memory intensive tools, it might be a good idea to deploy a dedicated node for them with large memory and a low maximum job count. Independent compute services can also be deployed to a batch processing system (LSF etc.), following a worker paradigm.
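A per-server maximum job count can be thought of as a counting semaphore: a server accepts a new job only while it has a free slot. The Java sketch below shows the idea with java.util.concurrent.Semaphore. It is not Chipster's implementation, JobLimiter is a hypothetical class, and the actual configuration entry for the maximum job count is not shown here.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of a per-server cap on concurrently running jobs. */
final class JobLimiter {
    private final Semaphore slots;

    JobLimiter(int maxJobs) {
        this.slots = new Semaphore(maxJobs);
    }

    /** Reserves a slot if one is free; returns false when the server is already full. */
    boolean tryStartJob() {
        return slots.tryAcquire();
    }

    /** Frees the slot when a job finishes, so a new job can be accepted. */
    void jobFinished() {
        slots.release();
    }

    /** Number of jobs this server could still accept. */
    int freeSlots() {
        return slots.availablePermits();
    }
}
```

A memory intensive node would simply be created with a small maxJobs value, for example new JobLimiter(1).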

Chipster and firewalls

One of the design guidelines in Chipster was to make it easily adaptable to various firewall configurations. Even though there are many server components, only the message and file brokers listen on open ports. In other words, they act as a hub to which the other components connect. Both components are designed so that they can be installed on an "untrusted" machine located in the DMZ. Compute and authentication services often have to be located inside the intranet, which is not a problem as they do not act as servers from a networking point of view.

The client uses TCP or SSL to connect to the message and file brokers. This communication can be configured to use ports 80 and 443 to get through strict firewalls. In some high security environments practically all network access is disabled, except for HTTP through a local proxy. Currently Chipster does not use HTTP for messaging, so in this extreme case deployment is not possible without changes to the firewall configuration. However, routing messages through HTTP is supported by the ActiveMQ message broker, so these scenarios might be supported directly in the future.

Upgrading server installation

Here you can find short instructions for upgrading an existing Chipster server installation to a newer version. The basic principle is that non-backwards compatible changes are introduced between major versions (1.0, 1.1, etc.), so when upgrading between minor versions, typically only the Java libraries (JARs) need updating. Possible exceptions are documented here.
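The rule of thumb above can be expressed as a small version check, where the first two numbers of a version (for example 1.4 in 1.4.7) identify the major version. This Java sketch only illustrates the rule and UpgradePolicy is a hypothetical class; as the notes below show, some upgrades within a major version (such as 1.4.x to later 1.4.x) still use a dedicated tool, so always check the notes for your versions.

```java
/** Hypothetical sketch of the upgrade rule: same major version (e.g. 1.4) means a JAR-level upgrade. */
final class UpgradePolicy {
    /** Returns the major version as used in this manual, e.g. "1.4" for "1.4.7". */
    static String majorVersion(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected at least two version numbers: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True if from and to share a major version, so typically only JARs need updating. */
    static boolean isMinorUpgrade(String from, String to) {
        return majorVersion(from).equals(majorVersion(to));
    }
}
```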

Upgrading from 1.4.x to later 1.4.x

  • Use the migration tool (migrate.sh)
  • The tool does not modify the old installation, but always make a backup of the old installation first!
  • In the later 1.4.x Chipster installation directory, run: ./migrate.sh /path/to/old/chipster
  • The tool copies and converts basic configuration and Web Start files. Custom changes such as custom Web Start pages and login modules must be copied manually.

Upgrading from 1.3.x to 1.4.x

  • Use the upgrade tool (upgrade.sh)
  • The tool does not modify the old installation, but always make a backup of the old installation first!
  • In the 1.4.x Chipster installation directory, run: ./upgrade.sh /path/to/old/chipster
  • The tool copies and converts basic configuration and Web Start files. Custom changes such as custom Web Start pages and login modules must be copied manually.
  • Note: version 1.4.x uses R 2.9.x instead of the R 2.6.x used in previous versions. It is recommended that you run configure.sh in the 1.4.x directory to configure the path to your R 2.9.x installation (by default assumed to be /opt/chipster/tools/R-2.9.0).

Upgrading from 1.2.x to 1.3.x

  • Use the upgrade tool (upgrade.sh, since version 1.3.0)
  • The tool does not modify the old installation, but always make a backup of the old installation first!
  • In the 1.3.x Chipster installation directory, run: ./upgrade.sh /path/to/old/chipster
  • The tool copies and converts basic configuration and Web Start files. Custom changes such as custom Web Start pages and login modules must be copied manually.

Upgrading from 1.2.x to later 1.2.x

  • Versions 1.2.x use the same directory layout, so only Java libraries (JAR) need upgrading
  • Replace contents of chipster/shared/lib from the later 1.2.x installation package
  • Replace chipster/client/bin/chipster-current.jar from the later 1.2.x installation package
  • Replace chipster/webstart/web-content/lib/chipster-current.jar from the later 1.2.x installation package
  • Replace chipster/activemq/conf/activemq.xml from the later 1.2.x installation package

Upgrading from 1.1.2/1.1.3 to 1.2.x

  • Versions 1.1.2 and 1.1.3 use the same directory layout as 1.2.x, so only Java libraries (JAR) need upgrading
  • Replace contents of chipster/shared/lib from the 1.2.x installation package
  • Replace chipster/client/bin/chipster-current.jar from the 1.2.x installation package
  • Replace chipster/webstart/web-content/lib/chipster-current.jar from the 1.2.x installation package
  • Update the users file: the new format is username:password:exp_date:comment, where the first two fields are mandatory. So if you have comments, you should add one extra colon before them.

Upgrading from 1.1.2 or earlier

  • Versions 1.1.2 and earlier are to be considered obsolete. Full install from scratch is recommended.

Directory layout

The Chipster directory layout was revised in version 1.3.0. The layout is different on the client and server sides. On the client side, the goal has been to place files and directories according to operating system specific conventions. On the server side, the goal has been to make the layout as coherent as possible (and especially to integrate well with the Java Service Wrapper that wraps all the server components).

Client

Application data (logs, SSL keys, user preferences) is stored in one place and user data (sessions, workflows) in another.

  • Windows
    • Application data stored in Local Settings\Application Data\Chipster inside user's home directory
    • User data stored in My Documents inside user's home directory
  • Mac OS X
    • Application data stored in Library/Application Support/Chipster inside user's home directory
    • User data stored in My Documents inside user's home directory
  • Linux/Unix
    • Application data stored in .chipster inside user's home directory
    • User data stored in home directory, or Document or My Documents inside the home directory if they exist

If the operating system is not recognised, Chipster falls back to the Linux/Unix layout, because less common operating systems are most often Unix variants.
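The selection of the application data directory described above can be sketched as a simple function of the operating system name and the user's home directory. Matching on the os.name prefixes "windows" and "mac" is an assumption made for this illustration, and ClientDirs is a hypothetical class; Chipster's own detection code may differ.

```java
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical sketch: picks the client's application data directory from the OS name. */
final class ClientDirs {
    static Path applicationDataDir(String osName, Path home) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return home.resolve("Local Settings").resolve("Application Data").resolve("Chipster");
        }
        if (os.startsWith("mac")) {
            return home.resolve("Library").resolve("Application Support").resolve("Chipster");
        }
        // Linux, other Unix variants and unrecognised systems all use the Unix layout.
        return home.resolve(".chipster");
    }
}
```

In a running client the arguments would come from the JVM, e.g. ClientDirs.applicationDataDir(System.getProperty("os.name"), Path.of(System.getProperty("user.home"))).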

Server on Linux

Typically Chipster is installed to /opt/chipster. Inside the installation directory there is a shared directory and several independent component directories (that depend on the shared directory). The contents of the shared directory are given below.

* chipster/shared
  * bin - generic executable files
  * lib - Java JAR and platform specific libraries
  * lib-src - source codes for libraries that require source code to be distributed together (LGPL)

All of the component directories follow the same basic layout. The contents of the component directories are given below. "Wrapper" here means the Java Service Wrapper that is bundled with the Chipster server installation.

* chipster/

ActiveMQ uses its own directory layout. See the ActiveMQ documentation for more information.

Configuration system

This manual describes the revised configuration system that was introduced in Chipster 1.3.0.

Configuring Chipster

If you just want to get your Chipster up and running, run the configure.sh script and you're done! If you want to know more about the Chipster configuration system, read on.

Chipster stores its application configuration in a file called chipster-config.xml. It is either located in a conf subdirectory (see Directory layout) or loaded dynamically from a URL. The former approach is meant for server components and the latter for clients started with Java Web Start. The configuration file is no longer created automatically, but it must always exist (locally or behind a URL).

The configuration is loaded in two steps. First an internal default configuration is loaded (chipster-config-specification.xml, located inside the Chipster JAR), and then the normal configuration file chipster-config.xml. The latter contains only information that needs to be set on a per-instance basis, so it is quite minimalistic. However, it is possible to override entries of the internal default configuration in the normal configuration file: just include the entry in the file and it will replace the default one.
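Conceptually, the two-step loading is an overlay: start from the defaults and let every entry present in chipster-config.xml replace the corresponding default. The Java sketch below models this with maps keyed by "module/entryKey" and a single value per entry. Both simplifications are assumptions for illustration (real entries can have several values), and ConfigOverlay is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the two-step loading: instance entries override built-in defaults. */
final class ConfigOverlay {
    /**
     * Keys are "module/entryKey", e.g. "messaging/broker-port". Entries present in the
     * instance configuration replace the defaults; all other defaults are kept unchanged.
     */
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> instance) {
        Map<String, String> effective = new HashMap<>(defaults); // copy, so defaults stay untouched
        effective.putAll(instance);
        return effective;
    }
}
```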

The recommended way to configure a new Chipster instance is to use the configure.sh script located in the installation root directory. It configures all the components and the Web Start client descriptor. You can also modify the configuration files manually. For the meaning of the different configuration entries, please refer to http://code.google.com/p/chipster/sou.../chipster-config-specification.xml in the code repository.

Loading configuration over URL

Each Chipster component (client, analysis server, file broker etc.) has its own configuration file. If the configuration file is not explicitly specified, chipster-config.xml is used. Configuration can be loaded from a URL by passing the argument -config <url> at component startup. You can also specify a local file (e.g. -config file:/path/to/config.xml). For Web Start clients, the configuration file can be set in the chipster.jnlp descriptor file. This mechanism allows the configuration (such as the address of the broker server) to be managed centrally. Previously, a default configuration was created and parts of it were overridden from the chipster.jnlp descriptor. The mechanism was changed to be simpler and easier to manage.
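The lookup order described above (an explicit -config URL first, otherwise the local chipster-config.xml) could be sketched like this. ConfigLocation is a hypothetical helper, not Chipster's actual argument parsing, and it assumes that the local file lives in the component's conf directory as described in the Directory layout section.

```java
import java.net.URI;
import java.nio.file.Path;

/** Hypothetical sketch: decides where a component's configuration is loaded from. */
final class ConfigLocation {
    /**
     * Returns the location given after "-config" (e.g. an http: or file: URL),
     * or confDir/chipster-config.xml as a file URI when the argument is absent.
     */
    static URI resolve(String[] args, Path confDir) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-config".equals(args[i])) {
                return URI.create(args[i + 1]);
            }
        }
        // No explicit location given: fall back to the local default file.
        return confDir.resolve("chipster-config.xml").toAbsolutePath().toUri();
    }
}
```

Either way the result can be opened uniformly, e.g. with uri.toURL().openStream(), which is what makes central configuration over HTTP and local files interchangeable.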

The configuration file

The configuration file chipster-config.xml contains all the configuration that the different components require. Below is an example configuration file for a file broker component.

<configuration content-version="3">

    <configuration-module moduleId="messaging">

        <entry entryKey="broker-host">
            <value></value>
        </entry>

        <entry entryKey="broker-protocol">
            <value></value>
        </entry>

        <entry entryKey="broker-port">
            <value></value>
        </entry>

    </configuration-module>

    <configuration-module moduleId="security">

        <entry entryKey="username">
            <value>filebroker</value>
        </entry>

        <entry entryKey="password">
            <value>filebroker</value>
        </entry>

    </configuration-module>

    <configuration-module moduleId="filebroker">

        <entry entryKey="url">
            <value>http://chipster.example.com:8080</value>
        </entry>

        <entry entryKey="port">
            <value>8080</value>
        </entry>

    </configuration-module>

</configuration>

The file contains several modules (XML element configuration-module), and the selection of modules varies between components. The modules security and messaging define how a Chipster node connects to the messaging fabric and are always required. Additionally, there are node specific modules, such as filebroker in the example.

Inside the module, there are configuration entries (XML element entry). Every entry has a key (XML attribute entryKey) and it contains one or more values (XML element value).
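Chipster's own Configuration class (see Programming API) reads this file for you, but to make the structure concrete, here is a Java sketch that reads a chipster-config.xml document into a map from module to entry key to values, using only the JDK's DOM parser. The class ConfigReader is hypothetical, and unlike Chipster it does not apply the built-in defaults.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: reads chipster-config.xml into moduleId, then entryKey, then values. */
final class ConfigReader {
    static Map<String, Map<String, List<String>>> read(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Map<String, List<String>>> modules = new LinkedHashMap<>();
        NodeList moduleNodes = doc.getElementsByTagName("configuration-module");
        for (int i = 0; i < moduleNodes.getLength(); i++) {
            Element module = (Element) moduleNodes.item(i);
            Map<String, List<String>> entries = new LinkedHashMap<>();
            NodeList entryNodes = module.getElementsByTagName("entry");
            for (int j = 0; j < entryNodes.getLength(); j++) {
                Element entry = (Element) entryNodes.item(j);
                List<String> values = new ArrayList<>();
                NodeList valueNodes = entry.getElementsByTagName("value");
                for (int k = 0; k < valueNodes.getLength(); k++) {
                    // An empty <value></value>, like broker-host above, yields "".
                    values.add(valueNodes.item(k).getTextContent().trim());
                }
                entries.put(entry.getAttribute("entryKey"), values);
            }
            modules.put(module.getAttribute("moduleId"), entries);
        }
        return modules;
    }
}
```

To read a file from disk you could pass, for example, Files.readString(Path.of("conf/chipster-config.xml")) to read.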

Programming API

Configuration can be accessed programmatically as shown below.

import java.util.Arrays;
import fi.csc.microarray.config.*; // DirectoryLayout and Configuration

DirectoryLayout.initialiseServerLayout(Arrays.asList(new String[] {}));
Configuration configuration = DirectoryLayout.getInstance().getConfiguration();

First the directory layout must be initialised. Here we initialise the server layout and do not specify any node specific configuration modules that need to exist. Next we fetch a fi.csc.microarray.config.Configuration object that can be used to read configuration modules and entries.

Secure communications

Setting up SSL

By default the Chipster server installation uses plain TCP for communication. Setting up SSL is not trivial when using Java's default implementation, so it is not done by default. However, here you'll find instructions on how to do it.

Step 1. Locate keystore

You can either use the keystore that is bundled with Chipster clients or generate your own (see Generating SSL keys). Save it to the file keystore.ks.

Step 2. Configure message broker

You need to:

  • copy keystore.ks to chipster/activemq/conf
  • open chipster/activemq/bin/<platform>/wrapper.conf, uncomment and edit the following settings
    • javax.net.ssl.keyStorePassword=microarray (or whatever you have used)
    • javax.net.ssl.keyStore=%ACTIVEMQ_BASE%/conf/keystore.ks
  • open chipster/activemq/conf/activemq.xml and change the protocol to "ssl" (you can also change the port)

Step 3. Configure Chipster components

For each of the server components, you need to:

  • copy keystore.ks to chipster/<component>/security
  • open chipster/<component>/conf/chipster-config.xml and in the module "messaging" change broker-protocol to "ssl" (you can also change broker-port)

That's it. You also need to change the settings in the module "security" if you have used values other than the defaults; see Generating SSL keys for more details.

Generating SSL keys

Chipster comes with a dummy keystore that gets you going with SSL. If you want to use SSL not only for encrypting communication but also for establishing trust between server components and clients, you have to replace these publicly available keys with your own. Chipster uses Java's normal SSL implementation. The keystore can be manipulated as explained in the Security documentation, so you can also use your existing keys.

Here we describe how you can generate your own SSL keys. Please note that these keys will not be approved by any Certificate Authority, and will cause warnings if used outside of the Chipster environment.

Step 1. Generate a new keystore

Keys can be generated using Java's keytool application.

Generate key using keytool:

keytool -genkey -alias your_key_alias -dname "cn=Your name or organisation, ou=Your name or organisation, o=Your name or organisation, c=your_country_code" -validity 1800 -keyalg RSA -keystore keystore.ks

keytool will ask for the keystore password (twice). You can choose any name (alias) for the key and use any password you want. The dummy keystore uses "client" as the key alias and "microarray" as the keystore password.

Next we need to set up trust for the newly generated key. It is done by exporting and importing the certificate.

keytool -exportcert -alias your_key_alias -file cert -keystore keystore.ks
keytool -importcert -alias your_trusted_key_alias -file cert -keystore keystore.ks

You can choose any name (alias) for the trusted key. The dummy keystore uses "microarray" and that is also the default in Chipster SSL configuration.

Now we have set up another dummy keystore. To actually set up trust between communication endpoints, read the next step.

Step 2. Distribute keystore

Chipster components have a subdirectory "security" where the keystore is stored in the file keystore.ks; the message broker stores its keystore in the "conf" subdirectory. You can replace these with your newly generated keystore. If you wish to establish trust between different Chipster components, you should generate at least two dedicated keys: one for clients and one for server components. You might also generate a dedicated key for each server component.

Step 3. Update configuration

After deploying a new keystore, you have to configure the components to use it. If you used the default trusted key alias and keystore password, no changes are required. Keystore related settings are placed in the configuration module "security" in the configuration files chipster-config.xml.

&lt;configuration-module moduleId="security" description="encryption and authentication"&gt;
  &lt;entry entryKey="keystore" type="string" description="keystore file for SSL"&gt;
    &lt;value&gt;${chipster_security_dir}/keystore.ks&lt;/value&gt;
  &lt;/entry&gt;

  &lt;entry entryKey="keypass" type="string" description="keystore password for SSL"&gt;
    &lt;value&gt;microarray&lt;/value&gt;
  &lt;/entry&gt;

  &lt;entry entryKey="keyalias" type="string" description="alias of key to be used for SSL"&gt;
    &lt;value&gt;microarray&lt;/value&gt;
  &lt;/entry&gt;

        ...

The default configuration does not have SSL-specific settings, so you need to add those entries. Update the values of "keypass" and "keyalias" to reflect the appropriate settings for each component. The key alias refers to the trusted key, not the private key. The alias of the private key does not need to be configured, but the key must be in the keystore anyway. You can also change the keystore path if you do not wish to store the keystore inside the "security" directory.
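After editing, you may want to confirm what a component will actually read from its configuration. A minimal Python sketch using the standard `xml.etree.ElementTree` module; it assumes the element and attribute names shown in the snippet above (`configuration-module`, `moduleId`, `entry`, `entryKey`, `value`) and searches the whole document, because the root element of the full file is not shown here. Placeholders such as `${chipster_security_dir}` are returned as written, not expanded.

```python
import xml.etree.ElementTree as ET


def read_module(xml_text, module_id="security"):
    """Return {entryKey: value} for one configuration module of chipster-config.xml."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root itself, so a bare module element works too
    for module in root.iter("configuration-module"):
        if module.get("moduleId") == module_id:
            return {
                entry.get("entryKey"): entry.findtext("value")
                for entry in module.iter("entry")
            }
    return {}  # module not present in this file
```

For example, `read_module(open("comp/conf/chipster-config.xml").read())["keyalias"]` would show the trusted key alias that a compute service is configured with.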

Authentication

Users file

The simplest supported authentication mechanism is the users file in auth/security/users. The format is:

<username>:<password>:<exp. date as YYYY-MM-DD>:comment

Only username and password are required. Blank lines and comment lines starting with # are allowed.
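The users file is simple enough to check with a short script before relying on it. A minimal Python sketch of a parser and login check; it assumes passwords contain no colons, that an empty expiry field means the account never expires, and that an account is still valid on its expiry date. It illustrates the file format only and is not Chipster's own authentication code.

```python
from datetime import date


def parse_users(text):
    """Parse lines of the form username:password[:YYYY-MM-DD[:comment]]."""
    users = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are allowed
        fields = line.split(":", 3)  # the comment may itself contain colons
        if len(fields) < 2 or not fields[0] or not fields[1]:
            raise ValueError(f"line {lineno}: username and password are required")
        expiry = date.fromisoformat(fields[2]) if len(fields) > 2 and fields[2] else None
        users[fields[0]] = (fields[1], expiry)
    return users


def is_valid_login(users, username, password, today=None):
    """True if the user exists, the password matches and the account has not expired."""
    today = today or date.today()
    if username not in users:
        return False
    stored_password, expiry = users[username]
    return stored_password == password and (expiry is None or today <= expiry)
```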

LDAP

See Authentication via LDAP.

Tool development

Writing Chipster tools

Basically, you have to do three things:

  • provide the R script
  • write a description, so that the script can be run and shown in the client application
  • make compute service aware of the script

You should also follow conventions for Chipster analysis tools.

Making R scripts Chipster compatible

Chipster uses regular R scripts. The only thing to remember is that interactive functions cannot be used.

Before running the script, the system runs the following initialisation snippet:

setwd(".")

The script should write its results in table format to the file specified in the description header, for example like this:

write.table(mytable, file="results.txt", quote=FALSE, col.names=FALSE, row.names=FALSE)

Writing VVSADL header

R scripts must be described using a Chipster-specific description format called VVSADL (Very Very Simple Analysis Description Language). A VVSADL snippet is added in comments as a header to an R script file. The specification for VVSADL can be found in the Javadoc documentation of the class fi.csc.microarray.VVSADLSyntax (http://chipster.csc.fi/javadocs/1.4.0.../VVSADLSyntax.html).

Here is an example of a VVSADL snippet to place before the actual R code:

# ANALYSIS Test/test (Just a test analysis for development)
# INPUT CDNA microarray[...].txt OUTPUT results.txt, messages.txt
# PARAMETER value1 INTEGER FROM 0 TO 200 DEFAULT 10 (the first value of the result set)
# PARAMETER value2 DECIMAL FROM 0 TO 200 DEFAULT 20 (the second value of the result set)
# PARAMETER value3 DECIMAL FROM 0 TO 200 DEFAULT 30.2 (the third value of the result set)
# PARAMETER amount PERCENT DEFAULT 34 (how much we need)
# PARAMETER method [linear, logarithmic, exponential] DEFAULT logarithmic (which method to apply)
# PARAMETER genename STRING DEFAULT at_something (which gene we are interested in)
# PARAMETER key COLUMN_SEL (which column we use as a key)

The analysis function named "test" is added to the category "Test". Multi-word names must be put in quotation marks. The function has the description "Just a test analysis for development". It takes a set of input files called microarray0.txt, microarray1.txt, etc. in cDNA format (tab-separated values), and the script can assume that these files exist in the working directory when the compute service calls it. It writes its results to the file "results.txt" and execution-related extra information to the file "messages.txt"; the compute service assumes that both files exist after the script has run.
Then we define the parameters. The compute service makes them available as R variables, so, for example, the script can assume that it has an integer variable called value1 with a value between 0 and 200. Each parameter must be given a name, a type and a description (in parentheses), and it can also be given a range and a default value.
Everything should be in the same order as in the example snippet; for example, parameters have to be described after the input/output line. Only the first line (ANALYSIS...) is compulsory.
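The ordering rule can be checked mechanically. This Python sketch looks only at the leading comment block of a script, verifies that it starts with an ANALYSIS line and that the INPUT/OUTPUT line comes before the PARAMETER lines, and extracts the category, name and description. How quotation marks combine with the Category/name syntax is an assumption here (each part quoted separately, e.g. "Quality control"/"Affy QC"); the rcheck validator in the Chipster distribution remains the authoritative check.

```python
import re

# Assumed form: # ANALYSIS Category/name (description), parts optionally quoted
ANALYSIS_RE = re.compile(
    r'^#\s*ANALYSIS\s+(?P<category>"[^"]+"|[^\s/]+)/(?P<name>"[^"]+"|\S+)'
    r'\s*\((?P<description>.*)\)\s*$')
ORDER = {"ANALYSIS": 0, "INPUT": 1, "PARAMETER": 2}


def header_lines(script_text):
    """Return the leading comment lines of an R script (the VVSADL header)."""
    lines = []
    for line in script_text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        lines.append(line)
    return lines


def check_header(script_text):
    """Return (analysis_info, problems) for the VVSADL header of an R script."""
    lines = header_lines(script_text)
    words = [l.lstrip("#").split() for l in lines]
    keywords = [w[0] for w in words if w and w[0] in ORDER]
    problems = []
    match = ANALYSIS_RE.match(lines[0]) if lines else None
    if match is None:
        problems.append("the first line must be '# ANALYSIS Category/name (description)'")
    ranks = [ORDER[k] for k in keywords]
    if (ranks != sorted(ranks) or keywords.count("ANALYSIS") != 1
            or keywords.count("INPUT") > 1):
        problems.append("expected order: ANALYSIS, then INPUT/OUTPUT, then PARAMETER lines")
    info = {k: v.strip('"') for k, v in match.groupdict().items()} if match else None
    return info, problems
```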

VVSADL input and output files

The inputs and outputs of analysis tools are files. The Chipster system makes sure that all input files are in place before the analysis tool is started, and it is the tool's responsibility to make sure that all output files have been produced when it ends.

The input and output definition line has the format:

# INPUT <TYPE> files[...].tsv OUTPUT results.tsv

Input files need to have a specified type. They can be either single files or a series of files, which are denoted with [...]. Output files do not have types and they are always single files.

Multiple inputs and outputs can be listed, separated by commas, for example:

# INPUT <TYPE1> file.tsv, <TYPE2> files[...].tsv OUTPUT results1.tsv, results2.png, results3.mp3

In this case, a possible set of input files could be file.tsv, files1.tsv, files2.tsv and files3.tsv. The analysis tool would always have to produce results1.tsv, results2.png and results3.mp3.
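The INPUT/OUTPUT line can be split into typed inputs and plain outputs as follows. A hypothetical Python sketch, assuming that file names contain no spaces or commas and that a series placeholder [...] stands for a run of digits, as in the microarray0.txt, microarray1.txt example earlier; `series_pattern` returns a regular expression that such numbered files match.

```python
import re

TYPED_INPUT_RE = re.compile(r"^(?P<type>\S+)\s+(?P<name>\S+)$")


def parse_io_line(line):
    """Split '# INPUT <TYPE> file, ... OUTPUT out, ...' into inputs and outputs."""
    body = line.lstrip("#").strip()
    if not body.startswith("INPUT "):
        raise ValueError("not an INPUT/OUTPUT line")
    inputs_part, _, outputs_part = body[len("INPUT "):].partition(" OUTPUT ")
    inputs = []
    for item in inputs_part.split(","):
        m = TYPED_INPUT_RE.match(item.strip())
        if m is None:
            raise ValueError(f"input needs a type and a file name: {item.strip()!r}")
        name = m.group("name")
        inputs.append({"type": m.group("type"), "name": name, "series": "[...]" in name})
    # Outputs have no type and are always single files
    outputs = [o.strip() for o in outputs_part.split(",") if o.strip()]
    return inputs, outputs


def series_pattern(name):
    """files[...].tsv -> regex matching files1.tsv, files2.tsv, ... (numbered series)."""
    prefix, _, suffix = name.partition("[...]")
    return re.compile(re.escape(prefix) + r"\d+" + re.escape(suffix) + "$")
```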

Supported input types are:

  • CDNA
    • Is a generic raw microarray dataset
    • Must not have a separate textual header (i.e., must begin with header row)
    • Must have column "sample"
  • AFFY
    • Is an Affymetrix CEL file (text or binary)
    • Must have CEL-header
    • Must have extension .cel
  • GENE_EXPRS
    • Is an expression value file
    • Must not have a separate textual header (i.e., must begin with header row)
    • Must have column or columns with names starting with "chip."
  • GENELIST
    • Is a list of gene/probeset names
    • Must not have a separate textual header (i.e., must begin with header row)
    • Must contain column "identifier" or row name column with empty name (R style)
  • GENERIC
    • Can be any file

Definitions are inclusive, so tables can contain other columns besides the required ones (and usually do).
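The type rules above can be approximated with a small check on the file name and the header row. This Python sketch is a hypothetical helper, not the client's actual matching code: it reads only the first line of a tab-separated file, does not verify the CEL header of AFFY files, compares the .cel extension case-insensitively, and treats an empty first header cell as the R-style row name column.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a tab-separated file."""
    with open(path, newline="") as f:
        return next(csv.reader(f, delimiter="\t"), [])


def looks_like(input_type, filename, header):
    """Rough check of a dataset against the VVSADL input type rules."""
    if input_type == "GENERIC":
        return True  # any file is accepted
    if input_type == "AFFY":
        return filename.lower().endswith(".cel")
    if input_type == "CDNA":
        return "sample" in header
    if input_type == "GENE_EXPRS":
        return any(col.startswith("chip.") for col in header)
    if input_type == "GENELIST":
        return "identifier" in header or (len(header) > 0 and header[0] == "")
    raise ValueError(f"unknown input type: {input_type}")
```

Because the definitions are inclusive, each rule only checks for the required columns and ignores any others.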

Input binding at client

The analysis tool defines a set of input files. On the client side, when an operation is selected, the client compares the set of selected datasets to what the operation specifies. In computer science terms, it binds concrete inputs (the datasets chosen by the user) to formal inputs (the inputs defined by the operation) using a greedy, order-based algorithm. Formal inputs are processed in order, and the first fitting unbound concrete input is bound to each. If a formal input can take multiple concrete inputs (i.e., it is a series of files), all fitting ones are bound (greedy binding). Each formal input must always be bound to at least one concrete input, and a single concrete input cannot be bound more than once. In the end, all concrete inputs must be bound.

The one (and only) exception is phenodata. The user does not need to select the phenodata dataset; instead, it is located automatically by looking at the relations between datasets. Phenodata is thus "inherited" when, for example, a normalised dataset is filtered to produce a new dataset. The user can also select the appropriate phenodata explicitly, but as this would be cumbersome in most cases, automatic deduction is offered.

When binding a formal input to a concrete dataset, the client checks whether the dataset matches the type of the input. Basically, it looks either at the file extension (.cel, for example) or at the names of the table columns ("sample", for example).
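The greedy, order-based binding can be sketched in Python as follows. Here `fits` stands for whatever type test the client applies (file extension or column names); the `FormalInput` class, the function name and the example type labels are illustrative assumptions, and the phenodata special case is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FormalInput:
    name: str             # e.g. "file.tsv" or "files[...].tsv"
    type: str             # e.g. "CDNA", "GENE_EXPRS"
    series: bool = False  # True for [...] inputs, which may take several datasets


def bind_inputs(formals: List[FormalInput], concretes: List[str],
                fits: Callable[[str, str], bool]) -> Optional[Dict[str, List[str]]]:
    """Greedy, order-based binding; returns None if the selection does not fit."""
    unbound = list(concretes)  # datasets in the order the user selected them
    bindings: Dict[str, List[str]] = {}
    for formal in formals:  # formal inputs are processed in order
        matching = [c for c in unbound if fits(formal.type, c)]
        if not matching:
            return None  # every formal input needs at least one dataset
        chosen = matching if formal.series else matching[:1]
        for c in chosen:
            unbound.remove(c)  # a dataset can be bound only once
        bindings[formal.name] = chosen
    return bindings if not unbound else None  # all selected datasets must be used
```

Greedy binding keeps the algorithm simple, at the cost that an earlier series input can take datasets a later input would also have accepted.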

Output post-processing at client

Output files are connected to input files with a derivation-type link and placed in the appropriate folder. However, the client treats phenodata files specially. When a script produces a phenodata file, the client recognises it and does a couple of special things:

  • creates annotation links to other results (the dotted line)
  • creates original_name column by looking at the input dataset names
    • not created if column already exists
  • creates description column by looking at the input dataset names
    • not created if column already exists

VVSADL parameters

Parameters allow the user to control the behaviour of an analysis tool. They are shown in the graphical parameter panel of the Chipster user interface and made available as variables when the tool is run.

Parameter definition format is:

PARAMETER name <TYPE> FROM min_value TO max_value DEFAULT def_value (description)

FROM, TO and DEFAULT are optional. Description can be left blank.
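A single PARAMETER line can be split into its parts with a regular expression. This Python sketch assumes that FROM and TO always appear together, that default values contain no spaces, and that the description parentheses are always present (possibly empty). The "ENUM" label for enumerated parameters is made up for the sketch.

```python
import re

PARAMETER_RE = re.compile(
    r"^#?\s*PARAMETER\s+(?P<name>\S+)\s+(?P<type>\[[^\]]*\]|\S+)"
    r"(?:\s+FROM\s+(?P<min>\S+)\s+TO\s+(?P<max>\S+))?"  # optional range
    r"(?:\s+DEFAULT\s+(?P<default>\S+))?"                # optional default
    r"\s*\((?P<description>.*)\)\s*$")                   # description in parentheses


def parse_parameter(line):
    """Return a dict with name, type, min, max, default and description."""
    m = PARAMETER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a valid PARAMETER line: {line!r}")
    spec = m.groupdict()  # missing optional parts are None
    if spec["type"].startswith("["):
        # Enumerated values are listed inside square brackets
        spec["choices"] = [v.strip() for v in spec["type"][1:-1].split(",")]
        spec["type"] = "ENUM"
    return spec
```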

Valid parameter types are:

  • INTEGER
    • For integer values
    • Represented as a text box in GUI
  • DECIMAL
    • For decimal values
    • Represented as a text box in GUI
  • PERCENT
    • For percentages (integer between 0 and 100)
    • Might be removed in future, if there is no need for this
    • Represented as a slider in GUI
  • STRING
    • For free string values
    • Represented as a text box in GUI
  • [val1, val2, val3]
    • For enumerated values (selection from a predefined list)
    • Valid values are given in square brackets
    • Represented as a drop-down list in GUI
  • COLUMN_SEL
    • For selecting one column from the input dataset
    • Possible values are read from the input dataset
    • In case of multiple inputs, the column must be present in all of them
    • Can also be empty
    • Represented as a drop-down list in GUI
  • METACOLUMN_SEL
    • For selecting one column from the phenodata
    • Behaves exactly like COLUMN_SEL, but uses phenodata as input dataset

Numeric parameters also allow minimum and maximum values to be set, using the keywords FROM and TO after the parameter type. All parameters allow a default value, which is given using the keyword DEFAULT. The default value must be a valid value for the parameter. The user interface checks validity in real time, so writing "one" into an INTEGER text box, or "10" into an INTEGER text box with maximum 5, immediately shows an error in the parameter panel.
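The real-time validity checks described above boil down to a few rules per type. A minimal Python sketch, assuming the parameter's type and range have already been parsed from its PARAMETER line; the `ParameterSpec` class, the "ENUM" label and the messages are hypothetical, and the column selector types are not covered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParameterSpec:
    type: str                         # INTEGER, DECIMAL, PERCENT, STRING or ENUM
    minimum: Optional[float] = None   # from FROM, if given
    maximum: Optional[float] = None   # from TO, if given
    choices: List[str] = field(default_factory=list)  # for ENUM only


def validate(spec, text):
    """Return an error message for a value typed into the parameter panel, or None."""
    if spec.type == "STRING":
        return None  # free text is always accepted
    if spec.type == "ENUM":
        return None if text in spec.choices else f"{text!r} is not one of {spec.choices}"
    if spec.type not in ("INTEGER", "DECIMAL", "PERCENT"):
        raise ValueError(f"type not covered by this sketch: {spec.type}")
    try:
        value = int(text) if spec.type in ("INTEGER", "PERCENT") else float(text)
    except ValueError:
        return f"{text!r} is not a valid {spec.type} value"
    low, high = (0, 100) if spec.type == "PERCENT" else (spec.minimum, spec.maximum)
    if low is not None and value < low:
        return f"{value} is below the minimum {low}"
    if high is not None and value > high:
        return f"{value} is above the maximum {high}"
    return None
```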

Validating annotated R scripts

The Chipster distribution includes a validator that lets you check script syntax before trying to deploy the script. You can run the validator with the command line parameter rcheck, followed by the script name. For example:

java -jar chipster-x.x.x.jar rcheck my_scripts/script.R

Making compute service aware of the script

It is possible to override scripts running at a compute service, for making fixes, local modifications etc. Create a directory comp/custom-scripts/<RUNTIME TOOL PATH>, where <RUNTIME TOOL PATH> matches the toolPath parameter of the corresponding runtime in comp/conf/runtimes.xml (before version 1.3.0, the correct path is always comp/custom-scripts/R). Then place a script with the correct file name in that directory (for example, stat-two-groups.R for the Two groups test). The next time the script is used, the compute service reads the updated script and uses it instead.

To check the script file names of the different analysis tools, you have to unzip chipster-x.x.x.jar. In the future, this functionality will be made a lot more elegant with a new analysis module system.
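Because a jar file is an ordinary zip archive, Python's standard `zipfile` module can list the packaged scripts and copy one into the override directory without unpacking the whole jar. This is a sketch: no particular location of the scripts inside the jar is assumed (any entry ending in .R is listed), and `tool_path` must be set to the toolPath of the runtime in comp/conf/runtimes.xml ("R" matches the pre-1.3.0 layout). The function names are hypothetical.

```python
import zipfile
from pathlib import Path


def list_tool_scripts(jar_path):
    """Names of the R scripts packaged inside a Chipster jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(n for n in jar.namelist() if n.endswith(".R"))


def copy_for_override(jar_path, entry, chipster_dir, tool_path="R"):
    """Extract one script into comp/custom-scripts/<tool_path>/ for local editing."""
    target_dir = Path(chipster_dir) / "comp" / "custom-scripts" / tool_path
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(entry).name  # the file name must stay the same
    with zipfile.ZipFile(jar_path) as jar:
        target.write_bytes(jar.read(entry))
    return target
```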

Current limitations:

  • overriding scripts changes the order of analysis tools in client (overridden scripts will be last)
  • when adding new scripts (not overriding existing ones), the compute service must be restarted for them to become visible
  • it is your responsibility to make sure that custom-scripts are synchronised (or at least not in conflict) across multiple compute services

(Support for adding new scripts was introduced in version 1.2.3.)

You can also change the header of the script with custom-scripts, including the name and category of the script. Categories are created on the fly.

Customising tool selection

It is possible to configure a compute service instance to run only a given set of tools, or all but a given set of tools. Using this mechanism, different tools can be distributed to different services, for example to deploy some tools to a special platform or to run different versions of R/Bioconductor at the same time.

Tool (analysis operation) includes and excludes are configured in comp/conf/chipster-config.xml. Naturally only one of the two can be used at a time.

Here is an example that configures the first service to run only the Affymetrix normalisation tool and the second service to run all the rest.

The comp/conf/chipster-config.xml of the first service contains:

<entry entryKey="includeOperations">
  <value>/R/norm-affy.R</value>
</entry>

The comp/conf/chipster-config.xml of the second service contains:

<entry entryKey="excludeOperations">
  <value>/R/norm-affy.R</value>
</entry>
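The include/exclude semantics can be summarised in a few lines of Python. The sketch models the rules only and does not read chipster-config.xml; accepting a list of tool identifiers is an assumption, since the example above shows a single value per entry, and "/R/stat-two-groups.R" in the check below is just an illustrative identifier.

```python
def runs_tool(tool_id, include=None, exclude=None):
    """Would a compute service with these settings run the given tool?"""
    if include is not None and exclude is not None:
        raise ValueError("use either includeOperations or excludeOperations, not both")
    if include is not None:
        return tool_id in include      # only the listed tools
    if exclude is not None:
        return tool_id not in exclude  # everything except the listed tools
    return True  # no filter configured: the service runs every tool
```

With the two configurations above, every tool is accepted by exactly one of the services, which is what makes the split safe.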

Tools conventions

The goal in Chipster is always to produce a coherent user experience. Here are some conventions that can be useful when integrating tools into Chipster and should be followed when writing tools that are to be integrated into Chipster main repository.

  • The default data format is TSV (tab separated values), with one row for each gene or probeset
  • The first column should be unnamed or "identifier" and contain the gene/probeset name
  • Tools should not remove any existing columns unless the row structure is changed
    • In other words, inputs can have annotation etc. data that just passes through analysis steps
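A rough automatic check of the identifier-column and column-preservation conventions can compare the header rows of a tool's input and output. This is a hypothetical Python helper; it only makes sense for tools that keep the row structure, and it assumes headers in which an unnamed first column appears as an empty first cell.

```python
import csv
import io


def convention_problems(input_tsv, output_tsv):
    """Compare the header rows of a tool's input and output TSV text."""
    in_header = next(csv.reader(io.StringIO(input_tsv), delimiter="\t"))
    out_header = next(csv.reader(io.StringIO(output_tsv), delimiter="\t"))
    problems = []
    if out_header and out_header[0] not in ("", "identifier"):
        problems.append("first column should be unnamed or 'identifier'")
    # Every named input column (annotations etc.) should pass through
    missing = [c for c in in_header if c and c not in out_header]
    if missing:
        problems.append(f"columns dropped: {missing}")
    return problems
```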

FAQ

Latest version of Chipster

Q: I have installed my own server environment. When testing it with a client, everything works fine until I try to run an analysis job. Job proceeds normally, but client fails when result data is returned.
A: Look at the URL the client is trying to load and see if you can open it with a browser on the client computer. The most common problem is that the compute service has been configured to use localhost as the fileserver. This works inside the compute service if it is running on the same machine, but fails on a remote client because the URL in the result message points to localhost. You should never use localhost as the fileserver address.
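A quick way to catch this misconfiguration is to check the host part of the URL the client is trying to load. A Python sketch using only the standard library; the example URLs are made up, and only the literal name localhost and loopback IP addresses are detected (other aliases defined in /etc/hosts are not).

```python
import ipaddress
from urllib.parse import urlparse


def points_to_localhost(url):
    """True if the URL's host is localhost or a loopback address."""
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary host name
```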

Q: Chipster seems to ignore Java proxy settings and our firewall allows connections only through proxy.
A: As of version 1.3.0, Chipster by default ignores proxy settings and always uses a direct connection. As of version 1.4.0, it is possible to disable this override and make Chipster use the Java proxy settings. In chipster-config.xml, add the following under the module messaging:

<entry entryKey="disable-proxy" type="boolean" description="should we ignore Java proxy settings and connect directly">
  <value>false</value>
</entry>

The change needs to be made to the chipster-config.xml of the clients. In normal setups it is served by the webstart server, and the change takes effect when clients are restarted.

Q: Client application fails to start with UnknownHostException.
A: You are running a Linux workstation (say "foobar") and startup fails with "fi.csc.microarray.MicroarrayException: could not connect to message broker at ssl://chipster.csc.fi:61617 (Could not connect to broker URL: ssl://chipster.csc.fi:61617. Reason: java.net.UnknownHostException: foobar: foobar)". The problem is that your hostname cannot be resolved for your workstation (Java SSL requires that hostnames can be resolved for both endpoints). Try "host foobar" on shell. If it says "host not found" your network is a bit problematic. You can add "foobar" to your /etc/hosts after localhost, like "127.0.0.1 localhost foobar", and it should work. You can also contact system administrator to find out why your hostname cannot be resolved.
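The same lookup can be tried from Python, which is handy on machines without the host command. Note that `socket.gethostbyname` goes through the system resolver and so also consults /etc/hosts, whereas `host` queries DNS directly; after adding the /etc/hosts entry described above, this check should succeed even if `host foobar` still fails. This is a sketch only; Java's own lookup may differ in detail.

```python
import socket


def own_hostname_resolves():
    """Return (hostname, address or None) for this machine's own host name."""
    name = socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:  # socket.gaierror: the name could not be resolved
        return name, None
```

If the address comes back as None, the UnknownHostException above is the expected symptom.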

Q: Starting Chipster server environment results in: "Could not detect hardware architecture, please set platform manually."
A: If hardware architecture is not detected automatically, it can be set manually by editing all instances of chipster-generic.sh. Architecture is configured by changing the PLATFORM line to match your hardware architecture (see comment above the line for options).

Q: I get "RSA premaster secret error" when trying to run Chipster server.
A: Some JREs are not bundled with the complete security files (needed by Chipster for SSL). Installing the "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files" should fix it. They can be installed using your system's package manager (if available there) or from http://java.sun.com/javase/downloads/index_jdk5.jsp.

Q: Attempts to start client always end with: "fi.csc.microarray.MicroarrayException: could not connect to message broker at ssl://chipster.csc.fi:61617 (Could not connect to broker URL: ssl://chipster.csc.fi:61617. Reason: java.net.ConnectException: Connection timed out: connect)".
A: If the broker is running properly, the reason is a firewall blocking communication between the servers and the client. With its default configuration, Chipster needs port 61616 (TCP) or 61617 (SSL) for messaging and port 8080 (HTTP) for file transfers, so these must be open in the firewall. Also make sure that Java is not configured to use a non-compliant proxy server for HTTP.
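Before digging into the broker logs, it can help to check from the client machine whether the default ports are reachable at all. A minimal Python sketch using `socket.create_connection`; a successful TCP connection only shows that the port is open, not that SSL or the proxy configuration works. The port list reflects the defaults named above; adjust it if your installation uses other ports.

```python
import socket

CHIPSTER_PORTS = {"messaging (TCP)": 61616, "messaging (SSL)": 61617, "file broker (HTTP)": 8080}


def port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_server(host):
    """Reachability of each default Chipster port on the given server."""
    return {label: port_open(host, port) for label, port in CHIPSTER_PORTS.items()}
```

For example, `check_server("chipster.csc.fi")` returns one True/False value per port; a False for every port usually points to a firewall rather than a broker problem.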

Old versions of Chipster

Q: All attempts to run analysis jobs fail with complaints about illegal/bad files. When I look at working directories in server side, files seem to be a lot smaller than they were originally. There are "Error writing request body to server" messages in client's nami.log.
A: File transfers are being corrupted by an HTTP proxy server. Chipster uses HTTP chunked mode by default and some proxies do not support it properly. As of version 1.3.0, Chipster ignores proxy settings and always uses a direct connection. On earlier versions you can disable chunked mode in the client's nami-config.xml by changing the entry use_chunked_http. This causes all files to be loaded into memory during transfer, which makes large transfers impossible. A better solution is to configure the Java Web Start proxy settings. They can be accessed using the Java Application Cache Viewer ("javaws" at the command line) and going to Edit -> Preferences -> General -> Network settings. Configuring the proxy manually and checking "Bypass proxy server for local address" should help if you use the system only inside your local network. A direct connection should always work, unless direct connections are blocked by a local firewall. We have seen reports that the default "Use browser settings" does not always work, even if the proxy is properly configured in your browser.

Q: I have installed my own server environment. I systematically get " caught segfault address 0x10, cause 'memory not mapped'" for some CEL files when trying to do normalisation or quality control.
A: There is a bug in the Bioconductor affyio package that causes a segfault when reading new Affymetrix two plus chips. It is fixed in R 2.7 / Bioconductor 2.2. If you are running version 1.2.x, you can deploy a new compute service and configure it to use R 2.7.1 / Bioconductor 2.2. By using the excludeOperations setting in the old compute service and the includeOperations setting in the new one, you can choose which scripts are run under the newer R. There is an R 2.7.1 version of the three affected scripts in scripts/R/tools-272 inside chipster-1.2.x.jar. The whole system cannot be run under R 2.7.2; it requires R 2.6.1. In the future, running different R versions will be made easier.

Q: Under heavy load ActiveMQ broker hangs. In some cases there is "Async error occurred: java.lang.OutOfMemoryError: unable to create new native thread" in the logs (but not necessarily always).
A: The reason is that the broker runs out of threads. By default, ActiveMQ creates a new thread for every JMS Destination and does not release them until the session is closed, i.e., client is shut down. If you have for example a full day course where clients are not shut down for a day and a lot of jobs are run, broker can eventually die. And because there is a clever reconnect mechanism, restarting the broker only helps for some time as connections are re-established and threads recreated. Disabling the dedicated task runner makes the broker not create dedicated threads for every temporary topic. It can be done by changing UseDedicatedTaskRunner to "false" in chipster/activemq/bin/<PLATFORM>/wrapper.conf. After the change the configuration line should look like:

wrapper.java.additional.8=-Dorg.apache.activemq.UseDedicatedTaskRunner=false

The setting UseDedicatedTaskRunner=false is the default in the bundled ActiveMQ of Chipster version 1.3.0 and later.


Related

Wiki: ChipsterVsRVersions
Wiki: DirectoryLayout
Wiki: LDAP
Wiki: Main_Page
Wiki: SetupTool
