TVscript Wiki

Exploration of the removal of count variable transcripts.

Brought to you by: diana-lobo, john-archer

Quick Start

1. Obtaining TVscript

1.1 A zip file (tvscript.zip) containing the tvscript.jar file, license agreement, quick start guide and test data can be downloaded from the Files tab of the SourceForge project page: https://sourceforge.net/projects/tvscript/.

1.2 TVscript has been tested on Ubuntu 20.04, Windows 10 and macOS High Sierra, but it is usable on any operating system with Java Runtime Environment (JRE) 8.0 or higher installed. To find out which version of Java is installed, open a terminal window and type java -version. If an update is required, the latest JREs can be obtained from the Oracle website: https://www.oracle.com/java/technologies/javase-downloads.html

1.3 Extract the contents of the .zip file and place the TVscript.jar file in the desired folder. Make sure the permissions on this file allow it to be executed. To do this, right-click the file and use the properties tab, or chmod the file from a terminal (sudo chmod +x TVscript.jar).
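
Steps 1.2 and 1.3 can be sketched from a terminal as follows. This is a minimal sketch, assuming tvscript.zip has been downloaded into the current directory; the helper names java_major and install_tvscript and the default target folder tvscript are illustrative and not part of TVscript. Java 8 reports its version as 1.8.x, while later releases put the major number first, so the sketch handles both forms.

```shell
# Extract the major version number from the output of `java -version`.
# Java 8 and earlier print "1.8.0_292" (major 8); later releases print "11.0.2" (major 11).
java_major() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
    esac
}

# Unpack the archive and make the jar executable (step 1.3).
# Both arguments are optional; the defaults are assumptions for this sketch.
install_tvscript() {
    zip=${1:-tvscript.zip}
    dest=${2:-tvscript}
    [ -f "$zip" ] || { echo "archive not found: $zip" >&2; return 1; }
    unzip -o "$zip" -d "$dest" || return 1
    # Match the jar name case-insensitively, wherever it sits in the archive.
    find "$dest" -iname 'tvscript.jar' -exec chmod +x {} +
}

# Report the Java version if Java is on the PATH (step 1.2).
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1)
    echo "Java major version: $(java_major "$banner")"
else
    echo "java not found: install a JRE (8.0 or higher) before running TVscript"
fi
```

With the functions defined, install_tvscript tvscript.zip tvscript unpacks the download into a folder named tvscript and marks the jar as executable.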

2. Running TVscript

As long as the TVscript.jar file has been given permission to be executed (1.3), and the configuration file has been created (see example config.txt for the test data within the zip), TVscript can be launched with the following command:

java -jar TVscript.jar -config path-to-config-file

where,

-config: indicates the location of the config file.

An additional, optional parameter is the -print flag. If this is set to ‘y’, the intra-condition variation will be output for each transcript in a sorted list, along with the percentile indicator.
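
For instance, a run with the optional flag might look like the sketch below. It assumes the flag takes its value as a separate argument (-print y), which the description implies but does not show. The wrapper name run_tvscript is hypothetical, and the checks before launch are a convenience added here, not something TVscript requires.

```shell
# Launch TVscript with the intra-condition variation listing enabled.
# Usage: run_tvscript path-to-jar path-to-config-file
run_tvscript() {
    jar=$1
    config=$2
    [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
    [ -f "$config" ] || { echo "config file not found: $config" >&2; return 1; }
    # "-print y" is an assumed spelling of "the -print flag set to 'y'".
    java -jar "$jar" -config "$config" -print y
}
```

Calling run_tvscript TVscript.jar path-to-config-file then fails early with a clear message if either file is missing, rather than relying on whatever error Java would report.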

3. The Configuration File

An example configuration file for the example data is included in tvscript.zip. The file is set up as follows.

3.1 The first line is the path to where input data files reside. In this example it is: /user_path/example/data/.

3.2 The second line is the name of the file containing the lengths of the transcripts to which reads for each RNA-seq dataset have been mapped. This file needs to be created by the user and is required to be within the folder specified in (3.1).

3.3 The third line identifies the percentile threshold. Transcripts whose intra-condition variance, within one or both conditions, falls at or above this percentile will be removed.

3.4 The fourth line of the config file defines headers for the three columns that make up all subsequent lines. This is where the files containing read count data are listed. The first column contains the name of the file holding the count data. These files must reside within the folder specified in (3.1). The second column is the name that will be given to the output file after filtering. The third column takes a value of either 1 or 2, depending on which condition the user assigns to the particular dataset.
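
The layout described in 3.1 to 3.4 can be sketched by writing a small config file from the shell. Everything in it is a placeholder: the percentile value 90, the header names, the count file names and the tab separator are assumptions made for illustration, so the config.txt bundled with tvscript.zip remains the reference for the exact format.

```shell
# Write an illustrative config file; every value below is a placeholder.
config_dir=$(mktemp -d)
config="$config_dir/config.txt"
{
    printf '%s\n' '/user_path/example/data/'   # line 1: folder holding the input files (3.1)
    printf '%s\n' 'contig_length.txt'          # line 2: transcript lengths file (3.2)
    printf '%s\n' '90'                         # line 3: percentile threshold (3.3), value assumed
    # Line 4: headers for the three columns (3.4); these names are hypothetical.
    printf '%s\t%s\t%s\n' 'input_file' 'output_name' 'condition'
    # One line per dataset: count file, output name, condition (1 or 2).
    printf '%s\t%s\t%s\n' 'dog_1.txt'  'dog_1_filtered.txt'  '1'
    printf '%s\t%s\t%s\n' 'dog_2.txt'  'dog_2_filtered.txt'  '1'
    printf '%s\t%s\t%s\n' 'wolf_1.txt' 'wolf_1_filtered.txt' '2'
    printf '%s\t%s\t%s\n' 'wolf_2.txt' 'wolf_2_filtered.txt' '2'
} > "$config"
cat "$config"
```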

4. Output Files

Output files, containing the transcripts whose intra-condition variation is below the selected threshold, will be created within the folder containing the input data, using the names specified in the config file (3.4). These output files contain the raw count data and can be used directly in tools for differential expression analysis. In addition, a file titled “transcripts_removed.txt” will be created that contains the IDs of the transcripts removed.
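
After a run, the number of removed transcripts can be checked from the shell. The sketch below assumes transcripts_removed.txt holds one transcript ID per line, which is not stated here and should be confirmed against a real output file; the function name count_removed is hypothetical.

```shell
# Count the transcript IDs listed in transcripts_removed.txt within a data folder.
# Assumes one ID per line; blank lines are ignored.
count_removed() {
    removed="$1/transcripts_removed.txt"
    [ -f "$removed" ] || { echo "not found: $removed" >&2; return 1; }
    grep -c . "$removed"
}
```

For the example config, count_removed /user_path/example/data would report how many transcripts were filtered out.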

5. Sample Data

Test data is located within the downloaded zip file (tvscript.zip) and consists of read count files as well as a config.txt file. There are eight read count files, each of which is associated with one of two conditions (four with dog brain and four with wolf brain). We obtained these read counts by mapping RNA-seq datasets obtained from the brains of dogs and wolves to the dog reference transcriptome, as described in our paper. In addition to these count files, a file containing the name of each transcript within the reference, along with its corresponding length, is required; see the example file contig_length.txt. To run TVscript using the test data, type the command:

java -jar TVscript.jar -config /path-to-config/config.txt

with the config file modified to reflect the local location of the count files.

6. Obtaining Source Code

The code can be downloaded from the Code tab, imported into an IDE, such as Netbeans, and recompiled as desired. The steps below are for the Netbeans IDE, but others will have a similar process. Note: this is not the recommended (nor required) path for obtaining the working software, unless there is a specific requirement to edit the code. Steps to do this are:

6.1 On the Code tab of the project, obtain the read-only svn checkout link (svn://svn.code.sf.net/p/tvscript/code/). There are three options: (i) SSH, (ii) HTTPS and (iii) RO. The read-only option is RO and does not require a password.

6.2 Open Netbeans and, under the Team menu, select the sub-menu Subversion and then the sub-sub-menu Checkout. This will open a small window with some fields to fill in.

6.3 In the field labelled Repository URL, place the RO svn checkout link obtained in step 6.1. The username and password can be left blank. Click next.

6.4 Use the browse button to browse the project Repository Folders and select the core folder. This contains all the code. Once OK is pressed, select the local folder to download the code to, e.g. testFolder.

6.5 Click finish. All the code files and subfolders within the core folder will then be placed into the selected location.

6.6 These can be used to set up a new project within Netbeans, after which you can begin to edit and recompile the code. The easiest way to do this is to create a new project from scratch and then paste the core folder into the source folder of the new project.
