Name                      Modified      Size        Downloads / Week
GENIES_readme_2021.txt    2021-09-12    5.1 kB
GENIES_user_manual.pdf    2021-01-18    483.8 kB
GENIES.zip                2020-12-09    194.1 MB
Totals: 3 Items                         194.6 MB    0

Genetic Information Entropy Spectrum (GENIES)
 
Melvin M. Vopson, University of Portsmouth, Contact e-mail: melvin.vopson@port.ac.uk

We kindly ask users of this software to cite the following article:
Melvin M. Vopson, Samuel C. Robson, A new method to study genome mutations using the information entropy, Physica A: Statistical Mechanics and its Applications, Volume 584, 126383 (2021) https://doi.org/10.1016/j.physa.2021.126383


Please read the User manual for a more detailed set of instructions and a summary of the theory behind this methodology. 

GENIES is a computer program developed to facilitate the study of genome sequences in a comparative way using the information entropy. 
The program can analyse genomes of any size by converting the genetic information contained in a given genome into a numerical Information Entropy Spectrum. 
This procedure is carried out for two genomes of the same size, one being a reference genome and the other being its mutated version.
The program allows fast detection of point mutations (single-base changes) from the Information Entropy Spectra.
Most importantly, the program could be used to research and identify algorithms for predicting future genetic mutations.


1. Requirements

To use this software you must have:

a) At least 1 GB of free space on the local hard drive.
b) A 64-bit Windows operating system.
c) Administrator rights on your PC.


2. Installation 

Download the zipped folder "GENIES.zip" and unzip it. The content of the folder is shown in the image below.

Open the "INSTALLER" folder, which contains the folder "VOLUME". Open "VOLUME" and run the "setup" application file.
Follow the on-screen instructions to complete the installation. When the installation finishes, a pop-up message will say:

"The Installation has finished updating the system"

Click "NEXT". Another pop-up message will ask you to restart the computer in order to complete the installation.
After restarting, the Genies program will appear in your list of Windows programs.
Please note that the installer does not add a desktop icon. If you want one, you must create it yourself.


3. Running the program 

When the program starts, it opens its own folder in Program Files and asks for the input genome sequences.
The first input file must be the reference genome and the second must be the mutated sequence.
The program comes with two example COVID-19 RNA sequences, which can be used to run and test the program.
One file is the COVID-19 reference sequence from Wuhan, Dec. 2019: MN908947.3_Reference_China. The second file is a mutated version collected in Japan, March 2020: LC542809_Japan.
After completing the computation, the program will ask for a file name and location to save the data.
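
Because the two inputs must be the same size, it can help to check them before loading them into the program. The Python sketch below is only an illustration and is not part of GENIES: it assumes the sequences are stored as plain text, optionally with FASTA-style header lines starting with ">" (this readme does not state the input format), and the helper names clean_sequence and same_length are hypothetical.

```python
def clean_sequence(text: str) -> str:
    """Return the bare sequence: drop '>' header lines, blank lines and whitespace.

    Assumption: FASTA-style text; the readme does not specify the file format.
    """
    lines = (line.strip() for line in text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


def same_length(reference: str, mutated: str) -> bool:
    """True if the two cleaned sequences contain the same number of characters."""
    return len(reference) == len(mutated)


if __name__ == "__main__":
    # Tiny made-up sequences, for illustration only.
    ref = clean_sequence(">reference\nacgu\nACG\n")
    mut = clean_sequence(">mutated\nacgu\nAAG\n")
    print(len(ref), len(mut), same_length(ref, mut))
```

To check real files, read each one with open(path).read() and pass the text to clean_sequence.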


4. Program input variables 

The key variables of the program are:

* m-block size - this must be set to 3 (the size of a codon)
* m-block step size - this must always be 1, so that any possible correlations within a window are captured
* Window size (WS) - this is a very important variable. Too large a window results in mutations being picked up more than once. Too small a window results in segments that could miss some mutations because the IE values are unchanged. Depending on the genome size, a typical window of 50 characters is reasonable. For very large genomes, this could be increased to WS = 100 or even more. We recommend treating WS as a research parameter and investigating the optimal value for a specific project.
* Sliding window step size (SS) - this is also a very important variable. To fully capture everything, the smallest step size, SS = 1, must be used. The smaller the step, the longer the computation takes. SS must always be smaller than or equal to WS. If an invalid value is entered, the program generates an error. A reasonable value is SS = 2, but we recommend treating it as a research variable in order to identify the optimal scanning conditions for a given project.
* ms to wait - this is the time to wait at each iteration. It can be set to zero, meaning the program runs at the fastest rate. When a non-zero value is set, the program runs slowly, which is useful when the investigator wishes to observe the codons array, or to probe other aspects of the program, in real time.
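
To illustrate how these variables interact, here is a minimal Python sketch of a sliding-window entropy scan. It is not the GENIES implementation. It assumes that the IE value of a window is the Shannon entropy, in bits, of the m-blocks found inside that window, and that windows which would run past the end of the genome are skipped. The function names are hypothetical; see the User manual for the actual method.

```python
import math
from collections import Counter


def window_entropy(window: str, m: int = 3, block_step: int = 1) -> float:
    """Shannon entropy (bits) of the m-blocks inside one window (assumed IE definition)."""
    blocks = [window[i:i + m] for i in range(0, len(window) - m + 1, block_step)]
    total = len(blocks)
    counts = Counter(blocks)
    # p * log2(1/p) summed over the distinct m-blocks
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_spectrum(genome: str, ws: int = 50, ss: int = 2,
                     m: int = 3, block_step: int = 1) -> list[float]:
    """One entropy value per window of size ws, sliding by ss characters."""
    if ws < m:
        raise ValueError("WS must be at least the m-block size")
    if not 1 <= ss <= ws:
        raise ValueError("SS must be at least 1 and no larger than WS")
    return [window_entropy(genome[start:start + ws], m, block_step)
            for start in range(0, len(genome) - ws + 1, ss)]


if __name__ == "__main__":
    toy = "ACGU" * 30  # made-up 120-character sequence, for illustration only
    spectrum = entropy_spectrum(toy, ws=50, ss=2)
    print(len(spectrum))  # number of windows in the spectrum
```

Under these assumptions a genome of N characters gives floor((N - WS) / SS) + 1 windows, so halving SS roughly doubles both the spectrum length and the computation time.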
 

5. Main program outputs 

The key outputs of the program are:

* Genome size - total number of characters in the genome
* Array size of all windows - the size of the Information Entropy Spectrum 
* Number of mutations - the number of base positions found to differ between the two genomes
* % of mutations - the percentage of the genome affected by mutations
* IE Genome Spectrum  - top image shows the IE spectrum of the reference genome and then of the mutated genome
* IE ratio spectrum - bottom image shows the ratio of the two IE spectra   
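
As a rough cross-check of the two mutation outputs, the sketch below counts differing positions by direct comparison of two equal-length sequences. This is one simple reading of "number of mutations" and is an assumption: the program may obtain its counts differently, and the function name count_point_mutations is hypothetical.

```python
def count_point_mutations(reference: str, mutated: str) -> tuple[int, float]:
    """Return (number of differing positions, percentage of the genome they represent)."""
    if len(reference) != len(mutated):
        raise ValueError("Both genomes must have the same size")
    if not reference:
        raise ValueError("Genomes must not be empty")
    differences = sum(1 for a, b in zip(reference, mutated) if a != b)
    return differences, 100.0 * differences / len(reference)


if __name__ == "__main__":
    # Made-up 8-character sequences with a single substitution at position 4.
    number, percent = count_point_mutations("ACGUACGU", "ACGAACGU")
    print(number, percent)
```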

The ASCII data saved by the program contains four columns arranged in the following order:

Column 1 = IE ratio spectrum
Column 2 = IE mutated genome spectrum
Column 3 = IE reference genome spectrum
Column 4 = Window index
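
The saved data can be post-processed outside the program. The following Python sketch reads the four columns in the order listed above. It assumes that the columns are separated by whitespace or commas and that any non-numeric line (for example a header) can be skipped; the readme does not specify the delimiter, and the names SpectrumRow and parse_genies_output are hypothetical.

```python
from typing import NamedTuple


class SpectrumRow(NamedTuple):
    ie_ratio: float       # Column 1
    ie_mutated: float     # Column 2
    ie_reference: float   # Column 3
    window_index: int     # Column 4


def parse_genies_output(text: str) -> list[SpectrumRow]:
    """Parse saved ASCII data; lines that are not four numbers are ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()  # assumed delimiters
        if len(fields) != 4:
            continue
        try:
            ratio, mutated, reference, index = (float(f) for f in fields)
        except ValueError:
            continue  # header or other non-numeric line
        rows.append(SpectrumRow(ratio, mutated, reference, int(index)))
    return rows


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = "1.0\t4.5\t4.5\t0\n0.98\t4.41\t4.5\t1\n"
    for row in parse_genies_output(sample):
        print(row.window_index, row.ie_ratio)
```

To parse a saved file, read it with open(path).read() and pass the text to parse_genies_output.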
 


Source: GENIES_readme_2021.txt, updated 2021-09-12