Files in the rgp folder:
rgp_run_6_16.tar.gz   2020-07-28   102.3 kB
README.txt            2020-07-28     7.0 kB
run_rgp.tar.gz        2014-04-17    91.1 kB
Total: 3 items, 200.5 kB

The software uses the rgp package for the R environment to develop models represented as mathematical formulas with evolutionary algorithms. The scripts control data input, testing, and model assessment with the k-fold cross-validation (k-CV) methodology. They supervise the evolutionary algorithm by supplying its parameters, and they stop and resume the evolution process. Most of the important parameters are set in the header of the script, which makes it easy to modify them and to start separate computational tests. The generated reports include detailed information about model performance in the k-CV testing scheme as well as on the external validation dataset.

After the evolution process stops, the models are tested with 10-fold cross-validation, in which they are fitted with the optimization methods GenSA, nloptr, and the SANN and BFGS methods of optim(). The software is designed for regression problems and was tested on Linux.

For more information about the rgp framework and its functionality, please see the description by its author, Oliver Flasch, available at the following address:
https://cos.bibl.th-koeln.de/frontdoor/deliver/index/docId/32/file/Flas13b.pdf

The rgp package can be downloaded from the CRAN archive at the following address:
https://cran.r-project.org/src/contrib/Archive/rgp/

# Authors: 
# Adam Pacławski: adam.paclawski@uj.edu.pl
# Aleksander Mendyk
# Jakub Szlęk
# License: LGPLv3

To work properly, the software requires the data to be prepared as k pairs of training-testing datasets in tab-delimited TXT files. The last column contains the known answer to the problem (the dependent variable), and the preceding columns are the features (regressors or independent variables). The model is assumed to be of the MISO (multiple-input, single-output) type.
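The following is a minimal sketch of reading one such file. It assumes the files have no header row and contain only numeric values. The function name load_miso_table is hypothetical and is not part of the scripts.

```python
import csv
import io
from typing import List, Tuple


def load_miso_table(text: str) -> Tuple[List[List[float]], List[float]]:
    """Parse a tab-delimited table: every column but the last is an input
    (feature), and the last column is the single output (MISO layout)."""
    features, target = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip empty lines
        values = [float(v) for v in row]
        features.append(values[:-1])  # In1 ... In(n-1)
        target.append(values[-1])     # dependent variable
    return features, target


example = "1.0\t2.0\t3.5\n4.0\t5.0\t9.5\n"
X, y = load_miso_table(example)
print(X, y)  # [[1.0, 2.0], [4.0, 5.0]] [3.5, 9.5]
```

If your files include a header row, skip the first row before converting the values.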

The basic adjustable parameters within the user interface are:

## Maximum length of the chromosome, which limits the length and complexity of the equation.
individual_size_limit

## Goodness-of-fit measure used for model fitting in the evolution process. Three are available: rmse, sse, or mse.
userErrorFuctSR
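The three measures follow their usual definitions. MSE is SSE divided by the number of records, and RMSE is the square root of MSE. The sketch below uses hypothetical helper names and is not the scripts' own code. The implementation inside the scripts or rgp may differ in details.

```python
import math
from typing import Sequence


def sse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: SSE divided by the number of records."""
    return sse(observed, predicted) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: square root of MSE."""
    return math.sqrt(mse(observed, predicted))


obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
print(sse(obs, pred), mse(obs, pred), rmse(obs, pred))  # 4.0 1.0 1.0
```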

## Learning strategy chosen from two available options: tournament or ageComplex
gpSearchFunctionUse

## Whether to save RData files for all performed computations (TRUE or FALSE)
saveRdata                                                

## You can define your own cost function for the evolution process. If TRUE, the software uses the evolutionCostFunction() function defined below instead of the original cost function for the evolution process.
myFunction<-TRUE

## By modifying the cost function, it is possible to penalize models that do not use specific inputs for prediction. If 0, the cost function is not modified.
whichInputsAreDesired<-0 
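The scripts' evolutionCostFunction() is not reproduced here. As an illustration only, one way to penalize models that ignore desired inputs is to multiply the error by a factor for each missing In{no} variable. This sketch assumes that whichInputsAreDesired holds a list of input numbers. The penalty_factor value and the helper names are hypothetical.

```python
import re
from typing import Sequence, Union


def uses_input(equation: str, no: int) -> bool:
    """True if the variable In<no> occurs in the equation (In1 does not match In10)."""
    return re.search(rf"\bIn{no}\b", equation) is not None


def penalized_cost(error: float, equation: str,
                   desired_inputs: Union[int, Sequence[int]],
                   penalty_factor: float = 2.0) -> float:
    """Multiply the error by penalty_factor once per desired input the model ignores.
    desired_inputs == 0 leaves the cost unchanged, mirroring whichInputsAreDesired<-0."""
    if desired_inputs == 0:
        return error
    missing = [no for no in desired_inputs if not uses_input(equation, no)]
    return error * penalty_factor ** len(missing)


print(penalized_cost(0.5, "In1 * exp(In3)", [1, 2]))  # In2 is missing -> 1.0
```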

## Number of learn-test pairs in the k-CV model assessment scheme
max_loop<-10
## Multistart option for model fitting and testing in the k-CV model assessment scheme
max_supra_loop<-1
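The scripts expect the k learn-test pairs to be prepared in advance. One common way to produce max_loop such pairs from a single dataset is a random, non-stratified k-fold split, sketched below. The function kfold_pairs is hypothetical and is not part of the scripts. The multistart option (max_supra_loop) is not modeled.

```python
import random
from typing import List, Tuple


def kfold_pairs(n_records: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (learn_indices, test_indices) pairs; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    pairs = []
    for i in range(k):
        test = sorted(folds[i])
        learn = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        pairs.append((learn, test))
    return pairs


max_loop = 10  # k, as in the script header
pairs = kfold_pairs(25, max_loop)
print(len(pairs), [len(t) for _, t in pairs])  # 10 [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```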

## Data input for the symbolic regression core function. If TRUE, there is no need to provide the names of the data files for the k-CV testing scheme.
automatic_name                     

## Data file used for model development through GP
plik_o

## Dataset used to test the equations after the symbolic regression process and to prepare their ranking: a file with data for testing the models fitted to the defined problem (it may be the same as the training file)
test_o

## Validation settings: switch the validation process on or off and, if TRUE, provide the data file for it.
validation<-FALSE
validationFileName<-"./baza_PLGA_11in_2nd_FS_MSE.txt"

 
## If automatic_name = FALSE, the names of the files with learning and test data for the k-CV testing scheme must be provided. File names should follow the scheme LearnFile{no}, TestFile{no}, where {no} is the number identifying the k-th learn-test pair and ranges from 1 to k.

skel_plik
skel_plik1
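When automatic_name is FALSE, the expected names can be generated as shown below. This assumes, without confirmation from the scripts, that skel_plik and skel_plik1 hold the core names of the learning and test files. The function name and its optional suffix argument (for example, a file extension) are hypothetical.

```python
from typing import List, Tuple


def kcv_file_names(learn_core: str, test_core: str, k: int,
                   suffix: str = "") -> List[Tuple[str, str]]:
    """Build (learn, test) file names following the Core{no} scheme, with no = 1..k."""
    return [(f"{learn_core}{no}{suffix}", f"{test_core}{no}{suffix}")
            for no in range(1, k + 1)]


# Hypothetical core names, assumed to correspond to skel_plik and skel_plik1.
for learn, test in kcv_file_names("LearnFile", "TestFile", 3):
    print(learn, test)
```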

## Core name of the RData file with the optimized model
skel_outfile

## Switches scaling of the variables on or off
use_scale
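The README does not say which scaling use_scale applies. Assuming standardization similar to R's scale() (centering on the mean and dividing by the sample standard deviation), a sketch that fits the parameters on the learning data and reuses them on the test data could look like this. The helper names are hypothetical, and the data are held column by column.

```python
import statistics
from typing import List, Tuple


def fit_scaler(columns: List[List[float]]) -> List[Tuple[float, float]]:
    """Per-column (mean, sample standard deviation), computed on the learning data only."""
    params = []
    for col in columns:
        sd = statistics.stdev(col)
        params.append((statistics.mean(col), sd if sd > 0 else 1.0))  # guard constant columns
    return params


def apply_scaler(columns: List[List[float]],
                 params: List[Tuple[float, float]]) -> List[List[float]]:
    """Center and scale each column with parameters fitted on the learning data."""
    return [[(v - m) / s for v in col] for col, (m, s) in zip(columns, params)]


learn = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # two input columns
test = [[2.0, 4.0], [20.0, 40.0]]
params = fit_scaler(learn)
print(apply_scaler(test, params))  # [[0.0, 2.0], [0.0, 2.0]]
```

Fitting the scaling parameters on the learning data only keeps information from the test data out of model development.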

## Population size; note that a bigger population increases the required computation time
pop_size

## Internal stop criterion for the main symbolic regression function, expressed as an error value; the evolution process stops when it is reached
fit_stop

## Algorithm stop criterion based on the number of fitness function evaluations
evaluations_stop
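rgp provides its own stop conditions, and the scripts configure them internally. The sketch below only illustrates how the two criteria combine. Evolution stops as soon as the best error reaches fit_stop or the evaluation count reaches evaluations_stop, whichever comes first. The step function standing in for one generation is hypothetical.

```python
from typing import Callable, Tuple


def run_until_stop(step: Callable[[], Tuple[float, int]],
                   fit_stop: float, evaluations_stop: int) -> Tuple[float, int, str]:
    """Call step() (one hypothetical generation returning its best error and the number
    of fitness evaluations it used) until either stop criterion is met."""
    best, evaluations = float("inf"), 0
    while True:
        error, used = step()
        best = min(best, error)
        evaluations += used
        if best <= fit_stop:
            return best, evaluations, "fit_stop reached"
        if evaluations >= evaluations_stop:
            return best, evaluations, "evaluations_stop reached"


# Toy generation: the best error halves on every call and costs 100 evaluations.
state = {"error": 1.0}


def toy_step() -> Tuple[float, int]:
    state["error"] /= 2
    return state["error"], 100


print(run_until_stop(toy_step, fit_stop=0.01, evaluations_stop=10_000))
# (0.0078125, 700, 'fit_stop reached')
```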

##Optimization parameters
optimization

## Number of best equations passed on to be fitted on the learning datasets and tested on the testing datasets, respectively
eq.to.optim.temp<-0.1*pop_size                                           
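The selection behind eq.to.optim.temp can be sketched as ranking the population by error and keeping the best 10%. This sketch makes three assumptions: the count is rounded to the nearest integer, at least one equation is always kept, and each candidate is represented as an (equation, error) pair.

```python
from typing import List, Tuple


def best_equations(population: List[Tuple[str, float]], fraction: float = 0.1) -> List[str]:
    """Return the equations with the lowest error; the count is fraction * population
    size, rounded to the nearest integer and at least 1 (assumptions of this sketch)."""
    n = max(1, int(round(fraction * len(population))))
    ranked = sorted(population, key=lambda item: item[1])  # lower error is better
    return [equation for equation, _ in ranked[:n]]


errors = [5, 3, 9, 1, 7, 2, 8, 6, 4, 0, 15, 13, 19, 11, 17, 12, 18, 16, 14, 10]
population = [(f"eq{i}", float(e)) for i, e in enumerate(errors)]  # pop_size = 20
print(best_equations(population))  # ['eq9', 'eq3']
```

Rounding instead of truncating avoids surprises from floating-point products such as 0.1 * 30, which is slightly above 3.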

## Settings for fitting the tested models in the k-CV scheme. Model fitting is organized as a chain in which the result of one optimizer is passed on to the next one. Each option can be TRUE or FALSE.
use_SANN
use_gensa
use_rgenoud
use_NM
use_nloptr

## Number of iterations in the model fitting process for each optimization algorithm. All values are natural numbers.
max_iter_rgenoud<-500
max_iter_gensa<-5000
maxit_NM<-50000
maxit_SANN<-10000
maxit_optimx<-5000
maxit_nloptr<-10000
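The chained fitting can be sketched with two simple stand-in optimizers written in plain Python: a basic simulated annealing in place of SANN and a compass (pattern) search in place of Nelder-Mead. These are not the R implementations used by the scripts. GenSA, rgenoud, and nloptr are omitted, and the SANN-then-NM order is an assumption. The use and maxit dictionaries mirror the use_* flags and maxit_* limits above.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]
Objective = Callable[[Vector], float]


def anneal(f: Objective, x0: Vector, maxit: int, seed: int = 1) -> Vector:
    """Basic simulated annealing (stand-in for SANN): Gaussian moves,
    Metropolis acceptance and logarithmic cooling. Returns the best point seen."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    for it in range(1, maxit + 1):
        temp = 1.0 / math.log(it + 1)
        cand = [xi + rng.gauss(0.0, 0.5) for xi in x]
        fc = f(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
    return best


def compass(f: Objective, x0: Vector, maxit: int,
            step: float = 0.5, tol: float = 1e-9) -> Vector:
    """Compass search (stand-in for Nelder-Mead): try +/- step along each axis,
    halve the step when no move improves, stop when the step falls below tol."""
    x, fx = list(x0), f(x0)
    for _ in range(maxit):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2
            if step < tol:
                break
    return x


def fit_chain(f: Objective, x0: Vector,
              use: Dict[str, bool], maxit: Dict[str, int]) -> Vector:
    """Run the enabled optimizers in a fixed order; each starts from the previous result."""
    x = list(x0)
    for name, optimizer in (("SANN", anneal), ("NM", compass)):
        if use.get(name, False):
            x = optimizer(f, x, maxit[name])
    return x


# Example: fit y = a*x + b to points generated by a = 2, b = 1, minimizing SSE.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]


def sse_line(p: Vector) -> float:
    a, b = p
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


params = fit_chain(sse_line, [0.0, 0.0],
                   use={"SANN": True, "NM": True},      # like use_SANN, use_NM
                   maxit={"SANN": 10000, "NM": 50000})  # like maxit_SANN, maxit_NM
print([round(v, 3) for v in params])  # close to a = 2, b = 1
```

Passing each result on as the next starting point lets a global, stochastic search locate a promising region before a local method refines it.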

## Delete duplicate equations from the pool to be optimized and tested by 10-fold cross-validation (k-CV model assessment)
del_duplicate<-TRUE

## If TRUE, all results of the optimization process are saved in the output file, but the file will be bigger
opti_trace<-FALSE
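The following is a minimal sketch of duplicate removal for del_duplicate. It assumes equations are compared as strings after removing whitespace, and the helper name is hypothetical. Algebraically equivalent forms, such as In1 + In2 and In2 + In1, are not detected by this simple check.

```python
from typing import List


def drop_duplicates(equations: List[str]) -> List[str]:
    """Keep the first occurrence of each equation; whitespace differences are ignored."""
    seen, unique = set(), []
    for eq in equations:
        key = "".join(eq.split())  # normalize by removing all whitespace
        if key not in seen:
            seen.add(key)
            unique.append(eq)
    return unique


pool = ["In1 + In2", "In1+In2", "exp(In3)", "In1 + In2"]
print(drop_duplicates(pool))  # ['In1 + In2', 'exp(In3)']
```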

## Parallel optimization mode: use this flag to switch on parallel optimization. It will use all available cores of your CPU!
use_multicore<-TRUE                              

This script uses the heuristic search strategy makeAgeFitnessComplexityParetoGpSearchHeuristic() with the complexity control criterion enabled (enableComplexityCriterion = TRUE).
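An age-fitness-complexity Pareto strategy keeps individuals that are not dominated on all three objectives at once (younger, more accurate, simpler). The sketch below computes such a non-dominated set, assuming all three objectives are minimized. It is a generic illustration, not rgp's implementation, which also handles selection and age bookkeeping.

```python
from typing import List, Tuple

# (equation, age, error, complexity); all three objectives are minimized.
Individual = Tuple[str, int, float, int]


def dominates(a: Individual, b: Individual) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    ka, kb = a[1:], b[1:]
    return all(x <= y for x, y in zip(ka, kb)) and any(x < y for x, y in zip(ka, kb))


def pareto_front(population: List[Individual]) -> List[Individual]:
    """Members of the population that no other member dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


population = [
    ("In1", 1, 0.50, 2),
    ("In1*In2", 3, 0.20, 4),
    ("In1*In2+exp(In3)", 5, 0.20, 9),  # dominated by In1*In2
    ("In2", 2, 0.60, 2),               # dominated by In1
]
print([eq for eq, *_ in pareto_front(population)])  # ['In1', 'In1*In2']
```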

As a result, a report with the best equations is created: Best_equation.txt.

Labels "In{no}" denote the input variables (features).
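To evaluate a reported equation on new data, each In{no} can be bound to the corresponding feature value. This sketch makes three assumptions: {no} is the 1-based column number of the feature in the data file, equations use R syntax for powers (^), which is translated to Python's **, and only a few math functions are allowed. Use it only on equations produced by the scripts, because eval() is unsafe for untrusted input.

```python
import math
import re
from typing import Sequence

# Functions an equation may call; extend as needed (an assumption of this sketch).
ALLOWED = {"exp": math.exp, "log": math.log, "sqrt": math.sqrt,
           "sin": math.sin, "cos": math.cos}


def evaluate_equation(equation: str, features: Sequence[float]) -> float:
    """Evaluate an equation in In1, In2, ... for one data record (1-based columns)."""
    names = {f"In{no}": value for no, value in enumerate(features, start=1)}
    missing = set(re.findall(r"\bIn\d+\b", equation)) - set(names)
    if missing:
        raise ValueError(f"equation uses inputs not present in the record: {sorted(missing)}")
    python_expr = equation.replace("^", "**")  # R power operator -> Python
    return eval(python_expr, {"__builtins__": {}}, {**ALLOWED, **names})


print(evaluate_equation("In1 * 2 + exp(In3 - In3)", [1.5, 0.0, 4.0]))  # 4.0
```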

A file named lista.txt specifies the consecutive steps performed by the computation.sh script (Linux only). A single step consists of the evolution process with the symbolic regression function, assessment of the model population, and k-CV testing of the models.

An example of how to run the script is provided in the example.tar.gz archive. It runs ONLY on Linux and includes automated multi-step execution and reporting/archiving facilities.

##########################################################################
    This program comes with ABSOLUTELY NO WARRANTY
    This is free software, and you are welcome to redistribute it
    under certain conditions. See the LICENSE file for a more detailed
    description of the terms and conditions based on the
    GNU GPLv3 license
##########################################################################

If you are satisfied with this script and find it useful, please cite our latest work:
Szlęk J, Pacławski A, Lau R, Jachowicz R, Mendyk A. Heuristic modeling of macromolecule release from PLGA microspheres. Int J Nanomedicine. 2013;8:4601-11. doi: 10.2147/IJN.S53364. Epub 2013 Dec 3.
Source: README.txt, updated 2020-07-28