Name | Modified | Size | Downloads / Week |
---|---|---|---|
rgp_run_6_16.tar.gz | 2020-07-28 | 102.3 kB | |
README.txt | 2020-07-28 | 7.0 kB | |
run_rgp.tar.gz | 2014-04-17 | 91.1 kB | |
Totals: 3 Items | | 200.5 kB | 0 |
The software builds on the rgp package from the R environment, which allows models represented as mathematical formulas to be developed using evolutionary algorithms. The delivered solution controls data feed, testing, and model assessment using the k-fold cross-validation (k-CV) methodology. It supervises the evolutionary algorithms by providing their parameters, and it stops and resumes the evolution process. Important parameters are located in the header of the script, which makes it easy to modify them and start separate computational tests. Generated reports include detailed information about model performance in the k-CV testing scheme as well as on the external validation dataset. After the evolution process is stopped, models are tested in 10-fold cross-validation mode, with model fitting based on the optimization methods GenSA, nloptr, and the SANN and BFGS methods of the optim() tool. The software is adapted to regression problems and was tested under Linux.

For more information about the rgp framework and its functionality, see the description by its author, Oliver Flasch, available at:
https://cos.bibl.th-koeln.de/frontdoor/deliver/index/docId/32/file/Flas13b.pdf

The rgp package can be downloaded from the CRAN archive:
https://cran.r-project.org/src/contrib/Archive/rgp/

# Authors:
# Adam Pacławski: adam.paclawski@uj.edu.pl
# Aleksander Mendyk
# Jakub Szlęk
# License: LGPLv3

To work properly, the software requires data prepared as k pairs of training-testing datasets in tab-delimited TXT files, where the last column contains the known answer to the problem (dependent variable) and the preceding columns are features (regressors or independent variables). The model is assumed to be of MISO type (multiple-input-single-output).

The basic adjustable parameters within the user interface are:

## Maximum length of the chromosome, which limits the length and complexity of the equation.
individual_size_limit

## Goodness-of-fit measure for model fitting in the evolution process. Three are available: rmse, sse or mse
userErrorFuctSR

## Learning strategy chosen from two available options: tournament or ageComplex
gpSearchFunctionUse

## Option for saving RData for all performed computations (TRUE or FALSE)
saveRdata

## It is possible to define your own cost function for the evolution process. If TRUE, the software uses the evolutionCostFunction() function defined in the script instead of the original cost function.
myFunction<-TRUE

## By modifying the cost function it is possible to penalize models that do not use specific inputs for predictions. If 0, the cost function is not modified.
whichInputsAreDesired<-0

## Number of learn-test pairs in the k-CV model assessment scheme
max_loop<-10

## Multistart option for model fitting and testing in the k-CV model assessment scheme
max_supra_loop<-1

## Data feed for the symbolic regression core function. If TRUE, there is no need to provide names of files with data for the k-CV testing scheme
automatic_name

# Data feed applied to model development through GP
plik_o

# Dataset for testing equations after the symbolic regression process and for preparing their ranking. File with data for testing models fitted to the defined problem (may be the same as the training file)
test_o

## Settings of the validation process: switching it on or off and, if TRUE, providing the data feed for it.
validation<-FALSE
validationFileName<-"./baza_PLGA_11in_2nd_FS_MSE.txt"

## If automatic_name = FALSE, it is required to provide names of files with learning and test data for the k-CV testing scheme. File names should follow the scheme LearnFile{no}, TestFile{no}, where {no} is the number identifying the k-th learn-test pair and ranges from 1 to k.
skel_plik
skel_plik1

## Core name for the RData file with the optimized model
skel_outfile

## Option for switching scaling of variables on and off
use_scale

## Population size; note that a bigger population increases the required computational time
pop_size

## Internal criterion of the symbolic regression main function, expressed as an error measure, for stopping the evolution process.
fit_stop

## Algorithm stop criterion based on the number of fitness function evaluations
evaluations_stop

## Optimization parameters
optimization

# Number of best equations passed on to be fitted and tested on the learning and testing datasets, respectively.
eq.to.optim.temp<-0.1*pop_size

## Settings for fitting and testing models in the k-CV scheme. Model fitting is organized in a loop where the result of one optimizer is passed on to the next one. All can be TRUE or FALSE
use_SANN
use_gensa
use_rgenoud
use_NM
use_nloptr

## Number of iterations within the model fitting process for every optimizing algorithm. All are natural numbers.
max_iter_rgenoud<-500
max_iter_gensa<-5000
maxit_NM<-50000
maxit_SANN<-10000
maxit_optimx<-5000
maxit_nloptr<-10000

## Delete duplicate equations from the pool to be optimized and tested by 10-fold cross-validation, before the k-CV process of model assessment
del_duplicate<-TRUE

## If TRUE, all results of the optimization process are written to the output file, which makes the file bigger
opti_trace<-FALSE

## Parallel optimization mode
# Use this flag to enable parallel optimization mode. It will use all available cores of your CPU!
use_multicore<-TRUE

This script uses a heuristic strategy called makeAgeFitnessComplexityParetoGpSearchHeuristic() with the complexity control criterion enabled (enableComplexityCriterion = TRUE).

As a result, a report with the best equations is created: Best_equation.txt
Labels "In{no}" denote variables (features).

A file named lista.txt contains the number of consecutive steps performed by the computation.sh script (Linux only).
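The optimizer chain controlled by the use_* flags above, where the result of one optimizer is passed on to the next one, can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are synthetic, the equation() function is a hypothetical stand-in for an evolved formula, and only optim() methods are chained (GenSA, rgenoud and nloptr are left out to keep the sketch free of extra packages). It is not the code of the script itself.

```r
# Synthetic learning set in the MISO layout: inputs In1, In2 and one output.
# These data are made up for illustration and do not come from the package.
set.seed(1)
learn <- data.frame(In1 = runif(50), In2 = runif(50))
learn$out <- 2.5 * learn$In1 + 0.7 * learn$In2^2

# Hypothetical equation structure, standing in for one found by evolution;
# only the constants consts[1] and consts[2] remain to be fitted.
equation <- function(consts, d) consts[1] * d$In1 + consts[2] * d$In2^2

# Goodness-of-fit measure (RMSE) minimised by every optimizer in the chain.
rmse <- function(consts, d) sqrt(mean((d$out - equation(consts, d))^2))

# Chain of optim() methods; each stage starts from the previous result.
stages <- list(
  list(method = "SANN",        control = list(maxit = 2000)),
  list(method = "Nelder-Mead", control = list(maxit = 5000)),
  list(method = "BFGS",        control = list(maxit = 500))
)

consts <- c(1, 1)                # starting constants
errors <- rmse(consts, learn)    # error before any fitting
for (s in stages) {
  fit <- optim(par = consts, fn = rmse, d = learn,
               method = s$method, control = s$control)
  consts <- fit$par              # pass the result on to the next optimizer
  errors <- c(errors, fit$value)
}

print(round(consts, 3))          # fitted constants
print(errors)                    # RMSE before fitting and after each stage
```

A global, stochastic method (SANN) is placed first and local methods afterwards, so the later stages refine whatever the first stage found. The order used by the script may differ.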
A single step consists of the evolution process with the symbolic regression function, assessment of the population of models, and k-CV testing of the models.

Please find an example of how to run the script in the example.tar.gz archive.

Runs ONLY on Linux, with automated multi-step and reporting/archiving facilities.

##########################################################################
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under certain conditions.
See the LICENSE file for a more detailed description of the terms and conditions, based on the GNU GPLv3 license.
##########################################################################

If you are satisfied with this script and find it useful, please cite our latest work:
Szlęk J, Pacławski A, Lau R, Jachowicz R, Mendyk A. Heuristic modeling of macromolecule release from PLGA microspheres. Int J Nanomedicine. 2013;8:4601-11. doi: 10.2147/IJN.S53364. Epub 2013 Dec 3.
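As a supplementary illustration of the input layout described earlier (k learn-test pairs in tab-delimited TXT files, with the dependent variable in the last column), the base-R sketch below writes and reads such files and computes a per-pair test RMSE. Everything specific in it is an assumption: the file names learn_{no}.txt and test_{no}.txt, the header row with In{no} labels, the synthetic data, and the linear model used as a stand-in for an evolved equation.

```r
# Build k = 3 synthetic learn/test pairs as tab-delimited TXT files.
# File names ("learn_1.txt", ...) are hypothetical, not the package's own.
set.seed(42)
k <- 3
data_dir <- tempfile("kcv_")
dir.create(data_dir)
for (i in seq_len(k)) {
  for (part in c("learn", "test")) {
    n <- if (part == "learn") 40 else 10
    d <- data.frame(In1 = runif(n), In2 = runif(n))
    d$out <- 3 * d$In1 - d$In2 + rnorm(n, sd = 0.05)  # last column = known answer
    write.table(d, file.path(data_dir, sprintf("%s_%d.txt", part, i)),
                sep = "\t", row.names = FALSE, quote = FALSE)
  }
}

# Read one file and split it into inputs (all but the last column) and output.
read_miso <- function(path) {
  d <- read.delim(path)
  list(x = d[, -ncol(d), drop = FALSE], y = d[[ncol(d)]])
}

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Stand-in model: a linear fit replaces the evolved equation in this sketch.
fold_rmse <- vapply(seq_len(k), function(i) {
  learn <- read_miso(file.path(data_dir, sprintf("learn_%d.txt", i)))
  test  <- read_miso(file.path(data_dir, sprintf("test_%d.txt", i)))
  fit   <- lm(y ~ ., data = cbind(learn$x, y = learn$y))
  rmse(test$y, predict(fit, newdata = test$x))
}, numeric(1))

print(fold_rmse)        # test RMSE of each learn-test pair
print(mean(fold_rmse))  # averaged k-CV error
```

Keeping the output in the last column means the same reader works for any number of inputs, which matches the MISO assumption stated in this README.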