From: <nl...@us...> - 2014-07-03 07:52:50
Hi Shamshad,

thank you for the illustration. So currently the tree/network structure is
not saved? I think this is a very important feature. One solution might be
to assign a unique ID to every scaffold. Each molecule can then have a
property "SCAFFOLD_ID" that identifies the associated scaffold. A scaffold
could have an additional property "PARENT_SCAFFOLD_IDS" that contains a
single scaffold ID in the case of a tree and a list of parent scaffold IDs
in the case of a network.

A technical question: does your approach require keeping all molecules and
scaffolds in memory? Avoiding this would be great, as it would allow the CLI
to be used with very large data sets.

Regards,

Nils

On Wednesday 02 July 2014 21:49:49 Shamshad Alam wrote:
> Hi Nils,
>
> Nice suggestions!
>
> I'm currently saving scaffolds and associated molecules in a single SDF
> file. The file structure can be better understood with the help of the
> attached illustration.
>
> Thanks,
> Shamshad
>
> On Wed, Jul 2, 2014 at 1:58 PM, <nl...@us...> wrote:
> > Hi Shamshad,
> >
> > I think it would be good not only to write out the scaffolds, but also
> > the molecules with an additional property that identifies the
> > associated scaffold.
> > How do you plan to save the tree/network structure?
> > An additional parameter --min-ring-size might also be useful.
> >
> > Regards,
> >
> > Nils
> >
> > On Wednesday 25 June 2014 08:37:11 Shamshad Alam wrote:
> > > Hi,
> > >
> > > I am working on the Command Line Interface (CLI), which is a GSoC-2014
> > > project, and I would like to get some feedback on the commands and the
> > > parameters used with them.
> > >
> > > In the first phase, our aim is to implement the commands to generate a
> > > scaffold tree from the molecules in an SDF file or in the database. We
> > > are also letting the user specify the ring size of
> > > the scaffolds which are to be included in the output.
> > > The generated scaffolds are saved in an SDF file that can be used for
> > > further analysis. An SDF file is required to generate a scaffold tree
> > > from a file, and connection data is needed to connect to the database
> > > and generate a scaffold tree from it.
> > >
> > > In the later stages we plan to implement filtering by structure and
> > > substructure, e.g. only certain subtrees matching a structure /
> > > substructure search will be saved in the output.
> > >
> > > These are the proposed commands and their parameters so far:
> > >
> > > (1) generate : This command generates a scaffold tree / network. It
> > > needs a source of molecules, such as an SDF file, and a destination
> > > file to save the generated scaffolds. It is used in combination with
> > > the following parameters -
> > >
> > > (a) To generate a scaffold tree / network by loading molecules from a
> > > file
> > > -n | --network : Use this parameter to generate a scaffold network;
> > > absence of this parameter means you want to generate a scaffold tree
> > > -i | --input-file <file_location> : Specify the location of the input
> > > file to read molecules from
> > > -o | --output-file <file_location> : Specify the file in which the
> > > generated scaffolds will be saved
> > > -m | --max-ring-size <number> : Specify the maximum ring size of
> > > scaffolds that should be included in the output
> > >
> > > (b) To generate a scaffold tree / network by loading molecules from
> > > the Scaffold Hunter database
> > > -c | --connection-name <connection-name> : Name of the connection used
> > > to connect to the database and retrieve molecules
> > > -d | --dataset <dataset_name> : Specify the name of the dataset to
> > > retrieve molecules from and generate the scaffold tree
> > > -o | --output-file <file_location> : Specify the file in which the
> > > generated scaffolds will be saved
> > > -m | --max-ring-size <number> : Specify the maximum ring size of
> > > scaffolds that should be included in the output
> > > -n | --network : Use this parameter to generate a scaffold network;
> > > absence of this parameter means you want to generate a scaffold tree
> > >
> > > These are some examples of the 'generate' command:
> > >
> > > (a) Read molecules from file
> > >
> > > sh generate -i <input-file>
> > > Read molecules from the input file and generate a scaffold tree. The
> > > generated tree is saved in a file automatically.
> > >
> > > sh generate -i <input-file> -o scaffold.sdf
> > > Read molecules from the input file and generate a scaffold tree. The
> > > generated tree is saved in scaffold.sdf.
> > >
> > > sh generate -i <input-file> -o scaffold-2.sdf -n
> > > Read molecules from the input file and generate a scaffold 'network'.
> > > The generated network is saved in scaffold-2.sdf.
> > >
> > > sh generate -i <input-file> -o scaffold-3.sdf -m 5
> > > Read molecules from the input file and generate a scaffold tree. The
> > > generated tree is saved in scaffold-3.sdf. Scaffolds exceeding the
> > > maximum ring size of 5 are not saved.
> > >
> > > (b) Read molecules from database
> > >
> > > sh generate -c <connection-name>
> > > Read molecules from the database whose connection data is identified
> > > by <connection-name> and generate a scaffold tree. The generated tree
> > > is saved in a file automatically.
> > >
> > > sh generate -c <connection-name> -o scaffold.sdf
> > > Read molecules from the database whose connection data is identified
> > > by <connection-name> and generate a scaffold tree. The generated tree
> > > is saved in scaffold.sdf.
> > >
> > > Similarly, you can use -n to generate a network and -m <number> to
> > > limit the ring size, as in scaffold tree generation from a file.
> > >
> > > (2) connection [list | save | delete] : This command manages the
> > > connection data required to connect to the database. It is followed by
> > > a list, save or delete action. These are the parameters supported by
> > > the command:
> > >
> > > -c | --connection-name <name> : Name of the connection
> > > -t | --database-type : Type of database (MySQL, HSQLDB)
> > > -u | --url : URL of the MySQL database server or file location of the
> > > HSQLDB database
> > > -n | --database-name : Name of the database
> > > -un | --user-name : User name to log in with
> > > -p | --password : Password of the database (you can omit this
> > > parameter and specify it at runtime when the connection to the
> > > database is made)
> > >
> > > Here are some uses of the connection command -
> > >
> > > sh connection list
> > > Display the names of the available connections on screen
> > >
> > > sh connection list -c <name>
> > > Display the details of the connection data identified by name,
> > > excluding the password
> > >
> > > sh connection delete -c <name>
> > > Delete the connection data identified by name
> > >
> > > connection save -c <name> -t <hsqldb | mysql> -u <url> -n <database-name> -un <user-name>
> > > Save new connection data under the specified name, which can be used
> > > later to connect to the database
> > >
> > > Feedback -
> > >
> > > We are seeking your feedback on the implemented commands and proposed
> > > parameters to make the command line interface more useful. So please
> > > give your input on the few questions given here:
> > > 1. Which features of Scaffold Hunter do you want to use in the command
> > > line interface?
> > > 2. Do you find the parameters provided for scaffold tree / network
> > > generation user-friendly?
> > > 3. Do you need more parameters to control the generation strategy?
> > > 4. Do you think some of the proposed parameters are redundant? If so,
> > > please specify the names of those parameters.
> > >
> > > You may also send any other suggestions regarding the command line
> > > interface.
> > >
> > > Thanks,
> > > Shamshad
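
For illustration, here is a minimal sketch of the "SCAFFOLD_ID" / "PARENT_SCAFFOLD_IDS" idea from the reply at the top of this thread. It is written in Python and is not taken from the Scaffold Hunter code. It rests on several assumptions that the thread does not settle: scaffold records store their own ID in SCAFFOLD_ID, parent IDs are comma-separated, a root scaffold has an empty PARENT_SCAFFOLD_IDS value, and a record without PARENT_SCAFFOLD_IDS is a molecule. All function names and the placeholder structure block are hypothetical. Records are written and read one at a time, which is one way to avoid keeping the whole data set in memory.

```python
import io
from typing import Dict, Iterable, Iterator, Optional, Sequence, TextIO

# Hypothetical placeholder: an empty V2000 structure block. In a real run
# the structure block would come from the chemistry toolkit.
PLACEHOLDER_MOLBLOCK = (
    "placeholder\n"
    "  sketch\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END"
)


def sdf_record(molblock: str, properties: Dict[str, str]) -> str:
    """Format one SDF record: structure block, data items, '$$$$' terminator."""
    lines = [molblock.rstrip("\n")]
    for name, value in properties.items():
        lines.append("> <%s>" % name)
        lines.append(value)
        lines.append("")  # a blank line ends each data item
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def scaffold_record(molblock: str, scaffold_id: int,
                    parent_ids: Sequence[int]) -> str:
    """Scaffold record: one parent ID for a tree, several for a network."""
    return sdf_record(molblock, {
        "SCAFFOLD_ID": str(scaffold_id),
        # Assumption: comma-separated list, empty for a root scaffold.
        "PARENT_SCAFFOLD_IDS": ",".join(str(p) for p in parent_ids),
    })


def molecule_record(molblock: str, scaffold_id: int) -> str:
    """Molecule record that points at its associated scaffold."""
    return sdf_record(molblock, {"SCAFFOLD_ID": str(scaffold_id)})


def write_records(records: Iterable[str], out: TextIO) -> int:
    """Write records as they arrive; 'records' may be a lazy generator."""
    count = 0
    for record in records:
        out.write(record)
        count += 1
    return count


def iter_properties(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield the data items of each record, reading one line at a time."""
    props: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in stream:
        line = raw.rstrip("\n")
        if line == "$$$$":
            yield props
            props, name = {}, None
        elif line.startswith("> <") and line.endswith(">"):
            name = line[3:-1]
            props[name] = ""
        elif name is not None:
            if line == "":
                name = None  # blank line closes the current data item
            elif props[name]:
                props[name] += "\n" + line
            else:
                props[name] = line


if __name__ == "__main__":
    buffer = io.StringIO()
    write_records(
        [
            scaffold_record(PLACEHOLDER_MOLBLOCK, 1, []),      # root scaffold
            scaffold_record(PLACEHOLDER_MOLBLOCK, 2, [1]),     # child of 1
            molecule_record(PLACEHOLDER_MOLBLOCK, 2),          # molecule on 2
        ],
        buffer,
    )
    buffer.seek(0)
    for item in iter_properties(buffer):
        print(item)
```

With this layout a reader can rebuild the parent links from the properties alone, without parsing any structure block. Whether a comma-separated list is the right encoding for network parents is an open design choice.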