From: Till S. <til...@tu...> - 2014-07-02 15:57:00
Hi,

by "defined properties" I mean:
- sdf/csv: all properties that occur in any of the molecules. For csv, there is usually a header row that gives you the names. For the sdf file format you will need to iterate over the complete sdf and collect the individual properties of each molecule. They are identified by a string name.
- internal dataset: there is a list of propertyDefinitions for each dataset. These are the properties that are present in an imported dataset.
-> for import/export it will be important to select the properties one wants to import/export together with the molecule / scaffold structure.

Regards,
Till

On Wednesday, 2 July 2014, 07:14:28, Shamshad Alam wrote:
> Hi Till,
>
> Thanks! Good to hear from you. All points are comprehensible except the term "defined properties" in the second point. Does it refer to molecular properties?
>
> >> show the *defined properties* and some statistics (e.g. size) about an sdf, csv file, without actually doing anything with the file
>
> Thanks,
> Shamshad
>
> On Tue, Jul 1, 2014 at 6:09 PM, Till Schäfer <til...@tu...> wrote:
> > Hi,
> > here are some wishes that came to my mind:
> > - import/merge sdf and csv into the Scaffold Hunter database over the command line
> > - show the defined properties and some statistics (e.g. size) about an sdf, csv file, without actually doing anything with the file (this can be useful for later import / merging)
> > - export an internal database over the command line
> > - show the defined properties and some statistics (e.g. size) about an internal dataset / subset, without actually doing anything with the file (this can be useful for later import / merging)
> > - add a subset switch for each operation that uses data from the Scaffold Hunter database
> > - clustering -> as for scaffold network / tree generation, but with clustering
> > - generate a subset based on a filter, split a subset by scaffold tree structure, random subset generation
> >   -> this should be possible in a way that lets us store the subset either in the database or directly in a file.
> >   -> it should be possible to read an sdf/csv, filter it and store it directly to a file (without any sh database operation involved)
> > - apply a calc plugin to data
> >   -> pathways: read from file / database -> generate property -> store to file / database
> > - the tree generation code should have an option to specify the rules that are used to generate a scaffold tree
> > - delete an internal database / subset
> > - ... :-)
> >
> > Regards,
> > Till
> >
> > On Wednesday, 25 June 2014, 08:37:11, Shamshad Alam wrote:
> > > Hi,
> > >
> > > I am working on the Command Line Interface (CLI), which is a GSoC-2014 project, and I would like to get some feedback on the commands and the parameters used with them.
> > >
> > > In the first phase, our aim is to implement the commands to generate a scaffold tree from the molecules in an SDF file or in the database. We also let the user specify the ring size of the scaffolds that are to be included in the output. The generated scaffolds are saved in an SDF file that can be used for further analysis. Generating a scaffold tree from a file requires an SDF file; generating it from the database requires connection data to connect to the database.
> > >
> > > In later stages we plan to implement filtering by structure and substructure, e.g. only certain subtrees selected by a structure / substructure search will be saved in the output.
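Till's explanation at the top of the thread (take the property names from the csv header row, or iterate over every molecule of an sdf and collect its property names) can be sketched as follows. This is a minimal Python sketch, not Scaffold Hunter code: the function names `sdf_properties`, `csv_properties` and `describe` are hypothetical, the sdf parsing assumes the usual `> <NAME>` data-item header lines after each record's `M  END` line, and the "size" statistic is assumed to be simply the number of records.

```python
import csv
import re
from pathlib import Path

# Assumed sdf data-item header: "> <NAME>", possibly with extra text such as ">  <NAME> (1)"
_SDF_FIELD = re.compile(r"^>.*?<([^>]+)>")


def sdf_properties(path):
    """Return (property_names, molecule_count) for an sdf file.

    The whole file is scanned because each molecule may define its own
    set of properties; names are kept in first-seen order.
    """
    names, seen = [], set()
    count = 0
    in_data = False      # True after the molfile block ("M  END") of the current record
    has_content = False  # current record contains at least one non-blank line
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            text = line.rstrip("\r\n")
            if text.startswith("$$$$"):  # record separator
                count += 1
                in_data = False
                has_content = False
                continue
            if text.strip():
                has_content = True
            if text.startswith("M  END"):
                in_data = True
            elif in_data:
                match = _SDF_FIELD.match(text)
                if match and match.group(1) not in seen:
                    seen.add(match.group(1))
                    names.append(match.group(1))
    if has_content:  # last record without a trailing "$$$$"
        count += 1
    return names, count


def csv_properties(path, delimiter=","):
    """Return (column_names, row_count); the header row provides the names."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        header = next(reader, [])
        rows = sum(1 for row in reader if row)  # skip blank lines
    return header, rows


def describe(path):
    """Print the defined properties and the record count without importing anything."""
    suffix = Path(path).suffix.lower()
    if suffix in (".sdf", ".sd"):
        names, size = sdf_properties(path)
    elif suffix == ".csv":
        names, size = csv_properties(path)
    else:
        raise ValueError(f"unsupported file type: {path}")
    print(f"{path}: {size} records")
    for name in names:
        print(f"  {name}")
    return names, size
```

For example, `describe("molecules.sdf")` (a hypothetical file) would print the record count followed by one property name per line; the returned list could then be used to choose which properties to import or export.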
> > >
> > > These are the proposed commands and their parameters so far:
> > >
> > > (1) generate: This command generates a scaffold tree / network. It needs a source of molecules (e.g. an SDF file) and a destination file to save the generated scaffolds in. It is used in combination with the following parameters:
> > >
> > > (a) To generate a scaffold tree / network by loading molecules from a file
> > > -n | --network : Use this parameter to generate a scaffold network; without it, a scaffold tree is generated
> > > -i | --input-file <file_location> : Specify the location of the input file to read molecules from
> > > -o | --output-file <file_location> : Specify the file in which the generated scaffolds are saved
> > > -m | --max-ring-size <number> : Specify the maximum ring size of the scaffolds that should be included in the output
> > >
> > > (b) To generate a scaffold tree / network by loading molecules from the Scaffold Hunter database
> > > -c | --connection-name <connection-name> : Name of the connection used to connect to the database and retrieve molecules
> > > -d | --dataset <dataset_name> : Specify the name of the dataset to retrieve molecules from to generate the scaffold tree
> > > -o | --output-file <file_location> : Specify the file in which the generated scaffolds are saved
> > > -m | --max-ring-size <number> : Specify the maximum ring size of the scaffolds that should be included in the output
> > > -n | --network : Use this parameter to generate a scaffold network; without it, a scaffold tree is generated
> > >
> > > These are some examples of the 'generate' command:
> > >
> > > (a) Read molecules from a file
> > >
> > > sh generate -i <input-file>
> > > Read molecules from the input file and generate a scaffold tree. The generated tree is saved in a file automatically.
> > >
> > > sh generate -i <input-file> -o scaffold.sdf
> > > Read molecules from the input file and generate a scaffold tree. The generated tree is saved in scaffold.sdf.
> > >
> > > sh generate -i <input-file> -o scaffold-2.sdf -n
> > > Read molecules from the input file and generate a scaffold 'network'. The generated network is saved in scaffold-2.sdf.
> > >
> > > sh generate -i <input-file> -o scaffold-3.sdf -m 5
> > > Read molecules from the input file and generate a scaffold tree. The generated tree is saved in scaffold-3.sdf. Scaffolds with more than 5 rings are not saved.
> > >
> > > (b) Read molecules from the database
> > >
> > > sh generate -c <connection-name>
> > > Read molecules from the database whose connection data is referenced by <connection-name> and generate a scaffold tree. The generated tree is saved in a file automatically.
> > >
> > > sh generate -c <connection-name> -o scaffold.sdf
> > > Read molecules from the database whose connection data is referenced by <connection-name> and generate a scaffold tree. The generated tree is saved in scaffold.sdf.
> > >
> > > Similarly, you can use -n to generate a network and -m <number> to limit the ring size, as in scaffold tree generation from a file.
> > >
> > > (2) connection [list | save | delete]: This command manages the connection data required to connect to the database. It is followed by a list, save or delete action.
> > > These are the parameters supported by the command:
> > >
> > > -c | --connection-name <name> : Name of the connection
> > > -t | --database-type : Type of database (MySQL, HSQLDB)
> > > -u | --url : URL of the MySQL database server, or file location of the HSQLDB database
> > > -n | --database-name : Name of the database
> > > -un | --user-name : User name to log in with
> > > -p | --password : Password of the database (you can omit this parameter and enter the password at runtime when the connection to the database is made)
> > >
> > > Here are some uses of the connection command:
> > >
> > > sh connection list
> > > Display the names of the available connections on screen.
> > >
> > > sh connection list -c <name>
> > > Display the details of the connection data referenced by <name>, excluding the password.
> > >
> > > sh connection delete -c <name>
> > > Delete the connection data referenced by <name>.
> > >
> > > sh connection save -c <name> -t <hsqldb | mysql> -u <url> -n <database-name> -un <user-name>
> > > Save new connection data under the specified name, which can be used later to connect to the database.
> > >
> > > Feedback
> > >
> > > We are seeking your feedback on the implemented commands and the proposed parameters to make the command line interface more useful. So, please answer a few questions:
> > > 1. Which features of Scaffold Hunter do you want to use in the command line interface?
> > > 2. Do you find the parameters provided for scaffold tree / network generation user friendly?
> > > 3. Do you need more parameters to control the generation strategy?
> > > 4. Do you think some of the proposed parameters are redundant? If so, please specify which ones.
> > >
> > > Please also share any other suggestions regarding the command line interface.
> > >
> > > Thanks,
> > > Shamshad
> >
> > --
> > Dipl.-Inf.
> > Till Schäfer
> > TU Dortmund University
> > Chair 11 - Algorithm Engineering
> > Otto-Hahn-Str. 14 / Room 237
> > 44227 Dortmund, Germany
> >
> > e-mail: til...@cs...
> > phone: +49(231)755-7706
> > fax: +49(231)755-7740
> > web: http://ls11-www.cs.uni-dortmund.de/staff/schaefer
> > pgp: https://keyserver2.pgp.com/vkd/SubmitSearch.event?&&SearchCriteria=0xD84DED79
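To make the option layout of the proposed `generate` and `connection` commands concrete, here is a hypothetical Python `argparse` sketch. It is not how the Scaffold Hunter CLI is implemented; the helper names `build_parser` and `parse`, the rule that `-i` and `-c` are mutually exclusive, and the checks that `-d` needs `-c` and that `save`/`delete` need `-c` are assumptions read off the parameter sets (a) and (b) and the examples in the quoted proposal.

```python
import argparse


def build_parser():
    """Hypothetical argparse layout of the CLI proposed in this thread.

    Option names follow the proposal; this only illustrates the option
    structure and is not the actual Scaffold Hunter implementation.
    """
    parser = argparse.ArgumentParser(prog="sh")
    commands = parser.add_subparsers(dest="command", required=True)

    # (1) generate: molecules come either from a file (-i) or from the database (-c)
    gen = commands.add_parser("generate", help="generate a scaffold tree / network")
    source = gen.add_mutually_exclusive_group(required=True)
    source.add_argument("-i", "--input-file", metavar="FILE_LOCATION")
    source.add_argument("-c", "--connection-name", metavar="CONNECTION_NAME")
    gen.add_argument("-d", "--dataset", metavar="DATASET_NAME")
    gen.add_argument("-o", "--output-file", metavar="FILE_LOCATION")
    gen.add_argument("-m", "--max-ring-size", type=int, metavar="NUMBER")
    # absence of -n means "generate a scaffold tree"
    gen.add_argument("-n", "--network", action="store_true")

    # (2) connection [list | save | delete]
    con = commands.add_parser("connection", help="manage database connection data")
    con.add_argument("action", choices=["list", "save", "delete"])
    con.add_argument("-c", "--connection-name", metavar="NAME")
    # type is applied before the choices check, so "HSQLDB" is accepted as "hsqldb"
    con.add_argument("-t", "--database-type", type=str.lower, choices=["mysql", "hsqldb"])
    con.add_argument("-u", "--url")
    con.add_argument("-n", "--database-name")
    con.add_argument("-un", "--user-name")
    # optional: if omitted, the password would be asked for when connecting
    con.add_argument("-p", "--password")
    return parser


def parse(argv):
    """Parse argv and apply the (assumed) cross-option rules; exits on error."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # assumption: a dataset only makes sense when reading from the database
    if args.command == "generate" and args.dataset and not args.connection_name:
        parser.error("-d/--dataset requires -c/--connection-name")
    # assumption: save and delete act on one named connection
    if args.command == "connection" and args.action in ("save", "delete") and not args.connection_name:
        parser.error(f"'connection {args.action}' requires -c/--connection-name")
    return args
```

With this layout, `parse(["generate", "-i", "molecules.sdf", "-n"])` mirrors `sh generate -i <input-file> -n`, and `sh generate -i a.sdf -c local` is rejected by the mutually exclusive group without extra code. One detail relevant to question 4: `argparse` accepts `-un` as an exact option match, but in POSIX getopt-style parsers `-un` would be read as `-u` with the argument `n`, so a single-letter short option for the user name may be less surprising.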