Name | Modified | Size | Downloads / Week |
---|---|---|---|
xmlToDatabase_source.zip | 2015-12-09 | 49.3 kB | |
ctti_mysqlscript.txt | 2015-12-07 | 13.2 kB | |
cttiparser.zip | 2015-12-07 | 10.3 MB | |
README.txt | 2015-11-13 | 2.7 kB | |
ClinicalXmlToDatabase-0.zip | 2015-11-13 | 10.5 MB | |
clinicalparser.jar | 2015-11-13 | 54.8 kB | |
Totals: 6 Items | | 21.0 MB | 0 |

Data Source:
This parser is designed to extract data and metadata from XML files downloaded from the following source:
https://www.clinicaltrials.gov/ct2/resources/download

Example Data:
In its current form, the parser properly handles most NCT XML records. A handful of examples are provided in the subfolder xml within clinical.zip. An investigation is underway to identify XML records that produce SQL exceptions during parsing. These records will be stored in the subfolder xml_problems.

Configuration:
The file ClinTrialXmlToDatabase.ini must be modified to reflect the name and configuration of the database, as well as the location of the files to be processed.

Source code:
All source files for non-generic code are provided in ClinicalXmlToDatabase-0.zip.
Compiled generic libraries are available as subfolders within the above zip file.

Jar file:
The all-inclusive executable is ./executations/clinicalparser.jar

Issues:
The key refinement required at this point is to identify strategies by which the parser can reliably handle all XML files for which current processing produces SQL exceptions. This can be done, but it typically requires making key decisions regarding the SQL schema (in particular, table and column naming conventions). Key decisions must be made for the following issues:

- The default table and column naming convention produces unique names by ensuring that each table or column name reflects the full hierarchical position of the corresponding element in the XML. While this produces a fairly unambiguous naming convention, it frequently yields names that exceed the 64-character limit imposed by MySQL. In the current implementation, object names are truncated to the fields surrounding the three right-most underscore ("_") characters, that is, the last four underscore-separated fields.
  For example, the table name:
      clinical_study_milestone_participants_list_participants
  becomes:
      milestone_participants_list_participants
  While this generally brings object names within the 64-character limit, it is unknown whether this convention is compatible with existing schemas for clinical trials data storage.

- Another issue is that some XML records contain multiple tags whose labels are identical (e.g., multiple <group> tags) but whose content differs. The parser will need to synchronize with existing schema specifications in order to find a consistent protocol for column specification that does not violate the SQL prohibition on multiple columns with the same name in the same table.
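
The two naming issues above can be illustrated with a minimal Java sketch. It is not taken from the parser's source: the class name SchemaNames and the methods truncate and uniquify are hypothetical. truncate follows the truncation rule as described above (keep the fields around the three right-most underscores). uniquify shows only one possible way to avoid duplicate column names, by appending a numeric suffix; whether that suits an existing clinical trials schema is exactly the open question raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; names and structure are assumptions for illustration.
final class SchemaNames {
    // Number of right-most underscores retained, per the description above.
    static final int KEPT_UNDERSCORES = 3;

    // Keep only the last (KEPT_UNDERSCORES + 1) underscore-separated fields.
    static String truncate(String fullName) {
        String[] fields = fullName.split("_", -1); // -1 keeps trailing empty fields
        int keep = KEPT_UNDERSCORES + 1;
        if (fields.length <= keep) {
            return fullName; // already short enough in terms of fields
        }
        return String.join("_",
                Arrays.copyOfRange(fields, fields.length - keep, fields.length));
    }

    // Assumed strategy: repeated names receive a numeric suffix (_2, _3, ...),
    // skipping any suffixed name that is already taken.
    static List<String> uniquify(List<String> names) {
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int n = 1;
            while (!used.add(candidate)) { // add() returns false if already present
                n++;
                candidate = name + "_" + n;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
                truncate("clinical_study_milestone_participants_list_participants"));
        System.out.println(uniquify(Arrays.asList("group", "title", "group")));
    }
}
```

Note that truncate does not itself guarantee a result within MySQL's 64-character limit, since four long fields can still exceed it, and truncated names from different branches of the XML hierarchy may collide with each other.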