Name Modified Size Downloads / Week Status
Totals: 6 Items   3.9 kB 1
version_2.4 2015-01-04 88 weekly downloads
upgrade_to_version_2.3 2014-11-24 55 weekly downloads
version_2.2 2014-10-27 77 weekly downloads
version_2.1 2014-06-04 55 weekly downloads
version_2 2014-02-06 11 weekly downloads
readme.txt 2015-01-04 3.9 kB 11 weekly downloads
Latest changes (version 2.4, 2015-01-04) - Added PostgreSQL support - Split the generic transformations of links and link-satellites to 1 generic transformation for each number of hubs (1 to 10). This prevents the creation of many database connections that are not used. - End of changes version 2.4 Changes (version 2.3, 2014-11-19): - adds optional mailing after error and/or success of the main jobs. mailing properties like smtp server, port etc. is specified with Kettle properties. - adds a 'management' job that collects database table sizes. - fixes bug that a run did not end with status 3 (error) after it failed. Directory upgrade_to_version_2.3 contains the way to upgrade from version 2.2 to 2.3. - End of changes version 2.3 Changes (version 2.2, 2014-10-27): - subruntype functionality (only process a subset of the staging and/or Data Vault tables) - reference tables with all columns of the satellites specified separately (no change to the Excel, just for easy impact analysis) - change detection of satellite columns by a HASH function, SHA1 is the default, MD5 also possible. The function is specified in the properties file. - Upgraded to Pentaho Data Integration 5.2 CE, versions 5.1 and 5.0.1 also work with the provided jobs and transformations. - Updated the documentation PDF - End of changes version 2.2 Description of the project Based on a designed Data Vault (hub-,link- and satellite tables are present) and an Excel sheet with the mappings, no Data Vault ETL development is needed for adding hubs, links etc. Kasper de Graaf played a big part in the specifications for the tool set, him being a Data Vault expert, me being an ETL designer/developer. The Virtual Machine (VMWare) is a 64 bit Ubuntu 14.04 Server with 2 RDBMS flavours: Percona Server, a MySQL replacement with an improved InnoDB storage engine. PostgreSQL 9.4. The latest version is available in directory version_2.4, which includes the most recent VMWare Virtual Machine. User: percona Password: percona mysql root/percona NB: entries to add/modify in my.cnf max_connections = 2048 table_definition_cache = 1200 PostgreSQL user/password (yes, the PostgreSQL passwords are also percona, so all passwords are :): postgres/percona sakila/percona sakila_data_vault/percona staging/percona pdi_meta/percona NB: changed in postgresql.conf max_connections=250 (was 100 by default) Starting Pentaho Data Integration (Kettle) Run the launcher at the Desktop (run in terminal): PDI_kff_launcher.sh Choose which RDBMS you want to use: M=MySQL, P=PostgreSQL Based on your choice the applicable configuration files are copied to /home/percona/.kettle and the applicable Excel sheet is copied to the right location: /media/percona/disk2/pdi_dv_framework/dwh_datavault_mappings_1.xls Select the file based repository (appears as default): pdi_file_repository_dv_demo_kff The job that 'does it all' (metadata + all Data Vault objects): job_data_vault_all_incl_md Running a complete batch including staging and the Data Vault (in directory /media/percona/disk2/pdi_dv_framework): percona@ubuntu:~$ ./run_job_complete_batch_data_warehouse.sh The logs of that batch run are in /media/percona/disk2/pdi_dv_framework/logs ----Attention---- After editing the metadata Excel sheet, be sure to refresh the column 'source_concat' in the sheet 'source_tables'. For some reason this colum is sometimes seen as 'null' by Kettle, destroying the joins in the metadata queries to obtain the 'record_source_id'. If you refresh this column by copying the value in the first row to all others, you'll be fine. ----Attention number 2---- If you discover errors/bugs in my code, please inform me at eacweber@gmail.com, so I can use your collective brains to improve it. Greetings, Edwin Weber, owner of the one man army Weber Solutions.
Source: readme.txt, updated 2015-01-04