Latest version: pdi_dv_framework_vmware_virtual_machine_24.7z (3.4 GB)
Files in folder version_2:

Name                                                    Modified     Size
dwh_datavault_mappings_1.xls                            2014-02-06   69.6 kB
docu_generated_by_cookbook.zip                          2014-02-01   4.5 MB
pdi_dv_framework_code.zip                               2014-02-01   6.8 MB
pdi_dv_framework_databases.sql.7z                       2014-02-01   1.4 MB
pdi_data_vault_framework_docu.pdf                       2014-02-01   712.2 kB
readme.txt                                              2014-02-01   2.2 kB
presentation_pdi_data_vault_framework_meetup2012.pdf    2012-10-01   4.1 MB

Totals: 7 items, 17.6 MB
Description of the project

The presentation is the one I gave at the Pentaho Benedutch event in 2011.
It describes the functionality of the 'Pentaho Data Integration Datavault Framework' I developed.
At the moment, the MySQL version includes the latest developments.
The documentation PDF is very recent and complete, because I had to hand everything over to new colleagues.

Based on a designed Data Vault (hub, link and satellite tables are present) and an Excel sheet with the mappings, no Data Vault ETL development is needed for adding hubs, links, etc.
Kasper de Graaf played a big part in the specifications for the tool set: he is a Data Vault expert, and I am an ETL designer/developer.
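
For readers new to Data Vault modelling, here is a minimal sketch of the three table types involved; the table and column names are illustrative only and do not reflect the framework's actual naming conventions or generated DDL:

    -- Hub: one row per business key
    CREATE TABLE hub_customer (
      customer_hub_id   INT PRIMARY KEY,
      customer_number   VARCHAR(20),      -- business key
      load_dts          DATETIME,
      record_source_id  INT
    );

    -- Link: records a relationship between two (or more) hubs
    CREATE TABLE link_customer_order (
      customer_order_link_id  INT PRIMARY KEY,
      customer_hub_id         INT,
      order_hub_id            INT,
      load_dts                DATETIME,
      record_source_id        INT
    );

    -- Satellite: descriptive attributes and their history, attached to a hub or link
    CREATE TABLE sat_customer_details (
      customer_hub_id   INT,
      load_dts          DATETIME,
      name              VARCHAR(100),
      city              VARCHAR(100),
      record_source_id  INT,
      PRIMARY KEY (customer_hub_id, load_dts)
    );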

The virtual machine (VMware) is a 64-bit Ubuntu 12.04 Server with Percona Server as the database, a MySQL replacement with an improved InnoDB storage engine.
The code is the latest and greatest, including link-(group) validity satellite functionality and generic staging of tables and files.
PDI version: Pentaho Data Integration 5.0.1 CE

User:     percona
Password: percona

MySQL:    root / percona
NB: entries to add/modify in my.cnf
max_connections = 2048
table_definition_cache = 1200
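
A minimal sketch of where those settings go and how to apply them (the my.cnf path and the service name are assumptions for a standard Ubuntu 12.04 / Percona Server install; verify on the VM):

    # /etc/mysql/my.cnf -- add the entries under the [mysqld] section
    [mysqld]
    max_connections        = 2048
    table_definition_cache = 1200

    # restart the database and verify the new value
    sudo service mysql restart
    mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections'"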

Starting Kettle

Run the launcher on the Desktop:                                 PDI_kff_launcher.sh
Select the file-based repository (appears as the default):       pdi_file_repository_dv_demo_kff
The job that 'does it all' (metadata + all Data Vault objects):  job_data_vault_all_incl_md

Running a complete batch including staging and the Data Vault:
percona@ubuntu:~$ nohup ./run_job_complete_batch_data_warehouse.sh &
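
The wrapper script keeps the batch running after you log out. To follow its progress, or to start the job directly with PDI's command-line runner Kitchen, something like the sketch below should work (the data-integration path and the repository registration in repositories.xml are assumptions to check on the VM):

    # follow the output of the wrapper script
    tail -f nohup.out

    # alternative: run the job directly with Kitchen
    cd ~/data-integration                       # adjust to the actual PDI install path
    nohup ./kitchen.sh -rep=pdi_file_repository_dv_demo_kff \
                       -job=job_data_vault_all_incl_md \
                       -level=Basic > kitchen.log 2>&1 &
    tail -f kitchen.log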


----Attention----

After editing the metadata Excel sheet, be sure to refresh the column 'source_concat' in the sheet 'source_tables'.
For some reason this column is sometimes read as 'null' by Kettle, which breaks the joins in the metadata queries used to obtain the 'record_source_id'.
If you refresh the column by copying the value in the first row down to all other rows, you'll be fine.
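
A quick way to spot the problem after the metadata has been loaded is a NULL check like the one below; the table name is a guess based on the sheet name above, so adjust it to the actual metadata table the framework creates:

    -- rows where source_concat is NULL will break the record_source_id join
    SELECT *
    FROM   source_tables
    WHERE  source_concat IS NULL;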

----Attention number 2----
If you discover errors/bugs in my code, please inform me at eacweber@gmail.com, so I can use your collective brains to improve it.


Greetings,

Edwin Weber, owner of the one-man army Weber Solutions.




