[Ground-user] Refreshing Materialized Views Performance
Status: Beta
Brought to you by:
calumfodder
From: Kelvin W. <kel...@go...> - 2011-10-11 12:09:48
|
Hi I've been using ground report a fair bit and I've noticed a performance bottleneck in uploading report data. Specifically, after data from a load run has been put into the database each database 'view' is refreshed using the postgres function refresh_matview. This is an I/O heavy function and running each request serially I see the following times to complete: SELECT refresh_matview('element_summary_http_mv'); 31s ... element_buckets_http_mv 269s ! element_percentile_http_mv 98s total_element_per_second_stats_http_mv 57s concurrent_users_byrun_http_mv 54s thread_stop_buckets_http_mv 49s individual_element_per_second_stats_http_mv 43s element_distribution_http_mv 43s ... element_totalbandwidth_http_mv 2s page_distribution_http_mv 6s page_summary_http_mv 6s element_max_time_http_mv 2s element_min_time_http_mv 2s static_element_percentile_http_mv 7s dynamic_element_percentile_http_mv 7s page_percentile_http_mv < 1s page_buckets_http_mv 3s page_stderror_http_mv 3s element_stderror_http_mv < 1s thread_start_buckets_http_mv 1s total_page_per_second_stats_http_mv 1s individual_page_per_second_stats_http_mv 1s static_element_per_second_stats_http_mv 1s dynamic_element_per_second_stats_http_mv 1s element_95confidence_http_mv 1s mean_load_throughput_stats_http_mv 1s concurrent_users_vs_tps_http_mv 1s concurrent_users_vs_pps_http_mv 1s concurrent_users_vs_statictps_http_mv 1s concurrent_users_vs_dynamictps_http_mv 1s concurrent_users_vs_totalrtime_http_mv 1s concurrent_users_vs_pagertime_http_mv 1s concurrent_users_vs_staticrtime_http_mv 1s concurrent_users_vs_dynamicrtime_http_mv 1s Right now I have about 80 load runs in the database. I know the I/O performance of my hardware is not great, but even with 80 load runs there's about 5.8 million rows in the slowest view 'element_buckets_http_mv'. I'm using postgres9 which has automatic vacuuming of the database and CPU or memory is not hit hard when running refresh_matview(), it's definitely a disk bottleneck. As ground report seems to do 10 concurrent calls to refresh_matview() the disk is being heavily thrashed and running in parallel seems to be slower that serial in this case - it's taking 19 minutes to complete refreshing all the views. Trying to understand the ground report python code and postgres, which I'm new to, it seems that refresh_matview deletes all rows in a table like 'element_buckets_http_mv' before copying in new data to it. I'm wondering if that can be improved? Cheers Kelvin. |