From: Grant T. <gt...@sw...> - 2002-06-06 20:45:00
>>>>> Erik Arjan Hendriks <er...@he...> writes:

>> - Removing seemingly unneeded icache flushes for non-executable pages
>>   in load_map makes no difference, time-wise, on my platform. This

> That's good. I don't find it surprising though since most of the
> addresses involved in the flush aren't going to be in the icache

OK, I lied. Evidently that experiment was done with the wrong kernel or
something. In fact it's about 5X faster, as cache flushes are fairly
expensive on my (SMP) platform. Profile below.

So you definitely want to stick an if around the flush_icache_range call
for the benefit of those of us with expensive IPI cache nastiness:

    if (mmap_prot & PROT_EXEC) {
        flush_icache_range(page.start, page.start + PAGE_SIZE);
    }

I also observe in my profile that the read's memcpy ("both_aligned" in
my profile) and the fault's memzero ("sb1_clear_page") each take nearly
half the time. I'm going to bodge together some way to skip the page
clear for this case; we already know we're overwriting the whole page,
so this should be OK as long as it gets carefully zeroed on error.
It'll be pretty ugly, though, so I suspect you won't actually want it.

With needless flushes:

   218 smp_call_function                0.5240
    34 both_aligned                     0.3148
    24 local_sb1_flush_icache_range     0.1200
    10 sb1_flush_icache_range_ipi       0.2500
     8 sb1_clear_page                   0.2000
     5 cleanup_both_aligned             0.0781
     4 kunmap_high                      0.0135
     4 do_anonymous_page                0.0069
     3 kmap_high                        0.0056
     2 zap_page_range                   0.0017
     2 sb1_copy_page                    0.0167
     2 pte_alloc                        0.0047
     2 do_shmem_file_read               0.0054

Without needless flushes:

    34 both_aligned                     0.3148
    12 sb1_clear_page                   0.3000
     7 cleanup_both_aligned             0.1094
     4 sb1_sanitize_tlb                 0.0312
     2 kmap_high                        0.0037
     2 __free_pages_ok                  0.0017
     1 zap_page_range                   0.0009
     1 smp_call_function                0.0024
     1 shmem_getpage_locked             0.0007
     1 shmem_getpage                    0.0025
     1 shmem_file_read                  0.0078
     1 sb1_copy_page                    0.0083
     1 sb1___flush_cache_all_ipi        0.0078

-- 
Grant Taylor - x285 - http://pasta/~gtaylor/
Starent Networks - +1.978.851.1185
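
P.S. For anyone who wants to play with the idea outside the kernel, here is a minimal user-space sketch of the PROT_EXEC guard. It is not the real load_map code: struct sketch_page, SKETCH_PAGE_SIZE, and the sketch_* functions are all hypothetical stand-ins, and the fake flush only counts calls so you can see how many flushes the check saves.

```c
#include <stddef.h>
#include <sys/mman.h>   /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Assumed page size for the sketch; the kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical per-page record: start address plus mmap protection bits. */
struct sketch_page {
    unsigned long start;
    int prot;
};

/* Number of flushes issued by the most recent sketch_load_pages() call. */
static unsigned long sketch_flush_count;

/* Stand-in for the kernel's flush_icache_range(); it only counts calls. */
static void sketch_flush_icache_range(unsigned long start, unsigned long end)
{
    (void)start;
    (void)end;
    sketch_flush_count++;
}

/*
 * Walk the pages as a loader might, flushing the icache only for pages
 * mapped executable. Returns how many flushes were issued.
 */
static unsigned long sketch_load_pages(const struct sketch_page *pages, size_t n)
{
    size_t i;

    sketch_flush_count = 0;
    for (i = 0; i < n; i++) {
        /* (a real loader would copy the page contents in here) */
        if (pages[i].prot & PROT_EXEC) {
            sketch_flush_icache_range(pages[i].start,
                                      pages[i].start + SKETCH_PAGE_SIZE);
        }
    }
    return sketch_flush_count;
}
```

Feed sketch_load_pages() an array that mixes PROT_EXEC and data-only pages and the return value is the number of executable pages; the data-only pages never reach the flush, which on an SMP box like mine is where the smp_call_function time goes.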