From: Phil D. <ph...@ph...> - 2007-10-26 15:02:18
|
Hi Slava, Slava Pestov wrote: > Hi Phil, > > We don't have this yet. Were you able to do what you wanted to do > using mmap alone? > Yes thanks. Actually in the end I used mmap + some C code - I was writing a suffix array and couldn't find a fast way to construct it in factor. I used the code below which comes in at about 3 seconds for a 12k text file full of \0 terminated strings (the c version called from factor with mmapped text file takes 30ms). Unfortunately the text files I need to index are about a gig and mergesort is O(nlogn) - I didn't bother waiting for it to finish on a 1mb file. Is there anything obvious I can do to speed the factor version up? Many thanks, Phil --------- : pos>substring ( pos symbolsseq -- str ) 2dup 0 -rot index* swap subseq >string ; inline : suffix-compare ( apos bpos symbolsseq -- n ) [ pos>substring ] curry 2apply <=> ; inline : sort-suffix-array ( suffixarray symbolsbarray -- sortedsuffixarray ) [ suffix-compare ] curry sort ; : generate-suffix-array ( symbolsfname -- array ) dup file-length tuck [ sort-suffix-array ] with-mapped-file ; [ "testdata/data.txt" generate-suffix-array ] time --------- Here's the c code for reference: ------ static int compare_suffix_ptrs(const void *m1, const void *m2) { const char **a = (const char **)m1; const char **b = (const char **)m2; return strcasecmp(*a,*b); } void suffix_sort(unsigned int *suffixarray, unsigned int size, const char* strbuffer) { unsigned int i; // turn from array of offsets into array of pointers for (i=0; i<size;i++) suffixarray[i] += (unsigned int)strbuffer; qsort(suffixarray,size,4,compare_suffix_ptrs); // turn back into array of offsets for (i=0; i<size;i++) suffixarray[i] -= (unsigned int)strbuffer; } ------- which is called via: : generate-suffix-array-c ( symbolsfname -- array ) [ file-length dup >c-ulong-array swap dupd ] keep dup file-length [ mapped-file-address suffix_sort ] with-mapped-file ; |