From: Nai y. z. <zha...@gm...> - 2012-02-14 01:35:13
Hello Vedran,

Thank you for taking the time to reply. However, it seems the filesystem cache is not the cause of this problem. I tried two runs right after boot, and each came in around 28K IOPS (a little less than yesterday). Regarding the SSD, I was trying to have each read I/O access a different LBA so as to avoid hitting the SSD cache. Any further suggestions?

Thanks!

Nai Yan.

2012/2/13 Vedran Degoricija <ve...@ya...>
> Nai Yan,
>
> Your program does not specify anything about the file caching attributes, so the data is most likely coming out of the filesystem cache. And if you have a bunch of threads scanning through the same set of LBAs, the SSD cache might be helping as well.
>
> Try running 1 thread with 1 iteration right after boot and see what numbers you get.
>
> Regards,
> Ved
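Joe's suggestion further down this thread is to make the test program randomize its reads over the entire disk rather than striding through it in order. The following is only a sketch of that idea, showing how independent, sector-aligned offsets could be drawn over a test file; the 4 GiB size, 512-byte sector, and I/O count are illustrative values, not taken from the thread.

    #include <cstdint>
    #include <cstdio>
    #include <random>

    int main()
    {
        const uint64_t fileSize   = 4ULL * 1024 * 1024 * 1024; // illustrative 4 GiB test file
        const uint64_t sectorSize = 512;                       // one 512B read per I/O
        const uint64_t sectors    = fileSize / sectorSize;

        std::mt19937_64 rng(std::random_device{}());
        std::uniform_int_distribution<uint64_t> pick(0, sectors - 1);

        // Each I/O gets an independent, sector-aligned offset anywhere in the file,
        // so consecutive reads are unlikely to land in the same cached block.
        for (int i = 0; i < 8; ++i)
        {
            uint64_t offset = pick(rng) * sectorSize;
            std::printf("read 512B at offset %llu\n", (unsigned long long)offset);
            // fsetpos()/fread() or SetFilePointerEx()/ReadFile() would be issued here.
        }
        return 0;
    }

With offsets drawn this way, neither the Windows file cache nor the drive's read-ahead can satisfy most requests, which should pull the measured figure closer to the random-read IOPS that Iometer reports.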
From: Fabian T. <fa...@ti...> - 2012-02-13 21:33:19
Hi Nai Yan,

2012/2/12 Nai yan zhao <zha...@gm...>:
> Hello Joe,
> Again, thank you for your reply! I will take your suggestion and try again. But I am very much looking forward to your further investigation of my program on a Windows system.
>
> I trust IOMeter, but I can't explain why, and where the problem is in my program. Further, would you give me some comments?
> 1) What's the difference between IOMeter's I/O calculation and my program (although mine is much, much simpler)? From its behavior, IOMeter also seems to create a file on the target disk and perhaps fetch data from that file with a predefined I/O size and policy. Am I wrong?
> If I am not wrong, then why is there so much difference? Joe, in your experience, does my program have any big defect?

You are ignoring the starting position in your calls to set the file position:

    pos = i * interval;   <---- you need to change this to: pos = sPos + (i * interval);
    fsetpos(fp,&pos);

You'd also be better off hoisting your buffer allocation out of the for loop.

Cheers,
-Fab
From: Vedran D. <ve...@ya...> - 2012-02-13 06:20:57
Nai Yan,

Your program does not specify anything about the file caching attributes, so the data is most likely coming out of the filesystem cache. And if you have a bunch of threads scanning through the same set of LBAs, the SSD cache might be helping as well.

Try running 1 thread with 1 iteration right after boot and see what numbers you get.

Regards,
Ved

> From: Nai yan zhao <zha...@gm...>
> To: jo...@ei...
> Cc: Iom...@li...
> Sent: Sunday, February 12, 2012 6:59 PM
> Subject: Re: [Iometer-devel] Please advise - Why IOPS by IOMeter is much slower than windows multi-threading data fetch IOPS?
From: Nai y. z. <zha...@gm...> - 2012-02-13 03:00:05
Hello Joe, Again, thank you for your reply! I will take your suggestion and try again. But I am very looking forward to your further investigation on Windows system for my program. I trust IOMeter, but I can't explain why and where's the problem with my program. And further speaking, would you give me some comments? 1) What's the difference between IOmeter I/O calculation and my program (although it's much much simpler)? From the behavior of IOMeter, it also seems to create a file on target disk and MAYBE fetch data from that file by pre-defined I/O size and policy. If I am wrong? If I am not wrong, then why there's so much difference. Joe, by your experience, if my program has any big defect? 2) My major purpose is to have a program in our production env. ,which will frequently fetch data from SSD, and there are also some additional operations/work after data fetched - this is also why you see I put some additional work after each I/O (such as memory allocation and de-allocation in I/O calculation). What I expect to see, its benchmark SHOULD be less than I/OMeter benchmark. Would you advise more? Is there any big defect in my program for either doing file I/O or I/O calculation? Thanks in advance!! Nai Yan. 2012/2/13 <jo...@ei...> > Manufacturer's quoted sequential MB/s won't be with 512byte reads. In > Iometer, try 256KB sequential reads with about 8 outstanding I/Os. That > should come closer to the maximum throughput(I doubt you'll be able to get > your laptop to actually get close to 520MB/s though). > > I'll see if I can find a windows system to try to compile/run your > program, but I can't make any promises. > > > Joe > > > Quoting Nai yan zhao <zha...@gm...>: > > Hello Joe, >> Thank you again for your time! >> It's wired that from IOMeter, the throughput for sequential IOPS >> (512B, queue depth is 64) is ONLY 42MB/s with around 82K IOPS. However, >> from that SSD official website, this SSD sequential throughput should be >> around 510MB/s ( >> http://www.plextoramericas.**com/index.php/ssd/px-m3-**series?start=1<http://www.plextoramericas.com/index.php/ssd/px-m3-series?start=1>, >> my SSD >> is 128G). If there's any parameter I didn't set correctly in IOMeter? >> >> As you suggested, I try to create a 12GB sample file (my test bed >> memory is 6GB and without RAID) and use 1 thread to do IO. The result >> is 33666; However, with I/O meter, it's 11572 (throughput this time is >> ONLY >> 5.93MB/s); IOPS still 3 times!! >> >> I attach my IOMeter settings, if there's anything wrong? Also, I >> attach my modified code. Joe, could you help again to see where's the >> problem? >> >> Thank you so much!! >> >> Nai Yan. >> >> 2012/2/13 <jo...@ei...> >> >> 82K sounds reasonable for iops on an SSD. You should check the specs of >>> your drive to see what you should expect. >>> >>> You need to remember that you are doing file i/o so you have several >>> layers of cache involved. think of it was file cache -> block cache -> >>> controller cache -> drive cache (you aren't testing a HW RAID, so you >>> probably don't have cache in you controller) My personal run of thumb for >>> random I/O is to have my file size be about 3x my combined cache size. >>> For >>> example, 4G ram in system, 512MB RAID cache, (8 drives*32MB) = 4.75GB I'd >>> do a 16GB file. >>> >>> If in iometer you are accessing a PHYSICALDISK, then you are avoiding >>> window's file cache. >>> >>> I just pulled up the code and (keep in mind I'm not much of a windows >>> guy) >>> something looks odd in your GetSecs routine. 
The cast to double is going to lose resolution; I think I would store the start/end times as LARGE_INTEGER, and you probably only have to call the frequency routine once.

|
From: <jo...@ei...> - 2012-02-12 20:34:58
|
Manufacturer's quoted sequential MB/s won't be with 512-byte reads. In Iometer, try 256KB sequential reads with about 8 outstanding I/Os. That should come closer to the maximum throughput (I doubt you'll be able to get your laptop to actually get close to 520MB/s, though).

I'll see if I can find a Windows system to try to compile/run your program, but I can't make any promises.

Joe

|
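A quick back-of-the-envelope check (editor's arithmetic, not from the thread) of why 82K IOPS at 512 bytes and the drive's quoted 510MB/s sequential figure are not in contradiction; throughput is just IOPS times request size:

#include <stdio.h>

int main()
{
    // throughput (MB/s) = IOPS * request size / 1e6
    double iops_512b = 82000.0;                               // observed with 512-byte requests
    printf("512 B : %.1f MB/s\n", iops_512b * 512 / 1e6);     // ~42 MB/s

    // conversely, the quoted 510 MB/s only needs ~2000 IOPS
    // if each request is 256 KB
    double target_mbs = 510.0;
    printf("256 KB: %.0f IOPS\n", target_mbs * 1e6 / (256.0 * 1024));
    return 0;
}

So small-block IOPS and large-block MB/s measure different limits of the same drive.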
From: Nai y. z. <zha...@gm...> - 2012-02-12 17:56:35
|
Hello Joe,

Thank you again for your time! It's weird that in IOMeter the sequential throughput (512B, queue depth 64) is ONLY 42MB/s, at around 82K IOPS. However, according to the SSD's official website, its sequential throughput should be around 510MB/s (http://www.plextoramericas.com/index.php/ssd/px-m3-series?start=1; my SSD is 128G). Is there any parameter I didn't set correctly in IOMeter?

As you suggested, I tried to create a 12GB sample file (my test bed has 6GB of memory and no RAID) and used 1 thread to do the I/O. The result is 33666; however, with IOMeter it's 11572 (throughput this time is ONLY 5.93MB/s). My program's IOPS is still 3 times higher!!

I attach my IOMeter settings, in case there's anything wrong. I also attach my modified code. Joe, could you help again to see where the problem is?

Thank you so much!!

Nai Yan.

|
From: <jo...@ei...> - 2012-02-12 16:34:46
|
82K sounds reasonable for IOPS on an SSD. You should check the specs of your drive to see what you should expect.

You need to remember that you are doing file I/O, so you have several layers of cache involved. Think of it as file cache -> block cache -> controller cache -> drive cache (you aren't testing a HW RAID, so you probably don't have cache in your controller). My personal rule of thumb for random I/O is to make the file size about 3x my combined cache size. For example: 4GB RAM in the system, 512MB RAID cache, (8 drives * 32MB) = 4.75GB, so I'd do a 16GB file.

If in Iometer you are accessing a PHYSICALDISK, then you are avoiding Windows' file cache.

I just pulled up the code, and (keep in mind I'm not much of a Windows guy) something looks odd in your GetSecs routine. The cast to double is going to lose resolution; I think I would store the start/end times as LARGE_INTEGER. And you probably only have to call the frequency routine once.

Also, Windows used to have issues in the HAL where, if a thread got moved to a different processor, you'd get odd results. There is a Windows API call for setting affinity, similar to the Linux sched_setaffinity.

This doesn't really matter for what we are talking about, it is just a pet peeve of mine: your "delete c;" should be "delete [] c;" (are you intending to be timing your allocator calls as well? You may be, if you are simulating system performance, but typically for disk performance you'd try to preallocate as much as possible so you're only timing the transfers).

If it were me, I would start with something simpler (say, a single-threaded sequential read) and see if your program gets the correct values then. You could also fire up Windows Performance Monitor and try to correlate with its counters as well (PhysicalDisk Transfers/sec).

Good Luck,

Joe

|
From: Nai y. z. <zha...@gm...> - 2012-02-12 14:17:42
|
Hello Fabian and Joe,

Thank you so much for your reply.

Actually, what I am trying to do is split a file into 32 parts, and each part is assigned to a thread to read. Each thread opens the file, reads 512B at a time, and closes the file. I was trying to avoid having 2 read I/Os hit the same 512B block, i.e. to avoid the cache in the SSD (it's 128MB), although most read I/Os are ordered but not contiguous (http://en.wikipedia.org/wiki/Contiguity#Computer_science).

Per your suggestion, I tried 512B sequential I/O with the settings below:

Max disk size - 8388608
# of Outstanding I/O - 32 (for 64, it's also around 82K)
Transfer request size - 512B
100% sequential
Reply size - no reply
Align I/Os on - Sector boundaries

The result is around 82K, still much slower than my program.

Does my program have any defect in calculating IOPS? Or do I have some misunderstanding of SSD or file system caching, which causes my program to fetch data mostly from the SSD's RAM? Or what parameters should I set in IOMeter to simulate my program's I/O?

Thank you again in advance for your time to help investigate it!!

Nai Yan.

|
From: Fabian T. <fa...@ti...> - 2012-02-10 16:30:42
|
If I read the test correctly, all threads start at offset 0 and then perform 512B reads with a 1024B stride between reads. As Joe said, this is pretty much sequential reading, and all threads are reading the same data, so most reads are likely to be satisfied from cache, either in the OS or on the SSD itself. They'll do 320000/32 = 10000 IO operations each, and since the offsets don't depend on the thread index, every thread walks the same first ~10MB of the file. It's quite likely that the whole region you are reading will sit happily in the file cache.

Create an access pattern that mimics your app (512B sequential with a 1024B stride), create 32 workers, and see if the results are similar. Best would be if you created a test file of roughly that size, too. You can then see how things compare if you go with async I/O and a single thread.

Cheers,
-Fab

|
From: <jo...@ei...> - 2012-02-10 14:34:50
|
Forgive me if I missed it, but I don't see any randomization in your file reads. It looks like you just skip ahead so thread 0 reads the first 512bytes, thread 1 the next 512b. So any storage will be prefetching very effectively. Tell Iometer to do sequential instead of random and see how much closer the numbers are. Or better yet, make your program randomize its reads over the entire disk. Joe Quoting Nai yan zhao <zha...@gm...>: > Greetings, > Could anybody help me a little out of my difficulty? > > I have a SSD and I am trying to use it to simulate my program I/O > performance, however, IOPS calculated from my program is much much faster > than IOMeter. > > My SSD is PLEXTOR PX-128M3S, by IOMeter, its max 512B random read > IOPS is around 94k (queue depth is 32). > However my program (32 windows threads) can reach around 500k 512B > IOPS, around 5 times of IOMeter!!! I did data validation but didn't find > any error in data fetching. It's because my data fetching in order? > > I paste my code belwo (it mainly fetch 512B from file and release it; > I did use 4bytes (an int) to validate program logic and didn't find > problem), can anybody help me figure out where I am wrong? > > Thanks so much in advance!! > > Nai Yan. > > #include <stdio.h> > #include <Windows.h> > /* > ** Purpose: Verify file random read IOPS in comparison with IOMeter > ** Author: Nai Yan > ** Date: Feb. 9th, 2012 > **/ > //Global variables > long completeIOs = 0; > long completeBytes = 0; > int threadCount = 32; > unsigned long long length = 1073741824; //4G test file > int interval = 1024; > int resultArrayLen = 320000; > int *result = new int[resultArrayLen]; > //Method declarison > double GetSecs(void); //Calculate out duration > int InitPool(long long,char*,int); //Initialize test data for > testing, if successful, return 1; otherwise, return a non 1 value. > int * FileRead(char * path); > unsigned int DataVerification(int*, int sampleItem); > //Verify data fetched from pool > int main() > { > int sampleItem = 0x1; > char * fPath = "G:\\workspace\\4G.bin"; > unsigned int invalidIO = 0; > if (InitPool(length,fPath,sampleItem)!= 1) > printf("File write err... \n"); > //start do random I/Os from initialized file > double start = GetSecs(); > int * fetchResult = FileRead(fPath); > double end = GetSecs(); > printf("File read IOPS is %.4f per second.. \n",completeIOs/(end - start)); > //start data validation, for 4 bytes fetch only > // invalidIO = DataVerification(fetchResult,sampleItem); > // if (invalidIO !=0) > // { > // printf("Total invalid data fetch IOs are %d", invalidIO); > // } > return 0; > } > > > int InitPool(long long length, char* path, int sample) > { > printf("Start initializing test data ... \n"); > FILE * fp = fopen(path,"wb"); > if (fp == NULL) > { > printf("file open err... \n"); > exit (-1); > } > else //initialize file for testing > { > fseek(fp,0L,SEEK_SET); > for (int i=0; i<length; i++) > { > fwrite(&sample,sizeof(int),1,fp); > } > fclose(fp); > fp = NULL; > printf("Data initialization is complete...\n"); > return 1; > } > } > double GetSecs(void) > { > LARGE_INTEGER frequency; > LARGE_INTEGER start; > if(! QueryPerformanceFrequency(&frequency)) > printf("QueryPerformanceFrequency Failed\n"); > if(! 
QueryPerformanceCounter(&start)) > printf("QueryPerformanceCounter Failed\n"); > return ((double)start.QuadPart/(double)frequency.QuadPart); > } > class input > { > public: > char *path; > int starting; > input (int st, char * filePath):starting(st),path(filePath){} > }; > //Workers > DWORD WINAPI FileReadThreadEntry(LPVOID lpThreadParameter) > { > input * in = (input*) lpThreadParameter; > char* path = in->path; > FILE * fp = fopen(path,"rb"); > int sPos = in->starting; > // int * result = in->r; > if(fp != NULL) > { > fpos_t pos; > for (int i=0; i<resultArrayLen/threadCount;i++) > { > pos = i * interval; > fsetpos(fp,&pos); > //For 512 bytes fetch each time > unsigned char *c =new unsigned char [512]; > if (fread(c,512,1,fp) ==1) > { > InterlockedIncrement(&completeIOs); > delete c; > } > //For 4 bytes fetch each time > /*if (fread(&result[sPos + i],sizeof(int),1,fp) ==1) > { > InterlockedIncrement(&completeIOs); > }*/ > else > { > printf("file read err...\n"); > exit(-1); > } > } > fclose(fp); > fp = NULL; > } > else > { > printf("File open err... \n"); > exit(-1); > } > } > int * FileRead(char * p) > { > printf("Starting reading file ... \n"); > HANDLE mWorkThread[256]; //max 256 threads > completeIOs = 0; > int slice = int (resultArrayLen/threadCount); > for(int i = 0; i < threadCount; i++) > { > mWorkThread[i] = CreateThread( > NULL, > 0, > FileReadThreadEntry, > (LPVOID)(new input(i*slice,p)), > 0, > NULL); > } > WaitForMultipleObjects(threadCount, mWorkThread, TRUE, INFINITE); > printf("File read complete... \n"); > return result; > } > unsigned int DataVerification(int* result, int sampleItem) > { > unsigned int invalid = 0; > for (int i=0; i< resultArrayLen/interval;i++) > { > if (result[i]!=sampleItem) > { > invalid ++; > continue; > } > } > return invalid; > } > |
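For reference, a minimal sketch of the randomization Joe suggests, keeping the shape of the quoted FileReadThreadEntry worker: instead of stepping through the file with pos = i * interval, each read picks a random 512-byte-aligned offset anywhere in the test file. The function name, the fileSize/readCount parameters, and the use of <random> are illustrative assumptions, not part of the posted program.

    // Sketch only: spread 512-byte reads at random, 512-byte-aligned offsets
    // over the whole test file rather than reading it front to back.
    // Like the posted program, this treats fpos_t as a plain 64-bit offset,
    // which holds for the Microsoft CRT that the original code targets.
    #include <cstdio>
    #include <random>

    void RandomReadWorker(const char *path, long long fileSize, int readCount)
    {
        FILE *fp = fopen(path, "rb");
        if (fp == NULL)
        {
            printf("File open err... \n");
            return;
        }
        std::mt19937_64 rng(std::random_device{}());
        std::uniform_int_distribution<long long> pickBlock(0, fileSize / 512 - 1);
        unsigned char buf[512];
        for (int i = 0; i < readCount; i++)
        {
            fpos_t pos = (fpos_t)(pickBlock(rng) * 512);  // random, block-aligned offset
            fsetpos(fp, &pos);
            if (fread(buf, 512, 1, fp) != 1)
            {
                printf("file read err...\n");
                break;
            }
        }
        fclose(fp);
    }

Block-aligned offsets keep the access pattern comparable to Iometer's 512 B random-read specification; note that reads issued through fread() may still be satisfied by the Windows file cache rather than the SSD.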
From: Nai y. z. <zha...@gm...> - 2012-02-10 08:01:29
|
Greetings, Could anybody help me a little out of my difficulty? I have a SSD and I am trying to use it to simulate my program I/O performance, however, IOPS calculated from my program is much much faster than IOMeter. My SSD is PLEXTOR PX-128M3S, by IOMeter, its max 512B random read IOPS is around 94k (queue depth is 32). However my program (32 windows threads) can reach around 500k 512B IOPS, around 5 times of IOMeter!!! I did data validation but didn't find any error in data fetching. It's because my data fetching in order? I paste my code belwo (it mainly fetch 512B from file and release it; I did use 4bytes (an int) to validate program logic and didn't find problem), can anybody help me figure out where I am wrong? Thanks so much in advance!! Nai Yan. #include <stdio.h> #include <Windows.h> /* ** Purpose: Verify file random read IOPS in comparison with IOMeter ** Author: Nai Yan ** Date: Feb. 9th, 2012 **/ //Global variables long completeIOs = 0; long completeBytes = 0; int threadCount = 32; unsigned long long length = 1073741824; //4G test file int interval = 1024; int resultArrayLen = 320000; int *result = new int[resultArrayLen]; //Method declarison double GetSecs(void); //Calculate out duration int InitPool(long long,char*,int); //Initialize test data for testing, if successful, return 1; otherwise, return a non 1 value. int * FileRead(char * path); unsigned int DataVerification(int*, int sampleItem); //Verify data fetched from pool int main() { int sampleItem = 0x1; char * fPath = "G:\\workspace\\4G.bin"; unsigned int invalidIO = 0; if (InitPool(length,fPath,sampleItem)!= 1) printf("File write err... \n"); //start do random I/Os from initialized file double start = GetSecs(); int * fetchResult = FileRead(fPath); double end = GetSecs(); printf("File read IOPS is %.4f per second.. \n",completeIOs/(end - start)); //start data validation, for 4 bytes fetch only // invalidIO = DataVerification(fetchResult,sampleItem); // if (invalidIO !=0) // { // printf("Total invalid data fetch IOs are %d", invalidIO); // } return 0; } int InitPool(long long length, char* path, int sample) { printf("Start initializing test data ... \n"); FILE * fp = fopen(path,"wb"); if (fp == NULL) { printf("file open err... \n"); exit (-1); } else //initialize file for testing { fseek(fp,0L,SEEK_SET); for (int i=0; i<length; i++) { fwrite(&sample,sizeof(int),1,fp); } fclose(fp); fp = NULL; printf("Data initialization is complete...\n"); return 1; } } double GetSecs(void) { LARGE_INTEGER frequency; LARGE_INTEGER start; if(! QueryPerformanceFrequency(&frequency)) printf("QueryPerformanceFrequency Failed\n"); if(! 
QueryPerformanceCounter(&start)) printf("QueryPerformanceCounter Failed\n"); return ((double)start.QuadPart/(double)frequency.QuadPart); } class input { public: char *path; int starting; input (int st, char * filePath):starting(st),path(filePath){} }; //Workers DWORD WINAPI FileReadThreadEntry(LPVOID lpThreadParameter) { input * in = (input*) lpThreadParameter; char* path = in->path; FILE * fp = fopen(path,"rb"); int sPos = in->starting; // int * result = in->r; if(fp != NULL) { fpos_t pos; for (int i=0; i<resultArrayLen/threadCount;i++) { pos = i * interval; fsetpos(fp,&pos); //For 512 bytes fetch each time unsigned char *c =new unsigned char [512]; if (fread(c,512,1,fp) ==1) { InterlockedIncrement(&completeIOs); delete c; } //For 4 bytes fetch each time /*if (fread(&result[sPos + i],sizeof(int),1,fp) ==1) { InterlockedIncrement(&completeIOs); }*/ else { printf("file read err...\n"); exit(-1); } } fclose(fp); fp = NULL; } else { printf("File open err... \n"); exit(-1); } } int * FileRead(char * p) { printf("Starting reading file ... \n"); HANDLE mWorkThread[256]; //max 256 threads completeIOs = 0; int slice = int (resultArrayLen/threadCount); for(int i = 0; i < threadCount; i++) { mWorkThread[i] = CreateThread( NULL, 0, FileReadThreadEntry, (LPVOID)(new input(i*slice,p)), 0, NULL); } WaitForMultipleObjects(threadCount, mWorkThread, TRUE, INFINITE); printf("File read complete... \n"); return result; } unsigned int DataVerification(int* result, int sampleItem) { unsigned int invalid = 0; for (int i=0; i< resultArrayLen/interval;i++) { if (result[i]!=sampleItem) { invalid ++; continue; } } return invalid; } |
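One more difference worth noting when comparing this program with Iometer: fread() goes through the Windows file cache, so a test file that was just written (and is partly cached) can serve many of these reads from RAM. Below is a minimal sketch of an unbuffered read path on Windows; it is illustrative only (the posted program does not use CreateFile), and FILE_FLAG_NO_BUFFERING requires sector-aligned buffers, offsets, and transfer sizes, which 512-byte requests satisfy on 512-byte-sector disks.

    // Sketch: one unbuffered 512-byte read, so the request reaches the device
    // instead of the Windows file cache. Path and sizes mirror the posted program.
    #include <windows.h>
    #include <malloc.h>
    #include <stdio.h>

    int main()
    {
        const char *path = "G:\\workspace\\4G.bin";
        HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
        if (h == INVALID_HANDLE_VALUE)
        {
            printf("File open err... \n");
            return -1;
        }
        // Unbuffered I/O needs a sector-aligned buffer.
        unsigned char *buf = (unsigned char *)_aligned_malloc(512, 512);
        LARGE_INTEGER off;
        off.QuadPart = 512 * 1024;              // any 512-byte-aligned offset
        SetFilePointerEx(h, off, NULL, FILE_BEGIN);
        DWORD got = 0;
        if (!ReadFile(h, buf, 512, &got, NULL) || got != 512)
            printf("file read err...\n");
        _aligned_free(buf);
        CloseHandle(h);
        return 0;
    }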
From: Jinpyo K. <jk...@vm...> - 2011-09-29 18:15:18
|
Hi Marty, Thanks for updating test results. I attached a new patch for 2006 version (rev2006) and v1.1.0-rc1. This patch will fix the problem in prepare disk and CPU util update. I usually tested linux dynamo on Linux dynamo VM running VMWare ESX. I compared Iometer reported numbers and IOPS numbers from esxtop (ESX performance monitoring tool). I didn't find such performance difference in my tests (local disk, FC SAN -- random/sequential mix) But I suggest you turn off linux I/O scheduler (from CFQ to noop). With CFQ I/O scheduler, I also observed a big difference between Iometer reported numbers and ESXTOP reported numbers. Here is an instruction to turn off CFQ scheduler in Linux. ----------------------------------------------------------------- Here is instructions to turn off CFQ I/O scheduler. To set the scheduler in Grub. If we take noop as the target default scheduler for the system, the /boot/grub/menu.lst kernel entry would look like this: title CentOS (2.6.18-128.4.1.el5) root (hd0,0) kernel /vmlinuz-2.6.18-128.4.1.el5 ro root=/dev/VolGroup00/LogVol00 elevator=noop initrd /initrd-2.6.18-128.4.1.el5.img Having the elevator entry in place, the system will set the I/O scheduler to the specified one on every boot. In short, add "elevator=noop" in kernel boot parameter line in /boot/grub/menu.lst. ----------------------------------------------------------------- Hope this helps. Thanks. -JK -----Original Message----- From: Marty Schlining [mailto:msc...@dd...] Sent: Thursday, September 29, 2011 10:08 AM To: Jinpyo Kim Subject: RE: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi Jinpyo, I am giving your changes a try. I compiled the IOMeter GUI (Win32 Release) using Microsoft VC++ 2008 on Windows 7. I compiled dynamo on a linux x86_84 platform (Centos 5.5). No problems, there. I was also able to run everything the way I expected. My concern is that the results posted in the IOMeter GUI do not match what is being measured at the target. I'm not sure how to begin debugging this. Perhaps I should try a later version of IOMeter with your changes? For a 512k 100% sequential read to my target (12 LUNs), IOMeter reports a total of 1.4 GB/s (1423 MB/s), while the target is showing 2.0 GB/s (which is accurate based on previous measurements with xdd). IO sizes looks correct at the target. Target IO measurements per LUN: Virtual Disk Counters: Elapsed time = 1475.216 seconds Idx IOs/sec KiB/sec KiB/IO Fwd IO/s Fwd KiB/s| IOs/sec KiB/sec KiB/IO Fwd IO/s Fwd KiB/s| -------------------------------------------------------------------------------------------------- 0 0 0 0 0 0 | 323 165712 512 0 0 | 1 338 173412 512 0 0 | 0 0 0 0 0 | 2 0 0 0 0 0 | 366 187689 512 0 0 | 3 339 173869 512 0 0 | 0 0 0 0 0 | 4 0 0 0 0 0 | 325 166456 512 0 0 | 5 340 174273 512 0 0 | 0 0 0 0 0 | 6 0 0 0 0 0 | 326 167141 512 0 0 | 7 337 172666 512 0 0 | 0 0 0 0 0 | 8 0 0 0 0 0 | 325 166473 512 0 0 | 9 338 173213 512 0 0 | 0 0 0 0 0 | 10 0 0 0 0 0 | 325 166590 512 0 0 | 11 346 177442 512 0 0 | 0 0 0 0 0 | Total 2.064 GB/s That's pretty awesome for dynamo on Linux. Can't wait to see what it will do on a larger system. Best Regards, Marty Schlining DataDirect Networks -----Original Message----- From: Jinpyo Kim [mailto:jk...@vm...] Sent: Friday, September 23, 2011 2:27 PM To: Daniel Scheibli Cc: Iom...@li... Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi, I quickly created and attached a patch for 1.1 RC build. It compiles OK, but not tested with 1.1 RC Iometer controller. I will do it by next week. 
Thanks. -JK -----Original Message----- From: Daniel Scheibli [mailto:da...@sc...] Sent: Friday, September 23, 2011 1:39 AM To: Jinpyo Kim Cc: Vedran Degoricija; Iom...@li... Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi Jinpyo, thanks for the patch! Aravind contacted me some months ago about it, so its awesome to see it happen. If you can provide the path against the recent RC that would be of great help so checking and integrating becomes easier. Thanks, Daniel PS: The mail with the attachment got hold up by the mailing list, but I approved it, so it should show up soon. > Yes, there is little difference in Linux code in recent 1.1 RC build > compared to 2006 version. > If you need a patch for recent RC build, I will send it too. > > Thanks. > -JK > > From: Vedran Degoricija [mailto:ve...@ya...] > Sent: Thursday, September 22, 2011 3:30 PM > To: Jinpyo Kim; Iom...@li... > Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo > > Hi Jinpyo, > > This is good news! I was not aware of your efforts. We have been > needing this for a long time. > > I suspect that today's Linux code has not changed much since 2006, but > we'd need to check that is the case, or else you need to make a patch > based on the latest code in SVN. > > Also, would you pass your code changes to us in a tarball for review? > > Thanks, > Ved > > > From: Jinpyo Kim <jk...@vm...> > To: "Iom...@li..." > <Iom...@li...> > Sent: Thursday, September 22, 2011 3:07 PM > Subject: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi, > > A few months back, Aravind B. (from VMware) contacted whether we can > contribute AIO fix for linux dynamo. > Iometer tool was widely used for many I/O tests at VMware or by our > customers. > > But existing released version and recent RC build had the same > problems not issuing multiple outstanding I/Os properly. > We have a fixed linux dynamo version (used for a while), but it would > be better integrated in next release of Iometer. > > I already created a patch from 2006 released version src tree. > Please let me how to check it in. > > Thanks. > -JK > > > ---------------------------------------------------------------------- > -------- All the data continuously generated in your IT infrastructure > contains a definitive record of customers, application performance, > security threats, fraudulent activity and more. Splunk takes this data > and makes sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1 > _______________________________________________ > Iometer-devel mailing list > Iom...@li...<mailto:Iom...@li...urcef > orge.net> https://lists.sourceforge.net/lists/listinfo/iometer-devel > > ---------------------------------------------------------------------- > -------- All the data continuously generated in your IT infrastructure > contains a definitive record of customers, application performance, > security threats, fraudulent activity and more. Splunk takes this data > and makes sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1___________________________________ > ____________ > Iometer-devel mailing list > Iom...@li... > https://lists.sourceforge.net/lists/listinfo/iometer-devel > |
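For a quick experiment, the same scheduler change can also be made at runtime through sysfs rather than via the boot loader; sdX below is a placeholder for whichever block device is under test, and the setting lasts only until reboot:

    cat /sys/block/sdX/queue/scheduler          # current choice is shown in brackets
    echo noop > /sys/block/sdX/queue/scheduler  # switch that device from cfq to noop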
From: Jinpyo K. <jk...@vm...> - 2011-09-23 18:26:53
|
Hi, I quickly created and attached a patch for 1.1 RC build. It compiles OK, but not tested with 1.1 RC Iometer controller. I will do it by next week. Thanks. -JK -----Original Message----- From: Daniel Scheibli [mailto:da...@sc...] Sent: Friday, September 23, 2011 1:39 AM To: Jinpyo Kim Cc: Vedran Degoricija; Iom...@li... Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi Jinpyo, thanks for the patch! Aravind contacted me some months ago about it, so its awesome to see it happen. If you can provide the path against the recent RC that would be of great help so checking and integrating becomes easier. Thanks, Daniel PS: The mail with the attachment got hold up by the mailing list, but I approved it, so it should show up soon. > Yes, there is little difference in Linux code in recent 1.1 RC > build compared to 2006 version. > If you need a patch for recent RC build, I will send it too. > > Thanks. > -JK > > From: Vedran Degoricija [mailto:ve...@ya...] > Sent: Thursday, September 22, 2011 3:30 PM > To: Jinpyo Kim; Iom...@li... > Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo > > Hi Jinpyo, > > This is good news! I was not aware of your efforts. We have been > needing this for a long time. > > I suspect that today's Linux code has not changed much since 2006, but > we'd need to check that is the case, or else you need to make a patch > based on the latest code in SVN. > > Also, would you pass your code changes to us in a tarball for review? > > Thanks, > Ved > > > From: Jinpyo Kim <jk...@vm...> > To: "Iom...@li..." > <Iom...@li...> > Sent: Thursday, September 22, 2011 3:07 PM > Subject: [Iometer-devel] Checking in AIO fix for Linux dynamo > Hi, > > A few months back, Aravind B. (from VMware) contacted whether we can > contribute AIO fix for linux dynamo. > Iometer tool was widely used for many I/O tests at VMware or by our > customers. > > But existing released version and recent RC build had the same > problems not issuing multiple outstanding I/Os properly. > We have a fixed linux dynamo version (used for a while), but it would > be better integrated in next release of Iometer. > > I already created a patch from 2006 released version src tree. > Please let me how to check it in. > > Thanks. > -JK > > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains > a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and > makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1 > _______________________________________________ > Iometer-devel mailing list > Iom...@li...<mailto:Iom...@li...> > https://lists.sourceforge.net/lists/listinfo/iometer-devel > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains > a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and > makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1_______________________________________________ > Iometer-devel mailing list > Iom...@li... > https://lists.sourceforge.net/lists/listinfo/iometer-devel > |
From: Jinpyo K. <jk...@vm...> - 2011-09-23 17:03:45
|
Thanks for the prompt reply. I will create patch for recent RC build and send to this list by Monday. Thanks. -JK -----Original Message----- From: Daniel Scheibli [mailto:da...@sc...] Sent: Friday, September 23, 2011 1:39 AM To: Jinpyo Kim Cc: Vedran Degoricija; Iom...@li... Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi Jinpyo, thanks for the patch! Aravind contacted me some months ago about it, so its awesome to see it happen. If you can provide the path against the recent RC that would be of great help so checking and integrating becomes easier. Thanks, Daniel PS: The mail with the attachment got hold up by the mailing list, but I approved it, so it should show up soon. > Yes, there is little difference in Linux code in recent 1.1 RC > build compared to 2006 version. > If you need a patch for recent RC build, I will send it too. > > Thanks. > -JK > > From: Vedran Degoricija [mailto:ve...@ya...] > Sent: Thursday, September 22, 2011 3:30 PM > To: Jinpyo Kim; Iom...@li... > Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo > > Hi Jinpyo, > > This is good news! I was not aware of your efforts. We have been > needing this for a long time. > > I suspect that today's Linux code has not changed much since 2006, but > we'd need to check that is the case, or else you need to make a patch > based on the latest code in SVN. > > Also, would you pass your code changes to us in a tarball for review? > > Thanks, > Ved > > > From: Jinpyo Kim <jk...@vm...> > To: "Iom...@li..." > <Iom...@li...> > Sent: Thursday, September 22, 2011 3:07 PM > Subject: [Iometer-devel] Checking in AIO fix for Linux dynamo > Hi, > > A few months back, Aravind B. (from VMware) contacted whether we can > contribute AIO fix for linux dynamo. > Iometer tool was widely used for many I/O tests at VMware or by our > customers. > > But existing released version and recent RC build had the same > problems not issuing multiple outstanding I/Os properly. > We have a fixed linux dynamo version (used for a while), but it would > be better integrated in next release of Iometer. > > I already created a patch from 2006 released version src tree. > Please let me how to check it in. > > Thanks. > -JK > > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains > a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and > makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1 > _______________________________________________ > Iometer-devel mailing list > Iom...@li...<mailto:Iom...@li...> > https://lists.sourceforge.net/lists/listinfo/iometer-devel > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains > a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and > makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1_______________________________________________ > Iometer-devel mailing list > Iom...@li... > https://lists.sourceforge.net/lists/listinfo/iometer-devel > |
From: Daniel S. <da...@sc...> - 2011-09-23 08:56:46
|
Hi Jinpyo, thanks for the patch! Aravind contacted me some months ago about it, so its awesome to see it happen. If you can provide the path against the recent RC that would be of great help so checking and integrating becomes easier. Thanks, Daniel PS: The mail with the attachment got hold up by the mailing list, but I approved it, so it should show up soon. > Yes, there is little difference in Linux code in recent 1.1 RC > build compared to 2006 version. > If you need a patch for recent RC build, I will send it too. > > Thanks. > -JK > > From: Vedran Degoricija [mailto:ve...@ya...] > Sent: Thursday, September 22, 2011 3:30 PM > To: Jinpyo Kim; Iom...@li... > Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo > > Hi Jinpyo, > > This is good news! I was not aware of your efforts. We have been > needing this for a long time. > > I suspect that today's Linux code has not changed much since 2006, but > we'd need to check that is the case, or else you need to make a patch > based on the latest code in SVN. > > Also, would you pass your code changes to us in a tarball for review? > > Thanks, > Ved > > > From: Jinpyo Kim <jk...@vm...> > To: "Iom...@li..." > <Iom...@li...> > Sent: Thursday, September 22, 2011 3:07 PM > Subject: [Iometer-devel] Checking in AIO fix for Linux dynamo > Hi, > > A few months back, Aravind B. (from VMware) contacted whether we can > contribute AIO fix for linux dynamo. > Iometer tool was widely used for many I/O tests at VMware or by our > customers. > > But existing released version and recent RC build had the same > problems not issuing multiple outstanding I/Os properly. > We have a fixed linux dynamo version (used for a while), but it would > be better integrated in next release of Iometer. > > I already created a patch from 2006 released version src tree. > Please let me how to check it in. > > Thanks. > -JK > > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains > a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and > makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1 > _______________________________________________ > Iometer-devel mailing list > Iom...@li...<mailto:Iom...@li...> > https://lists.sourceforge.net/lists/listinfo/iometer-devel > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains > a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and > makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2dcopy1_______________________________________________ > Iometer-devel mailing list > Iom...@li... > https://lists.sourceforge.net/lists/listinfo/iometer-devel > |
From: Jinpyo K. <jk...@vm...> - 2011-09-22 23:03:56
|
Yes, there is little difference in Linux code in recent 1.1 RC build compared to 2006 version. If you need a patch for recent RC build, I will send it too. Thanks. -JK From: Vedran Degoricija [mailto:ve...@ya...] Sent: Thursday, September 22, 2011 3:30 PM To: Jinpyo Kim; Iom...@li... Subject: Re: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi Jinpyo, This is good news! I was not aware of your efforts. We have been needing this for a long time. I suspect that today's Linux code has not changed much since 2006, but we'd need to check that is the case, or else you need to make a patch based on the latest code in SVN. Also, would you pass your code changes to us in a tarball for review? Thanks, Ved From: Jinpyo Kim <jk...@vm...> To: "Iom...@li..." <Iom...@li...> Sent: Thursday, September 22, 2011 3:07 PM Subject: [Iometer-devel] Checking in AIO fix for Linux dynamo Hi, A few months back, Aravind B. (from VMware) contacted whether we can contribute AIO fix for linux dynamo. Iometer tool was widely used for many I/O tests at VMware or by our customers. But existing released version and recent RC build had the same problems not issuing multiple outstanding I/Os properly. We have a fixed linux dynamo version (used for a while), but it would be better integrated in next release of Iometer. I already created a patch from 2006 released version src tree. Please let me how to check it in. Thanks. -JK ------------------------------------------------------------------------------ All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense. http://p.sf.net/sfu/splunk-d2dcopy1 _______________________________________________ Iometer-devel mailing list Iom...@li...<mailto:Iom...@li...> https://lists.sourceforge.net/lists/listinfo/iometer-devel |
From: Vedran D. <ve...@ya...> - 2011-09-22 22:30:35
|
Hi Jinpyo, This is good news! I was not aware of your efforts. We have been needing this for a long time. I suspect that today's Linux code has not changed much since 2006, but we'd need to check that is the case, or else you need to make a patch based on the latest code in SVN. Also, would you pass your code changes to us in a tarball for review? Thanks, Ved From: Jinpyo Kim <jk...@vm...> >To: "Iom...@li..." <Iom...@li...> >Sent: Thursday, September 22, 2011 3:07 PM >Subject: [Iometer-devel] Checking in AIO fix for Linux dynamo > > >Hi, > >A few months back, Aravind B. (from VMware) contacted whether we can contribute AIO fix for linux dynamo. >Iometer tool was widely used for many I/O tests at VMware or by our customers. > >But existing released version and recent RC build had the same problems not issuing multiple outstanding I/Os properly. >We have a fixed linux dynamo version (used for a while), but it would be better integrated in next release of Iometer. > >I already created a patch from 2006 released version src tree. >Please let me how to check it in. > >Thanks. >-JK > >------------------------------------------------------------------------------ >All the data continuously generated in your IT infrastructure contains a >definitive record of customers, application performance, security >threats, fraudulent activity and more. Splunk takes this data and makes >sense of it. Business sense. IT sense. Common sense. >http://p.sf.net/sfu/splunk-d2dcopy1 >_______________________________________________ >Iometer-devel mailing list >Iom...@li... >https://lists.sourceforge.net/lists/listinfo/iometer-devel > > > |
From: Jinpyo K. <jk...@vm...> - 2011-09-22 22:07:38
|
Hi, A few months back, Aravind B. (from VMware) contacted us about whether we could contribute an AIO fix for the Linux dynamo. The Iometer tool is widely used for many I/O tests at VMware and by our customers. But the existing released version and the recent RC build have the same problem: they do not issue multiple outstanding I/Os properly. We have a fixed Linux dynamo version (used for a while now), but it would be better integrated into the next release of Iometer. I already created a patch from the 2006 released version's src tree. Please let me know how to check it in. Thanks. -JK |
From: SourceForge.net <no...@so...> - 2011-08-24 19:51:05
|
Bugs item #3397626, was opened at 2011-08-24 14:51 Message generated for change (Tracker Item Submitted) made by You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=427254&aid=3397626&group_id=40179 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: https://www.google.com/accounts () Assigned to: Nobody/Anonymous (nobody) Summary: Block device - block size, Inaccurate Results Initial Comment: Under Linux, dynamo opens the physical device without the O_DIRECT flag, resulting in artificially high performance due to buffered device I/O. Example: Sector Size 512, Block Size 4096. Access Specification: 512 Bytes Read. The test will open the block device and perform 4096-byte reads as opposed to 512-byte reads, resulting in ~8x higher reported performance. Fix: In function TargetDisk::Open (ln 1499 on iometer-1.1.0-rc1-src), add O_DIRECT to the open flags. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=427254&aid=3397626&group_id=40179 |
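For context, a minimal sketch of what the suggested fix amounts to: opening the target with O_DIRECT so reads bypass the page cache and are issued at the requested size. This is illustrative code, not the actual TargetDisk::Open implementation; the device name is a placeholder, and O_DIRECT requires sector-aligned buffers, offsets, and lengths.

    /* Sketch: a single 512-byte direct read from a block device. */
    #define _GNU_SOURCE              /* exposes O_DIRECT in <fcntl.h> on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dev = "/dev/sdX";                /* placeholder device */
        int fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 512, 512) != 0) {   /* sector-aligned buffer */
            close(fd);
            return 1;
        }
        if (pread(fd, buf, 512, 0) != 512)           /* 512 bytes from offset 0 */
            perror("pread");

        free(buf);
        close(fd);
        return 0;
    }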
From: Vedran D. <ve...@ya...> - 2011-08-17 18:07:54
|
Should be milliseconds. Ved From: Michaelian Ennis <mic...@gm...> >To: iom...@li... >Sent: Wednesday, August 17, 2011 7:18 AM >Subject: [Iometer-devel] csv and latency > >The report csv columns units of measure are not all labeled at least >in 1.1.0-rc1. > >In what units of measure are the "Average Response Time" columns? > >ian > >------------------------------------------------------------------------------ >Get a FREE DOWNLOAD! and learn more about uberSVN rich system, >user administration capabilities and model configuration. Take >the hassle out of deploying and managing Subversion and the >tools developers use with it. http://p.sf.net/sfu/wandisco-d2d-2 >_______________________________________________ >Iometer-devel mailing list >Iom...@li... >https://lists.sourceforge.net/lists/listinfo/iometer-devel > > > |
From: Michaelian E. <mic...@gm...> - 2011-08-17 14:18:24
|
Not all of the report CSV columns are labeled with their units of measure, at least in 1.1.0-rc1. In what units of measure are the "Average Response Time" columns? ian |
From: Daniel S. <da...@sc...> - 2011-07-13 23:54:53
|
Thanks Jeff, fixed it. Jeff Squyres wrote: > Just a friendly note: the links to the mailing list archives on this web page are incorrect: > > http://iometer.org/doc/mailinglists.html > |
From: Jeff S. <jsq...@ci...> - 2011-07-13 22:00:48
|
On Jul 13, 2011, at 5:44 PM, Allen, Wayne wrote: > 1. We have migrated the Windows portion of the code away from RDTSC due to the reasons you mentioned. Linux hasn't been addressed yet. Gotcha. > 2. You point out something that has been under discussion among us IOMeter admins for some time. We'd like to make that migration; however we haven't had the bandwidth to get to it. We'd certainly entertain others contributing code that gets it done. :) I'm Cisco's contributor to another open source project, so I can certainly understand "patches are welcome!". I do believe I've used that phrase a few times myself. :-) Sadly, I don't have the cycles to do such a port to the Linux-specific io_* API at the moment. :-( I was mainly asking if anyone else was doing it. -- Jeff Squyres jsq...@ci... For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ |
From: Allen, W. <way...@in...> - 2011-07-13 21:44:35
|
Hi Jeff, Thanks for pinging us regarding your questions. 1. We have migrated the Windows portion of the code away from RDTSC due to the reasons you mentioned. Linux hasn't been addressed yet. 2. You point out something that has been under discussion among us IOMeter admins for some time. We'd like to make that migration; however we haven't had the bandwidth to get to it. We'd certainly entertain others contributing code that gets it done. :) Best Regards, Wayne -----Original Message----- From: Jeff Squyres [mailto:jsq...@ci...] Sent: Wednesday, July 13, 2011 12:54 PM To: iom...@li... Subject: [Iometer-devel] Dynamo RDTSC / Linux aio_* API questions Greetings. 1. It looks like IOTime.cpp is using the RDTSC clock for time measurement. This isn't safe on multicore systems (e.g., if the OS moves the dynamo process from processor socket A to processor socket B, the RDTSC values are likely to be unrelated). Is there any thought of changing the use of RDTSC to some other method? E.g., on Linux, the clock_gettime() method can be used. Or is processor affinity always enforced to lock dynamo in place so that consecutive RDTSC values are relevant? I ask because it *looks* like you can disable affinity, but the RDTSC clock is still used...? Please feel free to tell me that I completely misunderstand the code. :-) 2. A quick browse through the source code shows that IOCompletionQ.cpp is using the aio_* API for reads and writes. Is there any effort going into using the Linux-native io_* API for reads and writes? It seems to perform significantly better than the aio_* API (in RHEL 5 and 6, at least). Thanks! -- Jeff Squyres jsq...@ci... For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ ------------------------------------------------------------------------------ AppSumo Presents a FREE Video for the SourceForge Community by Eric Ries, the creator of the Lean Startup Methodology on "Lean Startup Secrets Revealed." This video shows you how to validate your ideas, optimize your ideas and identify your business strategy. http://p.sf.net/sfu/appsumosfdev2dev _______________________________________________ Iometer-devel mailing list Iom...@li... https://lists.sourceforge.net/lists/listinfo/iometer-devel |
From: Jeff S. <jsq...@ci...> - 2011-07-13 19:54:11
|
Greetings. 1. It looks like IOTime.cpp is using the RDTSC clock for time measurement. This isn't safe on multicore systems (e.g., if the OS moves the dynamo process from processor socket A to processor socket B, the RDTSC values are likely to be unrelated). Is there any thought of changing the use of RDTSC to some other method? E.g., on Linux, the clock_gettime() method can be used. Or is processor affinity always enforced to lock dynamo in place so that consecutive RDTSC values are relevant? I ask because it *looks* like you can disable affinity, but the RDTSC clock is still used...? Please feel free to tell me that I completely misunderstand the code. :-) 2. A quick browse through the source code shows that IOCompletionQ.cpp is using the aio_* API for reads and writes. Is there any effort going into using the Linux-native io_* API for reads and writes? It seems to perform significantly better than the aio_* API (in RHEL 5 and 6, at least). Thanks! -- Jeff Squyres jsq...@ci... For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/ |
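On the timing question, here is a minimal sketch of a clock_gettime()-based timer of the kind Jeff describes; it is illustrative only, not the actual IOTime.cpp code. CLOCK_MONOTONIC is unaffected by a thread migrating between processor sockets, which is the failure mode of raw RDTSC.

    /* Sketch: a monotonic timer that does not depend on which CPU runs the thread.
       Older glibc versions need -lrt at link time for clock_gettime(). */
    #include <stdio.h>
    #include <time.h>

    static double monotonic_seconds(void)
    {
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
            perror("clock_gettime");
            return 0.0;
        }
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        double start = monotonic_seconds();
        /* ... issue and complete I/Os here ... */
        printf("elapsed: %.6f s\n", monotonic_seconds() - start);
        return 0;
    }

On the second point, the Linux-native interface Jeff refers to is the libaio family (io_setup(), io_submit(), io_getevents()), which can batch many requests per system call instead of issuing one call per aio_read().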