From: Tom M. <mad...@gm...> - 2007-07-18 13:28:36
We have servers with 6 GB of memory. Ideally we would like to leave 1 GB for Linux and have FB use the other 5 GB. Will the 64-bit version do this, i.e. break the 2 GB memory barrier?

Thanks.

Tom Miller

From: Alex P. <pes...@ma...> - 2007-07-18 13:35:37
On Wednesday 18 July 2007 17:28, Tom Miller wrote:

> We have servers with 6 GB of memory. Ideally we would like to leave 1
> GB for Linux and have FB use the other 5 GB. Will the 64-bit version do
> this, i.e. break the 2 GB memory barrier?

Yes.

From: Dmitry Y. <fir...@ya...> - 2007-07-19 04:12:22
Tom Miller wrote:

> We have servers with 6 GB of memory. Ideally we would like to leave 1
> GB for Linux and have FB use the other 5 GB. Will the 64-bit version do
> this, i.e. break the 2 GB memory barrier?

While the 64-bit builds can use more than 2GB of memory, that is true for the overall memory usage only. The page cache size is limited to 128K pages (512MB-2GB, depending on the page size). It's not a problem for Classic, but it's going to bite you if you're using SuperServer.

I think we should increase the limit, at least for the 64-bit builds. Objections, anyone?

Dmitry

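For concreteness, the quoted range follows directly from that ceiling; "128K pages" here means 131,072 cache pages, not a 128 KB buffer. A minimal illustration in C++:

    // The 512MB-2GB range above, derived from the 128K (131072) page ceiling.
    const unsigned long long cacheAt4K  = 131072ULL * 4096ULL;    //  536870912 bytes = 512 MB
    const unsigned long long cacheAt16K = 131072ULL * 16384ULL;   // 2147483648 bytes =   2 GB
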
From: Alexandre B. S. <ib...@th...> - 2007-07-19 04:52:21
Dmitry Yemanov wrote:

> Tom Miller wrote:
>
>> We have servers with 6 GB of memory. Ideally we would like to leave 1
>> GB for Linux and have FB use the other 5 GB. Will the 64-bit version do
>> this, i.e. break the 2 GB memory barrier?
>
> While the 64-bit builds can use more than 2GB of memory, that is true for
> the overall memory usage only. The page cache size is limited to
> 128K pages (512MB-2GB, depending on the page size). It's not a problem
> for Classic, but it's going to bite you if you're using SuperServer.
>
> I think we should increase the limit, at least for the 64-bit builds.
> Objections, anyone?
>
> Dmitry

What would happen if one defines more than 128K pages in 32-bit builds? Just ignore it and use 128K, or abort with an invalid configuration parameter?

Are there any plans to provide bigger page sizes (32 KB, 64 KB)?

A kind of "support question", but does anyone have any idea of the effects of "double caching"? FB uses RAM to cache MRU pages in its own cache, and the OS uses RAM to hold MRU disk clusters/blocks. Wouldn't the "hot" disk blocks tend to be the same data as the hot database pages on a machine that runs only FB? (I think this is the kind of use case where one wishes FB to use massive amounts of RAM.)

see you !

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br

From: Vlad H. <hv...@us...> - 2007-07-19 06:38:00
> What would happen if one defines more than 128K pages in 32-bit builds?
> Just ignore it and use 128K, or abort with an invalid configuration parameter?

Silently ignore it and use the upper limit (128K).

> Are there any plans to provide bigger page sizes (32 KB, 64 KB)?

I think it's worth considering for the next ODS.

> A kind of "support question", but does anyone have any idea of the effects of
> "double caching"? FB uses RAM to cache MRU pages in its own cache, and the OS
> uses RAM to hold MRU disk clusters/blocks. Wouldn't the "hot" disk blocks tend
> to be the same data as the hot database pages on a machine that runs only FB?
> (I think this is the kind of use case where one wishes FB to use massive
> amounts of RAM.)

Look at the second part of "RFC: Look-ahead disk space allocation and no use of file system cache" from Jan 24, 2007.

This second part is implemented (for Windows only) but still not committed.

Regards,
Vlad

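A minimal sketch of the clamping behaviour described above; the constant and function names are hypothetical, not taken from the Firebird sources:

    #include <algorithm>
    #include <cstddef>

    // Hypothetical ceiling: 128K (131072) page buffers.
    const std::size_t MAX_PAGE_BUFFERS = 128 * 1024;

    // An oversized setting is silently reduced to the ceiling rather than rejected.
    std::size_t effectivePageBuffers(std::size_t configured)
    {
        return std::min(configured, MAX_PAGE_BUFFERS);
    }
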
From: Leyne, S. <Se...@br...> - 2007-07-19 15:48:45
Alexandre,

> Are there any plans to provide bigger page sizes (32 KB, 64 KB)?

Actually, Mike Nordell and I tried to get those page sizes to work when we added support for 16 KB pages, but the maximum size of a page was not defined in a single place; rather, it was defined in each module/function within the codebase.

At the time (v1.0) the codebase was too much of a mess to work through, so it was decided to leave the changes for later, once the codebase had been scrubbed.

A couple of months ago I started to work through the code to make the necessary changes, but stopped when I realized that the term "page" in the codebase refers both to db pages (on disk and in the cache) and to in-memory/unrelated structures, and that I couldn't be sure I wasn't going to really screw things up.

Sean

From: Helen B. <he...@tp...> - 2007-07-19 05:12:17
At 02:12 PM 19/07/2007, Dmitry Yemanov wrote:

> the page cache size is limited to 128K pages (512MB-2GB, depending
> on the page size).

Is the 128K-page buffer size available to 2.0.x, Dmitry? (If so, I should put it in the release notes, {{{blush}}} )

Helen

From: Alexandre B. S. <ib...@th...> - 2007-07-19 05:19:17
Helen Borrie wrote:

> At 02:12 PM 19/07/2007, Dmitry Yemanov wrote:
>
>> the page cache size is limited to 128K pages (512MB-2GB, depending
>> on the page size).
>
> Is the 128K-page buffer size available to 2.0.x, Dmitry? (If so, I
> should put it in the release notes, {{{blush}}} )
>
> Helen

Helen,

I think it's already there !

Page 61:
Some General Improvements
O. Loa, D. Yemanov

• Much faster algorithms to process the dirty pages tree

Firebird 2.0 offers more efficient processing of the list of modified pages, a.k.a. the dirty pages tree. It affects all kinds of batch data modifications performed in a single transaction and eliminates the known issues with performance getting slower when using a buffer cache of >10K pages. This change also improves the overall performance of data modifications.

• Increased maximum page cache size to 128K pages (2GB for 16K page size)

see you !

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br

From: Helen B. <he...@tp...> - 2007-07-19 05:37:02
At 03:17 PM 19/07/2007, Alexandre Benson Smith wrote:

>> Is the 128K-page buffer size available to 2.0.x, Dmitry? (If so, I
>> should put it in the release notes, {{{blush}}} )
>>
>> Helen
>
> Helen,
>
> I think it's already there !
>
> Page 61:
> Some General Improvements
> O. Loa, D. Yemanov
>
> • Much faster algorithms to process the dirty pages tree
>
> Firebird 2.0 offers more efficient processing of the list of modified
> pages, a.k.a. the dirty pages tree. It affects all kinds of batch data
> modifications performed in a single transaction and eliminates the known
> issues with performance getting slower when using a buffer cache of >10K
> pages. This change also improves the overall performance of data
> modifications.
>
> • Increased maximum page cache size to 128K pages (2GB for 16K page size)

Good, thanks Alexandre. I see it is well buried... so I'll keep blushing until I highlight it in the "New in.." section.

Helen

From: Leyne, S. <Se...@br...> - 2007-07-19 16:00:01
Dmitry,

> While the 64-bit builds can use more than 2GB of memory, that is true for
> the overall memory usage only. The page cache size is limited to
> 128K pages (512MB-2GB, depending on the page size).

How was the 128K-page cache limit arrived at?

Could the limit be increased to 256K and beyond for 32-bit?

> I think we should increase the limit, at least for the 64-bit builds.
> Objections, anyone?

No. But I have a question:

Is there a point at which the performance of the new cache logic might degrade? (If we allowed the cache to be 1M pages, would that result in degraded performance?)

Sean

From: Dmitry Y. <fir...@ya...> - 2007-07-19 16:41:54
Leyne, Sean wrote:

> How was the 128K-page cache limit arrived at?

IIRC, it has always existed (at least since IB6). Originally it was 64K pages, and the limit was increased to 128K pages in v2.0.

> Could the limit be increased to 256K and beyond for 32-bit?

Yes, it could. The question is who will be responsible for too-large values (> 2 GB in total). Should we adjust the specified size to some hardcoded limit before allocating memory, or rely on the OS kernel and allocate as much as allowed (and perhaps write to firebird.log that we couldn't do better)?

> Is there a point at which the performance of the new cache logic might
> degrade?
>
> (If we allowed the cache to be 1M pages, would that result in degraded
> performance?)

It's definitely worth measuring before committing the patch into CVS.

Dmitry

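One possible shape for the "rely on the OS and log" option mentioned above, sketched with hypothetical names; the real cache startup code would be more involved:

    #include <cstddef>
    #include <new>

    const std::size_t MIN_CACHE_BYTES = 16 * 1024 * 1024;   // hypothetical lower bound

    // Try the requested cache size first; if the OS refuses, halve the request
    // until an allocation succeeds. The caller would note the adjustment in
    // firebird.log. Returns 0 if even the lower bound cannot be satisfied.
    char* allocateCache(std::size_t& bytes)
    {
        while (bytes >= MIN_CACHE_BYTES)
        {
            if (char* p = new (std::nothrow) char[bytes])
                return p;
            bytes /= 2;
        }
        return 0;
    }
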
From: Tom M. <mad...@gm...> - 2007-07-19 23:54:11
Dmitry Yemanov wrote:

> Leyne, Sean wrote:
>
>> How was the 128K-page cache limit arrived at?
>
> IIRC, it has always existed (at least since IB6). Originally it was 64K
> pages, and the limit was increased to 128K pages in v2.0.
>
>> Could the limit be increased to 256K and beyond for 32-bit?
>
> Yes, it could. The question is who will be responsible for too-large
> values (> 2 GB in total). Should we adjust the specified size to some
> hardcoded limit before allocating memory, or rely on the OS kernel and
> allocate as much as allowed (and perhaps write to firebird.log that we
> couldn't do better)?

That would be my vote.

>> Is there a point at which the performance of the new cache logic might
>> degrade?
>>
>> (If we allowed the cache to be 1M pages, would that result in degraded
>> performance?)
>
> It's definitely worth measuring before committing the patch into CVS.
>
> Dmitry

From: Alex P. <pes...@ma...> - 2007-07-20 09:33:28
On Thursday 19 July 2007 20:41, Dmitry Yemanov wrote:

>> Could the limit be increased to 256K and beyond for 32-bit?
>
> Yes, it could. The question is who will be responsible for too-large
> values (> 2 GB in total). Should we adjust the specified size to some
> hardcoded limit before allocating memory, or rely on the OS kernel and
> allocate as much as allowed (and perhaps write to firebird.log that we
> couldn't do better)?

I do not like the idea of allocating as much memory as possible for the cache. It's highly possible (or even unavoidable) that after doing so we will need some memory for other purposes, and then we hit the OOM case, which is poorly handled in the current codebase.

From: Vlad H. <hv...@us...> - 2007-07-20 10:15:24
>> Yes, it could. The question is who will be responsible for too-large
>> values (> 2 GB in total). Should we adjust the specified size to some
>> hardcoded limit before allocating memory, or rely on the OS kernel and
>> allocate as much as allowed (and perhaps write to firebird.log that we
>> couldn't do better)?
>
> I do not like the idea of allocating as much memory as possible for the
> cache. It's highly possible (or even unavoidable) that after doing so we
> will need some memory for other purposes, and then we hit the OOM case,
> which is poorly handled in the current codebase.

I already planned to reimplement the memory allocation algorithm in cch to allocate memory for page buffers when it is needed (not in one huge chunk at a time). But I have no time to do it :(

Regards,
Vlad

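A rough sketch of what growing the page cache on demand could look like, as opposed to one huge up-front allocation; all names are hypothetical and locking/eviction are ignored entirely:

    #include <cstddef>
    #include <vector>

    // Hypothetical storage class: the cache grows in slabs as buffers are
    // needed, instead of reserving one huge chunk up front.
    class PageCacheStorage
    {
    public:
        explicit PageCacheStorage(std::size_t pageSize) : m_pageSize(pageSize) {}

        ~PageCacheStorage()
        {
            for (std::size_t i = 0; i < m_slabs.size(); ++i)
                delete[] m_slabs[i];
        }

        // Called only when the cache actually needs more buffers.
        char* addSlab(std::size_t pages)
        {
            char* slab = new char[pages * m_pageSize];
            m_slabs.push_back(slab);
            return slab;
        }

    private:
        std::size_t m_pageSize;
        std::vector<char*> m_slabs;
    };
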
From: Alexandre B. S. <ib...@th...> - 2007-07-19 23:07:10
Vlad Horsun wrote:

> Look at the second part of
> "RFC: Look-ahead disk space allocation and no use of file system cache"
> from Jan 24, 2007.
>
> This second part is implemented (for Windows only) but still not committed.
>
> Regards,
> Vlad

Hi Vlad !

Thanks for the reminder, I had forgotten that part of your proposal...

As I see it, on Windows it will work exactly as I wish, but on Linux I saw it will use SYNC (works as FW = on), so the double caching will still be in place on Linux?

see you !

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br

From: Vlad H. <hv...@us...> - 2007-07-21 21:48:57
"Alexandre Benson Smith" :

> Vlad Horsun wrote:
>
>> Look at the second part of
>> "RFC: Look-ahead disk space allocation and no use of file system cache"
>> from Jan 24, 2007.
>>
>> This second part is implemented (for Windows only) but still not committed.
>>
>> Regards,
>> Vlad
>
> Hi Vlad !
>
> Thanks for the reminder, I had forgotten that part of your proposal...
>
> As I see it, on Windows it will work exactly as I wish, but on Linux I saw
> it will use SYNC (works as FW = on), so the double caching will still be
> in place on Linux?

On POSIX it will use either the O_DIRECT flag (trivial to implement) or posix_fadvise calls (need to think more on it).

I still don't know which is better, as I can't test it.

Regards,
Vlad

From: Alex P. <pes...@ma...> - 2007-07-23 08:13:18
On Sunday 22 July 2007 01:49, Vlad Horsun wrote:

> "Alexandre Benson Smith" :
>
>> Vlad Horsun wrote:
>>
>>> Look at the second part of
>>> "RFC: Look-ahead disk space allocation and no use of file system cache"
>>> from Jan 24, 2007.
>>>
>>> This second part is implemented (for Windows only) but still not
>>> committed.
>>>
>>> Regards,
>>> Vlad
>>
>> Hi Vlad !
>>
>> Thanks for the reminder, I had forgotten that part of your proposal...
>>
>> As I see it, on Windows it will work exactly as I wish, but on Linux I saw
>> it will use SYNC (works as FW = on), so the double caching will still be
>> in place on Linux?
>
> On POSIX it will use either the O_DIRECT flag (trivial to implement) or
> posix_fadvise calls (need to think more on it).
>
> I still don't know which is better, as I can't test it.

O_DIRECT looks good for raw partitions, though I'm not sure whether it will give any real effect or not. Also, it seems that we must allocate pages on a sector boundary to make it work.

As for fadvise... I do not see a good way to change behaviour on the fly in an MT system. And when it is used statically to make a file perform direct I/O, why would it be any different from O_DIRECT?

From: Vlad H. <hv...@us...> - 2007-07-23 08:32:34
>>> Thanks for the reminder, I had forgotten that part of your proposal...
>>>
>>> As I see it, on Windows it will work exactly as I wish, but on Linux I saw
>>> it will use SYNC (works as FW = on), so the double caching will still be
>>> in place on Linux?
>>
>> On POSIX it will use either the O_DIRECT flag (trivial to implement) or
>> posix_fadvise calls (need to think more on it).
>>
>> I still don't know which is better, as I can't test it.
>
> O_DIRECT looks good for raw partitions, though I'm not sure whether it will
> give any real effect or not.

AFAIU it will not use the FS cache (or will almost not use it) - exactly what we want. Am I wrong?

> Also, it seems that we must allocate pages on a sector boundary to make it
> work.

It's already done. Tested on Windows, but I guess it will work on Linux too ;)

> As for fadvise... I do not see a good way to change behaviour on the fly in
> an MT system.

I'm thinking about calling posix_fadvise(POSIX_FADV_DONTNEED and/or POSIX_FADV_NOREUSE) for every page just read, i.e. right after the read call.

Regards,
Vlad

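A minimal sketch of that idea, dropping the kernel's copy of a page right after it has been read into Firebird's own cache; the wrapper name is hypothetical and error handling is omitted:

    #include <fcntl.h>
    #include <unistd.h>

    // Read one database page, then advise the kernel that its cached copy will
    // not be reused, so the file system cache does not duplicate the page cache.
    ssize_t readPage(int fd, void* buf, size_t pageSize, off_t offset)
    {
        ssize_t n = pread(fd, buf, pageSize, offset);
        if (n > 0)
            posix_fadvise(fd, offset, pageSize, POSIX_FADV_DONTNEED);
        return n;
    }
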
From: Leyne, S. <Se...@br...> - 2007-07-23 19:21:40
Alex,

>> On POSIX it will use either the O_DIRECT flag (trivial to implement) or
>> posix_fadvise calls (need to think more on it).
>>
>> I still don't know which is better, as I can't test it.
>
> O_DIRECT looks good for raw partitions, though I'm not sure whether it will
> give any real effect or not. Also, it seems that we must allocate pages on a
> sector boundary to make it work.

That is not a problem.

Disk sectors have been 512 bytes for the last 20 years and likely for many more. Accordingly, our disk accesses will always be on sector boundaries.

Sean

From: Fred P. <fps...@ya...> - 2007-07-25 13:16:29
On 07/23/2007 12:23 PM, Leyne, Sean wrote:

> Disk sectors have been 512 bytes for the last 20 years and likely for many
> more. Accordingly, our disk accesses will always be on sector boundaries.

Hi,

A few months ago, I learned that the disk drive industry association, IDEMA, approved their new Long Block Data (LBD) sector standard. The LBD spec. increases the standard sector size from 512 bytes to 4096 bytes. The idea is that the larger sector should allow more efficient error correction, resulting in increased data densities, improved reliability (up to 10x reduction in error rates), and increased performance.

Seagate and Western Digital are supposed to have "transitional" drives available this year. These drives will support either the old 512 byte sector or the new 4096 byte sector (consisting of 8 contiguous 512 byte chunks). Sector size will be software selectable (512 or 4096). Other drive makers don't want to support both standards, so they may wait until the "transitional" period is near its end before offering LBD sector drives. The IDEMA said that most drives will support 4k sectors by 2010 and adoption of the new standard will be gradual.

My concern: the new sector standard may impact direct I/O alignment constraints for I/O buffers, transfer sizes and file offsets, depending on the implementation of direct I/O used.

Sadly, direct I/O isn't part of POSIX (or any other standard that comes to mind). Some implementations use an O_DIRECT open() flag to enable direct I/O on a file and some use other methods (e.g. Solaris' directio() library function). Each implementation may impose its own set of constraints on subsequent read/write/lseek parameters. I think I have a few notes on some of the specifics. If it will help, I can try to locate them.

I know various db "vendors" support direct I/O as a db option on Linux, using the current O_DIRECT implementation, but they had to be careful in their code to avoid certain O_DIRECT kernel bugs and race conditions. For example, it seems mixing O_DIRECT and non-O_DIRECT reads and writes on the same file simultaneously may result in stale data reads or writes. The workaround: don't do that!

I suspect that the FB's direct I/O option on linux is going to need a lot of testing. ;-)

Fred P.

From: Vlad H. <hv...@us...> - 2007-07-25 13:30:36
> Hi,
>
> A few months ago, I learned that the disk drive industry
> association, IDEMA, approved their new Long Block Data (LBD)
> sector standard. The LBD spec. increases the standard sector
> size from 512 bytes to 4096 bytes.
...
> My concern: the new sector standard may impact direct I/O
> alignment constraints for I/O buffers, transfer sizes and file
> offsets, depending on the implementation of direct I/O used.

We have not allowed a database page size of less than 4KB since FB 2.1. Old databases may still use small pages, however.

...

> I know various db "vendors" support direct I/O as a db option
> on Linux, using the current O_DIRECT implementation, but they
> had to be careful in their code to avoid certain O_DIRECT
> kernel bugs and race conditions. For example, it seems mixing
> O_DIRECT and non-O_DIRECT reads and writes on the same file
> simultaneously may result in stale data reads or writes. The
> workaround: don't do that!

posix_fadvise is free from this drawback and does not restrict us to aligned memory buffers, right?

> I suspect that the FB's direct I/O option on linux is going to
> need a lot of testing. ;-)

As it may be disabled (and will be disabled by default) I see no problem with it. As for testing - are you a taker? :)

Regards,
Vlad

From: Fred P. <fps...@ya...> - 2007-07-26 09:20:44
On 07/25/2007 06:30 AM, Vlad Horsun wrote:

>> My concern: the new sector standard may impact direct I/O
>> alignment constraints for I/O buffers, transfer sizes and file
>> offsets, depending on the implementation of direct I/O used.
>
> We have not allowed a database page size of less than 4KB since FB 2.1.
> Old databases may still use small pages, however.

Hi Vlad,

Good to know. Currently, Linux 2.6's O_DIRECT requires 512 byte alignment. So, no problem there!

However, some O_DIRECT implementations (e.g. Linux 2.4) want buffer alignment to be on a *filesystem* block size boundary and, possibly, the read/write transfer size and file offset to be a multiple of this value. So, an fs block size might be 4k, 8k, or more, depending on how the filesystem was built. E.g., AIX typically wants 4k alignment, except for JFS+big_file filesystems, which want 128k alignment.

Don't you just love non-standard feature implementations? :)

IMHO, rather than try to make FB support every OS/filesystem combination in the first direct I/O implementation, it seems reasonable to start with a popular subset and gain some experience with this new tuning feature before expanding support. Phased implementations can be a good thing.

That said, some questions remain: how smart should FB be about detecting, reporting and/or recovering from a user's attempt to use direct I/O when it isn't supported or when it encounters a read/write error downstream? Should FB silently fall back to non-direct I/O and retry the I/O op? Or something else...?

It might be nice if FB could handle this in a consistent way across all platforms.

> ...
> posix_fadvise is free from this drawback and does not restrict us to
> aligned memory buffers, right?

Correct. Also, posix_fadvise() might be useful for tuning I/O elsewhere in FB (e.g., read-ahead?), but that's another topic. :)

To be clear, posix_fadvise() is an "advisory" POSIX feature: a system can choose not to implement it (e.g. Solaris). But I suspect most POSIX vendors will support it in time.

>> I suspect that the FB's direct I/O option on linux is going to
>> need a lot of testing. ;-)
>
> As it may be disabled (and will be disabled by default) I see no
> problem with it.

Sounds good.

> As for testing - are you a taker? :)

:) Maybe. That will depend on my schedule at the time. I may be able to do some basic qualitative testing on Linux 2.6.

Fred P.

From: Vlad H. <hv...@us...> - 2007-07-26 10:37:02
>>> My concern: the new sector standard may impact direct I/O
>>> alignment constraints for I/O buffers, transfer sizes and file
>>> offsets, depending on the implementation of direct I/O used.
>>
>> We have not allowed a database page size of less than 4KB since FB 2.1.
>> Old databases may still use small pages, however.
>
> Hi Vlad,
>
> Good to know. Currently, Linux 2.6's O_DIRECT requires 512 byte
> alignment. So, no problem there!

Yes.

> However, some O_DIRECT implementations (e.g. Linux 2.4) want
> buffer alignment to be on a *filesystem* block size boundary
> and, possibly, the read/write transfer size and file offset to
> be a multiple of this value. So, an fs block size might be 4k,
> 8k, or more, depending on how the filesystem was built. E.g.,
> AIX typically wants 4k alignment, except for JFS+big_file
> filesystems, which want 128k alignment.

Very sad.

> Don't you just love non-standard feature implementations? :)

:)

> IMHO, rather than try to make FB support every OS/filesystem
> combination in the first direct I/O implementation, it seems
> reasonable to start with a popular subset and gain some
> experience with this new tuning feature before expanding
> support. Phased implementations can be a good thing.
>
> That said, some questions remain: how smart should FB be about
> detecting, reporting and/or recovering from a user's attempt
> to use direct I/O when it isn't supported or when it encounters a
> read/write error downstream? Should FB silently fall back to
> non-direct I/O and retry the I/O op? Or something else...?

Silent fallback to non-direct I/O. If direct I/O is not supported, it must be detected when FB tries to switch it on, not when real I/O is performed.

> It might be nice if FB could handle this in a consistent way
> across all platforms.

Agree.

>> ...
>> posix_fadvise is free from this drawback and does not restrict us to
>> aligned memory buffers, right?
>
> Correct. Also, posix_fadvise() might be useful for
> tuning I/O elsewhere in FB (e.g., read-ahead?), but that's
> another topic. :)

I'm thinking about it too ;)

...

>> As for testing - are you a taker? :)
>
> :) Maybe. That will depend on my schedule at the time. I may
> be able to do some basic qualitative testing on Linux 2.6.

Tell me when (if) you're ready and if you'll need any help (with the FB sources) from my side.

Regards,
Vlad

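A sketch of the "detect when switching it on, fall back silently" behaviour; Linux-style O_DIRECT is assumed, names are hypothetical and error handling is trimmed:

    #define _GNU_SOURCE            // O_DIRECT is a GNU extension on Linux
    #include <fcntl.h>
    #include <unistd.h>

    // Try direct I/O when the database file is opened; if the file system
    // refuses, fall back silently to a normal open, so later page I/O never
    // hits the unsupported path.
    int openDatabase(const char* path, bool& directIO)
    {
        int fd = open(path, O_RDWR | O_DIRECT);
        directIO = (fd >= 0);
        if (fd < 0)
            fd = open(path, O_RDWR);
        return fd;
    }
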
From: Fred P. <fps...@ya...> - 2007-07-26 12:26:18
On 07/26/2007 03:37 AM, Vlad Horsun wrote:

>> ...E.g.,
>> AIX typically wants 4k alignment, except for JFS+big_file
>> filesystems, which want 128k alignment.

Oops!! Bad example. I misread my notes. The alignment is 4k for an AIX JFS+big_file filesystem, and the file offset and transfer sizes should be multiples of 128k. Guess it's time for sleep! ;-)

...

> Silent fallback to non-direct I/O.

Sounds good.

> If direct I/O is not supported, it must be detected when FB tries to
> switch it on, not when real I/O is performed.

If you decide to use O_DIRECT, the open() call may report that O_DIRECT isn't supported on that file. But, at that point, the OS won't know what I/O alignment you'll be using later. So, a downstream read/write may still fail, on some implementations, if parameter alignments aren't right.

Hmmm. I suppose you could check I/O alignments yourself. For example, fstat() the file descriptor after the open() and get that file's filesystem block size.

...

> Tell me when (if) you're ready and if you'll need any help (with the FB
> sources) from my side.

I'm not sure when I can test. Are your source changes checked into CVS or available for download?

Fred P.

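A small sketch of that alignment check; whether st_blksize is the right granularity for a particular direct I/O implementation is an assumption, not a guarantee:

    #include <sys/stat.h>

    // After open(), ask for the file system's preferred block size so buffer
    // alignment, transfer sizes and file offsets can be validated against it.
    long fsBlockSize(int fd)
    {
        struct stat st;
        if (fstat(fd, &st) != 0)
            return -1;
        return static_cast<long>(st.st_blksize);
    }
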
From: Alex P. <pes...@ma...> - 2007-07-24 06:36:37
On Monday 23 July 2007 23:23, Leyne, Sean wrote:

>> Also, it seems that we must allocate pages on a sector boundary to make it
>> work.
>
> That is not a problem.
>
> Disk sectors have been 512 bytes for the last 20 years and likely for many
> more. Accordingly, our disk accesses will always be on sector boundaries.

Sean, the problem is that not all of our buffers are 512-byte aligned now.

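For reference, a sketch of allocating the page buffers with an explicit alignment via posix_memalign; the actual Firebird allocator may well do this differently:

    #include <stdlib.h>

    // Allocate one page buffer aligned strictly enough for O_DIRECT; 4096 covers
    // both classic 512-byte sectors and the 4KB LBD sectors discussed earlier.
    void* allocAlignedPage(size_t pageSize)
    {
        void* buf = 0;
        if (posix_memalign(&buf, 4096, pageSize) != 0)
            return 0;
        return buf;
    }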