Thread: [Linux-NTFS-Dev] [RFC 00/32] making inode time stamps y2038 ready
Development moved to https://sourceforge.net/projects/ntfs-3g/
From: Arnd B. <ar...@ar...> - 2014-05-30 20:07:54
|
Based on the recent discussion about 64-bit time_t for new architectures, and about solving the year 2038 problem in general, I decided to try out what it would take to solve part of the kernel side of things.

This is proof-of-concept work to get us to the point where two system calls (utimes and stat) provide a working interface for user space to pass 64-bit inode time stamps in and out of the kernel, all the way to the file systems. I picked this because it is a fairly isolated problem, as the inode time stamps are rarely assigned to any other time values. As a byproduct of this work, I documented for each of the file systems we support how long the on-disk format can work[1].

Obviously we also need to convert all the other syscalls and have a proper libc implementation using them for this to be really useful, but it's a start, and it can be tested independently (I haven't tested it so far; I wanted to wait for initial feedback first). All the interesting stuff is in the first five patches here; the rest is the straightforward conversion of all file systems that use 'timespec' values internally.

There are of course a number of open questions:

a) Is this the right approach in general? The previous discussion pointed this way, but there may be other opinions.

b) What type should we use internally to represent inode time stamps? The code contains three different versions that would all work; we just have to pick a good tradeoff between efficiency and the range of times we want to cover.

c) Should we continue this way for all 32-bit platforms for consistency, including future ones, or should we go to different 64-bit types right away? My feeling is that the second approach would complicate this work.
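Question b) leaves the internal representation open. As a rough illustration only, one candidate layout pairs a 64-bit seconds count with a 32-bit nanoseconds field. The sketch below is a userspace stand-in under that assumption: the struct layout and the helper bodies are hypothetical, and only the names inode_time_equal() and inode_time_trunc() are taken from the ntfs patch in this thread, where they replace timespec_equal() and timespec_trunc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical layout: 64-bit seconds keep inode times valid far past 2038
 * even on 32-bit CPUs. One candidate, not the patch's actual definition. */
struct inode_time {
	int64_t tv_sec;
	int32_t tv_nsec;	/* 0 .. NSEC_PER_SEC - 1 */
};

/* Same semantics as timespec_equal(): both fields must match. */
static inline bool inode_time_equal(const struct inode_time *a,
				    const struct inode_time *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/* Round the nanoseconds down to the file system granularity 'gran' (in ns),
 * modeled on timespec_trunc(). The kernel version WARNs on an illegal
 * granularity; this sketch asserts instead. */
static inline struct inode_time inode_time_trunc(struct inode_time t,
						 uint32_t gran)
{
	assert(gran >= 1 && gran <= NSEC_PER_SEC);
	if (gran == 1) {
		/* nanosecond resolution: nothing to do */
	} else if (gran == NSEC_PER_SEC) {
		t.tv_nsec = 0;
	} else {
		t.tv_nsec -= t.tv_nsec % (int32_t)gran;
	}
	return t;
}
```

Keeping the seconds and nanoseconds split (rather than a single 64-bit nanosecond count) avoids a 64-bit division on every conversion, at the cost of a larger struct.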
	Arnd

[1] http://kernelnewbies.org/y2038

Arnd Bergmann (32):
  fs: introduce new 'struct inode_time'
  uapi: add struct __kernel_timespec{32,64}
  fs: introduce sys_utimens64at
  fs: introduce sys_newfstat64/sys_newfstatat64
  arch: hook up new stat and utimes syscalls
  isofs: fix timestamps beyond 2027
  fs/nfs: convert to struct inode_time
  fs/ceph: convert to 'struct inode_time'
  fs/pstore: convert to struct inode_time
  fs/coda: convert to struct inode_time
  xfs: convert to struct inode_time
  btrfs: convert to struct inode_time
  ext3: convert to struct inode_time
  ext4: convert to struct inode_time
  cifs: convert to struct inode_time
  ntfs: convert to struct inode_time
  ubifs: convert to struct inode_time
  ocfs2: convert to struct inode_time
  fs/fat: convert to struct inode_time
  afs: convert to struct inode_time
  udf: convert to struct inode_time
  fs: convert simple fs to inode_time
  logfs: convert to struct inode_time
  hfs, hfsplus: convert to struct inode_time
  gfs2: convert to struct inode_time
  reiserfs: convert to struct inode_time
  jffs2: convert to struct inode_time
  adfs: convert to struct inode_time
  f2fs: convert to struct inode_time
  fuse: convert to struct inode_time
  scsi: fnic: use current_kernel_time() for timestamp
  fs: use new inode_time definition unconditionally

 arch/alpha/kernel/osf_sys.c | 2 +-
 arch/arm/include/asm/unistd.h | 2 +-
 arch/arm/include/uapi/asm/stat.h | 25 +++++++++++++++++
 arch/arm/include/uapi/asm/unistd.h | 3 +++
 arch/arm/kernel/calls.S | 3 +++
 arch/arm64/include/asm/unistd32.h | 5 +++-
 arch/x86/include/uapi/asm/stat.h | 28 +++++++++++++++++++
 arch/x86/syscalls/syscall_32.tbl | 3 +++
 drivers/block/rbd.c | 2 +-
 drivers/firmware/efi/efi-pstore.c | 28 +++++++++---------
 drivers/scsi/fnic/fnic_trace.c | 2 +-
 drivers/tty/tty_io.c | 2 +-
 drivers/usb/gadget/f_fs.c | 2 +-
 fs/adfs/inode.c | 4 +--
 fs/afs/afs.h | 6 ++---
 fs/afs/fsclient.c | 2 +-
 fs/attr.c | 8 +++---
 fs/btrfs/file.c | 6 ++---
 fs/btrfs/inode.c | 4 +--
 fs/btrfs/ioctl.c | 4 +--
 fs/btrfs/root-tree.c | 2 +-
 fs/btrfs/transaction.c | 2 +-
 fs/ceph/cache.c | 2 +-
 fs/ceph/caps.c | 6 ++---
 fs/ceph/file.c | 4 +--
 fs/ceph/inode.c | 20 +++++++-------
 fs/ceph/super.h | 8 +++---
 fs/cifs/cache.c | 6 ++---
 fs/cifs/cifsglob.h | 6 ++---
 fs/cifs/cifsproto.h | 6 ++---
 fs/cifs/cifssmb.c | 5 ++--
 fs/cifs/inode.c | 2 +-
 fs/cifs/netmisc.c | 15 ++++++-----
 fs/coda/coda_linux.c | 18 ++++++++-----
 fs/compat.c | 19 ++-----------
 fs/configfs/inode.c | 6 ++---
 fs/cramfs/inode.c | 2 +-
 fs/ext3/inode.c | 4 +--
 fs/ext4/ext4.h | 10 +++----
 fs/ext4/extents.c | 2 +-
 fs/f2fs/file.c | 6 ++---
 fs/fat/dir.c | 2 +-
 fs/fat/fat.h | 6 ++---
 fs/fat/misc.c | 4 +--
 fs/fat/namei_msdos.c | 8 +++---
 fs/fat/namei_vfat.c | 10 +++----
 fs/fuse/inode.c | 6 ++---
 fs/gfs2/dir.c | 6 ++---
 fs/gfs2/glops.c | 4 +--
 fs/hfs/hfs_fs.h | 2 +-
 fs/hfsplus/hfsplus_fs.h | 2 +-
 fs/inode.c | 18 ++++++-------
 fs/isofs/util.c | 2 +-
 fs/jffs2/os-linux.h | 2 +-
 fs/locks.c | 4 +--
 fs/logfs/readwrite.c | 18 ++++++-------
 fs/nfs/callback.h | 4 +--
 fs/nfs/callback_xdr.c | 6 ++---
 fs/nfs/file.c | 2 +-
 fs/nfs/fscache-index.c | 8 +++---
 fs/nfs/inode.c | 10 +++----
 fs/nfs/internal.h | 4 +--
 fs/nfs/netns.h | 2 +-
 fs/nfs/nfs2xdr.c | 8 +++---
 fs/nfs/nfs3xdr.c | 10 +++----
 fs/nfs/nfs4xdr.c | 20 +++++++-------
 fs/nfsd/nfs3xdr.c | 6 ++---
 fs/nfsd/nfsfh.h | 4 +--
 fs/nfsd/nfsxdr.c | 2 +-
 fs/ntfs/inode.c | 12 ++++-----
 fs/ntfs/time.h | 8 +++---
 fs/ocfs2/dlmglue.c | 16 +++++------
 fs/ocfs2/file.c | 6 ++---
 fs/ocfs2/ocfs2.h | 2 +-
 fs/pstore/inode.c | 2 +-
 fs/pstore/internal.h | 2 +-
 fs/pstore/platform.c | 2 +-
 fs/pstore/ram.c | 18 +++++++------
 fs/reiserfs/namei.c | 2 +-
 fs/reiserfs/xattr.c | 4 +--
 fs/stat.c | 55 ++++++++++++++++++++++++++++++++++++++
 fs/ubifs/dir.c | 2 +-
 fs/ubifs/file.c | 16 +++++------
 fs/ubifs/misc.h | 2 +-
 fs/udf/udf_i.h | 2 +-
 fs/udf/udf_sb.h | 2 +-
 fs/udf/udfdecl.h | 7 ++---
 fs/udf/udftime.c | 7 ++---
 fs/utimes.c | 47 +++++++++++++++++++++++++++-----
 fs/xfs/time.h | 4 +--
 fs/xfs/xfs_inode.c | 2 +-
 fs/xfs/xfs_iops.c | 2 +-
 fs/xfs/xfs_trans_inode.c | 6 ++---
 include/linux/ceph/decode.h | 8 +++---
 include/linux/ceph/osd_client.h | 4 +--
 include/linux/compat.h | 2 +-
 include/linux/fs.h | 32 +++++++++++-----------
 include/linux/nfs_fs_sb.h | 2 +-
 include/linux/nfs_xdr.h | 14 +++++-----
 include/linux/pstore.h | 4 +--
 include/linux/stat.h | 6 ++---
 include/linux/syscalls.h | 9 ++++++-
 include/linux/time.h | 44 +++++++++++++++++++++++++++---
 include/uapi/asm-generic/stat.h | 29 ++++++++++++++++++--
 include/uapi/asm-generic/unistd.h | 8 +++++-
 include/uapi/linux/coda.h | 1 +
 include/uapi/linux/time.h | 40 ++++++++++++++++++++++++++-
 init/initramfs.c | 2 +-
 kernel/audit.c | 2 +-
 kernel/auditsc.c | 2 +-
 kernel/time.c | 44 +++++++++++++++++++++++++-----
 kernel/time/timekeeping.c | 16 +++++++++++
 net/ceph/auth_x.c | 2 +-
 net/ceph/osd_client.c | 4 +--
 114 files changed, 642 insertions(+), 333 deletions(-)

-- 
1.8.3.2

Bcc: "J. Bruce Fields" <bf...@fi...>
Bcc: "Theodore Ts'o" <ty...@mi...>
Bcc: Adrian Hunter <adr...@in...>
Bcc: Andreas Dilger <adi...@di...>
Bcc: Andrew Morton <ak...@li...>
Bcc: Anton Altaparmakov <an...@tu...>
Bcc: Anton Vorontsov <an...@en...>
Bcc: Artem Bityutskiy <ded...@gm...>
Bcc: Brian Uchino <bu...@ci...>
Bcc: Chris Mason <cl...@fb...>
Bcc: Colin Cross <cc...@an...>
Bcc: Dave Chinner <da...@fr...>
Bcc: David Howells <dho...@re...>
Bcc: David Woodhouse <dw...@in...>
Bcc: Greg Kroah-Hartman <gr...@li...>
Bcc: Hiral Patel <hir...@ci...>
Bcc: Jaegeuk Kim <jae...@sa...>
Bcc: Jan Harkes <jah...@cs...>
Bcc: Jan Kara <ja...@su...>
Bcc: Joel Becker <jl...@ev...>
Bcc: Joern Engel <jo...@lo...>
Bcc: Josef Bacik <jb...@fb...>
Bcc: Kees Cook <kee...@ch...>
Bcc: Mark Fasheh <mf...@su...>
Bcc: Miklos Szeredi <mi...@sz...>
Bcc: OGAWA Hirofumi <hir...@ma...>
Bcc: Prasad Joshi <pra...@gm...>
Bcc: Sage Weil <sa...@in...>
Bcc: Steve French <sf...@sa...>
Bcc: Steven Whitehouse <swh...@re...>
Bcc: Suma Ramars <sr...@ci...>
Bcc: Tony Luck <ton...@in...>
Cc: cep...@vg...
Cc: clu...@re...
Cc: co...@cs...
Cc: cod...@co...
Cc: fus...@li...
Cc: lin...@li...
Cc: lin...@vg...
Cc: lin...@vg...
Cc: lin...@vg...
Cc: lin...@li...
Cc: lin...@li...
Cc: lin...@vg...
Cc: lin...@li...
Cc: lin...@vg...
Cc: lo...@lo...
Cc: ocf...@os...
Cc: rei...@vg...
Cc: sam...@li...
Cc: xf...@os... |
From: Arnd B. <ar...@ar...> - 2014-05-30 20:06:02
|
ntfs uses 64-bit integers for inode timestamps, which will work for thousands of years, but the VFS uses struct timespec for timestamps, which is only good until 2038 on 32-bit CPUs.

This gets us one small step closer to lifting the VFS limit by using struct inode_time in ntfs.

Signed-off-by: Arnd Bergmann <ar...@ar...>
Cc: Anton Altaparmakov <an...@tu...>
Cc: lin...@li...
---
 fs/ntfs/inode.c | 12 ++++++------
 fs/ntfs/time.h  |  8 ++++----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
index f47af5e..8f7cba5 100644
--- a/fs/ntfs/inode.c
+++ b/fs/ntfs/inode.c
@@ -2811,11 +2811,11 @@ done:
 	 * for real.
 	 */
 	if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
-		struct timespec now = current_fs_time(VFS_I(base_ni)->i_sb);
+		struct inode_time now = current_fs_time(VFS_I(base_ni)->i_sb);
 		int sync_it = 0;
 
-		if (!timespec_equal(&VFS_I(base_ni)->i_mtime, &now) ||
-		    !timespec_equal(&VFS_I(base_ni)->i_ctime, &now))
+		if (!inode_time_equal(&VFS_I(base_ni)->i_mtime, &now) ||
+		    !inode_time_equal(&VFS_I(base_ni)->i_ctime, &now))
 			sync_it = 1;
 		VFS_I(base_ni)->i_mtime = now;
 		VFS_I(base_ni)->i_ctime = now;
@@ -2930,13 +2930,13 @@ int ntfs_setattr(struct dentry *dentry, struct iattr *attr)
 		}
 	}
 	if (ia_valid & ATTR_ATIME)
-		vi->i_atime = timespec_trunc(attr->ia_atime,
+		vi->i_atime = inode_time_trunc(attr->ia_atime,
 				vi->i_sb->s_time_gran);
 	if (ia_valid & ATTR_MTIME)
-		vi->i_mtime = timespec_trunc(attr->ia_mtime,
+		vi->i_mtime = inode_time_trunc(attr->ia_mtime,
 				vi->i_sb->s_time_gran);
 	if (ia_valid & ATTR_CTIME)
-		vi->i_ctime = timespec_trunc(attr->ia_ctime,
+		vi->i_ctime = inode_time_trunc(attr->ia_ctime,
 				vi->i_sb->s_time_gran);
 	mark_inode_dirty(vi);
 out:
diff --git a/fs/ntfs/time.h b/fs/ntfs/time.h
index 0123398..2c8d325 100644
--- a/fs/ntfs/time.h
+++ b/fs/ntfs/time.h
@@ -45,7 +45,7 @@
  * measured as the number of 100-nano-second intervals since 1st January 1601,
  * 00:00:00 UTC.
  */
-static inline sle64 utc2ntfs(const struct timespec ts)
+static inline sle64 utc2ntfs(const struct inode_time ts)
 {
 	/*
 	 * Convert the seconds to 100ns intervals, add the nano-seconds
@@ -63,7 +63,7 @@ static inline sle64 utc2ntfs(const struct timespec ts)
  */
 static inline sle64 get_current_ntfs_time(void)
 {
-	return utc2ntfs(current_kernel_time());
+	return utc2ntfs(CURRENT_TIME);
 }
 
 /**
@@ -82,9 +82,9 @@ static inline sle64 get_current_ntfs_time(void)
  * measured as the number of 100 nano-second intervals since 1st January 1601,
  * 00:00:00 UTC.
  */
-static inline struct timespec ntfs2utc(const sle64 time)
+static inline struct inode_time ntfs2utc(const sle64 time)
 {
-	struct timespec ts;
+	struct inode_time ts;
 
 	/* Subtract the NTFS time offset. */
 	u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
-- 
1.8.3.2 |
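The time.h conversions are plain integer arithmetic: NTFS counts 100 ns intervals since 1601-01-01 00:00:00 UTC, so once the VFS side carries 64-bit seconds, nothing in the conversion itself stops at 2038. The following is a userspace sketch of both directions, assuming a hypothetical struct with 64-bit tv_sec; the names ntfs_demo_time, demo_utc2ntfs() and demo_ntfs2utc() are illustrative, and the sketch leaves out the kernel's sle64 little-endian handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode time with 64-bit seconds. */
struct ntfs_demo_time {
	int64_t tv_sec;		/* seconds since 1970-01-01 UTC */
	int32_t tv_nsec;	/* 0 .. 999999999 */
};

/* 1601-01-01 to 1970-01-01: 369 years containing 89 leap days. */
#define DEMO_EPOCH_DELTA_SEC	((int64_t)(369 * 365 + 89) * 24 * 3600)
#define DEMO_TICKS_PER_SEC	10000000LL	/* 100 ns ticks per second */

/* Unix time -> NTFS time, in host byte order. */
static inline int64_t demo_utc2ntfs(struct ntfs_demo_time ts)
{
	return (ts.tv_sec + DEMO_EPOCH_DELTA_SEC) * DEMO_TICKS_PER_SEC +
	       ts.tv_nsec / 100;
}

/* NTFS time -> Unix time. Dividing the non-negative tick count before
 * shifting the epoch keeps tv_nsec non-negative for pre-1970 dates. */
static inline struct ntfs_demo_time demo_ntfs2utc(int64_t ntfs)
{
	struct ntfs_demo_time ts;

	assert(ntfs >= 0);	/* this sketch only handles dates from 1601 on */
	ts.tv_sec = ntfs / DEMO_TICKS_PER_SEC - DEMO_EPOCH_DELTA_SEC;
	ts.tv_nsec = (int32_t)(ntfs % DEMO_TICKS_PER_SEC) * 100;
	return ts;
}
```

A timestamp one second past the 32-bit limit (tv_sec = 2^31) round-trips unchanged. With a signed 64-bit tick count, the NTFS format itself runs out around the year 30828, which is the "thousands of years" from the commit message.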
From: Richard C. <ric...@gm...> - 2014-05-31 14:51:57
|
On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
> 
> I picked this because it is a fairly isolated problem, as the
> inode time stamps are rarely assigned to any other time values.
> As a byproduct of this work, I documented for each of the file
> systems we support how long the on-disk format can work[1].

Why are some of the time stamp expiration dates marked as "never"?

Thanks,
Richard |
From: Vyacheslav D. <sl...@du...> - 2014-05-31 14:58:00
|
Hi Arnd,

On Fri, 2014-05-30 at 22:01 +0200, Arnd Bergmann wrote:

[snip]
> 
> Arnd Bergmann (32):
>   fs: introduce new 'struct inode_time'
>   uapi: add struct __kernel_timespec{32,64}
>   fs: introduce sys_utimens64at
>   fs: introduce sys_newfstat64/sys_newfstatat64
>   arch: hook up new stat and utimes syscalls
>   isofs: fix timestamps beyond 2027
>   fs/nfs: convert to struct inode_time
>   fs/ceph: convert to 'struct inode_time'
>   fs/pstore: convert to struct inode_time
>   fs/coda: convert to struct inode_time
>   xfs: convert to struct inode_time
>   btrfs: convert to struct inode_time
>   ext3: convert to struct inode_time
>   ext4: convert to struct inode_time
>   cifs: convert to struct inode_time
>   ntfs: convert to struct inode_time
>   ubifs: convert to struct inode_time
>   ocfs2: convert to struct inode_time
>   fs/fat: convert to struct inode_time
>   afs: convert to struct inode_time
>   udf: convert to struct inode_time
>   fs: convert simple fs to inode_time
>   logfs: convert to struct inode_time
>   hfs, hfsplus: convert to struct inode_time
>   gfs2: convert to struct inode_time
>   reiserfs: convert to struct inode_time
>   jffs2: convert to struct inode_time
>   adfs: convert to struct inode_time
>   f2fs: convert to struct inode_time
>   fuse: convert to struct inode_time
>   scsi: fnic: use current_kernel_time() for timestamp
>   fs: use new inode_time definition unconditionally
> 

By the way, what about NILFS2? Is NILFS2 ready for the suggested approach
without any changes?

Thanks,
Vyacheslav Dubeyko. |
From: Arnd B. <ar...@ar...> - 2014-06-03 12:22:19
|
On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote:
> By the way, what about NILFS2? Is NILFS2 ready for the suggested approach
> without any changes?

nilfs2 and a lot of other file systems don't need any changes for this, because they don't assign the inode time stamp fields to a 'struct timespec'.

FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and can represent the full range of user space timespec on all machines.

	Arnd |
From: Arnd B. <ar...@ar...> - 2014-05-31 15:25:32
|
On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
> > 
> > I picked this because it is a fairly isolated problem, as the
> > inode time stamps are rarely assigned to any other time values.
> > As a byproduct of this work, I documented for each of the file
> > systems we support how long the on-disk format can work[1].
> 
> Why are some of the time stamp expiration dates marked as "never"?

It's an approximation:
with 64-bit timestamps, you can represent close to 300 billion
years, which is way past the time that our planet can sustain
life of any form[1].

	Arnd

[1] http://en.wikipedia.org/wiki/Timeline_of_the_far_future |
From: Geert U. <ge...@li...> - 2014-05-31 16:20:51
|
On Sat, May 31, 2014 at 5:23 PM, Arnd Bergmann <ar...@ar...> wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
>> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
>> > I picked this because it is a fairly isolated problem, as the
>> > inode time stamps are rarely assigned to any other time values.
>> > As a byproduct of this work, I documented for each of the file
>> > systems we support how long the on-disk format can work[1].
>>
>> Why are some of the time stamp expiration dates marked as "never"?
>
> It's an approximation:
> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].

FWIW, the 48-bit second limit of befs, marked "never", is reached sooner than the 32-bit day limit of affs, marked as Y11760870.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@li...

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds |
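Geert's comparison can be checked with a little arithmetic. The sketch below computes how many years an unsigned counter of a given width covers when each tick lasts a given number of seconds, using the average Gregorian year of 365.2425 days. Treating both fields as unsigned, and the helper name counter_years(), are assumptions for illustration; each format's epoch offset is not added.

```c
#include <assert.h>
#include <stdint.h>

/* Average Gregorian year: 365.2425 days = 31556952 seconds. */
#define SECS_PER_YEAR	31556952ULL
#define SECS_PER_DAY	86400ULL

/* Approximate whole years covered by an unsigned counter of 'bits' bits
 * where each tick lasts 'tick_secs' seconds. */
static inline uint64_t counter_years(unsigned bits, uint64_t tick_secs)
{
	assert(bits < 64);
	assert(tick_secs <= (UINT64_MAX >> bits));	/* product must fit */
	return ((UINT64_C(1) << bits) * tick_secs) / SECS_PER_YEAR;
}
```

A 48-bit seconds counter covers roughly 8.9 million years, while a 32-bit day counter covers roughly 11.76 million years, so the "never" entry does expire first. If the 48-bit field were treated as signed, its range would halve again.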
From: Richard C. <ric...@gm...> - 2014-06-01 04:45:20
|
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> > 
> > Why are some of the time stamp expiration dates marked as "never"?
> 
> It's an approximation:

Also, the term "never" might mean using arbitrarily long integers, as in ASN.1.

Thanks,
Richard |
From: Richard C. <ric...@gm...> - 2014-05-31 18:23:18
|
On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
> 
> It's an approximation:

(Approximately never ;)

> with 64-bit timestamps, you can represent close to 300 billion
> years, which is way past the time that our planet can sustain
> life of any form[1].

Did you mean 64 bits worth of seconds?

  2^64 / (3600*24*365) = 584,942,417,355

That is more than 300 billion years, and still, it is not quite the
same as "never".

In any case, that term is not too helpful in the comparison table,
IMHO. One could think that some sort of clever running count relative
to the last mount time was implied.

Thanks,
Richard

[1] You are forgetting the immortal robotic overlords. |
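Arnd's "close to 300 billion years" and Richard's 584 billion differ only by signedness: a signed 64-bit seconds counter (hpa's answer in this thread) has one bit fewer for positive values. A small sketch using Richard's 365-day year; the helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Richard's calculation uses 365-day years. */
#define SECS_PER_365_DAY_YEAR	(3600ULL * 24 * 365)

/* Whole years until a seconds counter of 'bits' bits overflows. A signed
 * counter can only count up to 2^(bits-1) - 1, an unsigned one to 2^bits - 1. */
static inline uint64_t seconds_counter_years(unsigned bits, int is_signed)
{
	unsigned value_bits;
	uint64_t max;

	assert(bits >= 2 && bits <= 64);
	value_bits = is_signed ? bits - 1 : bits;
	max = (value_bits == 64) ? UINT64_MAX
				 : (UINT64_C(1) << value_bits) - 1;
	return max / SECS_PER_365_DAY_YEAR;
}
```

seconds_counter_years(32, 1) gives 68, hence 1970 + 68 = 2038. seconds_counter_years(64, 0) reproduces Richard's 584,942,417,355, and seconds_counter_years(64, 1) gives 292,471,208,677, Arnd's "close to 300 billion".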
From: H. P. A. <hp...@zy...> - 2014-05-31 19:36:34
|
Typically they are using 64-bit signed seconds.

On May 31, 2014 11:22:37 AM PDT, Richard Cochran <ric...@gm...> wrote:
>On Sat, May 31, 2014 at 05:23:02PM +0200, Arnd Bergmann wrote:
>> 
>> It's an approximation:
>
>(Approximately never ;)
>
>> with 64-bit timestamps, you can represent close to 300 billion
>> years, which is way past the time that our planet can sustain
>> life of any form[1].
>
>Did you mean 64 bits worth of seconds?
>
>  2^64 / (3600*24*365) = 584,942,417,355
>
>That is more than 300 billion years, and still, it is not quite the
>same as "never".
>
>In any case, that term is not too helpful in the comparison table,
>IMHO. One could think that some sort of clever running count relative
>to the last mount time was implied.
>
>Thanks,
>Richard
>
>[1] You are forgetting the immortal robotic overlords.

-- 
Sent from my mobile phone. Please pardon brevity and lack of formatting. |
From: Richard C. <ric...@gm...> - 2014-06-01 04:46:49
|
On Sat, May 31, 2014 at 12:34:12PM -0700, H. Peter Anvin wrote:
> Typically they are using 64-bit signed seconds.

Okay, that is what I wanted to know.

Thanks,
Richard |
From: Joseph S. M. <jo...@co...> - 2014-06-02 14:12:52
|
On Fri, 30 May 2014, Arnd Bergmann wrote:

> a) is this the right approach in general? The previous discussion
> pointed this way, but there may be other opinions.

The syscall changes seem like the sort of thing I'd expect, although
patches adding new syscalls or otherwise affecting the kernel/userspace
interface (as opposed to those relating to an individual filesystem)
should go to linux-api as well as other relevant lists.

-- 
Joseph S. Myers
jo...@co... |
From: Arnd B. <ar...@ar...> - 2014-06-02 19:21:11
|
On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> On Fri, 30 May 2014, Arnd Bergmann wrote:
> 
> > a) is this the right approach in general? The previous discussion
> > pointed this way, but there may be other opinions.
> 
> The syscall changes seem like the sort of thing I'd expect, although
> patches adding new syscalls or otherwise affecting the kernel/userspace
> interface (as opposed to those relating to an individual filesystem)
> should go to linux-api as well as other relevant lists.

Ok. Sorry about missing linux-api, I confused it with linux-arch, which
may not be as relevant here, except for the one question whether we
actually want to have the new ABI on all 32-bit architectures or only
as an opt-in for those that expect to stay around for another 24 years.

Two more questions for you:

- are you (and others) happy with adding this type of stat syscall
  (fstatat64/fstat64) as opposed to the more generic xstat that has
  been discussed in the past and that never made it through the
  bike-shedding discussion?

- once we have enough buy-in from reviewers to merge this initial
  series, should we proceed to define the rest of the syscall ABI
  (minus driver ioctls) so glibc and kernel can do the conversion
  on top of that, or should we better try to do things one syscall
  family at a time and actually get the kernel to handle them
  correctly internally?

	Arnd |
From: H. P. A. <hp...@zy...> - 2014-06-02 19:27:47
|
On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
>> On Fri, 30 May 2014, Arnd Bergmann wrote:
>>
>>> a) is this the right approach in general? The previous discussion
>>> pointed this way, but there may be other opinions.
>>
>> The syscall changes seem like the sort of thing I'd expect, although
>> patches adding new syscalls or otherwise affecting the kernel/userspace
>> interface (as opposed to those relating to an individual filesystem)
>> should go to linux-api as well as other relevant lists.
>
> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.
>
> Two more questions for you:
>
> - are you (and others) happy with adding this type of stat syscall
>   (fstatat64/fstat64) as opposed to the more generic xstat that has
>   been discussed in the past and that never made it through the
>   bike-shedding discussion?
>
> - once we have enough buy-in from reviewers to merge this initial
>   series, should we proceed to define the rest of the syscall ABI
>   (minus driver ioctls) so glibc and kernel can do the conversion
>   on top of that, or should we better try to do things one syscall
>   family at a time and actually get the kernel to handle them
>   correctly internally?
>

The bit that is really going to hurt is every single ioctl that uses a
timespec.

Honestly, though, I really don't understand the point with "struct
inode_time". It seems like the zeroeth-order thing is to change the
kernel internal version of struct timespec to have a 64-bit time... it
isn't just about inodes. We then should be explicit about the external
uses of time, and use accessors.

	-hpa |
From: Arnd B. <ar...@ar...> - 2014-06-02 19:58:18
|
On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote:
> On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> >> On Fri, 30 May 2014, Arnd Bergmann wrote:
> >> 
> >>> a) is this the right approach in general? The previous discussion
> >>> pointed this way, but there may be other opinions.
> >> 
> >> The syscall changes seem like the sort of thing I'd expect, although
> >> patches adding new syscalls or otherwise affecting the kernel/userspace
> >> interface (as opposed to those relating to an individual filesystem)
> >> should go to linux-api as well as other relevant lists.
> > 
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> > 
> > Two more questions for you:
> > 
> > - are you (and others) happy with adding this type of stat syscall
> >   (fstatat64/fstat64) as opposed to the more generic xstat that has
> >   been discussed in the past and that never made it through the
> >   bike-shedding discussion?
> > 
> > - once we have enough buy-in from reviewers to merge this initial
> >   series, should we proceed to define the rest of the syscall ABI
> >   (minus driver ioctls) so glibc and kernel can do the conversion
> >   on top of that, or should we better try to do things one syscall
> >   family at a time and actually get the kernel to handle them
> >   correctly internally?
> > 
> 
> The bit that is really going to hurt is every single ioctl that uses a
> timespec.
> 
> Honestly, though, I really don't understand the point with "struct
> inode_time". It seems like the zeroeth-order thing is to change the
> kernel internal version of struct timespec to have a 64-bit time... it
> isn't just about inodes. We then should be explicit about the external
> uses of time, and use accessors.

I picked these because they are fairly isolated from all other uses,
in particular since inode times are the only things where we really
care about times in the distant past or future (decades away as opposed
to things that happened between boot and shutdown).

For other kernel-internal uses, we may be better off migrating to
a completely different representation, such as nanoseconds since
boot or the architecture specific ktime_t, but this is really something
to decide for each subsystem.

I just tried building an arm32 kernel with an s64 time_t, and that failed horribly: I get linker errors for missing 64-bit divides and lots of warnings for code that passes time_t pointers to functions expecting a 'long' pointer, or vice versa. I also think the only way to maintain ABI compatibility is to separate the internal uses from the interface, which means auditing all code in the end.

	Arnd |
From: Joseph S. M. <jo...@co...> - 2014-06-02 21:02:41
|
On Mon, 2 Jun 2014, Arnd Bergmann wrote:

> Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> may not be as relevant here, except for the one question whether we
> actually want to have the new ABI on all 32-bit architectures or only
> as an opt-in for those that expect to stay around for another 24 years.

For glibc I think it will make the most sense to add the support for
64-bit time_t across all architectures that currently have 32-bit time_t
(with the new interfaces having fallback support to implementation in
terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
unavailable either at runtime or in the kernel headers against which glibc
is compiled - this fallback code will of course need to check for overflow
when passing a time value to the kernel, hopefully with error handling
consistent with whatever the kernel ends up doing when a filesystem can't
support a timestamp). If some architectures don't provide the new
interfaces in the kernel then that will mean the fallback code in glibc
can't be removed until glibc support for those architectures is removed
(as opposed to removing it when glibc no longer supports kernels predating
the kernel support).

> Two more questions for you:
> 
> - are you (and others) happy with adding this type of stat syscall
>   (fstatat64/fstat64) as opposed to the more generic xstat that has
>   been discussed in the past and that never made it through the
>   bike-shedding discussion?

I am.

> - once we have enough buy-in from reviewers to merge this initial
>   series, should we proceed to define the rest of the syscall ABI
>   (minus driver ioctls) so glibc and kernel can do the conversion
>   on top of that, or should we better try to do things one syscall
>   family at a time and actually get the kernel to handle them
>   correctly internally?

I don't have any comments on that ordering question.

-- 
Joseph S. Myers
jo...@co... |
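The overflow check Joseph describes for the glibc fallback path could look roughly like the following. This is a hypothetical sketch, not glibc code: both struct layouts and the helper name are assumptions, and returning -EOVERFLOW is only one plausible choice, since the thread leaves the error handling to match whatever the kernel ends up doing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit timespec as a new interface would pass it. */
struct timespec64_demo {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical layout of the old 32-bit kernel timespec. */
struct timespec32_demo {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Narrow a 64-bit time for a legacy 32-bit syscall. Returns 0 on success
 * or -EOVERFLOW if the seconds do not fit. tv_nsec is copied unchanged so
 * special values (e.g. utimensat()'s UTIME_NOW/UTIME_OMIT) pass through;
 * validating it is left to the kernel. */
static inline int timespec64_to_32_demo(const struct timespec64_demo *in,
					struct timespec32_demo *out)
{
	assert(in != NULL && out != NULL);
	if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX)
		return -EOVERFLOW;
	out->tv_sec = (int32_t)in->tv_sec;
	out->tv_nsec = (int32_t)in->tv_nsec;
	return 0;
}
```

A fallback wrapper would call this before issuing the 32-bit syscall and fail early rather than silently passing a truncated time to the kernel.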
From: Arnd B. <ar...@ar...> - 2014-06-04 15:06:39
|
On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
> 
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> 
> For glibc I think it will make the most sense to add the support for
> 64-bit time_t across all architectures that currently have 32-bit time_t
> (with the new interfaces having fallback support to implementation in
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are
> unavailable either at runtime or in the kernel headers against which glibc
> is compiled - this fallback code will of course need to check for overflow
> when passing a time value to the kernel, hopefully with error handling
> consistent with whatever the kernel ends up doing when a filesystem can't
> support a timestamp). If some architectures don't provide the new
> interfaces in the kernel then that will mean the fallback code in glibc
> can't be removed until glibc support for those architectures is removed
> (as opposed to removing it when glibc no longer supports kernels predating
> the kernel support).

Ok, that's a good reason to just provide the new interfaces on all architectures right away.

Thanks for the insight!

	Arnd |
From: H. P. A. <hp...@zy...> - 2014-06-02 21:59:02
|
On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
>>
>> The bit that is really going to hurt is every single ioctl that uses a
>> timespec.
>>
>> Honestly, though, I really don't understand the point with "struct
>> inode_time". It seems like the zeroeth-order thing is to change the
>> kernel internal version of struct timespec to have a 64-bit time... it
>> isn't just about inodes. We then should be explicit about the external
>> uses of time, and use accessors.
>
> I picked these because they are fairly isolated from all other uses,
> in particular since inode times are the only things where we really
> care about times in the distant past or future (decades away as opposed
> to things that happened between boot and shutdown).
>

If nothing else, I would expect to be able to set the system time to
weird values for testing. So I'm not so sure I agree with that...

> For other kernel-internal uses, we may be better off migrating to
> a completely different representation, such as nanoseconds since
> boot or the architecture specific ktime_t, but this is really something
> to decide for each subsystem.

Having a bunch of different time representations in the kernel seems
like a real headache...

	-hpa |
From: Arnd B. <ar...@ar...> - 2014-06-03 14:23:37
|
On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time". It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes. We then should be explicit about the external
> >> uses of time, and use accessors.
> > 
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> > 
> 
> If nothing else, I would expect to be able to set the system time to
> weird values for testing. So I'm not so sure I agree with that...

I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.

> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
> 
> Having a bunch of different time representations in the kernel seems
> like a real headache...

We already have time_t, ktime_t, timeval, timespec, compat_timespec, clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64, and lots of driver or file system specific representations.
I'm all for removing a bunch of these from the kernel, but my feeling is that this is one of the cases where we first have to add new ones in order to remove those that are already there.

To complicate things further, we also have various time bases (realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...), and at least for the timespec values we pass around, it's not always obvious which one is used, or if that's the right one.

We probably don't want to add a lot of new representations, and it's possible that we can change most of the internal code we have to ktime_t and then convert that to whatever user space wants at the interfaces. The possible uses I can see for non-ktime_t types in the kernel are:

* inodes need 96 bit timestamps to represent the full range of values
  that can be stored in a file system, you made a convincing argument
  for that. Almost everything else can fit into 64 bit on a 32-bit
  kernel, in theory also on a 64-bit kernel if we want that.

* A number of interfaces pass relative timespecs: nanosleep(), poll(), select(), sigtimedwait(), alarm(), futex() and probably more. There is nothing wrong with the use of timespec here, and it may be good to annotate that by using a new type (e.g. struct timeout) that is defined as compatible with the current timespec.

* For new user interfaces, we need a new type such as the __kernel_timespec64 I introduced, so it doesn't clash with the normal user timespec that may be smaller, depending on the libc.

* A lot of drivers will need new ioctl commands, and for drivers that just need time stamps (audio, v4l, sockets, ...) it may be more efficient and more correct to use a new timestamp_t (e.g. boot time 64-bit nanoseconds) than __kernel_timespec64, which is not normally monotonic and requires a normalization step. If we end up introducing such a type in the user interface, we can also start using it in the kernel.

	Arnd |
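The normalization step Arnd mentions is the main practical difference between a flat nanosecond count and a seconds/nanoseconds pair: adding two pairs needs a carry, and splitting a signed nanosecond count needs a fix-up because C division truncates toward zero. The sketch below illustrates this in userspace with hypothetical names; the kernel has its own helpers for the timespec case (for example set_normalized_timespec()).

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical sec/nsec pair with 64-bit seconds. */
struct ts64_demo {
	int64_t tv_sec;
	int64_t tv_nsec;	/* normalized: 0 .. NSEC_PER_SEC - 1 */
};

/* Split a signed 64-bit nanosecond count into a normalized pair. C division
 * truncates toward zero, so a negative remainder is folded into tv_sec. */
static inline struct ts64_demo ns_to_ts64_demo(int64_t ns)
{
	struct ts64_demo ts = { ns / NSEC_PER_SEC, ns % NSEC_PER_SEC };

	if (ts.tv_nsec < 0) {
		ts.tv_sec -= 1;
		ts.tv_nsec += NSEC_PER_SEC;
	}
	return ts;
}

/* Adding two pairs needs a carry step; adding two ns counts does not. */
static inline struct ts64_demo ts64_add_demo(struct ts64_demo a,
					     struct ts64_demo b)
{
	struct ts64_demo sum = { a.tv_sec + b.tv_sec, a.tv_nsec + b.tv_nsec };

	assert(a.tv_nsec >= 0 && a.tv_nsec < NSEC_PER_SEC);
	assert(b.tv_nsec >= 0 && b.tv_nsec < NSEC_PER_SEC);
	if (sum.tv_nsec >= NSEC_PER_SEC) {
		sum.tv_sec += 1;
		sum.tv_nsec -= NSEC_PER_SEC;
	}
	return sum;
}
```

With a 64-bit nanosecond timestamp_t, the same addition is a single integer add; the division is only paid when converting at an interface boundary.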
From: Joseph S. M. <jo...@co...> - 2014-06-03 14:33:37
|
On Tue, 3 Jun 2014, Arnd Bergmann wrote: > I think John Stultz and Thomas Gleixner have already started looking > at how the timekeeping code can be updated. Once that is done, we should > be able to add a functional 64-bit gettimeofday/settimeofday syscall > pair. While I definitely agree this is one of the most basic things to > have, it's also not an area of the kernel that is easy to change. 64-bit clock_gettime / clock_settime instead of gettimeofday / settimeofday should avoid the need for the kernel to have a 64-bit version of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would need to use a combination of the syscalls if the tz pointer is non-NULL.) -- Joseph S. Myers jo...@co... |
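A minimal sketch of the libc-side approach described here, assuming a POSIX system: gettimeofday() is emulated on top of clock_gettime(CLOCK_REALTIME), so the kernel never needs a 64-bit struct timeval. The function names are hypothetical, and the obsolete timezone argument is deliberately left out because handling it would need a separate call, as noted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/time.h>
#include <time.h>

/* Pure conversion: truncate nanoseconds to microseconds. */
static struct timeval timespec_to_timeval(struct timespec ts)
{
	struct timeval tv;

	tv.tv_sec = ts.tv_sec;
	tv.tv_usec = ts.tv_nsec / 1000;
	return tv;
}

/* Hypothetical userspace gettimeofday() replacement built only on
 * clock_gettime(); the timezone argument is not handled here. */
static int gettimeofday_via_clock(struct timeval *tv)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return -1;
	*tv = timespec_to_timeval(ts);
	return 0;
}
```

With a 64-bit time_t in libc, the same wrapper yields a 64-bit timeval without any timeval-specific syscall.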
From: Arnd B. <ar...@ar...> - 2014-06-03 14:50:32
|
On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote: > On Tue, 3 Jun 2014, Arnd Bergmann wrote: > > > I think John Stultz and Thomas Gleixner have already started looking > > at how the timekeeping code can be updated. Once that is done, we should > > be able to add a functional 64-bit gettimeofday/settimeofday syscall > > pair. While I definitely agree this is one of the most basic things to > > have, it's also not an area of the kernel that is easy to change. > > 64-bit clock_gettime / clock_settime instead of gettimeofday / > settimeofday should avoid the need for the kernel to have a 64-bit version > of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would > need to use a combination of the syscalls if the tz pointer is non-NULL.) Yes, that's what I meant. Arnd |
From: Dave C. <da...@fr...> - 2014-06-03 21:54:14
|
On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote: > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote: > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote: > The possible uses I can see for non-ktime_t types in the kernel are: > * inodes need 96 bit timestamps to represent the full range of values > that can be stored in a file system, you made a convincing argument > for that. Almost everything else can fit into 64 bit on a 32-bit > kernel, in theory also on a 64-bit kernel if we want that. Just to be pedantic, inodes don't *need* 96 bit timestamps - some filesystems can *support up to* 96 bit timestamps. If the kernel only supports 64 bit timestamps and that's all the kernel can represent, then the upper bits of the 96 bit on-disk inode timestamps simply remain zero. If you move the filesystem between kernels with different time ranges, then the filesystem needs to be able to tell the kernel what its supported range is. This is where having the VFS limit the range of supported timestamps is important: the limit is the min(kernel range, filesystem range). This allows the filesystems to be independent of the kernel time representation, and the kernel to be independent of the physical filesystem time encoding.... Cheers, Dave. -- Dave Chinner da...@fr... |
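A minimal C sketch of the "min(kernel range, filesystem range)" rule, under stated assumptions: the names struct time_range, effective_range() and clamp_to_range() are hypothetical, ranges are expressed in whole seconds, and out-of-range values are clamped rather than rejected (which policy the VFS should apply is a separate decision).

```c
#include <stdint.h>

/* Hypothetical range of representable seconds. */
struct time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* The usable range is the intersection of what the kernel and the
 * filesystem can each represent: the larger of the two minimums and
 * the smaller of the two maximums. */
static struct time_range effective_range(struct time_range kernel,
					 struct time_range fs)
{
	struct time_range r;

	r.min_sec = kernel.min_sec > fs.min_sec ? kernel.min_sec : fs.min_sec;
	r.max_sec = kernel.max_sec < fs.max_sec ? kernel.max_sec : fs.max_sec;
	return r;
}

/* Clamp a timestamp into the effective range (assumed policy). */
static int64_t clamp_to_range(int64_t sec, struct time_range r)
{
	if (sec < r.min_sec)
		return r.min_sec;
	if (sec > r.max_sec)
		return r.max_sec;
	return sec;
}
```

For instance, a kernel limited to signed 32-bit seconds combined with a filesystem storing unsigned 32-bit seconds gives an effective range of 0 to 2147483647.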
From: Arnd B. <ar...@ar...> - 2014-06-04 15:04:19
|
On Tuesday 03 June 2014, Dave Chinner wrote: > On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote: > > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote: > > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote: > > The possible uses I can see for non-ktime_t types in the kernel are: > > * inodes need 96 bit timestamps to represent the full range of values > > that can be stored in a file system, you made a convincing argument > > for that. Almost everything else can fit into 64 bit on a 32-bit > > kernel, in theory also on a 64-bit kernel if we want that. > > Just to be pedantic, inodes don't need 96 bit timestamps - some > filesystems can *support up to* 96 bit timestamps. If the kernel > only supports 64 bit timestamps and that's all the kernel can > represent, then the upper bits of the 96 bit on-disk inode > timestamps simply remain zero. I meant the reverse: since we have file systems that can store 96-bit timestamps when using 64-bit kernels, we need to extend 32-bit kernels to have the same internal representation so we can actually read those file systems correctly. > If you move the filesystem between kernels with different time > ranges, then the filesystem needs to be able to tell the kernel what > its supported range is. This is where having the VFS limit the > range of supported timestamps is important: the limit is the > min(kernel range, filesystem range). This allows the filesystems > to be independent of the kernel time representation, and the kernel > to be independent of the physical filesystem time encoding.... I agree it makes sense to let the kernel know about the limits of the file system it accesses, but for the reverse, we're probably better off just making the kernel representation large enough (i.e. 96 bits) so it can work with any known file system. We need another check at the user space boundary to turn that into a value that the user can understand, but that's another problem. Arnd |
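The "another check at the user space boundary" could look roughly like this minimal C sketch. It assumes the internal value carries 64-bit seconds and the legacy user-visible field is a 32-bit time_t. The helper name put_legacy_time32() is hypothetical, and returning -EOVERFLOW is only one plausible policy (clamping would be another).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: copy 64-bit internal seconds into a legacy
 * 32-bit user-visible field, refusing values that do not fit. The
 * output is left untouched on failure. */
static int put_legacy_time32(int64_t sec, int32_t *out)
{
	if (sec < INT32_MIN || sec > INT32_MAX)
		return -EOVERFLOW;
	*out = (int32_t)sec;
	return 0;
}
```

A file stamped one second past 2038-01-19 03:14:07 UTC (2147483648 seconds) would then be reported as an error to an old 32-bit caller instead of silently wrapping.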
From: Nicolas P. <nic...@li...> - 2014-06-04 17:38:24
|
On Wed, 4 Jun 2014, Arnd Bergmann wrote: > On Tuesday 03 June 2014, Dave Chinner wrote: > > Just to be pedantic, inodes don't need 96 bit timestamps - some > > filesystems can *support up to* 96 bit timestamps. If the kernel > > only supports 64 bit timestamps and that's all the kernel can > > represent, then the upper bits of the 96 bit on-disk inode > > timestamps simply remain zero. > > I meant the reverse: since we have file systems that can store > 96-bit timestamps when using 64-bit kernels, we need to extend > 32-bit kernels to have the same internal representation so we > can actually read those file systems correctly. > > > If you move the filesystem between kernels with different time > > ranges, then the filesystem needs to be able to tell the kernel what > > its supported range is. This is where having the VFS limit the > > range of supported timestamps is important: the limit is the > > min(kernel range, filesystem range). This allows the filesystems > > to be independent of the kernel time representation, and the kernel > > to be independent of the physical filesystem time encoding.... > > I agree it makes sense to let the kernel know about the limits > of the file system it accesses, but for the reverse, we're probably > better off just making the kernel representation large enough (i.e. > 96 bits) so it can work with any known file system. Depends... 96 bit handling may get prohibitive on 32-bit archs. The important point here is for the kernel to be able to represent the time _range_ used by any known filesystem, not necessarily the time _precision_. For example, a 64 bit representation can be made of 40 bits for seconds spanning 34865 years, and 24 bits for fractional seconds providing precision down to 60 nanosecs. That ought to be plenty good on 32 bit systems while still being cheap to handle. Nicolas |
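A minimal C sketch of the 40/24 split described above, under assumptions not stated in the mail: seconds are treated as unsigned (no dates before the epoch), the fraction is a binary fraction of a second (units of 1/2^24 s, about 59.6 ns), and both conversions truncate. The names pack_40_24() and unpack_40_24() are hypothetical. The numbers check out: 2^40 seconds is 34865 years of 365 days, and 10^9 / 2^24 is about 59.6 ns.

```c
#include <stdint.h>

#define FRAC_BITS    24
#define SEC_BITS     40
#define NSEC_PER_SEC 1000000000ULL

/* Seconds go in the upper 40 bits, a 24-bit binary fraction of a second
 * in the lower 24 bits. nsec is assumed to be below NSEC_PER_SEC, so the
 * fraction always fits in 24 bits. */
static uint64_t pack_40_24(uint64_t sec, uint32_t nsec)
{
	uint64_t frac = ((uint64_t)nsec << FRAC_BITS) / NSEC_PER_SEC;

	return ((sec & ((1ULL << SEC_BITS) - 1)) << FRAC_BITS) | frac;
}

/* Both directions truncate, so a pack/unpack round trip loses at most
 * one fraction step (~59.6 ns) plus under 1 ns of rounding. */
static void unpack_40_24(uint64_t v, uint64_t *sec, uint32_t *nsec)
{
	uint64_t frac = v & ((1ULL << FRAC_BITS) - 1);

	*sec = v >> FRAC_BITS;
	*nsec = (uint32_t)((frac * NSEC_PER_SEC) >> FRAC_BITS);
}
```

Packing and unpacking only need shifts, masks and one 64-bit multiply or divide by a constant, which is the "cheap to handle" property argued for above.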
From: Arnd B. <ar...@ar...> - 2014-06-04 19:26:32
|
On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote: > On Wed, 4 Jun 2014, Arnd Bergmann wrote: > > > On Tuesday 03 June 2014, Dave Chinner wrote: > > > Just to be pedantic, inodes don't need 96 bit timestamps - some > > > filesystems can *support up to* 96 bit timestamps. If the kernel > > > only supports 64 bit timestamps and that's all the kernel can > > > represent, then the upper bits of the 96 bit on-disk inode > > > timestamps simply remain zero. > > > > I meant the reverse: since we have file systems that can store > > 96-bit timestamps when using 64-bit kernels, we need to extend > > 32-bit kernels to have the same internal representation so we > > can actually read those file systems correctly. > > > > > If you move the filesystem between kernels with different time > > > ranges, then the filesystem needs to be able to tell the kernel what > > > its supported range is. This is where having the VFS limit the > > > range of supported timestamps is important: the limit is the > > > min(kernel range, filesystem range). This allows the filesystems > > > to be independent of the kernel time representation, and the kernel > > > to be independent of the physical filesystem time encoding.... > > > > I agree it makes sense to let the kernel know about the limits > > of the file system it accesses, but for the reverse, we're probably > > better off just making the kernel representation large enough (i.e. > > 96 bits) so it can work with any known file system. > > Depends... 96 bit handling may get prohibitive on 32-bit archs. > > The important point here is for the kernel to be able to represent the > time _range_ used by any known filesystem, not necessarily the time > _precision_. > > For example, a 64 bit representation can be made of 40 bits for seconds > spanning 34865 years, and 24 bits for fractional seconds providing > precision down to 60 nanosecs. That ought to be plenty good on 32 bit > systems while still being cheap to handle.
I have checked earlier that we don't do any computation on inode time stamps in common code; we just pass them around, so there is very little runtime overhead. There is a small bit of space overhead (12 bytes) per inode, but that structure is already on the order of 500 bytes. For other timekeeping stuff in the kernel, I agree that using some 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds, ...) has advantages; that's exactly the point I was making earlier against simply extending the internal time_t/timespec to 64-bit seconds for everything. Arnd |
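The 12-byte figure follows from simple arithmetic on a 32-bit kernel. A timespec there is two 32-bit fields (8 bytes), while a stamp with 64-bit seconds and 32-bit nanoseconds needs 12 bytes, and each inode carries three of them (atime, mtime, ctime): 3 × (12 − 8) = 12. The sketch below checks this in C. The struct names are hypothetical. The packed/aligned(4) attributes (GCC/Clang extensions) are an assumption used to keep the layout at 12 bytes on ABIs where int64_t is 8-byte aligned; this is not necessarily how the RFC defines struct inode_time.

```c
#include <stdint.h>

/* Hypothetical model of the old layout on a 32-bit kernel:
 * 32-bit seconds plus 32-bit nanoseconds. */
struct timespec32_layout {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* Hypothetical 96-bit layout: 64-bit seconds plus 32-bit nanoseconds.
 * packed + aligned(4) keeps it at 12 bytes without tail padding. */
struct inode_time_sketch {
	int64_t tv_sec;
	uint32_t tv_nsec;
} __attribute__((packed, aligned(4)));

/* An inode carries three timestamps: atime, mtime and ctime. */
enum { INODE_TIMESTAMPS = 3 };

static unsigned int extra_bytes_per_inode(void)
{
	return INODE_TIMESTAMPS *
	       (unsigned int)(sizeof(struct inode_time_sketch) -
			      sizeof(struct timespec32_layout));
}
```

On 64-bit kernels the existing timespec is already 16 bytes, so the same 12-byte layout would not add any space there.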