From: Louis G. <lge...@gm...> - 2008-03-07 01:46:56
On Thu, Mar 6, 2008 at 4:38 PM, Markus Krötzsch <ma...@ai...> wrote:
> First of all: thanks a lot. Comments inline.
>
> On Thursday, 6 March 2008, Louis Gerbarg wrote:
> > I was getting errors trying to store dates in the mid 2290s...
>
> I thought my schedule was packed with meetings, and you are already planning
> until the 2290s. What kind of site is that?

I have been experimenting with annotating information about some science
fiction series. Some of that takes place far in the future. Nothing is on
a public-facing server, and I have no idea if I will actually do anything
useful with this, but it is more about me working through some issues with
categorization and templating properties than this particular data set.

> > here is
> > a patch that remedies this. It was tested under php 5.2.5 on Mac OS X
> > 10.5.2 (32 bit). The patch should run correctly on php >= 5.1.3,
> > though I suspect it will not actually give people the extended ranges
> > unless you are on more recent releases due to fixes in the DateTime
> > class. If I did everything correctly it should still accept all the
> > same input formats it previously did, as well as store everything in
> > the database the same way, so no data conversion should be necessary,
> > but I did not test that extensively.
>
> How does that work? Can the DB handle 64bit large numbers? Is there a
> performance hit associated to that?

The existing date implementation stores each date in the DB as an
XSD-formatted string and a double-precision float, so there is no
particular efficiency issue one way or another. Altering it to save the
value simply as an integer (32 or 64 bit) would probably be a bit of a
performance win, but extending the range should not negatively impact what
is currently being done. Since the numeric value in the DB is a double, it
cannot actually store a full 64-bit range, but it can safely store values
well past the range of a 32-bit integer (roughly 53 bits, if I recall
correctly).
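
The range claims above can be checked with a little arithmetic. What follows is a minimal sketch in Python rather than the PHP of the patch. The helper names `unix_seconds` and `survives_double` and the sample year are assumptions made for illustration, not anything taken from SMW. It shows that a timestamp in the 2290s no longer fits a signed 32-bit integer yet survives a round trip through a double, and that a double stops representing every integer just past 2^53.

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1        # largest signed 32-bit timestamp (runs out in 2038)
DOUBLE_EXACT_LIMIT = 2**53   # doubles represent every integer up to this value

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_seconds(year, month=1, day=1):
    """Hypothetical helper: whole seconds since the Unix epoch, in UTC."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return (moment - EPOCH) // timedelta(seconds=1)


def survives_double(value):
    """True if an integer comes back unchanged after being stored as a double."""
    return int(float(value)) == value


ts_2295 = unix_seconds(2295)

print(ts_2295 > INT32_MAX)                      # too large for a 32-bit integer
print(survives_double(ts_2295))                 # but still exact as a double
print(survives_double(DOUBLE_EXACT_LIMIT + 1))  # first integer a double cannot hold
```

Counted in seconds, 2^53 works out to roughly 285 million years, so whole-second timestamps stay exact in a double far beyond any date in the 2290s.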

Depending on exactly how you are using the numeric representation vs the
XSD representation, there could be an interesting failure mode when the
numeric value overflows but the XSD value is still within 64 bits (100+
million years). Strictly speaking, whether or not using a double field
like that is clean is sort of moot, because anyone running a 64-bit build
of php 5.2.6-snap is going to end up with strtotime returning 64-bit
values, which should cause the same values I am generating to end up in
the DB, even without my code changes. If it is a problem, it is going to
need to be fixed regardless of whether the patch is applied.

> I now have already two patches for extending data ranges: the other one is the
> Historical Date datatype by Terry A. Hurlbut. I wonder whether this extended
> datatype could also use your method (AFAIK the historical dates use days
> instead of seconds internally, right?). I also consider switching to a format
> that separates parts of dates in different DB fields, so that SMW can
> distinguish dates without exact day or day time from those dates which just
> happen to be at a year's 1st of Jan 00:00 ... this would probably also solve
> the 64bit issue since the components would fit into 32bit.

When you say parts of dates, I presume you mean month, day, year, etc. So
long as the parser keeps working with the same wikitext, I am not partial
to what I did; I just needed something that worked, and I was more than
happy to share it in case it was useful. I don't really need anything more
than an extension of the current time tracking mechanism for what I am
currently interested in, though the problem you mention is certainly
interesting. The one issue I can foresee with the representation you are
describing is that the boundaries people find interesting are different.
Imagine tracking comic books. Some are published quarterly, monthly,
biweekly, etc.
Ideally you want some way to distinguish between "Winter," "January," and
"Week 1," all of which start at the same time. I think a more general way
to solve it would be to make a new data type, DateRange, stored as two
64-bit integers. It would effectively allow you to diminish the precision
of a particular date by separating the two points. Having said that, I
don't need that functionality, and doing it right seems like a lot of
work.

Louis
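
For illustration, the DateRange idea could look roughly like the following. This is a minimal sketch in Python, not SMW code and not part of the patch. The names `DateRange`, `make_range`, and `precision_seconds`, the 2008 sample dates, the choice of an exclusive end point, and the reading of "Winter" as the first calendar quarter are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_seconds(moment):
    """Whole seconds since the Unix epoch for a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(seconds=1)


@dataclass(frozen=True)
class DateRange:
    """Hypothetical type: a date stored as two integer timestamps."""
    start: int  # first second covered (inclusive)
    end: int    # first second no longer covered (exclusive, by assumption)

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("end must be later than start")

    @property
    def precision_seconds(self):
        """Width of the range; a wider range means a less precise date."""
        return self.end - self.start

    def contains(self, instant):
        return self.start <= instant < self.end


def make_range(start, end):
    """Build a DateRange from two timezone-aware datetimes."""
    return DateRange(to_seconds(start), to_seconds(end))


jan_1 = datetime(2008, 1, 1, tzinfo=timezone.utc)

# Three issues that all start at the same instant but differ in precision.
winter = make_range(jan_1, datetime(2008, 4, 1, tzinfo=timezone.utc))
january = make_range(jan_1, datetime(2008, 2, 1, tzinfo=timezone.utc))
week_1 = make_range(jan_1, jan_1 + timedelta(weeks=1))

print(winter.start == january.start == week_1.start)
print(winter.precision_seconds, january.precision_seconds, week_1.precision_seconds)
```

Because the end point differs, the three ranges remain distinct values even though their start timestamps are identical, which is the distinction a single timestamp cannot express.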