Work at SourceForge, help us to make it a better place! We have an immediate need for a Support Technician in our San Francisco or Denver office.
Close
From: Ron Parker <ron@pa...>  20060519 05:25:03

To put it mildly, I am highly suspicious of the math behind the position filter. Does it actually work as advertised? Does anyone use it? My offthecuff answer to the latter question is "no," but I'm prepared to be informed otherwise. My slightly less offthecuff answer to the former question is also "no," and a notveryrigorous explanation of why follows. So the real question is: if someone's actually using this thing, should I put some work into making it do what it says it does? Here's why I'm suspicious: the sort routine it uses appears to not be transitive. What that means is that a<b and b<c does not necessarily imply a<c. This is what we math geeks like to call "a big nono" as it leads to a certain instability in the supposedly sorted result. What the sort routine does is compare two points along the axis on which they are closest to each other. So, let's say our three points A, B, and C are as follows: A (0,0) B (1,2) C (3,2) Now, A and B are closer along the X axis than along the Y axis. So, since 0<1, A < B. B and C are closer along the X axis as well. 1<3, so B < C. A and C are closer on the Y axis. 2<0, so C < A. What that means is that depending on which comparisons actually get made, there are three possible orderings for these three points. If these three points were the only three points in consideration, we could predict with absolute confidence what order they'd get sorted into. If we throw other random points into the mix, too, the probability of any one of those three orderings appearing in a given "sorted" collection including these three points approaches 1/3. Let's say, for the sake of argument, that the sort routine spits out BCA as the order these points should be in. Let's further suppose that we've asked the position filter to remove anything that's within 3 units (chosen because A and B are sqrt(5) units apart, which is less than 3. For completeness, A and C are sqrt(13) units apart  somewhere between 3 and 4  and B and C are sqrt(20) units apart  somewhere between 4 and 5. Some of you probably see where this is going.) The first thing the position filter will do is compare B and C. They are more than 3 units apart, so it bumps both pointers. Now it compares C and A. Also more than 3 units apart. So it bumps both pointers again. The very critical and important thing to notice here is that it never compares B and A, which are less than the specified 3 units apart. So it doesn't actually remove the "duplicate" point. On to the description of how I'd fix it. I don't expect this to make sense to anyone, but I'm putting it into the archives in case I don't get this particular potency of mushrooms again and I forget what it was I was planning to do. Throw the waypoints into a bin. While doing so, determine the bounds of the bin. If there are more than, say, 5 waypoints in the bin, split it into "halves" along the longer of the two axes. To split into half, sort the bin along that axis, find the median, and  this is the part that's different from a median cut  copy everything to the left of the cut and some of the stuff to the right into one bin, and everything to the right and some of the stuff to the left into another bin. If you had 100 points in the first bin, you might have 60 points in each of the new bins. How much is the overlap? Twice the specified distance, centered on the median line. If you split in half, recurse for each of the halves. So why the overlap? Because any point that stands a chance of being "too close" to a point that is properly in the left bin has to be within the critical distance of the line between the bins. So by overfilling both bins, we guarantee we'll get every point within the critical distance of every point in each bin. So now you have a bin with 5 or fewer waypoints in it. Compare them all to each other, mark any duplicates with the same partition number. To assign partition numbers: If neither waypoint has a partition number, increment the partition counter and mark both waypoints with it. Add them to the partition's waypoint list. If one waypoint has a partition number and the other does not, mark the unmarked waypoint with the same number. Add it to the partition's waypoint list. If both waypoints already have partition numbers and they are not equal, run down the list of all waypoints with the higher number and remark them with the lower number. When you return from the toplevel recursion, delete all but the first waypoint in each partition. For whatever definition of "first." Or, if "all" was specified, delete 'em all. Yeah, this isn't pretty, and it isn't fast. But it's better than the obvious O(n^2) algorithm. One caveat is that if you have one waypoint every 1 meter along a 100meterlong line, and your critical distance is 4 feet, you will remove all but one of those waypoints, even though each of them is only within 4 feet of at most two others. This might not be what the user has in mind. An alternate approach is to immediately mark one of each tooclose pair of waypoints for deletion, and not consider alreadymarked waypoints when searching for further pairs. This will have the effect of semirandomly "thinning the herd" so that no two waypoints are too close. This might be better. It might not. 
From: Robert Lipe <robertlipe@us...>  20060519 14:46:31

> To put it mildly, I am highly suspicious of the math behind the position > filter. Does it actually work as advertised? Does anyone use it? I've used it for some data crunching but I'll confess my standards were pretty low. It could have worked badly and still been "good enough" for me. > On to the description of how I'd fix it. I don't expect this to make > sense to anyone, but I'm putting it into the archives in case I don't > get this particular potency of mushrooms again and I forget what it was > I was planning to do. I'm so glad to have you here. This kind of thing isn't my bag. RJL 