Xournal / Support Requests / #6 Shape-recognizer Algorithm documentation

Xournal does not have actual gesture recognition, all it recognizes is geometric shapes out of the strokes drawn by the user. The algorithm is essentially the following, and only works for simple geometric shapes. It could perhaps be improved to be aware of direction and scale of things and turned into a gesture recognition system. The code is in src/xo-shapes.c . For gesture recognition, at least some time awareness should be added. (xournal can revisit its previous guesses when new strokes are added, after an arbitrarily long time, so eg you can draw a rectangle one edge at a time; presumably a gesture recognizer, if it allows combinations of strokes to be a single gesture, needs to decide when the gesture is over).

Warning: the algorithm is completely made up by me, and as you'll see it's basically applying first-year university solid mechanics (moments of inertia etc.) to the problem of straight line and circle recognition. I can't offer any assurances that it's a solid or sensible algorithm except that it works for this particular purpose. Unfortunately it will not extend to recognizing things that are not made of lines and circles.

- The stroke is given as a polygonal line (sequence of line segments)

- First we compute the mass and x and y moments (calc_inertia()): mass = total length of the polygonal curve; sx = sum of x coordinates weighted by length of individual length segments, sy = sum of y coordinates weighted by length, sxx = sum of x^2, sxy = sum of x*y, syy = sum of y^2

- This gives us the center of mass x0 = sx / mass, y0 = sy/mass of the stroke; and its normalized moment of inertia, Ixx = (sxx/mass) - x0^2, Ixy = (sxy/mass) - x0.y0, Iyy = (syy/mass) - y0^2. These quantities determine the overall center and extent of the stroke.

- The normalized determinant of the inertia matrix, which measures how round a shape is, is computed by I_det(). This is given by 4.(Ixx.Iyy - Ixy^2) / (Ixx+Iyy)^2. In terms of the moments of inertia about the principal axes I1 and I2, this would be 4.I1.I2 / (I1+I2)^2. This quantity is guaranteed to always be between 0 and 1; it is 0 if and only if the polygonal stroke perfectly lies on a straight line (without prejudging whether it zigzags back and forth on the straight line - recall we're not recognizing gestures, just geometric shapes); it is 1 if and only if the inertia matrix is the same as that of a perfect circle.

- A polygonal stroke is deemed to be a single straight line if its determinant score is less than LINE_MAX_DET (a constant set to 0.015 by trial and error).

- A polygonal shape (e.g. two lines, or a triangle or quadrilateral) is recognized by breaking the stroke into sub-strokes that pass the above test for a straight line. This is done by find_polygonal(). The general idea is to look for something that's 1/4 of the stroke (since we're looking at most for quadrilaterals) lying on a straight line (in the approximate sense provided by the above criterion). Then we grow it by considering larger sub-strokes that include it: we try to grow at either end of the sub-stroke by adding just one point, choosing whichever one will cause the determinant score to grow least, and stopping if we exceed LINE_MAX_DET. This greedy algorithm produces a first guess for an edge of the polygon (usually it takes in a bit more than an actual edge because we allow for LINE_MAX_DET tolerance). Then we look at the portions of the stroke that were not part of this edge, and recursively apply this same algorithm to them (with the number of acceptable edges to break into suitably decreased). If we manage to view the whole stroke as consisting of at most 4 edges in this way, we have a polygon.

- If a polygonal shape has been found in this way, it then gets optimized (optimize_polygonal()) by modifying the cut points between the recognized edges so as to minimize the determinant scores. (Recall the previous step tends to include more than appropriate in the first edge identified; this step "gives" those inappropriate pieces back to the other edges where they actually belong).

- The recognizer maintains a queue (recognizer_queue) of recognized line segments / polygonal edges, and can use them to combine successive strokes into a single shape. For example, you can either draw a whole rectangle in a single stroke without lifting the pen (and it gets recognized all at once), or you can draw it by successively drawing four straight-line strokes that get individually recognized, but after the last one is seen, the four lines get retroactively combined into a rectangle.

- Specifically: after recognizing a collection of polygonal edges, xournal adds them to its queue, and then matches the queue against several possibilities: first rectangles (try_rectangle(): 4 edges, making angles close to 90 degrees, and with end points close to each other), then arrows (3 edges, the last two being smaller ones of comparable sizes and at comparable angles to the large first edge), then closed triangles, then closed quadrilaterals.

- If a stroke isn't seen as matching a set of polygonal edges, and if its determinant score is high (greater than CIRCLE_MIN_DET = 0.95), then the expected radius is computed from the inertia matrix, and the stroke's distance from that expected circle is computed and scored to decide if it's close enough (score_circle()).

Hope this helps.
Denis

Shape-recognizer Algorithm documentation

Group

Searches

Help

#6 Shape-recognizer Algorithm documentation

Discussion