#255 Speed up prgetc() (with patch)

v4.1
closed-fixed
None
v4.5
5
2017-09-26
2008-11-15
No

Previously prgetc() went back to the beginning of the line just to read a
proper UTF-8 character. With this patch it goes back at most 6 bytes and is
still correct. The cost of a call no longer grows with the distance from the
start of the line, which makes editing UTF-8 buffers with long lines much
faster.
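
The bounded back-scan described above can be illustrated on a flat byte array. This is only a sketch under stated assumptions: `utf8_prev_start` and `UTF8_MAX_LEN` are hypothetical names, and the code works on a plain buffer rather than on JOE's P pointers and prgetb(). It relies on the fact that UTF-8 continuation bytes have the form 10xxxxxx, so stepping back over them finds the start of the character.

```c
#include <stddef.h>

/* Longest sequence accepted here: the legacy 6-byte form (an assumption
 * matching the six-byte case handled in the ticket's disabled code). */
#define UTF8_MAX_LEN 6

/* Hypothetical helper, not part of JOE: return the index at which the
 * character ending just before `pos` starts. Steps back over at most
 * UTF8_MAX_LEN bytes, so the cost does not depend on the line length. */
size_t utf8_prev_start(const unsigned char *buf, size_t pos)
{
    size_t start = pos;
    int steps = 0;

    if (pos == 0)
        return 0;                       /* nothing before the buffer start */
    do {
        start--;
        steps++;
        /* keep going while we are still on a continuation byte 10xxxxxx */
    } while (start > 0 && steps < UTF8_MAX_LEN &&
             (buf[start] & 0xC0) == 0x80);
    return start;
}
```

For the bytes 61 E2 82 AC 62 ("a", the three-byte Euro sign, "b"), calling the helper with pos 4 steps back over AC and 82 and stops on the lead byte E2 at index 1. Malformed input made only of continuation bytes still terminates after 6 steps.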

Index: b.c

--- b.c (revision 25)
+++ b.c (working copy)
@@ -888,66 +888,17 @@
 int prgetc(P *p)
 {
     if (p->b->o.charmap->type) {
-
-        if (pisbol(p))
-            return prgetb(p);
-        else {
-            P *q = pdup(p, USTR "prgetc");
-            P *r;
-            p_goto_bol(q);
-            r = pdup(q, USTR "prgetc");
-            while (q->byte<p->byte) {
-                pset(r, q);
-                pgetc(q);
-            }
-            pset(p,r);
-            prm(r);
-            prm(q);
-            return brch(p);
-        }
-
-#if 0
-        int d = 0;
-        int c;
-        int n = 0;
-        int val = p->valcol;
-        for(;;) {
-            c = prgetb(p);
-            if (c == NO_MORE_DATA)
-                return NO_MORE_DATA;
-            else if ((c&0xC0)==0x80) {
-                d |= ((c&0x3F)<<n);
-                n += 6;
-            } else if ((c&0x80)==0x00) { /* One char */
-                d = c;
-                break;
-            } else if ((c&0xE0)==0xC0) { /* Two chars */
-                d |= ((c&0x1F)<<n);
-                break;
-            } else if ((c&0xF0)==0xE0) { /* Three chars */
-                d |= ((c&0x0F)<<n);
-                break;
-            } else if ((c&0xF8)==0xF0) { /* Four chars */
-                d |= ((c&0x07)<<n);
-                break;
-            } else if ((c&0xFC)==0xF8) { /* Five chars */
-                d |= ((c&0x03)<<n);
-                break;
-            } else if ((c&0xFE)==0xFC) { /* Six chars */
-                d |= ((c&0x01)<<n);
-                break;
-            } else { /* FIXME: Invalid (0xFE or 0xFF found) */
-                break;
-            }
-        }
-
-        if (val && c!='\t' && c!='\n') {
-            p->valcol = 1;
-            p->col -= joe_wcwidth(1,d);
-        }
-
-        return d;
-#endif
+        P pbak;
+        int i, left = 6;
+        /* prgetb() takes care of skipping '\r' at bol */
+        if ((i = prgetb(p)) + 0U< 0x80U || i == NO_MORE_DATA) return i;
+        /* The loop also stops on NO_MORE_DATA. Good. */
+        while (left > 0 && ((i = prgetb(p)) & 0xC0) == 0x80)
+            left--;
+        pbak = *p; /* This is a lot faster than pdup. */
+        i = pgetc(p);
+        *p = pbak;
+        return i;
     }
     else {
         return prgetb(p);

Related

Bugs: #335

Discussion

  • Joe Allen

    Joe Allen - 2008-11-23

    You must use pdup() or you risk missing the vlock/vunlock sequence if you happen to be right on a block boundary.

    In any case, a fix like this is needed. I don't remember why I commented out my own code to do this; something must have been wrong with it.
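
The concern about skipping the lock/unlock sequence can be illustrated with a toy model. Everything below is hypothetical (`ToyP`, `lock_count`, `toy_advance`, `toy_set` and the other names are not JOE's real structures or API); it only assumes that a buffer pointer keeps its current block locked, and that crossing a block boundary unlocks the old block and locks the new one. Under that assumption, restoring a pointer with a raw struct copy after crossing a boundary leaves the lock counts out of step with the pointer.

```c
#define NBLOCKS    2
#define BLOCK_SIZE 4

/* Toy stand-in for per-block lock counts; NOT JOE's vlock/vunlock. */
static int lock_count[NBLOCKS];

typedef struct {
    int block;  /* block currently locked by this pointer */
    int ofst;   /* offset inside that block */
} ToyP;

static void toy_reset(void)
{
    for (int i = 0; i < NBLOCKS; i++)
        lock_count[i] = 0;
}

/* Create a pointer and lock its block (loosely analogous to a dup). */
static void toy_init(ToyP *p, int block, int ofst)
{
    p->block = block;
    p->ofst = ofst;
    lock_count[block]++;
}

/* Drop a pointer and release its lock. */
static void toy_release(ToyP *p)
{
    lock_count[p->block]--;
}

/* Move forward one byte; crossing a boundary swaps the locked block. */
static void toy_advance(ToyP *p)
{
    if (p->ofst + 1 < BLOCK_SIZE) {
        p->ofst++;
    } else if (p->block + 1 < NBLOCKS) {
        lock_count[p->block]--;
        p->block++;
        p->ofst = 0;
        lock_count[p->block]++;
    }
}

/* Retarget dst at src's position with correct lock accounting. */
static void toy_set(ToyP *dst, const ToyP *src)
{
    if (dst->block != src->block) {
        lock_count[dst->block]--;
        lock_count[src->block]++;
    }
    *dst = *src;
}
```

Starting at the last byte of block 0, the sequence `bak = p; toy_advance(&p); p = bak;` ends with `p` claiming block 0 while only block 1 is locked. Saving with `toy_init` and restoring with `toy_set` followed by `toy_release` keeps the counts balanced, at the cost of the extra bookkeeping the patch was trying to avoid.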

     
  • John J. Jordan

    John J. Jordan - 2015-03-14

    Here is a fixed, although honestly not very thoroughly tested, patch.

    EDIT: Strike that. I don't like the < when & does the trick, and the original patch is off by one compared to the code that's currently there (an extra prgetb). The patch itself doesn't help big-O if valcol is to be maintained, since we'd have to walk from bol anyway, but it's better if the caller doesn't care.

     
    Last edit: John J. Jordan 2015-03-14
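
One possible reading of the "< versus &" remark is that the ASCII test in the original patch, which converts to unsigned and compares, can also be written as a bit mask. The sketch below uses hypothetical helper names (`is_ascii_cmp`, `is_ascii_mask`) and assumes the end-of-data sentinel is a negative int such as -1 on a two's-complement machine; neither detail is confirmed by the thread.

```c
/* Unsigned-comparison form, as written in the original patch:
 * a negative sentinel wraps to a huge unsigned value and fails the test. */
static int is_ascii_cmp(int c)
{
    return c + 0U < 0x80U;
}

/* Mask form: true only when no bit above the low seven is set, which
 * also rejects negative values on two's-complement machines. */
static int is_ascii_mask(int c)
{
    return (c & ~0x7F) == 0;
}
```

Both helpers accept exactly the values 0 through 127, so under these assumptions the mask form could replace the comparison without changing behavior.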
  • John J. Jordan

    John J. Jordan - 2015-03-16

    Take 3...

     
    Last edit: John J. Jordan 2015-03-16
  • John J. Jordan

    John J. Jordan - 2015-08-17

    Checked into fix_types: [9abf4d] -- yet another revision on the patch :-)

     

    Related

    Commit: [9abf4d]

  • John J. Jordan

    John J. Jordan - 2017-09-26
    • status: open --> closed-fixed
    • assigned_to: John J. Jordan
    • Applies To: --> v4.5
    • Group: --> v4.1
     
