#2 utf8_substr fails for offset or length larger than 65535


PCRE only allows repetition quantifiers (e.g. {min,max}) with
numbers up to 65535. Calling utf8_substr with an offset or length
larger than that results in a PHP warning.
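
The limit can be reproduced outside the library. The snippet below is only a sketch: `try_skip_chars` is a hypothetical helper, not part of utf8_substr, and it assumes the substring pattern is built by interpolating the character count into a `{n}` quantifier.

```php
<?php
// Hypothetical helper (not from the library): tries to match the first
// $count UTF-8 characters of $str with a single bounded quantifier.
function try_skip_chars(string $str, int $count)
{
    // With /u, '.' matches one UTF-8 character; /s lets it match newlines too.
    $pattern = '/^.{' . $count . '}/us';

    // '@' hides the PHP warning so the failure is visible as a false return.
    return @preg_match($pattern, $str);
}

$text = str_repeat('é', 70000);

var_dump(try_skip_chars($text, 65535)); // quantifier is within the limit
var_dump(try_skip_chars($text, 65536)); // pattern fails to compile
```

Without the `@`, the second call emits a "Compilation failed" warning, which is the warning described in this report.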


  • Harry Fuecks

    Harry Fuecks - 2006-09-03
    • labels: --> String Functions
    • priority: 5 --> 7
    • status: open --> open-accepted
  • Harry Fuecks

    Harry Fuecks - 2006-09-03

    Accepted. The solution probably means falling back to a
    slower implementation when the value is > 65535, as it
    currently does for negative lengths.

  • ChrisSmith

    ChrisSmith - 2006-09-27

    Hi Harry,

    I have resolved this and reworked the substr code for
    negative lengths. It's still not quick, but it's much faster
    than the preg_match_all solution. DokuWiki has the patched
    code for this function + two other required "helper"
    functions, utf8_correctIdx and utf8_byteindex.


    PS. I raised the bug here originally, but didn't realise I
    was anonymous, sorry.

  • Harry Fuecks

    Harry Fuecks - 2006-09-27
    • assigned_to: nobody --> harryf
  • Harry Fuecks

    Harry Fuecks - 2006-09-27

    Hi Chris,

    Should have guessed (who's got documents that big ;)

    Interesting approach - may be the next best thing
    performance and memory wise.

    The fix I'm exploring in spare moments is a little
    different: likewise normalize negative start / length, but
    then it should work something like this:

    while start > 65535:
        throw away 65535 UTF-8 chars from start of input
        and decrement start by 65535, until start <= 65535

    while len > 65535:
        take a 65535-char substr from input, decrement len
        by 65535

    finally return any intermediate substrs + final preg_match

    Something like that anyway - not sure if it will be any faster
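
A minimal sketch of the chunked idea outlined above, for illustration only. It is not the fix that was committed. The function name `chunked_utf8_substr` and the constant `PCRE_MAX_REPEAT` are made up here, and the sketch assumes that `$start` and `$length` have already been normalized to non-negative values and that `$str` is valid UTF-8.

```php
<?php
// Hypothetical names; assumes $start >= 0, $length >= 0, valid UTF-8 input.
const PCRE_MAX_REPEAT = 65535;

function chunked_utf8_substr(string $str, int $start, int $length): string
{
    // Phase 1: throw away $start characters, at most 65535 per regex call.
    while ($start > 0) {
        $step = min($start, PCRE_MAX_REPEAT);
        if (preg_match('/^.{' . $step . '}/us', $str, $m) !== 1) {
            return ''; // input has fewer than $start characters
        }
        // $m[0] ends on a character boundary, so a byte-wise cut is safe.
        $str = (string) substr($str, strlen($m[0]));
        $start -= $step;
    }

    // Phase 2: collect $length characters in chunks of at most 65535.
    $out = '';
    while ($length > 0 && $str !== '') {
        $step = min($length, PCRE_MAX_REPEAT);
        if (preg_match('/^.{0,' . $step . '}/us', $str, $m) !== 1) {
            break; // e.g. invalid UTF-8 in the remaining input
        }
        $out .= $m[0];
        $str = (string) substr($str, strlen($m[0]));
        $length -= $step;
    }

    return $out;
}
```

Each regex call stays within the quantifier limit, at the cost of copying the remaining input once per chunk, so whether this beats a byte-index scan would need measuring.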

  • Harry Fuecks

    Harry Fuecks - 2006-10-01
    • status: open-accepted --> closed-fixed
