twitter-text
This code is used at Twitter to tokenize and parse text
...It synchronizes development, testing, issue creation, and pull requests for twitter-text's implementations and specification. These libraries are responsible for counting the characters in a Tweet and for identifying and linking any URL, @username, #hashtag, or $cashtag.

Emoji supported by twemoji always count as two characters, regardless of combining modifiers. This includes emoji modified by Fitzpatrick skin tone or gender modifiers, even when they are composed of significantly more Unicode code points. Emoji weight is defined by a regular expression in twitter-text that looks for sequences of standard emoji combined with one or more Unicode Zero Width Joiners (U+200D).
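To illustrate the emoji rule described above, here is a minimal Python sketch. It is not the regular expression shipped in twitter-text: the pattern `EMOJI_SEQUENCE`, the code point ranges it covers, and the helper `weighted_length` are all hypothetical names and simplifications chosen for this example. It also assumes every non-emoji code point counts as one character, which ignores whatever other weighting and URL handling the real libraries apply.

```python
import re

# Simplified, hypothetical building blocks (assumptions for illustration only;
# the real twitter-text emoji pattern is far more exhaustive).
_EMOJI_BASE = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"  # rough emoji ranges
_SKIN_TONE = "[\U0001F3FB-\U0001F3FF]"                # Fitzpatrick modifiers
_VS16 = "\uFE0F"                                      # emoji presentation selector
_ZWJ = "\u200D"                                       # Zero Width Joiner

# One emoji unit: a base, optionally followed by VS16 and a skin tone modifier.
_UNIT = f"{_EMOJI_BASE}{_VS16}?{_SKIN_TONE}?"

# A full sequence: one unit, then any number of ZWJ-joined units.
EMOJI_SEQUENCE = re.compile(f"{_UNIT}(?:{_ZWJ}{_UNIT})*")


def weighted_length(text: str) -> int:
    """Count each matched emoji sequence as 2 and every other code point as 1."""
    emoji_sequences = EMOJI_SEQUENCE.findall(text)
    remainder = EMOJI_SEQUENCE.sub("", text)
    return 2 * len(emoji_sequences) + len(remainder)


if __name__ == "__main__":
    thumbs_up_medium = "\U0001F44D\U0001F3FD"  # thumbs up + skin tone modifier
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
    for sample in ("abc", thumbs_up_medium, family, "hi " + thumbs_up_medium):
        print(len(sample), weighted_length(sample))
```

In this sketch the four-person family sequence is seven code points long but still contributes a weight of two, which mirrors the behavior the paragraph describes. For real character counting, use one of the twitter-text implementations instead of a hand-rolled pattern like this one.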