language agnostic - Fully correct Unicode visual string reversal


[Inspired largely by trying to explain the problems with "Character encoding independent character swap", but also by these other questions, neither of which contains a complete answer: "How to reverse a Unicode string", "How to get a reversed String (unicode safe)"]

Doing a visual string reversal in Unicode is much harder than it looks. In any storage format other than UTF-32 you have to pay attention to code point boundaries rather than going byte-by-byte. But that's not good enough, because of combining glyphs; the spec has a concept of "grapheme cluster" that's closer to the basic unit you want to be reversing. But that's still not good enough; there are all sorts of special-case characters, like bidi overrides and final forms, that have to be fixed up.
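To make the first two failure modes concrete, here is a small Python sketch. The sample string is my own choice for illustration: "café" spelled with a plain "e" followed by U+0301 COMBINING ACUTE ACCENT. It only shows the problems; it does not fix them.

```python
import unicodedata

# "café" written as e + U+0301 COMBINING ACUTE ACCENT (sample input, chosen for illustration)
s = "cafe\u0301"

# 1. Byte-by-byte reversal: U+0301 is two bytes in UTF-8 (0xCC 0x81),
#    so reversing the bytes puts the continuation byte first.
reversed_bytes = s.encode("utf-8")[::-1]
try:
    reversed_bytes.decode("utf-8")
    bytes_ok = True
except UnicodeDecodeError:
    bytes_ok = False  # the reversed byte sequence is no longer valid UTF-8

# 2. Code point reversal: the result is valid text, but the combining
#    accent now comes first and no longer follows the "e" it belonged to.
by_codepoint = s[::-1]
accent_is_first = unicodedata.combining(by_codepoint[0]) != 0
```

Python's `str` already indexes by code point, which is why the second reversal needs no decoding step; in a UTF-8 or UTF-16 buffer you would have to find the code point boundaries yourself first.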

This pseudo-algorithm handles all the easy cases I know about:

  1. Segment the string into an alternating list of words and word-separators (some word-separators may be the empty string).
  2. Reverse the order of the list.
  3. For each string in the list:
    1. Segment the string into grapheme clusters.
    2. Reverse the order of the grapheme clusters.
    3. Check the initial and final cluster in the reversed sequence; their base characters may need to be reassigned to the correct form (e.g. if U+05DB HEBREW LETTER KAF is now at the end of the sequence it needs to become U+05DA HEBREW LETTER FINAL KAF, and vice versa).
    4. Join the sequence back into a string.
  4. Recombine the list of reversed words to produce the final reversed string.
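The steps above can be sketched in Python roughly as follows. This is a simplification under stated assumptions, not a complete implementation: the standard library has no grapheme cluster segmentation, so `split_clusters` approximates a cluster as a base character plus any following combining marks; `split_words` treats runs of letters, marks and digits as words; and the final-form table covers only the five Hebrew letters. All function and table names are hypothetical.

```python
import itertools
import unicodedata

# Hebrew letters that have a distinct final form: regular -> final.
FINAL_FORMS = {
    "\u05DB": "\u05DA",  # KAF -> FINAL KAF
    "\u05DE": "\u05DD",  # MEM -> FINAL MEM
    "\u05E0": "\u05DF",  # NUN -> FINAL NUN
    "\u05E4": "\u05E3",  # PE -> FINAL PE
    "\u05E6": "\u05E5",  # TSADI -> FINAL TSADI
}
REGULAR_FORMS = {final: regular for regular, final in FINAL_FORMS.items()}


def is_word_char(ch):
    # Assumption: letters (L*), marks (M*) and numbers (N*) make up words.
    return unicodedata.category(ch)[0] in "LMN"


def split_words(text):
    # Step 1: alternating runs of word and separator characters.
    return ["".join(run) for _, run in itertools.groupby(text, key=is_word_char)]


def split_clusters(piece):
    # Step 3.1 (approximation): attach combining marks to the preceding base.
    clusters = []
    for ch in piece:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reverse_piece(piece):
    clusters = split_clusters(piece)
    clusters.reverse()  # step 3.2
    if len(clusters) > 1:  # step 3.3; single-cluster pieces are left alone
        first, last = clusters[0], clusters[-1]
        clusters[0] = REGULAR_FORMS.get(first[0], first[0]) + first[1:]
        clusters[-1] = FINAL_FORMS.get(last[0], last[0]) + last[1:]
    return "".join(clusters)  # step 3.4


def visual_reverse(text):
    pieces = split_words(text)
    pieces.reverse()  # step 2
    return "".join(reverse_piece(p) for p in pieces)  # steps 3 and 4
```

For example, `visual_reverse("cafe\u0301 au lait")` keeps the accent on the "e", and the Hebrew word mem-lamed-final kaf comes back as kaf-lamed-final mem. A real implementation would want proper UAX #29 segmentation for both words and clusters instead of these category checks.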

... but that doesn't handle bidi overrides, and I'm sure there's stuff I don't know about, as well. Can anyone fill in the gaps?

