java - Way to determine if a charset is multibyte?
Is there a way to determine whether a given charset (java.nio.charset.Charset) encodes characters using multiple bytes? Or, alternatively, is there a list somewhere of which character sets do and do not use more than 1 byte per character?
The reason I'm asking is a performance tweak: I need to know how long (in bytes) an arbitrary string is in a given character set. In the case of single-byte encodings, it's simply the length of the string. Knowing whether or not a charset is single-byte would save me from having to re-encode it first.
You might think that such a puny optimization couldn't possibly be worth the effort, but a lot of CPU cycles in my application are spent on this sort of nonsense, and the input data I've encountered so far has been in 20+ different charsets.
The simplest way is probably:
boolean multiByte = charset.newEncoder().maxBytesPerChar() > 1.0f;
Note that newEncoder can throw UnsupportedOperationException, though, if the Charset doesn't support encoding. While newDecoder isn't documented to throw that, maxCharsPerByte isn't appropriate here. You could use averageCharsPerByte instead: if that's 1, it's a pretty good indication that it's a single-byte encoding, but in theory you could have some bytes which produce multiple characters and some characters which take multiple bytes, averaging out at 1...
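As a minimal sketch of how the check might be wrapped up, assuming a hypothetical helper class named CharsetWidth (the class and method names are illustrative, not from the answer above): it guards the newEncoder call with Charset.canEncode() and conservatively treats encode-incapable charsets as multibyte. The byteLength shortcut also assumes the string contains only characters the charset can actually represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class CharsetWidth {

    // Returns true if the charset may use more than one byte per char.
    // Assumption: a charset that cannot encode at all is reported as
    // multibyte, so callers never take the single-byte shortcut for it.
    static boolean isMultiByte(Charset charset) {
        if (!charset.canEncode()) {
            return true; // newEncoder() would throw UnsupportedOperationException
        }
        return charset.newEncoder().maxBytesPerChar() > 1.0f;
    }

    // Byte length of s in the given charset, skipping the encode step
    // for single-byte charsets. Assumes every character in s is
    // representable in the charset.
    static int byteLength(String s, Charset charset) {
        if (!isMultiByte(charset)) {
            return s.length();
        }
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sample = "h\u00e9llo"; // five chars, one of them non-ASCII
        System.out.println(isMultiByte(StandardCharsets.ISO_8859_1));
        System.out.println(isMultiByte(StandardCharsets.UTF_8));
        System.out.println(byteLength(sample, StandardCharsets.ISO_8859_1));
        System.out.println(byteLength(sample, StandardCharsets.UTF_8));
    }
}
```

Since the result depends only on the charset, it could be cached per Charset (for example in a map) if the check itself shows up in profiles.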