java - Way to determine if a charset is multibyte?


Is there a way to determine whether a given charset (java.nio.charset.Charset) encodes characters using multiple bytes? Or, alternatively, is there a list somewhere of which character sets do and do not use more than 1 byte per character?

The reason I'm asking is a performance tweak: I need to know how long (in bytes) an arbitrary string is in a given character set. In the case of single-byte encodings, it's just the length of the string. Knowing whether or not a charset is single-byte would save me from having to re-encode the string first.

You might think such a puny optimization couldn't possibly be worth the effort, but a lot of CPU cycles in my application are spent on this sort of nonsense, and the input data I've encountered so far has been in 20+ different charsets.

The simplest way is probably:

boolean multiByte = charset.newEncoder().maxBytesPerChar() > 1.0f;

Note that newEncoder can throw UnsupportedOperationException, though, if the Charset doesn't support encoding. While newDecoder isn't documented to throw that, maxCharsPerByte isn't really appropriate. You could use averageCharsPerByte - if that's 1, it's a pretty good indication that it's a single-byte encoding, but in theory you could have some bytes which produce multiple characters, and some characters which take multiple bytes, averaging out at 1...
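Putting the check and the caveat about newEncoder together, a minimal sketch might look like the following. The class name CharsetWidth and the methods isSingleByte and encodedLength are hypothetical names made up for illustration; only Charset.canEncode, Charset.newEncoder, CharsetEncoder.maxBytesPerChar and String.getBytes(Charset) are real JDK calls. The shortcut in encodedLength assumes every char in the string maps to exactly one byte, which may not hold for surrogate pairs, so treat it as a starting point rather than a drop-in solution.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper; the class and method names are illustrative only.
final class CharsetWidth {

    private CharsetWidth() {}

    // Returns true if the charset's encoder reports at most one byte per char.
    // Charsets that cannot encode at all are treated as "not single-byte",
    // which avoids the UnsupportedOperationException from newEncoder().
    static boolean isSingleByte(Charset charset) {
        if (!charset.canEncode()) {
            return false;
        }
        CharsetEncoder encoder = charset.newEncoder();
        return encoder.maxBytesPerChar() <= 1.0f;
    }

    // Returns the encoded length in bytes, skipping the encode step for
    // single-byte charsets. ASSUMPTION: in a single-byte charset every char
    // of the string becomes exactly one byte; strings containing surrogate
    // pairs may violate this, so check the input if that matters.
    static int encodedLength(String s, Charset charset) {
        if (isSingleByte(charset)) {
            return s.length();
        }
        return s.getBytes(charset).length;
    }
}
```

Since the input reportedly comes in 20+ charsets, it would probably be worth caching the result of isSingleByte per Charset (for example in a map) rather than creating a new encoder on every call.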

