C# - ReadText from file in ANSI encoding
I use the Q42.WinRT library to download an HTML file to a cache. When I use ReadTextAsync, I get an exception:
No mapping for the Unicode character exists in the target multi-byte code page. (Exception from HRESULT: 0x80070459)
My code is simple:
var parsedPage = await WebDataCache.GetAsync(new Uri(string.Format("http://someurl.here")));
var parsedStream = await FileIO.ReadTextAsync(parsedPage);
I opened the downloaded file, and it is in ANSI encoding. I think I need to convert it to UTF-8, but I don't know how.
The problem is that the encoding of the original page is not Unicode; it's Windows-1251, and the ReadTextAsync function only handles Unicode or UTF-8. The way around this is to read the file as binary and then use Encoding.GetEncoding to interpret the bytes with the 1251 code page and produce a string (which is always Unicode).
For example:
string parsedStream;
var parsedPage = await WebDataCache.GetAsync(new Uri(string.Format("http://bash.im")));
// Read the raw bytes instead of letting ReadTextAsync guess the encoding.
var buffer = await FileIO.ReadBufferAsync(parsedPage);
using (var dr = DataReader.FromBuffer(buffer))
{
    var bytes1251 = new byte[buffer.Length];
    dr.ReadBytes(bytes1251);
    // Interpret the bytes as Windows-1251 to produce a (Unicode) string.
    parsedStream = Encoding.GetEncoding("windows-1251").GetString(bytes1251, 0, bytes1251.Length);
}
The challenge is that you can't tell from the stored bytes what the code page is, so this works here but may not work for other sites. Generally, UTF-8 is what you'll get from the web, but not always. The Content-Type response header of the page shows the code page, but that information isn't stored in the file.
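One way to cope with the unknown code page is sketched below. It is only an illustration, not part of Q42.WinRT: PageDecoder, CharsetFromContentType, and Decode are hypothetical names, and it assumes you have captured the Content-Type header value yourself (for example, from your own HTTP request), since the cached file does not carry it. If a charset is declared, it is used; otherwise the bytes are tried as strict UTF-8, and only if that fails is a fallback code page of your choosing applied.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

static class PageDecoder
{
    // Hypothetical helper: extracts the charset from a Content-Type value such as
    // "text/html; charset=windows-1251". Returns null when none is declared.
    public static string CharsetFromContentType(string contentType)
    {
        MediaTypeHeaderValue parsed;
        if (!MediaTypeHeaderValue.TryParse(contentType, out parsed))
            return null;
        return string.IsNullOrEmpty(parsed.CharSet) ? null : parsed.CharSet.Trim('"');
    }

    // Hypothetical helper: decodes with the declared charset when one is known;
    // otherwise tries strict UTF-8 and falls back to fallbackName on invalid bytes.
    public static string Decode(byte[] bytes, string charset, string fallbackName)
    {
        if (charset != null)
        {
            try
            {
                return Encoding.GetEncoding(charset).GetString(bytes, 0, bytes.Length);
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: fall through to guessing.
            }
        }

        try
        {
            // throwOnInvalidBytes: true makes invalid UTF-8 raise an exception
            // instead of silently producing replacement characters.
            var strictUtf8 = new UTF8Encoding(false, true);
            return strictUtf8.GetString(bytes, 0, bytes.Length);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume the caller-supplied legacy code page.
            return Encoding.GetEncoding(fallbackName).GetString(bytes, 0, bytes.Length);
        }
    }
}
```

With the bytes1251 array from the answer, a call might look like PageDecoder.Decode(bytes1251, null, "windows-1251"). The fallback is still a guess: text in a different legacy code page will decode without error but produce the wrong characters. Also note that on .NET Core and later, legacy code pages such as windows-1251 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).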