Opening a UTF-16 URL with urllib in Python
I'm trying to use the Google Translate API to translate text in Kannada (and hence encoded in UTF-16) into English. By manually entering the URL, after plugging in my Google API key, https://www.googleapis.com/language/translate/v2?key=key#&q=ಚಿಂಚೋಳಿ&source=kn&target=en, I'm able to get the translation I want.
The problem is, however, that the URL is UTF-16 encoded. When I try to open the URL using urllib, I get the error message below. Any advice on how to proceed, or an alternative way to proceed, would be appreciated.
Edit: I believe the problem can be solved by calling urllib.parse.quote_plus(text), where text is the UTF-16 text, and replacing the UTF-16 text with the return value of that function.
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    urllib.request.urlopen(url)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 469, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 487, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 447, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1283, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1248, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1061, in request
    self._send_request(method, url, body, headers)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1089, in _send_request
    self.putrequest(method, url, **skips)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 953, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 73-79: ordinal not in range(128)
"The problem is, however, that the URL is UTF-16 encoded."
UTF-16 doesn't mean what you think it means. It is an encoding of Unicode characters into bytes, used internally by the string types of some systems such as the Win32 API. UTF-16 is almost never used on the web because it is not ASCII-compatible.
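To make the ASCII-compatibility point concrete, here is a minimal sketch comparing the two encodings. It assumes Python 3, where a str holds Unicode code points and has no byte encoding until you call encode(); the variable names are only illustrative.

```python
# The Kannada text from the question: 7 Unicode code points.
text = 'ಚಿಂಚೋಳಿ'

utf16 = text.encode('utf-16-le')  # 2 bytes per code point for this text
utf8 = text.encode('utf-8')       # 3 bytes per code point for this text

print(len(text), len(utf16), len(utf8))  # 7 14 21

# ASCII text is byte-for-byte unchanged in UTF-8,
# while UTF-16 interleaves a NUL byte with every ASCII character.
ascii_text = 'key'
print(ascii_text.encode('utf-8'))      # b'key'
print(ascii_text.encode('utf-16-le'))  # b'k\x00e\x00y\x00'
```

The NUL bytes are why a UTF-16 byte stream cannot be dropped into an ASCII-based protocol such as HTTP.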
https://www.googleapis.com/language/translate/v2?key=key#&q=ಚಿಂಚೋಳಿ&source=kn&target=en
This is not a URI: URIs may only contain ASCII characters. It is an IRI, which can contain other Unicode characters.
However, urllib does not support IRIs. There are Python libraries that directly support IRIs; alternatively, you can convert the IRI to the corresponding URI, which urllib will be happy with. This is done by encoding non-ASCII characters in the hostname using the IDNA algorithm, and encoding non-ASCII characters in the other parts of the address (including query parameters) using URL-encoding on the UTF-8 representation of the characters. This gives:
https://www.googleapis.com/language/translate/v2?key=key#&q=%e0%b2%9a%e0%b2%bf%e0%b2%82%e0%b2%9a%e0%b3%8b%e0%b2%b3%e0%b2%bf&source=kn&target=en
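The conversion described above can be sketched with the standard library alone. This is a simplified illustration, not a complete IRI implementation: iri_to_uri and SAFE are hypothetical names, userinfo in the address is ignored, and Python's built-in idna codec implements the older IDNA 2003 rules.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Characters left untouched by quote(); '%' is included so that existing
# percent-escapes are not encoded a second time. (Assumed set for this sketch.)
SAFE = "!#$%&'()*+,/:;=?@[]~"

def iri_to_uri(iri):
    """Convert an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    # Hostname: IDNA encoding. Userinfo is ignored in this sketch.
    netloc = (parts.hostname or '').encode('idna').decode('ascii')
    if parts.port:
        netloc += ':%d' % parts.port
    # Everything else: percent-encode the UTF-8 bytes of non-ASCII characters.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

iri = ('https://www.googleapis.com/language/translate/v2'
       '?key=key#&q=ಚಿಂಚೋಳಿ&source=kn&target=en')
uri = iri_to_uri(iri)
uri.encode('ascii')  # succeeds: the result is pure ASCII
print(uri)
```

Note that quote() emits uppercase hex digits (%E0%B2%9A...); percent-escapes are case-insensitive, so this is equivalent to the lowercase form.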
However, the use of # here doesn't look right: that's a client-side mechanism for passing data in to the browser, and it won't work for server requests.
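A quick way to see the effect of the # is to split the URL. In this sketch the Kannada text is replaced by the placeholder word text to keep the output short; everything after the # lands in the fragment, which a client does not send to the server.

```python
from urllib.parse import urlsplit, parse_qs

# Same shape as the URL in the question; 'text' is a placeholder for the query text.
url = ('https://www.googleapis.com/language/translate/v2'
       '?key=key#&q=text&source=kn&target=en')

parts = urlsplit(url)
print(parts.query)     # key=key
print(parts.fragment)  # &q=text&source=kn&target=en

# Only the query string is part of the request, so only 'key' is a parameter.
params = parse_qs(parts.query)
print(params)          # {'key': ['key']}
```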
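Usually you'd do something like this (Python 3, matching the traceback above):

    import urllib.parse

    key = 'KEY'  # placeholder for your Google API key
    baseurl = 'https://www.googleapis.com/language/translate/v2'
    text = 'ಚಿಂಚೋಳಿ'
    # urlencode() percent-encodes each value as UTF-8 by default
    url = baseurl + '?' + urllib.parse.urlencode({
        'source': 'kn',
        'target': 'en',
        'q': text,
        'key': key,
    })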
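As a rough check that this approach agrees with the quote_plus idea from the question's edit, the sketch below wraps the URL building in a function and verifies the result. build_translate_url, BASE_URL, and the 'KEY' value are hypothetical names introduced for illustration; no request is actually sent.

```python
from urllib.parse import urlencode, quote_plus, urlsplit, parse_qs

BASE_URL = 'https://www.googleapis.com/language/translate/v2'

def build_translate_url(text, key, source='kn', target='en'):
    """Build an ASCII-only request URL (hypothetical helper)."""
    query = urlencode({'key': key, 'q': text, 'source': source, 'target': target})
    return BASE_URL + '?' + query

text = 'ಚಿಂಚೋಳಿ'
url = build_translate_url(text, 'KEY')  # 'KEY' is a placeholder API key

url.encode('ascii')                     # no UnicodeEncodeError
print(quote_plus(text) in url)          # True: urlencode applies quote_plus to each value
print(parse_qs(urlsplit(url).query)['q'] == [text])  # True: decodes back to the original text
```

The resulting string can be passed to urllib.request.urlopen(url) once a real key is substituted.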