Opening a UTF-16 URL with urllib in Python


I'm trying to use the Google Translate API to translate text from Kannada (and hence encoded in UTF-16) into English. If I enter the URL manually, after plugging in my Google API key, https://www.googleapis.com/language/translate/v2?key=key#&q=ಚಿಂಚೋಳಿ&source=kn&target=en, I get the translation I want.

The problem, however, is that the URL is UTF-16 encoded. When I try to open the URL using urllib, I get the error message below. Any advice on how to proceed, or an alternative way to proceed, would be appreciated.

Edit: I believe the problem can be solved by calling urllib.parse.quote_plus(text), where text is the UTF-16 text, and replacing the UTF-16 text in the URL with the return value of that function.
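For reference, urllib.parse.quote_plus percent-encodes a str using UTF-8 by default (not UTF-16), which is what the server expects in a URL. A minimal check, with the Kannada word written as escape sequences so the exact code points are unambiguous:

```python
from urllib.parse import quote_plus

# The Kannada word from the question, spelled out as code points.
text = "\u0c9a\u0cbf\u0c82\u0c9a\u0ccb\u0cb3\u0cbf"

# quote_plus() encodes the str to bytes with UTF-8 (its default), then percent-escapes them.
encoded = quote_plus(text)
print(encoded)

# Passing the UTF-8 bytes explicitly gives the same result.
print(encoded == quote_plus(text.encode("utf-8")))
```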

    Traceback (most recent call last):
      File "<pyshell#19>", line 1, in <module>
        urllib.request.urlopen(url)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 156, in urlopen
        return opener.open(url, data, timeout)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 469, in open
        response = self._open(req, data)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 487, in _open
        '_open', req)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 447, in _call_chain
        result = func(*args)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1283, in https_open
        context=self._context, check_hostname=self._check_hostname)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1248, in do_open
        h.request(req.get_method(), req.selector, req.data, headers)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1061, in request
        self._send_request(method, url, body, headers)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1089, in _send_request
        self.putrequest(method, url, **skips)
      File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 953, in putrequest
        self._output(request.encode('ascii'))
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 73-79: ordinal not in range(128)

"The problem, however, is that the URL is UTF-16 encoded."

UTF-16 doesn't mean what you think it means. It is an encoding of Unicode characters into bytes, used internally for the string types of some systems such as the Win32 API. UTF-16 is essentially never used in URLs on the web, because it is not ASCII-compatible.
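To make the ASCII-compatibility point concrete: in Python 3 the text in your code is a str, a sequence of code points with no byte encoding of its own; bytes only appear when you call .encode(). A small comparison of an ASCII letter and the first Kannada letter of the query word under UTF-8 and UTF-16 (little-endian, so no byte-order mark is added):

```python
# Byte encodings of an ASCII letter and a Kannada letter (U+0C9A).
samples = ["q", "\u0c9a"]

encodings = {}
for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # little-endian, no byte-order mark
    encodings[ch] = (utf8, utf16)
    print(f"U+{ord(ch):04X}: utf-8={utf8!r} utf-16-le={utf16!r}")

# UTF-8 keeps "q" as the single ASCII byte b"q"; UTF-16 turns it into two bytes,
# one of them NUL, so UTF-16 text cannot be spliced into an ASCII protocol like HTTP.
```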

https://www.googleapis.com/language/translate/v2?key=key#&q=ಚಿಂಚೋಳಿ&source=kn&target=en 

This is not a URI: URIs may contain only ASCII characters. It is an IRI, which can contain other Unicode characters.
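The traceback shows where this bites: http.client encodes the request line with request.encode('ascii'). The same ASCII check can be run on the address directly, with no network access, and the characters it rejects are exactly the Kannada ones. A small sketch (the URL is shortened and the key parameter omitted, purely for illustration):

```python
# The Kannada word from the question, spelled out as code points.
text = "\u0c9a\u0cbf\u0c82\u0c9a\u0ccb\u0cb3\u0cbf"
iri = "https://www.googleapis.com/language/translate/v2?q=" + text + "&source=kn&target=en"

try:
    iri.encode("ascii")  # the same codec http.client applies to the request line
    failing = None
except UnicodeEncodeError as exc:
    # exc.start and exc.end delimit the run of characters ASCII cannot represent
    failing = iri[exc.start:exc.end]

print(failing == text)
```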

However, urllib does not support IRIs. There are Python libraries that support IRIs directly; alternatively, you can convert the IRI into the corresponding URI, which urllib is happy with. This is done by encoding non-ASCII characters in the hostname using the IDNA algorithm, and encoding non-ASCII characters in the other parts of the address (including query parameters) by URL-encoding the UTF-8 representation of those characters. That gives this:

https://www.googleapis.com/language/translate/v2?key=key#&q=%e0%b2%9a%e0%b2%bf%e0%b2%82%e0%b2%9a%e0%b3%8b%e0%b2%b3%e0%b2%bf&source=kn&target=en 
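The conversion just described can be sketched with the standard library alone. This is a minimal illustration, not a complete IRI implementation: iri_to_uri is a hypothetical helper name, it relies on Python's built-in "idna" codec (which implements IDNA 2003), and it ignores userinfo (user:password@) and IPv6 literals:

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is: URI delimiters, plus '%' so existing escapes are not doubled.
_SAFE = "/?#[]@!$&'()*+,;=:%~"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: convert an IRI to an ASCII-only URI (minimal sketch)."""
    parts = urlsplit(iri)
    # Hostname: IDNA-encode each label (non-ASCII labels become xn--... punycode).
    netloc = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Path, query and fragment: percent-encode the UTF-8 bytes of non-ASCII characters.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


text = "\u0c9a\u0cbf\u0c82\u0c9a\u0ccb\u0cb3\u0cbf"  # the Kannada query word
print(iri_to_uri("https://www.googleapis.com/language/translate/v2?key=key#&q="
                 + text + "&source=kn&target=en"))
print(iri_to_uri("http://b\u00fccher.example/"))  # non-ASCII hostname goes through IDNA
```

quote() emits uppercase hex digits; percent-escapes are case-insensitive, so this output is equivalent to the lowercase escapes shown in the URI above.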

However, the use of # here doesn't look right. Everything after # is a fragment, a client-side mechanism for passing data within the browser; it is never sent to the server, so it won't work for server requests.
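urllib behaves the same way: when it parses a URL it splits the fragment off, and only the path and query are sent to the server. Building a Request object parses the URL without touching the network, so this can be checked directly (the q value "test" is just a placeholder):

```python
from urllib.request import Request

# No network access: constructing a Request only parses the URL.
req = Request("https://www.googleapis.com/language/translate/v2"
              "?key=key#&q=test&source=kn&target=en")

print(req.selector)  # path and query: what goes on the HTTP request line
print(req.fragment)  # kept locally by urllib, never sent to the server
```

So with the URL from the question, the server would only ever see key, and q, source and target would be silently dropped.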

Usually you'd do something like:

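    import urllib.parse
    key = 'YOUR_API_KEY'  # placeholder for your own API key
    baseurl = 'https://www.googleapis.com/language/translate/v2'
    text = 'ಚಿಂಚೋಳಿ'
    # urlencode() percent-encodes str values as UTF-8, giving a pure-ASCII query string
    url = baseurl + '?' + urllib.parse.urlencode(dict(
        source='kn', target='en',
        q=text,
        key=key,
    ))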
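One way to confirm that a URL built this way is safe to hand to urllib.request.urlopen is to check that it is pure ASCII and that decoding its query string gives back the original text. A sketch, assuming a hypothetical helper build_translate_url and a placeholder key (no request is sent):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_translate_url(text: str, key: str, source: str = "kn", target: str = "en") -> str:
    """Hypothetical helper: put every parameter in the query string, none after '#'."""
    base = "https://www.googleapis.com/language/translate/v2"
    return base + "?" + urlencode({"key": key, "q": text, "source": source, "target": target})


text = "\u0c9a\u0cbf\u0c82\u0c9a\u0ccb\u0cb3\u0cbf"  # the Kannada word from the question
url = build_translate_url(text, key="YOUR_API_KEY")  # placeholder key

ascii_url = url.encode("ascii")  # succeeds: nothing left for http.client to choke on
params = parse_qs(urlsplit(url).query)  # decodes the percent-escapes as UTF-8
print(params["q"][0] == text)
```

Because the parameters live in the query string rather than after a #, q, source and target actually reach the server.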
