Python: decode French characters in an HTML email attachment
I'm trying to decode an HTML attachment of an email fetched from an IMAP server. If the HTML file contains only plain ASCII characters it works without problems, but when it contains French characters such as é I get this: "vous \xc3\xa9t\xc3\xa9 envoy\xc3\xa9e par". I also see \n and \r appear in the output.
I use BeautifulSoup to search the HTML code, and a loop to check for new mail (not included in the code).
    import email
    import imaplib

    # username and password are defined elsewhere in my script
    imap_server = imaplib.IMAP4_SSL("server", 993)
    imap_server.login(username, password)
    imap_server.select("test")
    result, data = imap_server.uid('search', None, "UNSEEN")
    latest_email_uid = data[0].split()[-1]
    result, data = imap_server.uid('fetch', latest_email_uid, '(RFC822)')
    raw_email = data[0][1]
    raw_email = str(raw_email, 'utf8')
    msg = email.message_from_string(raw_email)
I walk through the message; if I find an HTML part, I decode it from base64 and pass it to BeautifulSoup. Then I print it with a UTF-8 conversion. If I replace encode('utf-8') with latin-1, I still get special characters.
    from bs4 import BeautifulSoup

    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == 'text/html':
                # decode=1 undoes the base64 transfer encoding and returns bytes
                attachment = part.get_payload(decode=1)
                soup = BeautifulSoup(attachment)
                print(soup.prettify().encode('utf-8'))
            else:
                print("no html")
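For comparison, here is a minimal sketch of the decoding step on its own, assuming the HTML part declares its charset in its Content-Type header. It builds a sample message in memory instead of fetching one over IMAP, parses the raw bytes with email.message_from_bytes (so no str() conversion is needed), and turns the payload bytes into text using the part's own charset. The helper name html_parts_as_text and the UTF-8 fallback are hypothetical choices for illustration.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List


def html_parts_as_text(raw_bytes: bytes) -> List[str]:
    """Hypothetical helper: return the decoded text of every text/html part."""
    msg = email.message_from_bytes(raw_bytes)
    texts = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes base64/quoted-printable and returns bytes.
            payload = part.get_payload(decode=True)
            # Use the charset the part declares; UTF-8 is an assumed fallback.
            charset = part.get_content_charset() or "utf-8"
            texts.append(payload.decode(charset, errors="replace"))
    return texts


# Sample message built in memory; MIMEText base64-encodes UTF-8 bodies.
sample = MIMEMultipart()
sample.attach(MIMEText("<p>vous été envoyée par</p>", "html", "utf-8"))
raw = sample.as_bytes()

for text in html_parts_as_text(raw):
    print(text)
```

The result is a str, which can be handed to BeautifulSoup or printed as-is.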
I have tried encoding and decoding with lots of charsets without getting a readable result. I also tried base64.b64decode(text).decode('utf-16'), but I still get the same \xc3\xa9.
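A quick check shows that the escaped sequence in the question is simply the UTF-8 encoding of the French text. Decoding those bytes as UTF-8 gives readable text back, while decoding them as Latin-1 silently produces mojibake instead of an error.

```python
# The escaped sequence from the question, as raw bytes.
raw = b"vous \xc3\xa9t\xc3\xa9 envoy\xc3\xa9e par"

# Decoding with the charset the bytes were written in restores the text.
text = raw.decode("utf-8")
print(text)  # vous été envoyée par

# Decoding with the wrong codec does not raise here; each byte maps to a
# Latin-1 character, so \xc3\xa9 becomes the two characters "Ã©".
wrong = raw.decode("latin-1")
print(wrong)  # vous Ã©tÃ© envoyÃ©e par
```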
You see these special characters because you are encoding the text to UTF-8 or Latin-1:
    >>> print('\xe9')
    é
    >>> print('\xe9'.encode('utf8'))
    b'\xc3\xa9'
    >>> print('\xe9'.encode('latin1'))
    b'\xe9'
    >>> print('hello world!\n'.encode('utf8'))
    b'hello world!\n'
When you print a bytes value, Python shows its repr() representation, which replaces any byte that does not represent a printable ASCII code point with a \x.. escape sequence, or with one of the shorter two-character escapes such as \r and \n. This makes the representation both reusable as a Python bytes literal and easier to log to files and terminals that are not set up for international character sets.
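To illustrate the escaping described above, this small sketch encodes text containing é plus a CR/LF pair and looks at the repr() of the resulting bytes. It then uses ast.literal_eval to show that the repr is itself a valid bytes literal that round-trips to the same value.

```python
import ast

data = "été\r\n".encode("utf-8")

# repr() of bytes: non-ASCII bytes become \xNN, CR and LF become \r and \n.
shown = repr(data)
print(shown)  # b'\xc3\xa9t\xc3\xa9\r\n'

# The repr is valid Python source for a bytes literal, so it round-trips.
roundtrip = ast.literal_eval(shown)
print(roundtrip == data)  # True
```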
print() handles the encoding for you, so print the .prettify() output directly.
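The difference can be seen without BeautifulSoup or IMAP. With its default encoding=None, soup.prettify() returns a str; in this sketch a plain string stands in for it (an assumption for illustration). The sketch captures what print() writes for the encoded bytes, as in the question's code, versus the str itself, as recommended above. The helper name captured_print is hypothetical.

```python
import contextlib
import io

# Stands in for soup.prettify(), which returns a str by default.
text = "vous été envoyée par"


def captured_print(value) -> str:
    """Hypothetical helper: return what print(value) writes to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print(value)
    return buffer.getvalue()


as_bytes = captured_print(text.encode("utf-8"))  # what the question's code does
as_text = captured_print(text)                   # printing the str directly

print(as_bytes, end="")  # b'vous \xc3\xa9t\xc3\xa9 envoy\xc3\xa9e par'
print(as_text, end="")   # vous été envoyée par
```

Printing the bytes shows their repr with escapes; printing the str lets print() encode it for the terminal.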
If printing Unicode to your terminal or console does not work and instead raises a UnicodeEncodeError, your terminal or console is not configured to handle Unicode text properly. Consult the PrintFails page on the Python wiki to troubleshoot.
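The error comes from print() having to encode the str with the console's encoding (visible as sys.stdout.encoding). The sketch below reproduces the failure by encoding é to ASCII, and shows a possible stopgap, not taken from the PrintFails page, that replaces unencodable characters instead of raising. The helper name console_safe is hypothetical; fixing the terminal configuration remains the proper solution.

```python
import sys
from typing import Optional


def console_safe(text: str, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: swap characters the console cannot encode for '?'."""
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(enc, errors="replace").decode(enc)


# An ASCII-only console cannot encode é; this is what makes print() fail.
try:
    "été".encode("ascii")
except UnicodeEncodeError as exc:
    print("would fail:", exc.reason)

print(console_safe("été", "ascii"))  # ?t?
```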