Smalltalk - Squeak Monticello character encoding
For a work project I am using headless Squeak on a (displayless, remote) Linux server, and Squeak on a Windows developer machine.
The code on the developer machine is managed using Monticello. Unfortunately, I have to copy the .mcz files to the server using SFTP (having a push repository on the server is not possible for security reasons). The code is then merged with, e.g.:
MczInstaller installFileNamed: 'name-b.18.mcz'.
This works.
Unfortunately, our code base contains strings with umlauts and other non-ASCII characters. During the Monticello re-import, some of them are replaced by other characters and some are replaced by nothing.
I also tried, e.g.:
MczInstaller installStream: (FileStream readOnlyFileNamed: '...') binary
(Note that .mcz files are .zip files, so binary should be appropriate; I guess it is the default anyway.)
Finding out how to make Monticello's transfer preserve Squeak's internal encoding of non-ASCII characters is the main goal of this question. Changing the source code to use only ASCII strings (at least in this codebase) is less desirable because of the manual labor involved. If you are interested in why it is not a simple grep-replace in this case, read this side note:
(Side note, a simplified/special case: the codebase uses Seaside's #text: method to render strings that contain characters which have to be HTML-escaped. This works fine for our non-ASCII characters, e.g. it converts ä to &auml;. But if we grep-replace the literal ä's with &auml; explicitly, we have to use the #html: method instead (else we double-escape), which would require us to also replace all other characters that have to be HTML-escaped (e.g. &), and then again the source code itself contains such characters. And there are other cases, such as #text: calls that take third-party strings, which may not be replaced by #html: calls...)
Squeak uses Unicode (ISO 10646) internally for encoding the characters in a String.
It might use the CP1252 extension for characters in the range 16r80 to: 16r9F, but I'm not sure anymore.
The character codes are written as-is on the stream source.st, and these codes are made of a single byte for a ByteString, when all characters are <= 16rFF. In this case, the file should look like it is encoded in ISO-8859-1 or CP1252.
If you ever have character codes > 16rFF, then a WideString is used in Squeak. Once again the codes are written as-is on the stream source.st, but this time they are 32-bit codes (written in big-endian order). Technically, the encoding is UTF-32BE.
Now what does MczInstaller do? It uses the snapshot/source.st file, and uses setConverterForCode for reading this file, which picks either UTF-8 or MacRoman... So non-ASCII characters might be changed, and it is even worse in the case of a WideString, which is re-interpreted as a ByteString.
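To make the byte-level effect concrete, here is a minimal sketch in Python (not Squeak code). It only mimics the layouts described above: one byte per character for a byte string, 32-bit big-endian codes for a wide string. The `errors="replace"` and `errors="ignore"` policies are Python's own and are an assumption standing in for whatever Squeak's converter does with invalid UTF-8; they merely show how a character can turn into another character or into nothing.

```python
# Illustration only: Python stands in for the byte-level behaviour described
# above; this is not Squeak or Monticello code.

text = "ä"  # U+00E4, fits in one byte

# A byte string written "as is": one byte per character (Latin-1 layout).
raw = text.encode("latin-1")  # b'\xe4'

# Reading those bytes back with a mismatched converter:
as_macroman = raw.decode("mac_roman")                      # some other character
as_utf8_replaced = raw.decode("utf-8", errors="replace")   # replacement character
as_utf8_dropped = raw.decode("utf-8", errors="ignore")     # character vanishes

# A wide string written as 32-bit big-endian codes (UTF-32BE layout).
wide = "€"  # U+20AC, does not fit in one byte
wide_raw = wide.encode("utf-32-be")  # four bytes for one character

# Re-interpreting those bytes one byte per character gives four characters.
wide_as_bytes = wide_raw.decode("latin-1")

print(repr(as_macroman), repr(as_utf8_replaced), repr(as_utf8_dropped))
print(len(wide), len(wide_as_bytes))
```

The same one-character string thus comes back changed, replaced, or missing, depending on which converter reads it, which matches the symptoms reported in the question.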
MC itself doesn't use the snapshot/source.st member in the archive.
It rather uses snapshot.bin (see the code in MCMczReader and MCMczWriter).
This is a binary file whose format is governed by DataStream.
So the snippet you should use is rather:
MCMczReader loadVersionFile: 'yourpackage-b.18.mcz'