The case of the wrong CPSIO encoding
Decheng (Robbie) Fan 2011-02-27
Recently I have often found that cpsio.exe and cppath.exe don’t work correctly with Chinese file names on Windows Vista. I’m quite sure that my system’s default locale is CP936 (Chinese Simplified (PRC)); I’ve verified this with the GetACP() and GetOEMCP() APIs. What confuses me is this: when I use cpsio.exe to copy a string of Chinese text, I can paste it correctly with cpsio.exe itself, but not in Word or GVim. In addition, I found that what Word and GVim paste is actually a CP1252 (Western European, closely related to latin1/ISO-8859-1) interpretation of the CP936-encoded data. If I type “:set enc=cp1252” in GVim, paste the text, and then switch to CP936 by typing “:set enc=cp936”, the text is shown correctly. Pasting directly with CP936 as the encoding doesn’t work, because Windows performs an extraneous CP1252-to-CP936 conversion on the way in, which results in strange characters.
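The garbling and the GVim workaround can be reproduced without Windows. The sketch below is only a model: it uses Python’s built-in `cp936` and `cp1252` codecs as stand-ins for the Windows conversions and calls no clipboard API. It encodes two Chinese characters as CP936, misreads the bytes as CP1252, and then recovers them by turning the CP1252 characters back into bytes and decoding those as CP936, which is roughly what pasting into a buffer with “:set enc=cp1252” and then switching to “:set enc=cp936” amounts to.

```python
# Model of the CPSIO mojibake using Python's codecs (no Windows APIs involved).
TEXT = "中文"  # two Simplified Chinese characters

# What the copying program produced: ANSI bytes in the system code page, CP936.
cp936_bytes = TEXT.encode("cp936")

# What the pasting side ended up with: the same bytes misread as CP1252,
# so each double-byte Chinese character turns into two Western characters.
garbled = cp936_bytes.decode("cp1252")

# The GVim workaround: a CP1252 buffer stores the original bytes unchanged,
# and reinterpreting those bytes as CP936 restores the text.
recovered = garbled.encode("cp1252").decode("cp936")

print(cp936_bytes.hex(" "))  # d6 d0 ce c4
print(garbled)               # ÖÐÎÄ
print(recovered)             # 中文
```

Because every byte of these two characters is at least 0xA0, CP1252 maps each one to a single printable character, which is why the round trip through CP1252 loses nothing.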
Yesterday I suddenly noticed that, under my administrator account, copying and pasting with cppath.exe works correctly. I wondered why. I tried many different options, including the current regional format setting, the location setting, the “copy to system accounts” setting, and the “system default locale”, but none of them helped. Having half given up, I suddenly recalled one fact: I had created my user accounts with the “net user” command, whereas my administrator account was created through “User Accounts” in Control Panel. Could this be the cause of the bug? I tested it, and yes, it worked! So I’m happy to report that the problem is solved. Perhaps some settings are missing when a user account is created with “net user”.
I didn’t spend time digging in to find out which registry settings are missing. Anyway, this makes me aware that the “net user” command has its limitations (or a bug?). I may investigate the root cause later.
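[…] About two years ago I wrote a blog post about a problem I had copying and pasting Chinese text on Windows Vista (the English edition, which is what I was using). The copying side put the text on the clipboard as CF_TEXT; the pasting side was Vim, which reads CF_UNICODETEXT. The result was mojibake: each Chinese character turned into two garbage characters. Most likely, text encoded in CP936 (the Simplified Chinese GB extended encoding, also known as GBK) was decoded as CP1252 (Western European, closely related to ISO-8859-1). My solution at the time was to create new users through Control Panel, which did not have this problem, whereas users created with the net user command did. Today, however, I found that this may not be the root cause. […]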
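The CF_TEXT to CF_UNICODETEXT conversion described above is performed by Windows itself. The sketch below only models it: `synthesize_unicode_text` is a hypothetical Python function, not a Windows API, and its `codepage` parameter stands for whichever code page the conversion ends up using. It shows that the same CF_TEXT bytes come out correctly with CP936 and as two garbage characters per Chinese character with CP1252.

```python
def synthesize_unicode_text(cf_text: bytes, codepage: str) -> str:
    """Hypothetical model of deriving CF_UNICODETEXT from CF_TEXT.

    cf_text is the NUL-terminated ANSI text placed on the clipboard;
    codepage is a Python codec name standing in for the Windows code page
    used for the conversion.
    """
    ansi = cf_text.split(b"\x00", 1)[0]  # CF_TEXT data ends at the first NUL
    return ansi.decode(codepage)


# The copying program puts CP936-encoded text on the clipboard as CF_TEXT.
clipboard = "复制粘贴".encode("cp936") + b"\x00"

right = synthesize_unicode_text(clipboard, "cp936")   # expected code page
wrong = synthesize_unicode_text(clipboard, "cp1252")  # mismatched code page

print(right, len(right))  # 4 characters
print(wrong, len(wrong))  # twice as many characters, all garbage
```

Only the code page differs between the two calls, which matches the symptom: the bytes on the clipboard were fine, and the damage happened when they were interpreted.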