Hacker News

userbinator 10/12/2024

As someone who has worked on Windows for a long time, the title was entirely unsurprising. Widening or narrowing always uses the current codepage.

> If a name contains values beyond ASCII — technically out of spec

I'm not sure what spec it's referring to, but this is normal and expected for files in non-English systems.

> Such tools often incorrectly assume UTF-8, which is what motivated this article.

Those tools are likely to be from the *nix world, where UTF-8 is by far the most common multibyte encoding. But even there you can have different codepages; I have worked on Linux systems using CP1252 and CP932 before.
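To make the codepage point concrete, here is a minimal Python sketch. It is not how Windows does it internally: Python's codecs merely stand in for the "current codepage", and the file name and the helper names (`narrow`, `widen`, `is_valid_utf8`) are made up for illustration.

```python
def narrow(name: str, codepage: str) -> bytes:
    """Narrow a string to bytes; unmappable characters become '?'."""
    return name.encode(codepage, errors="replace")


def widen(raw: bytes, codepage: str) -> str:
    """Widen bytes to a string; undecodable bytes become U+FFFD."""
    return raw.decode(codepage, errors="replace")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "日本語.dll"          # hypothetical file name
raw = narrow(name, "cp932")  # bytes as a Japanese-codepage system would store them

same_codepage = widen(raw, "cp932")    # round-trips back to the original name
other_codepage = widen(raw, "cp1252")  # decodes without error, but to mojibake
lossy = narrow(name, "cp1252")         # characters missing from CP1252 turn into '?'
assumed_utf8 = is_valid_utf8(raw)      # a tool assuming UTF-8 rejects these bytes

print(same_codepage == name)
print(ascii(other_codepage))
print(lossy)
print(assumed_utf8)
```

The same bytes are only meaningful together with the codepage that produced them, which is why a tool that hardcodes UTF-8 (or any single codepage) breaks on someone else's system.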


Replies

mananaysiempre 10/12/2024

> I'm not sure what spec [the prohibition on non-ASCII names is] referring to, but this is normal and expected for files in non-English systems.

The description of import directories[1,2] in the PE/COFF spec explicitly (if somewhat glibly) restricts imported DLLs to being referenced using ASCII only:

> Name RVA - The address of an ASCII string that contains the name of the DLL.

[1] https://learn.microsoft.com/en-us/windows/win32/debug/pe-for... (current, unversioned)

[2] https://github.com/tpn/pdfs/blob/master/Microsoft%20Portable... §6.4.1 (version 6.0, 1999)
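For anyone who wants to poke at this, a rough Python sketch of reading one import directory entry follows. The 20-byte layout (Import Lookup Table RVA, Time/Date Stamp, Forwarder Chain, Name RVA, Import Address Table RVA) is from the spec; the function names are my own, and the sketch assumes you have already translated the Name RVA to an offset into your buffer, which in a real file requires walking the section table.

```python
import struct

# One import directory entry: five little-endian 32-bit fields, 20 bytes total.
IMPORT_DESCRIPTOR = struct.Struct("<IIIII")


def parse_import_descriptor(data: bytes, offset: int = 0) -> dict:
    """Unpack a single import directory entry starting at `offset`."""
    ilt_rva, timestamp, forwarder_chain, name_rva, iat_rva = (
        IMPORT_DESCRIPTOR.unpack_from(data, offset)
    )
    return {
        "import_lookup_table_rva": ilt_rva,
        "time_date_stamp": timestamp,
        "forwarder_chain": forwarder_chain,
        "name_rva": name_rva,
        "import_address_table_rva": iat_rva,
    }


def read_ascii_name(data: bytes, name_offset: int) -> str:
    """Read the NUL-terminated DLL name at `name_offset`.

    Assumes `name_offset` is already a buffer offset, not a raw RVA.
    Raises UnicodeDecodeError if the name is not pure ASCII, i.e. out of spec.
    """
    end = data.index(b"\x00", name_offset)
    return data[name_offset:end].decode("ascii")
```

On a toy buffer where the RVA happens to equal the offset, `read_ascii_name(buf, parse_import_descriptor(buf)["name_rva"])` returns the name; on a name containing bytes above 0x7F it raises instead of guessing a codepage.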

Dwedit 10/12/2024

It's the import table of an EXE/DLL where non-ASCII is out of spec. Meanwhile, LoadLibraryW is happy to load any filename. (But don't you dare try to call LoadLibraryW from within DllMain, which runs under the loader lock.)

numpad0 10/12/2024

Sadly, it's not unusual at all to see a Windows app crash and burn when paths contain non-ASCII characters. That's just how it is for non-English computer users.

ahoka 10/12/2024

I’m pretty sure there are non-ASCII characters on English keyboards too.