josefx • today at 10:53 AM • 2 replies

> but without quoting which made the command hunt for the word match ending space which was regrettably, the D:\ component of the name

Except the folder name did not start with a space. In an unquoted D:\Hello World, the command would match D:\Hello, not D:\, and deleting D:\Hello would not wipe the entire drive. How does AI even handle file paths? Does it have a way to keep track of data that doesn't match a token, or is it splitting the path into tokens and throwing everything unknown away?

Replies

atq2119 • today at 2:45 PM

We're all groping around in the dark here, but something that could have happened is a tokenizer artifact.

The vocabularies I've seen tend to prefer tokens that start with a space. It feels somewhat plausible to me that an LLM sampling would "accidentally" pick the " Hello" token over the "Hello" token, leading to D:\ Hello in the command. And then that gets parsed as deleting the drive.

I've seen similar issues in GitHub Copilot where it tried to generate field accessors and ended up producing an unidiomatic "base.foo. bar" with an extra space in there.
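To make the difference concrete, here is a rough sketch of how whitespace-based argument splitting treats the two cases. Everything in it is an assumption for illustration: `split_args` is a hypothetical helper, `rmdir /s /q` is just a stand-in for whatever command was actually run, and real cmd.exe parsing has more rules than "split on spaces outside double quotes".

```python
def split_args(command):
    """Split on spaces that fall outside double quotes.

    Simplified model only; not how cmd.exe really parses a line.
    """
    args, current, in_quotes = [], "", False
    for ch in command:
        if ch == '"':
            in_quotes = not in_quotes      # toggle quoted state, drop the quote
        elif ch == " " and not in_quotes:
            if current:                    # a space ends the current argument
                args.append(current)
                current = ""
        else:
            current += ch
    if current:
        args.append(current)
    return args


# Hypothetical commands; the thread does not say which command was used.
unquoted = split_args(r'rmdir /s /q D:\Hello World')
stray_space = split_args(r'rmdir /s /q D:\ Hello')
quoted = split_args(r'rmdir /s /q "D:\Hello World"')

print(unquoted[3:])      # path arguments for the unquoted folder name
print(stray_space[3:])   # path arguments when a space sneaks in after D:\
print(quoted[3:])        # path argument when the name is quoted
```

In this model the unquoted name yields D:\Hello and World as separate arguments, while only the stray-space variant produces a bare D:\ as its own argument, which is the case that would target the drive root.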

deltoidmaximus • today at 2:16 PM

I assumed he had a folder with a space at the start of its name. Amusingly, I just tried this, and Windows 11 Explorer will silently discard a space if you add it at the beginning of the folder name. You need to use the CLI (mkdir " test") to actually get a leading space in the name.
