ks2048 • last Monday at 1:08 AM • 3 replies

I think using "extended grapheme clusters" (EGC) (rather than code points or bytes) is a good idea. But why not let you do "x[:2]" (or "x[0..<2]") on a String to get the first two EGCs? (Maybe better yet, make that return "String?".)

Replies

happytoexplain • yesterday at 11:13 PM

That's what I meant by "must be O(1)". I.e. constant time. String's Index type puts the non-constant cost of identifying a grapheme's location into the index creation functions (`index(_:offsetBy:)`, etc). Once you have an Index, then you can use it to subscript the string in constant time.
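
A minimal sketch of that split between index creation and subscripting. The sample string is made up for illustration; `index(_:offsetBy:)` and the `String.Index` subscript are the standard library calls named above.

```swift
// Sample string (an assumption for illustration): "e" followed by a
// combining acute accent forms a single grapheme cluster.
let s = "cafe\u{301} ok"

// Non-constant step: walk from the start to locate the 4th Character.
let i = s.index(s.startIndex, offsetBy: 3)

// Constant-time step: subscript with the index already in hand.
let ch = s[i]  // a Character made of two Unicode scalars

// A range of indices yields a Substring the same way.
let firstTwo = s[..<s.index(s.startIndex, offsetBy: 2)]

print(s.count, s.unicodeScalars.count, ch, firstTwo)
```

The grapheme count and the scalar count differ here, which is why locating the nth grapheme requires a walk rather than pointer arithmetic.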

Like I said, you can easily extend String to look up graphemes by integer index, but you should define it as a function, not a subscript, to honor the convention of using subscripts only for constant-time access.
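
One way such an extension could look, as a sketch only. The names `grapheme(at:)` and `graphemes(_:)` are hypothetical, not standard library API; returning an optional mirrors the "String?" idea from the parent comment.

```swift
extension String {
    /// Hypothetical helper: the Character at an integer offset, or nil if
    /// out of range. O(n), so it is a function rather than a subscript.
    func grapheme(at offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex
        else { return nil }
        return self[i]
    }

    /// Hypothetical helper: the Characters in an integer range as a new
    /// String, or nil if the range runs past the end. Also O(n).
    func graphemes(_ range: Range<Int>) -> String? {
        guard range.lowerBound >= 0,
              let lo = index(startIndex, offsetBy: range.lowerBound, limitedBy: endIndex),
              let hi = index(lo, offsetBy: range.count, limitedBy: endIndex)
        else { return nil }
        return String(self[lo..<hi])
    }
}

let word = "cafe\u{301}"  // 4 graphemes, 5 Unicode scalars
print(word.grapheme(at: 3) as Any, word.graphemes(0..<2) as Any)
```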

It's also just not a normal use case. In ten years of exclusive Swift usage, I've never had to get a string's nth grapheme, except for toy problems like Advent of Code.

ezfe • last Monday at 2:07 AM

Because that implies that String is a random access collection. You cannot constant-time index into a String, so the API doesn't allow you to use array indexing.

If you know it's safe to do so, you can get a representation as an array of UInt8 and then index into that.
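
A sketch of that approach, assuming "safe" means the content is known to be ASCII so that one byte corresponds to one character. The sample strings are made up for illustration.

```swift
// ASCII-only input (an assumption): each byte is one character.
let line = "hello, world"

// One O(n) copy of the UTF-8 bytes into an array...
let bytes = Array(line.utf8)  // [UInt8]

// ...after which integer indexing is constant time.
let b = bytes[7]
print(b == UInt8(ascii: "w"))

// Outside ASCII the assumption breaks: a single precomposed "é" (U+00E9)
// occupies two UTF-8 bytes, so byte offsets stop matching characters.
let accented = "\u{E9}"
print(accented.count, accented.utf8.count)
```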

zzo38computer • last Monday at 2:12 AM

I disagree. I think it should be indexed by bytes. One reason is what the other comment explains about indexing not being constant-time (which is a significant reason). Another is that indexing by EGC restricts the string type to Unicode (which has its own problems) and to specific versions of Unicode, which can cause problems when a different version of Unicode is in use. A separate library can be used to deal with code points and/or EGC if this is important for a specific application; these features should not be inherent to the string type.
