Hacker News

tialaramex today at 2:05 AM

Very often, if you have text (which this does), you can make huge savings by being smart about how you store it.

Rust intentionally provides the simplest possible growable string buffer, String, which under the hood is literally a Vec<u8> (you can't legitimately poke at it) plus the promise that the bytes are UTF-8 text.
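To make the "Vec<u8> plus a UTF-8 promise" point concrete, here is a small sketch. The wrapper names bytes_to_string and string_to_bytes are hypothetical; they only call the standard String::from_utf8 and String::into_bytes, which move the buffer between the two types without copying it.

```rust
/// Hypothetical helper: turn raw bytes into a String, enforcing the UTF-8
/// promise. String::from_utf8 validates the bytes and, on success, reuses
/// the Vec's heap buffer as the String's buffer (no copy).
fn bytes_to_string(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Hypothetical helper: going the other way needs no check at all; the
/// String simply hands back its underlying Vec<u8>, capacity included.
fn string_to_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

Invalid input such as a lone 0xFF byte is rejected, which is exactly the extra promise String adds on top of Vec<u8>.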

But you might find your needs better served by one (or several) of:

Box<str> -- you don't need spare capacity, so don't store it: length == capacity, and the handle shrinks from three words to two (pointer + length)

CompactString -- uses the entire 24 bytes (the size of a String on 64-bit targets) for SSO (small string optimization), so up to 24 bytes of UTF-8 are stored inline; obviously this doesn't make sense if all, or the vast majority, of your strings are 25 bytes or longer

ColdString -- the same idea, but squeezed into 8 bytes and also not storing capacity; this only makes sense over Box<str> if you have plenty of strings of 8 bytes or fewer
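As a rough illustration of the first two ideas, here is a sketch under stated assumptions. freeze is a hypothetical helper that just calls String::into_boxed_str, which drops the spare capacity. SmallStr is a toy small-string type, not CompactString's actual layout: it inlines at most 22 bytes (INLINE_CAP, an arbitrary choice) behind an ordinary enum tag and falls back to a Box<str>, whereas CompactString manages to use the full 24 bytes.

```rust
/// Hypothetical helper for the Box<str> idea: shrink the buffer so that
/// length == capacity and keep only (pointer, length).
fn freeze(s: String) -> Box<str> {
    s.into_boxed_str()
}

/// Toy inline capacity (an assumption, not CompactString's 24 bytes).
const INLINE_CAP: usize = 22;

/// Toy small-string type: short strings live inline with no heap
/// allocation, longer ones go to the heap as a Box<str>.
enum SmallStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<str>),
}

impl SmallStr {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            // Copy the bytes into the fixed inline buffer.
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(Box::from(s))
        }
    }

    fn as_str(&self) -> &str {
        // Every access has to check which variant we hold.
        match self {
            SmallStr::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes were copied from a valid &str"),
            SmallStr::Heap(b) => &**b,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}
```

In this toy, a 22-byte string stays inline while a longer one allocates, and every as_str call pays a match on the variant.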


Replies

ben-schaaf today at 2:32 AM

There's really an endless list of these optimizations. A few I've used (though not necessarily in Rust):

Atoms: Each string can be referenced with a single u32 or even u16, and they're inherently deduplicated.
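A minimal sketch of atoms (string interning), assuming a simple single-threaded design: Interner, intern and resolve are hypothetical names. Each distinct string is allocated once as an Rc<str> shared between the lookup map and the id table, and callers pass around a 4-byte u32 instead of the string.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical interner: maps each distinct string to a u32 atom.
#[derive(Default)]
struct Interner {
    // string -> atom, for deduplication on insert
    ids: HashMap<Rc<str>, u32>,
    // atom -> string, for resolving an atom back to text
    strings: Vec<Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        // Already seen: return the existing atom, no new allocation.
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = u32::try_from(self.strings.len()).expect("ran out of u32 atoms");
        // One heap allocation, shared by both tables.
        let shared: Rc<str> = Rc::from(s);
        self.strings.push(Rc::clone(&shared));
        self.ids.insert(shared, id);
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}
```

Comparing two atoms is then a single integer comparison, which is part of why interning pays off when the same strings recur.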

Bump allocator: your strings are &str, and allocation is super fast with limited fragmentation.
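A std-only sketch of a bump arena for strings, under stated assumptions (StrArena, alloc and CHUNK_BYTES are hypothetical; real projects would more likely reach for a crate such as bumpalo). Each chunk is a String that is never grown past its initial capacity, so its heap buffer never moves and the &str slices handed out remain valid until the arena is dropped.

```rust
use std::cell::RefCell;

/// Hypothetical chunk size; real arenas usually grow chunks geometrically.
const CHUNK_BYTES: usize = 64;

/// Hypothetical bump arena: strings are copied into chunk buffers and
/// handed back as &str borrowing from the arena.
struct StrArena {
    chunks: RefCell<Vec<String>>,
}

impl StrArena {
    fn new() -> Self {
        StrArena { chunks: RefCell::new(Vec::new()) }
    }

    /// Copy `s` into the arena; the result lives as long as the arena.
    fn alloc(&self, s: &str) -> &str {
        let mut chunks = self.chunks.borrow_mut();
        let fits = chunks
            .last()
            .map_or(false, |c| c.capacity() - c.len() >= s.len());
        if !fits {
            // Start a fresh chunk; an oversized string gets a chunk of its own.
            chunks.push(String::with_capacity(CHUNK_BYTES.max(s.len())));
        }
        let chunk = chunks.last_mut().unwrap();
        let start = chunk.len();
        // The "bump": append within existing capacity, so no reallocation.
        chunk.push_str(s);
        let slice: *const str = &chunk[start..];
        // SAFETY: a chunk's heap buffer is never reallocated (we only append
        // within capacity; moving the String inside the Vec does not move its
        // buffer) and is freed only when the arena is dropped, so these bytes
        // stay valid and unmodified for as long as `&self` is borrowed.
        unsafe { &*slice }
    }
}
```

The fast path is just a capacity check plus a memcpy, and individual strings are never freed; everything goes away at once when the arena is dropped.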

Single pointer strings (this has a name; I can't think of it right now): you store the length inside the allocation instead of in each reference, so each string is a single pointer.
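A sketch of the single-pointer idea, assuming a heap block laid out as a usize length header followed by the UTF-8 bytes. ThinStr is a hypothetical name for illustration, not a specific library type; the handle is one non-null pointer, so even Option of it stays one word.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// Hypothetical single-pointer string. Heap layout: [len: usize][len bytes].
struct ThinStr {
    ptr: NonNull<u8>,
}

impl ThinStr {
    /// Layout for a header plus `len` bytes, aligned for the usize header.
    fn layout(len: usize) -> Layout {
        Layout::from_size_align(size_of::<usize>() + len, align_of::<usize>())
            .expect("string too large")
    }

    fn new(s: &str) -> Self {
        let layout = Self::layout(s.len());
        unsafe {
            // Size is always >= size_of::<usize>(), so never zero.
            let raw = alloc(layout);
            let Some(ptr) = NonNull::new(raw) else { handle_alloc_error(layout) };
            // Write the length header, then the bytes right after it.
            (raw as *mut usize).write(s.len());
            std::ptr::copy_nonoverlapping(s.as_ptr(), raw.add(size_of::<usize>()), s.len());
            ThinStr { ptr }
        }
    }

    /// The length is read from the allocation, not stored in the handle.
    fn len(&self) -> usize {
        unsafe { (self.ptr.as_ptr() as *const usize).read() }
    }

    fn as_str(&self) -> &str {
        unsafe {
            let bytes = std::slice::from_raw_parts(
                self.ptr.as_ptr().add(size_of::<usize>()),
                self.len(),
            );
            // The bytes were copied from a &str in `new`, so they are valid UTF-8.
            std::str::from_utf8_unchecked(bytes)
        }
    }
}

impl Drop for ThinStr {
    fn drop(&mut self) {
        // Rebuild the same layout from the stored length to free the block.
        unsafe { dealloc(self.ptr.as_ptr(), Self::layout(self.len())) }
    }
}
```

The trade-off is that reading the length now touches the heap, whereas a Box<str> carries it in the handle.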

eldenring today at 2:35 AM

CompactStr doesn't have any additional runtime overhead, IIRC, right? So in theory you could drop it in everywhere, even when you expect strings longer than 24 bytes. Maybe an extra branch in the longer-string case?