Safari does most of this by leveraging system-level AI features, some of which run entirely locally (and which, in turn, can be and are used elsewhere throughout the system and in native apps). This model makes a lot more sense to me than building the browser around an LLM.
Firefox uses local models for translation, summarisation and possibly other things. Since it is not restricted to one platform, I guess it has to use its own tools, while Apple (or macOS/iOS-focused software in general) can use system-level APIs. But the logic, I guess, is the same.