kannanvijayan • today at 3:12 PM • 2 replies

I feel like the talk about "world models" is trying to get at that, just cast in different terminology. A world model is just a domain model, and once you're at domain models, there are multitudes of domains.

Unsupervised learning over domain rule systems has the potential to let us build well-defined, tightly scoped models that behave a lot more deterministically, don't colour outside the lines, and reserve their weights for cleanly modeling the domain associations and relationships that matter.

I just asked codex the following question in the middle of my coding prompt:

  What are you thoughts on the relative strengths of ewoks vs jawans?

Answer:

  • Ewoks are stronger in direct conflict. They are organized fighters, good at
    ambushes, traps, terrain control, and coordinated attacks. On Endor, they beat
    a technologically superior force by using preparation and local knowledge.
  ....

As amusing as this may be, I really have no need or desire for my coding model to understand or be aware of ewoks and their relative strengths compared to jawans. Nor do I need it to understand the nuances of the races of Middle-earth. A prompt response of "I have no idea what you are talking about" to all of these would feel reassuringly scoped.

Mixture-of-Experts seems like an attempt to do this - the domain structure being extracted into specific sub-models that are presumably trained on particular domain-associated content - but it feels like this is only the beginning of what is possible.
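
For reference, here is a minimal sketch of top-k gating as it is commonly described for MoE layers. In this formulation a learned router scores the experts for each input and the selected experts' outputs are blended by weight; nothing in it assigns an expert to a named domain. The function names, the toy scalar "experts", and the fixed router scores are all made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_gate(router(x), k))


# Toy stand-ins (assumptions): four scalar "experts" and a fixed router.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router = lambda x: [0.1, 2.0, -1.0, 2.0]

print(moe_forward(3.0, experts, router))
```

Only the selected experts run for a given input, which is where the compute saving comes from; all experts still have to be resident in memory.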

Replies

blahblaher • today at 5:03 PM

I've been having similar thoughts regarding the gigantic trillion-parameter models. I'm starting to believe the future will be very specialized, focused models that can be run on modest hardware (locally) but that can scale in performance (latency, speed) in the cloud, much like any other software today.

If you need to do programming, do you really need trillion-parameter models? Other domains might need larger or smaller ones, but there's no need for a model to 'know' everything or to require datacenter levels of hardware to run.

General chatbots might work better as larger models since you really don't know what people will ask for; alternatively, we find a way to route the initial question to the appropriate model. Like MoE, but without needing to load a gigantic model into memory first.
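
A rough sketch of that routing idea, assuming a cheap front-end step that picks which specialized model to load before anything large touches memory. The keyword table, the domain names, and the "general" fallback are all hypothetical; a real router would more likely be a small trained classifier than a keyword match.

```python
import re

# Hypothetical domain vocabularies; a stand-in for a small learned classifier.
DOMAIN_KEYWORDS = {
    "code": {"function", "bug", "compile", "python", "refactor", "traceback"},
    "legal": {"contract", "liability", "clause", "statute"},
}
FALLBACK = "general"  # assumed name for the larger catch-all model


def route(question: str) -> str:
    """Return the name of the model that should handle the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scores = {domain: len(words & vocab)
              for domain, vocab in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # With no keyword hits, hand the question to the general model.
    return best if scores[best] > 0 else FALLBACK


print(route("Why does this Python function fail to compile?"))
print(route("What are your thoughts on ewoks?"))
```

Only the model named by `route` would then need to be loaded, which is the difference from MoE that the comment is pointing at.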

antonvs • today at 4:55 PM

> As amusing as this may be, I really have no need or desire for my coding model to understand or be aware of ewoks

You'll think otherwise the first time you're a victim of a zero-day ewok.

Seriously though, while coding models may not need to know about ewoks, their contextual knowledge of things beyond just writing code almost certainly makes them better coding models.

It could be difficult to constrain the training corpus "just right" so that you eliminate all the irrelevant subjects like ewoks but retain enough so that the model doesn't turn into an idiot savant capable of churning out correct code but incapable of understanding what you really want.
