ks2048 • today at 1:30 AM • 2 replies

So, hand-coded weights can do it with 36 params and 311 for trained weights - did anyone try the former architecture, but starting with random weights and learning?

Replies

alexlitz • today at 2:49 AM

For one, the specific 36-parameter version is impossible without float64, so you might guess the corollary: it is not exactly amenable to being found by gradient descent. I think the interesting question is how you can structure transformers, and neural nets in general, so that they can both represent things like this very parsimoniously and have that representation be amenable to learning by gradient descent.
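A generic illustration, not taken from the thread: the 36-parameter weights aren't shown here, so this does not reproduce them. It only shows the kind of precision gap that can let a hand-coded construction work in float64 and break in float32. `to_float32` is a hypothetical helper that rounds a value through the standard-library `struct` module; Python's own `float` is IEEE-754 float64.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# float32 has a 24-bit significand, so not every integer above 2**24 is exact.
big = 2.0 ** 24
print(big + 1 == big)                          # float64: False
print(to_float32(big + 1) == to_float32(big))  # float32: True

# A small term that survives next to a large one in float64 can vanish in float32.
large, small = 1e8, 1.0
print((large + small) - large)  # float64: 1.0
large32 = to_float32(large)
print(to_float32(large32 + small) - large32)  # float32: 0.0
```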

bitwize • today at 3:12 AM

"Minsky, why did you close your eyes?"

"So that the room will be empty."
