logoalt Hacker News

ssivarkyesterday at 10:37 PM0 repliesview on HN

> Also [IFF] you do not learn anything about the data while performing the binary search, no?

Yes, absolutely!

I forgot to share this general perspective above, and it's too late to edit, so I'll add it here...

Since binary search assumes only monotonicity; splitting your interval into two equal parts extracts one bit of information per step, and any other choice would extract less information on average. One bit of information per step is how you end up needing log(n) steps to find the answer.

To accelerate your search, you basically need to extract those log(n) bits as fast as you can. You can think of that as leveraging both the prior, and everything you learn along the way -- to adaptively design each step to be the optimal experiment to extract maximum amount of information. And adaptive local models of your search space (gradient / hessian / etc) allow you to extract many more bits of information from each query / experiment, provided the function you are inverting has some local structure.

PS: That is why we leverage these ideas to "search" for the optimum, among a space of solutions.