Yes, something prevents LLMs from being RLed to do this: You can't see through something opaque to determine whether there's something high calorie or low calorie out of sight.
The problem itself is unsolvable given the data provided.
You could conceivably make it better at making guesses, but they will inherently always be guesses that will sometimes be wildly off.
> You can't see through something opaque to determine whether there's something high calorie or low calorie out of sight
https://www-users.york.ac.uk/~ss44/joke/3.htm "There is at least one field, containing at least one sheep, of which at least one side is black."