The objective function here defines a Markov random field (MRF) with boolean random variables and certain local statistics of nearest neighbours, either uniform if the target is a white image, or varying with location to produce an image. MRFs define Gibbs probability distributions, which you can sample from (which will already produce a good image here) or perform gradient ascent on to reach a local maximum. The negative log-likelihood of the MRF distribution is equal, up to an additive constant, to the loss function of the original optimisation problem, so the maximum likelihood estimates (MLEs) of the MRF (there will often be several due to symmetry) are the optimal solutions to the original problem. (But in general the MLE can look completely different to a sample.)
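To spell out the link between the loss and the distribution, here is the standard Gibbs form. The symbols are my own notation, not from the post: x is the boolean image, E(x) is the loss of the original optimisation problem, and Z is the normalising constant, with the temperature fixed to 1.

```latex
\begin{align}
  p(x) &= \frac{1}{Z} \exp\bigl(-E(x)\bigr),
  \qquad Z = \sum_{x'} \exp\bigl(-E(x')\bigr), \\
  -\log p(x) &= E(x) + \log Z, \\
  \arg\max_x \, p(x) &= \arg\min_x \, E(x).
\end{align}
```

Since log Z does not depend on x, the most probable configurations are exactly the minimisers of the loss, while a typical sample is only biased towards low loss.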
The statistics are 9th-order (over 3x3 blocks of pixels) but of a simple form that is hardly more expressive, in terms of the different textures it can reproduce, than the well-known 2nd-order nearest-neighbour statistics. In the approximate case where you only care about the average value of each pixel, I think it would collapse to 2nd-order. Texture synthesis with MRFs with local statistics is discretized (in space) Turing reaction-diffusion. I did my PhD on this topic.
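For anyone who wants to see what sampling from such a model looks like, here is a minimal sketch of a Gibbs sampler for a binary MRF with 2nd-order (pairwise) nearest-neighbour statistics. It is not the 3x3 objective from the post. The energy parameterisation, the names `gibbs_sample` and `energy`, and the parameters `alpha`, `beta_h`, `beta_v` are all assumptions made for illustration, and the grid wraps around at the edges.

```python
import math
import random


def gibbs_sample(height, width, beta_h, beta_v, alpha=0.0, sweeps=50, seed=0):
    """Gibbs-sample a binary nearest-neighbour MRF on a torus.

    Assumed energy, with pixels s in {-1, +1}:
        E(s) = -alpha  * sum_i s_i
               -beta_h * sum over horizontal neighbour pairs of s_i * s_j
               -beta_v * sum over vertical neighbour pairs of s_i * s_j
    """
    rng = random.Random(seed)
    s = [[rng.choice((-1, 1)) for _ in range(width)] for _ in range(height)]
    for _ in range(sweeps):
        for y in range(height):
            for x in range(width):
                # Local field from the four nearest neighbours (periodic boundary).
                field = (
                    alpha
                    + beta_h * (s[y][(x - 1) % width] + s[y][(x + 1) % width])
                    + beta_v * (s[(y - 1) % height][x] + s[(y + 1) % height][x])
                )
                # Conditional P(s_yx = +1 | neighbours) = 1 / (1 + exp(-2 * field)).
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))
                s[y][x] = 1 if rng.random() < p_plus else -1
    return s


def energy(s, beta_h, beta_v, alpha=0.0):
    """Energy of a configuration under the same assumed model."""
    height, width = len(s), len(s[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            total -= alpha * s[y][x]
            total -= beta_h * s[y][x] * s[y][(x + 1) % width]
            total -= beta_v * s[y][x] * s[(y + 1) % height][x]
    return total


if __name__ == "__main__":
    # Positive horizontal and negative vertical coupling favours thin horizontal stripes.
    sample = gibbs_sample(16, 32, beta_h=1.0, beta_v=-1.0, sweeps=30, seed=1)
    for row in sample:
        print("".join("#" if v == 1 else "." for v in row))
```

Changing the signs and magnitudes of `beta_h` and `beta_v` gives blobs, stripes or checkerboards, which is the kind of texture family the paper below explores. Making the parameters depend on location would correspond to the "varying with location" case above.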
Probably the most influential early paper on this kind of simple texture model, where you will see similar patterns, is:
Cross & Jain, 1983, PAMI, Markov Random Field Texture Models
Have you found anything else interesting or curious along this path, or anything different but somewhat related?
Are you still working on this topic or other things now?