> I still don't know exactly what you mean
Straightforward quantization, just to one bit instead of 8, 16, or 32. Training a one-bit neural network from scratch is apparently an unsolved problem, though.
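To make "straightforward" concrete, here's a toy sketch of post-training 1-bit weight quantization (hypothetical function name; the per-tensor scale follows BinaryConnect-style schemes, not any particular library's API):

```python
import numpy as np

# Toy 1-bit quantization: each weight keeps only its sign, plus one
# shared floating-point scale (the mean absolute value) so the
# dequantized weights preserve the tensor's overall magnitude.
def quantize_1bit(w):
    scale = np.abs(w).mean()       # one scale for the whole tensor
    return np.sign(w), scale       # weights collapse to {-1, +1}

w = np.array([0.7, -0.2, 0.05, -1.3])
q, s = quantize_1bit(w)
w_hat = q * s                      # dequantized weights used at inference
```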
> The trees that correspond to the neural networks are huge.
Yes, if the task is inherently 'fuzzy'. Many neural networks are effectively large decision trees in disguise, and those are the ones that have potential with this kind of approach.
> Training a one-bit neural network from scratch is apparently an unsolved problem, though.
It was until recently, but there is a new method that trains them directly, without any floating-point math, using "Boolean variation" in place of Newton/Leibniz differentiation:
https://proceedings.neurips.cc/paper_files/paper/2024/hash/7...
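To be clear, this is not the linked paper's algorithm, just a toy illustration of the general idea of a discrete "variation": ask how the loss changes when a single binary weight is flipped, and flip the ones that help, with no gradient of sign() needed anywhere:

```python
import numpy as np

def loss(w, x, target):
    return (w @ x - target) ** 2

# Greedy bit-flip descent (illustrative only): for each binary weight,
# evaluate the "Boolean variation" -- the loss change under a single
# flip -- and accept flips that reduce the loss. Note that greedy
# single-bit flips can stall in local minima, as they do here.
def train_by_flips(w, x, target, sweeps=5):
    w = w.copy()
    for _ in range(sweeps):
        for i in range(len(w)):
            flipped = w.copy()
            flipped[i] = -flipped[i]
            if loss(flipped, x, target) < loss(w, x, target):
                w = flipped
    return w

x = np.array([0.5, -1.0, 2.0])
w0 = np.array([-1.0, 1.0, -1.0])
w = train_by_flips(w0, x, 1.5)
```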
>Many neural networks are effectively large decision trees in disguise and those are the ones which have potential with this kind of approach.
I don't see how that is true. Decision trees look at one parameter at a time and potentially split into multiple branches (i.e., more than two branches are possible). Single input -> discrete, multi-valued output.
Neural networks do the exact opposite. A neuron takes multiple inputs and computes a weighted sum, which is then fed into an activation function. That activation function produces a scalar value, where low values mean inactive and high values mean active. Multiple inputs -> continuous scalar output.
Quantization doesn't change any of this. A 1-bit weight doesn't perform any splitting; it merely decides whether (or with which sign) a given input enters the weighted sum. The weighted sum itself would still be performed with 16-bit or 8-bit activations.
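The contrast is easy to show in code (toy functions, my own illustration): a tree node inspects one feature and branches discretely, while a neuron mixes every input into one continuous value, and binarizing the weights only flips or drops terms in that sum:

```python
import numpy as np

# A decision-tree node: looks at ONE feature, branches discretely.
def tree_node(x, feature=0, threshold=0.5):
    return "left" if x[feature] < threshold else "right"

# A neuron: weighted sum of ALL inputs, then a continuous activation.
def neuron(x, w, b=0.0):
    z = np.dot(w, x) + b              # many inputs, one scalar
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid: continuous in (0, 1)

x = np.array([0.2, 0.8, -0.4])
# With 1-bit weights the sum still ranges over all inputs; each weight
# only negates or keeps a term -- it never splits on a single feature.
w_binary = np.array([1.0, -1.0, 1.0])
branch = tree_node(x)                 # discrete outcome
activation = neuron(x, w_binary)      # continuous outcome
```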
I'm honestly tired of these terrible analogies that don't explain anything.
> Training a one-bit neural network from scratch is apparently an unsolved problem, though.
I don't think it's correct to call it unsolved. The established methods are much less efficient than those for "regular" neural nets, but they do exist.
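The most common of those methods is the straight-through estimator: binarize the weights on the forward pass, but backprop as if the binarization were the identity, updating latent full-precision weights. A one-step sketch with toy numbers and a squared-error loss:

```python
import numpy as np

# One straight-through-estimator (STE) update step (illustrative):
# forward with sign(w), backward as if sign() were the identity.
x = np.array([0.5, -1.0, 2.0])
w_real = np.array([0.3, -0.7, 1.2])   # latent fp weights (these get updated)
target, lr = 1.0, 0.1

w_bin = np.sign(w_real)               # forward pass uses {-1, +1} weights
y = w_bin @ x
grad_w = 2 * (y - target) * x         # STE: gradient passes through sign()
w_real = w_real - lr * grad_w         # update latent weights; re-binarize next step
```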
Also note that the usual approach when going binary is to make the units stochastic. https://en.wikipedia.org/wiki/Boltzmann_machine#Deep_Boltzma...
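A minimal sketch of such a stochastic binary unit (illustrative names, not the full Boltzmann-machine machinery): the unit fires with probability sigmoid(weighted input), so each sample is 0/1 yet the expected activation is still a smooth function of the weights:

```python
import numpy as np

# Stochastic binary unit: instead of a hard threshold, sample the
# on/off state with probability given by the sigmoid of the input.
def stochastic_unit(x, w, b, rng):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # firing probability
    return rng.random() < p                   # sample a binary state

rng = np.random.default_rng(42)
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -1.0, 0.5])
samples = [stochastic_unit(x, w, 0.0, rng) for _ in range(10000)]
# empirical firing rate ~ sigmoid(1.0) ~ 0.73
```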