I really appreciated this piece. Thank you to OP for writing and submitting it.
The thing that piqued my interest was the side remark that the Dirac delta is a “distribution”, and that this is an unfortunate name clash with the same concept in probability (measure theory).
My training (in EE) used both Dirac delta “functions” (in signal processing) and distributions in the sense of measure theory (in estimation theory). Really two separate forks of coursework.
I had always thought that the use of delta functions in convolution integrals (signal processing) was ultimately justified by measure theory — the same machinery as I learned (with some effort) when I took measure theoretic probability.
But, as flagged by the OP, that is not the case! Mind blown.
Some of this is the result of the way these concepts are taught. There is some hand waving in both signal processing and estimation theory when these difficult functions and integrals come up.
I’m not aware of signal processing courses (probably graduate level) in which convolution against delta “functions” uses the distribution concept. There are indeed words to the effect of either:
- Dirac delta is not a function, but think of it as a limit of increasingly concentrated Gaussians;
- use of the Dirac delta is OK, because we don’t need to represent it directly, only the result of an inner product against a smooth function (i.e., a convolution); both views are sketched numerically below.
But these excuses are not rigorously justified, even at the graduate level, in my experience.
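To make the hand-wave concrete (though a numerical demo is of course not a justification), here is a minimal sketch, assuming nothing beyond numpy: convolving a smooth signal with increasingly concentrated unit-area Gaussians reproduces the signal, which illustrates both informal views at once.

```python
# Illustrative sketch only, not a rigorous justification: convolving a
# smooth signal with increasingly concentrated unit-area Gaussians
# reproduces the signal, the defining property of the Dirac delta as a
# convolution identity.

import numpy as np

t = np.linspace(-5, 5, 4001)
dt = t[1] - t[0]
signal = np.exp(-t**2) * np.cos(3 * t)   # an arbitrary smooth test signal

for sigma in (0.5, 0.1, 0.02):
    gauss = np.exp(-t**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    out = np.convolve(signal, gauss, mode="same") * dt
    print(sigma, np.max(np.abs(out - signal)))   # error shrinks as sigma -> 0
```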
*
Separately from that, I wonder if OP has ever seen the book Radically Elementary Probability Theory, by Edward Nelson (https://web.math.princeton.edu/~nelson/books/rept.pdf). It uses nonstandard analysis to get around a lot of the (elegant) fussiness of measure theory.
The preface alone is fun to read.
While the limit of increasingly concentrated Gaussians does result in a Dirac delta, it is not the only way the Dirac delta comes about, and it is probably not the right way to think about it in the context of signal processing.
When we are doing signal processing, the Dirac delta primarily comes about as the Fourier transform of a constant function, and if you work out the math, this is roughly the kernel sin(Nx)/(πx): a sinc whose oscillations become infinitely fast as N → ∞. The distinction matters because the concentrated-Gaussian limit goes to 0 as you move away from the origin, whereas the sinc never goes to 0; it just oscillates ever faster. It still acts as a Dirac delta because, in any integral of a smooth function against this sinc, the fast oscillations cancel.
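To see the cancellation numerically (a quick sketch, not a proof; the test function 1/(1+x²) is just an arbitrary smooth choice): the kernel sin(Nx)/(πx) has no decaying envelope beyond 1/x, yet its integral against a smooth function converges to f(0).

```python
# Sketch: sin(N x)/(pi x) does not decay to 0 pointwise, but integrating it
# against a smooth f converges to f(0) because the fast oscillations cancel.

import numpy as np

x = np.linspace(-20, 20, 400001)
dx = x[1] - x[0]
f = 1 / (1 + x**2)                                  # smooth test function, f(0) = 1

for N in (1, 5, 25):
    kernel = (N / np.pi) * np.sinc(N * x / np.pi)   # = sin(N x)/(pi x), safe at x = 0
    print(N, np.sum(kernel * f) * dx)               # ~0.63, ~0.99, ~1.00 -> f(0)
```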
The poor behavior of this limit (primarily numerically) is closely related to why we get things like the Gibbs phenomenon.
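For completeness, the classic demonstration of that connection (sketch only): partial Fourier sums of a square wave, i.e. the wave convolved with exactly this kind of oscillatory kernel, overshoot the jump by a fixed amount no matter how many terms you keep.

```python
# Sketch: partial Fourier sums of a square wave overshoot near the jump by
# a fixed ~9% of the jump, however many terms are kept (Gibbs phenomenon).

import numpy as np

t = np.linspace(-np.pi, np.pi, 20001)
for N in (9, 99, 999):
    # Partial Fourier sum of the square wave sign(t): odd harmonics only.
    partial = sum(4 / (np.pi * k) * np.sin(k * t) for k in range(1, N + 1, 2))
    print(N, partial.max())   # stays near 1.179, never approaches 1
```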
The Dirac delta is just a unit vector, once you represent it in a basis that it is a component of.
I don't know what kind of justification you expect. There's a Dirac-delta-sized "hole" in linear algebra that mathematicians need a name for. It's not like we can just leave it there, unfilled.
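In finite dimensions the statement is literal; a toy sketch (purely illustrative):

```python
# Toy sketch: in R^n the "delta at index i" is the standard unit vector e_i,
# and the inner product <e_i, v> evaluates v at i, just as <delta_a, f> = f(a).

import numpy as np

n, i = 8, 3
e_i = np.eye(n)[i]                               # discrete delta sitting at index i
v = np.random.default_rng(0).normal(size=n)      # an arbitrary vector ("signal")

print(np.dot(e_i, v) == v[i])                    # True: the pairing reads off v[i]
```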
Thanks! And yeah I’m familiar with Nelson
> But these excuses are not rigorously justified, even at the graduate level, in my experience.
Imo, the informal use is already pretty close to the formal definition. Formally, a distribution is defined purely by its inner products against certain smooth functions (usually the ones with compact support), which is what the OP alluded to when he said:
> The formal definition of a generalized function is: an element of the continuous dual space of a space of smooth functions.
That "element of the continuous dual space" is just a function that takes in a smooth function with compact support f, and returns what we take to be the inner product of f with our generalized function.
So (again, imo) "we don’t need to represent it directly, only the result of an inner product against a smooth function" isn't that distant to the formal definition.
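You can even mimic the formal picture directly in code; a toy sketch (nothing rigorous, and the names here are made up for illustration): a generalized function is represented only by its action on test functions, never by pointwise values.

```python
# Toy sketch: a "generalized function" as a plain functional on test functions.

import math

# The Dirac delta: <delta, f> = f(0). No values "of delta" ever appear.
delta = lambda f: f(0.0)

def from_function(g, lo=-10.0, hi=10.0, n=200_000):
    """Lift an ordinary function g to a functional via <g, f> = integral of g*f."""
    dx = (hi - lo) / n
    def pair(f):
        return sum(g(lo + (k + 0.5) * dx) * f(lo + (k + 0.5) * dx)
                   for k in range(n)) * dx
    return pair

f = lambda x: math.exp(-x * x) * math.cos(x)     # a smooth test function, f(0) = 1

# A narrow unit-area Gaussian, viewed as a functional, is nearly delta:
s = 0.01
narrow = from_function(lambda x: math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi)))
print(delta(f))     # 1.0 exactly
print(narrow(f))    # ~1.0, without delta ever existing as a function
```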