Differentiation turns out to be a deeper subject than most people expect, even if you stick to the ordinary real numbers rather than venturing into things like the hyperreals.
I once saw, in an elementary calculus book, a note after the proof of a theorem about differentiation saying that the converse of the theorem was also true but needed more advanced techniques than were covered in the book.
I checked the advanced calculus and real analysis books I had and they didn't have the proof.
I then did some searching, found mention of a book titled "Differentiation" (or something similar), and found a site that had scans of the first chapter of that book. It proved the theorem on something like page 6, and I couldn't understand the proof at all. Starting from the beginning, I think I got through maybe a page or two before it got too deep for my level of preparation, with my mere bachelor's degree in mathematics.
I kind of wish I'd bought a copy of that book. I've never since been able to find it. I've found other books with the same or similar title but they weren't it.
One minor nit: a function can be differentiable at a and discontinuous at a even with the standard definition of the derivative. A trivial example would be the function f(x) = (x²-1)/(x-1), which is undefined at x=1, but f'(1)=1 (in fact derivatives have exactly this sort of discontinuity in them, which is why they're defined via limits). In complex analysis, this sort of “hole” in the function is called a removable singularity¹, one of three types of singularities that show up in complex functions.
⸻
1. Yes, this is mathematically the reason why black holes are referred to as singularities.
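For concreteness, here is the removable-singularity part of that example worked out briefly (the extension of f to x = 1 is added here purely for illustration):

```latex
% Away from the hole the function simplifies, so the limit exists:
\[
  f(x) = \frac{x^{2}-1}{x-1} = x + 1 \quad (x \neq 1),
  \qquad
  \lim_{x \to 1} f(x) = 2.
\]
% Setting f(1) := 2 removes the singularity; the extended function is
% just x + 1, differentiable everywhere with derivative 1.
```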
I think you can get a generalisation of autodiff using this idea of "nonstandard real numbers": You just need a computable field with infinitesimals in it. The Levi-Civita field looks especially convenient because it's real-closed. You might be able to get an auto-limit algorithm from it by evaluating a program infinitely close to a limit. I'm not sure if there's a problem with numerical stability when something like division by infinitesimals gets done. Does this have something to do with how Mathematica and other CASes take limits of algebraic expressions?
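A minimal sketch of the simplest version of this idea, using dual numbers (a + b·ε with ε² = 0) rather than the full Levi-Civita field; the `Dual`, `exp`, and `derivative` names here are illustrative, not from any existing library:

```python
# Forward-mode autodiff via dual numbers: evaluate f at x0 + eps and the
# coefficient of eps is the derivative, because eps^2 = 0.
from dataclasses import dataclass
import math

@dataclass
class Dual:
    re: float   # standard part
    eps: float  # infinitesimal coefficient (carries the derivative)

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.re + other.re, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.re * other.re,
                    self.re * other.eps + self.eps * other.re)

    __rmul__ = __mul__

def exp(x: Dual) -> Dual:
    return Dual(math.exp(x.re), math.exp(x.re) * x.eps)

def derivative(f, x0: float) -> float:
    """Evaluate f at x0 + eps and read off the eps coefficient."""
    return f(Dual(x0, 1.0)).eps

# d/dx [x * exp(x)] at x = 1 should be 2e ≈ 5.4366
print(derivative(lambda x: x * exp(x), 1.0))
```

The Levi-Civita field would additionally allow dividing by ε and using infinite elements, which is presumably what an "auto-limit" evaluated infinitely close to a point would need.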
-----
Concerning the Dirac delta example: I think this is probably a pleasant way of using a sequence of better and better approximations to the Dirac delta. Terry Tao has some nice blog posts where he shows that a lot of NSA can be translated into sequences, either in a high-powered way using ultrafilters, or in an elementary way using passage to convergent subsequences where necessary.
An interesting question is: What does distribution theory really accomplish? Why is it useful? I have an idea myself but I think it's an interesting question.
I've personally always thought of the Dirac delta function as being the limit of a Gaussian with variance approaching 0. From this perspective, the Heaviside step function is a limit of the error function. I feel the error function and logistic function approaches should be equivalent, though I haven't worked through the math to show it rigorously.
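For reference, the limits being described (stated loosely; the delta limit holds in the distributional sense):

```latex
% Dirac delta as a limit of narrowing Gaussians:
\[
  \delta(x) = \lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}
\]
% Integrating gives the step approximations; both families tend to the
% Heaviside function H pointwise for x \neq 0:
\[
  H(x) = \lim_{\sigma \to 0} \tfrac{1}{2}\!\left(1 + \operatorname{erf}\!\left(\tfrac{x}{\sigma\sqrt{2}}\right)\right)
       = \lim_{k \to \infty} \frac{1}{1 + e^{-kx}}
\]
```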
Hm. Back when I was working on game physics engines this might have been useful.
In impulse/constraint mechanics, when two objects collide, their momentum changes in zero time. An impulse is an infinite force applied over zero time with a finite momentum transfer. You have to integrate over that to get the new velocity. This is done as a special case. It is messy for multi-body collisions, and is hard to make work with a friction model. This is why large objects in video games bounce like small ones, changing direction in zero time.
I wonder if nonstandard analysis might help.
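A hedged sketch of that "special case" in one dimension, with illustrative names (not from any particular engine): the impulse is the finite momentum delivered in zero time, and the velocities jump accordingly.

```python
# Resolve a 1-D head-on collision with an instantaneous impulse instead of
# integrating a contact force over a finite time.
def resolve_collision_1d(m1, v1, m2, v2, restitution):
    """Return post-collision velocities for two bodies colliding head-on."""
    v_rel = v1 - v2                      # relative approach speed
    if v_rel <= 0:
        return v1, v2                    # already separating (or resting)
    # Impulse magnitude: momentum delivered "in zero time".
    j = -(1 + restitution) * v_rel / (1 / m1 + 1 / m2)
    return v1 + j / m1, v2 - j / m2

# A 10 kg body at 3 m/s hits a 1 kg body at rest, with restitution 0.5.
print(resolve_collision_1d(10.0, 3.0, 1.0, 0.0, 0.5))
```

The messiness described above shows up as soon as several such impulses have to be solved simultaneously, or combined with Coulomb friction.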
Wow, it never occurred to me that the step function and the Dirac delta are related in this way! But now that I see it, it's obvious!
I've never learnt this level of maths formally, but it's been an interest of mine on and off. This post explained it very well, and pretty understandably for the layman.
> The Number of Pieces an Integral is Cut Into
> You’re probably familiar with the idea that each piece has infinitesimal width, but what about the question of ‘how MANY pieces are there?’. The answer to that is a hypernatural number. Let’s call it N again.
Is that right? I thought there was an important theorem specifying that no matter the infinitesimal width of an integral slice, the total area will be in the neighborhood of (= infinitely close to) the same real number, which is the value of the integral. That's why we don't have to specify the value of dx when integrating over dx... right?
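For what it's worth, the two pictures fit together: the width and the count determine each other. A sketch of the usual nonstandard statement, with N an infinite hypernatural:

```latex
% Nonstandard Riemann sum: fix an infinite hypernatural N, so that
% dx = (b - a)/N is infinitesimal, and take the standard part:
\[
  \int_{a}^{b} f(x)\,dx
  \;=\;
  \operatorname{st}\!\left( \sum_{k=0}^{N-1} f\!\left(a + k\,\tfrac{b-a}{N}\right) \tfrac{b-a}{N} \right).
\]
% For (Riemann) integrable f the standard part is the same real number for
% every infinite N, which is why the particular dx never has to be specified.
```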
It is an interesting piece, but to claim that no heavy machinery is used is a bit disingenuous at best. You have defined some purely algebraic operation, "differentiation". This operation involves a choice of infinitesimal. Is it trivial to show that the definition is independent of that choice, especially if we are differentiating at a hyperreal point? I doubt it, and likely you would need to do more complicated set-theoretic limits rather than analytic limits. How do you calculate the integral of this function? Or even define it? Or rather of these functions, since it's an infinite family of logistic functions? To even properly define this space you need to go quite heavily into set theory, and I doubt many would find it simpler, even compared to working with distributions.
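For reference, at a standard real point the independence in question does hold, and is just classical differentiability in disguise:

```latex
% At a standard real point x, independence from the choice of infinitesimal
% is exactly classical differentiability:
\[
  f'(x) \;=\; \operatorname{st}\!\left( \frac{f(x + \varepsilon) - f(x)}{\varepsilon} \right)
  \quad \text{for every infinitesimal } \varepsilon \neq 0 .
\]
% At a hyperreal point no such equivalence comes for free, and neither does
% a definition of the integral of the resulting internal functions.
```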
Related to the hyperreal numbers mentioned in the article is the class of surreal numbers, which have many fun properties. There's a nice book describing them, authored by Don Knuth.
>We’ll use the hyperreal numbers from the unsexily named field of nonstandard analysis
There it is.
I really appreciated this piece. Thank you to OP for writing and submitting it.
The thing that piqued my interest was the side remark that the Dirac delta is a “distribution”, and that this is an unfortunate name clash with the concept of the same name in probability (measure theory).
My training (in EE) used both Dirac delta “functions” (in signal processing) and distributions in the sense of measure theory (in estimation theory). Really two separate forks of coursework.
I had always thought that the use of delta functions in convolution integrals (signal processing) was ultimately justified by measure theory — the same machinery as I learned (with some effort) when I took measure theoretic probability.
But, as flagged by the OP, that is not the case! Mind blown.
Some of this is the result of the way these concepts are taught. There is some hand waving both in signal processing, and in estimation theory, when these difficult functions and integrals come up.
I’m not aware of signal processing courses (probably graduate level) in which convolution against delta “functions” uses the distribution concept. There are indeed words to the effect of either,
- Dirac delta is not a function, but think of it as a limit of increasingly-concentrated Gaussians;
- use of Dirac delta is ok, because we don’t need to represent it directly, only the result of an inner product against a smooth function (i.e., a convolution)
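Concretely, the second justification is usually backed by the sifting identity, which in the distributional reading is taken as the definition (a sketch, not a rigorous argument):

```latex
% The "inner product against a smooth function" reading: in distribution
% theory, \delta is defined by its pairing with a test function \varphi,
\[
  \langle \delta, \varphi \rangle = \varphi(0),
\]
% which in signal-processing notation is the familiar sifting property
\[
  \int_{-\infty}^{\infty} \delta(\tau)\, f(t - \tau)\, d\tau = f(t),
\]
% i.e. convolution with \delta is the identity operator.
```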
But these excuses are not rigorously justified, even at the graduate level, in my experience.
*
Separately from that, I wonder if OP has ever seen the book Radically Elementary Probability Theory, by Edward Nelson (https://web.math.princeton.edu/~nelson/books/rept.pdf). It uses nonstandard analysis to get around a lot of the (elegant) fussiness of measure theory.
The preface alone is fun to read.