This is an interesting question. There have been some attempts to model the camera as something more than a pinhole, which could work here: https://dof-gaussian.github.io/ https://github.com/leoShen917/DoF-Gaussian
My take: at macro scale, the DoF is usually so shallow that it's hard to get a reliable track. So you'd need some way to tell that these stacked photos belong to one series, and at that point you are more or less doing focus stacking :-) I do think the alignment algorithm could be improved. Maybe the approaches I linked could be used to build a much more robust focus stacking algorithm that also corrects for 3D geometry. That would be really cool!
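To put a rough number on how shallow the DoF gets, here is a small sketch using the common close-up approximation DoF ≈ 2·N·c·(m+1)/m² (symmetric lens, pupil magnification of 1). The function names, the 0.03 mm circle of confusion, and the 30% overlap between slices are my own assumptions for illustration, not something taken from the linked projects.

```python
import math


def macro_dof_mm(f_number, magnification, coc_mm=0.03):
    """Approximate total depth of field in mm at close-up distances.

    Uses DoF ~= 2 * N * c * (m + 1) / m**2, which assumes a symmetric
    lens (pupil magnification of 1). coc_mm is an assumed circle of
    confusion; 0.03 mm is a common full-frame value.
    """
    return 2.0 * f_number * coc_mm * (magnification + 1.0) / magnification ** 2


def frames_needed(subject_depth_mm, f_number, magnification, overlap=0.3):
    """Hypothetical helper: how many focus slices cover a subject depth.

    Each slice advances by the DoF minus an assumed overlap fraction,
    so neighbouring slices share some sharp region.
    """
    step_mm = macro_dof_mm(f_number, magnification) * (1.0 - overlap)
    return math.ceil(subject_depth_mm / step_mm)


if __name__ == "__main__":
    for m in (0.5, 1.0, 2.0):
        dof = macro_dof_mm(f_number=8, magnification=m)
        n = frames_needed(subject_depth_mm=10, f_number=8, magnification=m)
        print(f"m={m}: DoF ~ {dof:.2f} mm, ~{n} frames for a 10 mm deep subject")
```

At f/8 and 1:1 this comes out just under a millimetre, so most of each frame is out of focus and there is little sharp, overlapping detail for feature matching to lock onto.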