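First, the sensors give you the magnetic field and gravity as 3D
vectors. For every picture, obtain magnetic East (which lies in the
horizontal plane) by taking the cross product of the two vectors. Then
get magnetic North by taking the cross product of magnetic East and
gravity. The operand order and the sign of the gravity vector depend
on your platform's conventions, so check the result against a known
direction. Then project the camera vector onto the horizontal plane to
get the azimuth angle (relative to magnetic North), and get the
elevation angle from the angle between the camera vector and gravity,
using the cosine formula for the dot product.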
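
The steps above can be sketched roughly as follows. This is only a
minimal illustration, assuming the gravity vector points down, all
three vectors are expressed in the same device frame, and the camera
vector is the direction the lens looks along. The function name
azimuth_elevation and the helpers are made up for this example; flip
the signs if your sensor API reports gravity pointing up.

```python
import math

def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def azimuth_elevation(gravity, magnetic, camera):
    """Return (azimuth, elevation) in degrees.

    Assumption: gravity points DOWN. Azimuth is measured clockwise
    from magnetic North; elevation is positive above the horizon.
    """
    down = normalize(gravity)
    # East is perpendicular to both gravity and the magnetic field.
    east = normalize(cross(down, magnetic))
    # North completes the horizontal basis.
    north = cross(east, down)
    cam = normalize(camera)
    # Horizontal components of the camera vector give the azimuth.
    azimuth = math.atan2(dot(cam, east), dot(cam, north))
    # The component along "up" gives the elevation (clamped for safety).
    elevation = math.asin(max(-1.0, min(1.0, -dot(cam, down))))
    return math.degrees(azimuth) % 360.0, math.degrees(elevation)
```

For example, with gravity (0, 0, 9.81) and magnetic field (20, 0, 40)
in a North-East-Down device frame, a camera vector of (0, 1, -1) points
due East and 45 degrees above the horizon.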

When the user selects multiple photos aimed at the same point, you can
estimate that point with a least squares fit, for example by finding
the point closest to all the lines of sight. Preferably the photos
should have a large angular separation (as seen from the point being
observed); otherwise the results will be inaccurate. In GPS this
effect is called dilution of precision.
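
One possible least squares formulation, sketched here in the
horizontal plane only, is to find the point that minimises the sum of
squared perpendicular distances to the lines of sight. The function
name intersect_bearings and the input layout (east, north, azimuth)
are assumptions for this example, and it treats each bearing as an
infinite line rather than a ray.

```python
import math

def intersect_bearings(observations):
    """Least squares intersection of bearing lines in a flat plane.

    observations: list of (east_m, north_m, azimuth_deg), with the
    azimuth measured clockwise from North.
    Returns the (east_m, north_m) point closest to all lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y, az in observations:
        # Unit direction of the line of sight.
        dx = math.sin(math.radians(az))
        dy = math.cos(math.radians(az))
        # I - d d^T projects onto the normal of the line.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * x + m12 * y
        b2 += m12 * x + m22 * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = a11 * a22 - a12 * a12
    if det < 1e-12:
        # Nearly parallel bearings: the geometry is too weak to solve.
        raise ValueError("bearings are (nearly) parallel")
    return ((a22 * b1 - a12 * b2) / det,
            (a11 * b2 - a12 * b1) / det)
```

The determinant shrinks as the bearings become more parallel, which is
the same poor-geometry effect described above: a small angular
separation makes the solution very sensitive to errors in the angles.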

All these terms are described on Wikipedia, but the articles focus on
linear variables, and angles are non-linear. So either look at books
or papers describing theodolite computations, or make your own
approximations and simulations. For example, it may be sufficient to
assume the Earth is flat after applying the equirectangular
projection.
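
A flat-Earth approximation of this kind might look like the sketch
below, which converts latitude and longitude to local metres around a
reference point. The function name to_local_xy and the spherical mean
radius of 6371 km are assumptions for this example, and the
approximation is only reasonable over short distances.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed spherical mean radius

def to_local_xy(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Equirectangular projection around (lat0_deg, lon0_deg).

    Returns (x, y) in metres, with x pointing East and y pointing
    North of the reference point.
    """
    lat0 = math.radians(lat0_deg)
    # Longitude differences shrink with the cosine of the latitude.
    x = EARTH_RADIUS_M * math.radians(lon_deg - lon0_deg) * math.cos(lat0)
    y = EARTH_RADIUS_M * math.radians(lat_deg - lat0_deg)
    return x, y
```

The camera positions can then be treated as ordinary plane
coordinates before fitting the observed point.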

