Yes, there are ways to generate a point cloud from multiple images. Some frequently used methods are:
3D Reconstruction from Multiple Images:
If the camera's motion in 6-DOF space is known, depth can be computed from changes in image intensities using standard stereo correspondence algorithms. However, the camera's motion cannot be precisely estimated from a gyroscope, accelerometer, and magnetometer alone.
You can read more about those methods here: General overview
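As a toy illustration of stereo correspondence with known camera motion, here is a pure-NumPy block-matching sketch on a synthetic rectified pair. Real systems use far more robust matchers (e.g. semi-global matching), and the focal length and baseline values below are hypothetical, chosen only for the example:

```python
import numpy as np

def block_match_disparity(left, right, max_disp=8, block=5):
    """Brute-force SAD block matching on a rectified stereo pair (toy sketch)."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.int32)
            best_cost, best_d = None, 0
            for d in range(max_disp):
                cand = right[y-half:y+half+1, x-d-half:x-d+half+1].astype(np.int32)
                cost = np.abs(patch - cand).sum()  # sum of absolute differences
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic rectified pair: the right view is the left view shifted by a
# known disparity of 4 px (stands in for two calibrated camera positions).
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(40, 60), dtype=np.uint8)
right = np.roll(left, -4, axis=1)

disp = block_match_disparity(left, right)

# Depth from disparity: Z = f * B / d, with hypothetical focal length
# f (pixels) and baseline B (metres).
f_px, baseline_m = 500.0, 0.1
depth = np.where(disp > 0, f_px * baseline_m / np.maximum(disp, 1e-6), 0.0)
```

Back-projecting each pixel's depth through the camera intrinsics then yields the 3D point cloud.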
If the 6-DOF pose is unknown, you can still extract point clouds from images using methods such as:
SLAM:
Uncertainty in position estimation can be reduced by fusing images with the motion information provided by inertial sensors. SLAM is a chicken-and-egg problem: to estimate depth you need precise motion information, and to estimate motion you need depth information. Several versions of SLAM have been implemented for mobile devices.
LSD-SLAM :
Large-Scale Direct Monocular SLAM generates a dense depth map from a continuous video feed. The method is computationally intensive and can typically only be run offline, though a similar version has been implemented for mobile devices too. You can find it here.
Bundle Adjustment (BA) :
Traditional Bundle Adjustment methods estimate the structure of the scene and the motion of the camera from multiple images using epipolar constraints and feature matching. Global optimization consumes more memory, but high-quality 3D reconstruction of the scene is possible with this approach. Multiple variants of the method are available today.
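As a minimal sketch of the geometry these methods build on, the following pure-NumPy example triangulates 3D points from two views using linear (DLT) triangulation. The calibration matrix, pose, and scene points are all synthetic values chosen for illustration; in a full pipeline the relative pose would come from feature matching and epipolar geometry, and bundle adjustment would then jointly refine all camera poses and points:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.
    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates (u, v)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X through projection matrix P to pixel coords."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical intrinsics and a small sideways camera translation.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
t = np.array([[-0.2], [0.0], [0.0]])
P2 = K @ np.hstack([np.eye(3), t])

# Synthetic scene points and their projections in both views.
X_true = np.array([[0.1, -0.2, 3.0], [0.5, 0.3, 4.0], [-0.4, 0.1, 2.5]])
X_est = np.array([triangulate_dlt(P1, P2, project(P1, X), project(P2, X))
                  for X in X_true])
```

With noise-free correspondences the triangulated points match the true scene points; bundle adjustment exists precisely because real matches are noisy and the poses themselves must be refined along with the structure.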
You can find different approaches based on the same concepts. Many of the methods above can generate a 3D point cloud offline, but generating a point cloud in real time remains a significant challenge on mobile platforms like the iPhone.
Thanks