This paper proposes a high-resolution video sensor for the 3D reconstruction of architectural models from multiple image sequences. The hybrid system unifies triangulation methods of spatial stereo with tracking methods of temporal stereo. We describe an efficient spatial image matching algorithm, which is based on trinocular image rectification and semi-global optimization. The motion of the video sensor is estimated using temporal feature tracking, and the estimated motion allows the integration of dense point clouds. First experimental results are shown for images of a real scene.