Abstract: Absolute scale estimation for monocular structure from motion (SfM) remains under-explored, even though it is essential for robotic tasks and real-world interaction. Typically, physical scale cues require a calibration process, while contextual scale cues introduce geometric assumptions. In this paper, we propose a novel method that recovers the absolute scale of the scene and camera motion by combining monocular SfM with uncalibrated depth from defocus (DfD), which allows zoom and focus to be set independently for each shot. Specifically, we exploit the fact that the scene structure and the field of view (FoV) of each camera estimated by SfM are tightly coupled to the focal length and focused distance of DfD, and that the radius of the effective aperture of the lens constrains the absolute scale of the entire estimation. The effectiveness of the proposed method is verified through various experiments using a commercially available camera with a varifocal lens.
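
To illustrate why the aperture radius can fix the absolute scale, the following is a minimal sketch under a thin-lens assumption; it is not necessarily the formulation used in the paper. Under that model, a point at depth d imaged with focused distance s, lens-to-sensor distance v (in pixels, as implied by the FoV), and aperture radius r produces a blur circle of radius r · v · |1/s − 1/d| pixels. If SfM supplies d and s only up to an unknown factor k, the observed blur scales as r/k, so a known metric r determines k. The function names `blur_radius_px` and `estimate_scale` and all numeric values are hypothetical and chosen only for this example.

```python
def blur_radius_px(depth, focus_dist, sensor_dist_px, aperture_radius):
    """Thin-lens blur-circle radius in pixels for a point at `depth`.

    `depth`, `focus_dist` and `aperture_radius` share one length unit;
    `sensor_dist_px` is the lens-to-sensor distance expressed in pixels.
    """
    return aperture_radius * sensor_dist_px * abs(1.0 / focus_dist - 1.0 / depth)


def estimate_scale(depths_hat, focus_hat, sensor_dist_px, blur_px, aperture_radius_m):
    """Least-squares scale k such that metric depth = k * up-to-scale depth.

    Assumed (hypothetical) inputs: up-to-scale depths and focused distance
    from SfM, measured blur radii in pixels, and a known metric aperture radius.
    """
    # Geometry term known from SfM alone (pixels per SfM length unit).
    g = [sensor_dist_px * abs(1.0 / focus_hat - 1.0 / d) for d in depths_hat]
    num = sum(gi * gi for gi in g)
    den = sum(gi * bi for gi, bi in zip(g, blur_px))
    if den <= 0.0:
        raise ValueError("blur measurements carry no scale information")
    # Model: blur = (r / k) * g, so r / k = den / num and k = r * num / den.
    return aperture_radius_m * num / den


# Synthetic check: simulate blur with a known scale, then recover it.
k_true = 2.5                      # metres per SfM unit (made-up value)
r_m = 0.01                        # aperture radius in metres (made-up value)
v_px = 1500.0                     # lens-to-sensor distance in pixels (made-up value)
depths_hat = [0.5, 1.0, 2.0, 4.0]  # up-to-scale depths
focus_hat = 1.0                    # up-to-scale focused distance

blur_px = [blur_radius_px(k_true * d, k_true * focus_hat, v_px, r_m)
           for d in depths_hat]
k_est = estimate_scale(depths_hat, focus_hat, v_px, blur_px, r_m)
print(k_est)
```

The sketch uses noise-free synthetic blur values; with real images the blur radius would itself have to be estimated, and points lying exactly at the focused distance contribute no scale information because their blur is zero.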