We investigated the joint use of high-resolution WorldView-2 optical satellite images and multitemporal TerraSAR-X synthetic aperture radar (SAR) satellite images to extract building height information in high-density urban areas. The main idea of the proposed fusion approach is to exploit the complementary strengths of both data sets in building height retrieval. The approach consists of two main stages. First, initial building height estimates are extracted separately from WorldView-2 stereo images and from multitemporal SAR images. These initial results are then combined using a novel object-based fusion approach, in which the heights of the points that fall within the same building footprint are retrieved and integrated. Experiments on the Mong Kok area of Hong Kong showed that the proposed approach, using both data sets, outperforms the use of either stereo images or SAR images alone. The average absolute height retrieval error of the proposed approach is 6.53 m, compared with 9.08 m for stereo images alone and 12.24 m for SAR images alone. The proposed fusion approach is suitable for building height retrieval in urban areas where data from a single satellite sensor have limitations.
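As an illustration of the object-based idea and of the error metric reported above, the following minimal sketch groups point-wise height estimates by building footprint and computes the average absolute height error against reference heights. It is not the fusion rule used in this work: the per-footprint median is a placeholder integration step, and the function names, footprint identifiers, and input layout are assumptions made only for this example.

```python
from collections import defaultdict
from statistics import median


def fuse_heights(stereo_points, sar_points):
    """Integrate point heights per building footprint.

    Each input is an iterable of (footprint_id, height_m) pairs.
    ASSUMPTION: the median of all points in a footprint stands in for
    the fusion rule, which the abstract does not specify.
    """
    heights_by_footprint = defaultdict(list)
    for footprint_id, height_m in list(stereo_points) + list(sar_points):
        heights_by_footprint[footprint_id].append(height_m)
    return {fid: median(hs) for fid, hs in heights_by_footprint.items()}


def mean_absolute_error(estimated, reference):
    """Average absolute height error over footprints present in both dicts."""
    common = estimated.keys() & reference.keys()
    if not common:
        raise ValueError("no footprints in common")
    return sum(abs(estimated[f] - reference[f]) for f in common) / len(common)
```

Any other per-footprint integration rule, for example a weighted combination of the two sources, could be substituted in `fuse_heights` without changing how the error is evaluated.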