Multimedia synchronization involves the temporal relationship between audio and visual media components. Presenting “in-sync” data streams is essential for a natural impression, as “out-of-sync” effects are often associated with a decrease in user quality of experience (QoE). Recently, multi-sensory media (mulsemedia) has been shown to provide a highly immersive experience for its users. Unlike traditional multimedia, mulsemedia includes additional media types (e.g., haptic, olfactory, and gustatory content) alongside audio and visual content. Achieving high-quality mulsemedia transmission therefore requires presenting little or no synchronization error between the multiple media components. Reaching this ideal synchronization demands comprehensive knowledge of the synchronization requirements at the user interface. This paper presents the results of a subjective study that explored the temporal boundaries within which haptic and air-flow media objects can be successfully synchronized with video media. Results show that certain skews between sensorial media and multimedia may still give the impression that the mulsemedia sequence is “in-sync”, and they establish constraints under which synchronization errors can be tolerated. The outcomes of the paper are used to formulate recommendations for mulsemedia service providers so that their services remain associated with acceptable user experience levels: e.g., haptic media could be presented with a delay of up to 1 s behind video content, while air-flow media could be released either up to 5 s ahead of or up to 3 s behind video content.
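The recommended tolerance windows can be sketched as a simple skew check. This is an illustrative sketch only, not an artifact of the study: the function name, media labels, and sign convention (negative skew meaning the sensorial media is released ahead of the video, positive meaning behind) are assumptions made for the example.

```python
# Illustrative sketch of the tolerance windows recommended in the abstract.
# Sign convention (assumed): negative skew = sensorial media ahead of video,
# positive skew = sensorial media behind video. All values in seconds.

TOLERANCE_S = {
    "haptic": (0.0, 1.0),    # haptic media: up to 1 s behind video
    "airflow": (-5.0, 3.0),  # air-flow media: 5 s ahead to 3 s behind video
}

def within_sync_tolerance(media: str, skew_s: float) -> bool:
    """Return True if the given skew falls inside the recommended window."""
    lo, hi = TOLERANCE_S[media]
    return lo <= skew_s <= hi
```

A mulsemedia player could run such a check per frame and, for instance, drop or delay a sensorial effect whose skew has drifted outside its window.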