This study examines the involvement of spatial and visual working memory (WM) in the construction of flexible spatial models derived from survey and route descriptions. Sixty young adults listened to environment descriptions, 30 from a survey perspective and the other 30 from a route perspective, while they performed spatial (spatial tapping [ST]) and visual (dynamic visual noise [DVN]) secondary tasks – believed to overload the spatial and visual working memory (WM) components, respectively – or no secondary task (control, C). Their mental representations of the environment were tested by free recall and a verification test with both route and survey statements. Results showed that, for both recall tasks, accuracy was worse in the ST than in the C or DVN conditions. In the verification test, the effect of both ST and DVN was a decreasing accuracy for sentences testing spatial relations from the opposite perspective to the one learnt than if the perspective was the same; only ST had a stronger interference effect than the C condition for sentences from the opposite perspective from the one learnt. Overall, these findings indicate that both visual and spatial WM, and especially the latter, are involved in the construction of perspective-flexible spatial models.