Attentional control of referential information is an important contributor to the structure of discourse. We investigated how attention and memory interact during visually situated sentence production. We manipulated speakers’ attention to the agent or the patient of a described event by means of a referential or a dot visual cue. We also manipulated whether the cue was implicit or explicit by varying its duration (70 ms vs. 700 ms). Participants used passive voice more often when their attention was directed to the patient’s location, regardless of cue duration. This effect was stronger when the cue was explicit rather than implicit, especially for passive-voice sentences. Analysis of sentence onset latencies showed a divergent pattern: Latencies were shorter (1) when the agent was cued, (2) when the cue was explicit, and (3) when the (explicit) cue was referential. Findings (1) and (2) indicate facilitated sentence planning when the cue supports a canonical (active voice) sentence frame and when speakers have more time to plan their sentences, whereas (3) suggests that sentence planning is sensitive to whether the cue is informative with regard to the cued referent. We propose that differences between production likelihoods and production latencies indicate distinct contributions from attentional focus and memorial activation to sentence planning: Although the former partly predicts syntactic choice, the latter facilitates syntactic assembly (i.e., initiating overt sentence generation).