The “hub-and-spoke model” of semantic representation suggests that the multimodal features of objects are drawn together by an anterior temporal lobe (ATL) “hub”, while modality-specific “spokes” capture perceptual/action features. However, relatively little is known about how these components are recruited through time to support object identification. We used magnetoencephalography to measure neural oscillations within left ATL, lateral fusiform cortex (FC) and central sulcus (CS) during word-picture matching at different levels of specificity (employing superordinate vs. specific labels) for different categories (manmade vs. animal). This allowed us to determine (i) when each site was sensitive to semantic category and (ii) whether this sensitivity was modulated by task demands. In ATL, there were two phases of response. From around 100 ms post-stimulus, there were phasic bursts of low gamma activity resulting in reductions in oscillatory power, relative to a baseline period, that were modulated by both category and specificity. This was followed by more sustained power decreases across frequency bands from 250 ms onwards. In the spokes, initial power increases were not stronger for specific-level identification, while later power decreases were stronger for specific-level identification in FC for animals and in CS for manmade objects (from around 150 ms and 200 ms, respectively). These data are inconsistent with a temporal sequence in which early sensory-motor activity is followed by later retrieval in ATL. Instead, knowledge emerges from the rapid recruitment of both hub and spokes, with early specificity and category effects in the ATL hub. The balance between these components depends on semantic category and task, with visual cortex playing a greater role in the fine-grained identification of animals and motor cortex contributing to the identification of tools.