Energy forecasting plays a vital role in mitigating challenges in data-rich smart grid (SG) systems, supporting applications such as demand-side management, load shedding, and optimal dispatch. Given the uncertainty in SG data, forecasting efficiently while keeping the prediction error as low as possible is one of the main challenges facing the grid today. This paper presents a comprehensive, application-oriented review of state-of-the-art forecasting methods for SG systems, along with recent developments in probabilistic deep learning (PDL). Traditional point forecasting methods, including statistical, machine learning (ML), and deep learning (DL) methods, are extensively investigated in terms of their applicability to energy forecasting. In addition, the significance of hybrid and data pre-processing techniques in supporting forecasting performance is studied. A comparative case study using the Victorian (Australia) electricity consumption and American Electric Power (AEP) datasets is conducted to analyze the performance of deterministic and probabilistic forecasting methods. The analysis demonstrates higher efficacy of DL methods with appropriate hyperparameter tuning when sample sizes are larger and the data involve nonlinear patterns. Furthermore, PDL methods are found to achieve at least 60% lower prediction errors than other benchmark DL methods. However, the execution time of PDL methods increases significantly due to the large sample space, so a tradeoff between computational performance and forecasting accuracy needs to be maintained.