Application Research of Short-Time Fourier Transform in Music Generation Based on the Parallel WaveGan System

Jun Min, Zhiwei Gao*, Lei Wang, Aihua Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Despite the widespread use of Fourier transform (FT) networks and generative adversarial networks (GANs) in audio signal processing, their practical effectiveness in unsupervised offline systems has not yet reached a fully satisfactory level. Drawing on experience accumulated in recent years, this article shows how to construct an optimized, efficient music generation system. In the proposed system, the short-time Fourier transform (STFT) is employed to divide a long music signal into equally sized short melodic segments. Each short melodic segment undergoes an FT, and a nonautoregressive Parallel WaveGAN system is trained by jointly optimizing multiresolution spectrogram and adversarial loss functions. This approach effectively captures the time–frequency distribution of real music waveforms. In essence, the proposed music generation system is a self-feedback unsupervised model that relies on specific melody and note model pruning techniques. To further refine the music evaluation mechanism, subjective evaluation is incorporated in addition to data analysis of the output melodies.
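As a rough illustration of the segmentation and multiresolution spectrogram objective described in the abstract, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the loss form commonly used with Parallel WaveGAN (spectral convergence plus log-magnitude L1, averaged over several STFT resolutions). The function names `stft_magnitude` and `multires_stft_loss`, the Hann window, and the frame/hop sizes are illustrative assumptions; the adversarial loss term is omitted.

```python
import numpy as np


def stft_magnitude(signal, frame_len, hop):
    """Split a signal into equal-length overlapping frames, window each, and FFT it."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # assumed window; the article does not specify one
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One row per short segment, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


def multires_stft_loss(real, fake,
                       resolutions=((512, 128), (1024, 256), (2048, 512)),
                       eps=1e-7):
    """Average of spectral-convergence and log-magnitude losses over resolutions.

    The (frame_len, hop) pairs are hypothetical values chosen for illustration.
    """
    total = 0.0
    for frame_len, hop in resolutions:
        mag_real = stft_magnitude(real, frame_len, hop)
        mag_fake = stft_magnitude(fake, frame_len, hop)
        # Spectral convergence: relative Frobenius-norm error of the magnitudes.
        sc = np.linalg.norm(mag_real - mag_fake) / (np.linalg.norm(mag_real) + eps)
        # Log-magnitude loss: mean absolute difference of log spectra.
        log_mag = np.mean(np.abs(np.log(mag_real + eps) - np.log(mag_fake + eps)))
        total += sc + log_mag
    return total / len(resolutions)
```

In a training loop, a term of this kind would be added to the generator's adversarial loss with some weighting; the weighting used in the article is not stated in the abstract.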
Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
Early online date: 15 May 2024
Publication status: E-pub ahead of print - 15 May 2024
