Application Research of Short-Time Fourier Transform in Music Generation Based on the Parallel WaveGan System

Jun Min, Zhiwei Gao*, Lei Wang, Aihua Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Although Fourier transform (FT) networks and generative adversarial networks (GANs) are widely used in audio signal processing, their practical effectiveness in unsupervised offline systems is not yet fully satisfactory. Drawing on experience accumulated in recent years, this article shows how to construct an optimized, efficient music generation system. In the proposed system, the short-time Fourier transform (STFT) divides a long music signal into equally sized short melodic segments. Each segment undergoes FT, and a nonautoregressive parallel WaveGAN system is trained by jointly optimizing multiresolution spectrogram and adversarial loss functions. This approach effectively captures the time–frequency distribution of real music waveforms. In essence, the proposed system is a self-feedback unsupervised model that relies on specific melody and note model pruning techniques. To further refine the music evaluation mechanism, subjective evaluation is incorporated alongside data analysis of the output melodies.
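
The segmentation and spectrogram steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Hann window, the three (frame length, hop) pairs, and the exact form of the loss (spectral convergence plus log-magnitude distance, as commonly used with Parallel WaveGAN) are all assumptions made for illustration, and the adversarial term that the abstract says is optimized jointly is not shown.

```python
import numpy as np

def stft_magnitude(signal, frame_len, hop):
    """Split `signal` into equally sized, Hann-windowed frames and FFT each one."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One magnitude spectrum per frame: shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

def multires_stft_loss(real, fake,
                       resolutions=((512, 128), (1024, 256), (2048, 512)),
                       eps=1e-7):
    """Average a spectrogram distance over several STFT resolutions (assumed form)."""
    total = 0.0
    for frame_len, hop in resolutions:
        s_real = stft_magnitude(real, frame_len, hop)
        s_fake = stft_magnitude(fake, frame_len, hop)
        # Spectral convergence: relative Frobenius-norm error of the magnitudes
        sc = np.linalg.norm(s_real - s_fake) / (np.linalg.norm(s_real) + eps)
        # Mean absolute difference of the log magnitudes
        log_mag = np.mean(np.abs(np.log(s_real + eps) - np.log(s_fake + eps)))
        total += sc + log_mag
    return total / len(resolutions)

# Hypothetical usage: a 440 Hz tone versus a noisy copy of it
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
real = np.sin(2 * np.pi * 440 * t)
fake = real + 0.05 * rng.standard_normal(real.shape)
loss = multires_stft_loss(real, fake)
```

In this sketch the loss is zero when the two waveforms are identical and grows as their spectrograms diverge; using several frame lengths trades time resolution against frequency resolution, which is the motivation usually given for a multiresolution objective.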
Original language: English
Pages (from-to): 10770-10778
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
Volume: 20
Issue number: 9
Early online date: 15 May 2024
Publication status: Published - 1 Sept 2024

Keywords

  • Acoustics
  • Fourier transforms
  • Generative adversarial networks
  • Multiple signal classification
  • Music evaluation
  • Signal resolution
  • Spectrogram
  • Time-frequency analysis
  • music generation
  • parallel WaveGAN
  • short-time Fourier transform (STFT)
  • unsupervised models
