Abstract
Despite the widespread use of Fourier transform (FT) networks and generative adversarial networks (GANs) in audio signal processing, their practical effectiveness in unsupervised offline systems remains unsatisfactory. Drawing on substantial experience accumulated in recent years, this article shows how to construct an optimized, efficient music generation system. In the proposed system, the short-time Fourier transform (STFT) is used to divide a long music signal into equally sized short melodic segments. Each segment undergoes an FT, and a nonautoregressive Parallel WaveGAN is trained by jointly optimizing multiresolution spectrogram and adversarial losses, which effectively captures the time–frequency distribution of real music waveforms. In essence, the proposed system is a self-feedback unsupervised model that relies on melody-specific and note-model pruning techniques. To further refine the music evaluation mechanism, subjective evaluation is incorporated alongside quantitative analysis of the output melodies.
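The multiresolution spectrogram objective mentioned above can be sketched roughly as follows. This is a minimal illustration of the general Parallel WaveGAN-style loss (spectral convergence plus log-magnitude distance, averaged over several STFT resolutions); the function names, FFT sizes, and hop lengths are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def stft_mag(x, fft_size, hop):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(fft_size)
    frames = [np.fft.rfft(x[s:s + fft_size] * window)
              for s in range(0, len(x) - fft_size + 1, hop)]
    return np.abs(np.array(frames)) + 1e-7  # epsilon avoids log(0)

def multires_stft_loss(real, fake,
                       resolutions=((512, 128), (1024, 256), (2048, 512))):
    """Spectral-convergence + log-magnitude loss averaged over STFT resolutions."""
    total = 0.0
    for fft_size, hop in resolutions:
        R = stft_mag(real, fft_size, hop)
        F = stft_mag(fake, fft_size, hop)
        sc = np.linalg.norm(R - F) / np.linalg.norm(R)    # spectral convergence
        mag = np.mean(np.abs(np.log(R) - np.log(F)))      # log STFT magnitude
        total += sc + mag
    return total / len(resolutions)
```

In training, this spectrogram loss would be summed with the adversarial loss from the GAN discriminator; comparing magnitudes at several window sizes trades off time and frequency resolution, which is why identical waveforms score zero while spectrally mismatched ones do not.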
| Original language | English |
|---|---|
| Pages (from-to) | 10770-10778 |
| Number of pages | 9 |
| Journal | IEEE Transactions on Industrial Informatics |
| Volume | 20 |
| Issue number | 9 |
| Early online date | 15 May 2024 |
| DOIs | |
| Publication status | Published - 1 Sept 2024 |
Keywords
- Acoustics
- Fourier transforms
- Generative adversarial networks
- Multiple signal classification
- Music evaluation
- Signal resolution
- Spectrogram
- Time-frequency analysis
- music generation
- parallel WaveGAN
- short-time Fourier transform (STFT)
- unsupervised models