Abstract
In algorithmic music generation, processing time-series data remains a complex task. To improve music generation over long sequences, an insightful-unit-conditional variational autoencoder is proposed, which enhances unit-conditional variational autoencoders with an improved attention mechanism. The model integrates TransformerXL's recurrence mechanism and relative positional encoding at measure-level granularity. For practical applications, a scheme is presented that uses optical flow to extract motion features from video frames, quantifying motion rate and intensity. Furthermore, a dynamic correlation method is proposed to align video motion features with musical rhythm, guiding the model to generate melodies that match the video's rhythm.
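As a rough illustration of the video side of this pipeline, the following minimal sketch shows one way motion intensity and motion rate could be quantified once a dense optical-flow field is available for each frame pair. It is not the paper's method: the abstract does not define these quantities, so the definitions here (mean flow magnitude for intensity, intensity peaks per second for rate) are assumptions, and a plain Pearson correlation stands in for the dynamic correlation method, whose details the abstract does not give. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One flow field per frame pair: rows of per-pixel (dx, dy) displacements.
# Assumed to come from any dense optical-flow estimator, which is not shown here.
Flow = Sequence[Sequence[Tuple[float, float]]]


def motion_intensity(flow: Flow) -> float:
    """Mean flow-vector magnitude for one frame pair (assumed definition)."""
    mags = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    return sum(mags) / len(mags) if mags else 0.0


def motion_rate(intensities: Sequence[float], fps: float) -> float:
    """Local intensity peaks per second (assumed definition)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    peaks = sum(
        1
        for i in range(1, len(intensities) - 1)
        if intensities[i - 1] < intensities[i] >= intensities[i + 1]
    )
    duration = len(intensities) / fps  # seconds covered by the series
    return peaks / duration if duration > 0 else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, a simple stand-in for the alignment score."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no rhythm information
    return cov / math.sqrt(vx * vy)
```

In such a setup, `pearson` might be applied to a windowed motion-intensity series and a per-window note-onset density of a candidate melody, with a higher score suggesting a closer match between the video's motion and the musical rhythm.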
Original language | English |
---|---|
Pages (from-to) | 1409-1418 |
Number of pages | 10 |
Journal | IEEE Transactions on Industrial Informatics |
Volume | 21 |
Issue number | 2 |
Early online date | 22 Oct 2024 |
Publication status | Published - 1 Feb 2025 |
Keywords
- Transformers
- Computational modeling
- Mathematical models
- Vectors
- Standards
- Encoding
- Rhythm
- Training
- Optical variables measurement
- Decoding
- Artificial intelligence (AI)
- background music
- conditional variational autoencoders (CVAE)
- music generation
- TransformerXL feature extraction