Abstract:
Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, this method uses an adapted VGGNet as the spatial feature learning module (SFLM) to obtain spatial features across different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to obtain multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data processing scheme is devised to cut the single-channel input into multiple channels, improving computational efficiency while preserving MER quality. Experiments show that our proposed method achieves relative improvements of 10.43% and 4.82% in R2 score for valence and arousal, respectively, over the state-of-the-art model, and also performs better on datasets of different scales and in multi-task learning. Copyright © 2022 ISCA.
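The abstract describes a pipeline of log Mel-spectrogram input, a VGG-style spatial feature learning module, SE channel attention, a temporal feature learning module, and a valence/arousal regression head. Below is a minimal PyTorch-style sketch of that pipeline; the layer counts, channel sizes, and the GRU used as the temporal module are illustrative assumptions, not details taken from the paper.

# Minimal sketch of the ADFF pipeline described in the abstract (assumed
# PyTorch implementation; sizes and the GRU temporal module are illustrative).
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention over conv feature maps."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, freq, time)
        w = x.mean(dim=(2, 3))                   # squeeze: global average pooling
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excite: reweight channels


class ADFFSketch(nn.Module):
    """Log Mel-spectrogram -> VGG-like SFLM -> SE attention -> temporal module -> valence/arousal."""
    def __init__(self, n_mels: int = 128):
        super().__init__()
        # Spatial feature learning module: adapted VGG-style conv stack (illustrative depth).
        self.sflm = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.se = SEBlock(128)
        # Temporal feature learning over the time axis (GRU is an assumption).
        self.tflm = nn.GRU(input_size=128 * (n_mels // 8), hidden_size=128,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, 2)        # regress (valence, arousal)

    def forward(self, log_mel):                  # log_mel: (batch, 1, n_mels, time)
        feat = self.se(self.sflm(log_mel))       # (batch, 128, n_mels/8, time/8)
        b, c, f, t = feat.shape
        seq = feat.permute(0, 3, 1, 2).reshape(b, t, c * f)
        out, _ = self.tflm(seq)
        return self.head(out.mean(dim=1))        # pool over time, predict V/A


if __name__ == "__main__":
    model = ADFFSketch(n_mels=128)
    x = torch.randn(2, 1, 128, 256)              # two 256-frame log Mel clips
    print(model(x).shape)                        # -> torch.Size([2, 2])

The sketch keeps the overall data flow from the abstract (spatial features, SE attention, temporal modeling, joint valence/arousal output); the paper's actual multi-level feature fusion and multi-channel input slicing are not reproduced here.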
ISSN: 2308-457X
Year: 2022
Volume: 2022-September
Page: 4152-4156
Language: English
SCOPUS Cited Count: 5
ESI Highly Cited Papers on the List: 0