Systems Engineering and Electronics (系统工程与电子技术), 2021, Vol. 43, Issue (12): 3462-3469. doi: 10.12305/j.issn.1001-506X.2021.12.06

• Electronic Technology •

Spectrogram extraction method for acoustic scene data based on neural network

Juan WEI1,*, Zhikai DING1, Fangli NING2

  1. School of Telecommunication Engineering, Xidian University, Xi'an 710071, China
  2. School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2020-10-06 Online: 2021-11-24 Published: 2021-11-30
  • Contact: Juan WEI
  • About the authors: Juan WEI (b. 1973), female, associate professor, Ph.D.; research interests: sound source localization and audio recognition | Zhikai DING (b. 1997), male, master's student; research interest: acoustic scene recognition | Fangli NING (b. 1974), male, professor, Ph.D.; research interest: sound source localization
  • Funding:
    National Natural Science Foundation of China (52075441); Key Research and Development Program of Shaanxi Province (2018GY-181); Key Research and Development Program of Shaanxi Province (2020ZDLGY06-09)

Abstract:

In complex acoustic scene classification (ASC) tasks, deep convolutional neural networks (DCNNs) that take the Mel spectrum as input achieve good recognition performance. However, the Mel filter bank is designed around the physiological characteristics of the human ear and is not the optimal filter bank for ASC. To address this problem, a spectrogram extraction neural network (SENN) is proposed to replace the traditional Mel-spectrum extraction process; training this network adapts the spectrogram automatically to the acoustic scene data set. The SENN is connected to ResNet50 to form the ASC architecture, which is trained and tested on the DCASE2019 acoustic scene data set. The experimental results show that this architecture achieves a higher recognition rate than traditional models and can effectively adjust the frequency curve, the filter amplitudes, and the filter shapes.

Key words: acoustic scene classification (ASC), deep convolutional neural network (DCNN), spectrogram extraction neural network (SENN), Mel-spectrum
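
For reference, the conventional Mel-spectrum extraction that the SENN is proposed to replace can be sketched in a few lines of Python. This is not the SENN from the paper: the function names, the HTK-style Mel formula, and the toy sizes (8 filters, 64-point FFT, 16 kHz sampling rate) are illustrative assumptions. The sketch builds a fixed triangular Mel filter bank and applies it to one power-spectrum frame.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style Mel scale (an assumed convention): m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(n_filters):
        lo, mid, hi = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)    # left slope of the triangle
            falling = (hi - f) / (hi - mid)   # right slope of the triangle
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def apply_filter_bank(bank, power_spectrum):
    """Filter one frame: a plain matrix-vector product, i.e. a linear layer."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]

# Toy usage: a flat power spectrum passed through 8 Mel filters
bank = mel_filter_bank(n_filters=8, n_fft=64, sample_rate=16000)
frame = [1.0] * (64 // 2 + 1)
mel_frame = apply_filter_bank(bank, frame)
```

Because the filtering step is only a matrix-vector product, the filter weights could in principle be treated as trainable parameters and updated together with the classifier. That is the general idea behind adapting the filters to the data set, though the abstract does not specify how the SENN itself is built.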

CLC Number: