Systems Engineering and Electronics ›› 2023, Vol. 45 ›› Issue (7): 2203-2210. doi: 10.12305/j.issn.1001-506X.2023.07.31

• Guidance, Navigation and Control •

基于多尺度特征的像素位姿定位优化方法

董思强, 邓年茂, 刘琰   

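  1. Beijing Institute of Control and Electronic Technology, Beijing 100038
  • Received: 2022-02-18 Online: 2023-06-30 Published: 2023-07-11
  • Corresponding author: Siqiang DONG
  • About the authors: Siqiang DONG (b. 1981), male, senior engineer, doctoral candidate; main research interests: visual navigation, deep learning, guidance and control
    Nianmao DENG (b. 1963), male, research fellow, Ph.D.; main research interests: visual navigation, guidance and control, optoelectronic technology
    Yan LIU (b. 1976), male, research fellow, M.S.; main research interests: guidance and control, visual navigation, deep learning, testing technology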

Optimization method of pixel pose location based on multi-scale features

Siqiang DONG, Nianmao DENG, Yan LIU   

  1. Beijing Institute of Control and Electronic Technology, Beijing 100038, China
  • Received: 2022-02-18 Online: 2023-06-30 Published: 2023-07-11
  • Contact: Siqiang DONG

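Abstract (translated from Chinese):

Estimating camera pose in a scene with known 3D information is an important step in fields such as autonomous driving, augmented reality, and virtual reality. Existing methods regress the camera pose directly from the input image, or compute the camera pose by regressing the 3D coordinates of pixels; the problem with these methods is that they are tightly coupled to the training scene and lack generalization ability in new environments. It is argued that a deep learning network should focus on learning robust and invariant image features. An optimization method based on multi-scale image feature alignment is therefore introduced, which uses image feature similarity as the metric and the camera pose as the optimization variable, and estimates an accurate six degree of freedom (6DOF) camera pose through end-to-end training from pixels to pose. The model parameters are separated from the scene, giving strong generalization ability for new scenes and good localization accuracy.

Keywords: multi-scale image features, pose optimization, feature alignment, generalization ability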

Abstract:

Estimating camera pose in a scene with known 3D information is an important step in fields such as autonomous driving, augmented reality, and virtual reality. Existing methods either regress the camera pose directly from the input image or compute it by regressing the 3D coordinates of pixels. However, these methods are tightly coupled to the training scene and lack generalization ability in new environments. The convolutional neural network should instead focus on learning robust and invariant visual features. Therefore, a direct alignment optimization method based on multi-scale features is proposed. It uses feature similarity as the metric and the camera pose as the optimization variable, and estimates an accurate 6 degree of freedom (6DOF) camera pose through end-to-end training from pixels to pose. The model parameters are separated from the scene, so the system has strong generalization ability for new scenes and good localization accuracy.
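
As a rough illustration of what "feature similarity as the metric and the camera pose as the optimization variable" can mean, a multi-scale feature alignment objective might be written as below. This is a sketch under stated assumptions, not the formulation from the paper. All symbols are notation introduced here for illustration: $F_q^{l}$ and $F_r^{l}$ are query and reference feature maps at scale $l$, $\mathbf{P}_i$ are known 3D scene points, $\mathbf{p}_i^{r}$ are their pixel locations in the reference image, $\pi$ is the camera projection, and $\rho$ is a robust loss.

```latex
\begin{aligned}
% Assumed notation: feature residual of 3D point i at scale l
\mathbf{r}_i^{l}(\mathbf{R},\mathbf{t})
  &= F_q^{l}\big(\pi(\mathbf{R}\,\mathbf{P}_i+\mathbf{t})\big)
   - F_r^{l}\big(\mathbf{p}_i^{r}\big),\\
% Assumed robust alignment cost at scale l
E^{l}(\mathbf{R},\mathbf{t})
  &= \sum_{i}\rho\Big(\big\lVert \mathbf{r}_i^{l}(\mathbf{R},\mathbf{t})\big\rVert_2^{2}\Big),\\
% 6DOF pose estimate at scale l
(\mathbf{R}^{l},\mathbf{t}^{l})
  &= \arg\min_{\mathbf{R}\in SO(3),\ \mathbf{t}\in\mathbb{R}^{3}} E^{l}(\mathbf{R},\mathbf{t}).
\end{aligned}
```

In a coarse-to-fine scheme of this kind, the pose obtained at a coarse scale would initialize the minimization at the next finer scale. Under this reading, only the feature extractor carries learned parameters, while the 3D points belong to the scene, which is one way to interpret the abstract's statement that the model parameters are separated from the scene.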

Key words: multi-scale visual features, pose optimization, feature alignment, generalization ability

CLC number: