For my thesis, I proposed a method for learning Visual Inertial Odometry (VIO) in an end-to-end manner using an extended Kalman filter (EKF) as part of the network architecture. VIO estimates the ego-motion of a vehicle using camera images and Inertial Measurement Units (IMU). Classical VIO methods rely heavily on manually crated image processing pipelines which are prone to failures in situations with rapid motion and texture-less scenes. Deep learning methods have shown tremendous success in address some of these limitations. However, it is still desirable to incorporate classical algorithms that have shown to work well for the specific problem domain.
Our work leverages a robo-centric EKF as part of the deep network, where EKFs has been used extensively in classical VIO pipelines. The robo-centric EKF integrates IMU measurements uses a well-understood model that is grounded in physics, and it allows us to use relative pose measurements from a deep network as EKF updates. The system is fully differentiable with the EKF, and it can be trained end-to-end in a supervised fashion. We tested the method on the KITTI self-driving car dataset and EuRoC drone dataset, and demonstrated competitive performance on KITTI.