Fast and Accurate 3D Hand Pose Estimation via Recurrent Neural Network for Capturing Hand Articulations

Yoo, Cheol-Hwan; Ji, Seowon; Shin, Yong-Goo; Kim, Seung-Wook; Ko, Sung-Jea

doi:10.1109/ACCESS.2020.3001637

Authors: Yoo, Cheol-Hwan; Ji, Seowon; Shin, Yong-Goo; Kim, Seung-Wook; Ko, Sung-Jea

Issue Date: 2020

Publisher: Institute of Electrical and Electronics Engineers Inc. (IEEE)

Keywords: Three-dimensional displays; Pose estimation; Recurrent neural networks; Feature extraction; Two dimensional displays; Logic gates; 3D hand pose estimation; recurrent neural network; hand articulations

Citation: IEEE Access, vol. 8, pp. 114010-114019

Indexed: SCIE, SCOPUS

Journal Title: IEEE Access

Volume: 8

Start Page: 114010

End Page: 114019

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/58943

DOI: 10.1109/ACCESS.2020.3001637

ISSN: 2169-3536

Abstract: 3D hand pose estimation from a single depth image plays an important role in computer vision and human-computer interaction. Although recent hand pose estimation methods based on convolutional neural networks (CNNs) have achieved notable improvements in accuracy, most of them rely on complex network structures and do not fully exploit the articulated structure of the hand. A hand is an articulated object composed of six local parts: the palm and five independent fingers. Each finger consists of sequential joints whose constrained motion forms a kinematic chain. In this paper, we propose a hierarchically-structured convolutional recurrent neural network (HCRNN) with six branches that estimate the 3D positions of the palm and the five fingers independently. The palm position is predicted via fully-connected layers, whereas the positions of the sequential joints in each finger are obtained with a recurrent neural network (RNN) that captures the spatial dependencies between adjacent joints. The output features of the palm and finger branches are then concatenated to estimate the global hand position. HCRNN takes the depth map directly as input, avoiding time-consuming conversion into 3D representations such as voxels or point clouds. Experimental results on public datasets demonstrate that HCRNN not only outperforms most 2D CNN-based methods that take depth images as input but also achieves results competitive with state-of-the-art 3D CNN-based methods, while running at 285 fps on a single GPU.
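The six-branch layout described in the abstract can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' HCRNN implementation. The six per-branch feature vectors stand in for CNN features of the depth map. The recurrent cell is a plain Elman RNN (the paper may use a gated cell). The layer sizes and the four joints per finger are hypothetical. The fused layer is assumed to regress every joint coordinate. The point is the data flow: the palm goes through fully-connected layers, each finger is unrolled along its kinematic chain so that every joint's hidden state depends on the previous joint's, and the branch features are concatenated for the global estimate.

```python
import math
import random

# Hypothetical sizes chosen for illustration only (not from the paper).
FEAT_DIM = 8            # size of each branch's input feature vector
HIDDEN = 6              # hidden size of the FC and RNN layers
JOINTS_PER_FINGER = 4   # depends on the dataset's joint annotation
NUM_FINGERS = 5

rng = random.Random(0)  # fixed seed: random, untrained weights

def make_linear(n_in, n_out):
    """Random weight matrix (n_out rows of length n_in) plus a zero bias."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(x, layer):
    """Affine map W x + b on plain Python lists."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

class PalmBranch:
    """Fully-connected layers -> palm feature and 3D palm position."""
    def __init__(self):
        self.fc = make_linear(FEAT_DIM, HIDDEN)
        self.out = make_linear(HIDDEN, 3)

    def __call__(self, feat):
        h = relu(linear(feat, self.fc))
        return h, linear(h, self.out)

class FingerBranch:
    """Elman RNN unrolled along the kinematic chain, one step per joint."""
    def __init__(self):
        self.w_x = make_linear(FEAT_DIM, HIDDEN)
        self.w_h = make_linear(HIDDEN, HIDDEN)
        self.out = make_linear(HIDDEN, 3)

    def __call__(self, feat):
        h = [0.0] * HIDDEN
        joints = []
        for _ in range(JOINTS_PER_FINGER):
            # New state mixes the branch feature with the previous joint's state.
            pre = [a + c for a, c in zip(linear(feat, self.w_x), linear(h, self.w_h))]
            h = [math.tanh(p) for p in pre]
            joints.append(linear(h, self.out))  # 3D position of this joint
        return h, joints

class HCRNNSketch:
    """Palm branch + five finger branches, fused by concatenation."""
    def __init__(self):
        self.palm = PalmBranch()
        self.fingers = [FingerBranch() for _ in range(NUM_FINGERS)]
        n_joints = 1 + NUM_FINGERS * JOINTS_PER_FINGER
        self.fuse = make_linear(HIDDEN * (1 + NUM_FINGERS), 3 * n_joints)

    def __call__(self, branch_feats):
        # branch_feats: six feature vectors, palm first, then one per finger.
        palm_h, palm_xyz = self.palm(branch_feats[0])
        finger_out = [f(x) for f, x in zip(self.fingers, branch_feats[1:])]
        # Concatenate the palm feature with each finger's final hidden state.
        concat = palm_h + [v for h, _ in finger_out for v in h]
        flat = linear(concat, self.fuse)
        global_pose = [flat[i:i + 3] for i in range(0, len(flat), 3)]
        return palm_xyz, [joints for _, joints in finger_out], global_pose

model = HCRNNSketch()
feats = [[rng.uniform(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(6)]
palm_xyz, finger_joints, global_pose = model(feats)
print(len(global_pose))  # 1 palm + 5 fingers x 4 joints = 21
```

Splitting the hand into independent branches keeps each recurrent chain short, which is one plausible reason such a design can stay lightweight. For scale, the reported 285 fps corresponds to roughly 1000 / 285 ≈ 3.5 ms per depth frame.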
