Action-Attending Graphic Neural Network

Chaolong Li*, Zhen Cui*, Wenming Zheng†, Chunyan Xu, Rongrong Ji, Jian Yang



Abstract: Motion analysis of human skeletons is crucial for human action recognition, one of the most active topics in computer vision. In this paper, we propose a fully end-to-end action-attending graphic neural network (A²GNN) for skeleton-based action recognition, in which each irregular skeleton is structured as an undirected attribute graph. To extract high-level semantic representations from skeletons, we perform local spectral graph filtering on the constructed attribute graphs, analogous to the standard image convolution operation. Because not all joints are informative for action analysis, we design an action-attending layer that detects salient action units (AUs) by adaptively weighting skeletal joints. Here the filtering responses are parameterized into a weighting function that is invariant to the order of the input nodes. To further encode continuous motion variations, the deep features learnt from skeletal graphs are gathered along consecutive temporal slices and then fed into a recurrent gated network. Finally, the spectral graph filtering, action-attending and recurrent temporal encoding are integrated and trained jointly, for robust action recognition as well as interpretability of human actions. To evaluate A²GNN, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale, challenging NTU RGB+D dataset. The experimental results demonstrate that our network achieves state-of-the-art performance.
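The spectral graph filtering step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Chebyshev-polynomial approximation of the filter on the normalized graph Laplacian (a common choice for local spectral filtering, not confirmed by this page), and the names `normalized_laplacian`, `chebyshev_filter`, the five-joint chain and the coefficients `theta` are all hypothetical.

```python
import math


def normalized_laplacian(num_nodes, edges):
    """Return L = I - D^{-1/2} A D^{-1/2} of an undirected graph as nested lists."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    deg = [sum(row) for row in adj]
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return lap


def matvec(mat, vec):
    """Multiply a nested-list matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def chebyshev_filter(lap, signal, theta):
    """Compute y = sum_k theta[k] * T_k(L - I) x.

    Assumption: the largest Laplacian eigenvalue is taken as 2, so the
    rescaled Laplacian 2L/lambda_max - I reduces to L - I.
    """
    n = len(signal)
    shifted = [[lap[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    t_prev = list(signal)                      # T_0 x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = matvec(shifted, signal)           # T_1 x = (L - I) x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k = 2 (L - I) T_{k-1} - T_{k-2}
        t_next = [2.0 * a - b for a, b in zip(matvec(shifted, t_curr), t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Toy "skeleton": five joints connected in a chain (hypothetical layout).
chain_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
lap = normalized_laplacian(5, chain_edges)
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]            # signal on joint 0 only
response = chebyshev_filter(lap, impulse, theta=[0.5, 0.3, 0.2])
print(response)
```

With three coefficients the filter has order 2, so an impulse on joint 0 can only reach joints at most two hops away. This locality is what makes the operation comparable to an image convolution with a small kernel.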


•  Propose a fully end-to-end graphic neural network framework for skeleton-based action recognition, in which we model human skeletons as attribute graphs and introduce spectral graph filtering to extract high-level skeletal features.
•  Design a weighting scheme that adaptively detects salient action units for different human actions, which not only improves recognition accuracy but also aids our cognitive understanding of human actions.
•  Achieve state-of-the-art performance on four benchmark datasets, including two large-scale, challenging datasets: LSC and NTU RGB+D.
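The adaptive joint weighting described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's action-attending layer: it assumes each joint is scored by one shared linear function of its filtering responses and that the scores are normalized with a softmax. The function `attend_joints`, the scoring vector and the feature values are hypothetical.

```python
import math


def attend_joints(features, score_weights):
    """Weight joints by a shared scoring function and pool their features.

    features: one feature vector (e.g. filtering responses) per joint.
    score_weights: scoring vector shared by all joints (assumed linear scorer).
    Returns (weights, pooled), where weights sum to 1 over the joints.
    """
    scores = [sum(w * f for w, f in zip(score_weights, feat)) for feat in features]
    peak = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(a * feat[d] for a, feat in zip(weights, features))
              for d in range(dim)]
    return weights, pooled


# Three joints with two-dimensional responses (made-up numbers).
joint_features = [[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]]
weights, pooled = attend_joints(joint_features, score_weights=[1.0, 0.5])
print(weights, pooled)
```

Because the same scorer is applied to every joint, reordering the joints only reorders the weights, and the pooled feature stays the same up to floating-point rounding. That is one simple way to obtain a weighting function that does not depend on the order of the input nodes.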


Motion Capture Database HDM05

Florence 3D Actions Dataset

Large Scale Combined RGB-D Action Dataset

NTU RGB+D Action Recognition Dataset

Visualization of Detected Action Units

TensorFlow code used for the experiments.

To be released.


   title={Action-Attending Graphic Neural Network},
   author={Li, Chaolong and Cui, Zhen and Zheng, Wenming and Xu, Chunyan and Ji, Rongrong and Yang, Jian},
   journal={IEEE Transactions on Image Processing},