Confidence Maps

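The confidence map for each person k and each body part j is defined by:

$$ S^{*}_{j,k}(p) = \exp\left( -\frac{\lVert p - x_{j,k} \rVert_2^2}{\sigma^2} \right) $$

It is a Gaussian centered on x_{j,k}, the annotated position of body part j for person k, with values that fall off gradually away from that position; σ controls the spread of the peak. The ground-truth map that the network learns to predict aggregates the individual confidence maps with a max operator:

$$ S^{*}_{j}(p) = \max_{k} S^{*}_{j,k}(p) $$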
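As a rough illustration of the definition above, the following sketch builds the ground-truth confidence map for a single body part from a list of annotated positions. It is a minimal example in plain Python, not the OpenPose implementation; the function name `part_confidence_map`, the grid size, and the sample positions are all assumptions made for the example.

```python
import math

def part_confidence_map(width, height, peaks, sigma):
    """Ground-truth confidence map for one body part (illustrative sketch).

    peaks: list of (x, y) positions of this part, one per annotated person.
    Returns a height x width nested list with values in [0, 1].
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (px, py) in peaks:
                dist_sq = (x - px) ** 2 + (y - py) ** 2
                value = math.exp(-dist_sq / sigma ** 2)
                # Aggregate over people with max, so nearby peaks stay distinct
                # instead of being blurred together by averaging.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap

# Hypothetical example: the same body part annotated for two people.
heatmap = part_confidence_map(width=12, height=8, peaks=[(3, 4), (9, 2)], sigma=1.5)
```

Each annotated position becomes a peak of height 1, and a larger `sigma` widens the region around it that still receives a high value.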

OpenPose : Human Pose Estimation Method

OpenPose is the first real-time multi-person system to jointly detect human body, hand, facial, and foot key-points (135 key-points in total) on single images. It was proposed by researchers at Carnegie Mellon University, who have released it as Python code, a C++ implementation, and a Unity plugin. These resources can be downloaded from the OpenPose repository.

Architecture:

In the first step, the image is passed through a baseline CNN to extract feature maps of the input. In the paper, the authors use the first 10 layers of the VGG-19 network.
The feature maps are then processed in a multi-stage CNN pipeline to generate:
- Part Confidence Maps
- Part Affinity Fields
In the last step, the Confidence Maps and Part Affinity Fields that are generated above are processed by a greedy bipartite matching algorithm to obtain the poses for each person in the image.
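The greedy matching in the last step can be sketched for a single limb type as follows. This is a simplified illustration under stated assumptions, not the OpenPose source: it assumes each candidate connection between two detected parts already has a score (in OpenPose this score comes from the Part Affinity Fields), and the function name `greedy_match` and the sample scores are hypothetical.

```python
def greedy_match(candidates):
    """Greedily pick non-conflicting connections for one limb type.

    candidates: list of (score, part_a_index, part_b_index) tuples.
    Returns the selected connections as (part_a_index, part_b_index, score).
    """
    used_a, used_b, connections = set(), set(), []
    # Visit the most confident candidates first.
    for score, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Each detected part may belong to at most one connection of this type.
        if a in used_a or b in used_b:
            continue
        used_a.add(a)
        used_b.add(b)
        connections.append((a, b, score))
    return connections

# Hypothetical scores between two detected necks and two detected shoulders.
candidates = [(0.9, 0, 0), (0.8, 0, 1), (0.7, 1, 1), (0.2, 1, 0)]
connections = greedy_match(candidates)
```

Here neck 0 is paired with shoulder 0 and neck 1 with shoulder 1. Repeating this per limb type and merging connections that share a part yields one skeleton per person.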

Confidence Maps and Part Affinity Fields

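Confidence Maps: A Confidence Map is a 2D representation of the belief that a particular body part is located at any given pixel. The set of Confidence Maps is described by the following equation:

$$ S = (S_1, S_2, \ldots, S_J), \quad S_j \in \mathbb{R}^{w \times h}, \quad j \in \{1, \ldots, J\} $$

where J is the number of body part locations and w × h is the size of the map.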

Part Affinity Fields: Part Affinity Fields are a set of 2D vector fields that encode the location and orientation of the limbs of the different people in the image. They represent this information as pairwise connections between body parts.
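To make the idea of a vector field that encodes limb location and orientation concrete, the sketch below evaluates a ground-truth field for one limb at one pixel. It follows the common description of Part Affinity Fields: a unit vector pointing along the limb for pixels close to it, and zero elsewhere. The function name `paf_vector`, the `limb_width` parameter, and the sample coordinates are assumptions for illustration only.

```python
import math

def paf_vector(p, joint_a, joint_b, limb_width):
    """Ground-truth affinity vector at pixel p for the limb joint_a -> joint_b."""
    dx, dy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return (0.0, 0.0)
    vx, vy = dx / length, dy / length          # unit vector along the limb
    rx, ry = p[0] - joint_a[0], p[1] - joint_a[1]
    along = vx * rx + vy * ry                  # distance along the limb
    across = abs(-vy * rx + vx * ry)           # distance perpendicular to the limb
    if 0 <= along <= length and across <= limb_width:
        return (vx, vy)
    return (0.0, 0.0)

# Hypothetical horizontal limb from (0, 0) to (10, 0).
on_limb = paf_vector((5, 1), (0, 0), (10, 0), limb_width=2)
off_limb = paf_vector((5, 5), (0, 0), (10, 0), limb_width=2)
```

A pixel near the limb receives the limb's direction, while a pixel away from it receives the zero vector, which is how the field carries both position and orientation.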

Multi Stage CNN:

The multi-stage CNN architecture has three major steps:

The first set of stages predicts and iteratively refines the Part Affinity Fields L^t from the feature maps F of the base network.

The second set of stages uses the Part Affinity Fields output by the previous stages to refine the prediction of the confidence maps.

The final S (Confidence Maps) and L (Part Affinity Fields) are then passed to the greedy algorithm for further processing.
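The data flow of the three steps above can be sketched as a simple loop. This is a schematic only: each stage is passed in as a callable, and the stand-in stages used in the example merely record what they receive. The names `run_stages` and `make_stub`, and the convention that a stage takes a list of inputs, are assumptions for illustration rather than part of OpenPose.

```python
def run_stages(features, paf_stages, conf_stages):
    """Run the PAF stages first, then the confidence-map stages."""
    pafs = None
    for stage in paf_stages:
        # The first PAF stage sees only F; later ones also see the previous PAFs.
        inputs = [features] if pafs is None else [features, pafs]
        pafs = stage(inputs)
    conf = None
    for stage in conf_stages:
        # Confidence stages see F, the final PAFs, and the previous confidence maps.
        inputs = [features, pafs] if conf is None else [features, pafs, conf]
        conf = stage(inputs)
    return conf, pafs

def make_stub(name, log):
    """Stand-in stage that records its inputs and returns its own name."""
    def stage(inputs):
        log.append((name, list(inputs)))
        return name
    return stage

log = []
conf, pafs = run_stages(
    "F",
    [make_stub("L1", log), make_stub("L2", log)],
    [make_stub("S1", log), make_stub("S2", log)],
)
```

After the call, `log` shows that every stage receives the base features again, which is what lets later stages correct earlier predictions rather than only build on them.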

Loss functions:

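An L2 loss function is used to measure the difference between the predicted Confidence Maps and Part Affinity Fields and the ground-truth maps and fields. At stage t the two losses are:

$$ f_L^{t} = \sum_{c=1}^{C} \sum_{p} W(p) \cdot \lVert L_c^{t}(p) - L_c^{*}(p) \rVert_2^2 $$

$$ f_S^{t} = \sum_{j=1}^{J} \sum_{p} W(p) \cdot \lVert S_j^{t}(p) - S_j^{*}(p) \rVert_2^2 $$

where the sums run over the C limb fields, the J body parts, and all pixels p; L_c^{*} is the ground-truth Part Affinity Field, S_j^{*} is the ground-truth part Confidence Map, and W is a binary mask with W(p) = 0 when the annotation is missing at pixel p. The mask prevents the loss from penalizing correct predictions at locations that were never annotated.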
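The effect of the binary mask can be seen in a small sketch of a masked L2 loss for a single channel. It is a plain-Python illustration, not training code; the function name `masked_l2_loss` and the sample values are assumptions made for the example. For a Part Affinity Field the same sum would also run over both vector components.

```python
def masked_l2_loss(predicted, ground_truth, mask):
    """Sum of squared errors over pixels, skipping pixels where mask is 0.

    All three arguments are height x width nested lists for one channel.
    """
    total = 0.0
    for pred_row, gt_row, mask_row in zip(predicted, ground_truth, mask):
        for pred, gt, w in zip(pred_row, gt_row, mask_row):
            total += w * (pred - gt) ** 2
    return total

# Hypothetical 2 x 2 maps; the bottom-right pixel has no annotation.
predicted    = [[0.5, 0.0], [1.0, 0.2]]
ground_truth = [[1.0, 0.0], [1.0, 0.0]]
mask         = [[1, 1], [1, 0]]
all_ones     = [[1, 1], [1, 1]]

masked = masked_l2_loss(predicted, ground_truth, mask)
unmasked = masked_l2_loss(predicted, ground_truth, all_ones)
```

With the mask applied, the prediction at the unannotated pixel contributes nothing, so `masked` is smaller than `unmasked`.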

Intermediate supervision at each stage addresses the vanishing gradient problem by replenishing the gradient periodically.

