Confidence Maps
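The confidence map for each person k and each body part j is defined by:

$$S^*_{j,k}(p) = \exp\left(-\frac{\lVert p - x_{j,k} \rVert_2^2}{\sigma^2}\right)$$

where $x_{j,k}$ is the ground-truth position of body part j for person k. It is a Gaussian-shaped peak that falls off gradually, and $\sigma$ controls the spread of the peak. The ground-truth confidence map that the network is trained to predict aggregates the individual confidence maps with a max operator:

$$S^*_j(p) = \max_k S^*_{j,k}(p)$$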
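As an illustration, the sketch below builds such a map with NumPy. It assumes the Gaussian form exp(-||p - x||^2 / sigma^2) from the OpenPose paper; the function names, image size, keypoint positions, and sigma value are all hypothetical choices made for the example.

```python
import numpy as np

def person_confidence_map(height, width, keypoint_xy, sigma):
    """Gaussian confidence map for one body part of one person."""
    ys, xs = np.mgrid[0:height, 0:width]      # pixel row and column grids
    x0, y0 = keypoint_xy
    sq_dist = (xs - x0) ** 2 + (ys - y0) ** 2  # squared distance to the keypoint
    return np.exp(-sq_dist / sigma ** 2)

def part_confidence_map(height, width, keypoints_xy, sigma):
    """Combine the per-person maps of one body part with a pixel-wise max."""
    maps = [person_confidence_map(height, width, kp, sigma) for kp in keypoints_xy]
    return np.max(maps, axis=0)

# Two hypothetical people whose same body part sits at these (x, y) pixels.
conf = part_confidence_map(height=32, width=48,
                           keypoints_xy=[(10, 8), (30, 20)], sigma=3.0)
```

Taking the pixel-wise maximum rather than the average keeps the peaks of nearby people distinct, so each keypoint location still reaches a value of 1 in the combined map.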
OpenPose: Human Pose Estimation Method
OpenPose is the first real-time multi-person system to jointly detect human body, hand, facial, and foot key-points (135 key-points in total) on single images. It was proposed by researchers at Carnegie Mellon University, who have released it as Python code, a C++ implementation, and a Unity plugin. These resources can be downloaded from the OpenPose repository.
Architecture:
- In the first step, the image is passed through a baseline CNN to extract feature maps of the input. In the paper, the authors use the first 10 layers of the VGG-19 network.
- The feature maps are then processed by a multi-stage CNN pipeline to generate two outputs:
  - Part Confidence Maps
  - Part Affinity Fields
- In the last step, the Confidence Maps and Part Affinity Fields that are generated above are processed by a greedy bipartite matching algorithm to obtain the poses for each person in the image.
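The greedy matching idea in the last step can be sketched as follows. This is a simplified illustration, not the OpenPose implementation: the function name, the `min_score` threshold, and the score values are hypothetical, and in OpenPose the association scores would be derived from the Part Affinity Fields rather than given directly.

```python
def greedy_bipartite_match(scores, min_score=0.0):
    """Greedily pair candidates of two part types by descending score.

    scores[i][j] is a hypothetical association score between candidate i of
    one part (e.g. an elbow) and candidate j of another part (e.g. a wrist).
    """
    candidates = [
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
        if score > min_score
    ]
    candidates.sort(reverse=True)  # best-scoring pairs first

    used_i, used_j, matches = set(), set(), []
    for score, i, j in candidates:
        # Accept a pair only if neither candidate is already matched.
        if i not in used_i and j not in used_j:
            matches.append((i, j, score))
            used_i.add(i)
            used_j.add(j)
    return matches

scores = [
    [0.9, 0.2],
    [0.8, 0.7],
]
matches = greedy_bipartite_match(scores)
```

Here candidate 0 is paired with candidate 0 first because 0.9 is the highest score; the 0.8 pair is then skipped because its second candidate is already taken, leaving candidates 1 and 1 to be paired with score 0.7.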
Confidence Maps and Part Affinity Fields
- Confidence Maps: A confidence map is a 2D representation of the belief that a particular body part is located at each pixel. The set of confidence maps is described by the following equation:

$$S = (S_1, S_2, \ldots, S_J), \quad S_j \in \mathbb{R}^{w \times h}, \quad j \in \{1, \ldots, J\}$$

where J is the number of body part locations.
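One way to picture the set of J confidence maps is as a single array with one 2D map per body part. The sketch below assumes a (J, height, width) layout and made-up values, and reads off the most confident pixel of each map. A real multi-person pipeline would look for every local maximum in a map instead of a single argmax, so treat this as a single-person simplification.

```python
import numpy as np

J, height, width = 3, 4, 5           # hypothetical sizes
S = np.zeros((J, height, width))     # one confidence map per body part
S[0, 1, 2] = 0.9                     # made-up peaks, indexed as [part, row, col]
S[1, 3, 0] = 0.8
S[2, 0, 4] = 0.7

def strongest_location(conf_map):
    """Return the (x, y) pixel with the highest confidence in one map."""
    row, col = np.unravel_index(np.argmax(conf_map), conf_map.shape)
    return int(col), int(row)

peaks = [strongest_location(S[j]) for j in range(J)]
```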
Multi-Stage CNN:
The multi-stage CNN architecture has three major steps:
- The first set of stages predicts and refines the Part Affinity Fields L^t from the feature maps F of the base network.
- The second set of stages uses the Part Affinity Fields output by the previous stages to refine the predicted confidence maps.
- The final S (confidence maps) and L (Part Affinity Fields) are then passed to the greedy algorithm for further processing.
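The data flow through these stages can be sketched with stand-in layers. In the example below each stage is replaced by a random 1x1 convolution, so only the wiring is meaningful. The channel counts, stage counts, and helper name are hypothetical, and the concatenation pattern (each stage sees F together with earlier outputs) follows the description in the OpenPose paper and should be read as an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stage(in_channels, out_channels):
    """Stand-in for one CNN stage: a random 1x1 convolution (per-pixel linear map)."""
    weights = rng.normal(scale=0.1, size=(out_channels, in_channels))
    def stage(x):
        return np.einsum("oc,chw->ohw", weights, x)
    return stage

# Hypothetical sizes, chosen only for illustration.
feat_ch, paf_ch, conf_ch, h, w = 8, 6, 4, 5, 5
num_paf_stages, num_conf_stages = 3, 2

F = rng.normal(size=(feat_ch, h, w))  # feature maps from the base network

# PAF stages: the first sees only F, later ones see F plus the previous L.
paf_stages = [make_stage(feat_ch, paf_ch)] + [
    make_stage(feat_ch + paf_ch, paf_ch) for _ in range(num_paf_stages - 1)
]
L = paf_stages[0](F)
for stage in paf_stages[1:]:
    L = stage(np.concatenate([F, L], axis=0))

# Confidence-map stages: the first sees F and the final L,
# later ones also see the previous S.
conf_stages = [make_stage(feat_ch + paf_ch, conf_ch)] + [
    make_stage(feat_ch + paf_ch + conf_ch, conf_ch)
    for _ in range(num_conf_stages - 1)
]
S = conf_stages[0](np.concatenate([F, L], axis=0))
for stage in conf_stages[1:]:
    S = stage(np.concatenate([F, L, S], axis=0))
```

The final `L` and `S` arrays play the role of the Part Affinity Fields and confidence maps that are handed to the matching step.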
Loss functions:
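An L2 loss function is used to measure the difference between the predicted confidence maps and Part Affinity Fields and their ground-truth counterparts. At stage t:

$$f_L^{t} = \sum_{c=1}^{C} \sum_{p} W(p) \cdot \lVert L_c^{t}(p) - L_c^{*}(p) \rVert_2^2$$

$$f_S^{t} = \sum_{j=1}^{J} \sum_{p} W(p) \cdot \lVert S_j^{t}(p) - S_j^{*}(p) \rVert_2^2$$

where C is the number of Part Affinity Fields, $L_c^*$ is the ground-truth Part Affinity Field, $S_j^*$ is the ground-truth part confidence map, and $W$ is a binary mask with $W(p) = 0$ when the annotation is missing at pixel p. The mask keeps unannotated pixels from contributing to the loss, so the network is not penalized for correct predictions at locations that were never labeled.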
Intermediate supervision at each stage addresses the vanishing gradient problem by replenishing the gradient periodically.
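A minimal NumPy sketch of the masked L2 loss and of intermediate supervision is given below. It assumes predictions and targets are stored as (channels, height, width) arrays and that the overall objective is the sum of the per-stage losses; the function names and the tiny arrays are hypothetical.

```python
import numpy as np

def masked_l2_loss(pred, target, mask):
    """Sum over channels and pixels of W(p) * (pred - target)^2.

    pred, target: arrays of shape (channels, height, width)
    mask: binary array of shape (height, width), 0 where annotation is missing
    """
    return float(np.sum(mask[None, :, :] * (pred - target) ** 2))

def total_loss(stage_preds, target, mask):
    """Intermediate supervision: add up the loss of every stage's prediction."""
    return sum(masked_l2_loss(pred, target, mask) for pred in stage_preds)

target = np.zeros((1, 2, 2))
pred = np.ones((1, 2, 2))
mask = np.array([[1, 1], [1, 0]])  # bottom-right pixel has no annotation

# Three unmasked pixels, each with squared error 1.
loss_one_stage = masked_l2_loss(pred, target, mask)
# A second, closer prediction adds its own (smaller) loss term.
loss_two_stages = total_loss([pred, 0.5 * pred], target, mask)
```

Because every stage contributes its own term, gradients reach the early stages directly instead of only through the later ones.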