Confidence Maps

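The individual confidence map for person k and body part j is defined as:

$$S^*_{j,k}(p) = \exp\left(-\frac{\lVert p - x_{j,k} \rVert_2^2}{\sigma^2}\right)$$

where p is a pixel location and $x_{j,k}$ is the ground-truth position of body part j of person k. This is a Gaussian peak that decays gradually away from $x_{j,k}$, and σ controls the spread of the peak. The ground-truth confidence map that the network learns to predict aggregates the individual confidence maps with a max operator:

$$S^*_j(p) = \max_k S^*_{j,k}(p)$$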
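
The Gaussian peak and the max aggregation described above can be sketched in NumPy. This is a minimal illustration, assuming keypoints are given as (x, y) pixel coordinates on the same grid as the map; the function name `confidence_map` and the σ value are hypothetical choices, not part of the OpenPose code base.

```python
import numpy as np


def confidence_map(keypoints, height, width, sigma):
    """Ground-truth confidence map S*_j for one body part j (illustrative sketch).

    keypoints: list of (x, y) positions x_{j,k}, one per person k.
    Returns an array of shape (height, width) with values in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) index of every pixel p
    combined = np.zeros((height, width))
    for x0, y0 in keypoints:
        sq_dist = (xs - x0) ** 2 + (ys - y0) ** 2   # ||p - x_{j,k}||_2^2
        individual = np.exp(-sq_dist / sigma ** 2)  # S*_{j,k}(p)
        combined = np.maximum(combined, individual) # max over persons k
    return combined


# Two people with the same body part at (3, 4) and (12, 10) on a 16 x 20 grid.
smap = confidence_map([(3, 4), (12, 10)], height=16, width=20, sigma=2.0)
print(smap[4, 3], smap[10, 12])  # both peaks keep the value 1.0
```

Taking the maximum rather than summing the individual maps keeps two nearby peaks distinct instead of letting them merge into one broader peak.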

OpenPose: Human Pose Estimation Method

OpenPose is the first real-time multi-person system to jointly detect human body, hand, facial, and foot keypoints (135 keypoints in total) on single images. It was proposed by researchers at Carnegie Mellon University and has been released as Python code, a C++ implementation, and a Unity plugin, all of which can be downloaded from the OpenPose repository.

Architecture: 

  • In the first step, the image is passed through a baseline CNN to extract feature maps of the input. In the paper, the authors used the first 10 layers of VGG-19.
  • The feature maps are then processed by a multi-stage CNN pipeline to generate the Part Confidence Maps and Part Affinity Fields:
    • Part Confidence Maps
    • Part Affinity Fields
  • In the last step, the Confidence Maps and Part Affinity Fields generated above are processed by a greedy bipartite matching algorithm to obtain the pose of each person in the image.
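
The last step can be sketched for a single limb type. Each candidate pair of parts is scored by approximating the PAF line integral from the OpenPose paper (sampling the field along the segment and projecting it onto the segment's unit direction), and pairs are then accepted greedily in order of decreasing score so that each candidate is used at most once. This is a simplified sketch under stated assumptions: `paf_score`, `greedy_match`, and `num_samples=10` are hypothetical, candidate positions are assumed to lie inside the map, and the original implementation's sampling, thresholds, and assembly of full skeletons across limb types may differ.

```python
import numpy as np


def paf_score(paf, start, end, num_samples=10):
    """Approximate the PAF line integral along the segment start -> end.

    paf: array of shape (H, W, 2) with the (x, y) field of one limb type.
    start, end: candidate part positions (x, y) in pixel coordinates.
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    direction = end - start
    length = np.linalg.norm(direction)
    if length == 0:
        return 0.0
    unit = direction / length
    total = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = start + u * direction
        # Nearest-pixel lookup; the field is indexed [row = y, column = x].
        vec = paf[int(round(y)), int(round(x))]
        total += float(np.dot(vec, unit))
    return total / num_samples


def greedy_match(paf, candidates_a, candidates_b):
    """Greedy bipartite matching between candidates of two connected parts.

    Returns a list of (index_a, index_b, score) tuples.
    """
    scored = []
    for i, a in enumerate(candidates_a):
        for k, b in enumerate(candidates_b):
            score = paf_score(paf, a, b)
            if score > 0:  # skip pairs whose direction disagrees with the field
                scored.append((score, i, k))
    scored.sort(reverse=True)  # highest-scoring connections first
    used_a, used_b, matches = set(), set(), []
    for score, i, k in scored:
        if i not in used_a and k not in used_b:
            matches.append((i, k, score))
            used_a.add(i)
            used_b.add(k)
    return matches


# Toy 20 x 20 field with two vertical limbs pointing in +y.
paf = np.zeros((20, 20, 2))
paf[2:11, 2] = (0.0, 1.0)   # person 1: (2, 2) -> (2, 10)
paf[2:11, 15] = (0.0, 1.0)  # person 2: (15, 2) -> (15, 10)
shoulders = [(15, 2), (2, 2)]
elbows = [(2, 10), (15, 10)]
matches = greedy_match(paf, shoulders, elbows)
print(matches)  # each shoulder pairs with the elbow on its own vertical limb
```

The cross pairs still receive small positive scores from their endpoints, but they lose to the aligned pairs because the greedy pass consumes the strongest connections first.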

Confidence Maps and Part Affinity Fields

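  • Confidence Maps: A confidence map is a 2D representation of the belief that a particular body part is located at any given pixel. The set of confidence maps is described by the following equation:

    $$S = (S_1, S_2, \dots, S_J), \quad S_j \in \mathbb{R}^{w \times h}, \quad j \in \{1, \dots, J\}$$

    where J is the number of body part locations and w × h is the size of each map.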

  • Part Affinity Fields: A Part Affinity Field (PAF) is a set of 2D vector fields that encode the location and orientation of the limbs of different people in the image. It encodes this information as pairwise connections between body parts.
  • Multi-Stage CNN:

    The multi-stage CNN architecture has three major steps:

    • The first set of stages predicts and refines the Part Affinity Fields L from the feature maps F of the base network.
    • The second set of stages uses the Part Affinity Fields output by the previous stages to refine the confidence map predictions S.
    • The final S (confidence maps) and L (Part Affinity Fields) are then passed to the greedy matching algorithm for further processing.

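    Loss functions:

    An L2 loss is computed between the predicted confidence maps and Part Affinity Fields and the ground-truth maps and fields. At PAF stage $t_i$ and confidence-map stage $t_k$, the loss functions are:

    $$f_{L}^{t_i} = \sum_{c=1}^{C} \sum_{p} W(p) \cdot \lVert L_c^{t_i}(p) - L_c^{*}(p) \rVert_2^2$$

    $$f_{S}^{t_k} = \sum_{j=1}^{J} \sum_{p} W(p) \cdot \lVert S_j^{t_k}(p) - S_j^{*}(p) \rVert_2^2$$

    where $L_c^*$ is the ground-truth part affinity field for limb c (out of C limbs), $S_j^*$ is the ground-truth part confidence map, and W is a binary mask with W(p) = 0 when the annotation is missing at pixel p. The mask prevents the network from being penalized for correct predictions at unannotated pixels.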


    Intermediate supervision at each stage addresses the vanishing gradient problem by replenishing the gradient periodically.
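
The masked L2 loss and the stage-by-stage summation described above can be sketched in NumPy. This is a minimal illustration, assuming predictions and ground truth are stored channel-first with shape (channels, H, W), where each PAF contributes two channels (its x and y components) so that summing squared errors over channels gives the squared vector norm. The names `masked_l2` and `total_loss` are hypothetical; an actual training loop would compute this in a deep-learning framework so that gradients can flow.

```python
import numpy as np


def masked_l2(pred, target, mask):
    """sum_p W(p) * ||pred(p) - target(p)||_2^2 over all channels.

    pred, target: arrays of shape (C, H, W).
    mask: binary array of shape (H, W); 0 where the annotation is missing.
    """
    sq_err = (pred - target) ** 2                   # per-channel squared error
    return float(np.sum(mask[None, :, :] * sq_err)) # masked pixels contribute 0


def total_loss(paf_stages, cmap_stages, paf_gt, cmap_gt, mask):
    """Intermediate supervision: add the loss of every stage's prediction."""
    loss = 0.0
    for paf_pred in paf_stages:    # PAF stages
        loss += masked_l2(paf_pred, paf_gt, mask)
    for cmap_pred in cmap_stages:  # confidence-map stages
        loss += masked_l2(cmap_pred, cmap_gt, mask)
    return loss


cmap_gt = np.zeros((2, 4, 4))
cmap_gt[0, 1, 1] = 1.0
paf_gt = np.zeros((4, 4, 4))      # 2 limbs x (x, y) components
mask = np.ones((4, 4))
mask[3, 3] = 0                    # pixel (3, 3) is unannotated
cmap_pred = cmap_gt.copy()
cmap_pred[0, 1, 1] = 0.5          # error at an annotated pixel: counted
cmap_pred[1, 3, 3] = 5.0          # error at the masked pixel: ignored
loss = total_loss([paf_gt, paf_gt], [cmap_pred], paf_gt, cmap_gt, mask)
print(loss)  # → 0.25
```

Only the 0.5 error at the annotated pixel contributes; the large error at the masked pixel is ignored, which is exactly what W(p) = 0 is meant to achieve.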
