Detect body gestures to control RPG game using Mediapipe

If you have ever played Nintendo Switch games, you are likely familiar with Ring Fit Adventure, an exercise-based video game that encourages physical activity, or with Mario Tennis and its swing mode, which lets players swing the Joy-Con like a tennis racket. Combining physical activity with fun gameplay makes these games an excellent choice for anyone seeking an entertaining way to stay fit.

This post shows how to build an app that converts your body gestures into real-time game controls. It can't yet replace every function of a keyboard or controller, but I plan to improve it in future versions.

You can view the demo video here. In it, I use the app to play The Legend of Zelda: Breath of the Wild.

https://www.youtube.com/watch?v=nMx1VlgjfBw

The app also has a steering-wheel mode, which you can use to control a racing game. You can view the demo video here.

https://www.youtube.com/watch?v=gAEEKOdsAxs

Here is the full code of this project.

https://github.com/ngviethoang/body-gesture-to-keyboard-control

The rest of this post explains how the project works.

Outline

First, let's break this down into smaller problems. It will be easier to comprehend and address this issue.

Detect human pose from the camera: We will apply the Mediapipe pose solution here to get the pose in real-time
Detect pre-defined body gestures: Define body gestures from pose estimation output to detect when the conditions for the gesture are met.
Trigger keyboard events corresponding to each gesture.

We will attempt to solve each problem and then incorporate them into the app.

Pose detection using Mediapipe

In this step, I will use PySide6 (the official Qt bindings for Python) to build the app and add the mediapipe solutions to it.

First, install the packages that we need to use in this project, like PySide6, mediapipe, opencv, numpy.

pip install PySide6 opencv-python mediapipe numpy

Next, create a file to run the app.

import sys
from PySide6.QtCore import Qt, Slot
from PySide6.QtGui import QImage, QPixmap
from PySide6.QtWidgets import (
    QApplication,
    QComboBox,
    QHBoxLayout,
    QLabel,
    QMainWindow,
    QCheckBox,
    QVBoxLayout,
    QWidget,
    QFormLayout,
    QSlider,
    QPushButton,
)

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        # Title and dimensions
        self.setWindowTitle("Pose Detection")
        self.setGeometry(100, 100, 900, 650)

if __name__ == "__main__":
    app = QApplication()
    w = Window()
    w.show()
    sys.exit(app.exec())

To run the camera and extract images from it, we need to run an independent thread. This thread will read the image output from the camera. We can then use Mediapipe to process and display this image.

Create a class that inherits from PySide6's QThread. This class should be used to run the pose detection process.

from PySide6.QtCore import QThread

class Cv2Thread(QThread):
    def __init__(self, parent=None):
        super().__init__(parent)

    def run(self):
        pass

In this run() function, we will read an image from the camera and run MediaPipe's pose estimation.

You can see the mediapipe solution for Python here.

Here is the code to run the pose estimation. It also draws the landmarks for the detected pose.

The results variable is the output of the pose detection. It contains the body's landmarks, which we can use to detect gestures.

You can read the comment for more information.

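# Module-level imports and MediaPipe helpers used by run()
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# Method of Cv2Thread
def run(self):
    self.cap = cv2.VideoCapture(0)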
    with mp_pose.Pose() as pose:
        while self.cap.isOpened():
            success, image = self.cap.read()
            if not success:
                print("Ignoring empty camera frame.")
                # If loading a video, use 'break' instead of 'continue'.
                continue

            # Recolor image to RGB
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            # To improve performance, optionally mark the image as not
            # writeable so it is passed by reference.
            image.flags.writeable = False

            # Make detection
            results = pose.process(image)

            # Recolor back to BGR
            image.flags.writeable = True
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

            # Draw landmark annotation on the image.
            mp_drawing.draw_landmarks(
                image,
                results.pose_landmarks,
                mp_pose.POSE_CONNECTIONS,
                landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style(),
            )

            # Reading the image in RGB to display it
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            # Creating and scaling QImage
            h, w, ch = image.shape
            image = QImage(image.data, w, h, ch * w, QImage.Format_RGB888)
            image = image.scaled(640, 480, Qt.KeepAspectRatio)

            if cv2.waitKey(5) & 0xFF == 27:
                break

    sys.exit(-1)

The processing now runs on the thread, but the app still needs to display the processed image. In Qt, a Signal lets the thread send data to the main window.

from PySide6.QtCore import QThread, Signal

class Cv2Thread(QThread):
    update_frame = Signal(QImage)

In the run function, add the emit function.

def run(self):
    self.cap = cv2.VideoCapture(0)
    with mp_pose.Pose() as pose:
        while self.cap.isOpened():
            success, image = self.cap.read()

            ...

            # Creating and scaling QImage
            h, w, ch = image.shape
            image = QImage(image.data, w, h, ch * w, QImage.Format_RGB888)
            image = image.scaled(640, 480, Qt.KeepAspectRatio)

            # Emit signal
            self.update_frame.emit(image)

            if cv2.waitKey(5) & 0xFF == 27:
                break

    sys.exit(-1)

Back in the Window class, update the __init__ function to create the thread and receive the image data.

After receiving the image, we set it to the label and show it in the app.

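class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        # Title and dimensions
        self.setWindowTitle("Pose Detection")
        self.setGeometry(100, 100, 900, 650)

        # Label that displays the camera frames (minimal setup for this snippet)
        self.camera_label = QLabel(self)
        self.setCentralWidget(self.camera_label)

        self.cv2_thread = Cv2Thread(self)
        self.cv2_thread.finished.connect(self.close)
        # Receive data from thread
        self.cv2_thread.update_frame.connect(self.setImage)
        # Start the thread; without this call, run() is never executed
        self.cv2_thread.start()

    @Slot(QImage)
    def setImage(self, image):
        self.camera_label.setPixmap(QPixmap.fromImage(image))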

Now let’s run the app and see the result. Don’t forget to activate the virtualenv if it’s used.

python window.py

You should now see the pose landmarks drawn over the camera image.

Detect body gestures

Using the coordinates of these landmarks, we will attempt to detect body gestures and emit events when they are identified.

The results contain 33 pose landmarks, as shown in the image below. We can use these landmarks to detect gestures.

Mediapipe pose landmarks

Another approach is to train a machine learning (ML) classifier on a dataset of labeled poses. As I'm new to this field, I will skip it for now and revisit it later to improve the solution.

You can check another solution here: Pose Classification

In this post, I will use a simple method for gesture detection.

Each gesture produces different angles between the landmarks. For example, when we are standing, the angle between the hip, knee and ankle is close to 180 degrees. Similarly, when we curl an arm, the angle between the shoulder, elbow and wrist landmarks is less than 45 degrees.

So, in the first step, we will get all landmarks that we need and calculate the angles between them for further calculation.

First, create a helper function for getting the right landmark.

def get_landmark_coordinates(landmarks, landmark):
    value = landmarks[landmark.value]
    return [
        value.x,
        value.y,
        value.z,
        value.visibility,
    ]

Then, extract the landmarks from results.

import mediapipe as mp

mp_pose = mp.solutions.pose

pose_landmarks = results.pose_landmarks.landmark

# Get coordinates
left_shoulder = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.LEFT_SHOULDER
)
right_shoulder = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.RIGHT_SHOULDER
)

left_elbow = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.LEFT_ELBOW
)
right_elbow = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.RIGHT_ELBOW
)

left_wrist = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.LEFT_WRIST
)
right_wrist = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.RIGHT_WRIST
)

left_hip = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.LEFT_HIP
)
right_hip = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.RIGHT_HIP
)

left_knee = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.LEFT_KNEE
)
right_knee = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.RIGHT_KNEE
)

left_ankle = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.LEFT_ANKLE
)
right_ankle = get_landmark_coordinates(
    pose_landmarks, mp_pose.PoseLandmark.RIGHT_ANKLE
)

...
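
The extraction above repeats the same call for every landmark, and results.pose_landmarks is None whenever no person is detected in the frame. As a possible alternative, here is a minimal sketch that loops over the landmarks and returns a dictionary. The extract_coordinates name and the local PoseLandmark enum are assumptions made so the snippet runs without MediaPipe installed; in the app you would pass mp_pose.PoseLandmark members instead.

```python
from enum import IntEnum
from types import SimpleNamespace


# Stand-in for mp_pose.PoseLandmark (assumption for this sketch only).
# The indices mirror MediaPipe's 33-landmark pose model.
class PoseLandmark(IntEnum):
    LEFT_SHOULDER = 11
    RIGHT_SHOULDER = 12
    LEFT_ELBOW = 13
    RIGHT_ELBOW = 14
    LEFT_WRIST = 15
    RIGHT_WRIST = 16
    LEFT_HIP = 23
    RIGHT_HIP = 24
    LEFT_KNEE = 25
    RIGHT_KNEE = 26
    LEFT_ANKLE = 27
    RIGHT_ANKLE = 28


def extract_coordinates(results, wanted=PoseLandmark):
    """Return {"left_knee": [x, y, z, visibility], ...} or None if no pose."""
    if results.pose_landmarks is None:
        return None  # nobody in the frame, skip gesture detection
    landmarks = results.pose_landmarks.landmark
    coords = {}
    for member in wanted:
        point = landmarks[member.value]
        coords[member.name.lower()] = [point.x, point.y, point.z, point.visibility]
    return coords


# Fake detection result shaped like MediaPipe's output, for demonstration.
fake_landmarks = [
    SimpleNamespace(x=i / 100, y=0.5, z=0.0, visibility=1.0) for i in range(33)
]
fake_results = SimpleNamespace(
    pose_landmarks=SimpleNamespace(landmark=fake_landmarks)
)
print(extract_coordinates(fake_results)["left_knee"])
```

With this shape, the later angle calculations can read coords["left_knee"] and so on, and the caller can skip a frame when the function returns None.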

To calculate the angle formed at the middle point of three points in a two-dimensional coordinate system (only x and y are used), use the following example code:

import numpy as np

# calculate the angle at point b between lines ba and bc
def calculate_angle(a, b, c):
    a = np.array(a)  # First
    b = np.array(b)  # Mid
    c = np.array(c)  # End

    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
    angle = np.abs(radians * 180.0 / np.pi)

    if angle > 180.0:
        angle = 360 - angle

    return angle
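
The function above ignores depth. If a gesture depends on movement toward or away from the camera, a dot-product variant that also uses z might be worth trying. This is only a sketch: calculate_angle_3d is a hypothetical name that is not part of the project, and the z estimate may be noisier than x and y, so the thresholds would need retuning.

```python
import numpy as np


def calculate_angle_3d(a, b, c):
    """Angle in degrees at point b, between vectors b->a and b->c."""
    # Keep only x, y, z; a fourth element (visibility) is dropped if present.
    a, b, c = (np.asarray(p, dtype=float)[:3] for p in (a, b, c))
    ba = a - b
    bc = c - b
    norm = np.linalg.norm(ba) * np.linalg.norm(bc)
    if norm == 0.0:
        return 0.0  # degenerate case: two of the points coincide
    # Clip to guard against rounding slightly outside [-1, 1].
    cosine = np.clip(np.dot(ba, bc) / norm, -1.0, 1.0)
    return float(np.degrees(np.arccos(cosine)))


# Three points on a straight line, then a right angle.
print(calculate_angle_3d([0, 0, 0, 1], [0, 1, 0, 1], [0, 2, 0, 1]))
print(calculate_angle_3d([1, 0, 0, 1], [0, 0, 0, 1], [0, 1, 0, 1]))
```

Like the 2D version, the result is always between 0 and 180 degrees, so it can be swapped in without changing the comparisons that follow.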

We can now use this function to calculate the angles of the body's landmarks.

left_shoulder_angle = calculate_angle(left_elbow, left_shoulder, left_hip)
right_shoulder_angle = calculate_angle(
    right_elbow, right_shoulder, right_hip
)

left_elbow_angle = calculate_angle(left_shoulder, left_elbow, left_wrist)
right_elbow_angle = calculate_angle(
    right_shoulder, right_elbow, right_wrist
)

left_hip_angle = calculate_angle(left_shoulder, left_hip, left_knee)
right_hip_angle = calculate_angle(right_shoulder, right_hip, right_knee)

left_knee_angle = calculate_angle(left_hip, left_knee, left_ankle)
right_knee_angle = calculate_angle(right_hip, right_knee, right_ankle)

left_hip_knee_angle = calculate_angle(right_hip, left_hip, left_knee)
right_hip_knee_angle = calculate_angle(left_hip, right_hip, right_knee)

We can use these angles to monitor our body's states when we move and emit events like curling our hands or walking, if detected.

We need an object to store the body's current states, such as walking and swinging hands, so that we can predict the current gesture.

Predict body gesture

I will use the squat exercise as an example in this post.

First, create a class called LegsState to manage the state of the legs as we receive the angles above.

Set the property squat = False to keep track of the current state.

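If both the left and right knee angles are smaller than a certain threshold (I set it to 155 degrees here), we change the property squat to True and emit the event. Because the event is emitted only when squat was previously False, holding the squat does not fire it again. Otherwise, the property is set back to False.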

class LegsState:

    KNEE_UP_MAX_ANGLE = 155

    def __init__(self):
        self.squat = False

    def update(
        self,
        left_hip,
        right_hip,
        left_knee,
        right_knee,
        left_ankle,
        right_ankle,
        left_hip_angle,
        right_hip_angle,
        left_knee_angle,
        right_knee_angle,
    ):
        if (
            left_knee_angle < self.KNEE_UP_MAX_ANGLE
            and right_knee_angle < self.KNEE_UP_MAX_ANGLE
        ):
            if not self.squat:
                self.squat = True
                print("squat")
                # emit event squat here
        else:
            self.squat = False
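
To see the edge-triggered behavior in isolation, here is a stripped-down sketch that takes only the two knee angles and calls a callback when a squat starts. SquatDetector and on_squat are hypothetical names for illustration; the project's LegsState takes more arguments and tracks more than this.

```python
class SquatDetector:
    KNEE_UP_MAX_ANGLE = 155  # same threshold as in the post

    def __init__(self, on_squat):
        self.squat = False
        self.on_squat = on_squat  # called once each time a squat starts

    def update(self, left_knee_angle, right_knee_angle):
        bent = (
            left_knee_angle < self.KNEE_UP_MAX_ANGLE
            and right_knee_angle < self.KNEE_UP_MAX_ANGLE
        )
        # Fire only on the transition from standing to squatting.
        if bent and not self.squat:
            self.on_squat()
        self.squat = bent


events = []
detector = SquatDetector(lambda: events.append("squat"))

# Simulated knee angles over five frames: stand, squat, squat, stand, squat.
for angle in [175, 150, 140, 172, 150]:
    detector.update(angle, angle)

print(events)
```

The second frame in the squat (140 degrees) does not emit anything, which is what keeps a held pose from flooding the keyboard with repeated events.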

Trigger keyboard events

The next step is to create a new class that receives events such as squatting or walking and turns them into keyboard events.

First, let's create a class called CommandProcessor for pressing and releasing the correct key for each event.

In the add_command function, we pass a command_key_mappings dictionary to retrieve the key associated with a command. If no key is found, the function terminates.

A pressed key must be released after a short hold, anywhere from 100 milliseconds to a second or more. However, many events can arrive within a single second, so we need a way to avoid pressing a key that is already held down and to release the key after a certain amount of time.

We will use a timer to separately release the currently pressed key. We also need to remember the most recent pressed key to avoid duplication.

from datetime import datetime
from pynput.keyboard import Controller
from threading import Timer

class CommandProcessor:
    def __init__(self):
        self.keyboard = Controller()
        self.pressing_key = None
        self.pressing_timer = None

    def release_previous_key(self):
        if self.pressing_key:
            previous_key = self.pressing_key["key"]
            # print(f"releasing {previous_key}")
            self.keyboard.release(previous_key)
            self.pressing_key = None

    def add_command(
        self,
        command,
        keyboard_enabled: bool,
        command_key_mappings: dict,
        pressing_timer_interval: float,
    ):
        now = datetime.now()

        if keyboard_enabled:
            if command in command_key_mappings:
                key = command_key_mappings[command]
                # get current pressing key
                previous_key = None
                if self.pressing_key:
                    previous_key = self.pressing_key["key"]

                # clear old timer
                if self.pressing_timer and self.pressing_timer.is_alive():
                    # print("cancel timer")
                    self.pressing_timer.cancel()

                # new action
                if previous_key != key:
                    self.release_previous_key()
                    if key:
                        print("pressing", key)
                        self.keyboard.press(key)

                if key:
                    # create new timer
                    self.pressing_timer = Timer(
                        pressing_timer_interval,
                        self.release_previous_key,
                    )
                    self.pressing_timer.start()

                    self.pressing_key = dict(key=key, time=now)
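
The press, extend and release cycle can be tried out without sending real key presses. The sketch below is a simplified, hypothetical variant: KeyHolder and FakeKeyboard are names invented here, and the lock and generation counter are additions that the class above does not have. They are one possible way to keep a timer that fires late from releasing a key that was just pressed again.

```python
import threading
import time


class FakeKeyboard:
    """Records press/release calls instead of sending real key events."""

    def __init__(self):
        self.log = []

    def press(self, key):
        self.log.append(("press", key))

    def release(self, key):
        self.log.append(("release", key))


class KeyHolder:
    def __init__(self, keyboard, hold_seconds):
        self.keyboard = keyboard
        self.hold_seconds = hold_seconds
        self.current_key = None
        self.timer = None
        self.generation = 0  # identifies the most recent timer
        self.lock = threading.Lock()

    def trigger(self, key):
        with self.lock:
            if self.timer is not None:
                self.timer.cancel()
            if key != self.current_key:
                if self.current_key is not None:
                    self.keyboard.release(self.current_key)
                self.keyboard.press(key)
                self.current_key = key
            # Restart the countdown, so repeated triggers extend the hold.
            self.generation += 1
            self.timer = threading.Timer(
                self.hold_seconds, self._release, args=(self.generation,)
            )
            self.timer.daemon = True
            self.timer.start()

    def _release(self, generation):
        with self.lock:
            # Ignore timers that were superseded by a newer trigger.
            if generation != self.generation or self.current_key is None:
                return
            self.keyboard.release(self.current_key)
            self.current_key = None


keyboard = FakeKeyboard()
holder = KeyHolder(keyboard, hold_seconds=0.05)
holder.trigger("w")
holder.trigger("w")  # same key again: no second press, timer restarts
time.sleep(0.3)      # wait for the timer to release the key
print(keyboard.log)
```

In the real app, the FakeKeyboard would be replaced by pynput's Controller, which exposes the same press and release methods.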

Now, let's create a class called Events that receives each command and passes it to a CommandProcessor by calling its add_command method.

In the Events class, we can create several CommandProcessor instances so that keys can be held down simultaneously. Each command is routed to the appropriate processor based on its name, as shown in the code below.

class Events:
    def __init__(
        self,
        keyboard_enabled,
        cross_cmd_enabled,
        pressing_timer_interval,
        d1_pressing_timer_interval,
        d2_pressing_timer_interval,
        command_key_mappings,
    ):
        self.keyboard_enabled = keyboard_enabled
        self.cross_cmd_enabled = cross_cmd_enabled
        self.command_key_mappings = command_key_mappings
        self.pressing_timer_interval = pressing_timer_interval
        self.d1_pressing_timer_interval = d1_pressing_timer_interval
        self.d2_pressing_timer_interval = d2_pressing_timer_interval

        self.cmd_process = CommandProcessor()

        # process cmd related to direction (left, right)
        self.d1_cmd_process = CommandProcessor()  # walk
        self.d2_cmd_process = CommandProcessor()  # tilt face

    # Add command to pipeline
    def add(self, command):
        # Split command by type name
        if "walk" in command or "d1" in command:
            self.d1_cmd_process.add_command(
                command,
                self.keyboard_enabled,
                self.command_key_mappings,
                self.d1_pressing_timer_interval,
            )
        elif "face" in command or "d2" in command:
            self.d2_cmd_process.add_command(
                command,
                self.keyboard_enabled,
                self.command_key_mappings,
                self.d2_pressing_timer_interval,
            )
        else:
            self.cmd_process.add_command(
                command,
                self.keyboard_enabled,
                self.command_key_mappings,
                self.pressing_timer_interval,
            )
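
The routing in add() relies on substring matching, so the choice of command names decides which processor handles a command. Here is a small sketch of the same rule as a standalone function. The route_command name and the example command names are assumptions for illustration; the project's actual command names may differ.

```python
def route_command(command: str) -> str:
    """Return which processor a command would go to, using the same rule as add()."""
    if "walk" in command or "d1" in command:
        return "d1"  # first direction processor (walking)
    if "face" in command or "d2" in command:
        return "d2"  # second direction processor (face tilt)
    return "default"


# Hypothetical command names, used only to show the routing.
for name in ["walk_left", "face_tilt_right", "squat"]:
    print(name, "->", route_command(name))
```

Because each processor holds at most one key at a time, sending walking, face tilting and everything else to separate processors is what allows, for example, a walk key and an action key to be held together.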

The full code of my implementation can be found in the body directory of the repository linked above.

Conclusion

This project is still simple and has plenty of room for improvement. I would appreciate your opinion or any feedback you may have.

Hope you like it 👋