ESP32-S3 Edge-AI | Human Activity Recognition Using Accelerometer Data and ESP-DL

Ali Hassan Shah
Published in The ESP Journal · 5 min read · Jun 7, 2023

Edge computing is a distributed computing paradigm that brings computation and data storage closer to the device’s location. Edge artificial intelligence (edge AI) is an exciting development within edge computing because it allows traditional technologies to run more efficiently, with higher performance and lower power consumption. Trained neural networks are used to make inferences on small devices. Potential application areas of edge AI include manufacturing, healthcare, retail, surveillance, smart homes, and finance and banking.

Espressif Systems provides ESP-DL, a framework for deploying high-performance deep-learning models on the ESP32-S3.

In this article, you will learn how to read sensor data and use ESP-DL to deploy a deep-learning model on the ESP32-S3.

Contents

The article is divided into four sections:
1. Model Deployment
2. Model Definition
3. Model Run
4. Conclusion and Future Opportunities

Prerequisite for using ESP-DL

Before taking a deep dive into ESP-DL, we assume that readers have a working ESP-IDF development environment and a model already trained and converted to the ESP-DL format.

Note: Please use ESP-IDF release/v4.4 at commit cc71308e2fdce1d6d27fc52d39943f5d0fb83f35 to reproduce the same results.

1. Model Deployment

A convolutional neural network trained on accelerometer data is used to recognize human activity.

*In this blog we will not focus on the development of the neural network or its conversion to the ESP-DL format.

1.1 ESP-IDF Project Hierarchy

  1. The first step is to create a new project in VS Code based on ESP-IDF standards. For more details about how to create a VS Code project for ESP32, please click here or see the ESP-IDF getting started guide.
  2. Place the .cpp and .hpp files generated by converting the model into ESP-DL format in your current working directory.
  3. Add all the dependent components to the components folder of your working directory.
  4. The sdkconfig files are the default configuration files from the ESP-WHO examples. These files are also provided in the linked GitHub repository.

The project directory should look like this:

├── CMakeLists.txt
├── components
│ ├── bus
│ ├── mpu6050
│ └── esp-dl
├── dependencies.lock
├── main
│ ├── app_main.cpp
│ └── CMakeLists.txt
├── model
│ ├── Activity_coefficient.cpp
│ ├── Activity_coefficient.hpp
│ └── model_define.hpp
├── partitions.csv
├── sdkconfig
├── sdkconfig.defaults
├── sdkconfig.defaults.esp32
├── sdkconfig.defaults.esp32s2
└── sdkconfig.defaults.esp32s3
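If you keep the converted model files in a separate model folder as shown above, the main component needs to be able to find those headers. A minimal main/CMakeLists.txt could look like the following sketch (the include paths are assumptions based on the tree above, so adjust them to your layout):

idf_component_register(SRCS "app_main.cpp"
                       INCLUDE_DIRS "." "../model")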

2. Model Definition

We will define our model in the ‘model_define.hpp’ file, following the steps below with detailed explanations. When opened in Netron, our model looks like Figure 1 below.

Figure 1. Optimized model visualization

2.1 Import Libraries

Import all the relevant libraries. Please check here for all the libraries currently supported by ESP-DL.

#pragma once
#include "dl_layer_model.hpp"
#include "dl_layer_base.hpp"
#include "dl_layer_max_pool2d.hpp"
#include "dl_layer_conv2d.hpp"
#include "dl_layer_concat.hpp"
#include "Activity_coefficient.hpp"
#include "dl_layer_reshape.hpp"
#include "dl_layer_softmax.hpp"
#include <stdint.h>

using namespace dl;
using namespace layer;
using namespace Activity_coefficient;

2.2 Declare Layers

The next step is to declare each layer.

  • Input is not considered a layer, so it is not defined here.
  • Except for the output layer, all layers are declared as private.
class ACTIVITY : public Model<int16_t>
{
private:
    Conv2D<int16_t> l1;
    Conv2D<int16_t> l2;
    Reshape<int16_t> l3;
    Conv2D<int16_t> l4;
    Conv2D<int16_t> l5;

public:
    Softmax<int16_t> l6;

2.3 Initialize Layers

After declaring the layers, initialize each layer with its weights, biases, activation function, and shape.

ACTIVITY () : 
l1(Conv2D<int16_t>(-13, get_statefulpartitionedcall_sequential_1_conv2d_2_biasadd_filter(), get_statefulpartitionedcall_sequential_1_conv2d_2_biasadd_bias(), get_statefulpartitionedcall_sequential_1_conv2d_2_biasadd_activation(), PADDING_VALID, {}, 1,1, "l1")),
l2(Conv2D<int16_t>(-13, get_statefulpartitionedcall_sequential_1_conv2d_3_biasadd_filter(), get_statefulpartitionedcall_sequential_1_conv2d_3_biasadd_bias(), get_statefulpartitionedcall_sequential_1_conv2d_3_biasadd_activation(), PADDING_VALID, {}, 1,1, "l2")),
l3(Reshape<int16_t>({1,1,2496},"l2_reshape")),
l4(Conv2D<int16_t>(-11, get_fused_gemm_0_filter(), get_fused_gemm_0_bias(), get_fused_gemm_0_activation(), PADDING_VALID, {}, 1, 1, "l3")),
l5(Conv2D<int16_t>(-9, get_fused_gemm_1_filter(), get_fused_gemm_1_bias(), NULL, PADDING_VALID,{}, 1,1, "l4")),
l6(Softmax<int16_t>(-14,"l5")){}
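The first argument of each layer above (-13, -11, -9, and -14) is that layer’s output exponent, which was fixed when the model was quantized; it defines the fixed-point scale of the layer’s output, so these values must match the ones reported by your conversion step.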

2.4 Building Layers

The next step is to build each layer. For more information about building layers, please check the build function of each layer.

void build(Tensor<int16_t> &input)
{
    this->l1.build(input);
    this->l2.build(this->l1.get_output());
    this->l3.build(this->l2.get_output());
    this->l4.build(this->l3.get_output());
    this->l5.build(this->l4.get_output());
    this->l6.build(this->l5.get_output());
}

2.5 Call Layers

In the end, connect the layers and run them one by one using the call function. For more information about calling layers, please check the call function of each layer.

void call(Tensor<int16_t> &input)
{
    this->l1.call(input);
    input.free_element();

    this->l2.call(this->l1.get_output());
    this->l1.get_output().free_element();

    this->l3.call(this->l2.get_output());
    this->l2.get_output().free_element();

    this->l4.call(this->l3.get_output());
    this->l3.get_output().free_element();

    this->l5.call(this->l4.get_output());
    this->l4.get_output().free_element();

    this->l6.call(this->l5.get_output());
    this->l5.get_output().free_element();
}
};
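Note that call() frees each tensor’s buffer as soon as the next layer has consumed it; on a memory-constrained target like the ESP32-S3, releasing intermediate results early keeps peak RAM usage low.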

3. Model Run

Once our model is built, declare the input to the model and run it on the ESP32-S3 in ‘app_main.cpp’.

3.1 Import libraries

#include <stdio.h>
#include <stdlib.h>
#include "esp_system.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "dl_tool.hpp"
#include "model_define.hpp"
#include "i2c_bus.h"
#include "mpu6050.h"
#include "driver/i2c.h"
#include "esp_log.h"

3.2 Declare Input

The input to our neural network is taken from the MPU6050 accelerometer sensor. To read the sensor data in real time, we use the mpu6050 driver from Espressif. Every 4 seconds (80 samples at 50 ms intervals), the data is stored in an array and fed to the network for prediction.

int input_height = 80;    // samples per window
int input_width = 3;      // x, y, z axes
int input_channel = 1;
int input_exponent = -13; // int16 fixed point: a float x is stored as x * 2^13
float acc_xyz[240] = {0}; // 80 samples * 3 axes
int index_acc = 0;
#define I2C_MASTER_SCL_IO 16 /*!< gpio number for I2C master clock */
#define I2C_MASTER_SDA_IO 17 /*!< gpio number for I2C master data */
#define I2C_MASTER_NUM I2C_NUM_0 /*!< I2C port number for master dev */
#define I2C_MASTER_FREQ_HZ 400000 /*!< I2C master clock frequency */
static i2c_bus_handle_t i2c_bus = NULL;
static mpu6050_handle_t mpu6050 = NULL;

extern "C" void app_main(void)
{
i2c_config_t conf = {
.mode = I2C_MODE_MASTER,
.sda_io_num = I2C_MASTER_SDA_IO,
.scl_io_num = I2C_MASTER_SCL_IO,
.sda_pullup_en = GPIO_PULLUP_ENABLE,
.scl_pullup_en = GPIO_PULLUP_ENABLE,
.clk_flags = 0,
};

conf.master.clk_speed = I2C_MASTER_FREQ_HZ;
i2c_bus = i2c_bus_create(I2C_MASTER_NUM, &conf);
mpu6050 = mpu6050_create(i2c_bus, MPU6050_I2C_ADDRESS);
uint8_t mpu6050_deviceid;
mpu6050_acce_value_t acce;
mpu6050_get_deviceid(mpu6050, &mpu6050_deviceid);
printf("mpu6050 device ID is: 0x%02x\n", mpu6050_deviceid);
mpu6050_set_acce_fs(mpu6050, ACCE_FS_4G);
while(1){
for (int i=0 ;i<80; i++)
{
mpu6050_get_acce(mpu6050, &acce);
acc_xyz[index_acc]=acce.acce_x;
index_acc=index_acc+1;
acc_xyz[index_acc]=acce.acce_y;
index_acc=index_acc+1;
acc_xyz[index_acc]=acce.acce_z;
index_acc=index_acc+1;
vTaskDelay(50 / portTICK_RATE_MS);
}
index_acc=0;
int16_t *model_input = (int16_t *)dl::tool::malloc_aligned_prefer(input_height*input_width*input_channel, sizeof(int16_t *));
for(int i=0 ;i<input_height*input_width*input_channel; i++){
float normalized_input = acc_xyz[i] / 1.0; //normalization
model_input[i] = (int16_t)DL_CLIP(normalized_input * (1 << -input_exponent), -32768, 32767);
}
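With input_exponent = -13, each floating-point reading x is mapped to x · 2^13 and clipped to the int16 range [-32768, 32767]; a reading of 0.5 g, for example, is stored as 0.5 × 8192 = 4096.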

3.3 Set Input shape

Wrap the quantized data in a tensor to feed it to the neural network.

Tensor<int16_t> input;

input.set_element((int16_t *) model_input).set_exponent(input_exponent).set_shape({input_height,input_width,input_channel}).set_auto_free(false);
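Here set_element() attaches the quantized buffer, set_exponent() records its fixed-point scale, set_shape() declares the {80, 3, 1} layout, and set_auto_free(false) prevents the tensor from freeing the buffer itself, since the model’s call function already releases it explicitly with free_element().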

3.4 Call model

Call the model by invoking its forward method and passing the input to it. Use Latency to measure the time the ESP32-S3 takes to run the neural network.

ACTIVITY model;
dl::tool::Latency latency;
latency.start();
model.forward(input);
latency.end();
latency.print("\nActivity model", "forward");

3.5 Monitor output

The output is read from the public layer, i.e. l6, and the result can be printed to the terminal.

float *score = model.l6.get_output().get_element_ptr();
float max_score = score[0];
int max_index = 0;
// Print all six class scores and track the highest one
for (size_t i = 0; i < 6; i++)
{
    printf("%f, ", score[i] * 100);
    if (score[i] > max_score)
    {
        max_score = score[i];
        max_index = i;
    }
}
printf("\n");
// Map the winning index to its activity label
switch (max_index)
{
    case 0:
        printf("0: Downstairs");
        break;
    case 1:
        printf("1: Jogging");
        break;
    case 2:
        printf("2: Sitting");
        break;
    case 3:
        printf("3: Standing");
        break;
    case 4:
        printf("4: Upstairs");
        break;
    case 5:
        printf("5: Walking");
        break;
    default:
        printf("No result");
}
printf("\n");
}  // end while(1)
}  // end app_main

4. Conclusion and Future Opportunities

In conclusion, this project opens up exciting possibilities for a variety of applications, including predictive maintenance in industrial settings, punch recognition in boxing using an accelerometer, and fall detection in healthcare. These are just a few examples that can be explored further. For those interested in the source code, it is available in the GitHub repository.
