ESP32: Dual core execution speedup

Introduction

The objective of this post is to implement a simple power calculation algorithm on the ESP32 and test the speedup by executing it in the two cores of the microcontroller.

As we have been covering in previous tutorials, the ESP32 has two Tensilica LX6 cores [1] which we can use to execute code. At the time of writing, the easiest way to control the execution of code on the different cores of the ESP32 is by using FreeRTOS and assigning a task to each CPU, as can be seen in detail in this previous post.

Generically speaking, there are many benefits to having more than one core available to execute code, but one of the most important is increasing the performance of our programs. So, although we have already seen how to execute code on the two cores of the ESP32, we haven't yet looked at the performance increase we can get from doing so.

So, in this tutorial, we are going to design a simple application that calculates a power of each number of an array. We will first run it on a single core, then split the processing between the two cores and check whether there is a gain in execution time. Finally, just for comparison, we will also split the execution between four tasks (two assigned to each core), just to check whether there is any speedup from spawning more tasks than cores.


Calculating speedup

Basically, what we want to check is how much the speed of execution of our program increases when we go from a single core execution to a dual core execution. One of the easiest ways of doing that is by taking the ratio between the execution time of the program running on just one core and the execution time running on the two cores of the ESP32 [2].
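Written as a formula, using the execution times we will measure below:

Speedup = (execution time with one task, one core) / (execution time with two tasks, one per core)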

We will measure the execution time for each approach using the Arduino micros function, which returns the number of microseconds since the board began running the program [3].

So, for measuring the execution time of a block of code, we will do something similar to what is indicated below.

start = micros();
//Run code
end = micros();
execTime = end - start;

We could also have used the FreeRTOS xTaskGetTickCount function, but its resolution is one FreeRTOS tick (typically 1 ms), so the micros function is enough for what we want to show and it is a well known Arduino function.
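Just as a reference, a minimal sketch of that tick-based alternative could look like the snippet below (it is not used in the rest of this post).

TickType_t startTicks = xTaskGetTickCount();     // tick count before running the code
// Run code
TickType_t endTicks = xTaskGetTickCount();       // tick count after running the code
TickType_t elapsedTicks = endTicks - startTicks; // elapsed time, in FreeRTOS ticks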


Global vars

We will start our code by declaring some auxiliary global variables. We are not going to need any includes.

Since we are going to compare speedups in different situations, we are going to specify an array with multiple exponents to try. Also, we are going to have a variable with the size of the array, so we can iterate it. We are first going to try small values for the exponent, and then start increasing them a lot.

int n[10] = {2, 3, 4, 5, 10, 50, 100, 1000, 2000, 10000 };
int nArraySize = 10;

We are also going to have some variables to store the execution time of the code in the different use cases. We will reuse the execution start and end variables but use a different variable for each use case: one-task execution, two-task execution and four-task execution. This way, we will store the values for later comparison.

unsigned long start;
unsigned long end;
unsigned long execTimeOneTask, execTimeTwoTask, execTimeFourTask ;

Then, we will declare a counting semaphore in order to be able to synchronize the setup function (which is running on a FreeRTOS task) with the tasks we are going to launch. Refer to this post for a detailed explanation on how to achieve this type of synchronization.

Note that the xSemaphoreCreateCounting function receives as input the maximum count and the initial count for the semaphore. Since we will have at most 4 tasks to synchronize, its maximum value will be 4.

SemaphoreHandle_t barrierSemaphore = xSemaphoreCreateCounting( 4, 0 );

Finally, we will declare two arrays: one with the initial values (called bigArray) and another to store the results (called resultArray). We will make them large.

int bigArray[10000], resultArray[10000];


The FreeRTOS task

We will now define the function that implements our power calculation algorithm, so it can be launched as a task. If you need help with the specifics of how to define a FreeRTOS task, please check this previous tutorial.

The first thing we need to consider is that our function will receive some configuration parameters. Since in some of the use cases we are going to split the execution of the algorithm across several tasks, we need to control the portion of the array that each task will cover.

In order to create a single generic task function that can handle all our use cases, we will pass as parameters the indexes of the section of the array that the task will be responsible for. We are also going to pass the exponent, so we know how many multiplications we need to perform. You can check in more detail how to pass an argument to a FreeRTOS task in this previous tutorial.

So, we will first define a structure with these 3 parameters, so we can pass it to our function. You can read more about structs here. Note that this struct is declared outside any function. So, the code can be placed near the global variables declaration.

struct argsStruct {
  int arrayStart;
  int arrayEnd;
  int n;
};

Since we already have our struct declared, inside the function we will declare a variable of that type. Then, we will assign the input parameter of the function to this variable. Remember that parameters are passed to FreeRTOS task functions as a void pointer (void *) and it is our responsibility to cast it back to the original type.

Note also that we pass to the function a pointer to the original variable and not the actual variable, so we need to use the pointer to access the value. Again, please consult the previous post for a detailed explanation about this procedure.

argsStruct myArgs = *((argsStruct*)parameters);

Now, we will implement the power calculation function. Note that we could have used the Arduino pow function, but I will explain later why we didn't. So, we will implement the function with a loop where we will multiply a value by itself n times.

We start by declaring a variable to hold the partial product. Then, we will use a for loop to iterate over all the elements of the array assigned to the task. Remember that this was our initial objective. Then, for each element, we will calculate its power and, in the end, we will assign the value to the results array. This ensures that we use the same data in all our tests and do not change the original array.

For cleaner code we could have implemented the power algorithm in a dedicated function, but I wanted to minimize auxiliary function calls to keep the code compact.
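Just for reference, such a dedicated function could look like the sketch below (intPow is a hypothetical name and it is not used in the rest of the code).

int intPow(int base, int exponent) {
  // Repeated multiplication: the same algorithm used inside the task
  int product = 1;
  for (int j = 0; j < exponent; j++) {
    product = product * base;
  }
  return product;
}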

Check the code below. I left a commented line of code that calculates the power using the pow function, so you can try it if you want. Note that the speedup results will be considerably different. Also, to access an element of a struct variable, we use the name of the variable, followed by a dot (".") and the name of the element.

int product;
for (int i = myArgs.arrayStart; i < myArgs.arrayEnd; i++) {

    product = 1;
    for (int j = 0; j < myArgs.n; j++) {
      product =  product * bigArray[i];
    }
    resultArray[i]=product;
    //resultArray [i] = pow(bigArray[i], myArgs.n);
}

Check the full function code below. Note that at the end of the code we are incrementing the global counting semaphore by one unit. This is done to ensure synchronization with the setup function, since we will measure the execution time from there. Check this previous post on how to synchronize tasks with FreeRTOS semaphores. Also, at the end, we are deleting the task.

void powerTask( void * parameters ) {

  argsStruct  myArgs = *((argsStruct*)parameters);

  int product;
  for (int i = myArgs.arrayStart; i < myArgs.arrayEnd; i++) {

    product = 1;
    for (int j = 0; j < myArgs.n; j++) {
      product =  product * bigArray[i];
    }
    resultArray[i]=product;
    //resultArray [i] = pow(bigArray[i], myArgs.n);
  }

  xSemaphoreGive(barrierSemaphore);

  vTaskDelete(NULL);

}


The setup function

We are going to do all the remaining code in the setup function, so our main loop will be empty. We will start by opening a serial connection, to output the results of our tests.

Serial.begin(115200);
Serial.println();

Now, we will initialize our array with some values to apply the calculation to. Note that we are not concerned with the content of the array, nor with the actual result, but rather with the execution times. So, we are going to initialize the array with random values just for the sake of showing those functions, since we are not going to print the array that will hold the results.

We are first going to call the randomSeed function, so the random values generated will differ in different executions of the program [4]. As indicated here, if the analog pin is unconnected, it will return a value corresponding to random noise, which is ideal for passing as input of the randomSeed function.

After that, we can simply call the random function, passing as arguments the minimum (inclusive) and maximum (exclusive) values of the range [4]. Again, we are doing this just for illustration purposes, since we are not going to print the contents of the arrays.

randomSeed(analogRead(0));
for (int i = 0; i < 10000; i++) {
  bigArray[i] = random(1, 10);
}

Now, we are going to define the variables that will be used as arguments to our tasks. Remember the previously declared structure, which will hold the indexes of the array that each task will process and the exponent.

So, the first element of the structure is the starting index, the second is the final index and the third is the exponent. We will declare structures for each use case (one task, two tasks and four tasks), as can be seen in the code below. Note that in the last element of each struct we are passing an element of the global array we declared with the exponents. As we will see in the final code, we will iterate over the whole exponents array, but for now let's keep things simple.

argsStruct oneTask = { 0 , 1000 , n[i] };

argsStruct twoTasks1 = { 0 , 1000 / 2 , n[i] };
argsStruct twoTasks2 = { 1000 / 2 , 1000 , n[i] };

argsStruct fourTasks1 = { 0 , 1000 / 4 , n[i] };
argsStruct fourTasks2 = { 1000 / 4 , 1000 / 4 * 2, n[i]};
argsStruct fourTasks3 = { 1000 / 4 * 2, 1000 / 4 * 3, n[i]};
argsStruct fourTasks4 = { 1000 / 4 * 3 , 1000, n[i]};

Now, we are going to run the test using a single task. We start by getting the start time with the micros function. Then, we will use the xTaskCreatePinnedToCore function to create a FreeRTOS task pinned to one of the cores. We will choose core 1.

We are going to pass as parameter the address of the oneTask struct variable, which contains the arguments needed for the function to run. Don’t forget the cast to void*.

One very important thing to take into consideration is that we haven't analyzed which tasks may have been launched by the Arduino core, and those may influence the execution time. So, in order to guarantee that our task will execute with greater priority, we will assign it a priority of 20. Remember, higher numbers mean higher execution priority for the FreeRTOS scheduler.

After launching the task, we will take a unit from the semaphore, ensuring that the setup function will block until the new task finishes its execution. Finally, we will print the execution time.

    Serial.println("");
    Serial.println("------One task-------");

    start = micros();

    xTaskCreatePinnedToCore(
      powerTask,               /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                   /* Stack size in words */
      (void*)&oneTask,         /* Task input parameter */
      20,                      /* Priority of the task */
      NULL,                    /* Task handle. */
      1);                      /* Core where the task should run */

    xSemaphoreTake(barrierSemaphore, portMAX_DELAY);

    end = micros();
    execTimeOneTask = end - start;
    Serial.print("Exec time: ");
    Serial.println(execTimeOneTask);
    Serial.print("Start: ");
    Serial.println(start);
    Serial.print("end: ");
    Serial.println(end);

This is pretty much what we are going to do for the remaining use cases. The only difference is that we are going to launch more tasks and we are going to take more units from the semaphore (as many as the number of tasks launched for that use case).

Check the full source code below, which already includes the use cases for running the code with two tasks (one per ESP32 core) and four tasks (two per ESP32 core). Also, at the end of the setup function, it includes the printing of the speedup results for each iteration.

Note that the extra parameter of the Serial.println function indicates the number of decimal places for the floating point number.
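For example (the value below is just illustrative):

Serial.println(1.23456, 4); // prints 1.2346 (4 decimal places)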

int n[10] = {2, 3, 4, 5, 10, 50, 100, 1000, 2000, 10000  };
int nArraySize = 10;

struct argsStruct {
  int arrayStart;
  int arrayEnd;
  int n;
};

unsigned long start;
unsigned long end;
unsigned long execTimeOneTask, execTimeTwoTask, execTimeFourTask ;

SemaphoreHandle_t barrierSemaphore = xSemaphoreCreateCounting( 4, 0 );

int bigArray[10000], resultArray[10000];

void setup() {

  Serial.begin(115200);
  Serial.println();

  randomSeed(analogRead(0));

  for (int i = 0; i < 10000; i++) {
    bigArray[i] = random(1, 10);
  }

  for (int i = 0; i < nArraySize; i++) {

    Serial.println("#############################");
    Serial.print("Starting test for n= ");
    Serial.println(n[i]);

    argsStruct oneTask = { 0 , 1000 , n[i] };

    argsStruct twoTasks1 = { 0 , 1000 / 2 , n[i] };
    argsStruct twoTasks2 = { 1000 / 2 , 1000 , n[i] };

    argsStruct fourTasks1 = { 0 , 1000 / 4 , n[i] };
    argsStruct fourTasks2 = { 1000 / 4 , 1000 / 4 * 2,   n[i]};
    argsStruct fourTasks3 = { 1000 / 4 * 2, 1000 / 4 * 3, n[i]};
    argsStruct fourTasks4 = { 1000 / 4 * 3 , 1000,     n[i]};

    Serial.println("");
    Serial.println("------One task-------");

    start = micros();

    xTaskCreatePinnedToCore(
      powerTask,               /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                   /* Stack size in words */
      (void*)&oneTask,         /* Task input parameter */
      20,                      /* Priority of the task */
      NULL,                    /* Task handle. */
      1);                      /* Core where the task should run */

    xSemaphoreTake(barrierSemaphore, portMAX_DELAY);

    end = micros();
    execTimeOneTask = end - start;
    Serial.print("Exec time: ");
    Serial.println(execTimeOneTask);
    Serial.print("Start: ");
    Serial.println(start);
    Serial.print("end: ");
    Serial.println(end);

    Serial.println("");
    Serial.println("------Two tasks-------");

    start = micros();

    xTaskCreatePinnedToCore(
      powerTask,                /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                    /* Stack size in words */
      (void*)&twoTasks1,        /* Task input parameter */
      20,                       /* Priority of the task */
      NULL,                     /* Task handle. */
      0);                       /* Core where the task should run */

    xTaskCreatePinnedToCore(
      powerTask,               /* Function to implement the task */
      "coreTask",              /* Name of the task */
      10000,                   /* Stack size in words */
      (void*)&twoTasks2,       /* Task input parameter */
      20,                      /* Priority of the task */
      NULL,                    /* Task handle. */
      1);                      /* Core where the task should run */

    for (int i = 0; i < 2; i++) {
      xSemaphoreTake(barrierSemaphore, portMAX_DELAY);
    }

    end = micros();
    execTimeTwoTask = end - start;
    Serial.print("Exec time: ");
    Serial.println(execTimeTwoTask);
    Serial.print("Start: ");
    Serial.println(start);
    Serial.print("end: ");
    Serial.println(end);

    Serial.println("");
    Serial.println("------Four tasks-------");

    start = micros();

    xTaskCreatePinnedToCore(
      powerTask,                /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                    /* Stack size in words */
      (void*)&fourTasks1,       /* Task input parameter */
      20,                       /* Priority of the task */
      NULL,                     /* Task handle. */
      0);                       /* Core where the task should run */

    xTaskCreatePinnedToCore(
      powerTask,                /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                    /* Stack size in words */
      (void*)&fourTasks2,       /* Task input parameter */
      20,                       /* Priority of the task */
      NULL,                     /* Task handle. */
      0);                       /* Core where the task should run */

    xTaskCreatePinnedToCore(
      powerTask,                /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                    /* Stack size in words */
      (void*)&fourTasks3,       /* Task input parameter */
      20,                       /* Priority of the task */
      NULL,                     /* Task handle. */
      1);                       /* Core where the task should run */

    xTaskCreatePinnedToCore(
      powerTask,                /* Function to implement the task */
      "powerTask",              /* Name of the task */
      10000,                    /* Stack size in words */
      (void*)&fourTasks4,       /* Task input parameter */
      20,                       /* Priority of the task */
      NULL,                     /* Task handle. */
      1);                       /* Core where the task should run */

    for (int i = 0; i < 4; i++) {
      xSemaphoreTake(barrierSemaphore, portMAX_DELAY);
    }

    end = micros();
    execTimeFourTask = end - start;
    Serial.print("Exec time: ");
    Serial.println(execTimeFourTask);
    Serial.print("Start: ");
    Serial.println(start);
    Serial.print("end: ");
    Serial.println(end);

    Serial.println();
    Serial.println("------Results-------");

    Serial.print("Speedup two tasks: ");
    Serial.println((float) execTimeOneTask / execTimeTwoTask, 4 );

    Serial.print("Speedup four tasks: ");
    Serial.println((float)execTimeOneTask / execTimeFourTask, 4 );

    Serial.print("Speedup four tasks vs two tasks: ");
    Serial.println((float)execTimeTwoTask / execTimeFourTask, 4 );

    Serial.println("#############################");
    Serial.println();
  }

}

void loop() {

}

void powerTask( void * parameters ) {

  argsStruct  myArgs = *((argsStruct*)parameters);

  int product;
  for (int i = myArgs.arrayStart; i < myArgs.arrayEnd; i++) {

    product = 1;
    for (int j = 0; j < myArgs.n; j++) {
      product =  product * bigArray[i];
    }

    resultArray[i]=product;
    //resultArray [i] = pow(bigArray[i], myArgs.n);
  }

  xSemaphoreGive(barrierSemaphore);

  vTaskDelete(NULL);

}


Testing the code

To test the code, simply upload it with the Arduino IDE and open the serial monitor. You should get a result similar to figure 1. Naturally, the execution times may vary.

Figure 1 – Output of the speedup testing program.

The results for each exponent are shown below, in table 1.

Exponent | 1 task [µs] | 2 tasks [µs] | 4 tasks [µs] | Speedup 1 task vs 2 tasks | Speedup 1 task vs 4 tasks | Speedup 2 tasks vs 4 tasks
2        | 229         | 183          | 296          | 1.2514                    | 0.7736                    | 0.6182
3        | 271         | 207          | 325          | 1.3092                    | 0.8338                    | 0.6369
4        | 312         | 224          | 340          | 1.3929                    | 0.9176                    | 0.6588
5        | 354         | 249          | 367          | 1.4217                    | 0.9646                    | 0.6785
10       | 556         | 347          | 451          | 1.6023                    | 1.2328                    | 0.7694
50       | 2235        | 1188         | 1305         | 1.8813                    | 1.7126                    | 0.9103
100      | 4331        | 2234         | 2343         | 1.9387                    | 1.8485                    | 0.9535
1000     | 42072       | 21108        | 21212        | 1.9932                    | 1.9834                    | 0.9951
2000     | 83992       | 42138        | 42190        | 1.9933                    | 1.9908                    | 0.9988
10000    | 419400      | 210283       | 210310       | 1.9945                    | 1.9942                    | 0.9999

Table 1 – Results of speedup for the exponents defined in the code.


Analyzing the results

To understand the results, we first need to take into consideration that one cannot fully parallelize the whole program. There are always going to be portions of the code that execute on a single core, such as the launching of the tasks or the synchronization mechanisms.

Although we didn’t have this use case, many parallelization algorithms also have a part where each partial result is then aggregated sequentially (for example, if we have multiple tasks calculating the max value of their portion of the array and then the parent task calculates the max value between all the partial results).

So, no matter what we do, we theoretically cannot achieve a speedup equal to the number of cores (there are some exceptions, for example in algorithms that search for a specific value and exit as soon as it is found, but let's not complicate).

So, the greater the amount of computation we execute in parallel versus the portion we execute in sequence, the greater the speedup we will get. And that's precisely what we see in our results. Starting with an exponent of 2, our speedup from the one-task execution to the two-task execution is only 1.2514.

But as we increase the exponent, we perform more iterations of the innermost loop, and thus more of the computation can be performed in parallel. So, as we increase the exponent, we see an increase in the speedup. For example, with an exponent of 1000, our speedup is 1.9932, which is a very high value compared to the previous ones.

So, this means the sequential part gets diluted and, for big exponents, the parallelization pays off. Note that we are able to achieve these kinds of high values because the algorithm is very simple and easily parallelized. Also, because of the way FreeRTOS handles priorities and task context switches, there is not much overhead once the tasks are executing. A normal computer tends to have more context switches between threads, and thus the sequential overhead is bigger.

Note that we didn't use the pow function precisely to be able to show this progression in the speedup values. The pow function works with floats, meaning its implementation is more computationally intensive. So, we would get much better speedups even at lower exponents, because the parallel part would be much more relevant than the sequential one. You can comment out our implemented algorithm and use the pow function to compare the results.

Also, it's important to note that launching more tasks than the number of cores doesn't increase the speedup. Indeed, it has the opposite effect. As we can see from the results, the speedup from executing with 4 tasks is always lower than the one obtained by executing with 2 tasks. Indeed, for low exponents, it's actually slower to execute the code with 4 tasks (even though they are split between the two cores of the ESP32) than to execute it with a single task on a single core.

This is expected because each CPU can only execute one task at a given time, and thus launching more tasks than CPUs means more sequential time spent dealing with them and more overhead at the synchronization points.

Nevertheless, note that this is not always black and white. If the tasks had some kind of yielding point where they would stop and wait for something, then having more tasks than cores could be beneficial in order to use the free CPU cycles. In our case, however, the tasks never yield and are always executing, so, regarding speedup, there is no benefit in launching more than 2 tasks.
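As a purely illustrative sketch (not part of this post's code), a task with such a blocking point could look like the one below; while it is blocked on vTaskDelay, the core is free to run another task.

void ioBoundTask( void * parameters ) {
  for (;;) {
    // ... do a small amount of work ...
    vTaskDelay(10 / portTICK_PERIOD_MS); // blocks for ~10 ms, yielding the CPU to other tasks
  }
}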


Amdahl's law

To complement the results, we will look at Amdahl's law. So, we are going to apply some transformations to the speedup formula we have been using. The initial formula states that the speedup is equal to the sequential execution time (we will call it T) divided by the parallel execution time (we will call it Tparallel).

Speedup = T / Tparallel

But, as we saw, there is a part of the execution time that is always sequential and another that may be parallelized. Let's call p the fraction of the program that we can run in parallel. Since it is a fraction, its value will be between 0 and 1.

So, we can represent the sequential execution time T as the portion that can be executed in parallel, p*T, plus the portion that cannot, (1-p)*T.

T = (1 - p) * T + p * T

Since we can only split the parallel part between the cores of our machine (we will call the number of cores N), the Tparallel time is similar to the formula above, except that the parallel part appears divided by the number of cores we have available.

Tparallel = (1 - p) * T + (p * T) / N

So, our speedup formula becomes:

Speedup = T / ( (1 - p) * T + (p * T) / N )

Now we have the speedup written as a function of T, the original execution time without enhancements. So, we can divide every term by T:

Speedup = 1 / ( (1 - p) + p / N )

So, we now have the speedup written as a function of the part that can run in parallel and the number of cores. To finish, let's suppose that we have an infinite number of cores available (the real use case would be a very large number, but we will analyse the formula mathematically).

Speedup (N → ∞) = 1 / ( (1 - p) + p / ∞ )

Since a constant divided by infinity tends to 0, we end up with:

Maximum speedup = 1 / (1 - p)

So, although the speedup increases with the number of resources available for parallelization (the number of cores, in our case), the truth is that the maximum theoretical speedup possible is limited by the non-parallel part, which we can't optimize away.
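As a rough sanity check using our own measurements (so, only an approximation): for the exponent 10000 we measured a speedup of about 1.9945 with the two cores. Plugging that into the formula with N = 2 and solving for p gives p ≈ 0.997, which would cap the maximum theoretical speedup at roughly 1 / (1 - p) ≈ 360, no matter how many cores we added.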

This is an interesting conclusion to keep in mind when deciding whether to parallelize or not. Sometimes the algorithm can easily be parallelized and we get a tremendous speedup; other times the effort needed doesn't justify the gain.

Also, another important thing to keep in mind is that the best sequential algorithm is not always the best one after parallelization. So, sometimes it's best to use an algorithm that performs worse in sequential execution but ends up being much better than the best sequential one once parallelized.


Final notes

With this tutorial, we confirm that the functions available for multi core execution are working well and we can take advantage of them to get performance benefits.

The last, more theoretical part had the objective of showing that parallelization is not a trivial issue, and that one cannot jump right to a parallel approach without a previous analysis and expect a speedup equal to the number of cores available. This rationale extends beyond the scope of the ESP32 and applies to parallel computing in general.

So, for those who are going to start using parallel computing on the ESP32 for performance gains, it's a good idea to learn some of the more theoretical concepts first.

Finally, note that the code is oriented towards showing the results, and thus many improvements could have been made, such as launching the tasks in a loop instead of repeating the code, or condensing the algorithm into a dedicated function.
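As an example of the first point, a hypothetical refactor (just a sketch, assuming it runs inside the same loop over n[i] as the original code) could build the argument structures and launch the four tasks in a loop:

argsStruct fourTasks[4];

for (int t = 0; t < 4; t++) {
  fourTasks[t] = { 1000 / 4 * t, 1000 / 4 * (t + 1), n[i] };
}

for (int t = 0; t < 4; t++) {
  xTaskCreatePinnedToCore(
    powerTask,              /* Function to implement the task */
    "powerTask",            /* Name of the task */
    10000,                  /* Stack size in words */
    (void*)&fourTasks[t],   /* Task input parameter */
    20,                     /* Priority of the task */
    NULL,                   /* Task handle. */
    t % 2);                 /* Alternate between core 0 and core 1 */
}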

Also, we did not check the results of the execution to confirm that the output array was the same, because that was not our main focus. Naturally, going from sequential to parallel requires a careful implementation of the code and a lot of testing to ensure the result stays the same.


References 

[1] https://espressif.com/en/products/hardware/esp32/overview

[2] http://www.dcc.fc.up.pt/~fds/aulas/PPD/1112/metrics_en.pdf

[3] https://www.arduino.cc/en/reference/micros

[4] https://www.arduino.cc/en/reference/random

4 thoughts on “ESP32: Dual core execution speedup”

  1. Recently found this blog. Thanks for the posts! Very well done.

    I just tried the above and encountered an error:

    #############################
    Starting test for n= 10000

    ——One task——-
    Guru Meditation Error: Core 0 panic’ed (Interrupt wdt timeout on CPU0)
    Register dump:
    PC : 0x400d674c PS : 0x00060034 A0 : 0x8008494b A1 : 0x3ffc0b90
    A2 : 0x3ffc1370 A3 : 0x00060021 A4 : 0x00060a23 A5 : 0x3ffdfba0
    A6 : 0x0000000a A7 : 0x3ffdd460 A8 : 0x00000000 A9 : 0xb33fffff
    A10 : 0x00060021 A11 : 0x00000000 A12 : 0x00060021 A13 : 0x00000000
    A14 : 0xffffffff A15 : 0x3ffd72ec SAR : 0x00000016 EXCCAUSE: 0x00000005
    EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000

    Backtrace: 0x400d674c:0x3ffc0b90 0x4008494b:0x3ffc0bb0 0x4008340e:0x3ffc0bd0 0x40085454:0x3ffc0bf0 0x40081c21:0x3ffc0c00

    CPU halted.

    Error appears to happen after jumping in to xTaskCreatePinnedToCore().

    1. Hi! Thank you very much for the feedback, I’m happy you are finding the content useful 🙂

      Well that’s weird, but since the ESP32 is still in the early stages these kind of things tend to happen. I was able to run the code without any problem.

      I will try to help but I would like to ask you to also post the error on the github page of the ESP32 support for the Arduino environment. They are probably more familiarized with these kind of dumps and can help us understand the problem.
      https://github.com/espressif/arduino-esp32

      Some questions to check if we can debug it:
      – You are running the same exact code or changed something?
      – For n< 10000, the code works fine, right?
      – Is it deterministic or just happens sometimes?
      – What is your ESP32 board? I think the ESP32 had a revision, so we may be using different versions of the chip.

      I'm not paying much attention to the stack size allocated to each task. You can also try to reduce the number of words allocated on the call to the xTaskCreatePinnedToCore() and check if it helps. In principle this should not be the problem since we are deleting the previous tasks, but we can give it a try.

      Also, if someone else runs into this problem and was able to understand the cause, please let us know since I have not been able to reproduce it yet.

      Best regards,
      Nuno Santos


  2. Pingback: ESP32 | Andreas' Blog

  3. Pingback: ESP32 Arduino: Using the pthreads library | techtutorialsx
