Serial communication data structuring (part 1)

In the development of embedded systems applications, transferring serial data without any higher-level structure is bad practice. Sending raw bytes is a quick way of testing, but in a final application it's not a good idea to just send bytes of information and expect that everything will work fine.

A simple way of adding structure is to divide the communication into well-defined messages. Each message is delimited by a start-of-message (SOM) byte and an end-of-message (EOM) byte, as shown in figure 1.

Figure 1 – Basic data structure.
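As a minimal sketch of figure 1, assuming a C environment and the example delimiter values 255 (SOM) and 254 (EOM) suggested later in this post, the hypothetical function `frame_message` below wraps a payload between the two delimiters. It does not handle payload bytes equal to SOM or EOM; that case needs the escaping mechanism described further on.

```c
#include <stddef.h>
#include <stdint.h>

#define SOM 255 /* example start-of-message value */
#define EOM 254 /* example end-of-message value */

/* Writes SOM, the `len` payload bytes and EOM into `out`.
 * Assumes no payload byte equals SOM or EOM (no escaping here).
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t frame_message(const uint8_t *data, size_t len,
                     uint8_t *out, size_t out_size)
{
    if (out_size < len + 2) {
        return 0; /* not enough room for payload plus both delimiters */
    }
    size_t n = 0;
    out[n++] = SOM;
    for (size_t i = 0; i < len; i++) {
        out[n++] = data[i];
    }
    out[n++] = EOM;
    return n;
}
```

For example, the payload {10, 20, 30} becomes the 5-byte frame {255, 10, 20, 30, 254}.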

Using this structure, the function that handles incoming data knows that it should only start building a message when it receives the SOM byte, and that the message should only be interpreted after the EOM byte is received. Everything received outside a SOM/EOM pair is simply ignored.

To avoid the need for a timeout mechanism, the reception of a SOM should always restart the construction of a message, even if the previous one wasn't finished. In that case, the unfinished message is discarded. Figure 2 shows the basic algorithm.

Figure 2 – Basic message building algorithm.
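The algorithm of figure 2 can be sketched as a small byte-by-byte state machine. This is a minimal sketch, not the post's own implementation: the names `receiver_t`, `receiver_init` and `receiver_feed`, the delimiter values and the `MAX_MSG_LEN` buffer size of 32 are assumptions. Because a C buffer has a fixed size, the sketch also drops any message longer than `MAX_MSG_LEN`, in line with the maximum-length rule discussed in this post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOM 255          /* example start-of-message value */
#define EOM 254          /* example end-of-message value */
#define MAX_MSG_LEN 32   /* hypothetical maximum number of data bytes */

typedef struct {
    bool building;            /* true between a SOM and the next EOM */
    size_t len;               /* number of data bytes stored so far */
    uint8_t buf[MAX_MSG_LEN]; /* data bytes of the message being built */
} receiver_t;

void receiver_init(receiver_t *rx)
{
    rx->building = false;
    rx->len = 0;
}

/* Feeds one received byte to the receiver.
 * Returns true when a complete message is available in rx->buf[0..rx->len). */
bool receiver_feed(receiver_t *rx, uint8_t byte)
{
    if (byte == SOM) {
        /* A SOM always (re)starts a message, discarding any unfinished one. */
        rx->building = true;
        rx->len = 0;
        return false;
    }
    if (!rx->building) {
        return false; /* bytes outside a SOM/EOM pair are ignored */
    }
    if (byte == EOM) {
        rx->building = false;
        return true; /* message complete */
    }
    if (rx->len == MAX_MSG_LEN) {
        /* Too long: drop the message instead of overflowing the buffer. */
        rx->building = false;
        return false;
    }
    rx->buf[rx->len++] = byte;
    return false;
}
```

Calling `receiver_feed` for each incoming byte means no timeout is needed: a stale, unfinished message is simply thrown away when the next SOM arrives.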

Although it is possible to use the same value for the SOM and the EOM bytes, using different values leads to simpler application logic and a more explicit structure. If it's known beforehand that the data bytes won't cover the whole byte range (0 to 255), then choosing delimiter values that will never appear as data simplifies the application, as will be seen later. Two commonly used values are 255 for SOM and 254 for EOM.

Since the communication channel can be affected by numerous factors (noise, unavailability of one of the communicating parties, etc.), it's a good idea to define the maximum number of bytes a message can have. This way, if the EOM byte is lost or random noise is received, it is possible to guarantee that the message being received will not keep growing until it overflows the buffer.

It's important to note that the implementation described so far doesn't work if the data can contain bytes with the same value as SOM or EOM. In that case, an escaping mechanism is needed: an escape byte (EB) is sent before any data byte whose value equals SOM, EOM or EB itself.
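On the sending side, escaping can be sketched as follows, assuming SOM = 255 and EOM = 254 as before and a hypothetical EB value of 253 (the post doesn't fix one). The hypothetical `frame_escaped` function places an EB before every payload byte equal to SOM, EOM or EB, so in the worst case the frame needs 2 × len + 2 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define SOM 255
#define EOM 254
#define EB  253 /* hypothetical escape-byte value */

static int is_special(uint8_t b)
{
    return b == SOM || b == EOM || b == EB;
}

/* Writes SOM, the payload (each special value preceded by an EB) and EOM.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t frame_escaped(const uint8_t *data, size_t len,
                     uint8_t *out, size_t out_size)
{
    size_t needed = 2; /* SOM + EOM */
    for (size_t i = 0; i < len; i++) {
        needed += is_special(data[i]) ? 2 : 1;
    }
    if (out_size < needed) {
        return 0;
    }

    size_t n = 0;
    out[n++] = SOM;
    for (size_t i = 0; i < len; i++) {
        if (is_special(data[i])) {
            out[n++] = EB; /* the next byte must be read as data */
        }
        out[n++] = data[i];
    }
    out[n++] = EOM;
    return n;
}
```

For example, the payload {1, 255, 253, 2} is sent as {255, 1, 253, 255, 253, 253, 2, 254}.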

So, when an EB is received, the next byte should be interpreted as data, regardless of its value. Receiving an EB followed by a byte that didn't need escaping could indicate some kind of error, but checking for it complicates the logic, and there are other mechanisms for validating the content that make this check less relevant.

Of course, this escaping mechanism only makes sense while a message is being built. Otherwise, an EB is discarded like any other byte. Figure 3 shows the complete algorithm.

Figure 3 – Complete message building algorithm.
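The complete algorithm of figure 3 can be sketched by adding an "escaped" flag to the receiver state. Again, this is a sketch under assumptions: EB = 253 and `MAX_MSG_LEN` = 32 are hypothetical values, and `rx_state_t`, `rx_init` and `rx_feed` are illustrative names. An EB outside a message is ignored, an EB inside a message makes the next byte data whatever its value, and an unescaped SOM still restarts the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOM 255
#define EOM 254
#define EB  253          /* hypothetical escape-byte value */
#define MAX_MSG_LEN 32   /* hypothetical maximum number of data bytes */

typedef struct {
    bool building; /* inside a SOM ... EOM pair */
    bool escaped;  /* previous byte was an EB, so the next one is data */
    size_t len;
    uint8_t buf[MAX_MSG_LEN];
} rx_state_t;

void rx_init(rx_state_t *rx)
{
    rx->building = false;
    rx->escaped = false;
    rx->len = 0;
}

/* Stores a data byte, discarding the whole message if it gets too long. */
static void rx_store(rx_state_t *rx, uint8_t byte)
{
    if (rx->len == MAX_MSG_LEN) {
        rx->building = false;
        return;
    }
    rx->buf[rx->len++] = byte;
}

/* Returns true when a complete message is in rx->buf[0..rx->len). */
bool rx_feed(rx_state_t *rx, uint8_t byte)
{
    if (!rx->building) {
        if (byte == SOM) {
            rx->building = true;
            rx->escaped = false;
            rx->len = 0;
        }
        return false; /* anything else outside a message, EBs included, is ignored */
    }
    if (rx->escaped) {
        rx->escaped = false;
        rx_store(rx, byte); /* escaped byte is data, whatever its value */
        return false;
    }
    switch (byte) {
    case SOM: rx->len = 0; return false;          /* restart, dropping the partial message */
    case EOM: rx->building = false; return true;  /* message complete */
    case EB:  rx->escaped = true; return false;
    default:  rx_store(rx, byte); return false;
    }
}
```

Fed the bytes {255, 253, 254, 254}, for instance, the receiver reports one message containing the single data byte 254: the first EOM value is escaped, the second one ends the message.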

As mentioned before, there are mechanisms that help ensure the integrity of the data, that is, ways of checking that the content received is the same as the content sent. One simple way is to include a checksum byte (CS) in the message, whose value is the sum of the values of all the data bytes (since it must fit in a single byte, in practice this is the sum modulo 256). Typically, this byte is placed at the end of the message, just before the EOM byte, as shown in figure 4.

Figure 4 – Data structure with checksum.
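A minimal sketch of the checksum calculation, assuming the checksum is a single byte and therefore the sum of the data bytes modulo 256 (the wrap-around comes for free with 8-bit unsigned arithmetic). `checksum8` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit additive checksum: sum of all data bytes, modulo 256. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]); /* wraps around at 256 */
    }
    return sum;
}
```

For the data bytes {200, 100}, the sum is 300, so the checksum byte is 300 - 256 = 44.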

After the EOM is received, the values of all the data bytes are summed and compared with the checksum value. If they differ, an error has occurred and the received data is corrupted. If they match, there is some confidence that the data is correct.
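On the receiving side, the comparison could look like the sketch below. It assumes the escaping has already been removed and that `msg` holds the bytes received between SOM and EOM, with the checksum as the last one; `message_is_valid` is a hypothetical name, and the sum is again taken modulo 256. This plain version ignores the EB refinement described in the next paragraph.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* `msg` = data bytes followed by the checksum byte, as received between SOM and EOM. */
bool message_is_valid(const uint8_t *msg, size_t len)
{
    if (len < 1) {
        return false; /* no checksum byte at all */
    }
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < len; i++) { /* every byte except the checksum */
        sum = (uint8_t)(sum + msg[i]);
    }
    return sum == msg[len - 1];
}
```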

Since the EBs are part of the message, including them in the checksum is a good approach. However, this creates a problem when the checksum itself has the value of SOM, EOM or EB: escaping it would add one more EB to the message, which would necessarily change the checksum's value. So, a reasonable approach is to simply add an offset to the checksum when it has one of those 3 special values (of course, this reduces the accuracy of the checksum mechanism, since two different sums can then produce the same checksum).
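The approach of this paragraph can be sketched on the sending side as follows. The values EB = 253 and `CS_OFFSET` = 3 are hypothetical: an offset of 3 maps the three special values 253, 254 and 255 to 0, 1 and 2, which are not special, so the `while` loop in `adjust_checksum` only matters if a different offset is chosen. The checksum covers every byte between SOM and the checksum, EBs included; `frame_with_checksum` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

#define SOM 255
#define EOM 254
#define EB  253       /* hypothetical escape-byte value */
#define CS_OFFSET 3   /* hypothetical offset: 253, 254, 255 -> 0, 1, 2 */

static int is_special(uint8_t b) { return b == SOM || b == EOM || b == EB; }

/* Moves a checksum away from the reserved values so it can be sent unescaped. */
uint8_t adjust_checksum(uint8_t cs)
{
    while (is_special(cs)) {
        cs = (uint8_t)(cs + CS_OFFSET);
    }
    return cs;
}

/* Writes SOM, escaped data, checksum, EOM. The checksum is the modulo-256 sum
 * of every byte between SOM and the checksum (EBs included), then adjusted.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t frame_with_checksum(const uint8_t *data, size_t len,
                           uint8_t *out, size_t out_size)
{
    size_t needed = 3; /* SOM + CS + EOM */
    for (size_t i = 0; i < len; i++) {
        needed += is_special(data[i]) ? 2 : 1;
    }
    if (out_size < needed) {
        return 0;
    }

    size_t n = 0;
    uint8_t sum = 0;
    out[n++] = SOM;
    for (size_t i = 0; i < len; i++) {
        if (is_special(data[i])) {
            out[n++] = EB;
            sum = (uint8_t)(sum + EB); /* the EB is part of the message */
        }
        out[n++] = data[i];
        sum = (uint8_t)(sum + data[i]);
    }
    out[n++] = adjust_checksum(sum);
    out[n++] = EOM;
    return n;
}
```

The receiver has to mirror this: sum the raw bytes received between SOM and the checksum, EBs included, apply the same adjustment, and only then compare. For example, the payload {250, 3} sums to 253, which equals EB, so the checksum byte sent is 0 and the frame is {255, 250, 3, 0, 254}.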

With this simple mechanism, there is some probability that the sum of all the bytes of a corrupted message equals the checksum of the correct message, in which case a message that should be discarded is accepted. There are more robust methods, such as hash functions, which give better results but are more complex and may not be suitable for applications where resources are very limited.
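The weakness is easy to show: because addition is commutative, reordering bytes doesn't change the sum, and two errors can cancel each other out. The sketch below uses a hypothetical `checksums_collide` helper with the same modulo-256 sum as before.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* True if two different messages of the same length get the same additive
 * checksum, i.e. corruption turning one into the other would go undetected. */
bool checksums_collide(const uint8_t *a, const uint8_t *b, size_t len)
{
    bool different = false;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            different = true;
        }
    }
    return different && checksum8(a, len) == checksum8(b, len);
}
```

Both {1, 2, 3} versus {3, 2, 1} (bytes swapped) and {5, 7} versus {6, 6} (one byte one too low, another one too high) collide, while a single corrupted byte, such as {1, 2, 4} instead of {1, 2, 3}, is always caught by this plain sum.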
