The beginner’s guide to implementing YOLOv3 in TensorFlow 2.0 (part-2)
In part 1, we discussed the YOLOv3 algorithm. Now, it's time to dive into the technical details of implementing YOLOv3 in TensorFlow 2.
The code for this tutorial, designed to run on Python 3.7 and TensorFlow 2.0, can be found in my GitHub repo.
This tutorial was inspired by Ayoosh Kathuria's great article on implementing YOLOv3 in PyTorch, published on Paperspace's blog (credit link at the end of this tutorial).
YOLOv3 has 2 important files: yolov3.cfg and yolov3.weights. The file yolov3.cfg contains all the information about the YOLOv3 architecture and its parameters, whereas yolov3.weights contains the pre-trained weights of the convolutional neural network (CNN).
In this part, we'll focus only on yolov3.cfg; yolov3.weights will be discussed in the next part. What we're going to do now is parse the parameters from yolov3.cfg, read them all, and construct the YOLOv3 network based on them.
Grab your hot drink and let's get into it…
Preparation
Creating a Project Directory and Files
The first thing we need to do is create a project directory. I personally name mine PROJECTS because I keep several projects under it. Feel free to use another name, but I suggest you do the same as I did so that you can follow this tutorial easily.
Under PROJECTS, create a directory named YOLOv3_TF2. This is the directory where we'll be working. Now, under the YOLOv3_TF2 directory, create 4 subdirectories: img, cfg, data, and weights.
Still under YOLOv3_TF2, create 5 Python files: yolov3.py, convert_weights.py, utils.py, image.py, and video.py.
Specifically, in this part we'll only work on the file yolov3.py and leave the others empty for the moment.
Downloading the files yolov3.cfg, yolov3.weights, and coco.names
Here are the links to download the files yolov3.cfg, yolov3.weights, and coco.names:
Save the files yolov3.cfg, yolov3.weights, and coco.names to the subdirectories cfg, weights, and data, respectively.
yolov3.py
Importing the necessary packages
Open yolov3.py and import TensorFlow and the Keras Model, along with the Keras layers we'll use when building the YOLOv3 network: BatchNormalization, Conv2D, Input, ZeroPadding2D, LeakyReLU, and UpSampling2D. Copy the following lines to the top of yolov3.py.
#yolov3.py
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import BatchNormalization, Conv2D, \
    Input, ZeroPadding2D, LeakyReLU, UpSampling2D
Parsing the configuration file
The code below is a function called parse_cfg() with a single parameter, cfgfile, used to parse the YOLOv3 configuration file yolov3.cfg.
def parse_cfg(cfgfile):
    with open(cfgfile, 'r') as file:
        lines = [line.rstrip('\n') for line in file
                 if line != '\n' and line[0] != '#']
    holder = {}
    blocks = []
    for line in lines:
        if line[0] == '[':
            line = 'type=' + line[1:-1].rstrip()
            if len(holder) != 0:
                blocks.append(holder)
                holder = {}
        key, value = line.split("=")
        holder[key.rstrip()] = value.lstrip()
    blocks.append(holder)
    return blocks
Let’s explain this code.
First, we open the cfgfile and read it, stripping each line's trailing '\n' and skipping blank lines and comments (lines starting with '#'). The variable lines now holds all the meaningful lines of yolov3.cfg, so we loop over it to read every single line.
Next, we loop over lines, read every attribute, and store them all in the list blocks. This happens block by block: a block's attributes and their values are first collected as key-value pairs in the dictionary holder. Whenever a new block header (a line starting with '[') appears, the filled holder is appended to blocks and then emptied, ready for the next block. The loop continues until all blocks are read, and the function returns blocks.
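To get a feel for what parse_cfg() returns, here is a minimal sketch (the path is an assumption matching our directory layout; the printed block comes from the standard yolov3.cfg):

from pprint import pprint

blocks = parse_cfg('cfg/yolov3.cfg')
pprint(blocks[1])   # the first [convolutional] block
# {'activation': 'leaky', 'batch_normalize': '1', 'filters': '32',
#  'pad': '1', 'size': '3', 'stride': '1', 'type': 'convolutional'}

Notice that every value is still a string; that's why the network function below casts them with int().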
All right! We just finished a small piece of code. The next step is to create the YOLOv3 network function. Let's do it…
Building the YOLOv3 Network
We're still working in the file yolov3.py. The following is the beginning of the YOLOv3 network function, called YOLOv3Net. It takes three parameters: cfgfile, model_size, and num_classes. Just copy and paste the following lines below the previous function parse_cfg().
def YOLOv3Net(cfgfile, model_size, num_classes):

    blocks = parse_cfg(cfgfile)

    outputs = {}
    output_filters = []
    filters = []
    out_pred = []
    scale = 0

    inputs = input_image = Input(shape=model_size)
    inputs = inputs / 255.0
Let’s look at it…
First, we call the function parse_cfg() and store the returned attributes in a variable blocks. The variable blocks now contains everything read from the file yolov3.cfg.
Then, we define the model input using the Keras Input layer and divide it by 255.0 to normalize the pixel values to the range 0–1.
Next…
In general, YOLOv3 uses 5 layer types: "convolutional", "upsample", "route", "shortcut", and "yolo".
The following code iterates over the list blocks. For every iteration, we check the type of the block, which corresponds to the type of layer. Note that we skip blocks[0], the [net] block, since it holds network hyperparameters rather than an actual layer.
for i, block in enumerate(blocks[1:]):
Convolutional Layer
In YOLOv3, there are 2 types of convolutional layers, i.e. with and without a batch normalization layer. A convolutional layer followed by batch normalization uses a leaky ReLU activation; otherwise, it uses a linear activation. So, we must handle both cases at every iteration.
This is the code that handles the convolutional layer.
# If it is a convolutional layer
if (block["type"] == "convolutional"):
    activation = block["activation"]
    filters = int(block["filters"])
    kernel_size = int(block["size"])
    strides = int(block["stride"])

    if strides > 1:
        inputs = ZeroPadding2D(((1, 0), (1, 0)))(inputs)

    inputs = Conv2D(filters,
                    kernel_size,
                    strides=strides,
                    padding='valid' if strides > 1 else 'same',
                    name='conv_' + str(i),
                    use_bias=False if ("batch_normalize" in block) else True)(inputs)

    if "batch_normalize" in block:
        inputs = BatchNormalization(name='bnorm_' + str(i))(inputs)
        inputs = LeakyReLU(alpha=0.1, name='leaky_' + str(i))(inputs)
We first check whether the block's type is convolutional; if it is, we read the attributes associated with it, otherwise we go check for another type (explained after this). In a convolutional block, you'll find the following attributes: batch_normalize, activation, filters, pad, size, and stride. For more details on the attributes of convolutional blocks, open the file yolov3.cfg.
Next, we verify whether the stride is greater than 1. If it is, the layer downsamples its input, so we need to adjust the padding: we pad only the top and left with ZeroPadding2D(((1, 0), (1, 0))) and use 'valid' padding in the convolution, which matches how Darknet pads when downsampling.
Finally, if we find batch_normalize in a block, we add BatchNormalization and LeakyReLU layers; otherwise, we do nothing and the convolution's output stays linear.
Upsample Layer
Now, we're going to continue the if..else chain above by checking for the upsample layer. The upsample layer enlarges the previous feature map by a factor of stride; we implement it with Keras's UpSampling2D layer, which uses nearest-neighbor interpolation by default.
So, if we find an upsample block, we retrieve the stride value and add an UpSampling2D layer with that stride. The following is the code for that.
elif (block["type"] == "upsample"):
    stride = int(block["stride"])
    inputs = UpSampling2D(stride)(inputs)
Route Layer
The route block contains an attribute layers, which holds one or two values. For more details, look at the file yolov3.cfg around lines 619-634. There, you will find the following lines.
[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 61
I'll explain a little bit about these lines of yolov3.cfg.
In line 620 above, the attribute layers holds a value of -4, which means that from this route block we go backward 4 layers and output the feature map from that layer. For a route block whose layers attribute has 2 values, as in lines 633-634 where it contains -1 and 61, we instead concatenate the feature map from the previous layer (-1) with the feature map from layer 61. So, the following is the code for the route layer.
# If it is a route layer
elif (block["type"] == "route"):
    block["layers"] = block["layers"].split(',')
    start = int(block["layers"][0])

    if len(block["layers"]) > 1:
        end = int(block["layers"][1]) - i
        # negative index: output_filters[end] == output_filters[i + end]
        filters = output_filters[i + start] + output_filters[end]
        inputs = tf.concat([outputs[i + start], outputs[i + end]], axis=-1)
    else:
        filters = output_filters[i + start]
        inputs = outputs[i + start]
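The indexing here deserves a closer look. As a sketch, assume the second route block shown above sits at iteration i = 86 (its position in the standard yolov3.cfg) with layers = -1, 61:

i = 86
start = -1      # outputs[i + start] -> outputs[85], the previous layer
end = 61 - i    # = -25
# output_filters holds exactly i == 86 entries at this point, so the
# negative index wraps around: output_filters[end] == output_filters[61],
# the same layer addressed by outputs[i + end].
print(i + start, i + end)   # 85 61

So both branches resolve to absolute layer indices, even though the cfg file mixes relative and absolute values.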
Shortcut Layer
In this layer, we perform a skip connection. If we look at the file yolov3.cfg, this block contains an attribute from, as shown below.
[shortcut]
from=-3
activation=linear
What we do in this block is go backward 3 layers (-3), as the from value indicates, take the feature map from that layer, and add it to the feature map from the previous layer. For example, if this shortcut block sat at iteration i = 10, we would add outputs[7] to outputs[9]. Here is the code for that.
elif block["type"] == "shortcut":
    from_ = int(block["from"])
    inputs = outputs[i - 1] + outputs[i + from_]
Yolo Layer
Here, we perform the detection and do some refining of the bounding boxes. If you have any difficulty understanding this part, just check out my previous post (part 1 of this tutorial).
As with the other layers, we first check whether we're in a yolo layer.
# Yolo detection layer
elif block["type"] == "yolo":
If it is, we take all the necessary attributes associated with it. In this case, we just need the mask and anchors attributes.
mask = block["mask"].split(",")
mask = [int(x) for x in mask]
anchors = block["anchors"].split(",")
anchors = [int(a) for a in anchors]
anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
anchors = [anchors[i] for i in mask]
n_anchors = len(anchors)
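For instance, the first [yolo] block in the standard yolov3.cfg has mask = 6,7,8 and nine anchor pairs, so the parsing above boils down to:

mask = [6, 7, 8]
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
print([anchors[i] for i in mask])   # [(116, 90), (156, 198), (373, 326)]

That is, the three largest anchors are assigned to the coarsest (13×13) grid, where the big objects are detected.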
Then we need to reshape the YOLOv3 output to the form [None, B * grid_size * grid_size, 5 + C], where B is the number of anchors and C is the number of classes.
out_shape = inputs.get_shape().as_list()

inputs = tf.reshape(inputs, [-1, n_anchors * out_shape[1] * out_shape[2],
                             5 + num_classes])
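As a quick sanity check, assuming a 416×416 input at the coarsest detection scale (a 13×13 grid with 3 anchors and the 80 COCO classes):

n_anchors, grid, num_classes = 3, 13, 80
print([-1, n_anchors * grid * grid, 5 + num_classes])   # [-1, 507, 85]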
Then access all the box attributes this way:
box_centers = inputs[:, :, 0:2]
box_shapes = inputs[:, :, 2:4]
confidence = inputs[:, :, 4:5]
classes = inputs[:, :, 5:num_classes + 5]
Refine Bounding Boxes
As I mentioned in part 1, after the YOLOv3 network outputs the bounding box predictions, we need to refine them in order to have the right positions and shapes.
Use the sigmoid function to convert the box_centers, confidence, and classes values into the range 0–1.
box_centers = tf.sigmoid(box_centers)
confidence = tf.sigmoid(confidence)
classes = tf.sigmoid(classes)
Then convert box_shapes as follows:
anchors = tf.tile(anchors, [out_shape[1] * out_shape[2], 1])
box_shapes = tf.exp(box_shapes) * tf.cast(anchors, dtype=tf.float32)
Use a meshgrid to convert the relative positions of the box centers into their real positions.
x = tf.range(out_shape[1], dtype=tf.float32)
y = tf.range(out_shape[2], dtype=tf.float32)

cx, cy = tf.meshgrid(x, y)
cx = tf.reshape(cx, (-1, 1))
cy = tf.reshape(cy, (-1, 1))
cxy = tf.concat([cx, cy], axis=-1)
cxy = tf.tile(cxy, [1, n_anchors])
cxy = tf.reshape(cxy, [1, -1, 2])

strides = (input_image.shape[1] // out_shape[1],
           input_image.shape[2] // out_shape[2])
box_centers = (box_centers + cxy) * strides
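The stride here is simply the ratio between the input resolution and the grid size, which converts grid offsets back to pixel coordinates. A quick check, assuming model_size = (416, 416, 3):

for grid in (13, 26, 52):     # the three YOLOv3 detection scales
    print(grid, 416 // grid)  # strides of 32, 16, and 8 pixels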
Then, concatenate them all together.
prediction = tf.concat([box_centers, box_shapes, confidence, classes], axis=-1)
Big note: just to remind you, YOLOv3 makes predictions at 3 different scales, and we do the same here. Take the prediction result from each scale and concatenate it with the others.
if scale:
    out_pred = tf.concat([out_pred, prediction], axis=1)
else:
    out_pred = prediction
    scale = 1
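As a sanity check, assuming a 416×416 input and 80 classes, the three scales together predict:

print(13 * 13 * 3 + 26 * 26 * 3 + 52 * 52 * 3)   # 507 + 2028 + 8112 = 10647

so the final out_pred has the shape [None, 10647, 85].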
Since the route and shortcut layers need the output feature maps of previous layers, for every iteration we always keep track of the feature maps and the output filters.
outputs[i] = inputs
output_filters.append(filters)
Finally, we can return our model.
model = Model(input_image, out_pred)
model.summary()
return model
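A minimal usage sketch (the path and sizes are assumptions matching this tutorial's directory layout; loading the pre-trained weights is covered in part 3):

model_size = (416, 416, 3)   # input width, height, and channels
num_classes = 80             # the COCO classes listed in data/coco.names

model = YOLOv3Net('cfg/yolov3.cfg', model_size, num_classes)
# The model is built with random weights at this point; YOLOv3Net() has
# already printed model.summary() for inspection.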
The Complete Code of yolov3.py
#yolov3.py
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import BatchNormalization, Conv2D, \
    Input, ZeroPadding2D, LeakyReLU, UpSampling2D


def parse_cfg(cfgfile):
    with open(cfgfile, 'r') as file:
        lines = [line.rstrip('\n') for line in file
                 if line != '\n' and line[0] != '#']
    holder = {}
    blocks = []
    for line in lines:
        if line[0] == '[':
            line = 'type=' + line[1:-1].rstrip()
            if len(holder) != 0:
                blocks.append(holder)
                holder = {}
        key, value = line.split("=")
        holder[key.rstrip()] = value.lstrip()
    blocks.append(holder)
    return blocks


def YOLOv3Net(cfgfile, model_size, num_classes):

    blocks = parse_cfg(cfgfile)

    outputs = {}
    output_filters = []
    filters = []
    out_pred = []
    scale = 0

    inputs = input_image = Input(shape=model_size)
    inputs = inputs / 255.0

    for i, block in enumerate(blocks[1:]):
        # If it is a convolutional layer
        if (block["type"] == "convolutional"):
            activation = block["activation"]
            filters = int(block["filters"])
            kernel_size = int(block["size"])
            strides = int(block["stride"])

            if strides > 1:
                inputs = ZeroPadding2D(((1, 0), (1, 0)))(inputs)

            inputs = Conv2D(filters,
                            kernel_size,
                            strides=strides,
                            padding='valid' if strides > 1 else 'same',
                            name='conv_' + str(i),
                            use_bias=False if ("batch_normalize" in block) else True)(inputs)

            if "batch_normalize" in block:
                inputs = BatchNormalization(name='bnorm_' + str(i))(inputs)
                inputs = LeakyReLU(alpha=0.1, name='leaky_' + str(i))(inputs)

        elif (block["type"] == "upsample"):
            stride = int(block["stride"])
            inputs = UpSampling2D(stride)(inputs)

        # If it is a route layer
        elif (block["type"] == "route"):
            block["layers"] = block["layers"].split(',')
            start = int(block["layers"][0])

            if len(block["layers"]) > 1:
                end = int(block["layers"][1]) - i
                # negative index: output_filters[end] == output_filters[i + end]
                filters = output_filters[i + start] + output_filters[end]
                inputs = tf.concat([outputs[i + start], outputs[i + end]], axis=-1)
            else:
                filters = output_filters[i + start]
                inputs = outputs[i + start]

        elif block["type"] == "shortcut":
            from_ = int(block["from"])
            inputs = outputs[i - 1] + outputs[i + from_]

        # Yolo detection layer
        elif block["type"] == "yolo":
            mask = block["mask"].split(",")
            mask = [int(x) for x in mask]
            anchors = block["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in mask]
            n_anchors = len(anchors)

            out_shape = inputs.get_shape().as_list()

            inputs = tf.reshape(inputs, [-1, n_anchors * out_shape[1] * out_shape[2],
                                         5 + num_classes])

            box_centers = inputs[:, :, 0:2]
            box_shapes = inputs[:, :, 2:4]
            confidence = inputs[:, :, 4:5]
            classes = inputs[:, :, 5:num_classes + 5]

            box_centers = tf.sigmoid(box_centers)
            confidence = tf.sigmoid(confidence)
            classes = tf.sigmoid(classes)

            anchors = tf.tile(anchors, [out_shape[1] * out_shape[2], 1])
            box_shapes = tf.exp(box_shapes) * tf.cast(anchors, dtype=tf.float32)

            x = tf.range(out_shape[1], dtype=tf.float32)
            y = tf.range(out_shape[2], dtype=tf.float32)

            cx, cy = tf.meshgrid(x, y)
            cx = tf.reshape(cx, (-1, 1))
            cy = tf.reshape(cy, (-1, 1))
            cxy = tf.concat([cx, cy], axis=-1)
            cxy = tf.tile(cxy, [1, n_anchors])
            cxy = tf.reshape(cxy, [1, -1, 2])

            strides = (input_image.shape[1] // out_shape[1],
                       input_image.shape[2] // out_shape[2])
            box_centers = (box_centers + cxy) * strides

            prediction = tf.concat([box_centers, box_shapes, confidence, classes], axis=-1)

            if scale:
                out_pred = tf.concat([out_pred, prediction], axis=1)
            else:
                out_pred = prediction
                scale = 1

        outputs[i] = inputs
        output_filters.append(filters)

    model = Model(input_image, out_pred)
    model.summary()
    return model
That’s it for part 2 and see you in part 3.
I also have another tutorial that I highly recommend: it provides detailed instructions on how to load and visualize the COCO dataset using custom code.
Parts:
Part-2, Parsing the YOLOv3 configuration file and creating the YOLOv3 network.
Part-3, Converting the YOLOv3 pre-trained weights into the TensorFlow 2.0 weights format.
Part-4, Encoding bounding boxes and testing this implementation with images and videos.
Credit link:
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/