Tutorial

Dog in the garden

This tutorial shows a step-by-step definition of the Aliquis pipeline “Dog in the garden”.

A background subtraction algorithm is used to detect a moving object in a complex scene (like a garden) and, after some processing, to recognize it (a dog, in this case) with a Convolutional Neural Network.

It’s a good starting point to become familiar with the main concepts of Aliquis!

The complete pipeline can be found here. The video and the image used in the tutorial can be downloaded here. The neural network ResNet152 is available here.

Define a pipeline and write an input stage

First of all, we need to define a pipeline. In a text editor, we can specify the pipeline’s name, an identifier of the pipeline itself, by writing:

name: "dog_in_garden"

We have just defined an empty pipeline named “dog_in_garden”.

This pipeline can’t do anything yet: it has no working stages. To work, an Aliquis pipeline needs at least one source stage able to load input data.

Let’s define a video source stage:

stages {
  name: "src"
  type: SOURCE_VIDEO
  source_param {
    path: "video/dog_in_garden.mp4"
  }
}

This stage, called “src”, simply loads a video from the given path.

More generally, a stage definition has the following syntax:

stages {
  name: "stage_name"
  type: STAGE_TYPE
  input: "input_stage"
  stage_type_param {
    ...
  }
}

Each stage has an identifier (name), a type that defines which operation is performed, and an input, the identifier of the input stage. Source stages are the only ones that don’t need an input. Finally, a list of arguments (StageParameter) can be specified for each stage to control its behavior.

Our stage needs only one parameter to load the video: path.

Save the pipeline with the .apl extension, for example dog_in_garden.apl.
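
At this point the complete dog_in_garden.apl file contains just the pipeline name and the source stage:

name: "dog_in_garden"

stages {
  name: "src"
  type: SOURCE_VIDEO
  source_param {
    path: "video/dog_in_garden.mp4"
  }
}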

Now we have a working pipeline: it loads a video from a path and outputs a stream of patches, one for each frame of the video.

We can visualize it using the aliquispl_run host. Running the following command plays back the video:

aliquispl_run -c1 dog_in_garden.apl

The -c argument enables a continuous stream of the patches, with a desired delay (in milliseconds) between them. If you don’t use it, you can step through the patches using the keyboard arrows.
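
For example, assuming the video was recorded at 25 frames per second, a delay of 40 milliseconds plays it back at roughly its natural speed:

aliquispl_run -c40 dog_in_garden.apl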

Press ESC to stop the host.

Change colorspace

We want to work with gray images in the first section of this pipeline. The COLORSPACE stage switches the color domain of the input patches. A gray converter stage can be written as follows:

stages {
  name: "src_gray"
  type: COLORSPACE
  input: "src"
  colorspace_param {
    color_format: GRAY
  }
}

This stage, named src_gray, takes as input the output patches of the previously defined stage, src. We only need to specify the desired color format in the stage parameters: color_format: GRAY.

Now src_gray is the output stage of the dog_in_garden pipeline. Running aliquispl_run as before, we can visualize a gray version of the video (or, more precisely, a continuous stream of the pipeline’s output patches).

Load an image

To perform a simple background subtraction, we need a static image representing the background.

A very quick approach that seems to work well with our video is to extract a frame from the initial sequence. Actually, we don’t have a truly static background: the wind moves the leaves and some shadows appear… but we will fix this later in the pipeline.

Let’s load the background with another source stage:

stages {
  name: "bg"
  type: SOURCE_IMAGE
  source_param {
    path: "path/to/image"
    loop: true
  }
}

This stage looks very similar to the one used to load the video, and takes the same parameters. In this case we need to enable the loop flag because we want to feed the pipeline continuously with the image.

In this case too, a gray version of the input is required in order to perform the subtraction with the source video. We can use the same stage as before, just with a different name and a different input:

stages {
  name: "bg_gray"
  type: COLORSPACE
  input: "bg"
  colorspace_param {
    color_format: GRAY
  }
}

In this case we use the source image stage, bg, as the input stage.

Now we have two output stages: src_gray and bg_gray. If we run aliquispl_run in the usual way, two streaming outputs are displayed.

Note that the stage definition order doesn’t affect the operational sequence performed by the pipeline: the multi-source, graph-based workflow is entirely defined by the input/name links. However, it is a good idea to keep a consistent layout to avoid mistakes.

Arithmetic operation: subtraction

At this point we have everything we need to perform a subtraction between the video and the static background. We can do this using the general-purpose ARITHMETIC_LOGIC stage, which performs both arithmetic and logic operations:

stages {
  name: "bg_subtract"
  type: ARITHMETIC_LOGIC
  input: "src_gray"
  input: "bg_gray"
  arithmetic_logic_param {
    op: SUBTRACT
    clipping: true
  }
}

This stage can take two inputs, if required by the operation, as in our case. The operation performed is specified in the op parameter: SUBTRACT subtracts the second input from the first one, pixel-wise. We also need the result to stay in the 0-255 range: the clipping flag takes care of this.
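
In plain notation, the rule described above computes each output pixel as:

bg_subtract(x, y) = min(255, max(0, src_gray(x, y) - bg_gray(x, y)))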

Run aliquispl_run to see the resulting output patches. Now we have just one output stage!

Threshold

We need a binary image to draw the contours of the moving object and extract a bounding box containing it. A bitmask can be obtained with a basic threshold algorithm.

In Aliquis, a threshold can be computed with a specific stage:

stages {
  name: "bg_mask"
  input: "bg_subtract"
  type: THRESHOLD
  threshold_param {
    type: BINARY
    threshold: 30
  }
}

It takes as parameters the algorithm type and the threshold value.
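
With type: BINARY, each mask pixel follows the standard binary-threshold rule, sketched here in plain notation:

bg_mask(x, y) = 255 if bg_subtract(x, y) > 30, otherwise 0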

Running aliquispl_run again, a bitmask of the moving object appears… a threshold value of 30 works well, but a lot of noise is present due to the moving leaves.

In the next step we try to remove it with some morphological operations.

Morphological operations

The first morphological operation that we can perform in order to remove noise is opening.

Aliquis comes with a generic morphology stage which includes a lot of basic operations. Let’s write the opening:

stages {
    name: "opening"
    type: MORPHOLOGY
    input: "bg_mask"
    morphology_param {
        type: OPEN
        iterations_num: 1
        kernel {
            shape_str: "7x7"
            d:0   d:0   d:255 d:255 d:255 d:0   d:0
            d:0   d:255 d:255 d:255 d:255 d:255 d:0
            d:255 d:255 d:255 d:255 d:255 d:255 d:255
            d:255 d:255 d:255 d:255 d:255 d:255 d:255
            d:255 d:255 d:255 d:255 d:255 d:255 d:255
            d:0   d:255 d:255 d:255 d:255 d:255 d:0
            d:0   d:0   d:255 d:255 d:255 d:0   d:0
        }
    }
}

In the MORPHOLOGY stage we need to specify a kernel and the number of iterations. To define the kernel, we specify a string with its dimensions and the values of its individual elements.

The stage above uses a 7x7 octagonal kernel to carry out an opening, defined by type: OPEN, with one iteration (note that the type defining the operation is not the same as the one defining the stage).
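
As a reminder, an opening is an erosion followed by a dilation with the same kernel; it deletes blobs smaller than the kernel while roughly preserving the shape of larger regions:

OPEN(image, kernel) = DILATE(ERODE(image, kernel), kernel)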

This operation has removed a lot of noise. Verify it by running aliquispl_run as usual.

To make sure we remove all the areas we are not interested in, too big to be removed by the opening, we can do an area-based check. This more advanced operation removes all the connected areas bigger than a certain threshold. Formally, it’s not a morphological operation, but in this case it’s a useful expedient:

stages {
    name: "remove_areas"
    type: REMOVE_AREAS
    input: "opening"
    remove_areas_param {
        threshold_min: 8000
    }
}

This stage takes a binary image, computes all the connected areas and removes every region that exceeds a pixel area of 8000. More generally, you can also define an area range instead of a single value, as sketched below.
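
A purely illustrative sketch of such a range, assuming a threshold_max parameter mirroring threshold_min (the actual parameter name may differ):

remove_areas_param {
    threshold_min: 8000
    threshold_max: 100000   # assumed parameter name: regions with area between the two thresholds would be removed
}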

Note that using this stage alone, instead of the opening, produces a similar result in this case… but it is a much more computationally expensive solution.

Finally, to ensure that the detected area includes most of the object of interest and has no disconnected regions, we can perform an additional dilation with the same kernel and a few more iterations.

To do this we can just copy and paste the previous stage, changing only type and iterations_num in the morphology_param:

stages {
    name: "dilate"
    type: MORPHOLOGY
    input: "remove_areas"
    morphology_param {
        type: DILATE
        iterations_num: 5
        kernel {
            shape_str: "7x7"
            d:0   d:0   d:255 d:255 d:255 d:0   d:0
            d:0   d:255 d:255 d:255 d:255 d:255 d:0
            d:255 d:255 d:255 d:255 d:255 d:255 d:255
            d:255 d:255 d:255 d:255 d:255 d:255 d:255
            d:255 d:255 d:255 d:255 d:255 d:255 d:255
            d:0   d:255 d:255 d:255 d:255 d:255 d:0
            d:0   d:0   d:255 d:255 d:255 d:0   d:0
        }
    }
}

You can now visualize the obtained mask by running the aliquispl_run host.

Segmentation

In order to feed a Convolutional Neural Network for the classification of our detected object, we need to extract, frame by frame, the portion of the original image described by the previously obtained mask.

The SEGMENTATION stage can help us with this task:

stages {
    name: "segmentation"
    type: SEGMENTATION
    input: "src"
    input: "dilate"
    segmentation_param {
        type: CONTOURS
        shape: POLYGON
        convex_hull: true
    }
}

First of all, notice that this stage has two inputs: the original source src and the output of the last stage we wrote, dilate.

Take a look at the parameters used for this stage. We want to draw the contours of the detected object, so we specify CONTOURS. Our object has a generic shape, so drawing POLYGON contours is a good choice that fits well with a generic mask. Enabling convex_hull ensures contours without concavities.

Now, to visualize the result of this stage, we must run the host with an additional argument: aliquispl_run -d -c1 dog_in_garden.apl. The -d argument decorates the patches with additional information, contours in our case.

Extract a bounding box

Convolutional Neural Networks require rectangular or square patches as input. In particular, our Residual Neural Network takes input images of shape 227x227. We need to extract this square box from the information collected in the segmentation stage.

First of all we can extract the bounding box using the CUT_OUT stage:

stages {
    name: "cut_out"
    type: CUT_OUT
    input: "segmentation"
    cut_out_param {
        type: SQUARED
        border_policy: KEEP
        margin_x_rel: 0.1
        margin_y_rel: 0.1
    }
}

We choose to extract a SQUARED box containing the found contours, taking an extra margin of 10% along x and y. The KEEP border policy allows the extraction of the bounding box even if part of it falls outside the image. Choosing the DISCARD policy instead produces an output only when the object (extra margins included) is entirely in the scene, as sketched below.
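
For comparison, here is a sketch of the same stage with the DISCARD policy; only the border_policy value changes:

stages {
    name: "cut_out"
    type: CUT_OUT
    input: "segmentation"
    cut_out_param {
        type: SQUARED
        border_policy: DISCARD
        margin_x_rel: 0.1
        margin_y_rel: 0.1
    }
}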

The pipeline now outputs patches of different dimensions. Simply run aliquispl_run -c1 dog_in_garden.apl. Until the dog enters the scene there is no output! Add -d to see the contours inside the patches.

Finally, let’s resize these patches to the desired dimensions:

stages {
  name: "cut_out_resized"
  type: RESIZE
  input: "cut_out"
  resize_param {
    width: 227
    height: 227
  }
}

The RESIZE stage does nothing more than rescale the input patches to the specified width and height. Relative dimensions can also be specified instead of absolute ones… though that is not the case here (see the sketch below).
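
As a purely hypothetical sketch, a relative resize might look as follows; the width_rel and height_rel parameter names are assumptions (echoing the margin_x_rel naming of CUT_OUT) and may differ in the actual resize_param definition:

resize_param {
  width_rel: 0.5    # assumed parameter name: half of the input width
  height_rel: 0.5   # assumed parameter name: half of the input height
}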

Now the patches can be classified by the network.

Classification

Aliquis allows using networks defined with the Caffe framework. In this case we want to use a robust net like ResNet152 to recognize the object detected in the previous stages.

All we need to do is to load the prototxt and the caffemodel in the NEURAL_NETWORK stage:

stages {
  name: "net"
  type: NEURAL_NETWORK
  input: "cut_out_resized"
  neural_network_param {
    deploy: "models/resnet152/deploy.prototxt"
    model: "models/resnet152/resnet.caffemodel"
    mean_file: "models/resnet152/resnet_mean.binaryproto"
    gpu: true
  }
}

In the deploy parameter we specify the path of the desired network architecture written in a prototxt file, and in the model parameter the associated pretrained weights. ResNet152 also requires a binaryproto file containing the mean image used to normalize the inputs. We can load it using the mean_file argument.
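
Conceptually, the binaryproto stores a per-pixel mean image computed over the training set, and Caffe-style preprocessing subtracts it from every input patch:

net_input(x, y, c) = cut_out_resized(x, y, c) - mean_image(x, y, c)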

The gpu flag, if true, runs the network on the GPU. CPU computation can be very slow for this kind of architecture.

The network now classifies the input patches and outputs the patches with additional classification information: class number and value.

Let’s show this result with: aliquispl_run -d -c1 dog_in_garden.apl.

Instead of class numbers, we can use the class names of the ImageNet database, loading them from a txt file which contains them in the correct order.

We only have to add this file using the -n argument: aliquispl_run -d -c1 -n class_names.txt dog_in_garden.apl
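
The file simply contains one class name per line, in the same order as the network’s output classes. For ImageNet-trained models, the first lines typically look like this (shown only as an illustration):

tench
goldfish
great white shark
tiger shark
hammerhead
...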

Finally, we want to show the full results on the original image. Generally the -s argument performs this task, remapping the output of the pipeline onto the source. In our case, where two source stages are present, -S stage_name shows a cleaner result, displaying only the desired source.

aliquispl_run -S src -d -c1 -n class_names.txt dog_in_garden.apl