Cognitive Object Detection System by Fisheye Image Processing

Name: Cognitive Object Detection System by Fisheye Image Processing
Author: Chen Zhen

A wide-angle fisheye lens introduces strong distortion on the image plane. Unlike perspective projection, fisheye projection changes the apparent shape of objects relative to how humans see them. Recognizing distorted objects is challenging for both humans and automated systems. This thesis studies object classification and localization algorithms for fisheye images and builds an efficient object detection architecture for large-scale fisheye images.
First, a synthetic fisheye image dataset is built using the equidistant projection, which reduces the time spent collecting and labelling fisheye images. A fisheye image classification model is then trained on this synthetic dataset and evaluated on both synthetic and real-world images. The evaluation results indicate that the trained model is usable in real-world applications. A comparison of deconvolution features between the perspective model and the fisheye model shows that the DCNN is clearly able to learn deformed features from fisheye images. The resulting dataset is the first large-scale synthetic fisheye image dataset for NN training and is openly accessible to the research community.
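As a minimal sketch of how a perspective image could be resampled into an equidistant fisheye image, the function below maps a fisheye pixel back to the perspective pixel it would be sampled from. It relies only on the standard models r = f * theta (equidistant) and r = f * tan(theta) (perspective). The function name, the shared focal length f, and the principal point (cx, cy) are assumptions for illustration and are not taken from the thesis.

```python
import math

def fisheye_to_perspective(u, v, cx, cy, f):
    """Map pixel (u, v) of an equidistant fisheye image to the matching
    point in a perspective image.

    Assumption: both images share the focal length f (in pixels) and the
    principal point (cx, cy). Returns None for rays at 90 degrees or more
    from the optical axis, which a perspective image cannot represent.
    """
    dx, dy = u - cx, v - cy
    r_fisheye = math.hypot(dx, dy)
    if r_fisheye == 0.0:
        return (cx, cy)               # the image centre maps to itself
    theta = r_fisheye / f             # equidistant model: r = f * theta
    if theta >= math.pi / 2:
        return None
    r_persp = f * math.tan(theta)     # perspective model: r = f * tan(theta)
    scale = r_persp / r_fisheye
    return (cx + dx * scale, cy + dy * scale)
```

Calling this for every pixel of the output image and sampling the perspective source at the returned coordinates would give one synthetic fisheye image. Bounding-box labels would need the inverse mapping, which is not shown here.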
Second, a feature-based architecture and a knowledge-based architecture are evaluated separately on fisheye image classification. The feature-based architecture combines the hand-crafted feature extractor sRD-SIFT, BoVW, and an SVM classifier. ResNet-50 represents the knowledge-based architecture. The two classification models are trained separately on the developed synthetic fisheye images and compared on training speed and classification accuracy. Experimental results indicate that ResNet has a significant advantage over the SVM model on both metrics.
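The BoVW step of a feature-based pipeline can be illustrated with a small sketch: each local descriptor is assigned to its nearest visual word, and the image is represented by the normalized word histogram that a classifier such as an SVM would take as input. The function name and the toy 2-D descriptors are assumptions for illustration. In the thesis the descriptors come from sRD-SIFT, and the way the vocabulary is learned is not described here.

```python
def bovw_histogram(descriptors, vocabulary):
    """Return the normalized bag-of-visual-words histogram of one image.

    descriptors: list of feature vectors extracted from the image.
    vocabulary:  list of visual words (cluster centres) of the same length.
    Both are hypothetical inputs for this sketch.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the visual word with the smallest squared Euclidean distance.
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, vocabulary[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)    # image without any descriptor
    return [c / total for c in counts]
```

For example, with the vocabulary `[(0, 0), (10, 10)]`, two descriptors near each word give the histogram `[0.5, 0.5]`.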
Finally, a rotation-sensitive detector is developed for object detection in fisheye images. As its fundamental structure, a rotated bounding box, rather than a horizontal one, is proposed to describe the boundary of an object. RIoU is used as the matching metric between the ground truth and the prior. The overall framework is trained end to end and inserts the distortion detection structure into the single-shot detection architecture YOLOv3. The proposed detection architecture is evaluated on several metrics.
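Assuming RIoU denotes the intersection over union of two rotated boxes, it could be computed as in the sketch below, which clips one box against the other with the Sutherland-Hodgman algorithm. The box parameterization (cx, cy, w, h, angle in radians) and all function names are assumptions for illustration. The thesis may define and implement the metric differently.

```python
import math

def box_corners(cx, cy, w, h, angle):
    """Corners of a rotated box, in counter-clockwise order."""
    c, s = math.cos(angle), math.sin(angle)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

def polygon_area(pts):
    """Shoelace formula; returns 0.0 for an empty polygon."""
    edges = zip(pts, pts[1:] + pts[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges))

def clip(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by a convex, counter-clockwise `clipper`."""
    output = subject
    for (ax, ay), (bx, by) in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        current, output = output, []

        def side(p):
            # Positive when p lies to the left of the directed edge a -> b (inside).
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for p, q in zip(current, current[1:] + current[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:           # segment p -> q crosses the clip edge
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def rotated_iou(box_a, box_b):
    """IoU of two boxes given as (cx, cy, w, h, angle_in_radians)."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    inter = polygon_area(clip(pa, pb))
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union if union > 0 else 0.0
```

In a matching step, a prior would be assigned to a ground-truth box when this value exceeds a chosen threshold. The threshold itself is not stated in this abstract.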