GSoC'22@CROSS

EfficientDet: Balancing Efficiency with Accuracy

Why we chose the EfficientDet family of models and which compound coefficient ϕ you should choose

Our object detector should be as accurate as possible while still running in real time. Traditionally, these two goals trade off against each other: gains in accuracy cost speed, and vice versa. The EfficientDet family aims to build a scalable detection architecture with high accuracy and a reasonable computational footprint across a wide spectrum of resource constraints, ranging from 3B to 300B FLOPs.

The EfficientDet paper makes three major contributions:
BiFPN: A weighted bidirectional feature network for easy and fast multi-scale feature fusion.
Compound scaling: A new method that jointly scales up the backbone, feature network, box/class network, and resolution.
EfficientDet: A new family of detectors with significantly better accuracy and efficiency across a wide spectrum of resource constraints.
The model addresses the problem of efficient multi-scale feature fusion. Feature Pyramid Networks (FPNs) are the usual tool for this, but they sum their input features without distinguishing between them. Because input features at different resolutions do not contribute equally to the output feature, EfficientDet proposes a new, weighted strategy for multi-scale feature fusion.
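The paper describes three ways to weight the inputs during fusion: unbounded fusion, softmax-based fusion, and fast normalized fusion. The sketch below illustrates them in NumPy under simplifying assumptions: one scalar weight per input feature map, inputs already resized to a common shape, and fixed weights instead of learned ones. The function names are illustrative and are not taken from any EfficientDet codebase.

```python
import numpy as np

def unbounded_fusion(features, weights):
    # Plain weighted sum; the weights are not constrained in any way.
    return sum(w * f for w, f in zip(weights, features))

def softmax_fusion(features, weights):
    # Softmax turns the raw weights into positive values that sum to 1.
    w = np.asarray(weights, dtype=np.float64)
    w = np.exp(w - w.max())  # subtract the max for numerical stability
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))

def fast_normalized_fusion(features, weights, eps=1e-4):
    # ReLU keeps each weight non-negative, then the weights are divided
    # by their sum (plus a small eps) instead of going through a softmax.
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))

# Two toy "feature maps" of the same shape (an assumption of this sketch).
p_low = np.ones((2, 2))
p_high = 3.0 * np.ones((2, 2))
fused = fast_normalized_fusion([p_low, p_high], weights=[1.0, 1.0])
```

With equal weights, the softmax and fast normalized variants both land close to the plain average of the inputs; the fast variant avoids the exponentials, which is the motivation the paper gives for it.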
Additionally, model scaling is a well-known strategy for improving accuracy in object detection models, typically by increasing the size of the backbone. Similar to compound scaling in EfficientNet, EfficientDet proposes a compound scaling coefficient that jointly scales up the resolution, depth, and width of the backbone, feature network, and box/class prediction network.
Detecting objects at multiple scales with image pyramids is computationally expensive. CNNs, on the other hand, form an inherent hierarchical pyramid structure, but their high-resolution maps carry only low-level semantic features and so lack representational capacity. FPNs overcome this by combining a bottom-up pathway with a top-down pathway: high-level features are upsampled first and then merged with low-level features through lateral connections. The BiFPN architecture additionally learns weights while fusing feature maps of different scales, using one of three schemes: unbounded fusion, softmax-based fusion, or fast normalized fusion.
The backbone networks are ImageNet-pretrained EfficientNets. The compound scaling method uses a single compound coefficient ϕ to jointly scale up all dimensions of the backbone network, BiFPN network, class/box network, and input resolution.
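To get a feel for what a given ϕ implies, the sketch below evaluates the scaling rules as they are stated in the EfficientDet paper: BiFPN width 64 · 1.35^ϕ, BiFPN depth 3 + ϕ, box/class head depth 3 + ⌊ϕ/3⌋, and input resolution 512 + 128 · ϕ. The function name and dictionary keys are illustrative. The configuration table in the paper rounds the channel counts, so the raw values here will not match the released models exactly.

```python
def efficientdet_scaling(phi: int) -> dict:
    """Evaluate the raw compound-scaling formulas for a coefficient phi."""
    return {
        "bifpn_width": 64 * (1.35 ** phi),   # channels, before any rounding
        "bifpn_depth": 3 + phi,              # number of BiFPN layers
        "head_depth": 3 + phi // 3,          # layers in the box/class nets
        "input_resolution": 512 + 128 * phi, # square input side in pixels
    }

# Print the raw values for a few coefficients to compare model sizes.
for phi in range(4):
    print(phi, efficientdet_scaling(phi))
```

Each step in ϕ adds 128 pixels of input resolution and one BiFPN layer, so the cost grows quickly; for a real-time detector, the smaller coefficients are the natural starting point.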

Click here to view an excellent blog post that covers the paper in more detail, and click here to view the paper itself.
