REACT: A Novel AI Approach that Leverages Both Edge and Cloud Resources to Improve Live Video Analytics Applications


The Internet is transitioning toward an edge-computing architecture to accommodate latency-sensitive DNN workloads in the growing Internet of Things and mobile computing application domains. Unfortunately, large, high-accuracy DNN models cannot run on edge devices, which lack the computing capability of cloud environments. Previous efforts have therefore offloaded some of the computation to the cloud to circumvent this restriction, but doing so adds latency.

New Microsoft research proposes REACT, an architecture that uses the edge and the cloud in tandem to perform redundant computations. To enhance detection quality without compromising latency, it fuses cloud results, received asynchronously, into the stream of computation at the edge. This allows applications to leverage the cloud's accuracy without sacrificing the edge's low latency.

The team takes a two-pronged approach to the twin problems of limited edge computing capacity and the accuracy loss of edge models.

  • First, because successive video frames are spatiotemporally correlated, the edge object detector needs to run only once every few frames; in their setup, edge detection occurs every fifth frame. A comparatively lightweight object-tracking operation bridges the gap on the intermediate frames.
  • Second, select frames are sent to the cloud asynchronously to boost inference accuracy. Depending on network latency and cloud resource availability, the corresponding cloud detections reach the edge device only several frames later.
  • Finally, the most recent cloud detections not yet incorporated are fused into the current frame. To "fast forward" them to the current time, the cloud detections produced on an older frame are fed through a second instance of the object tracker. As long as the scene has not changed drastically, the newly identified objects integrate cleanly into the current frame.
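The three steps above can be sketched as a simple processing loop. Everything here is an illustrative stub (the detector, tracker, and function names are assumptions, not the authors' implementation): the edge detector fires every fifth frame, a tracker covers the frames in between, and late-arriving cloud detections are fast-forwarded through the tracker and merged in.

```python
# Hypothetical sketch of REACT's edge loop. The detector/tracker stubs and
# all names are illustrative assumptions, not the paper's actual code.

DETECT_EVERY = 5  # edge detector invoked once per five frames, as in the text

def edge_detect(frame):
    """Stub for a lightweight edge object detector."""
    return [{"label": "car", "box": (10, 10, 50, 50)}]

def track(objects, frame):
    """Stub tracker: propagates boxes to the next frame (identity here)."""
    return objects

def fast_forward(cloud_dets, from_idx, to_idx, frames):
    """Replay tracking from the frame the cloud result belongs to up to now."""
    objs = cloud_dets
    for i in range(from_idx + 1, to_idx + 1):
        objs = track(objs, frames[i])
    return objs

def merge(edge_objs, cloud_objs):
    """Union of edge and fast-forwarded cloud detections, deduplicated."""
    seen = {(o["label"], o["box"]) for o in edge_objs}
    return edge_objs + [o for o in cloud_objs
                        if (o["label"], o["box"]) not in seen]

def process(frames, cloud_results):
    """cloud_results: {arrival_idx: {"frame": origin_idx, "dets": [...]}}"""
    current, outputs = [], []
    for idx, frame in enumerate(frames):
        if idx % DETECT_EVERY == 0:
            current = edge_detect(frame)       # fresh edge detections
        else:
            current = track(current, frame)    # cheap tracking in between
        if idx in cloud_results:               # a delayed cloud result arrived
            ff = fast_forward(cloud_results[idx]["dets"],
                              cloud_results[idx]["frame"], idx, frames)
            current = merge(current, ff)
        outputs.append(current)
    return outputs
```

In this sketch, a cloud detection computed for frame 3 but arriving at frame 7 is tracked forward four frames before being merged, so the fused result still reflects the current scene.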

The team applied this method to a dataset of dashcam videos. Their experiments used state-of-the-art computer vision models to obtain the local and remote object detections. To evaluate detection quality, they employ the widely used computer vision metric mAP@0.5 (mean average precision at 0.5 IoU). They evaluated REACT's effectiveness on two datasets:

  1. VisDrone, a drone-based surveillance dataset
  2. D2-City, a dashcam-based driving dataset
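The mAP@0.5 metric hinges on the intersection-over-union (IoU) test: a predicted box counts as a true positive only if it overlaps a ground-truth box with IoU of at least 0.5. A minimal illustration of that threshold check (not the full mAP computation):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2). Returns a value in [0, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def is_true_positive(pred, gt, thresh=0.5):
    """A detection matches ground truth only when IoU meets the 0.5 cutoff."""
    return iou(pred, gt) >= thresh
```

For example, two boxes sharing exactly half their width overlap with IoU 1/3, so the prediction would not count toward mAP@0.5.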

Their results show that REACT can improve detection quality by up to 50% over baseline methods. They also demonstrate that the edge and cloud models complement one another, and that the proposed edge-cloud fusion approach boosts overall performance.

Because the object detector runs only once every few frames, with lightweight object tracking covering the intermediate frames, and because detection is duplicated between the edge and the cloud, developers gain flexibility in choosing how frequently to run detection on each platform while maintaining the same detection accuracy.

The researchers also highlight that having several edge devices share the same cloud-hosted model can amortize the cost of cloud resources over a larger population. In particular, a single V100 GPU can support more than 60 concurrent devices, assuming the application can tolerate a median latency of up to 500 ms.
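A back-of-envelope sketch of that sharing argument (the per-inference time below is an assumed illustrative figure, not from the paper): a GPU can serve roughly one device per inference-time slice of each device's send interval before requests start to queue.

```python
# Illustrative capacity estimate. SEND_INTERVAL and GPU_TIME are assumptions
# chosen for illustration; only the ">60 devices" figure comes from the article.

SEND_INTERVAL = 1.0   # assumed: each edge device offloads ~1 frame per second
GPU_TIME = 0.015      # assumed: ~15 ms per inference on a V100-class GPU

def max_devices(send_interval, gpu_time):
    """Devices one GPU sustains before its utilization exceeds 100%."""
    return int(send_interval / gpu_time)

print(max_devices(SEND_INTERVAL, GPU_TIME))
```

Under these assumed numbers the estimate lands in the same ballpark as the paper's figure of more than 60 concurrent devices, with the 500 ms median latency budget absorbing queueing and network delay.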

While this work has focused on object detection, the team believes the approach can be applied in other settings, including human pose estimation and instance and semantic segmentation, for the "best of both worlds."



