Researchers detect and classify multiple objects without images


Researchers have developed a fast new method for detecting the location, size,
and category of multiple objects without capturing images or requiring
extensive scene reconstruction. Because it drastically reduces the processing
power usually required for object detection, the method may be helpful for
spotting potential obstacles on the road.

 

The method relies on a single-pixel detector, which allows several objects to
be detected efficiently and robustly from a limited number of 2D measurements,
explained team leader Liheng Bian of the Beijing Institute of Technology in
China. According to the team, “the issues of heavy communication load, high
computing overhead, and low perception rate of existing visual perception
systems are expected to be resolved by this type of image-free sensing
technology.”


 

Existing image-free perception methods can only perform classification,
single-object recognition, or tracking. To do all three simultaneously, the
researchers developed a method called image-free single-pixel object detection
(SPOD). As reported in Optics Letters, SPOD achieves an object detection
accuracy of just over 80%. The method builds on the research team’s prior work
on image-free sensing technology for efficient scene perception, in which they
used a single-pixel detector for image-free classification, segmentation, and
character recognition.

 

“For autonomous driving, SPOD could be
used with lidar to help improve scene reconstruction speed and object detection
accuracy,” Bian explained. As the authors put it, “We believe that it
has a high enough detection rate and accuracy for autonomous driving, while
also reducing the transmission bandwidth and computing resource requirements
needed for object detection.”

 

Detection without images

Automating advanced visual tasks, such as navigation or tracking a moving
plane, typically requires detailed images of a scene from which to extract the
features needed to identify an object. This demands either advanced imaging
technology or complicated reconstruction techniques, which leads to a
significant computational cost, long running times, and a heavy data transfer
burden. For this reason, the standard image-first, perception-second approach
may not be the most effective one for object detection. One way to reduce the
processing time needed is an image-free sensing method that relies on
single-pixel detectors. Rather than using a pixelated detector such as a CMOS
or CCD sensor, single-pixel imaging illuminates the scene with a sequence of
structured light patterns and records the transmitted light intensity to
collect the spatial information of objects. This data is then processed
computationally, either to reconstruct the object or to determine its
properties.
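
The measurement step described above can be illustrated with a short
simulation. This is a minimal sketch, not the researchers' setup: it assumes
random binary patterns (the article describes optimized patterns but does not
give them), a toy 4 x 4 scene, and hypothetical function names. Each
measurement is a single number, the total light intensity that passes through
one pattern.

```python
import random


def random_binary_pattern(height, width, rng):
    """One structured light pattern: each cell is 0 (dark) or 1 (lit)."""
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]


def single_pixel_measure(scene, pattern):
    """Total intensity reaching the single-pixel detector for one pattern.

    This is the sum over all positions of pattern value times scene value.
    """
    return sum(
        p * s
        for pattern_row, scene_row in zip(pattern, scene)
        for p, s in zip(pattern_row, scene_row)
    )


def acquire(scene, num_patterns, seed=0):
    """Illuminate the scene with num_patterns patterns, one value per pattern."""
    rng = random.Random(seed)
    height, width = len(scene), len(scene[0])
    patterns = [
        random_binary_pattern(height, width, rng) for _ in range(num_patterns)
    ]
    measurements = [single_pixel_measure(scene, p) for p in patterns]
    return patterns, measurements


# Toy scene (an assumption for illustration): a bright 2 x 2 object
# on a dark 4 x 4 background.
scene = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
patterns, measurements = acquire(scene, num_patterns=3)
```

The detector never records an image. Downstream processing sees only the
short list of measurement values, together with the known patterns.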

For SPOD, the researchers used a small yet optimized structured light pattern
to rapidly scan the entire scene and obtain 2D measurements. These measurements
are fed into a deep-learning model known as a transformer-based encoder, which
extracts the high-dimensional, informative features of the scene. A multi-scale
attention network-based decoder then receives these features and simultaneously
returns the class, location, and size of all targets in the scene.
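
The data flow described above (measurements, then encoder, then decoder, then
a list of detections) can be sketched as an interface. Everything in this
sketch is hypothetical: the `Detection` fields, the function names, and
especially the two placeholder stages, which are trivial stand-ins and not the
transformer-based encoder or the multi-scale attention decoder from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Detection:
    """One detected target: its class, location, and size."""
    label: str     # object category
    cx: float      # box centre x, as a fraction of scene width
    cy: float      # box centre y, as a fraction of scene height
    width: float   # box width, as a fraction of scene width
    height: float  # box height, as a fraction of scene height


Features = List[float]
Encoder = Callable[[Sequence[float]], Features]
Decoder = Callable[[Features], List[Detection]]


def detect(measurements: Sequence[float],
           encoder: Encoder,
           decoder: Decoder) -> List[Detection]:
    """Map single-pixel measurements straight to detections, with no image."""
    features = encoder(measurements)
    return decoder(features)


def placeholder_encoder(measurements: Sequence[float]) -> Features:
    """Stand-in for a trained encoder: only mean-centres the measurements."""
    mean = sum(measurements) / len(measurements)
    return [m - mean for m in measurements]


def placeholder_decoder(features: Features) -> List[Detection]:
    """Stand-in for a trained decoder: always reports one fixed central box."""
    return [Detection("placeholder", 0.5, 0.5, 0.2, 0.2)]


detections = detect([1.0, 2.0, 3.0], placeholder_encoder, placeholder_decoder)
```

The point of the sketch is the shape of the pipeline: the input is a vector of
measurements rather than an image, and the output covers every target at once.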

 

Lintao Peng, one of the researchers, said that the small, optimized pattern
yielded better image-free sensing performance than the full-size patterns used
by other single-pixel detection approaches. In addition, the multi-scale
attention network in the SPOD decoder helps the model focus on the relevant
part of the scene. This improves scene feature extraction, which in turn
enables state-of-the-art object detection performance.


 

Proof-of-concept demonstration

The researchers built an experimental proof-of-concept setup for SPOD. The
target scenes were made from images randomly chosen from the Pascal VOC 2012
test dataset. At a sampling rate of 5%, spatial light modulation and image-free
object detection with SPOD took an average of 0.016 seconds per scene.
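
To make the 5% figure concrete: in single-pixel sensing, the sampling rate is
the number of measurements divided by the number of pixels in the scene. The
sketch below assumes a 64 x 64 scene purely for illustration, since the article
does not state the resolution, and uses a hypothetical helper name.

```python
def num_measurements(height, width, sampling_rate):
    """Number of single-pixel measurements for a height x width scene."""
    return round(sampling_rate * height * width)


# Hypothetical 64 x 64 scene; the resolution is an assumption, not a
# figure from the article.
measurements_per_scene = num_measurements(64, 64, 0.05)

# Throughput implied by the reported average of 0.016 seconds per scene.
scenes_per_second = 1 / 0.016
```

Under that assumed resolution, about 205 measurements stand in for 4,096
pixels, and the reported timing corresponds to roughly 62.5 scenes per second.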

 

That compares with a total of 0.068 seconds for first reconstructing the scene
(0.05 seconds) and then detecting objects (0.018 seconds). Using SPOD, the
overall average detection accuracy across all object classes in the test
dataset was 82.2%. According to Peng, SPOD cannot yet recognize every possible
object category because the object detection dataset used to train the model
covers only 80 categories. However, fine-tuning the pre-trained model for a
given task enables image-free multi-object detection of new target classes for
applications such as pedestrian, car, or boat detection. Next, the team plans
to extend the image-free perception method to other types of detectors and
computational acquisition systems in order to develop reconstruction-free
sensing technology.
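
The timing figures quoted in this section can be compared directly. The short
calculation below uses only the numbers reported above; the variable names are
illustrative.

```python
# Average times per scene reported in the article, in seconds.
spod_time = 0.016            # spatial light modulation + image-free detection
reconstruction_time = 0.05   # reconstructing the scene first
detection_time = 0.018       # then detecting objects in the reconstruction

# Image-first pipeline: reconstruct, then detect.
image_first_time = reconstruction_time + detection_time

# How many times faster the image-free route is under these figures.
speedup = image_first_time / spod_time

print(f"image-first: {image_first_time:.3f} s, speedup: {speedup:.2f}x")
```

On these figures, the image-free route is a little over four times faster per
scene than reconstructing first and detecting afterwards.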

 
