in Blog Posts, C/C++, Computer Vision (机器视觉), Solution

[OpenCV]detectMultiScale

I met a problem when using the interface ‘detectMultiScale’ of OpenCV. The rectangles it gives out may not be fully inside the frame of the original image. As a result, if these rectangles are applied directly on the original image to crop out the detected objects, your programs crash.

These are the interfaces

virtual void detectMultiScale( const Mat& image,
                               CV_OUT vector& objects,
                               double scaleFactor=1.1,
                               int minNeighbors=3, int flags=0,
                               Size minSize=Size(),
                               Size maxSize=Size() );

and

virtual void detectMultiScale( const Mat& image,
                               CV_OUT vector& objects,
                               vector& rejectLevels,
                               vector& levelWeights,
                               double scaleFactor=1.1,
                               int minNeighbors=3, int flags=0,
                               Size minSize=Size(),
                               Size maxSize=Size(),
                               bool outputRejectLevels=false );

I am copying what I wrote for a pull request on Github, this ‘may-be issue’ can be fixed easily by modifying one line in the source file

modules/objdetect/src/cascadedetect.cpp

Replace this one

Size processingRectSize( scaledImageSize.width - originalWindowSize.width + 1, scaledImageSize.height - originalWindowSize.height + 1 );

with this line

Size processingRectSize( scaledImageSize.width - originalWindowSize.width , scaledImageSize.height - originalWindowSize.height);

My explanation goes here, ignore the line numbers if they look wrong to you.
“Actually, in the code, the workflow is more complicated. In the file cascadedetect.cpp

This is the line building the final detected rectangle

995 rectangles->push_back(Rect(cvRound(x*scalingFactor), cvRound(y*scalingFactor), winSize.width, winSize.height));

the winSize is assigned here

969 Size winSize(cvRound(classifier->data.origWinSize.width * scalingFactor), cvRound(classifier->data.origWinSize.height * scalingFactor));

while the maximum value of x and y can be find here, they are related to the processingRectSize.

971         int y1 = range.start * stripSize;
972         int y2 = min(range.end * stripSize, processingRectSize.height);
973         for( int y = y1; y < y2; y += yStep )
974         {
975             for( int x = 0; x < processingRectSize.width; x += yStep )

Say the original image size is O, the original window size is W, scaling factor is F. O and W is integer and F is a decimal usually larger than 1. The width and height are assumed to be the same for example.

If we calculate the right-most point of the detected rectangle, it should be:

the maximum x is: (cvRound(O/F) - W), current winSize is W*F, following the line 995 we get:

cvRound( (cvRound(O/F) - W) * F ) + W*F

This can be larger than O, say O is 600, F is 4.177250, W is 24, the number we can get above is 601.254 which is larger than 600."

Hope these help.