
Robert Wang

ezSIFT: an easy-to-use standalone SIFT library


Project overview

The SIFT (scale-invariant feature transform) algorithm is widely considered one of the most robust local feature detectors and descriptors. Many open-source SIFT implementations rely on third-party libraries, and these dependencies complicate installation, compilation, and usage.

The ezSIFT library provides a standalone, lightweight SIFT implementation written in C/C++. It is self-contained and does not require any other libraries, so it is easy to use and modify. The implementation is also straightforward and easy to read.

The implementation of a few functions in this library draws on the corresponding implementations in OpenCV and VLFeat:
OpenCV, http://opencv.org/
VLFeat, http://www.vlfeat.org/

If you use any code from the ezSIFT library in your research work, please cite this project:
Guohui Wang, ezSIFT: an easy-to-use standalone SIFT library, https://sourceforge.net/p/ezsift, 2013.

For those who are interested in GPU implementations of the SIFT algorithm on mobile devices, the following two papers can provide more details:

  1. Blaine Rister, Guohui Wang, Michael Wu and Joseph R. Cavallaro, "A Fast and Efficient SIFT Detector using the Mobile GPU", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013. (GPU implementation using OpenGL ES; performance benchmarked on Google Nexus 7, Samsung Galaxy Note II, Qualcomm Snapdragon S4 pro, NVIDIA Tegra 2.)
  2. Guohui Wang, Blaine Rister, and Joseph R. Cavallaro, "Workload Analysis and Efficient OpenCL-based Implementation of SIFT Algorithm on a Smartphone", IEEE Global Conference on Signal and Information Processing (GlobalSIP), December 2013. (OpenCL-based GPU implementation; benchmarked on Qualcomm Snapdragon MSM8064 SoC.)

Function interface
    int sift_cpu(const ImageObj<uchar> &image, list<SiftKeypoint> & kpt_list, 
                bool bExtractDescriptors);

image: input image which is defined as an object of ImageObj class. ImageObj is defined as follows:

    template <typename T>
    class ImageObj
    {
    public:
        int w;    // image width
        int h;    // image height
        T *data;  // raw pixel values; sift_cpu() requires "data" to hold an array of grayscale uchar values
    };

The ImageObj class provides basic reading/writing functions for PGM and PPM images. To work with more general image formats (such as JPEG or PNG), you can use an image decoding library to read the images. If the original image is a color image, you need to convert it to grayscale first.
kpt_list: a list of detected keypoints. It contains the position, scale, orientation and feature descriptor of each keypoint.
bExtractDescriptors: indicates if you want to extract the feature descriptor. If bExtractDescriptors=true, the generated kpt_list will contain feature descriptors. If bExtractDescriptors=false, no feature descriptor is generated, and the kpt_list only contains keypoint information (position, scale, orientation).
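
The grayscale preprocessing mentioned above could look like the following minimal sketch. It is not part of ezSIFT: the function name rgb_to_gray and the interleaved RGB input layout are assumptions made for illustration, and the weights are the common ITU-R BT.601 luma coefficients rather than anything prescribed by this library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char uchar;

// Hypothetical helper (not an ezSIFT function): convert an interleaved
// 8-bit RGB buffer (R,G,B,R,G,B,...) to an 8-bit grayscale buffer.
// Uses the BT.601 luma weights (0.299, 0.587, 0.114) scaled to integers:
// 77 + 150 + 29 = 256, so the result always fits in a uchar.
std::vector<uchar> rgb_to_gray(const std::vector<uchar> &rgb, int w, int h)
{
    assert(w > 0 && h > 0);
    const std::size_t num_pixels = static_cast<std::size_t>(w) * h;
    assert(rgb.size() == 3 * num_pixels);

    std::vector<uchar> gray(num_pixels);
    for (std::size_t i = 0; i < num_pixels; ++i) {
        const unsigned r = rgb[3 * i];
        const unsigned g = rgb[3 * i + 1];
        const unsigned b = rgb[3 * i + 2];
        // Add 128 before shifting by 8 to round to the nearest integer.
        gray[i] = static_cast<uchar>((77 * r + 150 * g + 29 * b + 128) >> 8);
    }
    return gray;
}
```

The resulting buffer holds one uchar per pixel in row-major order, which is the kind of data sift_cpu() expects in ImageObj<uchar>. How ImageObj allocates and owns its data pointer is not described on this page, so check the class definition before copying pixels into it.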


ezSIFT usage and examples

Two application examples are included in the source code package:
- feature_detection.cpp
- image_match.cpp
They show how to use the ezSIFT library. Basically, you include ezSIFT.h and then call the sift_cpu() function to detect keypoints and extract feature descriptors.

Example 1: keypoint/feature detection
ImageObj<uchar> image;  
image.read_pgm("input.pgm");

bool bExtractDescriptor = true;
list<SiftKeypoint> kpt_list;
// Perform SIFT computation on CPU.
sift_cpu(image, kpt_list, bExtractDescriptor);
// Generate output image
draw_keypoints_to_ppm_file("output.ppm", image, kpt_list);
// Generate keypoints list
export_kpt_list_to_file("output.key", kpt_list, bExtractDescriptor);

Input (left), result image (right)
(Input image courtesy of the Affine Covariant Features Project at Oxford University.)
graf1-keypoints

Example 2: Feature matching

This example is a simple demonstration of feature matching, based on brute-force matching. If you want more accurate matching results, you should consider other matching algorithms.

ImageObj<uchar> image1, image2;
image1.read_pgm("file1.pgm");
image2.read_pgm("file2.pgm");

// Detect keypoints
list<SiftKeypoint> kpt_list1, kpt_list2;
bool bExtractDescriptor = true;
sift_cpu(image1, kpt_list1, bExtractDescriptor);
sift_cpu(image2, kpt_list2, bExtractDescriptor);

// Match keypoints.
list<MatchPair> match_list;
match_keypoints(kpt_list1, kpt_list2, match_list);

// Draw result image.
draw_match_lines_to_ppm_file("output_file.ppm", image1, image2, match_list);
printf("Number of matched keypoints: %d\n", static_cast<int>(match_list.size()));
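
To make the idea of brute-force matching concrete, here is a minimal sketch of one common variant. It is not the code behind match_keypoints(): the Descriptor type and the function names are hypothetical, and the acceptance rule (nearest neighbour accepted only if it is clearly closer than the second nearest, with a ratio of 0.8 as suggested in Lowe's paper) is an assumption about a typical matcher, not a description of what ezSIFT does.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical descriptor type for illustration: a plain vector of floats
// (128 entries in standard SIFT). This is not ezSIFT's SiftKeypoint.
typedef std::vector<float> Descriptor;

// Squared Euclidean distance between two descriptors of equal length.
float squared_distance(const Descriptor &a, const Descriptor &b)
{
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For each descriptor in set1, scan all of set2 for the nearest and the
// second-nearest neighbour. Accept the pair only if
// nearest < ratio * second nearest (compared on squared distances,
// hence ratio * ratio). Returns (index in set1, index in set2) pairs.
std::vector<std::pair<std::size_t, std::size_t> >
brute_force_match(const std::vector<Descriptor> &set1,
                  const std::vector<Descriptor> &set2,
                  float ratio = 0.8f)
{
    std::vector<std::pair<std::size_t, std::size_t> > matches;
    for (std::size_t i = 0; i < set1.size(); ++i) {
        float best = std::numeric_limits<float>::max();
        float second = std::numeric_limits<float>::max();
        std::size_t best_j = 0;
        for (std::size_t j = 0; j < set2.size(); ++j) {
            const float d = squared_distance(set1[i], set2[j]);
            if (d < best) {
                second = best;
                best = d;
                best_j = j;
            } else if (d < second) {
                second = d;
            }
        }
        // At least two candidates are needed for the ratio to be meaningful.
        if (set2.size() >= 2 && best < ratio * ratio * second) {
            matches.push_back(std::make_pair(i, best_j));
        }
    }
    return matches;
}
```

The cost is one distance computation per descriptor pair, which is acceptable for a demonstration but grows quickly with the number of keypoints. Lowering the ratio rejects more ambiguous matches at the price of fewer matches overall.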

Input images:
(Input images courtesy of the Affine Covariant Features Project at Oxford University.)
graf1-keypoints
Output images:
grafs-keypoints
Feature matching results:
matching


Comparison with Lowe's implementation, OpenCV, and VLFeat

The links for the implementations under comparison:
1. Lowe's implementation: http://www.cs.ubc.ca/~lowe/keypoints/
2. OpenCV: http://opencv.org/
3. VLFeat: http://www.vlfeat.org/

Instead of using images of real-world scenes, we generated a few images containing different simple shapes. We chose these images because they show very strong, sharp contours, which are naturally good keypoints for image matching. A good SIFT detector should be able to detect most of them, so these images make the performance differences among the SIFT implementations easy to see.
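
For readers who want to build similar test inputs, the sketch below draws two simple shapes and writes them as a binary PGM file. It is only an illustration under our own assumptions: the function names, the choice of a disc and a rectangle, and their sizes are hypothetical and are not the generator used for the images on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

typedef unsigned char uchar;

// Draw a filled white disc and a filled white rectangle on a black
// background. Shapes, sizes and positions are arbitrary illustrative choices.
std::vector<uchar> make_shape_image(int w, int h)
{
    assert(w > 0 && h > 0);
    std::vector<uchar> img(static_cast<std::size_t>(w) * h, 0);

    // Disc centred in the left half of the image.
    const int cx = w / 4, cy = h / 2, radius = h / 4;
    // Rectangle in the right half of the image.
    const int x0 = w / 2, x1 = 3 * w / 4, y0 = h / 4, y1 = 3 * h / 4;

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int dx = x - cx, dy = y - cy;
            const bool in_disc = dx * dx + dy * dy <= radius * radius;
            const bool in_rect = x >= x0 && x < x1 && y >= y0 && y < y1;
            if (in_disc || in_rect)
                img[static_cast<std::size_t>(y) * w + x] = 255;
        }
    }
    return img;
}

// Write an 8-bit grayscale buffer as a binary PGM (P5) file.
// Returns false if the dimensions do not match the buffer or on I/O failure.
bool write_pgm(const std::string &path, const std::vector<uchar> &img,
               int w, int h)
{
    if (w <= 0 || h <= 0 || img.size() != static_cast<std::size_t>(w) * h)
        return false;
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out)
        return false;
    out << "P5\n" << w << " " << h << "\n255\n";
    out.write(reinterpret_cast<const char *>(img.data()),
              static_cast<std::streamsize>(img.size()));
    return out.good();
}
```

A file written this way, for example with write_pgm("shapes.pgm", make_shape_image(512, 512), 512, 512), should be readable with image.read_pgm() as in Example 1, since PGM is one of the formats ImageObj supports.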

Comparison test 1

Input image:
img

Keypoint detection results:
img
The results from ezSIFT and Lowe's implementation are close, and both perform better than the other two implementations: some key features are missing from the OpenCV and VLFeat results (red squares indicate the missing keypoints).

Rotated image:
img

img
Again, ezSIFT and Lowe's implementation generate similar keypoints and perform better than the other two implementations. Some keypoints are missing from the OpenCV and VLFeat results (red squares in the figure). Moreover, on closer inspection, many small-scale features on the boundaries of the blobs are missing from the OpenCV result images.

Feature matching results:
img
Keypoint matching is not the focus of this SIFT library, so brute-force matching is used for its simplicity, to demonstrate the correctness of the SIFT keypoints and descriptors. Other matchers may generate better matching results.

img

img

img
ezSIFT and Lowe's implementation find a comparable number of matches, and all of them are correct, whereas OpenCV and VLFeat generate many false matches.
Please note that feature matching accuracy depends on the matching threshold as well as the detector parameter settings. Here, all implementations use the default settings that ship with the software. With time spent fine-tuning the parameters, the results from all these implementations may improve.


Comparison test 2

Input image:
img

Keypoint detection results:
img
ezSIFT, Lowe's implementation, and VLFeat detect similar features; in particular, ezSIFT and VLFeat generate almost the same features. In contrast, some key features are missing from the OpenCV results, and OpenCV also generates some seemingly random features.

Rotated image:
img
For this image, all four implementations generate most of the major keypoint features. Lowe's and OpenCV generate slightly more features than ezSIFT and VLFeat.

img

Feature matching results:
img
Keypoint matching is not the focus of this SIFT library, so brute-force matching is used for its simplicity, to demonstrate the correctness of the SIFT keypoints and descriptors. Other matchers may generate better matching results.

img

img

img
As in the previous comparison test, Lowe's implementation and ezSIFT give better matching results, while OpenCV and VLFeat produce some false matches.
Again, please note that feature matching accuracy depends on the matching threshold as well as the detector parameter settings. Here, all implementations use the default settings that ship with the software. With time spent fine-tuning the parameters, the results from all these implementations may improve.

References

  1. David Lowe, Demo Software: SIFT Keypoint Detector, http://www.cs.ubc.ca/~lowe/keypoints/.
  2. David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
  3. David G. Lowe, "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image," US Patent 6,711,293 (March 23, 2004). Provisional application filed March 8, 1999. Assignee: The University of British Columbia.
  4. VLFeat, http://www.vlfeat.org/.
  5. OpenCV, http://opencv.org/.
  6. OpenSIFT, http://robwhess.github.io/opensift/.

Patent Notice: The following patent has been issued for methods embodied in this software: "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image," David G. Lowe, US Patent 6,711,293 (March 23, 2004). Provisional application filed March 8, 1999. Assignee: The University of British Columbia. For further details, contact David Lowe (lowe@cs.ubc.ca) or the University-Industry Liaison Office of the University of British Columbia.