Notes on Computer Vision

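This note is meant to be a problem-based summary built around the tutorials. P.S. Kenneth teaches well, is good-looking, and even owns a NERV staff uniform! Computer vision is also a rather "EVA" kind of subject, so I can already picture him in sunglasses striking Commander Ikari's classic pose.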


  This article is a self-administered course note.

  It will NOT cover any exam or assignment related content.


Digital Image Processing

Key point 1: Spatial vs. Gray-level resolution

  • Sampling \(\to\) Spatial resolution (# pixels) \(\to\) Insufficient samples: checkerboard effect
  • Quantization \(\to\) Gray-level resolution (# intensity values) \(\to\) Insufficient gray levels: false contouring
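
To make the two kinds of resolution concrete, here is a minimal sketch on a tiny nested-list image. The helper names `quantize` and `subsample` are my own, and I am assuming 8-bit intensities (0..255) and a number of levels that divides 256.

```python
def quantize(img, levels):
    """Reduce an 8-bit grayscale image (nested lists, values 0..255) to `levels` gray levels."""
    step = 256 // levels  # width of each intensity bin (levels assumed to divide 256)
    return [[(v // step) * step for v in row] for row in img]


def subsample(img, factor):
    """Keep every `factor`-th pixel along both axes (lower spatial resolution)."""
    return [row[::factor] for row in img[::factor]]


# A smooth horizontal ramp with 8 distinct intensities per row.
ramp = [[x * 32 for x in range(8)] for _ in range(4)]

coarse = quantize(ramp, 2)   # only 2 gray levels remain, so the ramp turns into two bands
small = subsample(ramp, 2)   # half as many pixels along each axis
```

With only two gray levels the ramp collapses into two flat bands, which is the false-contouring effect in miniature.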

Key point 2: Adjacency & Distance

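4-adjacency, 8-adjacency and m(ixed)-adjacency [an m-path cannot form a triangle].

The center 1 and the top-right 1 are 8-adjacent but not m-adjacent.

Euclidean distance (geometric distance), city-block distance (\(|x_1-x_2|+|y_1-y_2|\)), chessboard distance (\(\max(|x_1-x_2|,|y_1-y_2|)\)), and \(D_m\) distance (length of the shortest m-path).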
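
The first three distances can be sketched directly from their formulas. The function names are my own; \(D_m\) is left out because it depends on the pixel values along the path, not only on the coordinates.

```python
import math


def euclidean(p, q):
    """Straight-line distance between pixels p = (x1, y1) and q = (x2, y2)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def city_block(p, q):
    """D4 distance: |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chessboard(p, q):
    """D8 distance: max(|x1 - x2|, |y1 - y2|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


p, q = (0, 0), (3, 4)
distances = (euclidean(p, q), city_block(p, q), chessboard(p, q))
```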

Key point 3: Convolution & Filters

Linear filtering is essentially convolution, so it is commutative and associative.

Linear spatial filtering:

  • Smoothing filters: blurring, noise reduction. The coefficients sum to 1, so flat regions are unchanged by smoothing
  • Sharpening filters: highlight fine details. The coefficients sum to 0, so flat regions vanish (become 0) after sharpening
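
A quick check of the two coefficient-sum rules on a constant patch. This is only a sketch: `convolve2d` is a hand-written helper, and the 3x3 box and Laplacian kernels are my own choice of examples for "sums to 1" and "sums to 0".

```python
def convolve2d(img, kernel):
    """'Valid' 2D convolution of a nested-list image with an odd-sized square kernel."""
    k = len(kernel)
    r = k // 2
    flipped = [row[::-1] for row in kernel[::-1]]  # convolution flips the kernel
    h, w = len(img), len(img[0])
    out = []
    for y in range(r, h - r):
        out_row = []
        for x in range(r, w - r):
            acc = 0
            for j in range(k):
                for i in range(k):
                    acc += flipped[j][i] * img[y + j - r][x + i - r]
            out_row.append(acc)
        out.append(out_row)
    return out


box = [[1 / 9] * 3 for _ in range(3)]            # smoothing: coefficients sum to 1
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]   # sharpening: coefficients sum to 0

flat = [[7] * 5 for _ in range(5)]               # a constant (flat) patch
smoothed = convolve2d(flat, box)                 # stays at 7 up to rounding
sharpened = convolve2d(flat, laplacian)          # becomes 0 everywhere
```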

Order-statistics filtering (generally non-linear, so it does not obey the commutative/associative laws of convolution):

  • Median filters. Reduce salt-and-pepper noise.
  • Max filters. Reduce pepper noise (black points have value 0)
  • Min filters. Reduce salt noise (white points have value 1)
  • Midpoint filters (\(\frac{1}{2}[\max+\min]\)). Reduce Gaussian noise or uniform noise.
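
The four filters differ only in the statistic taken over each window, which the sketch below tries to show. Assumptions on my side: a 3x3 window, intensities in [0, 1], and border pixels left untouched.

```python
from statistics import median


def order_filter(img, stat):
    """Apply `stat` to every 3x3 neighbourhood; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = stat(window)
    return out


def midpoint(window):
    """Average of the largest and smallest value in the window."""
    return (max(window) + min(window)) / 2


# Gray patch (0.5) with one pepper pixel (0) and one salt pixel (1).
img = [[0.5] * 5 for _ in range(5)]
img[1][1] = 0.0   # pepper
img[3][3] = 1.0   # salt

med = order_filter(img, median)   # removes both outliers
mx = order_filter(img, max)       # removes the pepper pixel, but spreads the salt pixel
mn = order_filter(img, min)       # removes the salt pixel, but spreads the pepper pixel
mid = order_filter(img, midpoint)
```

Note how the max and min filters each fix one kind of impulse while enlarging the other, which is why the median is the usual choice when both are present.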

Key point 4: Color models

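RGB model: a full-color image has a pixel depth of 24 bits, giving \(2^{24}\) distinct colors.

YIQ model: Y is the luminance component, while I and Q are the chrominance components. The coefficients of I and of Q each sum to 0, so that neither picks up any luminance information (a pixel with \(R=G=B\) carries luminance only).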
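
A small numeric check of the zero-sum property. The coefficients below are the commonly quoted NTSC values, which is an assumption on my part since the course notes give no numbers here.

```python
# Commonly quoted NTSC RGB -> YIQ coefficients (assumed, not taken from the course notes).
RGB_TO_YIQ = [
    [0.299, 0.587, 0.114],     # Y: coefficients sum to 1
    [0.596, -0.274, -0.322],   # I: coefficients sum to 0
    [0.211, -0.523, 0.312],    # Q: coefficients sum to 0
]


def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (Y, I, Q)."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in RGB_TO_YIQ)


y, i, q = rgb_to_yiq(0.5, 0.5, 0.5)   # a gray pixel: R = G = B
num_colors = 2 ** 24                  # 8 bits per channel, 3 channels
```

For the gray pixel, Y comes out equal to the gray value and both I and Q are zero up to rounding.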


Feature Extraction

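Key point 0: Image coordinate convention

First, note a convention: the image matrix has its origin \((0,0)\) at the top-left corner, with the \(x\) axis horizontal and the \(y\) axis vertical. numpy arrays, however, are still indexed row first and then column, so the pixel at coordinates \((x,y)\) is written img[y][x] in code.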
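
A tiny illustration with a nested list standing in for the array; the helper `pixel_at` is hypothetical. With an actual numpy array the same pixel would be `img[y, x]`, and `img.shape` would be `(height, width)`.

```python
# A 3-row by 4-column "image": height 3, width 4.
img = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
]


def pixel_at(img, x, y):
    """Return the pixel at image coordinates (x, y): column x of row y."""
    return img[y][x]


value = pixel_at(img, x=3, y=1)   # column 3 of row 1
height, width = len(img), len(img[0])
```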

Key point 1: Canny edge detection

The Canny edge detection algorithm (2D edge detection) proceeds as follows:

  • smooth the image \(I\) by convolving with a 2D Gaussian kernel \(S=G_{\sigma}*I\)
  • find the gradient of the smoothed image \(\nabla S\)
  • non-maximal suppression: select only edgels where \(||\nabla S||\) is greater than the local values of \(||\nabla S||\) in the direction of \(\pm \nabla S\) (without this step the detected edges would be thick; it thins each edge down to its ridge)
  • threshold the edgels: only strong edgels with \(||\nabla S||\) above a certain value are retained
  • hysteresis: some weak edgels are revived if they span the gaps between strong edgels (this keeps edges continuous, so some weak edgels are retained)
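
The last two steps can be sketched as the usual double-threshold scheme: strong edgels are kept outright, and weak ones survive only if they connect to a strong one. This is a simplified illustration, not the course's implementation; the function name, the 8-connectivity, and the `low`/`high` values are my own assumptions, and it takes a precomputed gradient-magnitude grid as input.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep pixels >= high, plus pixels >= low that are 8-connected to a kept pixel."""
    h, w = len(mag), len(mag[0])
    keep = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= high:            # strong edgel: always retained
                keep[y][x] = True
                queue.append((x, y))
    while queue:                             # grow outwards from the strong edgels
        x, y = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                inside = 0 <= nx < w and 0 <= ny < h
                if inside and not keep[ny][nx] and mag[ny][nx] >= low:
                    keep[ny][nx] = True      # weak edgel revived by a kept neighbour
                    queue.append((nx, ny))
    return keep


# A thin vertical edge with a weak gap in the middle row,
# plus one isolated weak response on the right.
mag = [
    [0, 9, 0, 0, 0],
    [0, 4, 0, 0, 0],
    [0, 9, 0, 0, 4],
]
edges = hysteresis(mag, low=3, high=8)
```

The weak pixel in the gap is kept because it bridges two strong edgels, while the isolated weak pixel is dropped.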

The key parameter \(\sigma\) (see the grid of figures in 3. feature extraction, p. 19):

  • fine-scale Gaussian kernel (smaller \(\sigma\)): detects more and finer edges, but is sensitive to noise
  • large-scale Gaussian kernel (larger \(\sigma\)): suppresses finer details
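
The effect of \(\sigma\) is easy to see in 1D. The sketch below is only an illustration: the kernel radius of \(\lceil 3\sigma\rceil\), the border replication, and the test signal are my own choices.

```python
import math


def gaussian_kernel_1d(sigma, radius=None):
    """Sampled, normalised 1D Gaussian; radius defaults to ceil(3 * sigma)."""
    if radius is None:
        radius = math.ceil(3 * sigma)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(signal, kernel):
    """Weighted sum with a symmetric kernel (so correlation equals convolution).

    Samples beyond the ends of the signal are replaced by the nearest border sample.
    """
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[k + r] * signal[min(max(i + k, 0), n - 1)] for k in range(-r, r + 1))
        for i in range(n)
    ]


# A one-sample blip of fine detail (index 3) followed by a large step edge.
signal = [0, 0, 0, 1, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5]
fine = smooth_1d(signal, gaussian_kernel_1d(0.5))     # blip survives as a local peak
coarse = smooth_1d(signal, gaussian_kernel_1d(2.0))   # blip is absorbed into the step
```

With the small \(\sigma\) the blip is still a local maximum, so an edge detector would respond to it; with the large \(\sigma\) the smoothed signal simply rises towards the step.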

Aperture problem: to analyze the motion in an image, we also need corner detection!

Which way is the edge actually moving? If we could see an endpoint, we would be able to tell!

Key point 2: Harris corner detection

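Definition of a Harris corner: if a local window contains a corner, shifting the window in any direction produces a significant change in intensity. (Flat region: no change in intensity for a shift in any direction; edge: a change in intensity when shifting perpendicular to the edge, and none along it.)

Mathematically, this definition is expressed as the squared change in intensity when the image window is shifted by \((u,v)\) (here \(w\) is usually a 2D Gaussian kernel, which gives more weight to the center of the window): \[ E(u,v)=\sum_{x,y}w(x,y)[I(x+u,y+v)-I(x,y)]^2 \] Using the first-order Taylor expansion \(I(x+u,y+v)\approx I(x,y)+I_xu+I_yv\) and collecting terms into a matrix, this is approximated by: \[ \begin{aligned} E(u,v)&\approx \begin{bmatrix}u&v\end{bmatrix}M\begin{bmatrix} u \\ v\end{bmatrix} \\ \text{where }M&=\sum_{x,y} w(x,y) \begin{bmatrix}I_x^2 & I_xI_y \\I_xI_y & I_y^2\end{bmatrix} \end{aligned} \] By basic eigenvalue theory, \(\lambda_1\leq E(u,v) \leq \lambda_2\) for a unit shift (\(u^2+v^2=1\)), where \(\lambda_1,\lambda_2\) are the smallest and largest eigenvalues of \(M\), i.e. the smallest and largest intensity change obtainable by shifting the window in any direction. This gives three cases: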

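  • case 0: \(\lambda_1\approx \lambda_2\approx 0\): shifting the window barely changes the intensity, so this is a featureless region
  • case 1: \(\lambda_1\approx 0\), \(\lambda_2\) is large: clearly an edge, with \(\lambda_1\) corresponding to the direction along the edge and \(\lambda_2\) to the direction perpendicular to it
  • case 2: \(\lambda_1\), \(\lambda_2\) are both large and distinct: this is a corner!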
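
The three cases can be checked numerically on hand-picked \(2\times 2\) matrices. This is just a sketch: the example matrices and the cutoff `small` are arbitrary choices of mine, and `classify` is a hypothetical helper rather than part of the algorithm.

```python
import math


def eigenvalues_2x2_sym(a, b, c):
    """Eigenvalues (smallest, largest) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def classify(a, b, c, small=1e-3):
    """Label a matrix M = [[a, b], [b, c]] by its eigenvalues; `small` is an arbitrary cutoff."""
    lam1, lam2 = eigenvalues_2x2_sym(a, b, c)
    if lam2 < small:
        return "flat"      # case 0: both eigenvalues near zero
    if lam1 < small:
        return "edge"      # case 1: only one eigenvalue is large
    return "corner"        # case 2: both eigenvalues are large


def shift_energy(a, b, c, u, v):
    """Quadratic form [u v] M [u v]^T for M = [[a, b], [b, c]]."""
    return a * u * u + 2 * b * u * v + c * v * v


labels = [classify(0, 0, 0), classify(4, 0, 0), classify(4, 1, 3)]
```

Here `[[4, 0], [0, 0]]` plays the role of a vertical edge (only \(I_x\) is non-zero), and sampling `shift_energy` over unit vectors stays between the two eigenvalues.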

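Note that \(\lambda_1\lambda_2=\det(M)\) and \(\lambda_1+\lambda_2=\text{trace}(M)\). Based on this, the cornerness function \(R\) is defined as \[ R=\det(M)-k\cdot \text{trace}(M)^2 \] where Harris recommends setting \(k\) to \(0.04\sim 0.06\). That is enough mathematical intuition; the steps of the Harris corner detection algorithm are as follows: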

  • convert color image to grayscale image
  • compute gradient along \(x\), \(y\) axis at each pixel \(I_x,I_y\)
  • form images of \(I_x^2,I_y^2\) and \(I_xI_y\), then smooth them with a 2D Gaussian kernel
  • form image of the cornerness function \(R\)
  • locate local maxima in the image of \(R\) as corners
  • compute the coordinates of the corners up to sub-pixel accuracy by quadratic approximation (fitting a quadratic around each maximum refines the corner position to sub-pixel level)
  • threshold the strong corners
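
As a rough illustration of the core steps above, here is a toy sketch in pure Python. It is not the assignment code: it assumes a grayscale nested-list input, uses central differences for the gradients, substitutes a 3x3 binomial window for the 2D Gaussian, and skips the sub-pixel refinement. The names `harris_response`, `find_corners` and `rel_thresh` are hypothetical.

```python
def harris_response(img, k=0.05):
    """Cornerness R = det(M) - k * trace(M)^2 for a grayscale nested-list image."""
    h, w = len(img), len(img[0])

    def zeros():
        return [[0.0] * w for _ in range(h)]

    ixx, iyy, ixy = zeros(), zeros(), zeros()
    # Central-difference gradients; border pixels keep a zero gradient.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2
            gy = (img[y + 1][x] - img[y - 1][x]) / 2
            ixx[y][x], iyy[y][x], ixy[y][x] = gx * gx, gy * gy, gx * gy

    # 3x3 binomial weights as a small stand-in for the Gaussian window w(x, y).
    wts = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

    def smooth(src, x, y):
        return sum(wts[dy + 1][dx + 1] * src[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 16

    resp = zeros()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a, b, c = smooth(ixx, x, y), smooth(ixy, x, y), smooth(iyy, x, y)
            resp[y][x] = (a * c - b * b) - k * (a + c) ** 2
    return resp


def find_corners(resp, rel_thresh=0.1):
    """Strict local maxima of R above rel_thresh * max(R), returned as (x, y) pairs."""
    h, w = len(resp), len(resp[0])
    cutoff = rel_thresh * max(max(row) for row in resp)
    corners = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            neighbours = [resp[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
            if r > cutoff and all(r > n for n in neighbours):
                corners.append((x, y))
    return corners


# Synthetic test image: a bright 6x6 square on a dark 16x16 background.
img = [[1.0 if 5 <= x <= 10 and 5 <= y <= 10 else 0.0 for x in range(16)]
       for y in range(16)]
resp = harris_response(img)
corners = find_corners(resp)
```

On this square, \(R\) is positive at the four corner pixels, negative along the sides, and zero in the flat background, matching the three eigenvalue cases.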

My code is below. The math behind it is a little hard, but the implementation is not complicated [to be released once assignment 2 is due]:

# To be released: after Mar.5


Perspective Projection


Reference


  References in the article are from corresponding course materials if not specified.

Course info: COMP3317, taught by Prof. Kenneth K.Y. Wong

Course textbook: Computer Vision – A Modern Approach, Forsyth & Ponce

-----------------------------------And so, the next song begins.-----------------------------------