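Computer Vision Notes
These notes are a problem-based summary built on the tutorials. P.S. Kenneth teaches well, is good-looking, and even owns a NERV staff uniform! Computer vision is a rather "EVA" subject too, so I can already picture him in sunglasses striking Commander Ikari's signature pose.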
This article is a self-administered course note.
It will NOT cover any exam or assignment related content.
Digital Image Processing
Key point 1: Spatial vs. Gray-level resolution
- Sampling \(\to\) Spatial resolution (# pixels) \(\to\) Insufficient samples: checkerboard effect
- Quantization \(\to\) Gray-level resolution (# intensity values) \(\to\) Insufficient gray levels: false contouring
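A minimal sketch of the two effects on a synthetic gradient image, assuming numpy is available. The helper names `downsample` and `quantize` and the chosen factors are illustrative only, not from the course material.

```python
import numpy as np

def downsample(img, factor):
    """Keep every `factor`-th pixel, then repeat it so the size is unchanged."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def quantize(img, levels):
    """Map 8-bit intensities onto `levels` evenly spaced gray levels."""
    step = 256 // levels
    return (img // step) * step

# Horizontal ramp: intensity equals the column index.
ramp = np.tile(np.arange(256, dtype=np.uint8), (256, 1))

coarse = downsample(ramp, 32)  # too few samples: blocky, checkerboard-like
banded = quantize(ramp, 4)     # too few gray levels: visible bands (false contours)
```

Viewing `coarse` and `banded` next to `ramp` should show blocks in the first case and flat bands with abrupt jumps in the second.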
Key point 2: Adjacency & Distance
4-adjacency, 8-adjacency, and m-(mixed) adjacency [m-adjacency cannot form triangles].

Euclidean distance (geometric distance), City-block distance (\(|x_1-x_2|+|y_1-y_2|\)), Chessboard distance (\(\max(|x_1-x_2|,|y_1-y_2|)\)), and \(D_m\) distance (m-path distance).
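The first three distances are easy to compare on a pair of pixels. This is a small sketch in plain Python; the function names are my own, and \(D_m\) is left out because it depends on the actual pixel values along the path.

```python
import math

def d_euclidean(p, q):
    """Straight-line distance between pixels p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d_city_block(p, q):
    """D4 distance: |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_chessboard(p, q):
    """D8 distance: max(|x1 - x2|, |y1 - y2|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
distances = (d_euclidean(p, q), d_city_block(p, q), d_chessboard(p, q))
```

For any pair of pixels the chessboard distance is the smallest of the three and the city-block distance is the largest.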
Key point 3: Convolution & Filters
Linear filtering is essentially convolution, so it is commutative and associative.
Linear spatial filtering:
- Smoothing Filters: blurring, noise reduction. The coefficients sum to 1, so flat regions are left unchanged by smoothing
- Sharpening Filters: highlight fine details. The coefficients sum to 0, so flat regions vanish (become 0) after sharpening
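The sum-to-1 and sum-to-0 properties can be checked on a flat patch. This is a sketch assuming numpy; the 3x3 box kernel and the 4-neighbour Laplacian are just convenient examples of a smoothing and a sharpening kernel, and `filter_valid` is a hypothetical helper.

```python
import numpy as np

def filter_valid(img, kernel):
    """Slide `kernel` over `img` (no padding) and sum the weighted pixels.

    Strictly this is correlation; for the symmetric kernels used here
    it gives the same result as convolution.
    """
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

box = np.full((3, 3), 1 / 9)                   # coefficients sum to 1
laplacian = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=float)  # coefficients sum to 0

flat = np.full((5, 5), 100.0)                  # a featureless region
smoothed = filter_valid(flat, box)             # stays at 100
sharpened = filter_valid(flat, laplacian)      # drops to 0
```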
Order-statistics filtering (generally non-linear, so the commutative/associative properties of convolution do not apply):
- Median filters. Reduce salt-and-pepper noise.
- Max filters. Reduce pepper noise (black dots, value 0)
- Min filters. Reduce salt noise (white dots, value 1)
- Midpoint filters (\(\frac{1}{2}[\max+\min]\)). Reduce Gaussian noise or uniform noise.
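A rough sketch of the first three filters on a tiny image with one pepper pixel and one salt pixel, assuming numpy. The helper `order_filter`, the 3x3 window and the edge padding are my own choices for illustration.

```python
import numpy as np

def order_filter(img, stat, size=3):
    """Replace each pixel by `stat` of its size x size neighbourhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")  # repeat border pixels
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = stat(padded[y:y + size, x:x + size])
    return out

img = np.full((5, 5), 0.5)  # mid-gray background
img[1, 1] = 0.0             # pepper (black dot)
img[3, 3] = 1.0             # salt (white dot)

median = order_filter(img, np.median)  # removes both outliers
maxed = order_filter(img, np.max)      # removes pepper, but spreads salt
mined = order_filter(img, np.min)      # removes salt, but spreads pepper
```

Note the trade-off: the max filter removes the black dot but grows the white one, and the min filter does the opposite, which is why each is suited to only one kind of impulse noise.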
Key point 4: Color models
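RGB model: a full-color image has a pixel depth of 24 bits, giving \(2^{24}\) distinct colors.
YIQ model: the Y component is luminance, while I and Q are the chrominance components. The coefficients of I and of Q each sum to 0 so that neither captures any luminance information (when \(R=G=B\) the pixel is a pure gray and carries only luminance).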
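The zero-sum property can be checked numerically. The matrix below uses the commonly quoted rounded NTSC coefficients, which is an assumption on my part; the course slides may list slightly different values.

```python
# Rows give the (R, G, B) coefficients of Y, I and Q.
# Commonly quoted rounded NTSC values (assumed, not taken from the slides).
RGB_TO_YIQ = [
    (0.299, 0.587, 0.114),    # Y: sums to 1
    (0.596, -0.274, -0.322),  # I: sums to 0
    (0.211, -0.523, 0.312),   # Q: sums to 0
]

def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to YIQ."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in RGB_TO_YIQ)

# A gray pixel (R = G = B) should have luminance only.
y, i, q = rgb_to_yiq(0.5, 0.5, 0.5)
```

Up to floating-point rounding, `y` equals the gray value while `i` and `q` are zero.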
Feature Extraction
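Key point 0:
Note one convention first: the image matrix has its origin \((0,0)\) at the top-left corner, with the \(x\) axis horizontal and the \(y\) axis vertical. A numpy array, however, is still indexed row first and then column, so the pixel at \((x,y)\) should be written as img[y][x] in code.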
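A tiny example of the convention, assuming numpy. The image size and the pixel position are arbitrary.

```python
import numpy as np

height, width = 3, 5
img = np.zeros((height, width), dtype=np.uint8)  # shape is (rows, cols) = (y, x)

x, y = 4, 1        # 5th column, 2nd row
img[y][x] = 255    # row index first; equivalent to img[y, x]

# img[x][y] would raise IndexError here, because x = 4 exceeds the last row index 2.
```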
Key point 1: Canny edge detection
The Canny edge detection algorithm proceeds as follows (2D edge detection):
- smooth the image \(I\) by convolving with a 2D Gaussian kernel \(S=G_{\sigma}*I\)
- find the gradient of the smoothed image \(\nabla S\)
- non-maximal suppression: select only edgels where \(||\nabla S||\) is greater than the local values of \(||\nabla S||\) in the direction of \(\pm \nabla S\) (without this step the detected edges would be thick; it thins each edge down to its ridge)
- threshold the edgels: only strong edgels with \(||\nabla S||\) above a certain value are retained
- hysteresis: some weak edgels are revived if they span the gaps between strong edgels (this keeps edges continuous by retaining some weak edgels)
Key parameter \(\sigma\) (see the grid figure in 3.feature extraction, p. 19):
- fine-scale Gaussian kernel (smaller \(\sigma\)): detects more and finer edges, but is sensitive to noise
- large-scale Gaussian kernel (larger \(\sigma\)): suppresses finer details
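The effect of \(\sigma\) can be seen on a 1D analogue of the first four Canny steps (smooth, differentiate, keep local maxima, threshold). This is only a sketch assuming numpy; the signal, the threshold of 0.02 and the helper names are made up for illustration, and hysteresis is omitted.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def edge_count(signal, sigma, threshold=0.02):
    """Count thresholded local maxima of |dS/dx|, where S = G_sigma * signal."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    padded = np.pad(signal, radius, mode="edge")   # avoid fake edges at the borders
    smooth = np.convolve(padded, k, mode="valid")  # same length as `signal`
    mag = np.abs(np.gradient(smooth))
    # 1D non-maximal suppression (">=" on one side so a two-sample plateau counts once).
    peaks = (mag[1:-1] >= mag[:-2]) & (mag[1:-1] > mag[2:]) & (mag[1:-1] > threshold)
    return int(peaks.sum())

signal = np.zeros(100)
signal[50:] = 1.0       # one strong step edge
signal[20:22] += 0.2    # a small, narrow bump (fine detail)

n_fine = edge_count(signal, sigma=1)    # responds to the bump as well as the step
n_coarse = edge_count(signal, sigma=5)  # the bump is smoothed away
```

With the larger \(\sigma\) only the step survives, while the smaller \(\sigma\) also reports the two sides of the bump.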
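Aperture problem: through a small window, only the motion component perpendicular to an edge can be observed, so to analyze motion in an image we also need corner detection!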

Key point 2: Harris corner detection
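Definition of a Harris corner: if the local window contains a corner, shifting the window in any direction produces a significant intensity change. (Flat region: no intensity change in any direction; edge: an intensity change when shifting perpendicular to the edge, but none along it.)
Mathematically, the definition above is expressed as the squared change in intensity when the image window is shifted by \((u,v)\) (here \(w\) is usually a 2D Gaussian kernel, which gives more weight to the center of the window):
\[ E(u,v)=\sum_{x,y}w(x,y)[I(x+u,y+v)-I(x,y)]^2 \]
A Taylor expansion followed by some matrix algebra approximates this as:
\[ \begin{aligned} E(u,v)&\approx \begin{bmatrix}u&v\end{bmatrix}M\begin{bmatrix} u \\ v\end{bmatrix} \\ \text{where }M&=\sum_{x,y} w(x,y) \begin{bmatrix}I_x^2 & I_xI_y \\I_xI_y & I_y^2\end{bmatrix} \end{aligned} \]
By basic eigenvalue theory, \(\lambda_1\leq E(u,v) \leq \lambda_2\) for a unit shift (\(u^2+v^2=1\)), where \(\lambda_1,\lambda_2\) are the smallest and largest eigenvalues of \(M\), i.e. the minimum and maximum intensity change obtained by shifting the window over all directions. Hence: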
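- case 0: \(\lambda_1\approx \lambda_2\approx 0\): the intensity barely changes as the window moves, so this is a featureless region
- case 1: \(\lambda_1\approx 0\), \(\lambda_2\) is large: clearly an edge; \(\lambda_1\) corresponds to the direction along the edge and \(\lambda_2\) to the direction perpendicular to it
- case 2: \(\lambda_1\), \(\lambda_2\) are both large and distinct: this is a corner!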
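The three cases can be checked on synthetic patches. This sketch assumes numpy and, to keep things short, uses a uniform window \(w(x,y)=1\) instead of a Gaussian; `structure_tensor_eigs` is a hypothetical helper and covers only the eigenvalue analysis, not the full detector.

```python
import numpy as np

def structure_tensor_eigs(patch):
    """Eigenvalues (ascending) of M over the whole patch, with w(x, y) = 1."""
    iy, ix = np.gradient(patch.astype(float))  # axis 0 is y, axis 1 is x
    m = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    return np.linalg.eigvalsh(m)               # symmetric matrix, so eigvalsh applies

flat = np.zeros((9, 9))                        # case 0: featureless

edge = np.zeros((9, 9))
edge[:, 5:] = 1.0                              # case 1: vertical edge

corner = np.zeros((9, 9))
corner[5:, 5:] = 1.0                           # case 2: corner of a bright square

eigs_flat = structure_tensor_eigs(flat)
eigs_edge = structure_tensor_eigs(edge)
eigs_corner = structure_tensor_eigs(corner)
```

The flat patch gives two eigenvalues near zero, the edge patch gives one near zero and one clearly larger, and the corner patch gives two eigenvalues that are both well above zero.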
Note that \(\lambda_1\lambda_2=\det(M)\) and \(\lambda_1+\lambda_2=\text{trace}(M)\); based on this we define the cornerness function \(R\): \[ R=\det(M)-k\cdot \text{trace}(M)^2 \] Harris recommends setting \(k\) to \(0.04\sim 0.06\). That is about enough mathematical intuition; the steps of the Harris corner detection algorithm are:
- convert color image to grayscale image
- compute gradient along \(x\), \(y\) axis at each pixel \(I_x,I_y\)
- form images of \(I_x^2,I_y^2\) and \(I_xI_y\), then smooth them with 2D Gaussian kernel
- form image of the cornerness function \(R\)
- locate local maxima in the image of \(R\) as corners
- compute the coordinates of the corners up to sub-pixel accuracy by quadratic approximation (fitting a quadratic around each maximum refines the corner position to sub-pixel precision)
- threshold the strong corners
The code is below. The underlying math is a bit involved, but the implementation is not complicated [to be released after assignment 2 is due]:
# To be released: after Mar.5
Perspective Projection
Reference
References in the article are from corresponding course materials if not specified.
Course info: COMP3317, Taught by Prof. Kenneth K.Y. Wong
Course textbook: Computer Vision – A Modern Approach, Forsyth & Ponce