加密学之叁 - Hard Problems

COMP3357 digest 的 P3 部分,主要介绍密码学中的几个困难问题 (hard problems) 与一个重要的 cryptographic primitive: 单向函数 (one-way function).

这一 Part 涉及到的困难问题 (Factoring problem, RSA problem, Discrete Logarithm problem, Diffie-Hellman problem) 的构建与 P2 中的数论,群论知识密切相关,注意参考。

同样,与 CnBlogs 中的笔记不同,这里不会详细的记载每个相关的知识点;只会尝试简略地总结一些能让未来的自己也会感到有趣,巧妙的内容 (Eureka!)。


  This article is a self-administered course note.

  It will NOT cover any exam or assignment related content.


Hardness of Mathematical Problems

When discussing hardness of mathematical problems with input some integer \(N\), we study asymptotics in terms of the input length.

Integer \(N\) will be assumed to be represented in binary. The running time of an algorithm taking as input an integer \(N\) is measured in terms of \(||N||\), the length of the binary representation of \(N\).

Note that \(||N||=\lfloor \log N \rfloor+1\).

A problem is considered to be easy if the running time of its algorithm is polynomial in \(||N||\).


Eureka moment!

这里对数学问题困难性 (hardness) 的定义是建立在算法数论 (algorithmic number theory) 的视角上的。

实际上,这个概念更贴近计算机领域而不是纯数学领域。如果从算法时间复杂度的角度来定义,就好理解多了:一个时间复杂度为 \(O(\mathtt{poly}(n))\) 的算法是高效 (efficient) 的,对应的问题是简单 (easy) 的。而时间复杂度为 \(O(\mathtt{exp}(n))\) 的算法是低效 (inefficient) 的,对应的问题是困难 (hard) 的。


Factoring Problem

Factoring could be viewed as the inverse operation of multiplication. Multiplying two primes \(p\) and \(q\) as \(N=p\times q\) is easy, while factorizing \(N\) to derive \(p\) and \(q\) is hard.

Factoring the product of two primes: Given \(N=pq\), where \(p,q\) are primes, find \(p\) and \(q\).

Factoring Experiment

Cryptographically, the hardness of the factoring problem is indicated by the factoring experiment \(\mathtt{Factor}_{\mathscr{A}, Genmodulus}(n)\), where GenModulus is an efficient (polynomial-time) algorithm that, on input \(1^n\), outputs \((N,p,q)\) where \(N=pq\), and \(p,q\) are \(n\)-bit primes except with negligible probability.

Adversary \(\mathscr{A}\) with \(N\) needs to efficiently compute the corresponding \(p,q\).

Definition. Factoring is hard relative to GenModulus if for all probabilistic polynomial-time algorithm \(\mathscr{A}\), there exists a negligible \(\varepsilon(n)\) that \(\Pr[\mathtt{Factor}_{\mathscr{A}, GenModulus}(n)=1]\leq \varepsilon(n)\).

The Factoring Assumption is the assumption that there exists a GenModulus relative to which factoring is hard.


Eureka moment!

这里对数学问题困难性的定义又与上面的算法数论 (algorithmic number theory) 或计算机科学领域的定义不同,是以实验来描述的。是不是很有加密学的特色?

这里还要注意一个概念:密码学假设 (assumption),它将某个数学问题的困难性归约到实验中的一个高效 (efficient) 算法的存在性上。对于大数质因数分解问题 (Factoring problem),其困难性被某个高效的 GenModulus 算法的存在保证。


RSA Problem

Given a modulus \(N\) and integer \(e>2\), such that \(\gcd(e, \varphi(N))=1\).

By the Corollary 2 of the Fermat's Little Theorem (see it in P2), we know that exponentiation to the \(e\)-th power mod \(N\) is a permutation on \((\mathbb{Z}/N)^{*}\);

For any \(y\in (\mathbb{Z}/N)^{*}\), let \([y^{1/e} \mod{N}]\) be the unique element which yields \(y\) when raised to the \(e\)-th power mod \(N\), i.e., the \(e\)-th root of \(y\) mod \(N\).

If \(e\cdot d\equiv 1\pmod{\varphi(N)}\), then raising to the \(d\)-th power is the inverse of raising to the \(e\)-th power, that is, for any \(g\in (\mathbb{Z}/N)^{*}\), \((g^e)^d \equiv g\pmod{N}\). Therefore, if \(y=g^e\), \(y^{1/e}=g=y^d\pmod{N}\).

RSA problem is essentially finding \(e\)-th root of \(y\) mod \(N\): Given \(N,e,y\), compute \([y^{1/e}\mod{N}]\). Note that \(e>2\) and \(\gcd(e, \varphi(N))=1\), \(\gcd(y, N)=1\).

RSA Experiment

Similarly, we have the RSA experiment \(\mathtt{RSA}\)-\(\mathtt{inv}_{\mathscr{A}, GenRSA}(n)\), where GenRSA is an efficient algorithm that, on input \(1^n\), outputs a modulus \(N\) that is the product of two \(n\)-bit primes \(p,q\), as well as integers \(e,d>0\) with \(\gcd(e, \varphi(N))=1\) and \(ed\equiv 1\pmod{\varphi(N)}\).

Adversary \(\mathscr{A}\) with \(N,e\) and a uniform \(y\in (\mathbb{Z}/N)^{*}\) needs to efficiently compute the corresponding \(x\).

Definition. RSA problem is hard relative to GenRSA if for all probabilistic polynomial time algorithm \(\mathscr{A}\), there exists a negligible \(\varepsilon(n)\) that \(\Pr[\mathtt{RSAinv}_{\mathscr{A}, GenRSA}(n)=1]\leq \varepsilon(n)\).

The RSA Assumption is the assumption that there exists a GenRSA algorithm relative to which the RSA problem is hard. Note that GenRSA can be constructed from GenModulus.

RSA and Factoring

The hardness of RSA problem could be reduced to the hardness of factoring problem, which is to say, one could efficiently solve RSA problem if he could efficiently solve factoring problem.

Assume there exists an efficient algorithm \(\mathscr{A}\) that could factorize \(N\) into two distinct primes \(p,q\), then we could construct an efficient \(\mathscr{A}'\) that could solve RSA problem.

  • \(\mathscr{A}'\) runs \(\mathscr{A}(N)\) and obtain two distinct odd primes \(p,q\) that \(N=pq\).
  • \(\mathscr{A}'\) simply computes \(\varphi(N)=(p-1)(q-1)\).
  • \(\mathscr{A}'\) computes \(d\equiv e^{-1}\pmod{\varphi(N)}\) with extended Euclidean algorithm.
  • \(\mathscr{A}'\) could compute \(y^{1/e}\) mod \(N\) efficiently by computing \(y^d\) mod \(N\).


Eureka moment!

RSA 问题本质上是求模意义下的 \(e\) 次方根。注意 RSA 问题中的模数 \(N\) 是两个不同的大奇质数 \(p,q\) 之积。

RSA 问题的巧妙之处在于,利用群的 closure 性质与费马小定理可得,模意义下的 \(e\) 次方根与模意义下的 \(d\) 次方幂是等价的;其中 \(\gcd(e, \varphi(N))=1\), \(d\equiv e^{-1} \pmod{\varphi(N)}\). 也就是说,如果敌手得知 \(\varphi(N)\),那么他就能通过计算得出 \(d\),从而轻松解决 RSA 问题。

然而得知 \(\varphi(N)\) 也是困难的;这由 Factoring problem 的困难性保证。对于只有两个质因数 \(p,q\) 的模数 \(N\), \(\varphi(N)=\varphi(pq)=\varphi(p)\varphi(q)=(p-1)(q-1)\)。所以,得到 \(\varphi(N)\) 的前提是成功将 \(N\) 分解为 \(p,q\)

著名的 RSA 公钥加密就应用了这一性质:在该加密中,公钥 \(pk=e\), 私钥 \(sk=d\)。加密 \(Enc_{pk}(m)=m^e\),解密 \(Dec_{sk}(c)=c^d\) (均在 mod \(N\) 意义下)。RSA 问题的困难性保证了任意不持有私钥的敌手无法由密文 \(c=m^e\) 破解消息 \(m\) (即,得到 \(c\) 在模 \(N\) 意义下的 \(e\) 次方根)。


Discrete Logarithm Problem

Fix a cyclic group \(\mathbb{G}\) of order \(q\), and generator \(g\in\mathbb{G}\). We know that \(\{g^0,g^1,...,g^{q-1}\}=\mathbb{G}\).

Equilently, for every \(h\in\mathbb{G}\), there is a unique \(x\in \mathbb{Z}_q\), such that \(g^x=h\). We define \(\log_g{h}\) to be this \(x\). This is the discrete logarithm of \(h\) with respect to \(g\) in the group \(\mathbb{G}\).

Discrete Logarithm problem in \(\mathbb{G}\): Given \(g\in \mathbb{G}\) and uniform \(h\in \mathbb{G}\), compute \(\log_g{h}\).

Discrete-Logarithm Experiment

Similarly, we have the discrete-logarithm experiment \(\mathtt{DLog}_{\mathscr{A}, \mathscr{G}} (n)\) where \(\mathscr{G}\) is an efficient group-generation algorithm that, on input \(1^n\), outputs a cyclic group \(\mathbb{G}\), its order \(q\), and a generator \(g\).

Adversary \(\mathscr{A}\) with \(\mathbb{G},q,g\) and a uniform \(h\in\mathbb{G}\) needs to efficiently compute correponding \(x\).

Definition. RSA problem is hard relative to \(\mathscr{G}\) if for all probabilistic polynomial time algorithm \(\mathscr{A}\), there exists a negligible \(\varepsilon(n)\) that \(\Pr[\mathtt{DLog}_{\mathscr{A}, \mathscr{G}}(n)=1]\leq \varepsilon(n)\).

The Discrete Logarithm Assumption is the assumption that there exists a group-generation algorithm \(\mathscr{G}\) relative to which the Discrete Logarithm problem is hard.


Eureka moment!

离散对数问题本质上是模意义下的对数运算。另外还需要注意一点,无论是 RSA 实验还是离散对数实验,对敌手的挑战 (challenge) 都是均匀随机 (uniform) 的。

其实,RSA 问题 (模意义下的开次方运算) 与离散问题 (模意义下的对数运算) 均可以视为幂运算 \(h=g^x\)逆运算:RSA 问题给出指数 \(x\),要求由 \(h\) 计算底数 \(g=h^{1/x}\);而离散对数问题给出底数 \(g\),要求由 \(h\) 计算指数 \(x=\log_g h\)


Diffie-Hellman Problem

Fix cyclic group \(\mathbb{G}\) of order \(q\), and generator \(g\in\mathbb{G}\).

Given elements \(h_1,h_2\in \mathbb{G}\), define Diiffie-Hellman function \(DH_g(h_1,h_2)=g^{\log_g h_1\cdot\log_g h_2}\). That is, if \(h_1=g^{x_1}\), \(h_2=g^{x_2}\), then \(DH_g(h_1,h_2)=g^{x_1\cdot x_2}\).

Computational Diffie-Hellman (CDH) in \(\mathbb{G}\): Given \(g\in \mathbb{G}\), and uniform \(h_1,h_2\in\mathbb{G}\), compute \(DH_g(h_1,h_2)\).

Decisional Diffie-Hellman (DDH) in \(\mathbb{G}\): Given \(g\in\mathbb{G}\), and uniform \(h_1,h_2\in\mathbb{G}\), distinguish \(DH_g(h_1,h_2)\) from a uniform group element \(h'\in \mathbb{G}\).

CDH and DDH experiments could be constructed similarly so they are not covered here.

It is easy to see that if one could solve Discrete Logarithm problem, he could also solve CDH problem; and if one could solve CDH problem, he could also solve DDH problem.

So DLog \(\to\) CDH \(\to\) DDH, the assumption is getting stronger and stronger, which is to say, if one could NOT solve DDH problem, it would be impossible for him to solve CDH or DLog problem.


Eureka moment!

Diffie-Hellman 问题的困难性是由离散对数问题的困难性保证的。至于为什么要定义这样一个函数,是由于 Diffie-Hellman 问题定义了两个可以暴露的量 \(h_1,h_2\)。当 Alice 与 Bob 之间需要进行多次信息传输时,利用 Diffie-Hellman 问题的困难性可以很好的隐藏秘密信息。

以著名的 Diffie-Hellman Key-exchange Protocol (密钥交换协议) 为例:Alice 选择某个 \(x_1\),并向 Bob 发送 \(h_1=g^{x_1}\);而 Bob 选择某个 \(x_2\),并向 Alice 发送 \(h_2=g^{x_2}\)。这样,Bob 与 Alice 双方都可以计算出密钥 \(k=g^{x_1x_2}=h_1^{x_2}=h_2^{x_1}\),密钥交换成功。而 Diffie-Hellman 问题的困难性保证窃听到 transcript \(h_1,h_2\) 的敌手无法据此求出 \(k=DH_g(h_1,h_2)\).


One-Way Function

One-Way Functions are a critical cryptographic primitive, that is both necessary and sufficient for private-key encryption and MACs. Informally speaking, a function \(f\) is one-way if it is easy to compute but hard to invert.

We formally define "hard to invert" with the inverting experiment \(\mathtt{Invert}_{\mathscr{A}, f}(n)\).

The Inverting Experiment

  • Challenger \(\mathscr{C}\) choose uniform \(x\in\{0,1\}^n\), and compute \(y:=f(x)\).
  • Adversary \(\mathscr{A}\) is given \(1^n\) and \(y\) as inputs. \(\mathscr{A}\) outputs \(x'\).
  • \(\mathscr{A}\) succeeds and the experiment evaluates to \(1\) if and only if \(f(x')=y\).

Note that the preimage \(x'\) does NOT neccessarily have to equal \(x\), as long as \(f(x')=y\).

Definition. A function \(f:\{0,1\}^{*}\to \{0,1\}^{*}\) is one-way if the following two conditions hold:

  • (Easy to compute): There is a polynomial-time algorithm that on input \(x\) outputs \(f(x)\).
  • (Hard to invert): For all PPT algorithm \(\mathscr{A}\) there is a negligible function \(\varepsilon(n)\) such that: \(\Pr[\mathtt{Invert}_{\mathscr{A}. f}(n)=1]\leq \varepsilon(n)\).

One-Way Functions Based on Assumptions

We could construct one-way functions based on the above four assumptions (factoring assumption, RSA assumption, discrete logarithm assumption, Diffie-Hellman assumption).

We illustrate this by using factoring assumption to construct a one-way function.

Given \(1^n\) and GenModulus, define the (deterministic) function \(f_{Gen}\) as follows:

  • Input: A random tage: string \(x\) of length \(n\)
  • Output: Integer \(N\).
  • Compute \((N,p,q):=GenModulus(1^n;x)\) and return \(N\).

Theorem. If the Factoring problem is hard relative to GenModulus, then \(f_{Gen}\) is one-way.

Hard-Core Predicates

For a one-way function \(f\), it does not exist any PPT adversary that could efficiently invert it. However, the partial information of the preimage may be easily revealed.

Consider the function \(f:\{0,1\}^{2n}\to \{0,1\}^{2n}\) and one-way function \(g:\{0,1\}^n\to \{0,1\}^n\). We could define \(f(x,y)=g(x)||y\). Since \(g\) is one-way, it is trivial that \(f\) is also one-way. However, it is easy to know the last \(n\) bits of the preimage are exactly the last \(n\) bits of \(f(x,y)\).

Though \(f\) is one-way, half of the information about the preimage is revealed.

We define hard-core predicate \(hc\) of a one-way function as the bit that could NEVER be revealed. Informally speaking, \(hc(x)\) is a bit that is efficiently computable given \(x\), but hard to compute given \(f(x)\).

Definition. A function \(hc:\{0,1\}^{*}\to\{0,1\}\) is a hard-core predicate of a function \(f\) if:

  • hc can be computed in polynomial time.
  • For every probabilistic polynomial-time algorithm \(\mathscr{A}\) there is a negligible function \(\varepsilon(n)\) such that \(\Pr_{x\leftarrow \{0,1\}^n}[\mathscr{A}(1^n,f(x))=hc(x)]\leq 1/2+\varepsilon(n)\).

A hard-core predicate for the RSA problem is the least significant bit. That is to say, compute \(lsb(x)\) given \(y=[x^e\mod{N}]\) is hard.


Eureka moment!

做了这么长的铺垫,终于来到了单向函数 (one-way function) 的介绍。我觉得单向函数是密码学中最重要的概念之一,它的定义体现了密码学的核心思想—— Easy to compute, hard to invert.

首先来谈谈它的存在性。之前介绍的三个数学难题 (hard problems) 都很符合单向函数的定义;Factoring problem,质数相乘容易,质因数分解难;RSA problem,模意义下幂运算简单,开方运算难;Discrete Logarithm problem,模意义下幂运算简单,对数运算难;因此,由这些难题的猜想 (assumptions) 构造单向函数再自然不过。

最后谈谈它的意义。密码学中,加密方案 (encryption),消息验证码 (MAC),数字签名 (digital signature) 等加密学原语 (cryptographic primitives) 本质上是一个单向函数构建的过程。更具体一点说,是一个 单向陷门函数 (trapdoor one-way function) 的构建。所谓陷门 (trapdoor) 是指,存在某个 \(z\) 使得计算原像 \(x=f^{-1}(y)\) 简单,则该 \(z\) 称为函数的陷门。

一个单向陷门函数有以下性质:(1) 单向性 (2) 存在陷门。这是否和某些原语的构造十分相似?以公钥加密 (或是 CPA 安全的私钥加密) 为例,加密消息 \(m\) 得到密文 \(c\) 很容易:\(Enc_{pk}(m)\to c\) (公钥 \(pk\) 公开);而由密文 \(c\) 解密得到原文 \(m\) 却很困难。此为单向性 (one-way)。而倘若知道私钥 \(sk\),由密文 \(c\) 解密得到原文 \(m\) 的过程就非常简单:\(Dec_{sk}(c)\to m\)。因此,私钥 \(sk\) 是加密函数 \(Enc_{pk}\)陷门 (trapdoor)


Reference

  This article is a self-administered course note.

  References in the article are from corresponding course materials if not specified.

Course info:

Code: COMP3357, Lecturer: Dr. Ravi. Ramanathan, TA: Zhang Chengru

Course textbook:

Introduction to Modern Cryptography (2nd edition), Katz, J. & Lindell, Y., (2015), CRC Press, view

-----------------------------------そして、次の曲が始まるのです。-----------------------------------