10. Regular Expression Matching
问题
问题是说,我们有一个东西正则表达式(不知道什么是正则表达式不重要),他有一个pattern,可以匹配字符串。匹配的规则就是
- 小写字母代表它本身
- . 代表任意字符
- * 代表他前面的字符可以匹配0次或者1次或者多次
举例来说就是我们有一个pattern a*
, 我们可以匹配a
, aa
,aaaa
,或者是空字符串。
给出一个字符串s,和一个pattern p,判断s是否可以匹配p。
解析
我会尽力还原每一步是如何想到的,将他们联系起来。
对于aaa是否匹配a*匹配问题,我们的思路一般是(可能我们有时候都不会注意到这个路径):
-
首先看s中的第一个a是否匹配p中的第一个,不匹配则停止,匹配则继续看下一个字符;
-
s的第二个字符a是否匹配p的第二个字符,对于p的第二个字符出现了*,我们需要回看p的第一个字符发现是a,那么基于前面的结果(s中的第一个a匹配p中的第一个a),而且当前第二步中,p中*的前一个字符是a,s的当前字符是a,也可以匹配;所以继续匹配
-
s中的第三个a同第二部的匹配思路
-
最终结束匹配。
不知看到这里给予前面的结果一步步构建最终结果,是否可以联想到动态规划的最优子结构。如果可以想到,我们就按照动态规划的思路去构建代码了:
-
首先是看状态定义,根据结果长度为len(s)的字符是否匹配长度为len(p),我们定义
dp[m][n]
。 -
然后转移公式:
- 这里一开始我直接就按照最难的包含.和*的开始推导。根据题解的思路提示,可以先去思考没有星号的情况,就是只包含小写字母和点号。这个时候的话其实匹配的规则就很明朗,就是对应的字符匹配上,或者对应位置是一个点,那么也算匹配,就继续往下匹配即可,反之则结束。
- 然后继续考虑加上星号的情况:
- 如果遇到星号的话,存在几种情况:
- s中的字符和星号之前的字符不相等
- s中的字符和星号之前的字符相等
- 星号之前的字符是.(也可以视为与当前s中的字符和星号之前的点相等)
- 对于以上几种情况
- 第一种,相当于当前的某个字符串c*匹配不上s中当前字符,此时我们
dp[i][j]
的结果需要基于dp[i][j-2]
的结果(此处可以理解为j中这个字符以及星号没用了,匹配了0个字符) - 第二种以及第三种,可能匹配0次,或者一次,也可能匹配多次。匹配0次的话
dp[i][j] = dp[i - 1][j - 2]
, 匹配一次的话dp[i][j] = dp[i - 1][j - 2]
,相当于说p中当前的字符匹配上了s中的*和他前一个字符(总共两个字符),所以是j - 2同第一种。匹配多次的话,我们可以考虑当前的结果是基于上一次的结果(aaa匹配a*基于aa匹配a*的基础),dp[i][j] = dp[i - 1][j]
,
- 第一种,相当于当前的某个字符串c*匹配不上s中当前字符,此时我们
- 如果遇到星号的话,存在几种情况:
-
然后初始化。
-
j和i都为0的时候,两个空字符串一定匹配,所以
dp[0][0] = true;
; -
当p的长度为0时候肯定除了上述的情况(空字符串),其他的都匹配不上,为默认值false;
-
当s的长度为0的时候,能够匹配的pattern应该长得像
a*a*b*c*...
,所以当s长度为0,当前i位置, i - 1是星号,i的结果取决于i - 2的结果:for (int i = 2; i < n + 1; i++) { if (p.charAt(j - 1) == '*' && dp[0][j - 2]) { dp[0][i] = true; } }
可以确定一些偶数位。
-
代码
class Solution {
public boolean isMatch(String s, String p) {
if (s == null || p == null) return false;
int m = s.length();
int n = p.length();
boolean[][] dp = new boolean[m + 1][n + 1];
dp[0][0] = true;
// initialization
for (int i = 2; i < n + 1; i += 2) {
if (p.charAt(i - 1) == '*' && dp[0][i - 2]) {
dp[0][i] = true;
}
}
for (int i = 1; i < m + 1; i++) {
for (int j = 1; j < n + 1; j++) {
if (s.charAt(i - 1) == p.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1];
} else if (p.charAt(j - 1) == '.') {
dp[i][j] = dp[i - 1][j - 1];
} else if (p.charAt(j - 1) == '*') {
if (p.charAt(j - 2) != '.' && s.charAt(i - 1) != p.charAt(j - 2)) {
dp[i][j] = dp[i][j - 2];
} else {
dp[i][j] = dp[i][j - 2] || dp[i - 1][j] || dp[i - 1][j - 2];
}
}
}
}
return dp[m][n];
}
}
- Regular Expression Matching
Problem
The problem is that we have a thing regular expression (it doesn't matter if you don't know what a regular expression is), and he has a pattern that matches strings. The rules for matching are
- lowercase letters for itself
- . represents any character
- * means that the character in front of him can match 0 times or 1 time or more
For example, if we have a pattern a*
, we can match a
, aa
, aaaa
, or the empty string.
Given a string s, and a pattern p, determine if s can match p.
Parsing
I'll try to restore how each step came to mind and connect them.
For the question of whether aaa matches a * match, the general idea is (and we probably don't even notice the path sometimes).
-
First see if the first a in s matches the first in p. Stop if it doesn't match, and move on to the next character;
-
whether the second character a of s matches the second character of p. For the second character of p appears *, we need to look back at the first character of p found to be a, then ** based on the previous result ** (the first a in s matches the first a in p), and the current second step, the previous character of * in p is a, the current character of s is a, can also match; so continue to match
-
the third a in s is the same as the second part of the matching idea
-
finally end matching.
I don't know if it is possible to associate the optimal sub-structure of dynamic programming by seeing the step-by-step construction of the final result given here to the previous result. If it can be thought of, we follow the idea of dynamic programming to build the code: *
-
First is to see state definition, according to the result length of len(s) whether the character matches the length of len(p), we define
dp[m][n]
. -
Then the transfer formula.
- Here at the beginning I will directly follow the most difficult ones containing . and *'s to start the derivation. According to the idea of the problem solution hint, you can ** first go to think about the case without asterisks **, that is, containing only lowercase letters and dotted signs. This time, the rules of matching are very clear, that is, the corresponding characters match, or the corresponding position is a dot, then it is also considered a match, and continue to match down, and vice versa is the end.
- Then consider the case of asterisks.
- If an asterisk is encountered, there are several cases.
- The character in s and the character before the asterisk are not equal
- The character in s and the character before the asterisk are equal
- The character before the asterisk is . (can also be considered equal to the character in the current s and the dot before the asterisk)
- For the above cases
- The first one, which is equivalent to some current string c * does not match the current character in s. At this time, our
dp[i][j]
result needs to be based on the result ofdp[i][j-2]
(here it can be understood that this character in j and the asterisk are useless and match 0 characters) - The second and the third one may match 0 times, or once, or many times. If we match 0 times,
dp[i][j] = dp[i - 1][j - 2]
, if we match once,dp[i][j] = dp[i - 1][j - 2]
, which means that the current character in p matches the * in s and his previous character (two characters in total), so it is j - 2 with the first one. Matching multiple times, we can consider that the current result is based on the previous result (aaa match a* based on aa match a*),dp[i][j] = dp[i - 1][j]
,
- The first one, which is equivalent to some current string c * does not match the current character in s. At this time, our
- If an asterisk is encountered, there are several cases.
-
then initialize.
-
When both j and i are 0, the two empty strings must match, so
dp[0][0] = true;
; -
when the length of p is 0 must not match except in the above case (empty string), for the default value of false.
-
When the length of s is 0, the pattern that can be matched should look like
a*a*b*c*...
, so when the length of s is 0, the current position i, i - 1 is an asterisk and the result of i depends on the result of i - 2.for (int i = 2; i < n + 1; i++) { if (p.charAt(j - 1) == '*' && dp[0][j - 2]) { dp[0][i] = true; } }
Some even digits can be determined.
-
Code
class Solution {
public boolean isMatch(String s, String p) {
if (s == null || p == null) return false;
int m = s.length();
int n = p.length();
boolean[][] dp = new boolean[m + 1][n + 1];
dp[0][0] = true;
// initialization
for (int i = 2; i < n + 1; i += 2) {
if (p.charAt(i - 1) == '*' && dp[0][i - 2]) {
dp[0][i] = true;
}
}
for (int i = 1; i < m + 1; i++) {
for (int j = 1; j < n + 1; j++) {
if (s.charAt(i - 1) == p.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1];
} else if (p.charAt(j - 1) == '.') {
dp[i][j] = dp[i - 1][j - 1];
} else if (p.charAt(j - 1) == '*') {
if (p.charAt(j - 2) ! = '.' && s.charAt(i - 1) ! = p.charAt(j - 2)) {
dp[i][j] = dp[i][j - 2];
} else {
dp[i][j] = dp[i][j - 2] || dp[i - 1][j] || dp[i - 1][j - 2];
}
}
}
}
return dp[m][n];
}
}