10. Regular Expression Matching

问题

问题是说，我们有一个东西正则表达式（不知道什么是正则表达式不重要），他有一个pattern，可以匹配字符串。匹配的规则就是

小写字母代表它本身
. 代表任意字符
* 代表他前面的字符可以匹配0次或者1次或者多次

举例来说就是我们有一个pattern a*, 我们可以匹配a, aa,aaaa,或者是空字符串。

给出一个字符串s，和一个pattern p，判断s是否可以匹配p。

解析

我会尽力还原每一步是如何想到的，将他们联系起来。

对于aaa是否匹配a*匹配问题，我们的思路一般是（可能我们有时候都不会注意到这个路径）：

首先看s中的第一个a是否匹配p中的第一个，不匹配则停止，匹配则继续看下一个字符;
s的第二个字符a是否匹配p的第二个字符，对于p的第二个字符出现了*，我们需要回看p的第一个字符发现是a，那么基于前面的结果（s中的第一个a匹配p中的第一个a），而且当前第二步中，p中*的前一个字符是a，s的当前字符是a，也可以匹配；所以继续匹配
s中的第三个a同第二部的匹配思路
最终结束匹配。

不知看到这里给予前面的结果一步步构建最终结果，是否可以联想到动态规划的最优子结构。如果可以想到，我们就按照动态规划的思路去构建代码了：

首先是看状态定义，根据结果长度为len(s)的字符是否匹配长度为len(p)，我们定义dp[m][n]。
然后转移公式：
- 这里一开始我直接就按照最难的包含.和*的开始推导。根据题解的思路提示，可以先去思考没有星号的情况，就是只包含小写字母和点号。这个时候的话其实匹配的规则就很明朗，就是对应的字符匹配上，或者对应位置是一个点，那么也算匹配，就继续往下匹配即可，反之则结束。
- 然后继续考虑加上星号的情况：
  - 如果遇到星号的话，存在几种情况：
    - s中的字符和星号之前的字符不相等
    - s中的字符和星号之前的字符相等
    - 星号之前的字符是.(也可以视为与当前s中的字符和星号之前的点相等)
  - 对于以上几种情况
    - 第一种，相当于当前的某个字符串c*匹配不上s中当前字符，此时我们dp[i][j] 的结果需要基于dp[i][j-2]的结果（此处可以理解为j中这个字符以及星号没用了，匹配了0个字符）
    - 第二种以及第三种，可能匹配0次，或者一次，也可能匹配多次。匹配0次的话dp[i][j] = dp[i - 1][j - 2], 匹配一次的话dp[i][j] = dp[i - 1][j - 2],相当于说p中当前的字符匹配上了s中的*和他前一个字符（总共两个字符），所以是j - 2同第一种。匹配多次的话，我们可以考虑当前的结果是基于上一次的结果（aaa匹配a*基于aa匹配a*的基础），dp[i][j] = dp[i - 1][j],
然后初始化。
- j和i都为0的时候，两个空字符串一定匹配，所以dp[0][0] = true;;
- 当p的长度为0时候肯定除了上述的情况（空字符串），其他的都匹配不上，为默认值false；
- 当s的长度为0的时候，能够匹配的pattern应该长得像a*a*b*c*...，所以当s长度为0，当前i位置, i - 1是星号，i的结果取决于i - 2的结果：
```
for (int i = 2; i < n + 1; i++) {
    if (p.charAt(j - 1) == '*' && dp[0][j - 2]) {
        dp[0][i] = true;
    }
}
```
  可以确定一些偶数位。

代码

class Solution {
    public boolean isMatch(String s, String p) {
        if (s == null || p == null) return false;
        int m = s.length();
        int n = p.length();
        
        boolean[][] dp = new boolean[m + 1][n + 1];
        
        dp[0][0] = true;
        
        // initialization
        for (int i = 2; i < n + 1; i += 2) {
            if (p.charAt(i - 1) == '*' && dp[0][i - 2]) {
                dp[0][i] = true;
            }
        }
        
        for (int i = 1; i < m + 1; i++) {
            for (int j = 1; j < n + 1; j++) {
                if (s.charAt(i - 1) == p.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else if (p.charAt(j - 1) == '.') {
                    dp[i][j] = dp[i - 1][j - 1];
                } else if (p.charAt(j - 1) == '*') {
                    if (p.charAt(j - 2) != '.' && s.charAt(i - 1) != p.charAt(j - 2)) {
                        dp[i][j] = dp[i][j - 2];
                    } else {
                        dp[i][j] = dp[i][j - 2] || dp[i - 1][j] || dp[i - 1][j - 2];
                    }
                }
            }
        }
        return dp[m][n];
    }
}

Regular Expression Matching

Problem

The problem is that we have a thing regular expression (it doesn't matter if you don't know what a regular expression is), and he has a pattern that matches strings. The rules for matching are

lowercase letters for itself
. represents any character
* means that the character in front of him can match 0 times or 1 time or more

For example, if we have a pattern a*, we can match a, aa, aaaa, or the empty string.

Given a string s, and a pattern p, determine if s can match p.

Parsing

I'll try to restore how each step came to mind and connect them.

For the question of whether aaa matches a * match, the general idea is (and we probably don't even notice the path sometimes).

First see if the first a in s matches the first in p. Stop if it doesn't match, and move on to the next character;
whether the second character a of s matches the second character of p. For the second character of p appears *, we need to look back at the first character of p found to be a, then ** based on the previous result ** (the first a in s matches the first a in p), and the current second step, the previous character of * in p is a, the current character of s is a, can also match; so continue to match
the third a in s is the same as the second part of the matching idea
finally end matching.

I don't know if it is possible to associate the optimal sub-structure of dynamic programming by seeing the step-by-step construction of the final result given here to the previous result. If it can be thought of, we follow the idea of dynamic programming to build the code: *

First is to see state definition, according to the result length of len(s) whether the character matches the length of len(p), we define dp[m][n].
Then the transfer formula.
- Here at the beginning I will directly follow the most difficult ones containing . and *'s to start the derivation. According to the idea of the problem solution hint, you can ** first go to think about the case without asterisks **, that is, containing only lowercase letters and dotted signs. This time, the rules of matching are very clear, that is, the corresponding characters match, or the corresponding position is a dot, then it is also considered a match, and continue to match down, and vice versa is the end.
- Then consider the case of asterisks.
  - If an asterisk is encountered, there are several cases.
    - The character in s and the character before the asterisk are not equal
    - The character in s and the character before the asterisk are equal
    - The character before the asterisk is . (can also be considered equal to the character in the current s and the dot before the asterisk)
  - For the above cases
    - The first one, which is equivalent to some current string c * does not match the current character in s. At this time, our dp[i][j] result needs to be based on the result of dp[i][j-2] (here it can be understood that this character in j and the asterisk are useless and match 0 characters)
    - The second and the third one may match 0 times, or once, or many times. If we match 0 times, dp[i][j] = dp[i - 1][j - 2], if we match once, dp[i][j] = dp[i - 1][j - 2], which means that the current character in p matches the * in s and his previous character (two characters in total), so it is j - 2 with the first one. Matching multiple times, we can consider that the current result is based on the previous result (aaa match a* based on aa match a*), dp[i][j] = dp[i - 1][j],
then initialize.
- When both j and i are 0, the two empty strings must match, so dp[0][0] = true;;
- when the length of p is 0 must not match except in the above case (empty string), for the default value of false.
- When the length of s is 0, the pattern that can be matched should look like a*a*b*c*... , so when the length of s is 0, the current position i, i - 1 is an asterisk and the result of i depends on the result of i - 2.
```
for (int i = 2; i < n + 1; i++) {
    if (p.charAt(j - 1) == '*' && dp[0][j - 2]) {
        dp[0][i] = true;
    }
}
```
  Some even digits can be determined.

Code

class Solution {
    public boolean isMatch(String s, String p) {
        if (s == null || p == null) return false;
        int m = s.length();
        int n = p.length();
        
        boolean[][] dp = new boolean[m + 1][n + 1];
        
        dp[0][0] = true;
        
        // initialization
        for (int i = 2; i < n + 1; i += 2) {
            if (p.charAt(i - 1) == '*' && dp[0][i - 2]) {
                dp[0][i] = true;
            }
        }
        
        for (int i = 1; i < m + 1; i++) {
            for (int j = 1; j < n + 1; j++) {
                if (s.charAt(i - 1) == p.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else if (p.charAt(j - 1) == '.') {
                    dp[i][j] = dp[i - 1][j - 1];
                } else if (p.charAt(j - 1) == '*') {
                    if (p.charAt(j - 2) ! = '.' && s.charAt(i - 1) ! = p.charAt(j - 2)) {
                        dp[i][j] = dp[i][j - 2];
                    } else {
                        dp[i][j] = dp[i][j - 2] || dp[i - 1][j] || dp[i - 1][j - 2];
                    }
                }
            }
        }
        return dp[m][n];
    }
}

10K's