Implementing Regular Expression Matching
Given two strings S and P where S consists of only lowercase English alphabets while P consists of lowercase English alphabets as well as special characters ‘.’ and ‘*’, the task is to implement a function to test regular expression such that:
'.'
Matches any single character.'*'
Matches zero or more of the preceding element.
Note: For each appearance of the character ‘
*'
, there will be a previous valid character to match.
Examples:
Input:
s = "aaa", p = "a"
Output:
false
Explanation:
"a" does not match the entire string "aaa".
Input:
s = "abb", p = "a.*"
Output:
true
Explanation:
replace
.
with b then p becomes ab* now replace * with one preceeding character hence p becomes abb.
Approach: To solve the problem follow the below observations:
- Suppose: s = “aa” and p = “aa”, since all the characters in both s and p are the same, it’s a match.
- Suppose: s = “aabb” and p = “aab*”, We know that substrings bb and b* match because * can be replaced by one b. Since we already know that the remaining substrings “aa” and “aa” are matched, the whole strings are also a match.
What can we infer from this? Right, if we have a solution of part of a problem, we can use that partial result and can go forward. Also, we can use the already calculated result without calculating it again.
Does this ring a bell ????? Yes, this problem satisfies the following two properties –
- Optimal Substructure — Any problem has optimal substructure property if its overall optimal solution can be constructed from the optimal solutions of its subproblems.
- Overlapping Subproblems — Any problem has overlapping sub-problems if finding its solution involves solving the same subproblem multiple times.
It is now evident that we can use Dynamic Programming to solve this problem. Below are the steps —
- Create a boolean 2D
dp
array with sizes asboolean[][] dp = new boolean[s.length() + 1][p.length() + 1]
. We are adding extra 1 to incorporate the case in case either or both of the strings are empty. - If both strings are empty, then it’s a match, thus,
dp[0][0] = true
. - Let’s take an example
s = "aab"
andp = "c*a*b"
and create a DP table.
c |
* |
a |
* |
b |
|||
---|---|---|---|---|---|---|---|
0 |
1 |
2 |
3 |
4 |
5 |
||
0 |
TRUE |
FALSE |
TRUE |
FALSE |
TRUE |
FALSE |
|
a |
1 |
FALSE |
FALSE |
FALSE |
TRUE |
TRUE |
FALSE |
a |
2 |
FALSE |
FALSE |
FALSE |
FALSE |
TRUE |
FALSE |
b |
3 |
FALSE |
FALSE |
FALSE |
FALSE |
FALSE |
TRUE |
- First column — it means
p
is empty and it will match tos
only ifs
is also empty which we have stored indp[0][0]
. Thus, remaining values of dp[0][i] will befalse
. - First row — this is not so easy. It means which
p
matches emptys
. The answer is either an empty pattern or a pattern that represents an empty string such as"a*"
,"x*y*"
,"l*m*n*"
and so on. In the above example, ifs = ""
andp = "c*"
, then due to*
,c
can be replaced by 0c
s which gives us an empty string. Hence,dp[0][2] = true
. - For non-empty strings, let’s say that
s[i - 1] == p[j - 1]
this means the (i – 1)th and (j – 1)th characters are same. This means, we have to check if the remaining strings are a match or not. If they are a match, then the current substrings will be a match, otherwise they won’t be a match i.e.,dp[i][j] = dp[i - 1][j - 1]
. We’re taking (i – 1)th and (j – 1)th characters to offset empty strings as we’re assuming our strings start from index 1. - If
p[j - 1] == "."
, then it means any single character can be matched. Therefore, here also, we will have to check if the remaining string is a match or not. Thus,dp[i][j] = dp[i - 1][j - 1]
. - If
p[j - 1] == "*"
, then it means either it’s represents an empty string (0 characters), thusdp[i][j] = dp[i][j - 2]
ors[i - 1] == p[j - 2] || p[j - 2] == "."
, then current character of string equals the char preceding ‘*'
in pattern so the result isdp[i-1][j]
.
Below is the implementation of above approach:
C++
#include <iostream> #include <vector> using namespace std; bool isMatch(string s, string p) { int rows = s.length(); int columns = p.length(); // Base conditions if (rows == 0 && columns == 0) { return true ; } if (columns == 0) { return false ; } // DP array vector<vector< bool >> dp(rows + 1, vector< bool >(columns + 1, false )); // Empty string and empty pattern are a match dp[0][0] = true ; // Deals with patterns with * for ( int i = 2; i <= columns; i++) { if (p[i - 1] == '*' ) { dp[0][i] = dp[0][i - 2]; } } // For remaining characters for ( int i = 1; i <= rows; i++) { for ( int j = 1; j <= columns; j++) { if (s[i - 1] == p[j - 1] || p[j - 1] == '.' ) { dp[i][j] = dp[i - 1][j - 1]; } else if (j > 1 && p[j - 1] == '*' ) { dp[i][j] = dp[i][j - 2]; if (p[j - 2] == '.' || p[j - 2] == s[i - 1]) { dp[i][j] = dp[i][j] || dp[i - 1][j]; } } } } return dp[rows][columns]; } int main() { cout << boolalpha << isMatch( "aab" , "a.*" ) << endl; return 0; } |
Java
// Java Code for above approach import java.io.*; import java.util.Scanner; public class RegularExpressionMatching { public static boolean isMatch(String s, String p) { int rows = s.length(); int columns = p.length(); /// Base conditions if (rows == 0 && columns == 0 ) { return true ; } if (columns == 0 ) { return false ; } // DP array boolean [][] dp = new boolean [rows + 1 ][columns + 1 ]; // Empty string and empty pattern // are a match dp[ 0 ][ 0 ] = true ; // Deals with patterns with * for ( int i = 2 ; i < columns + 1 ; i++) { if (p.charAt(i - 1 ) == '*' ) { dp[ 0 ][i] = dp[ 0 ][i - 2 ]; } } // For remaining characters for ( int i = 1 ; i < rows + 1 ; i++) { for ( int j = 1 ; j < columns + 1 ; j++) { if (s.charAt(i - 1 ) == p.charAt(j - 1 ) || p.charAt(j - 1 ) == '.' ) { dp[i][j] = dp[i - 1 ][j - 1 ]; } else if (j > 1 && p.charAt(j - 1 ) == '*' ) { dp[i][j] = dp[i][j - 2 ]; if (p.charAt(j - 2 ) == '.' || p.charAt(j - 2 ) == s.charAt(i - 1 )) { dp[i][j] = dp[i][j] | dp[i - 1 ][j]; } } } } return dp[rows][columns]; } public static void main(String[] args) { System.out.println(isMatch("aab", "a.*")); } } |
Python3
def isMatch(s: str , p: str ) - > bool : rows, columns = ( len (s), len (p)) # Base conditions if rows = = 0 and columns = = 0 : return True if columns = = 0 : return False # DP array dp = [[ False for j in range (columns + 1 )] for i in range (rows + 1 )] # Since empty string and empty pattern are a match dp[ 0 ][ 0 ] = True # Deals with patterns containing * for i in range ( 2 , columns + 1 ): if p[i - 1 ] = = '*' : dp[ 0 ][i] = dp[ 0 ][i - 2 ] # For remaining characters for i in range ( 1 , rows + 1 ): for j in range ( 1 , columns + 1 ): if s[i - 1 ] = = p[j - 1 ] or p[j - 1 ] = = '.' : dp[i][j] = dp[i - 1 ][j - 1 ] elif j > 1 and p[j - 1 ] = = '*' : dp[i][j] = dp[i][j - 2 ] if p[j - 2 ] = = '.' or p[j - 2 ] = = s[i - 1 ]: dp[i][j] = dp[i][j] or dp[i - 1 ][j] return dp[rows][columns] if __name__ = = '__main__' : print (isMatch("aa", "a")) print (isMatch("aa", "a * ")) print (isMatch("ab", ".")) print (isMatch("aab", "c * a * b")) print (isMatch("mississippi", "mis * is * p * .")) print (isMatch("", ". * ")) |
C#
using System; public class RegularExpressionMatching { public static bool IsMatch( string s, string p) { int rows = s.Length; int columns = p.Length; // Base conditions if (rows == 0 && columns == 0) { return true ; } if (columns == 0) { return false ; } // DP array bool [][] dp = new bool [rows + 1][]; for ( int i = 0; i <= rows; i++) { dp[i] = new bool [columns + 1]; } // Empty string and empty pattern are a match dp[0][0] = true ; // Deals with patterns with * for ( int i = 2; i <= columns; i++) { if (p[i - 1] == '*' ) { dp[0][i] = dp[0][i - 2]; } } // For remaining characters for ( int i = 1; i <= rows; i++) { for ( int j = 1; j <= columns; j++) { if (s[i - 1] == p[j - 1] || p[j - 1] == '.' ) { dp[i][j] = dp[i - 1][j - 1]; } else if (j > 1 && p[j - 1] == '*' ) { dp[i][j] = dp[i][j - 2]; if (p[j - 2] == '.' || p[j - 2] == s[i - 1]) { dp[i][j] = dp[i][j] || dp[i - 1][j]; } } } } return dp[rows][columns]; } public static void Main( string [] args) { Console.WriteLine(IsMatch( "aab" , "a.*" )); } } |
Javascript
var isMatch = function (s, p) { const rows = s.length; const columns = p.length; /// Base conditions if (rows == 0 && columns == 0) { return true ; } if (columns == 0) { return false ; } // DP array const dp = Array.from({ length: s.length + 1 }, () => [ false ]); // Empty string and empty pattern are a match dp[0][0] = true ; // Deals with patterns with * for (let i = 1; i < columns + 1; i++) { if (p[i - 1] === '*' ) { dp[0][i] = dp[0][i - 2]; } else { dp[0][i] = false ; }; } // For remaining characters for (let i = 1; i < rows + 1; i++) { for (let j = 1; j < columns + 1; j++) { if (p[j - 1] === '*' ) { if (p[j - 2] === s[i - 1] || p[j - 2] === '.' ) { dp[i][j] = dp[i][j - 2] || dp[i - 1][j]; } else { dp[i][j] = dp[i][j - 2]; } } else if (p[j - 1] === s[i - 1] || p[j - 1] === '.' ) { dp[i][j] = dp[i - 1][j - 1]; } else { dp[i][j] = false ; } } } return dp[rows][columns]; }; console.log(isMatch("aa", "a")); console.log(isMatch("aa", "a*")); console.log(isMatch("an", ".")); console.log(isMatch("aab", "c*a*b")); console.log(isMatch("mississippi", "mis*is*p*.")); console.log(isMatch("", ".*")); console.log(isMatch("ab", ".*c")); |
true
Time Complexity: O(m×n)
Auxiliary Space: O(m×n)
Contact Us