139. Word Break

yPhantom 2019年10月17日 37次浏览

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Note:

  • The same word in the dictionary may be reused multiple times in the segmentation.
  • You may assume the dictionary does not contain duplicate words.

Example 1:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
             Note that you are allowed to reuse a dictionary word.

Example 3:

Input: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
Output: false

Solution

work break重复刷了不知道多少遍了emoji people:persevere

DFS

题意让判断字符串s能否被一个词典list给划分。

一开始想的是用DFS,套上之前的套路定义一个dfs函数,然后入参包含start和end。但是一直写不出来...

后参考别的解答写了个最基本的dfs:

class Solution {
    public boolean wordBreak(String s, List<String> wordDict) {
        // 在每轮递归中都判断当前字符串s是否已经在dict中了
        if (wordDict.contains(s)) {
            return true;
        }
        for(int i = 1; i < s.length(); i++) {
            // 判断s[0:i]和s[i:]是不是可以被wordDict划分
            if (wordDict.contains(s.substring(0, i)) && wordBreak(s.substring(i), wordDict)) {
                return true;
            }
        }
        return false;
                
    }
}

结果在如下测试用例的时候直接超时,估计是栈太深了。

"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
["a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"]

因此在讨论有剪枝的解决方案,例如使用一个HashSet来保存:

class Solution {
    private Set<String> visited = new HashSet<>();

    public boolean wordBreak(String s, List<String> wordDict) {
        if (visited.contains(s)) {
            return true;
        }
        if (wordDict.contains(s)) {
            return true;
        }
        for(int i = 1; i < s.length(); i++) {
            if (wordDict.contains(s.substring(0, i)) && wordBreak(s.substring(i), wordDict)) {
                visited.add(s);
                return true;
            }
        }
        return false;
                
    }
}

这个还是TLE超时了,但是使用HashMap却Accepted了

class Solution {
    private Map<String, Boolean> visited = new HashMap<>();

    public boolean wordBreak(String s, List<String> wordDict) {
        if (visited.containsKey(s)) {
            return visited.get(s);
        }
        if (wordDict.contains(s)) {
            return true;
        }
        for(int i = 1; i < s.length(); i++) {
            if (wordDict.contains(s.substring(0, i)) && wordBreak(s.substring(i), wordDict)) {
                visited.put(s, true);
                return true;
            }
        }
        visited.put(s, false);
        return false;
                
    }
}

两个contains方法都是利用hash值进行比较的,主要的时间区别应该是在最后的add/put false的地方。

首先,对于一开始的判断是否访问过相同字符串主要是因为hashmap和hashset的contains方法比wordDict这个list的contains方法要快很多,其次最后面

visited.put(s, false);

我测试了不加这句话,直接就超时了,那么这句话主要是针对什么场景呢?在上面那个超时场景中,实际效果就是

visited.put("b", false);

起到的剪枝效果硬是从TLE到faster than 78.73%....Leetcode的运行时间真迷

DP

看到字符串匹配就想到dp。

假设dp[i]表示s[0:i)可以进行分词,那么dp[i] = dp[j] && (s[j, i) in dict ). 0<= j <i

class Solution {

    public boolean wordBreak(String s, List<String> wordDict) {
        boolean[] dp = new boolean[s.length() + 1];
        dp[0] = true;
        
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i; j++) {
                if (dp[j] && wordDict.contains(s.substring(j, i))) {
                    dp[i] = true;
                }
            }
        }
        return dp[s.length()];
    }
}

Python3版DP

class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        dp = [0 for i in range(len(s) + 1)]
        dp[0] = True
        
        for i in range(1, len(dp)):
            for j in range(0, i):
                if dp[j] and (s[j:i] in wordDict):
                    dp[i] = True
        return dp[len(s)]

DIsscuss里面有个简化版本:

def wordBreak(self, s, words):
    ok = [True]
    for i in range(1, len(s)+1):
        ok += any(ok[j] and s[j:i] in words for j in range(i)),
    return ok[-1]

简化了dp数组的生成,简化了if语法,简化了for j为生成器,最后简化了取dp的最后一个元素......

相关题目

Word Break II