This was an ungraded practice problem for our CS6515 class. There are existing solutions out there, but I found them a bit unintuitive so this is my attempt at explaining the problem a little differently.

## Problem Summary

You are given a string *s* containing *n* characters *s*[1…*n*]. The string is corrupted such that all punctuation has been removed. For example, the string “you are a bold one.” would appear as “youareaboldone”. You are also given a dictionary method, *dict(w)*, that is able to determine whether or not a substring *w* of *s* is a valid word. In other words, *dict(“bold”)* would return *true*, while *dict(“aboldo”)* would return *false*. You are tasked with doing the following:

- Give a dynamic programming algorithm that can tell whether or not a string
*s*contains a sentence of valid words. This algorithm should take at most*O(n*time.^{2}) - If the string is valid, output the sequence of words.

*Refer to page 178 of the textbook for the full, formalized problem description.*

## Solution

I’m going to go about solving this problem similarly to my knapsack problem explanation and do the following:

- State the subproblems
- Define the recurrence
- Describe the memoization table structure
- Whip up some pseudocode
- Perform some quick Big O analysis of the algorithm

### Subproblem

Dynamic programming is all about identifying smaller subproblems, solving them, and using them to build up to the final solution. For this problem we can build off of **smaller prefixes** of *s* that are of length *i* where **0 ≤ i ≤ n**.

Let’s define our subproblem as:

**V( i) = whether or not (true/false) the string s contains only valid words that exist in the dictionary in the string s_{1}…s_{i}**

### Recurrence

The substring of length 0 (empty string) is considered a valid word so our base case starts with the sentence being valid.

**Base Case:****V(0) = true**

Since we need to check the status of substrings within ** s_{1}…s_{i}** in our dictionary we will need to introduce another variable

*j*such that

**0 ≤**. Our recurrence will look as follows:

*i*≤*i***Recurrence:****V(***i*) =*any*( for(*j*in*1*…*i*) {*dict(s[j…i])*AND V(*j*- 1) } )

So what is this saying? Basically for all characters in *s* from 1 to *i* we will loop over the possible substrings using *j*. If the substring of *s* from *j* to *i* is a valid word in the dictionary **and** we recorded the previous word was valid by checking the ending of the previous substring via V(*j* - 1).

### Memoization Table Structure

Given that our subproblem definition takes just a single parameter, *i*, we should be able to get away with a 1-dimensional memoization table of length to contain partial values for strings of length 0 to *n*.

### Pseudocode Implementation

I decided to implement this one in Python instead of just pseudocode. You should be able to run the following on Repl.It using their Python 3 interpreter.

**Part 1 - Is the Sentence Valid?**

```
# Base case
t[0] = True
# Recurrence
for i in range(1, n):
t[i] = False
for j in range(1, i+1):
# i and j correspond to our memoization array t
# which contains an extra element to represent the
# empty substring. We must adjust them to work with
# the s character array.
si = i - 1
sj = j - 1
substr = "".join(s[sj:si+1])
if d[substr] and t[j-1]:
t[i] = True
next_word_idx[j-1] = i
print("The string s contains a valid sentence:")
print(t[n-1])
```

**Part 2 - Output the Words**

```
# Extract the words back out
prev = 0
words = []
for i in next_word_idx:
if i != 0:
word = s[prev:i]
words.append("".join(word))
prev = i
print("")
print("The words in the sentence are:")
print(words)
```

### Runnable Code

You can see the code in action in the following Python repl:

### Big O Analysis

As you can see in the Python pseudocode above, most of the work happens in Part 1. Here we have two nested loops, the outer for *i* from 1 to *n* and the inner for *j* from 1 to *i*. Since *i* is at most *n*, this part ends up being *O(n ^{2})*. In Part 2 when we extract the words back out, this is a simple loop from 0 to

*n*which ends up being

*O(n)*. Part 1 dominates and the

**overall complexity is**.

*O(n*^{2})