Flipping Out
This problem gives us a series of coinflips. The series is generated by several fixed generators which are also serieses of coin flips. When generating the series, if the number of total matches of each generator string with the previously generated output is even, the next output is tails, else heads. The problem supposes there is one generator series missing, and asks how many possibilities are there for the missing generator given the other generators and the intended output series.
We can find what the missing sequences would be by simulating the generation of output string from the given generators. When we reach an output that comes out incorrectly with respect to the known generators, we know that the missing generator must terminate at the character we're on. That makes every string that terminates at that character a candidate. We know, however, that none of those candidates should have appeared EARLIER in the output string, since that would have caused some earlier output to be different, and we know it's not.
Next time we come across an incorrect character, we can check against all our candidate sequences for any that match at that index, eliminating any that don't. If we have no candidate that matches all points which are incorrect in the output string, there is no solution (or if the first character is an H...). If we have a situation where there are NO incorrect characters in the output, there are infinite solutions (any generator that doesn't match any substring of the output will do...of which there are infinite choices).
So now the only question is how to do this efficiently.
The naive method walks the input string and at each step compares with all previous strings. At each step in the output string, we have to compare with all of our generator strings to see if there are any new matches. Given an output string of length n, and the fact that the TOTAL number of characters is 10^6, this bounds the total length of the generator strings to 10^6 - n. Further, the time to do the comparison is bounded by the total length of the generator strings. This makes the total time for all comparisons n * (10^6 - n). This is maximized when n = 10^6/2, which is still far too large.
Instead of going character by character, can we instead pre-compute the location of all string matches? Given the same constraints above, this will still be too slow.
What repeat work are we doing?
consider two generator strings
- TTT
- TTTH
In all our proposed solutions, we would consider them independent. We can see intuitively, however, that trying to match TTT does most of the work for trying to match TTTH. We just have to figure out a way to encode the generator strings to allow us to take advantage of that entropy. The natural solution is to build a prefix tree.
- build a prefix tree of the generator strings
- iterate through the output strings, creating a new pointer into the tree for each new character
- at each character, walk all the current pointers into the tree, traversing them to the appropriate child node for the current character in the output string
- each time we read a new character, update all the pointers
It turns out this is too slow, since we can have n pointers, and n characters.
There is repeat work at play here, as well, since finding
- HTTT
- TTT
Will overlap, but the prefix tree doesn't help there.
fortunately, [1]