I've tried to take a step back and look at this from a more general view.
Here are my two cents:
=== First, the issue of storing the un-segmented content:
There seem to be two models:
A) storing the segmented content as a copy of the original source.
B) marking up the source with specific elements for segmentation.
Solution A requires duplicating the source, which creates a risk of discrepancies between the original and the segmented source (and what do we do in those cases?). It also seems a waste of space: no matter which tool is used, large files always end up being a problem at some point. We should try to avoid making that worse.
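To make the two drawbacks of model A concrete, here is a minimal sketch using two hypothetical data structures (the class and method names are illustrative, not part of any spec). Model B's segmentation elements are approximated here by character offsets into the single source string. The sketch shows that a copy can silently drift from the original once either side is edited, and that it roughly doubles the stored text.

```python
from dataclasses import dataclass


@dataclass
class CopyModel:
    """Hypothetical model A: the original source plus a separate segmented copy."""
    source: str
    segments: list[str]

    def is_consistent(self) -> bool:
        # The copy is only trustworthy if its segments rejoin to the original.
        return "".join(self.segments) == self.source

    def stored_chars(self) -> int:
        # The text is stored twice: once as source, once as segments.
        return len(self.source) + sum(len(s) for s in self.segments)


@dataclass
class MarkupModel:
    """Hypothetical model B: one source, segmentation kept as boundaries in it."""
    source: str
    boundaries: list[int]  # offsets where a new segment starts (0 is implied)

    def segments(self) -> list[str]:
        # Segments are derived from the single source, so they cannot drift.
        cuts = [0, *self.boundaries, len(self.source)]
        return [self.source[a:b] for a, b in zip(cuts, cuts[1:])]

    def stored_chars(self) -> int:
        return len(self.source)


a = CopyModel("Hello. World.", ["Hello. ", "World."])
print(a.is_consistent(), a.stored_chars())  # True 26
a.source = "Hello. Earth."                  # edit only the original
print(a.is_consistent())                    # False: the copy is now stale

b = MarkupModel("Hello. World.", [7])
print(b.segments(), b.stored_chars())       # ['Hello. ', 'World.'] 13
```

With model B there is nothing to reconcile, because the segments are always computed from the one stored source.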
Note that having a separate original-source would be basically the reverse of