Why is the HTML output of a paragraph in Word broken into lots of little spans?

Jan 25, 2009 at 10:20 PM
Edited Jan 25, 2009 at 10:22 PM
The paragraph has exactly the same style throughout, so I don't see why so many spans are needed in the HTML.

Is this how Word stores it, because it needs to manage the data in little chunks to enable tracking and translation?
Mar 3, 2009 at 2:57 AM

You are correct in assuming that Word stores data in chunks. These chunks correspond to the user sessions during which the text was entered into the document.

The .docx file is a zip package, so just rename the document to .zip and uncompress the files to a folder of your choice. You will find a collection of XML files. These files contain all the information needed to recreate the Word document.
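
If you would rather not rename and unzip by hand, the same inspection can be sketched in a few lines of Python. This is only a rough illustration, not an official tool: the function names list_parts and runs_per_paragraph are made up for this post, and it assumes the main body lives in word/document.xml and that each chunk is a w:r (run) element, optionally carrying a w:rsidR attribute that identifies the editing session. It only looks at runs that are direct children of a paragraph, so runs nested inside hyperlinks and similar wrappers are skipped.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def list_parts(docx):
    """Return the names of the files inside the .docx zip package.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def runs_per_paragraph(docx):
    """Return one list per paragraph, holding an (rsidR, text) tuple per run.

    rsidR is None when the run carries no w:rsidR attribute.
    Only runs that are direct children of the paragraph are reported.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = []
        for run in para.findall("w:r", NS):
            # A run can hold several w:t text elements; join them.
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            runs.append((run.get(f"{{{W_NS}}}rsidR"), text))
        paragraphs.append(runs)
    return paragraphs
```

Calling runs_per_paragraph("mydoc.docx") (the file name is just a placeholder) on a document like yours should show several runs inside the one paragraph. Adjacent runs that differ only in their rsid value would be the chunks that end up as separate spans in the HTML.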

For more details on the file format, you can look at 


Best regards