Using for SharePoint document conversion

Jul 23, 2009 at 6:49 PM

Great utility! I've been trying the console application and it works really well. I'm considering using it as an alternative to SharePoint's document conversion service, because that service doesn't support embedded images (it doesn't know where to put them). I'd like to use OpenXMLViewer.exe to convert uploaded .docx files, but I have two problems:

1) There are no physical paths for the uploaded documents. I can get the web URI or the actual binary contents of a file, but not the physical machine location.

2) The HTML file is written to the destination folder and embedded images are written to a subfolder (word\media), but I need the file paths and names of the HTML file and its associated images so that I can upload them to a document library.

I was looking at the code and it isn't immediately obvious how I can do this. I'm hoping you can point me in the right direction to enhance the console app to do this.


Jul 23, 2009 at 11:37 PM

After looking at the latest source code, I noticed that the C# file program.cs does perform a conversion, but when I compiled and ran it, it didn't extract embedded images. Is this intentional? Maybe I'm missing something...

Jul 24, 2009 at 4:19 AM


Good questions there.

1. I am afraid the application needs a fully qualified path to work with. We will look at this in the next release. For now, there are two possible approaches:

        a. Wrap OpenXMLViewer.exe in another application layer that can determine the actual physical path and call the exe.

        b. The C++ source code is available in the source branch. The code can be modified and rebuilt to get the relative path.

2. The generated file is named <document name>.html or <document name>.xhtml, so its name can be derived from the document name. For the images, however, a code change may be needed. The current application uses the file names as stored in the docx, so an image file may be overwritten if multiple documents use the same image file name and are extracted to the same output folder.
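
To make approach 1a and the overwrite concern in point 2 a bit more concrete, here is a rough Python sketch of a wrapper. It is only an illustration under assumptions, not part of the tool: the function names, the per-document folder layout, and especially the command line in default_command are hypothetical, so check the real arguments of OpenXMLViewer.exe before using anything like this. The idea is to write the document bytes into a fresh temporary folder (which gives the exe a fully qualified path), run the converter with its own output folder, and then list everything it wrote together with a per-document upload target so images with the same stored name from different documents cannot collide.

```python
import subprocess
import tempfile
from pathlib import Path


def default_command(docx_path, output_dir):
    # ASSUMPTION: the exe location and argument order are guesses, not the
    # documented command line. Replace this with the real invocation.
    return ["OpenXMLViewer.exe", str(docx_path), str(output_dir)]


def stage_and_convert(name, data, build_command=default_command, work_root=None):
    """Write document bytes to disk, run the converter, and map each
    generated file to a library-relative upload target."""
    stem = Path(name).stem

    # A fresh folder per document gives the converter a fully qualified
    # path and keeps each document's output separate from the others.
    work_dir = Path(tempfile.mkdtemp(prefix=stem + "_", dir=work_root))
    docx_path = work_dir / Path(name).name
    docx_path.write_bytes(data)

    out_dir = work_dir / "out"
    out_dir.mkdir()

    # check=True raises CalledProcessError if the converter exits non-zero.
    subprocess.run(build_command(docx_path, out_dir), check=True)

    # Keep the relative layout (for example word/media/...) so links inside
    # the generated HTML still resolve, but place it under a folder named
    # after the document (a hypothetical library layout).
    uploads = {}
    for path in sorted(out_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(out_dir).as_posix()
            uploads[path] = f"{stem}/{relative}"
    return uploads
```

Here data would be the binary contents you can already read from SharePoint, and each key in the returned dictionary is a physical file you can upload to the document library under the matching target name. The temporary folder is left in place, so delete it once the upload has finished.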

The C# code branch is legacy code and is not supported, so it may not function as expected.

Best regards,