
How to use Microsoft Clipart files on Linux/Unix

Users of Microsoft Office have access to the Microsoft ClipArt Library. The downloaded files can easily be used with the associated Clip Organizer, which runs on Windows, of course. But if you want to switch to Linux, there seems to be no software to handle the clip art files with the .mpf extension. Since the .mpf files are plain XML with the graphics embedded as Base64 strings, it’s easy to extract the files.

How to extract Base64 from XML

The binary data in the .mpf files is wrapped in elements of type C:resource, where C is the namespace prefix for urn:schemas-microsoft-com:office:clipgallery. A C:resource has child elements named C:filepath and C:contents. C:filepath contains the file name of the clip art, and C:contents holds the Base64-encoded data. The following XSLT extracts the Base64-encoded files from a .mpf file.

<?xml version="1.0" ?>
<xslt:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
  xmlns:D="DAV:"
  xmlns:C="urn:schemas-microsoft-com:office:clipgallery"
  xmlns:dt="urn:schemas-microsoft-com:datatypes"
  xmlns:exsl="http://exslt.org/common"
  extension-element-prefixes="exsl"
>
  <xslt:template match="/">
    <xslt:apply-templates select="//C:resource"/>
  </xslt:template>
  <xslt:template match="C:resource">
    <exsl:document href="/tmp/{./C:filepath}.b64" method="text">
      <xslt:value-of
       select="translate(normalize-space(./C:contents),' ','')"/>
    </exsl:document>
  </xslt:template>
</xslt:stylesheet>

It uses an EXSLT extension to generate more than one output file in a single XSLT run. I chose this over the standardized XSLT 2.0 instruction because the EXSLT extension is supported by libxslt/xsltproc, which is available on most Linux systems. If you want to switch to Saxon, just replace the exsl:document element with xsl:result-document.
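For reference, a run with xsltproc could be wrapped as shown here. This is only a minimal sketch: the function name extract_mpf and the file names in the usage comment are placeholders of mine, and it assumes xsltproc is installed and allowed to write to /tmp.

```shell
# Run the stylesheet over one .mpf file.
# extract_mpf is a hypothetical helper name; the two arguments are the
# stylesheet and the downloaded .mpf file.
# Usage (file names are placeholders): extract_mpf extract-mpf.xsl clipart.mpf
extract_mpf() {
  stylesheet=$1
  mpf_file=$2
  # The .b64 files are written by exsl:document itself, so the main
  # output on stdout carries nothing useful and is discarded.
  xsltproc "$stylesheet" "$mpf_file" > /dev/null
}
```

The function returns the exit status of xsltproc, so a script can stop early if the transformation fails.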

Running the above XSLT with xsltproc on your downloaded .mpf file produces one or more .b64 files in the /tmp directory. The final step is to decode the Base64 data.

cd /tmp
for FILE in *.b64
do
  # Strip the .b64 suffix to recover the original file name
  base64 -d "$FILE" > "$(basename "$FILE" .b64)"
done

These few lines rely on the standard GNU command line tools base64 and basename. They work fine in bash, and probably in some other shells, and extract the desired binary files to the /tmp directory.
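If the .b64 files live somewhere other than /tmp, or if clip names may contain spaces, the decoding step can be wrapped in a small function. The following is a sketch under those assumptions, not a tested part of the recipe above: decode_b64_dir is a made-up name, and it relies on GNU base64 just like the loop above.

```shell
# Decode every .b64 file in a directory (default: /tmp), writing each
# result next to its source file without the .b64 suffix.
# decode_b64_dir is a hypothetical helper name.
decode_b64_dir() {
  dir=${1:-/tmp}
  for file in "$dir"/*.b64; do
    # If nothing matches, the pattern stays literal; skip it
    [ -e "$file" ] || continue
    target=${file%.b64}
    if base64 -d "$file" > "$target"; then
      echo "decoded: $target"
    else
      # Do not leave a truncated file behind after a failed decode
      echo "failed:  $file" >&2
      rm -f "$target"
    fi
  done
}
```

Calling decode_b64_dir without an argument behaves like the loop above and works on /tmp. Passing a directory, for example decode_b64_dir "$HOME/clips", decodes the files there instead.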

Note: The Microsoft Clip Art library is copyright protected. Please read the legal information before downloading any clip art. The above source code is published under a CC licence.
