The Mysterious World of Metadata – Article

[Written in January 2005]
A. Introduction
Recent stories about lawyers releasing documents containing embarrassing hidden data have highlighted the dangers of “metadata,” especially in documents created with Microsoft Office programs. Unfortunately, other lawyers who do not learn how to deal with metadata will suffer the same public humiliation. Metadata may not be the most important issue in electronic discovery, but lawyers must be familiar with it, because ignoring a problem this well publicized will have negative consequences.
B. What is “Metadata” and Why We Should Care About It
The hidden data we call metadata is another example of a helpful feature that has some unfortunate negative consequences. The term is occasionally used in a limited or otherwise imprecise way, so let me give you my definition.
1. Defining the Term
“Meta” comes from a Greek word meaning “after” or “beyond,” but in “metadata” it is used in the sense of “about”: data about data. Metadata refers to certain data that are associated with a document, but are not generally visible in the ordinary display or printing of the document. Common examples include comments, markup and revisions, author, owner and other information, and even records of versions. Although metadata is often discussed in connection with Microsoft Office documents, it can be created by many software programs.
2. Why Metadata Exists
Metadata is not inherently bad. It depends on the context in which we find it and on who is viewing or using it. For many purposes, especially for collaborating on documents, this information is helpful and valuable. The “Track Changes” features, versioning, document and author information and other metadata can be very useful when several people work on a document. Once the document moves out of “friendly hands,” however, it can cause some damage if it is revealed, ranging from embarrassment to devastation of your case. Imagine the consequences if a document included a different settlement figure or candid comments about the strength or weakness of certain points.
3. Good Metadata and Bad Metadata
While it is tempting to think in terms of “good” metadata and “bad” metadata, it is more useful to think in terms of the amount and types of information that a particular piece of metadata carries. Some metadata is all but innocuous – file name, file type, creation date and the like. However, in certain cases, this information can turn out to be key evidence in a case. Other metadata is rich in information content – comments and revisions, for example – and you would generally not want this information to fall into someone else’s hands. The context is what is important. A document might have more than one hundred metadata items associated with it. Unless you know what metadata exists, you cannot make good decisions about it.
It’s also worth noting that some metadata may be altered or incorrect. For example, fields in the document properties, such as author, may be edited, and the “statistics” information for some Word documents bears no relation to reality.
C. Metadata You Might Find – Microsoft Word Example
Microsoft Word metadata gets the bulk of the attention these days, so let’s take a closer look at it. Do you know how to check for metadata in Word documents? Microsoft’s website is a useful resource for information about this hidden data.
1. Document Properties
Even if they are aware of metadata being created and associated with a document, many people do not realize how simple it is to view the metadata in documents. We will not go into much detail here, but spending 5 to 10 minutes under the Help menu in Word or on Google will open up new worlds for you.
For a quick example, simply open a Microsoft Word document and click on “Properties” under the “File” menu. You’ll find a screen that will allow you to see the wide range of metadata that is and can be associated with a Word document. People have been embarrassed by nearly all of these items, from revealing that someone outside the firm was the original author of an agreement to showing only a few minutes of actual editing time on a document for which many hours of preparation time were charged. Again, it’s not so much the information itself – it’s the context that matters.
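If you would rather look under the hood than click through menus, the same kind of property information can be read with a short script. The following is only a minimal sketch, and it rests on an assumption that goes beyond this article: it reads the newer .docx format, which is a ZIP archive of XML parts, and not the binary .doc files produced by the versions of Word discussed here. The function name read_properties and the labels it returns are hypothetical names chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the property parts of a .docx package.
# ASSUMPTION: the file is a .docx (ZIP + XML), not a binary .doc.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Illustrative labels mapped to the property elements to look for.
CORE_FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "revision": "cp:revision",
}
APP_FIELDS = {
    "editing_minutes": "ep:TotalTime",  # total editing time, in minutes
    "company": "ep:Company",
}


def read_properties(docx_file):
    """Return a dict of the selected property fields present in the file."""
    found = {}
    with zipfile.ZipFile(docx_file) as package:
        parts = set(package.namelist())
        for part, fields in (("docProps/core.xml", CORE_FIELDS),
                             ("docProps/app.xml", APP_FIELDS)):
            if part not in parts:
                continue  # some tools omit a property part entirely
            root = ET.fromstring(package.read(part))
            for label, path in fields.items():
                node = root.find(path, NS)
                if node is not None and node.text:
                    found[label] = node.text
    return found
```

Calling read_properties on a file path (or an open file object) returns only the fields that are actually present, so an empty result means no listed property was found, not that the document is free of metadata. As noted above, these fields can be edited and are sometimes wrong, so treat the values as leads and not as proof.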
2. Track Changes and Comments
Everyone’s favorite forms of metadata are “Track Changes” and comments. An opposing party or even a judge can turn the display of tracked changes back on in a document after you thought you had turned it off. I’m sure you have heard about plenty of embarrassing and costly examples. The sensitivity of this information is obvious.
You simply must learn how these features work and what precautions to take. Note that Office 2003 has built in some warnings and settings to help you out. Note too that you can set up Word to reveal hidden information in documents, which helps you see what is in your documents and, of course, will let you see what might be in documents that are sent to you.
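To make the point concrete, here is a minimal sketch of how a script could check whether a document still carries tracked changes or comments, whatever the display settings happen to be. As an assumption beyond this article, it works on the newer .docx format (a ZIP archive of XML parts) and not on the binary .doc files of Office 2003 and earlier. The function name find_hidden_revisions is hypothetical, and the check is deliberately narrow: it counts tracked insertions, tracked deletions and comments, and nothing else.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used inside a .docx package.
# ASSUMPTION: the file is a .docx (ZIP + XML), not a binary .doc.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_hidden_revisions(docx_file):
    """Count tracked insertions, tracked deletions and comments in the file."""
    counts = {"insertions": 0, "deletions": 0, "comments": 0}
    with zipfile.ZipFile(docx_file) as package:
        parts = set(package.namelist())
        if "word/document.xml" in parts:
            body = ET.fromstring(package.read("word/document.xml"))
            # Tracked insertions and deletions are stored as w:ins and w:del.
            counts["insertions"] = sum(1 for _ in body.iter(f"{{{W}}}ins"))
            counts["deletions"] = sum(1 for _ in body.iter(f"{{{W}}}del"))
        if "word/comments.xml" in parts:
            notes = ET.fromstring(package.read("word/comments.xml"))
            counts["comments"] = sum(1 for _ in notes.iter(f"{{{W}}}comment"))
    return counts
```

If every count comes back as zero, that tells you only that these three kinds of hidden data were not found in the main document part. It does not look at headers, footers, document properties or embedded objects, so it is a quick check and not a substitute for a proper scrubbing tool and a firm policy.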
3. Earlier Edits and Versions
If you are not careful about default settings, you may find other surprises. Earlier versions might be included as part of the final document you send, even if you use Adobe Acrobat to create a PDF file as a way to remove metadata. In certain situations, a Word document might contain information that allows someone else to use the “undo” feature to reveal changes and revisions.
D. Playing Offense and Defense with Metadata
Obviously, you want to be careful on this issue. It should be equally apparent that metadata can be a two-way street and that there are offensive and defensive uses of metadata.
1. Protecting Your Documents
Job one, of course, is to protect your own documents. You also want to understand what metadata is associated with your clients’ documents and the implications of that metadata.
A commonly advised approach is to strip the metadata from the documents. There are several inexpensive software tools that will remove the metadata from, or “scrub,” Microsoft Office documents. Remember that Excel and PowerPoint files also contain metadata, and spreadsheet files might have very damaging revisions or evidence of prior calculations. Microsoft also has a free “Remove Hidden Data” tool, but it only works with the newest versions of Office and you will need to study the published list of known issues.
Other common solutions are to save Word files as PDF files, use WordPad, a stripped-down word processor in Windows, or save the file in the RTF format. Note that Adobe Acrobat can now introduce its own metadata. Scrubbing and other techniques will work, but they may not get everything and it is important to follow developments in this area. There is currently an ongoing discussion about whether Word metadata can in fact carry through to a PDF document.
2. Showing Metadata in Other Documents
Playing defense on metadata is hard work. Playing offense is much more fun. Not to give away secrets, but a number of excellent lawyers have been aware of metadata and how to read it for years. They have used metadata as one more weapon in their arsenals. As we have suggested, it takes only a few setting changes in Word, Excel or PowerPoint to reveal, on a routine basis, the metadata associated with documents you receive. Perhaps the memo you had hoped would be the “smoking gun,” but was not, actually has the smoking gun hidden in it. At this point, it is hard to argue against treating the checking of metadata as a standard practice. However, it is worth noting that some commentators have opined that this practice is just plain wrong.
3. Difficult Ethical and Other Considerations
Metadata raises its own set of difficult ethical and other issues. Consider this question: what happens when I realize that I have produced or am compelled to produce documents that have damaging metadata in them? Am I compelled to affirmatively reveal it?
Given the lack of awareness of many lawyers, simply turning off the “track changes” on Word documents, which does not remove the metadata, does in fact make it invisible to unsophisticated readers. How would a court treat that approach? Is it possible to educate a judge about metadata and obtain a protective order that effectively permits the scrubbing of metadata? Should discovery requests routinely refer to production of documents in a format where metadata has not been scrubbed or altered?
I have little doubt that we will soon see court decisions on some of these questions. This area is one where you will want to track developments carefully. One good approach is to think of metadata in the same light as handwritten comments on paper documents. What would you do with the paper? Let those principles guide you in handling metadata.
E. Conclusions, Tips and Action Steps
The good news in the world of metadata is that, in many cases, you can address the primary issues relatively easily and inexpensively. The bad news is that there are a lot of metadata issues to worry about.
Let’s end with three action steps for you to take in the next few days.
First, an easy one. Open up a Word document, check the properties and see what you find.
Second, write down on a piece of paper the software tool that your firm uses to scrub metadata from documents and locate and read your policy for when and how to use it. If you can’t do either, find out why.
Third, take a few documents created outside your firm and try to turn on the “Track Changes” or show hidden data features. Think about what you find and decide whether you have the nerve to check your own documents.
As always, it’s better to be embarrassed in private than in public. If you don’t get metadata, metadata will get you.
Note: This article is one of a series of my previously published articles that I’m making available for free on my website and incorporating into my blog. More of my articles may be found in the Articles category archive on my blog.
[Originally posted on DennisKennedy.Blog (http://www.denniskennedy.com/blog/)]

Comments

  1. nobody says

    The problem with the Remove Hidden Data plugin is that it merely creates an uneditable copy for read-only distribution. It seems the only way to remove metadata from the original without losing style and properties information is to save it as RTF, open the RTF, then save it again as a DOC file. This is the worst kind of security problem, because anyone who creates a new document from a preexisting one risks exposing information without any kind of warning. At least with markup languages, you can read the source.