*** From the Archives ***

This article is from May 11, 2004, and is no longer current.

XML Can Go to H***: One Designer's Experience with the "Future of Publishing"

“As for the future of publishing, XML is really important. I can’t tell you how or why, but I know it’s the future.” — My Editor Pam Pfiffner
If ever there were two philosophically incompatible genres, surely programming and graphic design would outrank organized religion. Asking a graphic designer to program is generally about as rewarding as asking a programmer to lay out a four-color magazine with Adobe InDesign. In either case, the results would be approximately the same and I won’t dwell on them here. Given the diametrically opposite approaches programmers and designers have to creating output, it boggles the mind that anyone bothered to invent a publishing solution that plunges both right- and left-brained people into absolute chaos. I am referring, of course, to XML, short for eXtensible Markup Language.
XML, DTD, and Other Acronyms
What is XML, exactly? It depends who you ask.
Programmer definition of XML: eXtensible Markup Language, a specification developed by the W3C (World Wide Web Consortium), is a subset of Standard Generalized Markup Language (SGML), designed especially for Web documents. It allows designers to create customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. XML contains both data and metadata. It uses a DTD or schema to describe the data.
Readable definition of XML: If you are familiar with HTML, you know all about tagging text with certain styles, for example, Body, Head, and List. When an HTML file is “read” by a Web browser, the information is interpreted and presented to you with nicely formatted headings, text, and tables. For example, the following line of HTML code tags “How to Braid Water” as a title element:
<title>How to Braid Water</title>
XML is a close cousin of HTML (both descend from SGML) that allows programmers to define custom tags and document structure. XML-aware programs can automatically extract data from an XML document, using an associated Document Type Definition (DTD) as a guide. This file basically defines the elements and data structure contained in an XML document. For example, a simple DTD might contain the description of four elements: Chapter, Heading1, Heading2, and Para. The DTD would further specify that a Chapter can contain Heading1’s, Heading2’s, and Para’s; Heading1’s can contain Heading2’s and Para’s; Heading2’s can contain only Para’s. In addition to structural flows, the DTD declares a zillion attributes that an element can carry; the formatting itself (style tags, fonts, spacing) is layered on by whatever application reads the file.
Now, reread those last few sentences and think to yourself: Geez, all this info for just four elements! Then just imagine what fun it would be to handle a DTD containing 200 elements and you will get some inkling of the complexities we are talking about.
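For the curious, the four-element example can be sketched in a few lines of code. This is only an illustration, not a real DTD validator: it hard-codes the containment rules described above as a Python dictionary and checks two made-up documents against them with the standard library's XML parser. The `ALLOWED_CHILDREN` table, the `find_violations` function, and both sample documents are assumptions invented for this sketch; a real DTD would also constrain element order, counts, and attributes.

```python
import xml.etree.ElementTree as ET

# Containment rules from the four-element example in the text.
# Hypothetical table: a real DTD states this with <!ELEMENT ...> declarations.
ALLOWED_CHILDREN = {
    "Chapter": {"Heading1", "Heading2", "Para"},
    "Heading1": {"Heading2", "Para"},
    "Heading2": {"Para"},
    "Para": set(),  # text only, no child elements
}


def find_violations(root):
    """Return a message for every element that is unknown or misplaced."""
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            # Unknown children are reported when the loop reaches them as parents.
            if child.tag in ALLOWED_CHILDREN and child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return problems


# Made-up sample documents for illustration.
GOOD = """<Chapter>
  <Heading1><Para>How to Braid Water</Para>
    <Heading2><Para>Step one</Para></Heading2>
  </Heading1>
</Chapter>"""

BAD = "<Chapter><Heading2><Heading1><Para>Upside down</Para></Heading1></Heading2></Chapter>"

if __name__ == "__main__":
    print(find_violations(ET.fromstring(GOOD)))
    print(find_violations(ET.fromstring(BAD)))
```

Four elements need a four-entry table; a 200-element DTD needs 200 entries, each with its own list of permitted children, which is roughly where the fun starts.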
Conceived as a method for producing consistent output, XML has been adopted by Forward Thinkers in the publishing industry. Unfortunately, Forward Thinkers often don’t actually do much besides think forwardly, leaving the details to underlings. Being an underling myself, I gotta tell you it’s really true that the devil is in those details. And I will further clarify this statement by describing what happened when XML met Adobe FrameMaker.
The Goddess Meets the Forward Thinkers
My day job is as a compositor and desktop publishing troubleshooter with a delightful publishing company in California (my official title is Production Goddess). About a year ago my company hired a couple of XML evangelists who set about converting our entire publishing operation to what they enthusiastically described as “push-button publishing.” According to these two guys, we could reduce the entire complicated editorial process to a couple of copy editors and a laptop.
I was called into a meeting where these two Forward Thinkers described publishing utopia: The authors would cheerfully submit manuscript in XML format and our copy editors would somehow do their correcting stuff to smooth out the prose. And then, by magic, we would transfer the files to the laptop, push a button, and Shazaam! — something would be created that we could send to the printer. At the end of the presentation, I meekly asked a couple of questions along the lines of could they please define Shazaam! and what exactly would we be sending to the printer? As with most Forward Thinkers confronted with reality, I got a good wave-off and was sort of told to sit down, shut up, and come to terms with the end of Publishing As We Know It. I did persist a little, asking who was going to write all this magical software and was assured that they would be happy to oblige.
Before they could get started on their forwardly thinking quest, however, my boss dropped a 500-page XML anvil of a manuscript on my foot. How hard could this be, I figured. After all, I read the press releases and had faith, as only a true acolyte can, that Adobe had seamlessly implemented XML into FrameMaker 7.0, which would then be able to spit out the PDF pages we required. I figured it would just be a matter of sucking the chapters into FrameMaker, slapping some tags onto a couple of recalcitrant paragraphs, and Shazaam! — perfect PDF output. Of course, this was before I started reading the 588-page XML manual that was written by a coven of surly Klingons.
An Excursion into Esoterica
The first problem I ran into was spaces in style names. To make life easy for editors, authors, and compositors, I had designed our styles with English names. So instead of inscrutable names like BTF and CL, I spelled everything out — BTF became Body Text First and so on. Well, XML can’t handle spaces in style names, so I had to rename 86 styles to things like BodyTextFirst and CodeLast. OK, that only took a couple of hours. Now we could get on with customizing the DTD so the output would be in our standard format.
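The renaming chore described above is the kind of thing a short script could take over. What follows is a rough sketch under stated assumptions, not the procedure actually used on the book: the function names `to_element_name` and `build_rename_map` are invented here, and the rule applied (squeeze out everything except ASCII letters and digits, and never start with a digit) is a deliberately conservative simplification of what XML allows in a name.

```python
import re


def to_element_name(style_name):
    """Turn a human-readable style name into a space-free, XML-safe name."""
    # Keep only runs of ASCII letters and digits, then join them with each
    # run's first letter capitalized: "Body Text First" -> "BodyTextFirst".
    words = re.findall(r"[A-Za-z0-9]+", style_name)
    name = "".join(word[0].upper() + word[1:] for word in words)
    if not name:
        raise ValueError(f"no usable characters in style name: {style_name!r}")
    # XML names may not begin with a digit, so prefix an underscore if needed.
    if name[0].isdigit():
        name = "_" + name
    return name


def build_rename_map(style_names):
    """Map every old style name to its new name, refusing silent collisions."""
    mapping = {}
    taken = {}  # new name -> the old name that claimed it
    for old in style_names:
        new = to_element_name(old)
        if new in taken:
            raise ValueError(f"{old!r} and {taken[new]!r} both become {new!r}")
        taken[new] = old
        mapping[old] = new
    return mapping


if __name__ == "__main__":
    for old, new in build_rename_map(["Body Text First", "Code Last"]).items():
        print(f"{old} -> {new}")
```

The collision check matters more than it looks: with 86 styles, two names that differ only by a space or a hyphen would otherwise collapse into one without anyone noticing.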
Adobe supplies several standard structured applications for creating XML documents, and xDocBook, a standard application for writing and formatting books, seemed like a good place to start. Of course, starting off with xDocBook’s DTD consisting of 200-plus elements is probably not a good idea for beginners. Do people really need elements called Alt, Initializer, and MsgAud? I deleted the ones that were redundant, irrelevant, and inexplicable. Then I carefully added in formatting information and saved the result as an EDD, FrameMaker’s version of a DTD.
After a week or so of experimentation, which included excursions into esoteric files named structapps.fm, default.rw, xdocbook.css, and so on, I managed to produce FrameMaker files that looked kinda like our standard format. However, nobody at the office could directly copyedit FrameMaker files — I had to export the text as Word files so corrections could be made. At which point, of course, all XML-ness disappeared and the entire exercise devolved into a pointless and intricate exercise in file conversion.
The Art of the Alt-Tab
I should add here that at one point seven separate programs were open just to process these stupid files: Acrobat for reading the XML manual, Notepad for reading and writing the read/write rules, Internet Explorer for examining the raw XML code, blah, blah, blah. Plus, there were all sorts of files lurking around FrameMaker’s desktop: my EDD, the template, the book author’s DTD, a billion log files, the chapter file, and on and on ad nauseam. I got the definite sinking feeling that “push-button publishing” really meant a lot of Alt-Tabbing to switch among all these apps.
Somewhere around the middle of the book, it was discovered that FrameMaker was gleefully stripping out all elements named programco. Unfortunately, the author used a lot of programco’s, and boy, was he livid when he got the page proofs. After consultation with Adobe, I discovered the problem in a file called “rules,” buried four folders down, which instructed FrameMaker to drop certain elements. (Let’s open yet another instance of Notepad, shall we?) Exactly why the programmers at Adobe felt that programco was offensive remains an enigma. After all, what’s one more element in the wad supplied with xDocBook? Unfortunately, by the time I discovered the fix, the copy editors, proofreaders, project managers, and assistant head person were also livid. I suspect the head person was also livid, but at least he never fired off panicky e-mails to me. Everyone finally calmed down, I fixed the Word files, and we shipped the book more-or-less on time.
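In hindsight, a dropped element like programco is something a simple before-and-after count could have caught long before the page proofs did. The sketch below assumes that both the author's original XML and an XML export of the processed chapter are available as text, which may not match every workflow; the function names and the two tiny sample documents are made up for illustration, and nothing here touches FrameMaker or its rules file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_text):
    """Count how many times each element name appears in a document."""
    return Counter(element.tag for element in ET.fromstring(xml_text).iter())


def dropped_elements(source_xml, output_xml):
    """Return {tag: number_missing} for tags that appear less often in the output."""
    before = tag_counts(source_xml)
    after = tag_counts(output_xml)
    # A Counter answers 0 for tags it has never seen, so vanished tags show up too.
    return {tag: n - after[tag] for tag, n in before.items() if after[tag] < n}


# Made-up samples: the second document has silently lost its programco elements.
SOURCE = "<chapter><para>Type <programco>ls</programco> then <programco>cd</programco>.</para></chapter>"
OUTPUT = "<chapter><para>Type  then .</para></chapter>"

if __name__ == "__main__":
    print(dropped_elements(SOURCE, OUTPUT))
```

An empty result would mean every element survived the trip; anything else is worth a look before the author sees it.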
The Orc Comes to Rest
After this disastrous intro to XML, things were kinda quiet in the home office for a while… until a few weeks ago, when the XML orc started flapping around again. Apparently, a couple of authors were adamantly insisting on submitting manuscripts in XML format. Swell, I thought. Let the Forward Thinkers handle it. By this time, I figured they had finished writing the push-button software and I was off the hook. To my utter horror, I discovered that their enthusiasm must have pushed somebody’s button, because they were suddenly no longer with the company. The XML orc slowly circled the remaining personnel and came to rest inside my office.
The sample chapter I was sent conformed to no standard XML format. The authors, I was cheerily informed, didn’t like xDocBook, and they wrote their own implementation. Wait a minute. The whole point of XML is that it’s supposed to be standardized, right? How could someone write a non-standard standard? More to the point, why would anyone write a non-standard standard?
I spent a few days stirring the sample about, squirting it in and out of FrameMaker, and fiddling with the EDD. And then my boss told me not to worry. Before they slid out the door, the Forward Thinkers had found this Swell Consultant who had taken the project in hand and could magically process the chapters without any layout program at all. I shooed the orc out of the office, cleaned up the mess of feathers, and returned to the comfy world of Wintel.
A week later, I got the dreaded e-mail. Apparently, the consultant “had other time commitments” and couldn’t actually do the work. But, good news! She got a good start on a custom DTD using our standard styles. The XML orc slunk back into my office and started scratching in the corner.
I opened the DTD. There were nine elements in there, none of which were relevant to our standard style guide. AuthorSurname? RevisionNumber? Geez. I hope they didn’t pay big bucks for this stuff. The XML orc perched on the top left corner of my laptop and started teetering back and forth. An inky feather, portent of future events, drifted down onto the keyboard.
Back to the Beginning
OK, now I get to write a DTD more or less from scratch. Of course I cheated by starting with xDocBook and renaming the elements to conform to the authors’ idea of how a DTD should have been written. This was so much fun that I began wishing for a hard disk crash. Let’s see. SimpFleagleOrg. This might correspond to exactly what style here? Heading1? UnnumberedList? Footnote? JumpAndKillProgrammer? Could the authors supply a Rosetta Stone, I pleaded? Apparently they were willing to write a translator program if I could supply them with a DTD. I shot the xDocBook DTD off to them yesterday and I just can’t wait to see what comes back. (What finally came back was a series of long e-mails explaining, in obfuscating program-speak, how the authors wanted the keystroke information and code lines formatted. Sixty-seven hours later, I finally produced perfect XML output and saved the FrameMaker EDD to seven different backup locations.)
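The Rosetta Stone pleaded for above boils down to a lookup table from the authors' element names to the house names. A translator built on such a table might look something like the following sketch. Everything in it is hypothetical: the `RENAME` entries are invented (the authors' real element names and their meanings are exactly what was unknown), and the `translate` function is an illustration, not the program the authors offered to write.

```python
import xml.etree.ElementTree as ET

# Hypothetical "Rosetta Stone": authors' element names -> house style names.
RENAME = {
    "Title": "Heading1",
    "Sub": "Heading2",
    "P": "Para",
}


def translate(root, rename):
    """Rename elements in place; return the sorted names that had no mapping."""
    unmapped = set()
    for element in root.iter():
        if element.tag in rename:
            element.tag = rename[element.tag]
        else:
            unmapped.add(element.tag)
    return sorted(unmapped)


# Made-up sample chapter using the invented names above.
SAMPLE = "<Book><Title>Intro</Title><P>Some text.</P><SimpFleagleOrg>?</SimpFleagleOrg></Book>"

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    leftovers = translate(root, RENAME)
    print(ET.tostring(root, encoding="unicode"))
    print("No mapping for:", leftovers)
```

The list of leftovers is the useful part: it is the set of questions to send back to the authors, in writing, before anyone starts guessing.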
Now, in the middle of all this frantic activity, Adobe released an upgrade to FrameMaker 7.0 — FrameMaker 7.1. This version promised a seamless save to XML, something that the previous version didn’t do. In fact, the XML that FrameMaker 7.0 produced was firmly rejected by any other program with which I tried to open the files (OK, I didn’t expect Unreal to open the document, but I was getting desperate). Interestingly, FrameMaker 7.1 could save out XML files. But of course, it occasionally refused to open them again, producing pages and pages of error codes. In some cases, it happily opened the file, but truncated it where it found an orc feather or something.
At this point, in utter desperation, I ditched the XML altogether. To do this, I saved the file in FrameMaker format, set Unstructured as the preference, restarted FrameMaker, waded through the dialog boxes warning me about opening structured docs in unstructured FrameMaker, and then tagged the miserable document by hand. And I would have to repeat this exercise with each and every chapter, mind you. I am, by the way, not going to discuss the bizarre clutter of WhiteSpace tags, which had to be deleted one by one for the structured format to actually work. No, I couldn’t globally delete them, because doing so removed the spaces in the Heading2 elements. You knew that, right?
Push-Button What Exactly?
By now, I assume that you have a pretty decent picture of “push-button” publishing. If not, let me review the steps you will need to take to accomplish this goal:

  1. Convince, coerce, or demand that authors abandon Word and OpenOffice as writing platforms and use programs whose documentation includes phrases like “Validates XML documents against DTD/XML Schema”!; “Performs XSL transformation”!; and “Conforms to the latest W3C specs”! in the software overview. The word “schema” is very important. If it isn’t included in the press release, the software is for wimps.
  2. Convince, coerce, or demand that the copy editors smooth out text snuggled between a Byzantine arrangement of angle brackets. Their choice of editing software is, of course, personal, but I suggest Edlin. You can actually still find this antediluvian chestnut on the Win XP CD. XML in a DOS box — Wow!
  3. Find a sober compositor with years of layout and programming experience and a big-rig computer setup. Make sure the compositor knows about the Alt-Tab key sequence. Convince, coerce, or demand that the compositor memorize the XML manual, write the appropriate translation files, and then turn out pages of impeccably formatted material in a goofy timeframe. Extra credit for doing this in a DOS box.

In sum, if you are in charge of organizing the technical documentation for a nuclear submarine, XML has much to recommend it. Theoretically, you could design an XML template, hand it out to the hundreds of specialized tech writers, and get back a series of conformal documents that should be easy to assemble. Then you can actually push a button (at least you can in FrameMaker) and produce human-readable PDF files. Naturally, you’ll have to settle for a dowdy format. Let’s face it: If you entertain the concept that XML can adroitly handle text runarounds, four-color separations, and twiddly kerning, you need to review the concept of circular firing squads.
If, on the other hand, you are the head of production at a book company with the usual contingent of happy Quark-heads, forget it. Don’t go there. If someone in your organization starts blabbering about XML being the “future of publishing,” grab the orc Mace and lock your office door.
Read more by Susan Glinert.

  • anonymous says:

    At last someone speaks up about the XML con!

  • anonymous says:

    i read the article with a mixture of amusement and horror… this is not just a story about xml, but about every other ‘standard’ that exists, from postscript to html, all of them are a nightmare, and nobody seems to know what is going on… i could go on, but i would just be repeating myself… thank you for this article… i only wish i believed it would change things!

  • anonymous says:

    I’ve always thought XML was conceived by the same people who snap wide awake in the middle of the night sensing that something is askew in their underwear drawer.

    XML has always been about quantifying creativity. And you see the results. Add to this the fact that we creatives can be easily intimidated by these “forward thinkers.”

    I try to remember this, the same thing my father always told me about snakes: “They’re more afraid of you than you are of them.”

  • anonymous says:

    XML is a joke created by those that do not know the problems and solutions of publishing in the world of print. Why do these programmer types keep on thinking up such stupid solutions? KISS: Keep It Simple, Stupid.

  • anonymous says:

    Let designers do what they do best and programmers do what they do best. Why must a designer be expected to be a programmer?
    Javascript, Actionscript, HTML, XML; once your time is consumed with these codes what time do you have for design?

  • anonymous says:

    I don’t agree with the article because Susan is blaming XML when she should be blaming her employer.

    XML has a time and place and needs to be set up properly to work. Trying to figure it out from scratch, with no previous experience on live, deadline driven data is just freakin nuts.

    I think the moral of the story is knowing when to say no. Yes, unfortunately Susan got mixed up in something not of her making but should have had the sense to say “I don’t know how to do that and don’t have time to learn”.

  • anonymous says:

    The author knows little about using XML, yet starts a major project with XML. The project is snakebit, and it is XML’s fault, not the author’s. A little knowledge is a dangerous thing; no knowledge is disastrous.

  • anonymous says:

    This is almost exactly the same experience I had with XML at a publishing company. The only difference was I was on a Mac and fortunately it was only a test. I laughed out loud reading about the author having to use seven apps to make things work. This rang so true to me. I thought I was crazy, but it’s good to see I’m not alone. The concept of “push-button publishing” is a nice dream, but the reality is much different. There is far too much individual customization in day-to-day publishing for there to be a catch-all solution. Enforcing a standard company-wide is its own nightmare. Who’s responsible when the tags are input incorrectly? Someone still has to go through the data and check everything. Maybe XML is the future, but today I’ll stick with formatting by hand. Some of the auto-formatting plug-ins for InDesign/Quark (like InData) are nice but still require a lot of work upfront and a lot of double-checking. It’s a good first step for moving in the XML direction though, especially for catalog-type work.

  • anonymous says:

    Other comments mentioned this before: blaming XML is the worst thing the author could have done after that experience. What really went wrong, as happens very often, is that the process wasn’t prepared properly and the tools used were not good enough for the task.

    And what’s missing completely is a commercial view on things. How can the author come to the conclusion that XML isn’t ready for publishing when you can find thousands of case studies where XML-based publishing created huge return on investment? Well, that’s the point: many people are not prepared for investments (time and/or money), and those won’t ever use XML or other new technology successfully.

    An objective view on XML should conclude that XML is not (yet) good enough for all sorts of publishing. But there are fantastic opportunities in that technology!

  • anonymous says:

    OK, I won’t blame XML for all the world’s troubles…. I’m sure it has a place… somewhere… a very dark, orc infested place if Susan is right.

    I would like to know some of the ‘thousands of companies’ that have made a successful workflow based on it. I’m wondering how many are printers.

    I like the idea of XML. But for now, I’ll stick to designing and let the right brain types hash it out for me.

  • anonymous says:

    I just wanted to give you props on your fun article. Maybe your company should consider you for copywriting before programming.

    And I loved the response below about something “askew in their underwear drawer.” Laughed out loud at that one.

  • anonymous says:

    Still very amusing on the second read.

    The complexity of XML seems to be masking its usefulness. I agree that in principle it sounds a great idea, in practice, it seems to be a new gadget for its own sake. Obviously it works for some – and I’d be interested in who it HAS worked for and in what publishing tasks!

  • David Glover says:

    It seems as though much of this XML effort is trying to replicate what we’ve been able to do with styles in MS Word (and other word processors) and Quark or InDesign.

    For at least a decade I’ve prepared copy in Word using styles which is automatically mapped through to Quark or InDesign styles. I can give them any name I want. I can assign them (in Word) to the function keys so I can quickly assign them. Importantly, when others are preparing copy, I can just give them a printout of the keys assigned to subheads, pull quotes, captions etc and they just hit the button to format correctly.

    This is the standard o

    And why, oh why, have we allowed “standards” to be promulgated that don’t allow spaces in names? Spaces were cleverly invented to make things easier to read and were widely adopted over the next few centuries.

    Programmers ditched spaces when memory cost several dollars per byte. This cost efficiency became irrelevant over 20 years ago (and the authors of the Mac OS duly took note).

    Somehow, since the web came along, we’ve been thrown back into the computing dark ages with all these disallowed characters. If I want to name a style with spaces (or traditional typographic marks like the paragraph or section mark!) the system should allow it.

    Until then, I’ll leave XML as something that computer programs generate.

  • Anonymous says:

    This I especially loved. I wonder if, even after 5 years, XML is any more useful now than then?

    Convince, coerce, or demand that the copy editors smooth out text snuggled between a Byzantine arrangement of angle brackets. Their choice of editing software is, of course, personal, but I suggest Edlin. You can actually still find this antediluvian chestnut on the Win XP CD. XML in a DOS box — Wow!
