XML Can Go to H***: One Designer's Experience with the "Future of Publishing"
"As for the future of publishing, XML is really important. I can't tell you how or why, but I know it's the future." -- My Editor Pam Pfiffner
If ever there were two philosophically incompatible genres, surely programming and graphic design would outrank organized religion. Asking a graphic designer to program is generally about as rewarding as asking a programmer to lay out a four-color magazine with Adobe InDesign. In either case, the results would be approximately the same and I won't dwell on them here. Given the diametrically opposite approaches programmers and designers have to creating output, it boggles the mind that anyone bothered to invent a publishing solution that plunges both right- and left-brained people into absolute chaos. I am referring, of course, to XML, short for eXtensible Markup Language.
XML, DTD, and Other Acronyms
What is XML, exactly? It depends who you ask.
Programmer definition of XML: eXtensible Markup Language, a specification developed by the W3C (World Wide Web Consortium), is a subset of Standard Generalized Markup Language (SGML), designed especially for Web documents. It allows designers to create customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. XML contains both data and metadata. It uses DTDs and schemas to describe the data.
Readable definition of XML: If you are familiar with HTML, you know all about tagging text with certain styles, for example, Body, Head, and List. When an HTML file is "read" by a Web browser, the information is interpreted and presented to you with nicely formatted headings, text, and tables. For example, the following line of HTML code tags "How to Braid Water" as a title element:
<title>How to Braid Water</title>
XML is a close relative of HTML (both descend from SGML) that allows programmers to define custom tags and document structure. XML-aware programs can automatically extract data from an XML document, using an associated Document Type Definition (DTD) as a guide. This file basically defines the elements and data structure contained in an XML document. For example, a simple DTD might contain the description of four elements: Chapter, Heading1, Heading2, and Para. The DTD would further specify that a Chapter can contain Heading1's, Heading2's, and Para's; Heading1's can contain Heading2's and Para's; Heading2's can contain only Para's. In addition to structural flows, the DTD specifies the attributes that an element can carry; the style tags, fonts, spacing, and a zillion other formatting details get layered on top by the publishing application.
Now, reread those last few sentences and think to yourself: Geez, all this info for just four elements! Then, just imagine what fun it would be to handle a DTD containing 200 elements and you will get some inkling of the complexities we are talking about.
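For the curious, here is a minimal sketch of what those four-element containment rules amount to. It is not a real DTD validator: Python's standard library does not validate against DTDs, so the rules are restated as a plain dictionary. The DTD declarations in the comment, the names ALLOWED_CHILDREN and find_violations, and both sample documents are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# The four-element example might be declared in a DTD roughly like this
# (an assumption about how it would be written, not taken from a real file):
#   <!ELEMENT Chapter  (Heading1 | Heading2 | Para)*>
#   <!ELEMENT Heading1 (Heading2 | Para)*>
#   <!ELEMENT Heading2 (Para)*>
#   <!ELEMENT Para     (#PCDATA)>

# Hypothetical restatement of the same rules: each element name maps to
# the set of child elements it is allowed to contain.
ALLOWED_CHILDREN = {
    "Chapter": {"Heading1", "Heading2", "Para"},
    "Heading1": {"Heading2", "Para"},
    "Heading2": {"Para"},
    "Para": set(),
}


def find_violations(xml_text):
    """Return (parent, child) pairs that break the containment rules."""
    root = ET.fromstring(xml_text)
    violations = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        for child in parent:
            # A child is flagged if its parent is unknown or does not allow it.
            if allowed is None or child.tag not in allowed:
                violations.append((parent.tag, child.tag))
    return violations


GOOD = (
    "<Chapter><Heading1><Para>Intro</Para>"
    "<Heading2><Para>Detail</Para></Heading2></Heading1></Chapter>"
)
BAD = (
    "<Chapter><Heading2><Heading1><Para>Upside down</Para>"
    "</Heading1></Heading2></Chapter>"
)

print(find_violations(GOOD))  # []
print(find_violations(BAD))   # [('Heading2', 'Heading1')]
```

Four elements need four rules. A 200-element DTD needs 200 of them, each of which can name any of the others.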
Conceived as a method for producing consistent output, XML has been adopted by Forward Thinkers in the publishing industry. Unfortunately, Forward Thinkers often don't actually do much besides think forwardly, leaving the details to underlings. Being an underling myself, I gotta tell you it's really true that the devil is in those details. And I will further clarify this statement by describing what happened when XML met Adobe FrameMaker.
The Goddess Meets the Forward Thinkers
My day job is as a compositor and desktop publishing troubleshooter with a delightful publishing company in California (my official title is Production Goddess). About a year ago my company hired a couple of XML evangelists who set about converting our entire publishing operation to what they enthusiastically described as "push-button publishing." According to these two guys, we could reduce the entire complicated editorial process to a couple of copy editors and a laptop.
I was called into a meeting where these two Forward Thinkers described publishing utopia: The authors would cheerfully submit manuscript in XML format and our copy editors would somehow do their correcting stuff to smooth out the prose. And then, by magic, we would transfer the files to the laptop, push a button, and Shazaam! -- something would be created that we could send to the printer. At the end of the presentation, I meekly asked a couple of questions along the lines of could they please define Shazaam! and what exactly would we be sending to the printer? As with most Forward Thinkers confronted with reality, I got a good wave-off and was sort of told to sit down, shut up, and come to terms with the end of Publishing As We Know It. I did persist a little, asking who was going to write all this magical software and was assured that they would be happy to oblige.
Before they could get started on their forwardly thinking quest, however, my boss dropped a 500-page XML anvil of a manuscript on my foot. How hard could this be, I figured. After all, I read the press releases and had faith, as only a true acolyte can, that Adobe had seamlessly implemented XML into FrameMaker 7.0, which would then be able to spit out the PDF pages we required. I figured it would just be a matter of sucking the chapters into FrameMaker, slapping some tags onto a couple of recalcitrant paragraphs, and Shazaam! -- perfect PDF output. Of course, this was before I started reading the 588-page XML manual that was written by a coven of surly Klingons.
An Excursion into Esoterica
The first problem I ran into was spaces in style names. To make life easy for editors, authors, and compositors, I had designed our styles with English names. So instead of inscrutable names like BTF and CL, I spelled everything out -- BTF became Body Text First and so on. Well, XML element names can't contain spaces, so I had to rename 86 styles to things like BodyTextFirst and CodeLast. OK, that only took a couple of hours. Now we could get on with customizing the DTD so the output would be in our standard format.
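A renaming job like this one can be scripted rather than typed. The sketch below is one way to do it, under some assumptions: the function names to_element_name and is_simple_xml_name are invented for illustration, the spelled-out name "Code Last" is inferred from CodeLast, and the name check is a deliberately simplified ASCII-only approximation of the XML naming rules, not the full specification.

```python
import re


def to_element_name(style_name):
    """Join the words of a style name into a single space-free name."""
    words = re.split(r"\s+", style_name.strip())
    # Capitalize only the first letter of each word, leaving the rest alone.
    return "".join(word[:1].upper() + word[1:] for word in words if word)


def is_simple_xml_name(name):
    """Rough ASCII-only check; the real XML name rules allow more characters."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.-]*", name) is not None


# Sample style names; "Code Last" is an inferred spelling, not from a real template.
STYLES = ["Body Text First", "Code Last"]
renames = {old: to_element_name(old) for old in STYLES}

for old, new in renames.items():
    print(f"{old!r} -> {new!r} (usable as a name: {is_simple_xml_name(new)})")
```

Generating the new names is the quick part. The styles still have to be renamed in the template and in every document that uses them.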
Adobe supplies several standard structured applications for creating XML documents, and xDocBook, a standard application for writing and formatting books, seemed like a good place to start. Of course, starting off with xDocBook's DTD consisting of 200-plus elements is probably not a good idea for beginners. Do people really need elements called Alt, Initializer, and MsgAud? I deleted the ones that were redundant, irrelevant, and inexplicable. Then I carefully added in formatting information and saved the result as an EDD, FrameMaker's version of a DTD.
After a week or so of experimentation, which included excursions into esoteric files named structapps.fm, default.rw, xdocbook.css, and so on, I managed to produce FrameMaker files that looked kinda like our standard format. However, nobody at the office could directly copyedit FrameMaker files -- I had to export the text as Word files so corrections could be made. At which point, of course, all XML-ness disappeared and the entire effort devolved into a pointless and intricate exercise in file conversion.
The Art of the Alt-Tab
I should add here that at one point seven separate programs were open just to process these stupid files: Acrobat for reading the XML manual, Notepad for reading and writing the read/write rules, Internet Explorer for examining the raw XML code, blah, blah, blah. Plus, there were all sorts of files lurking around FrameMaker's desktop: my EDD, the template, the book author's DTD, a billion log files, the chapter file -- on and on ad nauseam. I got the definite sinking feeling that "push-button publishing" really meant a lot of Alt-Tabbing to switch among all these apps.
Somewhere around the middle of the book, it was discovered that FrameMaker was gleefully stripping out all elements named programco. Unfortunately, the author used a lot of programco's, and boy, was he livid when he got the page proofs. After consultation with Adobe, I discovered the problem in a file called "rules," buried four folders down, which instructed FrameMaker to drop certain elements. (Let's open yet another instance of Notepad, shall we?) Exactly why the programmers at Adobe felt that programco was offensive remains an enigma. After all, what's one more element in the wad supplied with xDocBook? Unfortunately, by the time I discovered the fix, the copy editors, proofreaders, project managers, and assistant head person were also livid. I suspect the head person was also livid, but at least he never fired off panicky e-mails to me. Everyone finally calmed down, I fixed the Word files, and we shipped the book more-or-less on time.
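In hindsight, a sanity check before the page proofs went out might have caught the vanishing programco's. The sketch below counts element names in the author's XML and in whatever comes out the other end, then reports anything that went missing. It assumes the processed chapter can be exported back to well-formed XML, which was not a given here. The function names and the two sample strings are invented for illustration; nothing in it reads or reproduces FrameMaker's rules file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_text):
    """Count how many times each element name occurs in a document."""
    return Counter(el.tag for el in ET.fromstring(xml_text).iter())


def dropped_elements(source_xml, output_xml):
    """Map each element name to how many occurrences vanished in the output."""
    before = count_elements(source_xml)
    after = count_elements(output_xml)
    # A Counter returns 0 for names that are missing entirely.
    return {tag: n - after[tag] for tag, n in before.items() if after[tag] < n}


# Made-up sample data standing in for a chapter before and after processing.
SOURCE = "<chapter><para>Call <programco>main()</programco> first.</para></chapter>"
OUTPUT = "<chapter><para>Call  first.</para></chapter>"

print(dropped_elements(SOURCE, OUTPUT))  # {'programco': 1}
```

An empty result would mean every element name survived the round trip at least as many times as it went in.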
The Orc Comes to Rest
After this disastrous intro to XML, things were kinda quiet in the home office for a while... until a few weeks ago, when the XML orc started flapping around again. Apparently, a couple of authors were adamantly insisting on submitting manuscripts in XML format. Swell, I thought. Let the Forward Thinkers handle it. By this time, I figured they had finished writing the push-button software and I was off the hook. To my utter horror, I discovered that their enthusiasm must have pushed somebody's button, because they were suddenly no longer with the company. The XML orc slowly circled the remaining personnel and came to rest inside my office.
The sample chapter I was sent conformed to no standard XML format. The authors, I was cheerily informed, didn't like xDocBook, and they wrote their own implementation. Wait a minute. The whole point of XML is that it's supposed to be standardized, right? How could someone write a non-standard standard? More to the point, why would anyone write a non-standard standard?
I spent a few days stirring the sample about, squirting it in and out of FrameMaker, and fiddling with the EDD. And then my boss told me not to worry. Before they slid out the door, the Forward Thinkers had found this Swell Consultant who had taken the project in hand and could magically process the chapters without any layout program at all. I shooed the orc out of the office, cleaned up the mess of feathers, and returned to the comfy world of Wintel.
A week later, I got the dreaded e-mail. Apparently, the consultant "had other time commitments" and couldn't actually do the work. But, good news! She got a good start on a custom DTD using our standard styles. The XML orc slunk back into my office and started scratching in the corner.
I opened the DTD. There were nine elements in there, none of which were relevant to our standard style guide. AuthorSurname? RevisionNumber? Geez. I hope they didn't pay big bucks for this stuff. The XML orc perched on the top left corner of my laptop and started teetering back and forth. An inky feather, portent of future events, drifted down onto the keyboard.
Back to the Beginning
OK, now I get to write a DTD more or less from scratch. Of course I cheated by starting with xDocBook and renaming the elements to conform to the authors' idea of how a DTD should have been written. This was so much fun that I began wishing for a hard disk crash. Let's see. SimpFleagleOrg. This might correspond to exactly what style here? Heading1? UnnumberedList? Footnote? JumpAndKillProgrammer? Could the authors supply a Rosetta Stone, I pleaded? Apparently they were willing to write a translator program if I could supply them with a DTD. I shot the xDocBook DTD off to them yesterday and I just can't wait to see what comes back. (What finally came back was a series of long e-mails explaining, in obfuscating program-speak, how the authors wanted the keystroke information and code lines formatted. Sixty-seven hours later, I finally produced perfect XML output and saved the FrameMaker EDD to seven different backup locations.)
Now, in the middle of all this frantic activity, Adobe released an upgrade to FrameMaker 7.0 -- FrameMaker 7.1. This version promised a seamless save to XML, something that the previous version didn't do. In fact, the XML that FrameMaker 7.0 produced was firmly rejected by any other program with which I tried to open the files (OK, I didn't expect Unreal to open the document, but I was getting desperate). Interestingly, FrameMaker 7.1 could save out XML files. But of course, it occasionally refused to open them again, producing pages and pages of error codes. In some cases, it happily opened the file, but truncated it where it found an orc feather or something.
At this point, in utter desperation, I ditched the XML altogether. To do this, I saved the file in FrameMaker format, set Unstructured as the preference, restarted FrameMaker, waded through the dialog boxes warning me about opening structured docs in unstructured FrameMaker, and then tagged the miserable document by hand. And I would have to repeat this exercise with each and every chapter, mind you. I am, by the way, not going to discuss the bizarre clutter of WhiteSpace tags, which had to be deleted one by one for the structured format to actually work. No, I couldn't globally delete them, because doing so removed the spaces in the Heading2 elements. You knew that, right?
Push-Button What Exactly?
By now, I assume that you have a pretty decent picture of "push-button" publishing. If not, let me review the steps you will need to take to accomplish this goal:
- Convince, coerce, or demand that authors abandon Word and OpenOffice as writing platforms and use programs whose documentation includes phrases like "Validates XML documents against DTD/XML Schema!"; "Performs XSL transformation!"; and "Conforms to the latest W3C specs!" in the software overview. The word "schema" is very important. If it isn't included in the press release, the software is for wimps.
- Convince, coerce, or demand that the copy editors smooth out text snuggled between a Byzantine arrangement of angle brackets. Their choice of editing software is, of course, personal, but I suggest Edlin. You can actually still find this antediluvian chestnut on the Win XP CD. XML in a DOS box -- Wow!
- Find a sober compositor with years of layout and programming experience and a big-rig computer setup. Make sure the compositor knows about the Alt-Tab key sequence. Convince, coerce, or demand that the compositor memorize the XML manual, write the appropriate translation files, and then turn out pages of impeccably formatted material in a goofy timeframe. Extra credit for doing this in a DOS box.
In sum, if you are in charge of organizing the technical documentation for a nuclear submarine, XML has much to recommend it. Theoretically, you could design an XML template, hand it out to the hundreds of specialized tech writers, and get back a series of conforming documents that should be easy to assemble. Then you can actually push a button (at least you can in FrameMaker) and produce human-readable PDF files. Naturally, you'll have to settle for a dowdy format. Let's face it: If you entertain the concept that XML can adroitly handle text runarounds, four-color separations, and twiddly kerning, you need to review the concept of circular firing squads.
If, on the other hand, you are the head of production at a book company with the usual contingent of happy Quark-heads, forget it. Don't go there. If someone in your organization starts blabbering about XML being the "future of publishing," grab the orc Mace and lock your office door.
Read more by Susan Glinert.