HerGeekness Says: Convert Any File Part III
You've now converted that funky file to a PDF. It's time to tackle the final step: extracting assets from that PDF.
Written by Anne-Marie "HerGeekness" Concepcion on September 7, 2008
If you're new to this series, you may want to start from the beginning:
HerGeekness Says: Convert a File, Any File
HerGeekness Says: Convert Any File Part II
I Have a PDF. Now What?
Okay, you went through whatever rigmarole you had to go through to squeeze a PDF out of the client’s file. That may be all that’s necessary -- if you don’t need to edit anything in the client’s file, and the colors are fine, simply import the PDF into your project and place it as artwork. You might need to rasterize the PDF in Photoshop or export it to EPS format from Acrobat Professional if your layout program can’t import PDF files.
More often, though, you want only some of the text and graphics, and you need them in editable format. There are myriad ways to approach the challenge. The ones I describe below work in both Acrobat 8 and 9 (probably in earlier versions, too, but I don’t have any installed to verify that).
First, to extract all the text in a PDF to an external file, suitable for placing into another document and formatting there, open the PDF in Acrobat Pro and go to the File > Export submenu (Figure 1). One or more of the options there should get you on your way.
Figure 1. A quick way to get all the formatted text out of a PDF and into a single file is to open it in Acrobat and let its Export commands do their job.
To extract all the raster images in a PDF as individual JPEGs, PNGs, or TIFFs (at their existing resolution), don’t use the File > Export > Image command, which makes an image file out of each entire page in the PDF. Instead, choose Advanced > Document Processing > Export All Images, which results in a folder full of all the photos in the document, suitable for editing or re-using elsewhere (Figure 2).
Figure 2. You can extract every raster image in a PDF into separate image files with the Export All Images command, and then manipulate them and place them in any other program file.
The best thing about this method is that it maintains image resolution. If a screenshot in the PDF was 72 ppi, it’ll be 72 ppi when the extracted image is opened in Photoshop; if another image on the same page was 687 ppi (a result of scaling the image in the authoring program), it’ll be 687 ppi in Photoshop. Almost as good as a Collect for Output!
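The arithmetic behind those numbers is simple: an image's effective resolution is its pixel count divided by the size at which it was placed on the page. Here's a minimal Python sketch of that calculation. The function name and the sample pixel dimensions are mine, made up for illustration; only the 72 ppi and 687 ppi results come from the example above.

```python
def effective_ppi(pixel_width: int, placed_width_inches: float) -> float:
    """Return effective resolution: pixel width divided by placed width in inches."""
    if placed_width_inches <= 0:
        raise ValueError("placed width must be positive")
    return pixel_width / placed_width_inches


# Hypothetical screenshot: 720 pixels wide, placed at 10 inches wide.
screenshot_ppi = effective_ppi(720, 10.0)

# Hypothetical photo: 2061 pixels wide, scaled down to 3 inches wide.
photo_ppi = effective_ppi(2061, 3.0)

print(screenshot_ppi, photo_ppi)  # prints "72.0 687.0"
```

Scaling an image down in the layout program packs the same pixels into less space, which is why the second image ends up at such a high resolution.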
However, this method doesn't work for vector images, which Acrobat ignores. To extract vector artwork that you can edit and save in a drawing program, you can try brute force; that is, open the PDF itself from Adobe Illustrator’s File > Open command, effectively “converting” it to an editable Illustrator file. Illustrator can open and convert only one page of a PDF at a time, but that might be all you need. Use the Selection tool to select the elements you want to isolate, then copy and paste them into separate Illustrator files. Close the PDF without saving changes (Illustrator is not suited for editing PDFs, and it can really mess them up) or work on a copy of the PDF if you don’t trust yourself to stay away from Command/Ctrl-S.
Most often, I need to grab only one element in a PDF -- a logo, perhaps, or a raster image. In that case, I use a less drastic approach with my best friend, Acrobat Pro’s TouchUp Object Tool (Figure 3).
Figure 3. The Advanced Editing Toolbar (top) in Acrobat is home to the powerful TouchUp Object tool. To use it, select the tool and then click on the object in the PDF you want to work with. A selection rectangle should appear. Right-click on the selection and choose Edit Object or Edit Image from the contextual menu (middle). Raster images open in Photoshop by default (bottom); vector images and blocks of text open in Illustrator. You can change those defaults in Acrobat’s preferences.
When the object opens in the editing program, you’ll see that it appears in a temporary file. Instead of editing the artwork while it’s open in the temp file and then saving back to the PDF, choose Save As to save it as an independent, editable file in your program’s native file format.
On the Horizon: Round-tripping PDFs
One problem with all of these “extract from PDF” approaches is that the PDF’s page geometry -- the layout -- can’t be converted along with the text and images. What if a client creates a company newsletter in a program you’ve never heard of? Yes, they can supply it as a PDF, but how is that going to help you convert the sixteen pages of articles, sidebars, rules, and caption treatments to an editable InDesign or QuarkXPress file -- a reasonable running start in taking over production?
That’s when you should start looking for PDF conversion utilities that go the other way, from PDF to [your program here].
Recosoft, for example, sells products that convert PDFs to editable, true-to-the-layout-geometry Microsoft Office application formats or to InDesign’s INDD format (Figure 4).
Figure 4. Pages ’08 to InDesign! I exported one of the Pages ’08 newsletter template files (top) to PDF using its own Export to PDF command. Then in a copy of InDesign CS3 with the PDF2ID plug-in installed, I opened the PDF, which converted to an editable InDesign layout file in the process (middle). It’s a little clunky, but I can fix the minor problems (such as the multiple text frames per column) in far less time than recreating it from scratch. The PDF2ID plug-in even took care of extracting all the graphics and linking them to the layout (bottom).
Or perhaps you specialize in database publishing, and the client says, “Here’s the Web page containing our database reports. Just copy and paste the HTML table.” Yikes! You need the data as Excel spreadsheets to sort and filter before exporting to the tab- or comma-delimited files that your database publishing program understands.
If you install the (Windows-only) Acrobat plug-in LDGetter from PDFToAll, you could create a PDF from the Web pages (in Acrobat Pro, choose File > Create PDF > From Web Page), then use the plug-in to convert the PDF to an Excel spreadsheet.
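Once the data is out of the PDF, the sort, filter, and export step is the same whether you do it in Excel or in a script. As a rough sketch only (this is not something the plug-in does, and it's not how I work in Excel), here is that step in Python using the standard csv module. It assumes the report rows have already been read into a list of dictionaries; the function name, column names, and sample values are invented for illustration.

```python
import csv
import io


def export_tab_delimited(rows, sort_key, keep):
    """Filter rows with keep(), sort them by the sort_key column,
    and return the result as tab-delimited text with a header row."""
    selected = sorted((r for r in rows if keep(r)), key=lambda r: r[sort_key])
    if not selected:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(selected[0].keys()),
        delimiter="\t",       # use "," here for a comma-delimited file
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(selected)
    return out.getvalue()


# Hypothetical report data standing in for the client's table.
report = [
    {"region": "West", "product": "Widget", "units": 40},
    {"region": "East", "product": "Gadget", "units": 0},
    {"region": "East", "product": "Widget", "units": 25},
]

# Drop rows with no units sold, then sort by region.
text = export_tab_delimited(report, sort_key="region", keep=lambda r: r["units"] > 0)
print(text)
```

The printed text has one header line followed by the East row and then the West row, ready to save as a file for a database publishing program that reads tab-delimited data.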
Recosoft and PDFToAll are the only two companies I know of that are tackling the “convert from PDF” challenge, but I’m hoping we’ll see more as the market demands it.
You've Vanquished Weirdo Files
With the help of this three-part series, you should be able to handle almost any unusual file format that comes your way.










