[OSM-talk] SVG to PDF

David Earl david at frankieandshadow.com
Tue Dec 19 22:52:38 GMT 2006


PDF files are read from the *end*. At the end of a PDF file is a number
which is an offset within the file where the reader can find a table (the
"cross-reference table" in PDF parlance, followed by a "trailer"
dictionary) which (possibly indirectly) maps all the elements (pages,
graphics, ...) in the file to the offsets within the file where they
appear. (In this way, you can append a new chunk onto the end of the file
which supersedes the table, thereby modifying the content selectively
without rewriting the whole file - a bit like multi-session CDs).
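
To make the "read from the end" idea concrete, here is a rough Python
sketch. It builds a tiny PDF-like file in memory and then follows the
final pointer back to the table. The function names (build_minimal_pdf,
find_startxref) and the sample objects are invented for illustration, and
the second-line bytes are arbitrary; this is not how any particular
reader is implemented.

```python
import re

def build_minimal_pdf() -> bytes:
    """Assemble a tiny illustrative PDF-like file with a correct xref table."""
    body = b"%PDF-1.4\n%\xc2\xe9\xee\xa1\n"  # header plus arbitrary binary comment
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset where this object starts
        body += obj
    xref_pos = len(body)  # the table itself starts here
    xref = b"xref\n0 3\n0000000000 65535 f \n"
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    trailer = b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    trailer += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return body + xref + trailer

def find_startxref(data: bytes) -> int:
    """Return the offset recorded after the last 'startxref' keyword."""
    tail = data[-1024:]  # the pointer sits near the end of the file
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref pointer found")
    return int(matches[-1])

pdf = build_minimal_pdf()
xref_pos = find_startxref(pdf)
# Following the pointer lands exactly on the start of the table.
print(pdf[xref_pos:xref_pos + 4])
```

Taking the last match rather than the first is deliberate: in a file
with appended updates, the most recent pointer is the one at the very
end.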

So if you delete anything from the file, all the offsets after that point
will be wrong.

The purpose of the little bit of binary (which isn't part of the PDF data -
in the sense that none of the index tables reference it) is normally to make
downloaders and other file transfer agents think the file is binary, so that
they don't convert CR or LF into CRLF, thereby disrupting any other binary
data in the file, and the offsets, of course. (Some applications put custom
data in this area of the file knowing a reader won't look at it).
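
As a rough illustration, a check for such a marker line could look like
this in Python. The name has_binary_marker is made up, and treating "any
byte of 128 or above in a comment on the second line" as the test is my
simplification, not a rule quoted from a specification.

```python
def has_binary_marker(data: bytes) -> bool:
    """True if line two is a '%' comment containing a byte >= 128."""
    lines = data.splitlines()
    if len(lines) < 2 or not lines[1].startswith(b"%"):
        return False
    # Bytes above 127 are what make transfer tools treat the file as binary.
    return any(b >= 128 for b in lines[1][1:])

with_marker = b"%PDF-1.4\n%\xc2\xe9\xee\xa1\n3 0 obj\n"
without_marker = b"%PDF-1.4\n3 0 obj\n"
print(has_binary_marker(with_marker), has_binary_marker(without_marker))
```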

However, Acrobat (specifically) is quite good at detecting misaligned
offsets and repairing them.

So maybe the file got treated as if it were text at some point and e.g.
the line endings got changed, so the offsets are broken.
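
Here is a Python sketch of what a text-mode LF to CRLF conversion does to
the offsets. The sample bytes and the helper object_offsets are invented
for the example.

```python
import re

def object_offsets(data: bytes) -> dict:
    """Map object number -> byte offset of its 'N 0 obj' header."""
    return {
        int(m.group(1)): m.start()
        for m in re.finditer(rb"(?m)^(\d+) 0 obj", data)
    }

original = (
    b"%PDF-1.4\n"
    b"3 0 obj\n<<\n/Type /Catalog\n>>\nendobj\n"
    b"4 0 obj\n<< >>\nendobj\n"
)
# What a transfer in "text mode" might do to the file.
converted = original.replace(b"\n", b"\r\n")

before = object_offsets(original)
after = object_offsets(converted)
# Each object drifts by one byte per line ending that precedes it.
drift = {n: after[n] - before[n] for n in before}
print(drift)
```

Unlike a single deletion, the error here grows the further into the file
you go, because every converted line ending adds another byte.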

Binary PDFs certainly *aren't* just compressed ASCII PDFs. You will find a
mixture of binary and text data in most PDF files. If there's an image in
the file it will almost certainly be binary (though it doesn't have to be),
and most PDF applications represent their graphic data in binary form in
the file. But even when the file doesn't contain any non-printable ASCII
characters, the line endings are still 'binary': they have to be preserved
byte for byte, or the offsets no longer match.

David

> -----Original Message-----
> From: talk-bounces at openstreetmap.org
> [mailto:talk-bounces at openstreetmap.org]On Behalf Of Andy Armstrong
> Sent: 19 December 2006 22:16
> To: Jochen Topf
> Cc: talk at openstreetmap.org
> Subject: Re: [OSM-talk] SVG to PDF
>
>
> On 19 Dec 2006, at 22:09, Jochen Topf wrote:
> > I am not sure what you mean with second line. PDF files are binary.
> > When
> > I try deleting what looks like the second line in a PDF it is broken
> > afterwards.
>
> PDF files don't have to be binary. When I downloaded it from the
> server I got an ASCII PDF - which isn't unusual (binary PDFs are just
> compressed ASCII PDFs). I'm not sure if it'll make it through the
> mailing list but here's what I see:
>
> %PDF-1.4
> %Âéî¡
> 3 0 obj
> <<
>    /Type /Catalog
>    /Pages 2 0 R
>  >>
> endobj
> 4 0 obj
> <<
>
> That funky looking second line was the one I deleted - it all works
> fine after that.
>
> --
> Andy Armstrong, hexten.net
>
>
> _______________________________________________
> talk mailing list
> talk at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/talk




