[OSM-dev] Problem retrieving wiki pages - comparison
Andreas Kalsch
andreaskalsch at gmx.de
Fri Aug 28 19:06:05 BST 2009
A comparison between de.wikipedia.org and wiki.openstreetmap.org. The
option --save-headers prepends the response headers to "file":
1) wget -O file --save-headers http://de.wikipedia.org/wiki/Test && more file
HTTP/1.0 200 OK
Date: Fri, 28 Aug 2009 12:30:51 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: de
Vary: Accept-Encoding,Cookie
Last-Modified: Sat, 22 Aug 2009 06:27:38 GMT
Content-Length: 25984
Content-Type: text/html; charset=utf-8
Age: 19534
Connection: keep-alive
<!DOCTYPE html ...
2) wget -O file --save-headers http://wiki.openstreetmap.org/wiki/Map_Features && more file
HTTP/1.0 200 OK
Date: Fri, 28 Aug 2009 17:45:43 GMT
Server: Apache
X-Powered-By: PHP/5.2.4-2ubuntu5.6
Content-Language: en
ETag: W/"wiki:pcache:idhash:1156-0!1!0!!en!2!edit=0--20090828060905"
Vary: Accept-Encoding,Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=wikiToken;string-contains=wikiLoggedOut;string-contains=wiki_session
Cache-Control: s-maxage=18000, must-revalidate, max-age=0
Last-Modified: Fri, 28 Aug 2009 10:49:00 GMT
Content-Encoding: gzip
Content-Length: 74290
Content-Type: text/html; charset=UTF-8
Age: 875
X-Cache: HIT from ross.wwood.co.uk
X-Cache-Lookup: HIT from ross.wwood.co.uk:3128
Via: 1.0 ross.wwood.co.uk:3128 (squid/2.6.STABLE18)
Connection: keep-alive
^_<8B>^H^@^@^@^@^@^@^C<EC><FD>[<8F>$?&<86>=<CF><FE>^U~r<AB><F7><AE>?<91>^^y<BF>t
e<8C><B2><AA><AB><BA>jwUw< ....
wiki.openstreetmap.org sends gzipped content even though the client did
not ask for it, and wget ignores the response header
"Content-Encoding: gzip", so the body is saved still compressed.
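A dump made with --save-headers can still be read afterwards by splitting
the headers from the body and decompressing the body when Content-Encoding
says gzip. Below is a minimal Python sketch, not something from the thread:
the name read_saved_response is made up for illustration, and it assumes
the dump consists of the status line, the headers, one blank line and then
the raw body.

```python
import gzip


def read_saved_response(raw):
    """Split a saved HTTP response into (status line, headers, decoded body).

    `raw` is the full content of the dump as bytes. Assumption: headers and
    body are separated by the first blank line (CRLF or bare LF).
    """
    # Locate the earliest blank line; the compressed body may contain
    # similar byte sequences later on, so only the first match counts.
    matches = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    matches = [(pos, sep) for pos, sep in matches if pos != -1]
    if not matches:
        raise ValueError("no blank line between headers and body")
    pos, sep = min(matches)
    head, body = raw[:pos], raw[pos + len(sep):]

    lines = head.decode("iso-8859-1").splitlines()
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Undo the content coding that wget left untouched.
    if headers.get("content-encoding", "").lower() in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    return status_line, headers, body
```

With the dump from 2), something like
read_saved_response(open("file", "rb").read()) should give back readable
HTML, provided the assumptions above hold.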
3) Going further, my wget does not handle gzip at all, even when I
request it explicitly:
wget -O file --header="Accept-Encoding: gzip" http://de.wikipedia.org/wiki/Test && more file
^_<8B>^H^@^@^@^@^@^@^C<C5>][o#Gv~<B6>~E^M7<B3>^Z9"<9B><92>Fs<91>D.4<92>?<F5><C8> ....
Can anyone solve this puzzle, so that we can download pages with simple
commands? ;-)
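One possible way around this, sketched in Python instead of wget: ask for
an unencoded body, but still check Content-Encoding on the reply and
decompress if the server or proxy sent gzip anyway. This is only a sketch
under assumptions. The function name fetch_page and its opener parameter
are hypothetical (the parameter exists so the function can be tried
without network access), and nothing here has been run against the OSM
wiki.

```python
import gzip
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen):
    """Return the body of `url` as bytes, with any gzip coding removed."""
    # Explicitly ask for an unencoded ("identity") body.
    request = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
    with opener(request) as response:
        body = response.read()
        encoding = response.headers.get("Content-Encoding", "")
    # The reply may be compressed regardless of the request, so trust the
    # response header rather than what was asked for.
    if encoding.strip().lower() in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    return body
```

Usage would be along the lines of
fetch_page("http://wiki.openstreetmap.org/wiki/Map_Features"), which
returns bytes that can then be decoded using the charset from
Content-Type.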
Andi
Marc Schütz wrote:
> -------- Original Message --------
>
>> Date: Fri, 28 Aug 2009 11:59:15 +0200
>> From: Roland Olbricht <roland.olbricht at gmx.de>
>> To: dev at openstreetmap.org
>> Subject: Re: [OSM-dev] Problem retrieving wiki pages
>>
>>> Of course, but the server (or more likely the proxy) is still
>>> mis-behaving: if the client does not send an 'Accept-Encoding' header,
>>> the server must return plain text, not gzipped or deflated text.
>>
>> No. Have a look at the HTTP 1.1 specification
>> http://tools.ietf.org/html/rfc2616#section-14.3
>>
>> "If no Accept-Encoding field is present in a request, the server MAY
>> assume that the client will accept any content coding."
>
> If you read on, it also gives a recommendation for this case:
> "If no Accept-Encoding field is present in a request, the server MAY
> assume that the client will accept any content coding. In this case,
> if "identity" is one of the available content-codings, then the
> server SHOULD use the "identity" content-coding, unless it has
> additional information that a different content-coding is meaningful
> to the client."
>
> So you are right; strictly speaking, the behaviour of the proxy is not in violation of the RFC. However, it is recommended to use the identity encoding whenever possible.
>
>> If you think of a highly frequented server like the wiki, it's a good
>> decision to compress the data whenever possible.
>
> I agree, but in this case the client gave no indication that it actually understands compressed replies.
>
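The RFC 2616 section 14.3 recommendation quoted above can be sketched as
server-side selection logic. This is a simplified illustration in Python,
not the behaviour of Squid, Apache or MediaWiki: the names
parse_accept_encoding, choose_coding and IMPLICIT_IDENTITY_Q are made up,
and giving an unlisted "identity" a small positive quality is my own
simplification of "acceptable unless explicitly refused".

```python
# Assumption: an unlisted "identity" stays acceptable, but ranks below
# anything the client listed with a positive q value.
IMPLICIT_IDENTITY_Q = 0.001


def parse_accept_encoding(value):
    """Return {coding: q} for the value of an Accept-Encoding header."""
    prefs = {}
    for item in value.split(","):
        coding, _, params = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs[coding] = q
    return prefs


def choose_coding(accept_encoding, available=("gzip", "identity")):
    """Pick a content coding; `accept_encoding` is None if the header is absent."""
    if accept_encoding is None:
        # No header: any coding MAY be assumed acceptable, but the server
        # SHOULD use identity when it is available.
        return "identity" if "identity" in available else available[0]
    prefs = parse_accept_encoding(accept_encoding)

    def quality(coding):
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:
            return prefs["*"]
        return IMPLICIT_IDENTITY_Q if coding == "identity" else 0.0

    best = max(available, key=quality)
    return best if quality(best) > 0 else None  # None: nothing acceptable
```

For a client that sends no Accept-Encoding header at all, as in the wget
requests in this thread, choose_coding(None) picks "identity", which is
what the SHOULD in the RFC recommends.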