[OSM-dev] Problem retrieving wiki pages - comparison

Andreas Kalsch andreaskalsch at gmx.de
Fri Aug 28 19:06:05 BST 2009


A comparison between de.wikipedia.org and wiki.openstreetmap.org. The 
option --save-headers prepends the response headers to "file":


1) wget -O file --save-headers http://de.wikipedia.org/wiki/Test && more file

HTTP/1.0 200 OK
Date: Fri, 28 Aug 2009 12:30:51 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: de
Vary: Accept-Encoding,Cookie
Last-Modified: Sat, 22 Aug 2009 06:27:38 GMT
Content-Length: 25984
Content-Type: text/html; charset=utf-8
Age: 19534
Connection: keep-alive

<!DOCTYPE html ...


2) wget -O file --save-headers http://wiki.openstreetmap.org/wiki/Map_Features && more file

HTTP/1.0 200 OK
Date: Fri, 28 Aug 2009 17:45:43 GMT
Server: Apache
X-Powered-By: PHP/5.2.4-2ubuntu5.6
Content-Language: en
ETag: W/"wiki:pcache:idhash:1156-0!1!0!!en!2!edit=0--20090828060905"
Vary: Accept-Encoding,Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=wikiToken;string-contains=wikiLoggedOut;string-contains=wiki_session
Cache-Control: s-maxage=18000, must-revalidate, max-age=0
Last-Modified: Fri, 28 Aug 2009 10:49:00 GMT
Content-Encoding: gzip
Content-Length: 74290
Content-Type: text/html; charset=UTF-8
Age: 875
X-Cache: HIT from ross.wwood.co.uk
X-Cache-Lookup: HIT from ross.wwood.co.uk:3128
Via: 1.0 ross.wwood.co.uk:3128 (squid/2.6.STABLE18)
Connection: keep-alive

^_<8B>^H^@^@^@^@^@^@^C<EC><FD>[<8F>$?&<86>=<CF><FE>^U~r<AB><F7><AE>?<91>^^y<BF>t
e<8C><B2><AA><AB><BA>jwUw< ....


wiki.openstreetmap.org sends gzipped content even though wget did not 
send an "Accept-Encoding" header, and wget ignores the response header 
"Content-Encoding: gzip", so the compressed body is saved as-is.
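
A possible workaround, as a minimal sketch only: look at the first two 
bytes of the saved file and decompress it only if they are the gzip magic 
bytes 1f 8b. The function name maybe_gunzip is made up for illustration, 
and the sketch assumes the file was saved without --save-headers, because 
prepended headers would hide the magic bytes.

```shell
# maybe_gunzip FILE (hypothetical helper): print FILE to stdout,
# decompressing it first if it starts with the gzip magic bytes 1f 8b.
maybe_gunzip() {
    # Read the first two bytes as hex, e.g. " 1f 8b", and strip whitespace.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        gzip -dc "$1"   # body is gzip-compressed: decompress to stdout
    else
        cat "$1"        # body is already plain: pass it through unchanged
    fi
}
```

It would be used along these lines:

    wget -q -O file http://wiki.openstreetmap.org/wiki/Map_Features && maybe_gunzip file | more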



3) Beyond that, my wget does not handle gzip at all, even when the request asks for it:

wget -O file --header="Accept-Encoding: gzip" http://de.wikipedia.org/wiki/Test && more file

^_<8B>^H^@^@^@^@^@^@^C<C5>][o#Gv~<B6>~E^M7<B3>^Z9"<9B><92>Fs<91>D.4<92>?<F5><C8>
<D6>Z^ZO<E2>^YCh<B2><8B>d^O<9B><DD>?<88><92>^L^C^N<90>?^P<E4>% 
^V<B9>a^P^Dy<C8>
<E3>^L^DX<FD>^A<EF><93><DF><F7><97> <E4>;<A7><AA><AB><AB>4D<CD>:<D9>] 
@<D3>?:Uu
<EA><DC><CF><F1>?<FD><CF><F7>N<FF><F2><F8>@^L<D2>Q 
<8E>_={y<B8>'ju<C7>y<BD><B1>
<E7>8<FB><A7><FB><E2>/^<9C>^^<BD>^Tk<8D><A6>8<8D><DD>0<F1>S?
<DD><C0>q^N><AB><89><DA> 
M<C7>[<8E>3<99>L^Z<93><8D>F^T<F7><9D><D3>/<9C>^K^Zk<8D>
 ><U+058F><F5><D4><FA><B2><E1><A5> ....


Can anyone solve this puzzle, so that we can download with simple 
commands ?-)

Andi


Marc Schütz wrote:
> -------- Original Message --------
>   
>> Date: Fri, 28 Aug 2009 11:59:15 +0200
>> From: Roland Olbricht <roland.olbricht at gmx.de>
>> To: dev at openstreetmap.org
>> Subject: Re: [OSM-dev] Problem retrieving wiki pages
>>     
>
>   
>>> Of course, but the server (or more likely the proxy) is still mis-behaving:
>>> if the client does not send an 'Accept-Encoding' header, the server must
>>> return plain text, not gzipped or deflated text.
>>>       
>> No. Have a look at the HTTP 1.1 specification
>> http://tools.ietf.org/html/rfc2616#section-14.3
>>
>> "If no Accept-Encoding field is present in a request, the server MAY
>>    assume that the client will accept any content coding."
>>
>>     
>
> If you read on, it also gives a recommendation for this case:
> "If no Accept-Encoding field is present in a request, the server MAY
>    assume that the client will accept any content coding. In this case,
>    if "identity" is one of the available content-codings, then the
>    server SHOULD use the "identity" content-coding, unless it has
>    additional information that a different content-coding is meaningful
>    to the client."
>
> So you are right; strictly speaking, the behaviour of the proxy is not in violation of the RFC. However, it is recommended to use the identity encoding whenever possible.
>
>   
>> If you think of a highly frequented server like the wiki, it's a good
>> decision to compress the data whenever possible.
>>     
>
> I agree, but in this case the client gave no indication that it actually understands compressed replies.
>   
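
Following the RFC passage quoted above, the client can also state its 
preference explicitly by naming the "identity" content-coding in the 
request. This is only a sketch: the function name fetch_identity is made 
up for illustration, and whether the proxy in front of 
wiki.openstreetmap.org honours the header is an untested assumption.

```shell
# fetch_identity URL FILE (hypothetical helper): download URL to FILE while
# asking for an unencoded reply. "identity" is the RFC 2616 name for
# "no content coding applied".
fetch_identity() {
    wget -q -O "$2" --header="Accept-Encoding: identity" "$1"
}
```

It would be used along these lines:

    fetch_identity http://wiki.openstreetmap.org/wiki/Map_Features file && more file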



