http-protocol-faq

FAQs

Here is a list of FAQs that web beginners often ask, such as what Vary is, how Content-Encoding differs from Transfer-Encoding, and what a conditional request is.

How the Vary response header is used

In some cases a server returns different responses for the same URI depending on request headers. The names of those request headers are listed in the Vary response header. A typical example is a web server that returns different pages for mobile and desktop at the same URI: the response depends on the User-Agent request header, so the server puts User-Agent in the Vary response header. When a caching proxy (for example nginx configured as a cache) receives the response, it stores it under the key (URI, User-Agent). When a later request arrives for the same URI but with a different User-Agent, the proxy should fetch a fresh response from the origin server rather than use the cached one, because the User-Agent (the secondary key for the cache lookup) differs.

response depends on User-Agent and Cookie

  • Vary: User-Agent, Cookie

In short, Vary tells downstream caches which request headers a future request must match before the cached response can be used instead of requesting a fresh one from the origin server.
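The lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `cache_key` and the in-memory `cache` dict are hypothetical names, and a real cache also normalises header values and handles `Vary: *`.

```python
def cache_key(uri, request_headers, vary):
    """Build a cache key from the URI plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them to lower case.
    headers = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (uri,) + tuple((n, headers.get(n, "")) for n in names)


cache = {}

# The origin answered a mobile request with "Vary: User-Agent".
mobile = {"User-Agent": "Mobile"}
cache[cache_key("/index.html", mobile, "User-Agent")] = "<mobile page>"

# Same URI, different User-Agent: the key differs, so this is a cache miss.
desktop = {"User-Agent": "Desktop"}
print(cache_key("/index.html", desktop, "User-Agent") in cache)  # False
print(cache_key("/index.html", mobile, "User-Agent") in cache)   # True
```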

What q=0.6 means in a header value

Some headers, such as Accept-Language and Accept, can carry multiple values. To tell the server which value to prefer, each value can have a q parameter (a relative quality factor). The server should pick the supported value with the highest q when it builds the response. Here is an example:

Accept-Language: en-us,en;q=0.5

en-us uses the default q=1, which is higher than the q=0.5 of en, so the server should respond in en-us if it supports that language.

Rules

  • without an explicit q, the default is q=1
  • a higher value means more preferred
  • the value range is [0, 1]
  • the format is value;q=0.5
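As a rough sketch of these rules, the following Python function (the name `parse_quality_list` is made up for this example) splits a header value and orders the entries by q. It ignores wildcard matching and other details that a real server handles.

```python
def parse_quality_list(header_value):
    """Return the values of a header such as Accept-Language, most preferred first."""
    items = []
    for part in header_value.split(","):
        fields = [f.strip() for f in part.split(";")]
        value = fields[0]
        if not value:
            continue
        q = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        items.append((value, q))
    # sorted() is stable, so values with equal q keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)


print(parse_quality_list("en-us,en;q=0.5"))
# [('en-us', 1.0), ('en', 0.5)]
```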

How ETag is generated

ETag (a response header) is an identifier for a specific version of a resource, mostly used for static resources. It is often a hash of the resource's content, or the web server generates it from attributes of the resource (size, inode, modification time, etc.).

ETag: "737060cd8c284d8af7ad3082f209582d"
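To make the attribute-based approach concrete, here is a minimal Python sketch. The `make_etag` helper and its "inode-size-mtime" layout are assumptions chosen for illustration; each web server has its own format.

```python
import os
import tempfile


def make_etag(path):
    """Derive an ETag from file metadata (inode, size, modification time)."""
    st = os.stat(path)
    # Hex fields joined by "-"; the double quotes are part of the ETag syntax.
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, st.st_mtime_ns)


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

before = make_etag(path)
with open(path, "ab") as f:
    f.write(b" world")
after = make_etag(path)
print(before != after)  # True: the size changed, so the ETag changed
os.remove(path)
```

A content hash (for example an md5 of the body) survives a move to another filesystem, while a metadata-based tag like this one is cheaper to compute.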

A client can also request only part of a resource (a range request), provided the server supports partial requests. Three headers are involved: two response headers, Accept-Ranges and Content-Range, and one request header, Range.

  • Accept-Ranges indicates whether the server supports partial requests
  • Range indicates which range the client wants to get
  • Content-Range indicates which range the server is sending to the client

The server declares that it supports range requests:

Accept-Ranges: bytes

The client requests a range:

Range: bytes=500-999

The server sends the content of that range (with status 206 Partial Content):

Content-Range: bytes 500-999/1234
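The exchange above can be sketched in Python as follows. The function names are hypothetical, and the sketch handles only a single byte range; multi-range requests and If-Range are left out.

```python
def parse_byte_range(range_header, total_size):
    """Parse one 'bytes=first-last' range; return (first, last) or None."""
    unit, _, spec = range_header.partition("=")
    first_s, dash, last_s = spec.strip().partition("-")
    if unit.strip() != "bytes" or dash != "-" or "," in spec:
        return None  # this sketch handles a single byte range only
    if first_s == "":
        # Suffix form "bytes=-N" asks for the last N bytes.
        if not last_s.isdigit() or int(last_s) == 0:
            return None
        first, last = max(total_size - int(last_s), 0), total_size - 1
    elif first_s.isdigit() and (last_s == "" or last_s.isdigit()):
        first = int(first_s)
        # An open-ended or oversized range is clamped to the last byte.
        last = min(int(last_s), total_size - 1) if last_s else total_size - 1
    else:
        return None
    if first > last:
        return None  # invalid or unsatisfiable; a real server may answer 416
    return first, last


def content_range(first, last, total_size):
    """Format the Content-Range value for a 206 response."""
    return "bytes %d-%d/%d" % (first, last, total_size)


first, last = parse_byte_range("bytes=500-999", 1234)
print(content_range(first, last, 1234))  # bytes 500-999/1234
```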

Expires vs Cache-Control(max-age value) header

Cache-Control was introduced in HTTP/1.1 and offers more options than Expires. They can be used to accomplish the same thing.

The data value for Expires is an HTTP date whereas Cache-Control max-age lets you specify a relative amount of time so you could specify ‘X hours after the page was requested’.

If a response includes a Cache-Control field with the max-age directive, a recipient MUST ignore the Expires field.

Cache-Control: max-age=3600

Expires: Tue, 18 Jul 2017 16:07:23 GMT

Always use Cache-Control as it offers more options
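The precedence rule can be sketched like this in Python. `freshness_lifetime` is a hypothetical helper that takes a plain dict of response headers; real caches also consider s-maxage, no-cache, the Age header and heuristic freshness.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if it is unknown."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        return int(match.group(1))  # max-age wins; Expires is ignored
    if "Expires" in headers and "Date" in headers:
        # Expires is absolute, so the lifetime is Expires minus Date.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(int((expires - date).total_seconds()), 0)
    return None


print(freshness_lifetime({
    "Cache-Control": "max-age=3600",
    "Expires": "Tue, 18 Jul 2017 16:07:23 GMT",
}))  # 3600
```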

Ways to do conditional request

A conditional request means the client sends a condition to the server. The server evaluates it: if the condition holds, it processes the request normally (for a GET, it sends the resource); otherwise it sends only headers with a special status code, such as 304 Not Modified or 412 Precondition Failed.

Old way (date-based; If-Modified-Since goes back to HTTP/1.0)

If-Unmodified-Since and If-Modified-Since, where the client sends a timestamp of the resource.

HTTP/1.1

If-Match and If-None-Match, where the client sends an ETag of the resource.

Difference:

Dates can be ordered, ETags can not.

This means that if a resource was modified a year ago but never since, and we know it, then we can correctly answer an If-Unmodified-Since request for any date within the last year and agree that yes, it has been unmodified since that date.

An ETag is only comparable for identity: either it is the same or it is not. Suppose that, for the same resource as above, the docroot was moved during the year to a new disk and filesystem, giving all files new inodes but preserving modification dates, and that someone had based the ETags on the file’s inode number. Then we can’t say that the old ETag is still okay without keeping a log of past ETags that are still valid.

So I don’t see them as one obsoleting the other. They are for different situations. Either you can easily get a Last-Modified date of all the data in the page you’re about to serve, or you can easily get an ETag for what you will serve.

If you have a dynamic webpage with data from lots of db lookups, it might be difficult to tell what the Last-Modified date is without making your database store lots of modification dates. But you can always compute an md5 checksum of the resulting rendered page.

When supporting these cache protocols I definitely go for only one of them, never both.
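Here is a small Python sketch of how a server might evaluate the two kinds of validators for a GET. `evaluate_get` is a made-up name, and the ETag handling is simplified: it splits on commas and compares strings exactly, ignoring the weak "W/" prefix.

```python
from email.utils import parsedate_to_datetime


def evaluate_get(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    etag is the current ETag string (quotes included); last_modified is an
    HTTP date string. Both describe the server's current representation.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When present, If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304
        return 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Dates can be ordered: unchanged since that date means 304.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


modified = "Tue, 18 Jul 2017 16:07:23 GMT"
print(evaluate_get({"If-None-Match": '"abc"'}, '"abc"', modified))   # 304
print(evaluate_get({"If-Modified-Since": modified}, '"abc"', modified))  # 304
print(evaluate_get({}, '"abc"', modified))                            # 200
```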

TE and Transfer-Encoding header

The TE request header specifies the transfer encodings the user agent is willing to accept. (you could informally call it Accept-Transfer-Encoding, which would be more intuitive).

TE: chunked

The Transfer-Encoding response header specifies the form of encoding used to safely transfer the payload body to the user.

# chunked, only for HTTP/1.1
Transfer-Encoding: chunked

# no encoding at the transfer level
Transfer-Encoding: identity

When chunked encoding is used

Regarding chunked encoding, one important response header is Trailer. It allows the sender to include additional header fields at the end of a chunked message in order to supply metadata that might be dynamically generated while the message body is sent, such as a message integrity check, digital signature, or post-processing status.

Note: The TE request header needs to be set to "trailers" to allow trailer fields.

Chunked encoding is useful when larger amounts of data are sent to the client and the total size of the response may not be known until the request has been fully processed. For example, when generating a large HTML table resulting from a database query or when transmitting large images.

Data is sent in a series of chunks. The Content-Length header is omitted in this case and at the beginning of each chunk you need to add the length of the current chunk in hexadecimal format, followed by ‘\r\n’ and then the chunk itself, followed by another ‘\r\n’. The terminating chunk is a regular chunk, with the exception that its length is zero. It is followed by the trailer, which consists of a (possibly empty) sequence of entity header fields.

A chunked response looks like this:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
Trailer: Expires

7\r\n
Mozilla\r\n
9\r\n
Developer\r\n
7\r\n
Network\r\n
0\r\n
Expires: Wed, 21 Oct 2015 07:28:00 GMT\r\n
\r\n
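The framing described above is simple enough to sketch in Python. `encode_chunked` and `decode_chunked` are hypothetical helpers that work on complete byte strings; a real implementation reads from a socket incrementally and validates its input.

```python
def encode_chunked(chunks, trailers=None):
    """Encode an iterable of bytes objects as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would end the body early
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n"  # terminating chunk
    for name, value in (trailers or {}).items():
        out += name.encode("ascii") + b": " + value.encode("ascii") + b"\r\n"
    out += b"\r\n"  # blank line ends the trailer section
    return bytes(out)


def decode_chunked(data):
    """Decode a complete chunked body; return (payload, trailers)."""
    payload = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Ignore any chunk extension after ";".
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break
        payload += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    trailers = {}
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.decode("ascii")] = value.strip().decode("ascii")
    return bytes(payload), trailers


body = encode_chunked(
    [b"Mozilla", b"Developer", b"Network"],
    {"Expires": "Wed, 21 Oct 2015 07:28:00 GMT"},
)
print(decode_chunked(body)[0])  # b'MozillaDeveloperNetwork'
```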

Accept-Encoding and Content-Encoding

The Content-Encoding entity header is used to compress the media-type. When present, its value indicates which encodings were applied to the entity-body. It lets the client know how to decode in order to obtain the media-type referenced by the Content-Type header.

The recommendation is to compress data as much as possible and therefore to use this field, but some types of resources, such as JPEG images, are already compressed, so compressing them again gains little.

The Accept-Encoding request HTTP header advertises which content encoding, usually a compression algorithm, the client is able to understand.

Accept-Encoding: gzip
Accept-Encoding: compress
Accept-Encoding: deflate

# The identity value means no encoding, for example for image formats that are already compressed.
Accept-Encoding: identity

Note: the browser decompresses the payload and shows the uncompressed web page to the user.

Content-Encoding vs Transfer-Encoding

Content-Encoding describes how the content itself is encoded. For example, if a web page is 100k, it is better to encode it with gzip to reduce the payload. Once the server has the encoded data (or has encoded it itself), it may decide to transfer the gzip data (Content-Encoding: gzip) in chunked format (Transfer-Encoding: chunked). The two apply at different levels: Transfer-Encoding is hop-by-hop and may change as the message passes through a proxy, while Content-Encoding is end-to-end, so a proxy normally does not touch the payload.

Without Content-Encoding, assume the body is not compressed. Without Transfer-Encoding, assume it is not chunked; in that case a message with a body should carry Content-Length.
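The layering can be illustrated with a short Python round trip: the content is gzip-compressed first (Content-Encoding), then the compressed bytes are framed in chunks (Transfer-Encoding), and the receiver undoes the two steps in reverse order. The `chunk` and `dechunk` helpers are assumptions written for this sketch; they ignore trailers and chunk extensions.

```python
import gzip


def chunk(data, size):
    """Frame data as a chunked body using chunks of at most `size` bytes."""
    out = bytearray()
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return bytes(out + b"0\r\n\r\n")


def dechunk(body):
    """Remove the chunked framing from a complete body without trailers."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        if size == 0:
            return bytes(out)
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF


page = b"<html><body>hello</body></html>" * 100

compressed = gzip.compress(page)   # end-to-end: Content-Encoding: gzip
wire = chunk(compressed, 64)       # hop-by-hop: Transfer-Encoding: chunked

# A proxy may re-frame the message but passes the gzip payload through as is.
reframed = chunk(dechunk(wire), 16)

received = gzip.decompress(dechunk(reframed))
print(received == page)  # True
```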

PUT (whole update) vs POST (new) vs PATCH (partial update)

The POST method is used to submit an entity to the specified resource, often causing a change in state or side effects on the server. It is intended for creating something new: if you send it many times to the same URI, many new objects with the same value may be created.

POST /questions

The PUT method replaces all current representations of the target resource with the request payload, or creates the resource if it is not found. It is intended for replacing: if you send it many times to the same URI, only one object results, because a PUT must identify the resource.

PUT /questions/{question-id}

The PATCH method is used to apply partial modifications to a resource.

Method                         PATCH   POST   PUT
Request has body               Yes     Yes    Yes
Successful response has body   Yes     Yes    No
Safe                           No      No     No
Idempotent                     No      No     Yes
Cacheable                      No      No     No
Allowed in HTML forms          No      Yes    No

Safe: no side effects
Idempotent: same result if run many times
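The difference in idempotency can be demonstrated with a toy in-memory store in Python. `QuestionStore` is a hypothetical stand-in for the /questions resource above, not a real framework API.

```python
import itertools


class QuestionStore:
    """In-memory stand-in for a /questions collection."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, body):
        # POST /questions: the server picks a new id on every call.
        new_id = next(self._ids)
        self.items[new_id] = dict(body)
        return new_id

    def put(self, question_id, body):
        # PUT /questions/{id}: replace the whole resource, creating it if absent.
        self.items[question_id] = dict(body)

    def patch(self, question_id, changes):
        # PATCH /questions/{id}: change only the supplied fields.
        self.items[question_id].update(changes)


store = QuestionStore()
store.post({"title": "What is Vary?"})
store.post({"title": "What is Vary?"})
print(len(store.items))  # 2: repeating POST created two resources

store.put(10, {"title": "What is TE?", "votes": 0})
store.put(10, {"title": "What is TE?", "votes": 0})
print(len(store.items))  # 3: repeating PUT left a single resource with id 10

store.patch(10, {"votes": 1})
print(store.items[10])  # {'title': 'What is TE?', 'votes': 1}
```

The merge-style patch here happens to give the same result when repeated, but a patch such as "increment votes" would not, which is why PATCH is not idempotent in general.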

A POST request is typically sent via an HTML form and results in a change on the server. In this case, only three content types are supported:

  • application/x-www-form-urlencoded: the keys and values are encoded in key-value tuples separated by '&', with a '=' between the key and the value. Non-alphanumeric characters in both keys and values are percent encoded: this is the reason why this type is not suitable to use with binary data (use multipart/form-data instead)
POST /test HTTP/1.1
Host: foo.example
Content-Type: application/x-www-form-urlencoded
Content-Length: 27

field1=value1&field2=value2
  • multipart/form-data: each value is sent as a block of data (“body part”), with a user agent-defined delimiter (“boundary”) separating each part. The keys are given in the Content-Disposition header of each part.

  • text/plain
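For the application/x-www-form-urlencoded case, Python's standard library shows the encoding directly. This short sketch reproduces the body of the request above and then shows how reserved characters are percent-encoded; the second field name and value are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

body = urlencode({"field1": "value1", "field2": "value2"})
print(body)                       # field1=value1&field2=value2
print(len(body.encode("ascii")))  # 27, the Content-Length of that request

# "&", "=" and spaces inside a value must be escaped so they are not
# mistaken for separators.
encoded = urlencode({"q": "a&b=c d"})
print(encoded)            # q=a%26b%3Dc+d
print(parse_qs(encoded))  # {'q': ['a&b=c d']}
```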

When the POST request is sent by some means other than an HTML form, such as an XMLHttpRequest from a script, the body can be of any type.

GET VS HEAD

The GET method requests a representation of the specified resource. Requests using GET should only retrieve data.

The HEAD method asks for a response identical to that of a GET request, but without the response body.

In short, HEAD is the same as GET but returns no body.

OPTIONS method

The OPTIONS method is used to describe the communication options for the target resource.
The client can specify a URL for the OPTIONS method, or an asterisk (*) to refer to the entire server.

Identifying allowed request methods

$ curl -X OPTIONS http://example.org -i

HTTP/1.1 200 OK
Date: Mon, 16 Dec 2019 02:57:20 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Server: Apache/2.4.7 (Ubuntu)
Allow: GET,HEAD,POST,OPTIONS
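The same exchange can be tried without an external server. The Python sketch below starts a throwaway local server whose handler answers OPTIONS with an Allow header (the `Handler` class and the methods it lists are assumptions for the demo), then sends an OPTIONS request with `http.client` and reads the allowed methods.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_OPTIONS(self):
        self.send_response(200)
        self.send_header("Allow", "GET,HEAD,POST,OPTIONS")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS choose
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("OPTIONS", "/")
response = conn.getresponse()
allowed = [m.strip() for m in response.getheader("Allow", "").split(",")]
print(response.status, allowed)  # 200 ['GET', 'HEAD', 'POST', 'OPTIONS']

conn.close()
server.shutdown()
server.server_close()
```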

How to create a forever HTTP connection

To open a connection that never dies until it is closed explicitly:

  • WebSocket
  • send an HTTP request with Transfer-Encoding: chunked, but never send the terminating chunk 0\r\n

In both cases the server does not close the connection after the client gets a response. The connection closes only when the client closes it or the client sends the terminating chunk 0\r\n\r\n to the server.