Unprocessable Entity
Being overly specific can lead to confusion. How SoundCloud’s API baffled me with a peculiar HTTP status code.
CACHING IS HARD, especially if you don’t have control over the data sources. Manger, a Node.js program I wrote to cache RSS feeds, encounters all kinds of things in the wild—RSS over HTTP is a zoo—but recently it failed in a particular, mildly amusing, way.
While running an integration test, a client requested Lena Dunham’s Women Of The Hour podcast and got nothing back, so it failed.
Hitting its cached URL manually, I got a 301:
curl -v http://feeds.soundcloud.com/users/soundcloud:users:180603351/sounds.rss
* Trying 93.184.220.127...
* Connected to feeds.soundcloud.com (93.184.220.127) port 80 (#0)
> GET /users/soundcloud:users:180603351/sounds.rss HTTP/1.1
> Host: feeds.soundcloud.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Mon, 31 Oct 2016 19:57:51 GMT
< Location: https://rss.art19.com/women-of-the-hour
< Server: am/2
< Content-Length: 0
<
* Connection #0 to host feeds.soundcloud.com left intact
Why would my program trip over a plain redirect? My first suspicion was the protocol change, HTTP redirecting to HTTPS. Clearly, that shouldn’t be a problem either, but it was the only somewhat interesting bit about this.
Debugging my program, I saw:
MANGER 23881: { headers: { 'accept-encoding': 'gzip' },
hostname: 'feeds.soundcloud.com',
method: 'GET',
path: '/users/soundcloud:users:180603351/sounds.rss',
port: 80,
protocol: 'http:' }
MANGER 23881: { Error: quaint HTTP status: 422 from feeds.soundcloud.com
at ClientRequest.onResponse (/Users/michael/workspace/manger/index.js:223:16)
at ClientRequest.g (events.js:291:16)
at emitOne (events.js:96:13)
at ClientRequest.emit (events.js:188:7)
at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:472:21)
at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)
at Socket.socketOnData (_http_client.js:361:20)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:176:18)
url: 'http://feeds.soundcloud.com/users/soundcloud:users:180603351/sounds.rss' }
422, eh? The 4xx class encompasses client errors, but I hadn’t encountered 422 yet. Apparently, it is neither part of RFC 7231, nor listed under Status code there. Instead, 422 is part of RFC 4918, a WebDAV HTTP extension.
The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.
Someone at SoundCloud has a soft spot for obscure tech. WebDAV, what’s that again? Web Distributed Authoring and Versioning. Ah, O.K. …
But what is the SoundCloud API trying to tell me? I found these HTTP status codes, including 422 Unprocessable Entity:
The request was valid, but one or more of the parameters looks a little screwy. It’s possible that you sent data in the wrong format. One example would be providing an array when we expected a string.
A little screwy? Come on! What’s screwy about this request?
{ headers: { 'accept-encoding': 'gzip' },
hostname: 'feeds.soundcloud.com',
method: 'GET',
path: '/users/soundcloud:users:180603351/sounds.rss',
port: 80,
protocol: 'http:' }
O.K., let’s compare it with curl’s request:
> GET /users/soundcloud:users:180603351/sounds.rss HTTP/1.1
> Host: feeds.soundcloud.com
> User-Agent: curl/7.43.0
> Accept: */*
Comparing the two, I spotted the obvious difference in the headers. My program didn’t sent User-Agent or Accept headers. My intuition led me to try the Acccept header first, because I thought that might be a thing.
{ headers: { 'accept-encoding': 'gzip', accept: '*/*' }
422, still. So, headers aren’t the issue. I had just updated to Node v6.7.0, there might be, although highly unlikely, something off with its HTTP module—hopefully not. I was about to go deep with Wireshark, when I decided to try the User-Agent header as well.
MANGER 23881: { headers:
{ 'accept-encoding': 'gzip',
accept: '*/*',
'user-agent': 'screwy' },
hostname: 'feeds.soundcloud.com',
method: 'GET',
path: '/users/soundcloud:users:180603351/sounds.rss',
port: 80,
protocol: 'http:' }
MANGER 23881: redirecting to https://rss.art19.com/women-of-the-hour
MANGER 23881: { headers:
{ 'accept-encoding': 'gzip',
accept: '*/*',
'user-agent': 'screwy' },
hostname: 'rss.art19.com',
method: 'GET',
path: '/women-of-the-hour',
port: 443,
protocol: 'https:' }
🎉 TA-DAH! In fact, the SoundCloud API demands the HTTP user-agent header being set—set to anything, really. I understand what they’re doing here, assumingly meeting a requirement of their logging, but passing 422 to a plain GET is, for want of a better word, a little screwy.