The Troubled Programmer

Being overly specific can lead to confusion. How SoundCloud’s API baffled me with a peculiar HTTP status code.

CACHING IS HARD, especially if you don’t have control over the data sources. Manger, a Node.js program I wrote to cache RSS feeds, encounters all kinds of things in the wild—RSS over HTTP is a zoo—but recently it failed in a particular, mildly amusing, way.

While running an integration test, a client requested Lena Dunham’s Women Of The Hour podcast and got nothing back, so the test failed.

Hitting its cached URL manually, I got a 301:

curl -v http://feeds.soundcloud.com/users/soundcloud:users:180603351/sounds.rss

*   Trying 93.184.220.127...
* Connected to feeds.soundcloud.com (93.184.220.127) port 80 (#0)
> GET /users/soundcloud:users:180603351/sounds.rss HTTP/1.1
> Host: feeds.soundcloud.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Mon, 31 Oct 2016 19:57:51 GMT
< Location: https://rss.art19.com/women-of-the-hour
< Server: am/2
< Content-Length: 0
<
* Connection #0 to host feeds.soundcloud.com left intact

Why would my program trip over a plain redirect? My first suspicion was the protocol change, HTTP redirecting to HTTPS. Clearly, that shouldn’t be a problem either, but it was the only somewhat interesting bit about this.
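For context, here is a minimal sketch of what following such a redirect involves. It is not Manger's actual code: `optionsFromLocation` and `moduleFor` are hypothetical helpers, and the sketch assumes a Node version with the global WHATWG `URL` class. It turns a `Location` header into a request options object and picks the matching core module, since `http` and `https` are separate modules in Node.

```javascript
const http = require('http')
const https = require('https')

// Hypothetical helper: builds request options from a Location header.
// `base` is the URL of the request that was redirected; it is only used
// to resolve relative Location values.
function optionsFromLocation (location, headers, base) {
  const url = new URL(location, base)
  const secure = url.protocol === 'https:'
  return {
    headers: headers,
    hostname: url.hostname,
    method: 'GET',
    path: url.pathname + url.search,
    // url.port is an empty string when the URL uses the default port.
    port: url.port ? Number(url.port) : (secure ? 443 : 80),
    protocol: url.protocol
  }
}

// Hypothetical helper: the protocol decides which core module sends the request.
function moduleFor (opts) {
  return opts.protocol === 'https:' ? https : http
}

const redirected = optionsFromLocation(
  'https://rss.art19.com/women-of-the-hour',
  { 'accept-encoding': 'gzip' },
  'http://feeds.soundcloud.com/users/soundcloud:users:180603351/sounds.rss'
)
console.log(redirected.protocol, redirected.hostname, redirected.port)
```

A request would then be issued with `moduleFor(redirected).request(redirected, onResponse)`, where `onResponse` stands for whatever response handler the program uses.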

Debugging my program, I saw:

MANGER 23881: { headers: { 'accept-encoding': 'gzip' },
  hostname: 'feeds.soundcloud.com',
  method: 'GET',
  path: '/users/soundcloud:users:180603351/sounds.rss',
  port: 80,
  protocol: 'http:' }
MANGER 23881: { Error: quaint HTTP status: 422 from feeds.soundcloud.com
    at ClientRequest.onResponse (/Users/michael/workspace/manger/index.js:223:16)
    at ClientRequest.g (events.js:291:16)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:472:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)
    at Socket.socketOnData (_http_client.js:361:20)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at readableAddChunk (_stream_readable.js:176:18)
  url: 'http://feeds.soundcloud.com/users/soundcloud:users:180603351/sounds.rss' }

422, eh? The 4xx class encompasses client errors, but I hadn’t encountered 422 yet. Apparently, RFC 7231 does not define it. Instead, 422 is part of RFC 4918, a WebDAV extension to HTTP.

The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions.

Someone at SoundCloud has a soft spot for obscure tech. WebDAV, what’s that again? Web Distributed Authoring and Versioning. Ah, O.K. …

But what is the SoundCloud API trying to tell me? Its documentation lists these HTTP status codes, including 422 Unprocessable Entity:

The request was valid, but one or more of the parameters looks a little screwy. It’s possible that you sent data in the wrong format. One example would be providing an array when we expected a string.

A little screwy? Come on! What’s screwy about this request?

{ headers: { 'accept-encoding': 'gzip' },
  hostname: 'feeds.soundcloud.com',
  method: 'GET',
  path: '/users/soundcloud:users:180603351/sounds.rss',
  port: 80,
  protocol: 'http:' }

O.K., let’s compare it with curl’s request:

> GET /users/soundcloud:users:180603351/sounds.rss HTTP/1.1
> Host: feeds.soundcloud.com
> User-Agent: curl/7.43.0
> Accept: */*

Comparing the two, I spotted the obvious difference in the headers. My program didn’t send User-Agent or Accept headers. My intuition led me to try the Accept header first, because I thought that might be a thing.
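The same comparison can be done mechanically. This is a small sketch, with `missingHeaders` being a hypothetical helper and the header objects transcribed by hand from the two requests above. It lists the header names curl sent that my program did not set explicitly.

```javascript
// Hypothetical helper: names present in `reference` but absent from `candidate`.
// HTTP header names are case-insensitive, so both sides are lower-cased.
function missingHeaders (reference, candidate) {
  const have = new Set(Object.keys(candidate).map(name => name.toLowerCase()))
  return Object.keys(reference)
    .map(name => name.toLowerCase())
    .filter(name => !have.has(name))
}

const curlHeaders = {
  'Host': 'feeds.soundcloud.com',
  'User-Agent': 'curl/7.43.0',
  'Accept': '*/*'
}
const mangerHeaders = { 'accept-encoding': 'gzip' }

// Prints [ 'host', 'user-agent', 'accept' ]
console.log(missingHeaders(curlHeaders, mangerHeaders))
```

Host shows up only because the options object does not list it; Node's `http` module adds that header itself by default, which leaves User-Agent and Accept as the real candidates.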

{ headers: { 'accept-encoding': 'gzip', accept: '*/*' }

422, still. So, headers aren’t the issue. I had just updated to Node v6.7.0, so there might be something off with its HTTP module, although that seemed highly unlikely, and I hoped not. I was about to go deep with Wireshark when I decided to try the User-Agent header as well.

MANGER 23881: { headers:
   { 'accept-encoding': 'gzip',
     accept: '*/*',
     'user-agent': 'screwy' },
  hostname: 'feeds.soundcloud.com',
  method: 'GET',
  path: '/users/soundcloud:users:180603351/sounds.rss',
  port: 80,
  protocol: 'http:' }
MANGER 23881: redirecting to https://rss.art19.com/women-of-the-hour
MANGER 23881: { headers:
   { 'accept-encoding': 'gzip',
     accept: '*/*',
     'user-agent': 'screwy' },
  hostname: 'rss.art19.com',
  method: 'GET',
  path: '/women-of-the-hour',
  port: 443,
  protocol: 'https:' }

🎉 TA-DAH! In fact, the SoundCloud API requires the HTTP User-Agent header to be set, to anything, really. I understand what they’re doing here, presumably meeting a requirement of their logging, but answering a plain GET with 422 is, for want of a better word, a little screwy.
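One way to guard against this kind of surprise is to give every outgoing request a set of default headers. The following is only a sketch under assumptions, not how Manger does it: `withDefaultHeaders` is a hypothetical helper, and the `user-agent` value is made up, since any value seemed to satisfy the server.

```javascript
// Assumed defaults; the user-agent string is an arbitrary placeholder.
const DEFAULT_HEADERS = {
  'accept': '*/*',
  'accept-encoding': 'gzip',
  'user-agent': 'manger-example/1.0'
}

// Hypothetical helper: returns a copy of the request options in which
// missing headers are filled in from DEFAULT_HEADERS. Headers given by the
// caller win over the defaults.
function withDefaultHeaders (opts) {
  const given = {}
  // Header names are case-insensitive, so normalize them to lower case.
  for (const [name, value] of Object.entries(opts.headers || {})) {
    given[name.toLowerCase()] = value
  }
  return Object.assign({}, opts, {
    headers: Object.assign({}, DEFAULT_HEADERS, given)
  })
}

const patched = withDefaultHeaders({
  headers: { 'Accept-Encoding': 'gzip' },
  hostname: 'feeds.soundcloud.com',
  method: 'GET',
  path: '/users/soundcloud:users:180603351/sounds.rss',
  port: 80,
  protocol: 'http:'
})
console.log(patched.headers)
```

The resulting object has the same shape as the options in the debug output above, so it could be passed to `http.request()` unchanged.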