...soul, as in software?

Rantings about life.


Thu, 31 Dec 2009

How to keep a copy of what you watch online.

Adobe Flash (sic) is so pervasive these days, especially for online video distribution.

Sometimes (e.g., TED.com) the site kindly provides you with a link to download the video stream, but more often than not, they don't (e.g., YouTube).

There are some tools around that will download the video feed for you, but they only work for certain well-known sites. I'm thinking, of course, of clive and youtube-dl. These tools work by reverse-engineering the protocol the Flash client uses to talk to the "mothership". Sometimes that protocol changes without notice, and that leaves you out in the cold until the tool's author cracks it again and updates the tool accordingly.

So, in case you don't have a working tool to download some nice video you just watched, here is a simple procedure to find out the download URL that the Flash client uses internally. Install some logging proxy, such as tinyproxy, and configure your browser to use it. Load the page, watch a bit of the video, and then check the logs. The URL will be there. Feed it to wget, and you are all set!
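If the log is long, a few lines of Python can do the digging for you. This is only a sketch: it assumes each log line contains the full request URL somewhere in it (check what your proxy actually writes), and both the function name video_urls_from_log and the list of extensions are my own picks, not anything the proxy defines.

```python
import re
import sys

# Assumption: the proxy writes the full request URL somewhere on each log line.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

# Assumption: the video stream ends in one of these extensions; extend as needed.
VIDEO_EXTENSIONS = (".flv", ".mp4")


def video_urls_from_log(lines):
    """Return the distinct video-looking URLs found in the log lines, in order."""
    found = []
    for line in lines:
        for url in URL_RE.findall(line):
            # Ignore any query string when checking the extension.
            path = url.split("?", 1)[0].lower()
            if path.endswith(VIDEO_EXTENSIONS) and url not in found:
                found.append(url)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log file path is passed as the first argument.
    with open(sys.argv[1], errors="replace") as log:
        for url in video_urls_from_log(log):
            print(url)
```

Run it with the path to your proxy log as its only argument, and feed whatever it prints to wget.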

Note that some sites put the download URL literally in the web page, as a parameter to the Flash player object, so searching the page source for the string '.flv' will work as well. RTVE is one such site.
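Searching the page source can be scripted too. The sketch below assumes the URL appears as an absolute http or https address, possibly percent-encoded inside the player parameters; relative paths are not handled. The name find_flv_urls is made up for this example.

```python
import re
import sys
from urllib.parse import unquote

# Absolute URL ending in ".flv"; stops at quotes, whitespace, angle brackets and "&".
FLV_RE = re.compile(r"""https?://[^\s"'<>&]+?\.flv\b""", re.IGNORECASE)


def find_flv_urls(page_source):
    """Return the distinct absolute .flv URLs found in the page source, in order."""
    # Player parameters often carry the URL percent-encoded, so decode first.
    decoded = unquote(page_source)
    urls = []
    for url in FLV_RE.findall(decoded):
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__" and len(sys.argv) > 1:
    # The first argument is a saved copy of the page (e.g. "Save Page As" in the browser).
    with open(sys.argv[1], errors="replace") as page:
        for url in find_flv_urls(page.read()):
            print(url)
```

Save the page from your browser, run the script on the saved file, and see whether anything turns up.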

This procedure worked nicely for me, and I guess that some variation of it is what the authors of clive and youtube-dl use to crack the protocols. Then I ran into citywire, a British financial news site. Their Flash client uses https plus some kind of certificate to connect to their video repository, so the proxy technique won't work: the proxy only gets to see the encrypted connection, so you cannot extract the URL from it. And the certificate precludes the use of https-faking proxies like WebScarab. Mmmm. This is gonna be tough.

Then I realised (while sleeping, actually) that the design of the client would require reserving temporary disk storage to keep playback going, as the video streams could grow very large, and keeping all that stuff in RAM would be excessive. Where? Well, in /tmp, of course! Yeah, you will find that most Flash video player implementations work that way, creating a file named something like /tmp/Flash7oo3ar, where they download the video stream and keep it there until you kill the player. Yay!

If you are on Windoze, these clients use exclusive file access to that temporary file, so you cannot copy it. And since the client removes it when it dies (when you close the web page in your browser), there is not much of a loophole there. But in UNIX® it's a different story. A simple cp or ln to that file will allow you to keep it after the client dies. And what about Mac OS X? There you have per-user /tmp's, but once you locate that directory you will find the aforementioned file there. So we are all good! From now on, if you just watched an online video you want to keep, go to /tmp and it will be sitting there for you to back it up.
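For the lazy, here is a rough Python sketch of the same cp/ln trick. It assumes the temporary file name starts with "Flash", as in the example above; the function names find_flash_temp_files and keep_video are invented for illustration. It tries a hard link first, which keeps pointing at the same data even while the download is still running, and falls back to a plain copy when linking is not possible (for instance, across filesystems). A copy is only a snapshot, so in that case wait for the download to finish.

```python
import glob
import os
import shutil
import sys
import tempfile


def find_flash_temp_files(tmp_dir="/tmp"):
    """List regular files in tmp_dir whose names start with 'Flash', newest first."""
    pattern = os.path.join(tmp_dir, "Flash*")  # assumed naming scheme
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    return sorted(candidates, key=os.path.getmtime, reverse=True)


def keep_video(temp_path, dest_path):
    """Hard-link temp_path to dest_path; copy instead if linking fails."""
    try:
        os.link(temp_path, dest_path)
    except FileExistsError:
        # Never silently overwrite something already saved.
        raise
    except OSError:
        # E.g. destination on another filesystem: take a snapshot copy.
        shutil.copy2(temp_path, dest_path)
    return dest_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # The first argument is the destination file name for the newest stream.
    # tempfile.gettempdir() honours TMPDIR, which helps where /tmp is per-user.
    files = find_flash_temp_files(tempfile.gettempdir())
    if files:
        print("saved", keep_video(files[0], sys.argv[1]))
    else:
        print("no Flash temporary files found")
```

Run it with the destination file name as its only argument while the player is still open, and it keeps the most recently modified stream.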

I really look forward to the next generation of web browsers providing unified video playback capabilities, and hence rendering all this crappy Flash stuff obsolete. That would be the day.

posted at: 11:22
