Using Hadoop for Video Streaming

Internet Memory supplies a service to browse archived Web pages, including multimedia content. We use Hadoop, HDFS and HBase to store and index our data, and associate this storage with a Web server that lets users navigate through the archive and retrieve documents. In this post, we focus on videos and detail the solution we adopted to serve true streaming from HDFS storage.

Basics

Many video formats are found on the Web, including Windows Media (.wmv), RealMedia (.rm), QuickTime (.mov), MPEG, Adobe Flash (.flv), etc. To display a video we need a player, which can be embedded in the Web browser. The player depends on the specific video format, but most browsers are able to detect the format and choose the appropriate player. Firefox, for instance, offers many plugins that can be quickly brought in when a video of a specific format has to be displayed.

There are basically two ways to play a video. The simplest one is a two-step process: first the whole file is downloaded from the Web server to the user’s computer, then the player displays the local copy. Its drawback is that the download step may take a while if the file is big (hundreds of megabytes are not uncommon). The second one uses (true) streaming: the video file is split into fragments that are sent from the Web server to the player, giving the illusion of a continuous stream. From the user’s point of view, it looks as if a window is swept over the video content, which avoids a full initial download of the whole file.
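The fragment-based approach can be sketched in a few lines of Python. This is a minimal illustration, not the mechanism used by the servers discussed in this post: the function name `iter_fragments` and the 64 KiB chunk size are assumptions chosen for the example. The server reads the file piece by piece and hands each fragment over as soon as it is available, instead of waiting for the whole file to be transferred.

```python
import io
from typing import BinaryIO, Iterator


def iter_fragments(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive fragments of a binary stream until it is exhausted.

    chunk_size is an arbitrary illustrative value (64 KiB by default).
    """
    while True:
        fragment = stream.read(chunk_size)
        if not fragment:  # empty bytes object means end of file
            break
        yield fragment


# Usage: a fake 10-byte "video" sent in 4-byte fragments.
fake_video = io.BytesIO(b"0123456789")
print(list(iter_fragments(fake_video, chunk_size=4)))
# prints [b'0123', b'4567', b'89']
```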

Obviously, streaming is a more involved method, because it requires strong coordination between the components involved in the process, namely the player, the Web server, and the file system from which the video is retrieved. We examine this technical issue in the context of a Hadoop system where files are stored in HDFS, a file system dedicated to large-scale distributed storage.

File seeking with HDFS

As explained above, streaming requires strong coordination between the Web server and the file system. The former issues requests for chunks of the video file (think of what happens when the user suddenly jumps to a specific part of the video), whereas the latter must be able to seek in the file to position the cursor at a specific location. With HDFS, enabling such close cooperation turns out to be a problem, because HDFS can in principle only be accessed through a Hadoop client, which the standard Apache server is not. We investigated two possible solutions: Hoop, the Hadoop web server, and Apache/FUSE.
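On the Web server side, a seek usually boils down to translating a request for part of a file into a byte interval. As an illustration, here is a minimal Python sketch that parses a single HTTP `Range` header (for example `bytes=500-999`) into inclusive start and end offsets. Whether a given player relies on `Range` headers or on a query-string parameter depends on the player and the video format; the function `parse_byte_range` is hypothetical and deliberately handles only single ranges.

```python
import re
from typing import Optional, Tuple

# Matches one byte range such as "bytes=500-999", "bytes=500-" or "bytes=-500".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (start, end) offsets requested by a Range header.

    Multi-range, malformed or unsatisfiable requests return None; a real
    server would then answer 416 or fall back to sending the whole file.
    """
    match = _RANGE_RE.match(header.strip())
    if match is None or file_size <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix range: the last N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        return max(file_size - length, 0), file_size - 1
    start = int(first)
    if start >= file_size:
        return None
    # Open-ended ranges run to the end of the file; clamp oversized ends.
    end = file_size - 1 if last == "" else min(int(last), file_size - 1)
    if end < start:
        return None
    return start, end
```

The resulting offsets are exactly what the file system must then honor with a seek, which is where access to HDFS becomes the sticking point.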

Hoop (see http://cloudera.github.com/hoop/) is an HTTP-HDFS connector: it allows the HDFS file system to be accessed via HTTP. We developed a working local prototype using JW Player and a large video file. Streaming works, but seeking into a part that has not been buffered yet stops the playback. It seems that the Hoop API does not support seeking in a file, so we had to give up this approach.

The second solution is based on HDFS/FUSE. FUSE (Filesystem in Userspace) is an API that captures file system operations and allows them to be implemented by ad-hoc functions running in user space (thereby avoiding any change to the operating system kernel, a tricky and dangerous option). FUSE is provided in Hadoop as a component named “Mountable HDFS” (see http://wiki.apache.org/hadoop/MountableHDFS). It lets any standard file system user or program see the HDFS namespace as a locally mounted directory. All file system operations, including directory browsing, file opening and content access, are thus available over HDFS content through the FUSE interface.
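Once HDFS is mounted through FUSE, serving a chunk of a video needs nothing beyond ordinary file calls. The Python sketch below assumes a hypothetical mount point `/mnt/hdfs` and a hypothetical helper `read_range`; it simply opens the mounted file, seeks to the requested offset and reads the requested bytes, which is the kind of positioned read the Hoop API did not appear to support.

```python
import os

# Hypothetical mount point of Mountable HDFS (FUSE); adjust to your setup.
HDFS_MOUNT = "/mnt/hdfs"


def read_range(hdfs_path: str, start: int, end: int, mount_root: str = HDFS_MOUNT) -> bytes:
    """Read bytes [start, end] (inclusive) of an HDFS file through the FUSE mount.

    Because FUSE exposes HDFS as an ordinary directory tree, plain
    open/seek/read calls are enough; no Hadoop client library is involved.
    """
    local_path = os.path.join(mount_root, hdfs_path.lstrip("/"))
    with open(local_path, "rb") as f:
        f.seek(start)                   # position the cursor at the requested offset
        return f.read(end - start + 1)  # may return fewer bytes near end of file
```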

Apache server configuration

It remained to configure Apache to access the mounted FUSE file system and load content from video files. How this is done depends on the video format. So far, we have tested and validated .mp4 files and Flash video files. For the former we use the H264 Streaming Module (see http://h264.code-shop.com/trac), an Apache plugin which enables adaptive streaming. For FLV we use a pseudo-streaming module for Apache named “mod_flv”. Both behave nicely and work with Mountable HDFS without problems.
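Pseudo-streaming modules follow a simple idea: when the user seeks, the player requests the same file again with an offset in the URL, and the server starts sending from that point. The Python sketch below only shows the offset-extraction step, under the assumption (common for FLV pseudo-streaming, but not taken from the mod_flv documentation) that the offset is a byte position passed as a `start` query parameter; the function name `pseudo_stream_start` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def pseudo_stream_start(url: str, file_size: int) -> int:
    """Return the byte offset to start sending from, read from a 'start' query parameter.

    Assumption: the player appends ?start=<byte offset> to the URL when the
    user seeks. Missing, invalid or out-of-range values fall back to 0,
    i.e. the file is sent from the beginning.
    """
    values = parse_qs(urlsplit(url).query).get("start")
    if not values:
        return 0
    try:
        offset = int(values[0])
    except ValueError:
        return 0
    return offset if 0 <= offset < file_size else 0
```

A real FLV module must also make the response a valid FLV stream (typically by sending an FLV header before the data), and MP4 modules may interpret the offset as a time position rather than a byte position; both details are outside this sketch.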

Conclusion

The solution based on Apache + Mountable HDFS (FUSE) turned out to be reliable, functionally adequate (seeking is well supported) and efficient. The architecture is simple and easy to set up, and it combines the benefits of HDFS for very large repositories with standard Web server streaming solutions. Although we chose to adopt Apache plugins in our current service, nothing keeps you from using a more powerful streaming server, since the FUSE approach (virtually) moves all HDFS content into the scope of the standard file system.

Hoop remains a potential option for the future, but it did not appear mature enough when we tested it, at least for the complex operations required by video streaming (seeking to a specific offset in a file).

by: Philippe Rigaux, Fri 06 Apr 2012

via Internet Memory Foundation : Synapse : Using Hadoop for Video Streaming.

Add Row Numbers to MySQL Results

Sometimes when querying MySQL I like to return a new column containing an ordered numerical sequence for each row in the result set. Here’s an explanation of how to do it.

I suppose this isn’t really needed in many, if not most, queries, but sometimes it’s just what I want. As an example, I have a users table that contains three columns: id, first_name and last_name. The following query retrieves the rows of this table whose first_name column is equal to John.

SELECT first_name
FROM users
WHERE first_name = 'John';

We can easily modify this query to insert an additional column at the front of the result set containing a sequential number for each record. Try this:

SELECT @ROW := @ROW + 1 AS row_num, first_name
FROM users, (SELECT @ROW := 0) r
WHERE first_name = 'John';

Voila! And now, how does it work? Let’s start with the FROM clause. We are specifying two sources of data. The first is the users table itself and the second is a subquery that sets a user-defined variable called @ROW (scoped to the current session) and initializes its value to zero. Because this subquery is a derived table, MySQL requires an alias for it, so I used r.

In the SELECT clause, we are incrementing the value of the variable @ROW by one and selecting its value followed by the first_name column from our users table. Since @ROW started life with a value of zero, the first row of the returned result set will be one followed by two, three etc. If we wanted to start at a different value we could’ve set that value in the FROM clause.

And finally, the WHERE clause constrains our results to those whose first name is John. That was pretty easy and quite useful I think.
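To make the evaluation order explicit, here is a small Python analogy of what the query does. It is a sketch, not MySQL code: the function `number_rows` is hypothetical, and its `start` argument plays the role of the value assigned in the derived table r. Note that MySQL does not guarantee row order without an ORDER BY clause, and that MySQL 8.0 deprecates assigning user variables inside expressions and provides the ROW_NUMBER() window function for this purpose.

```python
from typing import Iterable, List, Sequence, Tuple


def number_rows(rows: Iterable[Sequence], start: int = 0) -> List[Tuple]:
    """Mimic `SELECT @ROW := @ROW + 1 ...`: prepend a running counter to each row.

    `start` corresponds to `(SELECT @ROW := start) r`, so the first row
    receives start + 1.
    """
    row = start                       # @ROW := start, done once in the FROM clause
    numbered = []
    for r in rows:
        row += 1                      # @ROW := @ROW + 1, evaluated once per row
        numbered.append((row, *r))    # counter first, then the selected columns
    return numbered


print(number_rows([("John",), ("John",)]))
# prints [(1, 'John'), (2, 'John')]
```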

via Add Row Numbers to MySQL Results | DigitalWindFire.

WebStorm / PhpStorm: get basic Ruby syntax highlighting

It is possible to get basic syntax highlighting for Ruby files in PhpStorm using the TextMate bundles support plug-in. It’s already included with WebStorm, so you don’t need to install it; just make sure it’s enabled in Settings | Plugins.

  1. Clone Ruby.tmbundle with Git into a directory of your choice.
  2. Add this directory in Settings | TextMate Bundles.

In older versions of WebStorm, TextMate bundle support will not recognize *.rb files as handled by this bundle. To fix this, open the file ‘Ruby.tmbundle\Syntaxes\Ruby.plist’ in a text editor, find the ‘<key>fileTypes</key>’ section, and add ‘<string>rb</string>’ under its ‘<array>’.
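If you prefer to script this edit, Python’s standard plistlib module can do it. This sketch assumes Ruby.plist is an XML property list whose top-level dictionary contains the fileTypes array (as the ‘<key>fileTypes</key>’ section suggests); the helper name `add_file_type` is hypothetical. Keep a backup first, because plistlib rewrites the whole file and does not preserve its original formatting.

```python
import plistlib


def add_file_type(plist_path: str, extension: str = "rb") -> bool:
    """Add `extension` to the top-level fileTypes array of an XML .plist grammar.

    Returns True if the file was modified, False if the extension was
    already listed. The file is re-serialized by plistlib, so its original
    layout and comments are lost.
    """
    with open(plist_path, "rb") as f:
        grammar = plistlib.load(f)
    file_types = grammar.setdefault("fileTypes", [])
    if extension in file_types:
        return False
    file_types.append(extension)
    with open(plist_path, "wb") as f:
        plistlib.dump(grammar, f)
    return True
```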

Restart WebStorm and verify that *.rb is now associated correctly.


Ruby syntax highlighting now works in WebStorm.


via php – Is it possible to get Ruby syntax highlighting in PHPStorm? – Stack Overflow.