Archive for the ‘ Tools ’ Category

Using Hadoop for Video Streaming

 


 

Internet Memory supplies a service to browse archived Web pages, including multimedia content. We use Hadoop, HDFS and HBase for storing and indexing our data, and associate this storage with a Web server that lets users navigate through the archive and retrieve documents. In the present post, we focus on videos and detail the solution we adopted to serve true streaming from HDFS storage.

Basics

Many video formats are found on the Web, including Windows Media (.wmv), RealMedia (.rm), QuickTime (.mov), MPEG, Adobe Flash (.flv), etc. In order to display a video, we need a player, which can be incorporated in the Web browser. The player depends on the specific video format, but most browsers are able to detect the format and choose the appropriate player. Firefox, for instance, comes with many plugins that can be quickly activated when a specific video is encountered in order to display its content.

There are basically two ways to play a video. The simplest one is a two-step process: first the whole file is downloaded from the Web server to the user’s computer, and then the player displays the local copy. Its disadvantage is that the download step may take a while if the file is big (hundreds of megabytes are not uncommon). The second one uses (true) streaming: the video file is split into fragments which are sent from the Web server to the player, giving the illusion of a continuous stream. From the user’s point of view, it looks as if a window is swept over the video content, saving the need for an initial download of the whole file.
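To make the difference concrete, here is a rough sketch using curl against a hypothetical URL: the first command performs a full download, while the second asks the server for a single byte range, which is one common HTTP mechanism a player can use to fetch only the fragment it needs (the exact mechanism actually used depends on the player and the server modules).

# Progressive download: the whole file crosses the network before the local copy is played.
curl -o movie.mp4 http://example.org/videos/movie.mp4

# Fragment request: only bytes 1000000-1999999 are transferred
# (this works when the server honours HTTP Range requests).
curl -o fragment.bin -H "Range: bytes=1000000-1999999" http://example.org/videos/movie.mp4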

Obviously, streaming is a more involved method because it requires strong coordination between the components involved in the process, namely the player, the Web server, and the file system from which the video is retrieved. We examine this technical issue in the context of a Hadoop system where files are stored in HDFS, a file system dedicated to large-scale distributed storage.

 

File seeking with HDFS

As explained above, streaming requires strong coordination between the Web server and the file system. The former issues requests to access chunks of the video file (think of what happens when the user suddenly jumps to a specific part of the video), whereas the latter must be able to seek in the file to position the cursor at a specific location. When using HDFS, enabling such close cooperation turns out to be a problem, because HDFS can in principle only be accessed through a Hadoop client, which the standard Apache server is not. We investigated two possible solutions: Hoop, the Hadoop web server, and Apache/FUSE.

Hoop (see http://cloudera.github.com/hoop/) is an HTTP-HDFS connector. It allows the HDFS file system to be accessed via HTTP. A working local prototype has been developed using JW Player and a large video file. Streaming works, but seeking into an unbuffered part results in the playback stopping. It seems that the Hoop API does not support seeking in a file, so we had to give up this approach.

The second solution is based on HDFS/FUSE. FUSE (Filesystem in Userspace) is an API that captures file system operations and lets them be implemented by ad-hoc functions running in the user’s process space (thereby avoiding the need to change the operating system kernel, a tricky and dangerous option). FUSE support is provided in Hadoop as a component named “Mountable HDFS” (see http://wiki.apache.org/hadoop/MountableHDFS). It lets a standard file system user or program see the HDFS namespace as a locally mounted directory. All file system operations, including directory browsing, file opening and content access, are enabled over HDFS content through the FUSE interface.
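As an illustration, mounting HDFS through FUSE boils down to a couple of commands. The sketch below assumes the hadoop-fuse-dfs wrapper shipped with the Hadoop contrib packages is installed and that the NameNode is reachable at namenode:8020; both names are placeholders for your own cluster.

# Create a local mount point and expose the HDFS namespace through FUSE.
mkdir -p /mnt/hdfs
hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs

# From now on, ordinary programs (including Apache) see HDFS as a local directory.
ls /mnt/hdfs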

Apache server configuration

It remained to configure Apache to access the mounted FUSE file system and to load content from the video files. How this is done depends on the video format. So far, we have tested and validated .mp4 files and Flash video files. For the first format we use the H264 Streaming Module (see http://h264.code-shop.com/trac), an Apache plugin which enables adaptive streaming. For FLV we used a pseudo-streaming module for Apache named “mod_flv”. Both behave nicely and work with the mountable HDFS without problems.
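For reference, a configuration sketch along these lines is shown below, written as a shell heredoc so it can be pasted on the server. It is only a sketch: the module file names, handler names and the /mnt/hdfs mount point are assumptions to be adapted to your own installation, and the FLV directives in particular depend on which mod_flv variant is built.

# Sketch of an Apache snippet serving video files from the FUSE-mounted HDFS.
cat > /etc/apache2/conf.d/hdfs-video.conf <<'EOF'
# MP4: the H264 Streaming Module handles seek requests for .mp4 files.
LoadModule h264_streaming_module /usr/lib/apache2/modules/mod_h264_streaming.so
AddHandler h264-streaming.extensions .mp4

# FLV: load your FLV pseudo-streaming module here and register it for .flv
# files (module and handler names differ between mod_flv variants).
#LoadModule flv_streaming_module /usr/lib/apache2/modules/mod_flv_streaming.so
#AddHandler flv-stream .flv

# Serve video files straight from the FUSE-mounted HDFS directory.
Alias /videos /mnt/hdfs/videos
<Directory /mnt/hdfs/videos>
    Order allow,deny
    Allow from all
</Directory>
EOF

service apache2 restart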

Conclusion

The solution based on Apache + Mountable HDFS (FUSE) turned out to be reliable, functionally adequate (seeking is well supported) and efficient. The architecture is simple and easy to set up, and allows us to combine the benefits of HDFS for very large repositories with standard Web server streaming solutions. Although we chose to adopt Apache plugins in our current service, nothing keeps you from using a more powerful streaming server, since the FUSE approach (virtually) moves all the HDFS content into the standard file system scope.

Hoop remains a potential option for the future, but it did not appear mature enough when we tested it, at least for the more complex operations (seeking at a specific offset in a file) required by video streaming.

by: Philippe Rigaux, Fri 06 Apr 2012

via Internet Memory Foundation : Synapse : Using Hadoop for Video Streaming.

How To Setup VNC For Ubuntu 12

via How To Setup VNC For Ubuntu 12 | DigitalOcean.


Background

VNC stands for Virtual Network Computing; it allows you to connect to your server remotely and use your keyboard, mouse, and monitor to interact with that server.

Step 1 – Install VNC server and XFCE 4 desktop.

To get started, we will install a VNC server on an Ubuntu 12.10 x64 Server droplet. Log in as root and install the packages:

apt-get -y install ubuntu-desktop tightvncserver xfce4 xfce4-goodies

 

Step 2 – Add a VNC user and set its password.

adduser vnc
passwd vnc

If you would like to be able to gain root access as user vnc, you have to add it to the sudoers file. Make sure you are logged in as root:

echo "vnc ALL=(ALL) ALL" >> /etc/sudoers

Set user vnc’s VNC Server password:

su - vnc
vncpasswd
exit

This step sets the VNC password for user ‘vnc’. It will be used later when you connect to your VNC server with a VNC client.

Now you can log in as user ‘vnc’ and obtain root by running ‘sudo su -‘ and entering your password.
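As an optional sanity check (not part of the original steps), you can list the sudo privileges granted to the vnc user while still logged in as root:

# Should report that vnc may run (ALL) ALL.
sudo -l -U vnc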

Step 3 – Install VNC As A Service

Log in as root, create /etc/init.d/vncserver, and add the following lines:

#!/bin/bash
# Simple init script that starts a VNC server for a single user.
PATH="$PATH:/usr/bin/"
# User the VNC server runs as, display number (:1 = TCP port 5901),
# colour depth in bits, and desktop geometry.
export USER="vnc"
DISPLAY="1"
DEPTH="16"
GEOMETRY="1024x768"
OPTIONS="-depth ${DEPTH} -geometry ${GEOMETRY} :${DISPLAY}"
. /lib/lsb/init-functions
 
case "$1" in
start)
log_action_begin_msg "Starting vncserver for user '${USER}' on localhost:${DISPLAY}"
su ${USER} -c "/usr/bin/vncserver ${OPTIONS}"
;;
 
stop)
log_action_begin_msg "Stopping vncserver for user '${USER}' on localhost:${DISPLAY}"
su ${USER} -c "/usr/bin/vncserver -kill :${DISPLAY}"
;;
 
restart)
$0 stop
$0 start
;;
 
*)
echo "Usage: $0 {start|stop|restart}"
exit 1
;;
esac
exit 0

Edit /home/vnc/.vnc/xstartup and replace its contents with:

#!/bin/sh
xrdb $HOME/.Xresources
xsetroot -solid grey
startxfce4 &

Update file permissions and allow any user to start X Server:

chown -R vnc. /home/vnc/.vnc && chmod +x /home/vnc/.vnc/xstartup
sed -i 's/allowed_users.*/allowed_users=anybody/g' /etc/X11/Xwrapper.config

Make /etc/init.d/vncserver executable and start VNC server:

chmod +x /etc/init.d/vncserver && service vncserver start

Add your VNC server to automatically start on reboot:

update-rc.d vncserver defaults
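To check that everything came up, you can verify that a VNC server is listening on TCP port 5901 (display :1); the commands below assume net-tools or iproute2 is installed:

# Either of these should show the VNC server (Xtightvnc) listening on port 5901.
netstat -tlnp | grep 5901
ss -tlnp | grep 5901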

 

Step 4 – Connect to your VNC server with TightVNC / Chicken of the VNC

yourserver:5901
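The address above is meant for a graphical client; display :1 corresponds to TCP port 5901. From a command line, a connection can be sketched as follows (the SSH tunnel is an optional extra, not part of the original tutorial):

# TightVNC's command-line client; display :1 on yourserver = TCP port 5901.
vncviewer yourserver:1

# Optional: tunnel the VNC port over SSH and point the client at localhost instead.
ssh -L 5901:localhost:5901 vnc@yourserver
vncviewer localhost:1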

s3ql – a full-featured file system for online data storage

I am really impressed by my first look at s3ql. It is a complete, working, and well-documented tool for mounting AWS S3 and other cloud storage solutions on a dedicated server:

S3QL is a file system that stores all its data online using storage services like Google Storage, Amazon S3 or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X.

S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.

S3QL is designed to favor simplicity and elegance over performance and feature-creep. Care has been taken to make the source code as readable and serviceable as possible. Solid error detection and error handling have been included from the very first line, and S3QL comes with extensive automated test cases for all its components.

Features

  • Transparency. Conceptually, S3QL is indistinguishable from a local file system. For example, it supports hardlinks, symlinks, ACLs and standard unix permissions, extended attributes and file sizes up to 2 TB.
  • Dynamic Size. The size of an S3QL file system grows and shrinks dynamically as required.
  • Compression. Before storage, all data may be compressed with the LZMA, bzip2 or deflate (gzip) algorithm.
  • Encryption. After compression (but before upload), all data can be AES encrypted with a 256 bit key. An additional SHA256 HMAC checksum is used to protect the data against manipulation.
  • Data De-duplication. If several files have identical contents, the redundant data will be stored only once. This works across all files stored in the file system, and also if only some parts of the files are identical while other parts differ.
  • Immutable Trees. Directory trees can be made immutable, so that their contents can no longer be changed in any way whatsoever. This can be used to ensure that backups can not be modified after they have been made.
  • Copy-on-Write/Snapshotting. S3QL can replicate entire directory trees without using any additional storage space. Only if one of the copies is modified, the part of the data that has been modified will take up additional storage space. This can be used to create intelligent snapshots that preserve the state of a directory at different points in time using a minimum amount of space.
  • High Performance independent of network latency. All operations that do not write or read file contents (like creating directories or moving, renaming, and changing permissions of files and directories) are very fast because they are carried out without any network transactions. S3QL achieves this by saving the entire file and directory structure in a database. This database is locally cached and the remote copy updated asynchronously.
  • Support for low bandwidth connections. S3QL splits file contents into smaller blocks and caches blocks locally. This minimizes both the number of network transactions required for reading and writing data, and the amount of data that has to be transferred when only parts of a file are read or written.

 

http://code.google.com/p/s3ql/
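To give an idea of the workflow, here is a minimal sketch assuming the S3QL command-line tools are installed and an empty Amazon S3 bucket named mybucket; the exact storage URL syntax and the location of the authentication file depend on the S3QL version.

# One-time step: create an S3QL file system inside the bucket.
mkfs.s3ql s3://mybucket

# Mount it like any local file system and use standard tools on it.
mkdir -p /mnt/s3ql
mount.s3ql s3://mybucket /mnt/s3ql
cp -r /home/me/backup /mnt/s3ql/

# Unmount cleanly so that locally cached data is flushed to the backend.
umount.s3ql /mnt/s3ql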