Audio and Apache HTTPD
ApacheCon 2001
Santa Clara, US

April 6th, 2001

Sander van Zoest <sander@vanZoest.com>
Covalent Technologies, Inc. 
<http://www.covalent.net/>

Latest version can be found at:
 <http://www.vanZoest.com/sander/apachecon/2001/>

Introduction:

About this paper:

Contents:

 1. Why serve Audio on the Net?

 This is almost like asking why you are reading this. It might be
 because of the excitement caused by the new media that have recently
 swept across the internet. People are looking to bring their lives onto
 the net, and one of the things that brings that closer to reality is the
 ability to hear live broadcasts of the world's news and favorite sports,
 to hear music and to teleconference with others. Sometimes it is simply
 to enhance the mood of a web site or to provide audio feedback on
 actions performed by the visitor to the web site.

 2. What makes delivering audio so different?

 The biggest reason audio is different from traditional web media
 such as graphics, text and HTML is that timing is very important.
 This is caused by the significant increase in size of the media and
 the different quality levels that exist.

 There really are two kinds of goals behind audio streams.
 In one case there is a need for immediate response the moment
 playback is requested, and quality can be sacrificed for it. In
 the other case quality and an uninterrupted stream are much
 more important.

 This sort of timing is not really required of any other media,
 with the exception of video. In the case of HTML and images the
 file sizes are usually a lot smaller, which causes the objects
 to load much quicker, and they usually are not very useful without
 the entire file. In audio the middle of a stream can carry
 useful information and still set a particular mood.
 
 3. Different ways of delivering Audio on the Net.

 . Embedding audio in your Web Page

 This used to be a lot more common in the past. Just like embedding
 an image in a web page, it is possible to add a sound clip or score
 to the web page.

 The linked audio files are usually short and of low quality to
 avoid a long delay in downloading the rest of the web page. The
 audio format needs to be supported by the browser, natively or with
 a browser plug-in, to avoid annoying the visitor.

 This can be accomplished using the HTML 4.0 [HTML4] object element, which
 works much like specifying an applet with the object element.
 In the past this could also be accomplished using the embed and bgsound
 browser-specific additions to HTML.

 example:
   <object type="audio/x-midi" data="../media/sound.mid" width="200" height="26">
     <param name="src" value="../media/sound.mid">
     <param name="autostart" value="true">
     <param name="controls" value="ControlPanel">
   </object> 

 The param elements are browser specific. Please check each browser's
 documentation for the param elements it supports.

 In this method of delivering audio the audio file is served up via the
 web server. When using an Apache HTTPD server make sure that the appropriate
 mime type is configured for the audio file and that the audio file is
 named and referenced by the appropriate extension.

 Although the current HTML 4.01 [HTML4] says to use the object element,
 many browsers on the market today still look for the embed element.
 Below is a little snippet that will work in many browsers.

  <object type="audio/x-midi" data="../media/sound.mid" width="200" height="26">
    <param name="src" value="../media/sound.mid">
    <param name="autostart" value="true">
    <param name="controls" value="ControlPanel">

    <embed type="audio/x-midi" src="../media/sound.mid"
     width="200" height="26" autoplay="true" controls="ControlPanel">
    <noembed>Your browser does not support embedded MIDI files.</noembed>
  </object> 

 With the increasing installed base of the Flash browser plug-in by
 Macromedia, most developers who are looking to provide this kind of
 functionality on a web page are creating Flash elements. These have
 their own way of adding audio, which is discussed in Flash-specific
 documents.

 . Downloading via HTTP

 Using this method the visitor to the website will have to download the
 entire audio file and save it to the hard drive before it can be
 listened to. (1) This is very popular with people who want to listen
 to high quality streams of audio and have a slower than ISDN connection
 to the internet. In some cases where the demand for a stream is high or
 the internet is congested, downloading the content can be effective and
 useful even for high bandwidth users.

 One of the advantages of downloading audio to the local computer's hard
 drive is that it can be played back (once downloaded) at any time, as
 long as the audio file is accessible from the computer.

 There are a lot of sites on the internet that provide this functionality
 for music and other audio files. It is also one of the easiest ways to
 deliver high quality audio to visitors.
 
 (1) Microsoft Windows Media Player in conjunction with the Microsoft
     Internet Explorer Browser will automatically start playing the
     audio stream after a sufficient amount of the file has been
     downloaded. This can be accomplished because of the tight
     integration of the Browser and Media Player. With most audio players
     you can listen to a file being downloaded, but you will have to
     invoke the action manually.
 
 . On-Demand streaming via HTTP

 The real difference between downloading and on-demand streaming is
 that in on-demand streaming the audio starts playing before the entire
 audio file has been downloaded. This is accomplished by a hand-off from
 the browser to the audio player via an intermediate file format that
 the browser has been configured to pass to the audio player.

 See the section entitled "Linking to Audio via Apache HTTPD" below
 for more information about the different intermediate file formats.

 This type of streaming is very popular among the open source crowd and
 is most widely implemented using the MP3 file format. Apache,
 Shoutcast [SHOUTCAST] and Icecast [ICECAST] are the most common
 software components used to provide on-demand streaming via HTTP.
 Neither Icecast nor Shoutcast is fully HTTP compliant, but Icecast is
 getting closer. For more information about the Shoutcast and Icecast
 differences see the section "Icecast/Shoutcast Protocol" below.

 Live365.com and MP3.com are examples of huge sites that rely on this
 method of delivering audio.

 . On-Demand Streaming via RTSP/RTP

 RTSP/RTP is a new set of streaming protocols that is getting more
 backing and becoming more popular by the second. The specifications
 were developed by the Internet Engineering Task Force Working Groups
 AVT [IETFAVT] and MMUSIC [IETFMMUSIC]. RTP, the Real-time Transport
 Protocol, has been around longer than RTSP and originally came out
 of the work towards a better teleconferencing (mbone) type of system.
 RTSP is the Real-Time Streaming Protocol, which is used as a control
 protocol and acts similarly to HTTP except that it maintains state
 and is bi-directional.

 Currently the latest Real Networks Streaming Servers support RTSP
 and RTP as well as Real Networks' own proprietary transfer protocol,
 RDT. Apple's Darwin Streaming Server is also RTSP/RTP compliant.

 The RTSP/RTP protocol suite is very powerful and flexible in regards
 to your streaming needs. It has the ability to support "server-push"
 style stream redirects and has the ability to throttle streams to
 ensure the stream can be sustained over the limited bandwidth of the
 network.

 For On-Demand streams the RTP protocol would usually stream over
 TCP and have a second TCP connection open for RTSP. Despite the
 rich features provided by the protocol suite, it is not very well
 suited to letting people download the stream, and therefore the
 download via HTTP method might still be preferred by some.

 . Live Broadcast Streaming via RTSP/RTP

 In the case of live broadcast streaming RTSP/RTP shines. RTP allows
 UDP datagrams to be transmitted to clients, which gives fast, immediate
 delivery of content at the cost of reliability. The RTP stream
 can be sent over IP Multicast to minimize bandwidth on the network.

 Many Content Delivery Networks (CDNs) are starting to provide support for
 RTSP/RTP proxies that should provide a better quality streaming environment
 on the internet. 

 Much work is also being done in the RTP space to provide transfers over
 telecommunication networks such as those used by cellular phones.
 Although not directly related, it is encouraging to know that all the
 audio related transfer groups seem to be working towards a common
 standard such as RTP.

 . On-Demand or Live Broadcast streaming via MMS.

 This is the Microsoft Windows Media Technologies Streaming protocol. It
 is only supported by Microsoft Windows Media Player and currently only
 works on Microsoft Windows.

 5. Configuring Mime Types

 One of the hardest things in serving audio has been the wide variety
 of audio codecs and mime types available. The battle of mime types on the
 audio player side of things isn't over, but it seems to be a little more
 controlled.

 On the server side of things, provide the appropriate mime type for the
 particular audio streams and/or files that are being served to the audio
 players. Although some clients and operating systems handle files based
 entirely on the file extension, the mime type [RFC2045] is more specific
 and better defined.

 The registered mime types are maintained by IANA [IANA]. On their site
 they have a list of all the registered mime types and their name space.

 If you are planning on using a mime type that isn't registered by IANA
 then signal this in the name space by adding an "x-" before the subtype.
 Because this was not done very often in the audio space, there was a
 lot of confusion as to what the real mime type should be.

 For example the MPEG 1.0 Layer 3 Audio (MP3) [ORAMP3BOOK] mime type
 was not registered for the longest time. Because of this the technically
 correct mime type was audio/x-mpeg. However, none of the audio players
 understood audio/x-mpeg; they understood audio/mpeg, which was not a
 technically correct mime type. Later audio players recognized this and
 started using the audio/x-mpeg mime type. In the end this caused a lot
 of hassles, with clients needing to be configured differently depending
 on the website and client that was used. Last November we thanked
 Martin Nilsson of the ID3 tagging project for registering audio/mpeg
 with IANA. [RFC3003]

 Correct configuration of Mime Types is very important. Apache HTTPD
 ships with a fairly up to date copy of the mime.types file, so most
 of the default ones (including audio/mpeg) are there.

 In case you run into some that are not defined, use the mod_mime
 directives such as AddType to fix this.

 Examples:
        AddType audio/x-mpegurl   .m3u
        AddType audio/x-scpls     .pls
        AddType application/x-ogg .ogg
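
 As a rough illustration of what such a mapping does, the sketch below
 resolves a file name to a mime type by its extension, using types
 named in this paper. The function name content_type_for and the
 application/octet-stream fallback are assumptions made for the
 example; this is not how mod_mime itself is implemented.

```python
import os.path

# Extension-to-type pairs taken from this paper.
AUDIO_TYPES = {
    ".mp3": "audio/mpeg",
    ".m3u": "audio/x-mpegurl",
    ".pls": "audio/x-scpls",
    ".ogg": "application/x-ogg",
    ".wav": "audio/x-wav",
    ".mid": "audio/x-midi",
}

# Assumed fallback for unknown extensions (hypothetical choice).
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename):
    """Return the mime type for a file name, based only on its extension."""
    _root, ext = os.path.splitext(filename)
    return AUDIO_TYPES.get(ext.lower(), DEFAULT_TYPE)


if __name__ == "__main__":
    for name in ("song.mp3", "LIST.M3U", "notes.txt"):
        print(name, "->", content_type_for(name))
```

 The lookup lowercases the extension so that LIST.M3U and list.m3u are
 treated alike, which is a choice made for this sketch.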


 6. Common Audio File Formats

 There are many audio formats and metadata formats that exist. Many of
 them do not have registered mime types and are hardly documented.
 This section is an attempt at providing the most accurate mime type
 information for each format with a rough description of what the files
 are used for.

 . Real Audio
 
   Real Networks' proprietary audio format and meta formats. This is one
   of the more common streaming audio formats today. It comes in several
   sub flavors such as Real 5.0, Real G2 and Real 8.0. The file size
   varies depending on the bitrates and what combination of bitrates is
   contained within the single file.
   The following mime types are used:
      audio/x-pn-realaudio .ra, .ram, .rm
      audio/x-pn-realaudio-plugin .rpm
      application/x-pn-realmedia

 . MPEG 1.0 Layer 3 Audio (MP3)
   
   This is currently one of the most popular downloaded audio formats.
   It was originally developed by the Moving Picture Experts Group
   and is covered by patents held by the Fraunhofer IIS Institute and
   Thomson Multimedia. [ORAMP3BOOK] The format uses lossy compression
   that, at a bitrate of 128kbps, reduces the file size to roughly
   1 MB per minute.
   The mime type is audio/mpeg with the extension of .mp3 [RFC3003]

 . Windows Media Audio
   
   Originally known as MS Audio, it was developed by Microsoft as the MP3
   killer. It is still a relatively new format but is heavily marketed by
   Microsoft and becoming more popular by the minute. It is a successor
   to the Microsoft Audio Streaming Format (ASF).

 . WAV

   The Windows Audio Format is a moderately complicated encapsulating
   format that in the most common case is PCM data with a WAV header up
   front. It has the mime type audio/x-wav with the extension .wav.

 . Vorbis

   Ogg Vorbis [VORBIS] is still a relatively new format brought to
   life by CD Paranoia author Christopher Montgomery, known to the
   world as Monty. It is an open source audio format free of patents
   and gotchas. It is a codec/file format that is roughly as good as
   the MP3 format, if not much better. The mime type for Ogg Vorbis is
   application/x-ogg with the extension of .ogg.

 . MIDI

   The MIDI standard and file format [MIDISPEC] have been used by
   musicians for a long time. It is a great format for adding music to
   a website without long download times or the need for special players
   or plug-ins. The mime type is audio/x-midi and the extension is .mid.

 . Shockwave Flash (ADPCM/MP3) [FLASH4AUDIO]

   Macromedia Flash [FLASH4AUDIO] uses its own internal audio format
   that is often used on Flash websites. It is based on Adaptive
   Differential Pulse Code Modulation (ADPCM) and the MP3 file format.
   Because it is usually used from within Flash it usually isn't served
   up separately, but its extension is .swf.

 There are many more audio codecs and file formats in existence.
 Here are a few that won't be discussed but should be kept in mind:
 PCM/Raw Audio (audio/basic), MOD, MIDI (audio/x-midi), QDesign (used by
 Quicktime), Beatnik, Sun's AU, Apple/SGI's AIFF, AAC by the MPEG Group,
 Liquid Audio and AT&T's a2b (AAC derivatives), Dolby AC-3, Yamaha's
 TwinVQ (originally by Nippon Telegraph and Telephone) and MPEG-4 audio.

 7. Linking to Audio via Apache HTTPD

 There are many different ways to link to audio from the Apache HTTPD
 web server. It seems as if every codec has its own metafile format.
 The metafile format is provided to allow the browser to hand off the
 job of requesting the audio file to the audio player, because the
 player is more familiar with the file format and with how to handle
 streaming or how to actually connect to the audio server than the web
 browser is.

 This section will discuss the more common methods of providing
 streaming links that act as the gateway from the web to the audio world.

 Probably the most recognized of these is the RAM file.

 . RAM

 Real Audio Metafile. It is a pretty straightforward way for Real
 Networks to let their Real Player take more control over their
 proprietary audio streams. The file format is simply a URL on each
 line, and the URLs are streamed in order by the client. The mime type
 is the same as for other RealAudio files, audio/x-pn-realaudio, where
 the pn stands for Progressive Networks, the old name of the company.

 . M3U
 
 This next one is the MPEG Layer 3 URL Metafile, which has been around
 for a very long time as a playlist format for MP3 players. Some players
 supported URLs in it pretty early on, and it got the mime type
 audio/x-mpegurl. It is now used by Icecast and many destination sites
 such as MP3.com. The format is exactly the same as that of the RAM
 file, just a list of URLs that are separated by line feeds.
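
 Since the format is just one URL per line, writing and reading such a
 metafile takes very little code. The following is a minimal sketch;
 the function names and the example.com URLs are made up for the
 illustration, and blank lines are skipped as a convenience rather
 than because any specification says so.

```python
def write_metafile(urls):
    """Return RAM/M3U style metafile text: one URL per line."""
    return "".join(url + "\n" for url in urls)


def read_metafile(text):
    """Return the URLs found in metafile text, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical URLs, used only for the example.
    playlist = [
        "http://www.example.com/media/track1.mp3",
        "http://www.example.com/media/track2.mp3",
    ]
    text = write_metafile(playlist)
    print(text, end="")
    print(read_metafile(text) == playlist)
```

 A file produced this way would be served with the audio/x-mpegurl
 mime type and linked from the web page in place of the MP3 itself.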

 . PLS
 
 This is the playlist file format used by Nullsoft's Winamp MP3 Player.
 Later on it became more widely used through Nullsoft's Shoutcast and
 has the mime type of audio/x-scpls with the extension .pls. Before
 Shoutcast the mime type was simply audio/x-pls. As you can see in the
 example below it looks very much like a standard Windows INI file.

 Example:
	[playlist]
	numberofentries=2
	File1=<uri>
	Title1=<title>
	Length1=<length or -1>
	File2=<uri>
	Title2=<title>
	Length2=<length or -1>
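
 Because the layout follows the INI convention, a generic INI parser
 can read it. The sketch below uses Python's standard configparser
 module; the sample titles and example.com URLs are invented for the
 illustration, and treating a missing Title or Length entry as optional
 is an assumption of this sketch.

```python
import configparser

# Hypothetical playlist in the layout shown above.
SAMPLE_PLS = """[playlist]
numberofentries=2
File1=http://www.example.com/one.mp3
Title1=First Song
Length1=-1
File2=http://www.example.com/two.mp3
Title2=Second Song
Length2=215
"""


def parse_pls(text):
    """Return a list of {file, title, length} dicts from PLS text."""
    # interpolation=None keeps '%' characters in URLs from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["playlist"]
    entries = []
    for i in range(1, section.getint("numberofentries") + 1):
        entries.append({
            "file": section[f"File{i}"],
            "title": section.get(f"Title{i}", fallback=""),
            "length": section.getint(f"Length{i}", fallback=-1),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_pls(SAMPLE_PLS):
        print(entry)
```

 Note that configparser matches option names without regard to case,
 so File1 and file1 are treated as the same key here.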

 . SDP 
 
 This is the Session Description Protocol [RFC2327], which is heavily
 used within RTSP and is a standard way of describing how to subscribe
 to a particular RTP stream. The mime type is application/sdp with the
 extension .sdp.

 Sometimes you might see RTSL (Real-Time Streaming Language) floating
 around. This was an old Real Networks format that has been succeeded
 by SDP. Its mime type was application/x-rtsl with the extension .rtsl.
 
 . ASX
 
 This is a Windows Media Metafile format [MSASX] that is based on early
 XML standards. It can be found with many extensions such as .wvx, .wax
 and .asx. I am not aware of a mime type for this format.
 
 . SMIL

 This is the Synchronized Multimedia Integration Language [SMIL20],
 which is now a W3C Recommendation [W3SYMM]. It was originally developed
 by Real Networks to provide an HTML-like language to their Real Player
 that was more focused on multimedia. The mime type is application/smil
 with the extensions of either .smil or .smi.

 . MHEG

 This is a hypertext language developed by ISO [MHEG1] [MHEG5]
 [MHEG5COR]. It has been adopted by the Digital Audio Visual
 Council [DAVIC]. It is used more for teleconferencing, broadcasting
 and television, but it is closely enough related that it receives a
 mention here. The mime type is application/x-mheg with the extension
 of .mheg.

 8. Configuring Apache HTTPD specifically to serve large Audio Files

 There are a few common things that you will need to adjust to be
 able to serve many large audio files via the Apache HTTPD Server.
 Because of the difference in size between HTML files and audio files,
 MaxClients will need to be adjusted appropriately depending on
 the amount of time listeners end up tying up a process. If you are
 serving high quality MP3 files at 128kbps, for example, you should
 expect download times of more than 5 minutes for most people.

 This will significantly impact your webserver since it means that
 the process is occupied for the entire time. Because of this you
 will also want to increase the TimeOut directive to a higher
 number. This is to ensure that connections do not get disconnected
 half way through a transfer, which would have that person hit
 "reload" and connect again.

 Because of the amount of time the downloads tie up the processes
 of the server, keeping the memory footprint of each server process
 as small as possible is recommended, because that means you can run
 more processes on the machine.

 After that, normal performance tweaks such as raising the maximum
 number of file descriptors and lengthening the TCP listen queue apply.
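
 To get a feel for the numbers behind the MaxClients reasoning in this
 section, the following back-of-the-envelope sketch estimates the file
 size, the download time and the number of processes kept busy. The
 4 minute track, the 56 kbps modem and the 10 requests per minute are
 assumed values picked for the example, not measurements, and protocol
 overhead is ignored.

```python
import math


def file_size_bytes(bitrate_kbps, duration_s):
    """Size of a constant-bitrate file, ignoring headers and tags."""
    return bitrate_kbps * 1000 * duration_s // 8


def download_seconds(size_bytes, link_kbps):
    """Best-case transfer time over a link of the given speed."""
    return size_bytes * 8 / (link_kbps * 1000)


def busy_processes(requests_per_minute, seconds_per_download):
    """Average processes tied up: arrival rate times time per download."""
    return math.ceil(requests_per_minute / 60 * seconds_per_download)


if __name__ == "__main__":
    size = file_size_bytes(128, 4 * 60)    # assumed 4 minute track
    seconds = download_seconds(size, 56)   # assumed 56 kbps modem
    print(size, "bytes,", round(seconds / 60, 1), "minutes per download")
    print(busy_processes(10, seconds), "processes busy at 10 requests/minute")
```

 The last figure is only an average; MaxClients would need headroom
 above it to absorb bursts of requests.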

 9. Icecast/Shoutcast Protocol.

 Both protocols are very tightly based on HTTP/1.0. The main difference
 is a group of new headers such as the icy headers by Shoutcast and the
 new x-audiocast headers provided by Icecast.

 A typical Shoutcast request from the client, followed by the response:

 GET / HTTP/1.0

 ICY 200 OK
 icy-notice1:<BR>This stream requires <a href="http://www.winamp.com/">
             Winamp</a><BR>
 icy-notice2:SHOUTcast Distributed Network Audio Server/posix v1.0b<BR>
 icy-name: Great Songs
 icy-genre: Jazz
 icy-url: http://shout.serv.dom/
 icy-pub: 1
 icy-br: 24
  
 <data><songtitle><data>

 The icy headers display the song title and other information, including
 whether this stream is public and what the bitrate is.

 A typical Icecast request from the client, followed by the response:

 GET / HTTP/1.0
 Host: icecast.serv.dom
 x-audiocast-udpport: 6000
 Icy-MetaData: 0
 Accept: */*

 HTTP/1.0 200 OK
 Server: Icecast/VERSION
 Content-Type: audio/mpeg
 x-audiocast-name: Great Songs
 x-audiocast-genre: Jazz
 x-audiocast-url: http://icecast.serv.dom/
 x-audiocast-streamid:
 x-audiocast-public: 0
 x-audiocast-bitrate: 24
 x-audiocast-description: served by Icecast 

 <data>

 NOTE: I am mixing the headers of the controlling client with those from
       a listening client. This might be better explained at a later
       date.
 
 The CPAN Perl package Apache::MP3 by Lincoln Stein implements a little of
 each, which works because MP3 players tend to support both.

 One of the big differences in implementation for the listening clients
 is that Icecast uses an out of band UDP channel to update metadata,
 while with Shoutcast the metadata is embedded within the MP3 stream.
 The general metadata for the stream is set up via the
 icy and x-audiocast HTTP headers.
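
 A client that wants to cope with both header families could normalize
 them along the following lines. This is only a sketch built from the
 header names in the two examples in this section; the function names
 and the FIELDS table are invented for the illustration, and neither
 server's full header set is covered.

```python
# Header names from the examples in this section, grouped by meaning.
FIELDS = {
    "name": ("icy-name", "x-audiocast-name"),
    "genre": ("icy-genre", "x-audiocast-genre"),
    "url": ("icy-url", "x-audiocast-url"),
    "public": ("icy-pub", "x-audiocast-public"),
    "bitrate": ("icy-br", "x-audiocast-bitrate"),
}


def parse_response_head(head):
    """Split a response head into (protocol, status code, headers dict)."""
    lines = head.strip().splitlines()
    parts = lines[0].split(None, 2)       # e.g. "ICY 200 OK"
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break                         # blank line ends the headers
        name, _sep, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return parts[0], int(parts[1]), headers


def stream_info(headers):
    """Pick the general stream metadata from either header family."""
    info = {}
    for field, names in FIELDS.items():
        for name in names:
            if name in headers:
                info[field] = headers[name]
                break
    return info


if __name__ == "__main__":
    head = "ICY 200 OK\nicy-name: Great Songs\nicy-genre: Jazz\nicy-br: 24\n"
    protocol, status, headers = parse_response_head(head)
    print(protocol, status, stream_info(headers))
```

 Header values are kept as strings here; converting the bitrate or the
 public flag to numbers is left to the caller.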

 Although the MP3 standard documents were written with interrupted
 communication in mind, they are not very specific about it. So although
 they do not state that there is anything wrong with embedding other data
 between MPEG frames, players that do not understand that data might
 produce noisy bleeps and chirps because of it.

References and Further Reading:

[DAVIC]
       Digital Audio Visual Council 
       <http://www.davic.org/>

[FLASH4AUDIO]
       L. J. Lotus, "Flash 4: Audio Options", ZD, Inc. 2000.
       <http://www.zdnet.com/devhead/stories/articles/0,4413,2580376,00.html>

[HTML4]
       D. Raggett, A. Le Hors, I. Jacobs, "HTML 4.01 Specification", W3C
       Recommendation, December, 1999.
       <http://www.w3.org/TR/html401/>

[IANA]
        Internet Assigned Numbers Authority.
        <http://www.iana.org/>

[ICECAST]
        Icecast Open Source Streaming Audio System.
        <http://www.icecast.org/>

[IETFAVT]
        Audio/Video Transport WG, Internet Engineering Task Force.
        <http://www.ietf.org/html.charters/avt-charter.html>

[IETFMMUSIC]
       Multiparty Multimedia Session Control WG, Internet Engineering Task
       Force. <http://www.ietf.org/html.charters/mmusic-charter.html>

[IETFSIP]
       Session Initiation Protocol WG, Internet Engineering Task Force.
       <http://www.ietf.org/html.charters/sip-charter.html>

[IPMULTICAST]
       Transmit information to a group of recipients via a single transmission
       by the source, in contrast to unicast.
       IP Multicast Initiative
       <http://www.ipmulticast.com/>

[MIDISPEC]
       The International MIDI Association, "MIDI File Format Spec 1.1",
       <http://www.vanZoest.com/sander/apachecon/2001/midispec.html>

[MHEG1]
       ISO/IEC, "Information Technology - Coding of Multimedia and Hypermedia
       Information - Part 1: MHEG Object Representation, Base Notation (ASN.1)"; 
       Draft International Standard ISO 13522-1:1997;
       <http://www.ansi.org/>
       <http://www.iso.ch/cate/d22153.html>

[MHEG5]
       ISO/IEC, "Information Technology - Coding of Multimedia and Hypermedia
       Information  - Part 5: Support for Base-Level Interactive Applications";
       Draft International Standard ISO 13522-5:1997;
       <http://www.ansi.org/>
       <http://www.iso.ch/cate/d26876.html>

[MHEG5COR]
       Information Technology - Coding of Multimedia and Hypermedia Information
       - Part 5: Support for base-level interactive applications -
       - Technical Corrigendum 1; ISO/IEC 13522-5:1997/Cor.1:1999(E)
       <http://www.ansi.org/>
       <http://www.iso.ch/cate/d31582.html>

[MSASX]
        Microsoft Corp. "All About Windows Media Metafiles". October 2000.
        <http://msdn.microsoft.com/workshop/imedia/windowsmedia/
         crcontent/asx.asp>

[ORAMP3BOOK]
        S. Hacker; MP3: The Definitive Guide; O'Reilly and Associates, Inc.
        March, 2000.
        <http://www.oreilly.com/catalog/mp3/>

[RFC2045]
        N. Freed and N. Borenstein, "Multipurpose Internet Mail 
        Extensions (MIME) Part One: Format of Internet Message Bodies",
        RFC 2045, November 1996. <http://www.ietf.org/rfc/rfc2045.txt>

[RFC2327]
        M. Handley and V. Jacobson, "SDP: Session Description Protocol",
        RFC 2327, April 1998. <http://www.ietf.org/rfc/rfc2327.txt>

[RFC3003] 
        M. Nilsson, "The audio/mpeg Media Type", RFC 3003, November 2000.
        <http://www.ietf.org/rfc/rfc3003.txt>

[SHOUTCAST]
        Nullsoft Shoutcast MP3 Streaming Technology.
        <http://www.shoutcast.com/>

[SMIL20]
        L. Rutledge, J. van Ossenbruggen, L. Hardman, D. Bulterman,
        "Anticipating SMIL 2.0: The Developing Cooperative Infrastructure 
        for Multimedia on the Web"; 8th International WWW Conference, 
        Proc. May, 1999.
        <http://www8.org/w8-papers/3c-hypermedia-video/anticipating/
         anticipating.html>

[W39CIR]  
        V. Krishnan and S. G. Chang, "Customized Internet Radio"; 9th
        International WWW Conference Proc. May 2000.
        <http://www9.org/w9cdrom/353/353.html>

[VORBIS]
        Ogg Vorbis - Open Source Audio Codec
        <http://www.xiph.org/ogg/vorbis/>

[W3SYMM] 
        W3C Synchronized Multimedia Activity (SYMM Working Group);
        <http://www.w3.org/AudioVideo/>