After my failure in Experiment Zero I did not touch this idea for nearly four years. Then one day I mentioned it to another friend who immediately identified our original issue and suggested using Raw file formats for the experiment, such as BMP instead of JPG and WAV instead of MP3.

Why Raw Files?

Some file formats are referred to as "raw" because there is no compression of the data in the file. Such files usually contain a file header followed by a stream of bytes which represent discrete units of data. The important concept to understand for these experiments is that manipulating these discrete units of data still results in a valid file of the same file type, because each data unit has no bearing on other units. In other words, you cannot "confuse" a decompression algorithm by feeding it invalid values if each unit may contain any possible value.

For example: A raw pixmap file of 8x8 pixels in which each pixel may be either off (black) or on (white) could be in a raw format where each bit of data (0 or 1) is a pixel, with "0" meaning off/black and "1" meaning on/white. Since each byte in a computer is eight bits, this file could be only 8 bytes (64 bits) in size. Editing any bit in this file toggles one pixel on or off, but the resulting edited file is still valid because each bit is valid whether 0 or 1.
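As a minimal sketch of that 8x8 example, the Python below stores one row per byte and flips a single pixel. The bit layout (most significant bit as the leftmost pixel), the sample picture and the helper names are assumptions made for illustration, not part of any real file format.

```python
# Hypothetical 8x8 one-bit image: one byte per row,
# most significant bit = leftmost pixel (an assumed layout).
pixmap = bytearray([
    0b00000000,
    0b01100110,
    0b01100110,
    0b00000000,
    0b10000001,
    0b01000010,
    0b00111100,
    0b00000000,
])

def get_pixel(data, x, y):
    """Return 1 (on/white) or 0 (off/black) for the pixel at column x, row y."""
    return (data[y] >> (7 - x)) & 1

def flip_pixel(data, x, y):
    """Toggle one pixel in place; every other bit is left untouched."""
    data[y] ^= 1 << (7 - x)

def render(data):
    """Draw the image as text, '#' for on and '.' for off."""
    return "\n".join(
        "".join("#" if get_pixel(data, x, y) else "." for x in range(8))
        for y in range(8)
    )

print(render(pixmap))
flip_pixel(pixmap, 0, 0)   # edit a single bit
print(render(pixmap))      # still a complete, valid 8-byte image
```

Whatever bit is flipped, the data is still exactly 8 bytes long and every bit is still a legal pixel value, which is the property the experiments rely on.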

BMP As Audio

I began Experiment One by taking a photograph of my friends Jon and Chris in my living room, which my camera saved as a JPG. This image had a heavy blue tint due to the lighting in the room at the time: original

This was saved as a BMP using GIMP: original.bmp. I used this format for two reasons:

  1. It is a raw image format in which each unit of data is one pixel.
  2. It is ubiquitous with broad application support.

Audacity makes the import process relatively easy. By selecting menu option File - Import - Raw Audio it is possible to open literally any file on disk as an audio stream using user-defined file details. See Steps for details. I began with these values:

We listened to the photograph together and noticed that it had a one second repetition due to the sample rate being equal to the image width.
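The arithmetic behind that repetition can be sketched as follows. The width of 640 pixels is a hypothetical figure chosen for illustration, and the function name is made up; the only format detail relied on is that BMP pads each pixel row to a multiple of four bytes.

```python
def row_duration_seconds(width_px, bytes_per_pixel, bytes_per_sample, sample_rate_hz):
    """How long one row of pixels lasts when the file is played as raw audio."""
    row_bytes = width_px * bytes_per_pixel
    # BMP pads every row to a multiple of 4 bytes.
    padded_row_bytes = (row_bytes + 3) // 4 * 4
    samples_per_row = padded_row_bytes / bytes_per_sample
    return samples_per_row / sample_rate_hz

# Hypothetical image: 640 pixels wide, 24bpp (3 bytes per pixel),
# imported as 24-bit samples (3 bytes each) at a sample rate of 640 Hz.
print(row_duration_seconds(640, 3, 3, 640))
```

When one sample is one pixel and the sample rate equals the width, each row of the image takes one second to play, so features that line up vertically in the photograph recur once per second in the sound.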

Audio Effects On An Image

I performed a series of different operations in Audacity with varying results:

Audio effects applied                      | Result Preview | Raw Files
-------------------------------------------|----------------|----------
Original unbent image                      | PNG            | BMP
One Second Echo                            | PNG            | BMP
Reverb                                     | PNG            | BMP
EQ Preset "AM Radio"                       | PNG            | BMP
EQ Preset "Bass Cut"                       | PNG            | BMP
Leveler (Default settings)                 | PNG            | BMP
Change Pitch (G# to A, or something like that) | PNG        | BMP
Compressor (More)                          | PNG            | BMP
Compressor (Less) *                        | PNG            | BMP
Invert **                                  | PNG            | BMP
Amplification -0.01dB                      | PNG            | BMP
Amplification -0.1dB                       | PNG            | BMP
Amplification -1dB                         | PNG            | BMP

* = Notice the difference between the two applications of the Compressor effect. The first used the default values (or values left over from some other audio project I worked on years ago), while the second used reduced values (less Attack, less Decay, etc). In the second example, using less compression, much of the data is untouched, indicating that these parts of the image already met Audacity's criteria for "compressed audio" according to the chosen settings.

** = One of the most interesting yet obvious audio effects applied was Invert. When I viewed the results it looked to me like a colour negative, so I loaded the same original BMP in GIMP, applied its Colors - Invert tool, then compared the two output data streams, and they were IDENTICAL.

My personal favourites include Amplification and Leveler.

What Happened (To The Colours)?

Notice that in most of the modified images, the colour map or palette of the image seems heavily modified / destroyed, with little preservation of the original colours. In fact many of the modified images look as though they have been reduced to just a Red and a Green component.

When the JPG file was saved as a BMP in 24bpp True Colour encoding, each pixel was written to disk as a set of three bytes: one for the pixel's Red component, one for Green and the third for Blue (NOTE: BMP stores these bytes in Blue, Green, Red order, and its multi-byte values are little-endian). When the file is imported to Audacity as Signed 24-bit PCM the three values per pixel are read together as a single 24-bit number representing one audio sample.

When no modification is performed and the file is Exported as WAV or "Raw", the contents remain unmodified. Similarly, in the Invert operation each bit is inverted, so the colours are perfectly inverted.
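A small sketch of why the two inversions can agree: flipping every bit of a 24-bit value gives the same bytes as subtracting each 8-bit channel from 255. This assumes the audio Invert flips each bit, as described above; it is not a claim about how Audacity or GIMP implement the effect internally, and the function names are made up for illustration.

```python
def invert_channels(b, g, r):
    """Image-style negative: each 8-bit channel becomes 255 minus its value."""
    return 255 - b, 255 - g, 255 - r

def invert_sample_bits(b, g, r):
    """Treat the three bytes as one 24-bit number and flip every bit."""
    sample = b | (g << 8) | (r << 16)
    flipped = ~sample & 0xFFFFFF          # keep only the low 24 bits
    return flipped & 0xFF, (flipped >> 8) & 0xFF, (flipped >> 16) & 0xFF

# Compare both approaches on a few arbitrary pixels.
for pixel in [(0, 0, 0), (255, 255, 255), (200, 100, 50), (1, 128, 254)]:
    print(pixel, invert_channels(*pixel), invert_sample_bits(*pixel))
```

Because a bit flip never carries from one byte into the next, the channel boundaries survive, which is not true of effects that do arithmetic on the whole sample.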

However, as soon as Audacity performs modification of samples, especially using surrounding values such as with Echo and Reverb, each data unit (which is one audio sample or 3-colour pixel) is modified with no regard to the discrete R,G,B units. Thus what used to represent separate colour channels has been treated as one number, and when separated back into colour channels as a BMP, the single 24-bit values from Audacity bear little resemblance to their original 3-byte image representation.
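To make the channel mixing concrete, here is a rough sketch that packs one pixel into a signed 24-bit sample, halves its amplitude, and unpacks it again. The byte order (Blue, Green, Red on disk, read as a little-endian sample), the helper names and the simple gain function are assumptions for illustration, not Audacity's actual processing.

```python
def pack_sample(b, g, r):
    """Read three pixel bytes as one signed 24-bit little-endian sample."""
    return int.from_bytes(bytes([b, g, r]), "little", signed=True)

def unpack_sample(sample):
    """Split a signed 24-bit sample back into (b, g, r) bytes."""
    return tuple(sample.to_bytes(3, "little", signed=True))

def amplify(sample, gain):
    """Scale the whole 24-bit number, clamping to the signed 24-bit range."""
    scaled = round(sample * gain)
    return max(-2**23, min(2**23 - 1, scaled))

# A nearly black pixel with a trace of red and a little blue.
before = (10, 0, 1)                      # (b, g, r)
after = unpack_sample(amplify(pack_sample(*before), 0.5))
print(before, "->", after)               # (10, 0, 1) -> (5, 128, 0)
```

Halving the sample did not halve each channel: the single unit of red could not be halved within its own byte, so it spilled into the green byte as 128. Scaled up across a whole image and more elaborate effects, this is the kind of colour mangling visible in the results.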

8bpp And Unsigned 8-Bit PCM

In order to simplify my own understanding of the processes we are undertaking in this project, I decided to sacrifice some image quality / audio fidelity and reduce my working files to 8bpp (256-colour) BMP images and Import these into Audacity as Unsigned 8-bit PCM data. This should help clarify what we are doing here:

Suddenly the results were quite different, and we can begin to really "see" what the audio effects are doing to the image data!
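With one byte per pixel and one byte per sample, an effect acts directly on brightness. The sketch below applies a single-tap echo to one row of unsigned 8-bit samples. It assumes a grayscale palette where the byte value is the brightness, and the echo function is a simplified stand-in written for illustration, not Audacity's algorithm.

```python
def echo_u8(samples, delay, decay):
    """Single-tap echo on unsigned 8-bit PCM, where 128 is silence."""
    centered = [s - 128 for s in samples]          # convert to signed values
    out = []
    for i, value in enumerate(centered):
        delayed = centered[i - delay] if i >= delay else 0
        mixed = value + decay * delayed
        out.append(max(0, min(255, round(mixed) + 128)))   # back to 0..255
    return bytes(out)

# One row of mid-grey pixels with a single bright pixel at position 1.
row = bytes([128, 228, 128, 128, 128, 128, 128, 128])
print(list(echo_u8(row, delay=3, decay=0.5)))
# [128, 228, 128, 128, 178, 128, 128, 128]
```

The bright pixel reappears three pixels later at half its distance from mid-grey, so the echo shows up in the image as a fainter copy shifted along the row.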

Contrast for yourself the differences between Wahwah effect applied to a 24bpp colour image and 8bpp grayscale image. Notice that the mangling of colour values makes the colour image difficult to make out, while the grayscale image really shows the Wahwah.

Proceed to Experiment Two (Image Effects on Audio).