Databending :: Data Format Considerations

Data come in many formats, but one important consideration in the following Experiments is the distinction between a "compressed" data format and a "raw" data format.

In a raw-format data file each block of data is discrete, such as a pixel of an image or an audio "sample" in an audio file. Modifying these blocks in place (without changing the overall file size) will result in a modified but structurally valid file. That is, the file will likely still be valid in the original format even though the internal data have been modified.
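As a minimal sketch of this kind of in-place modification, the Python below builds a small WAV file in memory, overwrites bytes in the sample area without changing the file length, and then checks that the result still parses as a WAV. The function names `make_wav` and `bend`, the 44-byte header size, and the byte pattern are assumptions made for illustration; real WAV files may carry extra chunks that make the header longer.

```python
import io
import wave


def make_wav(num_frames=1000):
    """Build a small 16-bit mono WAV file (silence) entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


def bend(data, header_size=44, step=7):
    """Overwrite every `step`-th byte after the header, keeping the length.

    header_size=44 is an assumption that holds for the simple PCM file
    produced by make_wav(); it is not true of every WAV file.
    """
    out = bytearray(data)
    for i in range(header_size, len(out), step):
        out[i] = (out[i] + 0x55) % 256
    return bytes(out)


original = make_wav()
bent = bend(original)

# The bent file is the same size and still opens as a WAV file.
with wave.open(io.BytesIO(bent), "rb") as w:
    bent_frame_count = w.getnframes()
    bent_frames = w.readframes(bent_frame_count)

print(len(original) == len(bent), bent_frame_count, len(bent_frames))
```

Leaving the header untouched is the design choice that matters here: the header tells a player how to interpret the samples, so only the sample bytes are altered.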

In a compressed data file an algorithmic manipulation has been performed in order to reduce the size of the file while preserving a substantial portion of the original data. Some of the original raw data will be lost, but we choose compression algorithms which are appropriate for our use case. For example, a studio recording encoded as a very low bitrate MP3 may sound unpleasant, but the same low bitrate may be appropriate for a political speech where audio fidelity is less important. We would not use a JPG image compression algorithm for a song or speech ... Or would we?
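The trade between size and fidelity can be illustrated with a toy scheme. The sketch below is not MP3 or JPG; it is a deliberately crude, hypothetical "compressor" that keeps only the high 8 bits of each unsigned 16-bit sample, which halves the storage and discards the low bits for good. The names `compress` and `decompress` and the sample values are made up for illustration.

```python
def compress(samples):
    """Toy lossy step: keep only the high 8 bits of each 16-bit sample."""
    return bytes(s >> 8 for s in samples)


def decompress(blob):
    """Rebuild 16-bit values; the discarded low bits come back as zeros."""
    return [b << 8 for b in blob]


# Unsigned 16-bit sample values (0..65535), chosen arbitrarily.
samples = [0, 1000, 12345, 40000, 65535]

packed = compress(samples)        # 1 byte per sample instead of 2
restored = decompress(packed)

# Per-sample error introduced by throwing the low byte away.
errors = [s - r for s, r in zip(samples, restored)]

print(len(packed), restored, errors)
```

The restored values are close to the originals but not identical, and no amount of processing can recover the bits that were dropped.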

Here are some examples of Raw versus Compressed file types for certain applications:

Application	Raw	Compressed
Image / Picture	BMP, XBM, XPM	JPG, GIF, PNG
Audio	WAV, CDDA	MP3, OGG, AAC
Text	TXT, HTML	RTF, ODF, DOC

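Raw files are sometimes referred to as "Lossless", meaning that there is no decrease in quality or fidelity because no data compression has been performed. Conversely, compressed files are sometimes referred to as "Lossy", since the intention is to reduce overall file size at the expense of a decrease in quality. Strictly speaking, both terms describe kinds of compression: a lossless compressed format such as PNG shrinks the file without discarding any of the image data, whereas lossy formats such as JPG and MP3 do discard some.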