maurograziani.org
Music Art Technology & other stories

Posted on 2015 by MG

The Ghost in the mp3

Ryan Maguire, a Ph.D. student in Composition and Computer Technologies at the Center for Computer Music at the University of Virginia, has carried out a thorough analysis of what the MP3 compression algorithm discards. It is part of a project called The Ghost in the MP3, whose real goal is to extract compositional material from what could be called MP3 waste (the linked page is the main page, but see here for a detailed discussion and examples).

We've already discussed the effects of MP3 compression here, noting that, when the compression is heavier than at 192 kbps (i.e., at bit rates of 128 kbps and below), the loss of high frequencies is noticeable even in rock songs, which are hardly the most refined test material. On the MP3 compression algorithm, see MP3 Compression.

Maguire's analysis, however, is more in-depth than mine and highlights losses that could be significant across the entire bandwidth. His analysis is conceptually simple. In practice, he compared the spectrograms of a song before and after compression, working, obviously, not on the spectrogram images, but on the numerical data obtained from the FFT analyses used to create the images.
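The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Maguire's actual tooling: the function names, the FFT size, and the hop size are my own assumptions, and the "decoded" signal is a synthetic stand-in (a tone with its high partial removed) rather than real MP3 output. With real files, the two signals would also have to be aligned sample by sample first, since encoders and decoders typically add a small delay.

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def differential_spectrogram(original, decoded, n_fft=1024, hop=256):
    """Subtract the FFT magnitudes frame by frame (numbers, not images).

    Assumes both signals share a sample rate and are already time-aligned.
    Positive values mark energy present in the original but not in the copy.
    """
    n = min(len(original), len(decoded))
    a = stft_mag(original[:n], n_fft, hop)
    b = stft_mag(decoded[:n], n_fft, hop)
    return a - b

# Synthetic stand-in for a real before/after pair (hypothetical test signal):
# a 440 Hz tone plus a weaker 15 kHz partial that the "codec" throws away.
sr = 44100
t = np.arange(sr) / sr
original = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 15000 * t)
decoded = np.sin(2 * np.pi * 440 * t)

diff = differential_spectrogram(original, decoded)
freqs = np.fft.rfftfreq(1024, d=1 / sr)
peak_bin = diff.mean(axis=1).argmax()
print(f"largest average loss near {freqs[peak_bin]:.0f} Hz")
```

Plotting `diff` as an image would give something analogous to the third spectrogram discussed below; here the only "ghost" is the partial that was removed by construction.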

Here, for example, are three spectrograms from Suzanne Vega's song "Tom's Diner," which is for solo voice and is often used as a test for compression algorithms. The first two show the song before and after 128 kbps compression, and no differences are visible between them. The third is the differential spectrogram, obtained by comparing the underlying numerical data, and it shows that differences do exist.

[Figures: spectrograms of "Tom's Diner": original, 128 kbps MP3, and differential ("ghost")]

At first glance, this result doesn't impress me: I've already shown in several posts that there's a noticeable difference at 128 kbps (see "You might also be interested in" at the end of the post), and MP3 is lossy compression, so something must be missing.

What emerges from this comparison, however, is that the loss isn't limited to the high frequencies, but extends across the entire frequency range. In fact, it is most evident in the mid-low range and quite pronounced in places. Now we need to understand what is actually being removed, that is, how significant the blobs visible in the third image are.

Below, you can listen to three audio examples from SoundCloud: original, compressed, and differential. If you turn up the volume a bit, you'll notice that the differential clearly includes some of the vocals. Given that the bit rate is 128 kbps, this is no great discovery, but it's an interesting result because it comes from a precise numerical comparison rather than a rough one by ear.

Now the discussion can be framed in many different ways.

From a philosophical point of view, so to speak, the conclusion would be that any reproduction should be prohibited and that music should exist only live. Considering that the frequencies present in instrumental spectra go well beyond 20,000 Hz (see the post "There is life beyond 20,000 Hertz!") and that some argue that, even if we don't hear them, these components have some effect on us (which, IMHO, remains to be proven), music recorded with current standards and reproduced with current systems is very different from its live performance.

Starting, however, from a more utilitarian position, the question is how much the reduction in file size that MP3 provides is worth compared to what is lost, and here the answer depends greatly on each listener's habits. Personally, I keep the music I care about least in MP3 at 320 kbps, so with limited compression, and the music I care about most in FLAC (lossless compression), but I also have a good sound system and generally don't listen with headphones or the like.
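To put rough numbers on that trade-off, here is a small back-of-the-envelope calculation. The four-minute duration is just a hypothetical example, and the uncompressed figure assumes CD-quality PCM (44.1 kHz, 16 bits, stereo); FLAC is left out because its size depends on the material.

```python
def size_mb(bitrate_kbps, seconds):
    """File size in megabytes (10^6 bytes) for a constant bit rate."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

# CD-quality PCM: 44,100 samples/s x 16 bits x 2 channels = 1411.2 kbps
cd_kbps = 44_100 * 16 * 2 / 1000
duration = 240  # hypothetical four-minute song, in seconds

for label, kbps in [
    ("CD-quality PCM", cd_kbps),
    ("MP3 320 kbps", 320),
    ("MP3 128 kbps", 128),
]:
    ratio = cd_kbps / kbps
    print(f"{label:>15}: {size_mb(kbps, duration):5.1f} MB ({ratio:4.1f}x vs PCM)")
```

Under these assumptions, 128 kbps is roughly eleven times smaller than the uncompressed audio, while 320 kbps is a bit more than four times smaller.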

However, although I hardly ever buy CDs anymore and now buy only from online stores, I haven't bought music sold in MP3 format for a while. As a customer, I always demand an uncompressed or lossless recording.

