ID3 tags are stored in a metadata block that usually sits in front of the mp3 frames (which contain the audio), but the standard also allows it to "follow the mp3 frames".
To download the minimum number of bytes you need to:
- download the first 10 bytes of the mp3, extract the ID3v2 header and compute the size of the ID3v2 tag
- to retrieve the full ID3v2 tag, download the `size` bytes of the mp3 that follow the 10-byte header
- use a python library to extract the ID3 tags
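The size computation in the first step can be sketched in isolation. This is a minimal sketch, assuming Python 3 and that the 10 header bytes are already in memory; `parse_id3v2_header` is a hypothetical helper name, not part of any library:

```python
def parse_id3v2_header(header):
    """Return (major_version, tag_size) from the first 10 bytes of a file.

    tag_size is the size of the tag body; it excludes the 10-byte header.
    """
    header = bytearray(header)
    if len(header) < 10 or bytes(header[0:3]) != b'ID3':
        raise ValueError('no ID3v2 header at the start of the data')
    major_version = header[3]
    size = 0
    for byte in header[6:10]:
        # Synchsafe integer: only the low 7 bits of each byte carry data.
        size = (size << 7) | (byte & 0x7F)
    return major_version, size


# Hand-built example header: ID3v2.3, no flags, size bytes 00 00 02 01.
example = b'ID3\x03\x00\x00\x00\x00\x02\x01'
print(parse_id3v2_header(example))  # (3, 257)
```

The whole tag then occupies the first `10 + tag_size` bytes of the file.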
Here's a script (python 2 or 3) which extracts the album art while downloading as few bytes as possible:
try:
    import urllib2 as request  # python 2
except ImportError:
    from urllib import request  # python 3
from functools import reduce
import sys
from io import BytesIO
from mutagen.mp3 import MP3

url = sys.argv[1]

def get_n_bytes(url, size):
    # Ask the server for only the first `size` bytes via an HTTP Range header
    req = request.Request(url)
    req.headers['Range'] = 'bytes=%s-%s' % (0, size-1)
    response = request.urlopen(req)
    return response.read()

data = get_n_bytes(url, 10)
# Compare against bytes so the check also works on python 3
if data[0:3] != b'ID3':
    raise Exception('ID3 not in front of mp3 file')

# The last 4 header bytes hold the tag size, 7 bits per byte
size_encoded = bytearray(data[-4:])
size = reduce(lambda a,b: a*128+b, size_encoded, 0)

header = BytesIO()
# mutagen needs one full frame in order to function. Add max frame size.
# The stored size excludes the 10-byte header, so add that as well.
data = get_n_bytes(url, 10+size+2881)
header.write(data)
header.seek(0)
f = MP3(header)

if f.tags and 'APIC:' in f.tags.keys():
    artwork = f.tags['APIC:'].data
    with open('image.jpg', 'wb') as img:
        img.write(artwork)
A few remarks:
- it checks that the ID3 tag is at the front of the file and that it's ID3v2
- the size of the ID3 tag is stored in bytes 6 to 9 of the header, as documented on id3.org; each of the four bytes contributes only 7 bits, and the size does not include the 10-byte header itself
- unfortunately mutagen needs one full mp3 audio frame to parse the id3 tags. You therefore also need to download one mp3 frame (which is at most 2881 bytes long according to this comment)
- instead of blindly assuming that the album art is a jpg, you should check the image format first, as ID3 allows many different image types
- tested with about 10 random mp3s from the internet, e.g. this one:
python url.py http://www.fuelfriendsblog.com/listenup/01%20America.mp3
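For the image format remark, one option is to look at the first bytes of the extracted artwork. The following is only a sketch, assuming Python 3; `guess_image_extension` is a hypothetical helper, and the signature list covers just a few common formats. mutagen's APIC frames also carry a `mime` attribute that could be consulted instead.

```python
def guess_image_extension(data):
    """Guess a file extension from the leading magic bytes of image data."""
    signatures = [
        (b'\xff\xd8\xff', 'jpg'),
        (b'\x89PNG\r\n\x1a\n', 'png'),
        (b'GIF87a', 'gif'),
        (b'GIF89a', 'gif'),
        (b'BM', 'bmp'),
    ]
    for magic, extension in signatures:
        if data.startswith(magic):
            return extension
    # Unknown format: keep the bytes but do not claim an image type.
    return 'bin'
```

In the script above, the output name could then be built as `'image.' + guess_image_extension(artwork)` instead of the hard-coded `image.jpg`.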