Tuesday, March 23, 2010

Simplify

You know that nice feeling you get when you clean up a room, or your whole home?  It just looks so... tidy.  More spacious.  More comfortable.  You can find everything again.  You can relax and enjoy yourself.  You lose that nagging feeling that told you to clean it up in the first place.

Personally, I've had a nagging voice telling me to clean up my MP3 collection for, oh, maybe about six years now.  Like most people, I've gradually accumulated these files over a long period of time.  They were ripped on multiple computers, using different software, different bitrates, different storage.  In the early days, my rippers didn't attach ID3 tags; this wasn't a problem while I navigated music by folders, but was a huge pain once I started using iTunes.  Even the naming convention was messed up.  For years, I stubbornly insisted on using my preferred format of "Artist Name-- Album Name-- Song Name.mp3".  Eventually, I gave in and switched to the predominant format of "Artist Name - Song Name.mp3".  But, once I started ripping things with iTunes, I had to accept its default format of "Track# Song Name.mp3", with the artist and album given in the folder hierarchy.
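To make the three naming conventions concrete, here is a rough Python sketch of how one might pull artist, album, and title back out of each style. This is purely illustrative and not something I actually ran: the function name `parse_track_path` and the example album and track names are made up for the sketch, and it assumes forward-slash paths with the iTunes-style files sitting two folders deep (artist, then album).

```python
import re
from pathlib import PurePosixPath


def parse_track_path(path: str) -> dict:
    """Guess artist/album/title from one of three filename conventions."""
    p = PurePosixPath(path)
    stem = p.stem  # filename without the ".mp3" extension

    # Oldest style: "Artist Name-- Album Name-- Song Name"
    parts = [piece.strip() for piece in stem.split("--")]
    if len(parts) == 3:
        return {"artist": parts[0], "album": parts[1], "title": parts[2]}

    # Common style: "Artist Name - Song Name" (no album in the name)
    if " - " in stem:
        artist, title = stem.split(" - ", 1)
        return {"artist": artist.strip(), "album": None, "title": title.strip()}

    # iTunes style: "NN Song Name", with artist/album taken from the folders
    match = re.match(r"^\d+\s+(.+)$", stem)
    if match and len(p.parts) >= 3:
        return {"artist": p.parts[-3], "album": p.parts[-2], "title": match.group(1)}

    # Anything else: keep the bare name and leave the rest unknown
    return {"artist": None, "album": None, "title": stem}
```

The order of the checks matters: an iTunes-style title that happens to contain " - " would be misread as "Artist - Song", which is exactly the kind of ambiguity that makes filename-based tagging fragile.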

It was a mess.  Over the years I developed a couple of Perl scripts to help manage some basic things (standardize capitalization, remove parentheses, etc.), but the best I could do was apply a bit of furniture polish when the floor was littered with old newspapers.
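For flavor, this is roughly the kind of cleanup those scripts did, sketched here in Python rather than Perl. It is a reconstruction from memory of the idea, not the original code; the function name `tidy_title` is hypothetical, and the capitalization rule (uppercase the first letter of each word, leave the rest alone) is an assumption chosen so that names like "R.E.M." survive intact.

```python
import re


def tidy_title(name: str) -> str:
    """Drop parenthesized asides and capitalize the first letter of each word."""
    # Remove every "(...)" group, along with any whitespace just before it.
    without_parens = re.sub(r"\s*\([^)]*\)", "", name)
    # split() with no arguments also collapses runs of whitespace.
    words = without_parens.split()
    # Uppercase only the first character so "R.E.M." is not mangled.
    return " ".join(word[:1].upper() + word[1:] for word in words)
```

For example, `tidy_title("losing my religion (live)")` gives `"Losing My Religion"`. Note that this only fixes the text of the name; it does nothing about missing tags or misplaced files, which is why it never solved the real problem.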

This Saturday, I finally cleaned it all up.  It took a grand total of about 2 hours, of which about 15 minutes was spent in front of the keyboard.  15 gigabytes of music are finally in a format I can live with.  Isn't technology grand?

I was lucky enough to get a head start thanks to my corporate overlords.  My company is owned by Gracenote, and they've developed a cool tool in-house called the Gracenote Media Manager.  It uses their MusicID technology to look up a song based on the music.  What's cool about this tool is, you can set it to scan all the music in a particular directory and have it fill out all the ID3 tags with what it finds.

In my case, most of my music is kept on a media PC that I keep attached to my home entertainment system - this is the same box that I use for DVR and video watching.  It runs Ubuntu, but has a Samba share drive that points to the folder with [almost] all my music.  The exception: discs that I have ripped within the past year or two (there aren't very many - I've grown depressingly less adventurous in my musical exploration lately) are on my main PC's hard drive, organized into iTunes folders; of course, iTunes purchases are here as well (as are some Amazon freebies and Bleep buys).

Now, all the modern stuff already was tagged properly, so I ignored that to start.  I ran GMM on the media box's drive.  This took a while, but for the most part it was really smooth.  It keeps several queues that allow you to track what it's doing: some songs that it has already identified, others that still need to be checked, some that need resolution (multiple possible matches), and a rejects bin for things that it didn't recognize.  The last two items for me were, fortunately, rather small.  I'm a wee bit obsessed about R.E.M. and Radiohead, and GMM couldn't identify some of the more obscure tracks.  (My favorite example: an R.E.M. fan club Christmas L.P.  I had played this on a borrowed record player, run line-in to a Creative Audigy audio card, done a raw capture of the WAVE, then transcoded to MP3.  That was one of my finest moments.  Apparently, Gracenote doesn't have that album.  Yes!)

After it was done, I went through the most time-consuming part of manually identifying the tracks that it couldn't detect.  There were... maybe 80 or so, altogether?  In most cases it tries to make guesses based on the words in the file name if all else fails, so you can scroll through a list of, say, 20 or so possible matches, and manually choose the one you want or reject the suggestions.  One regret - I really wish they had a manual option for cases where you wanted to override it.  I was shocked at some of the stuff in here, things that I hadn't listened to in over a decade... old RPG game music that I had downloaded from fan sites on the Internet; weird live covers of unpopular songs; one or two pieces of filk.  Honestly, I ended up deleting half of what it couldn't detect.  (Second feature wish: a "Locate This File" button on the resolver screen.)

Once that whole process was done, I saved out all the results.  Now, I still had messed-up filenames and folders, but at least I had the right ID3 tags.

For my next step, I modified my iTunes configuration.  I turned on the option to have it organize my music, and also selected the option to copy imported music into the music folder.  Then, I pointed it at my Samba share and hit "Import Folder".  I had both machines plugged in to Ethernet at the time, fortunately.  It didn't take terribly long for everything to get copied across the wire.

I next verified that iTunes had imported it correctly.  I spot-checked a few random songs, and found that everything looked good: nothing was missing, all the file names had been corrected, and they were placed into their proper artist and album folders, based on the ID3 tags from GMM.  I then deleted the 15 gigs of old, badly organized data on the media box.

Finally, I switched the media folder within iTunes so that, instead of pointing at a location on my local drive, it pointed at the Samba share.  This copied all the data back across the wire, keeping its proper internal structure.  So far, so good.  In retrospect, I might have been able to do this all in one step instead of copying forward and back, but this did give me a nice way of consolidating the media files that had previously been stored on two separate computers.

Now, almost everything is perfect.  I only have a few very minor remaining niggles.

First of all, somewhere along the way a few of my albums got duplicated, or in some cases, triplicated: each song in the album will have 2 or 3 copies in its folder (with a "1" or a "2" tacked on to the filename).  I strongly suspect that this is because, in the old days, I used to manually write M3U files to manage my playlists; I think that when iTunes grinds over these, it imports the files separately.  It's a little annoying that iTunes doesn't detect and automatically remove duplicates; I can see why, say, you might want to have a song that appears in both an artist album and on a soundtrack, but not why you'd ever have multiple copies of a song within a given album.  Anyways, this is a very minor annoyance (it only happened to a few albums), and easily corrected (I just delete the superfluous albums when I see them).
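If I ever get tired of spotting these by eye, something like the following Python sketch could list the likely culprits in one album folder. It is untested against my real library and rests on an assumption about the exact naming: that a duplicate is the original name plus a space and a single digit before the extension (for example "01 Airbag 1.mp3" next to "01 Airbag.mp3"). The function name `find_suffixed_duplicates` and the example track names are made up for illustration.

```python
import os
import re
from collections import defaultdict

# Assumed duplicate pattern: "<original name> <digit>" before the extension.
SUFFIXED = re.compile(r"^(?P<base>.+?) (?P<n>\d)$")


def find_suffixed_duplicates(filenames):
    """Map each original filename to its numbered copies in the same folder."""
    names = set(filenames)
    duplicates = defaultdict(list)
    for name in sorted(names):
        stem, extension = os.path.splitext(name)
        match = SUFFIXED.match(stem)
        if not match:
            continue
        original = match.group("base") + extension
        # Only flag a file when the un-suffixed original is also present,
        # so a real title that ends in a digit is left alone.
        if original in names:
            duplicates[original].append(name)
    return dict(duplicates)
```

Feeding it the output of `os.listdir()` for one album folder would return a dictionary of originals and their copies, which could then be reviewed before deleting anything. It deliberately only reports; comparing file sizes or checksums first would be a safer step before removing files.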

A second thing is that I would like to separate my storage.  iTunes now keeps a "media folder" that holds everything iTunes: podcasts, music, movies, television, etc.  In my case, I want to keep music on my media box, but my podcasts and everything else on my PC.  It isn't a huge problem to go over the network, but it's really dumb to do this for podcasts, since they're downloaded onto my PC and copied from there to my iPhone.  To be fair, though, I have dedicated 0 seconds to figuring out how to resolve this.  There may be a clean way to break this out within iTunes, or maybe I can set up a slick nested network share or something for the podcasts folder.

Anyways, that's that!  The whole process ended up being WAY faster and easier than I had expected, and I only regret that I didn't do it a year ago.  Finally I can access all my music, and do it just as easily whether it's within iTunes or through my stereo speakers.  Hooray for good tooling, standard technologies, and virtual filesystems!
