Third-Party Metadata Woes

This week, we continued to investigate metadata issues and have put together additional information that may help to pinpoint the cause of garbled metadata being displayed on the site and in-app. For more details, read on.

In order to make sense of our findings, we’ll need to explain a bit about how the current metadata display system works.

At the core of displaying music metadata such as title, artist, and album is an internal system we developed, simply called “Musicbox.” Musicbox allows us to validate and ingest music into a library with metadata from files stored in a database. This metadata is later used by the station through a third-party radio automation software called Azuracast. Azuracast has an API that exposes metadata from song files, which we can validate against in order to identify what song is currently playing. From this, the system can match up a song ID from the database and serve it via our own API, which feeds our website, our web app, and the mp3 stream’s metadata for display in third-party music apps.

Azuracast uses a few different types of IDs to help us identify songs, and we use one of them as a way to quickly identify a song. We talked about this in more detail in last week’s post, but since then, we’ve identified the likely reason an ID from Azuracast is sometimes not recognized: a character encoding mismatch. Sometimes the text in the API from which a hash is generated comes through garbled, and based on findings from this past week, it seems likely that this is due to an issue with translating characters from one encoding to another. This is also the likely reason why, as we mentioned last week, some songs appear to be missing from Azuracast’s own list of songs: the same song ends up with two different hashes, one in Azuracast’s API and one in its database.
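To illustrate why garbled text leads to an unrecognized ID, here is a minimal sketch in Python. The `song_hash` function is a hypothetical stand-in for a text-derived ID, not Azuracast’s actual algorithm, and the artist and title are sample values only. The garbling shown (UTF-8 bytes read as Latin-1) is one common kind of encoding mismatch, which we are assuming for the sake of the example.

```python
import hashlib


def song_hash(artist: str, title: str) -> str:
    """Hypothetical text-derived song ID (NOT Azuracast's real algorithm)."""
    text = f"{artist} - {title}"
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def garble(text: str) -> str:
    """Simulate an encoding mismatch: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo the specific mismatch simulated by garble()."""
    return text.encode("latin-1").decode("utf-8")


# Sample values for illustration only.
artist = "上海アリス幻樂団"
title = "ネクロファンタジア"

clean_hash = song_hash(artist, title)
garbled_hash = song_hash(garble(artist), garble(title))

# Same song, two different hashes, so a lookup by hash misses.
print(clean_hash == garbled_hash)
# The text is recoverable only if we know which mismatch happened.
print(repair(garble(title)) == title)
```

Because the hash is computed from the text, even a single mis-decoded character produces a completely different ID, which matches what we are seeing between the API and the database.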

Since the API’s hash, title, and artist are unreliable, Azuracast provides a secondary system called custom fields, which allows us to define our own fields and push them through the API. This would be a great workaround, but while the API shows that the custom fields exist and we can confirm their values in Azuracast’s internal database, the API never populates these fields.
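For the curious, here is a rough sketch of how a lookup could prefer a custom field and fall back to the hash when the field comes back empty. Everything specific here is an assumption for illustration: the payload shape (`now_playing` → `song` → `custom_fields`), the field name `musicbox_id`, and the `hash_index` mapping are hypothetical and are not taken from Musicbox or confirmed against Azuracast’s API.

```python
from typing import Optional


def resolve_song_id(payload: dict, hash_index: dict) -> Optional[int]:
    """Return our library's song ID, or None if the song can't be identified.

    Assumed (hypothetical) payload shape:
        {"now_playing": {"song": {"id": "<hash>", "custom_fields": {...}}}}
    """
    song = payload.get("now_playing", {}).get("song", {})

    # Preferred path: an ID we set ourselves via a custom field.
    fields = song.get("custom_fields") or {}
    if isinstance(fields, dict):
        raw = fields.get("musicbox_id")  # hypothetical field name
        if raw is not None and str(raw).isdigit():
            return int(raw)

    # Fallback path: look the song up by the hash the API reports.
    return hash_index.get(song.get("id"))


# Hypothetical index of known hashes -> library song IDs.
hash_index = {"abc123": 42}

populated = {"now_playing": {"song": {"id": "zzz", "custom_fields": {"musicbox_id": "7"}}}}
empty = {"now_playing": {"song": {"id": "abc123", "custom_fields": []}}}
unknown = {"now_playing": {"song": {"id": "zzz", "custom_fields": {}}}}

print(resolve_song_id(populated, hash_index))
print(resolve_song_id(empty, hash_index))
print(resolve_song_id(unknown, hash_index))
```

With the custom fields arriving empty, every lookup takes the fallback path, so a garbled hash leaves the song unidentified.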

With all that said, we’ve uncovered two bugs in Azuracast’s system: one related to character encoding, and one related to the custom fields portion of the Now Playing API. This is a little unusual for us, but our next steps will likely involve working to resolve these issues in Azuracast, either by filing a couple of bug reports or by digging into the project itself. There are always the options of moving away from Azuracast or creating our own in-house system to replace it, but both come with their own sets of challenges. For now, we’ll see what we can do to fix this issue not just for our station, but for other stations that use this software as well.

That’s all for this week! We’ve got our next Live show coming up this weekend, so make sure to hit the “interested” button for the event in Gensokyo Radio’s Discord server, and thanks for listening!

[Knowledge #186]