Hacker News

peatmoss today at 1:20 PM (9 replies)

I recently bought a tablet for sheet music, mostly to replace a stack of jazz "Real Books" at jam sessions. The phone camera scans I made are okay, but they are fixed in size and have a lot of artifacts. It would also be great to transpose on the fly for, e.g., Bb or Eb instruments, but with a scan that is obviously not possible.
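With a symbolic chart instead of a scan, the transposition itself is the easy part, at least for chord symbols. A minimal sketch, assuming chords are plain strings like "Cm7" and that flat spellings are acceptable everywhere; the function names and tables are made up for illustration, and slash chords and proper enharmonic spelling are ignored:

```python
# Pitch classes spelled with flats (an assumption; real charts pick
# sharps or flats depending on the key).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
PITCH_CLASS = {name: i for i, name in enumerate(NOTE_NAMES)}
PITCH_CLASS.update({"C#": 1, "D#": 3, "F#": 6, "G#": 8, "A#": 10})

# Semitones added to concert pitch to get the written pitch.
# Bb instruments read a major second up, Eb instruments a major sixth up.
WRITTEN_OFFSET = {"C": 0, "Bb": 2, "Eb": 9}


def split_chord(symbol):
    """Split 'Bbmaj7' into root 'Bb' and quality 'maj7'."""
    if len(symbol) > 1 and symbol[1] in "#b":
        return symbol[:2], symbol[2:]
    return symbol[:1], symbol[1:]


def transpose_chord(symbol, instrument):
    """Return the chord symbol as written for the given instrument key."""
    root, quality = split_chord(symbol)
    pitch_class = (PITCH_CLASS[root] + WRITTEN_OFFSET[instrument]) % 12
    return NOTE_NAMES[pitch_class] + quality


concert = ["Cm7", "F7", "Bbmaj7"]
print([transpose_chord(c, "Bb") for c in concert])  # chart for trumpet or tenor sax
print([transpose_chord(c, "Eb") for c in concert])  # chart for alto sax
```

The hard part is everything this skips: getting from the scanned page to those strings in the first place.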

I got digging into the state of optical music recognition (OMR) and came away concluding that music is basically a greenfield for AI wherever you look. Optical music recognition is pretty terrible. AI understanding of music theory is terrible (when actually looking at music, that is; LLMs do okay at text descriptions of theory concepts, where you can imagine some online texts making it into the training data).

I think the issue is that we still don't have great digital formats that encode the dots on paper that musicians read. Music notation is pretty rich. MIDI doesn't capture all of what's needed for symbolic understanding, because it was mostly made for capturing aspects relevant to playback or performance. MusicXML seems to be the closest to a digital format that encodes the information a musician would want, but there aren't great corpora of training data that would connect a MusicXML representation to sheet music images or to audio. I think that's because MusicXML falls short of encoding enough information to engrave music. Tools like MuseScore need to track a bunch of layout information that isn't encodable in MusicXML. The LilyPond format is less verbose than MusicXML and contains a bit more information that is useful to score creators, but most people don't create sheet music in LilyPond. (As an aside, LilyPond bums me out with the state of jazz fonts. I hate looking at "legit" scores in a jazz context.)

I realize this is mildly off topic, but every time I see people making incremental gains on OCR, which to my mind is pretty good, I am reminded of how abysmal OMR is.


Replies

kwon-young today at 3:03 PM

So, the format for musicologists and music researchers is the MEI format: https://music-encoding.org/ for which the reference engraver is Verovio: https://www.verovio.org/index.xhtml Note that Verovio is able to engrave to SVG while keeping as much information as possible from the original MEI score, meaning that you can extract enough metadata to create an actual detection dataset for a deep learning model. This is my horrible hacked-up script that will create a COCO dataset from a set of scores engraved with Verovio: https://github.com/kwon-young/music/blob/main/svg2pl.py I have published a synthetic music score dataset from this: https://www.kaggle.com/datasets/kwonyoungchoi/trompa-coco/da... If anyone wants to try and fit a detector on top, you are welcome :)
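For anyone unfamiliar with the target format: a COCO detection dataset is one JSON document with `images`, `annotations` and `categories` lists, and each annotation carries a `bbox` given as `[x, y, width, height]`. A minimal sketch of assembling one for a single page follows. The function name, file name, symbol classes and box coordinates are made-up placeholders, not taken from svg2pl.py or the published dataset, and extracting the boxes from the SVG is assumed to have happened already:

```python
import json


def build_coco(image_file, width, height, symbols):
    """Build a COCO-style dict for one page.

    symbols: list of (class_name, x, y, w, h) boxes in pixels.
    """
    class_names = sorted({name for name, *_ in symbols})
    # COCO category ids conventionally start at 1.
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}
    return {
        "images": [
            {"id": 1, "file_name": image_file, "width": width, "height": height}
        ],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
        "annotations": [
            {
                "id": i + 1,
                "image_id": 1,
                "category_id": cat_ids[name],
                "bbox": [x, y, w, h],
                "area": w * h,
                "iscrowd": 0,
            }
            for i, (name, x, y, w, h) in enumerate(symbols)
        ],
    }


# Hypothetical boxes for two symbols on one engraved page.
coco = build_coco(
    "page1.png", 2100, 2970,
    [("notehead", 10, 20, 12, 10), ("clef", 0, 0, 30, 80)],
)
print(json.dumps(coco, indent=2))
```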

OMR is so neglected because most people greatly underestimate the difficulty of the task. It has a specific blend of the most extreme shapes combined with an extremely complicated graphical grammar...

indiv0 today at 2:30 PM

> music is basically a greenfield for AI wherever you look

AIN'T THAT THE TRUTH.

My girlfriend is studying musicology and she has some physical disabilities that make it difficult for her to write things down sometimes. So I try to help her by writing some AI-powered TTS/OCR/etc. apps here and there. It becomes painfully obvious that music was never considered an important part of any AI training dataset, anywhere.

These days, I'm pleasantly surprised by how well Opus 4.8 understands/explains music theory (as you said). But ask him to transcribe/OCR/OMR some sheet music and he'll confidently give you the MusicXML/Lilypond equivalent of "2 + 2 = horse".

I really hope this ignored area will be swept up with the rest of the rising AI wave, but it's still criminally undervalued.

elasticdog today at 3:48 PM

For just chord analysis, there's "Harte notation", which is meant to be an unambiguous representation of the notes in a chord (https://ismir2005.ismir.net/proceedings/1080.pdf). That obviously doesn't get you all of the additional information necessary for engraving and full representation of the music, but there are research datasets available that use it, like https://github.com/smashub/choco. I've also used the https://github.com/MarkGotham/When-in-Rome dataset for some analysis work, but again that's not 100% what you're looking for.
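To give a flavor of the notation: as I understand the syntax from the paper, a label is a root, an optional `:shorthand`, an optional parenthesized list of degrees, and an optional `/bass` degree, with `N` meaning "no chord". A rough sketch of splitting a label into those parts follows. The regex is my own lenient approximation rather than the full grammar, and `parse_harte` is a made-up name:

```python
import re

# Approximate pattern: root[:shorthand][(degrees)][/bass]
HARTE = re.compile(
    r"^(?P<root>[A-G][#b]*)"
    r"(?::(?P<shorthand>[a-z0-9]+)?(?:\((?P<degrees>[^)]*)\))?)?"
    r"(?:/(?P<bass>[#b]*\d+))?$"
)


def parse_harte(label):
    """Split a Harte-style chord label into parts; None for 'N' (no chord)."""
    if label == "N":
        return None
    match = HARTE.match(label)
    if match is None:
        raise ValueError(f"not a recognizable chord label: {label!r}")
    degrees = match.group("degrees")
    return {
        "root": match.group("root"),
        "shorthand": match.group("shorthand"),
        "degrees": [d.strip() for d in degrees.split(",") if d.strip()]
        if degrees
        else [],
        "bass": match.group("bass"),
    }


for label in ["C:maj7", "A:min/b3", "D:(b3,5)", "N"]:
    print(label, "->", parse_harte(label))
```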

You might like the "iReal Pro" app for the replacement and transposition of jazz standards on your tablet. It's pretty great for that use case versus camera scans.

singpolyma3 today at 1:28 PM

What about sheet music typesetting formats like https://abcnotation.com/ ?

genxy today at 2:21 PM

Create a benchmark for this problem that researchers can easily run and the problem will solve itself.

WhitneyLand today at 1:30 PM

“there aren't great corpora of training data that would connect a MusicXML representation to sheet music images or to audio”

It may not be necessary…a lot of the training pairs/data for this could probably be procedurally created via code.
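A rough sketch of what "procedurally created" could look like: generate random measures as LilyPond-style source text and keep the note list as the ground-truth label. Everything here (the pitch set, the duration table, the function names) is an assumption chosen for illustration. Rendering each source to an image with an engraver is the other half of the pair and is left out:

```python
import random

# LilyPond absolute pitch names, middle C up one octave.
PITCHES = ["c'", "d'", "e'", "f'", "g'", "a'", "b'", "c''"]
# LilyPond duration token -> length in quarter-note beats.
DURATIONS = {"2": 2.0, "4": 1.0, "8": 0.5}


def random_measure(rng, beats=4.0):
    """Fill one 4/4 measure with random notes; return (tokens, labels)."""
    tokens, labels, remaining = [], [], beats
    while remaining > 0:
        # Only offer durations that still fit in the measure.
        choices = [d for d, length in DURATIONS.items() if length <= remaining]
        dur = rng.choice(choices)
        pitch = rng.choice(PITCHES)
        tokens.append(pitch + dur)
        labels.append((pitch, dur))
        remaining -= DURATIONS[dur]
    return tokens, labels


def random_sample(seed, measures=4):
    """Return (source_text, labels) for one synthetic training example."""
    rng = random.Random(seed)
    tokens, labels = [], []
    for _ in range(measures):
        measure_tokens, measure_labels = random_measure(rng)
        tokens.extend(measure_tokens + ["|"])  # "|" is a bar check
        labels.extend(measure_labels)
    source = "{ \\time 4/4 " + " ".join(tokens) + " }"
    return source, labels


source, labels = random_sample(seed=0)
print(source)
print(labels[:4])
```

Seeding the generator per sample keeps each image/label pair reproducible. Uniformly random notes will not look much like real music, so a realistic generator would need richer sampling (chords, ties, dynamics, multiple voices).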

Would be pretty fun to work on and see it come to life.

mcbetz today at 1:32 PM

I follow the music OCR space, and the only really good solution so far is Soundslice. You scan, review some edge cases, and get really good results. It's a paid service by a small company, and well worth supporting!

ramses0 today at 3:21 PM

So I made a comment a while back about LilyPond: https://news.ycombinator.com/item?id=46148831

A salient extract:

...but why is it so complicated? A novice interpretation of "music" is "a bunch of notes!" ... my amateur interpretation of "music" is "layers of notes".

You can either spam 100 notes in a row, or you effectively end up with:

    melody   = [ a, b, [c+d], e, ... ]
    bassline = [ b, _, b,     _, ... ]
    music = melody + bassline
    score = [
       "a bunch of helper text",
       + melody,
       + bassline,
       + page_size, etc...
    ]
...so Lilypond basically made "Tex4Music", and the format serves a few dual purposes...[snip]

aidenn0 today at 3:05 PM

As someone who has never looked at a jazz score, can you share an example of how jazz sheet music would benefit from different fonts?
