MIDS Capstone Project Spring 2024

Singable

Our mission is to empower creators to enhance their content with unique lyrics and music, regardless of their ability.

Singable is an application that effortlessly generates catchy, singable lyrics from melodies. Leveraging cutting-edge AI technology, Singable transforms melodies into memorable lyrics with ease. Our user-friendly interface enables even non-musicians to craft personalized lyrics tailored to their melodies, guided by a short description, chosen genre, and desired topic. We aim to empower content creators of all backgrounds to enrich their content with distinctive, harmonious lyrics and music, fostering creativity without limitations.

The Problem

According to Goldman Sachs, the rapidly growing content creator economy is now a global industry valued at $250 billion. Creators are the engine of this economy, with over 50 million part- and full-time creators participating in the industry globally.

Being a successful creator demands not only providing unique content but also generating consistent creative output to keep audiences engaged.

Music creation is a core use case for creators. While some creators have musical talents to aid this endeavor, many do not. Our goal is to enable creators to generate music for their content, whether they are musically inclined or not, helping them produce a continuous stream of content that keeps audience engagement high.

With our product, we aim to alleviate one of the most difficult components of songwriting: creating comprehensible, rhythmically sound lyrics from melodic inputs. This distinguishes lyric generation from, say, poetry generation, and increases its complexity, requiring a large amount of computational power. It also appears to be the largest gap in the current market: many solutions create melodies, but few generate lyrics that are truly singable.

Impact and market opportunity

The overall market for AI music solutions is expected to grow tenfold by 2028. To put this growth in perspective, GEMA/SACEM estimates that within only a few years, revenue from AI-generated music will reach 28% of 2022 global music copyright revenue.

Our Approach

Generating music using AI is a hot topic, and one that still hasn't been properly solved due to the complexities of lyric/melody alignment. Additionally, it's difficult to ensure that users' inputs are reflected in custom lyrics. Our solution handles these issues by giving users the ability to describe their desired song using a unique set of features, including title, genre, topic, and description.

We then transform these inputs into a lyrical plan using a fine-tuned BART model, which outputs keywords for each line of the song. This approach enables a more cohesive lyrical plan while still connecting user inputs. Finally, we use these keywords alongside syllable-stress and alignment encodings to generate our final output, displayed as sheet music and synthesized audio. To accomplish this, we had to address the following technical challenges:
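The two-stage flow above can be sketched as a simple function composition. The stubs below are illustrative stand-ins only (the real stages are a fine-tuned BART planner and a stress-conditioned lyric generator, neither of which is shown here):

```python
from typing import Dict, List

def plan_lyrics(inputs: Dict[str, str], n_lines: int) -> List[List[str]]:
    """Stage 1 stand-in: the real system uses a fine-tuned BART model to turn
    user inputs (title, genre, topic, description) into one keyword list per
    song line; here we simply cycle through the topic words."""
    words = inputs["topic"].split()
    return [[words[i % len(words)]] for i in range(n_lines)]

def generate_line(keywords: List[str], stress_pattern: List[int]) -> str:
    """Stage 2 stand-in: the real generator decodes a lyric line conditioned on
    its keywords plus syllable-stress/alignment encodings; here we only echo
    the conditioning so the data flow is visible."""
    return f"{' '.join(keywords)} ({len(stress_pattern)} syllables)"

def singable(inputs: Dict[str, str], melody_stresses: List[List[int]]) -> List[str]:
    """Plan once, then generate each line against its melodic stress pattern."""
    plan = plan_lyrics(inputs, n_lines=len(melody_stresses))
    return [generate_line(kw, st) for kw, st in zip(plan, melody_stresses)]
```

The point of the split is that the planner sees the full user request once, so the line-level generator only needs a short keyword prompt plus the melodic encodings for its line.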

Processing Melodies: Balancing Noise & Signal

MIDI files are packed with information, denoting lyric timings and note onsets, durations, pitches, and stresses for up to 17 instruments. However, MIDI has no standard vocal instrument, and identifying the vocal track is necessary for a model to learn the cross-correlations between lyrics and notes. To account for this, and to simplify our training data, we developed a novel method to isolate the vocal instrument by evaluating each instrument's note onsets for alignment with lyric timings.
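The isolation heuristic can be sketched as follows; the scoring function and the tolerance window are our illustrative assumptions, not the project's exact method:

```python
from typing import Dict, List

def score_vocal_candidate(note_onsets: List[float],
                          lyric_times: List[float],
                          tolerance: float = 0.05) -> float:
    """Fraction of lyric timestamps that have a note onset within
    `tolerance` seconds. Higher means the track better tracks the vocals."""
    if not lyric_times:
        return 0.0
    matched = sum(
        1 for t in lyric_times
        if any(abs(onset - t) <= tolerance for onset in note_onsets)
    )
    return matched / len(lyric_times)

def pick_vocal_track(tracks: Dict[str, List[float]],
                     lyric_times: List[float]) -> str:
    """Return the instrument whose note onsets best align with lyric timings."""
    return max(tracks, key=lambda name: score_vocal_candidate(tracks[name], lyric_times))
```

For example, an instrument whose onsets land on nearly every lyric event will outscore an accompaniment track that only coincides with the vocals occasionally.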

Connecting User Inputs: Enabling Bespoke Lyrics

To ensure that users' inputs are reflected in the generated lyrics, we first had to extract features from our dataset that would enable dynamic inputs. Some features, like genre, were readily available in other datasets, while others, like summary and topic, required additional modeling to incorporate. These features are then fed into the Lyric-to-Plan model, which enables us to shrink the input window for our final Lyric Generator while maintaining coherence.

Aligning Lyrics to Melodies: Enabling Singable Lyrics

We experimented with two approaches to improve the rhythmic alignment of our generated lyrics: the first anchored on the previously mentioned vocal melody timings, while the second anchored on syllabic timing encodings and rhythm-constrained beam search using syllable stresses. After gathering qualitative feedback, we determined that the second method was far more effective.
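The stress constraint behind the second approach can be illustrated with a toy search. This exhaustive sketch keeps only word sequences whose concatenated stress patterns remain a prefix of the melody's target pattern; the real system couples this kind of pruning with beam search over a language model's candidates. The `STRESS` lexicon is a made-up example (1 = stressed syllable, 0 = unstressed):

```python
from typing import List

# Hypothetical stress lexicon for illustration only.
STRESS = {
    "sun": [1], "rises": [1, 0], "over": [1, 0],
    "the": [0], "hill": [1], "again": [0, 1],
}

def rhythm_match(target: List[int], vocab: List[str],
                 max_words: int = 6) -> List[List[str]]:
    """Return all word sequences whose stress patterns exactly spell out
    `target`, pruning any branch whose pattern diverges from the melody."""
    results: List[List[str]] = []

    def extend(words: List[str], stresses: List[int]) -> None:
        if stresses == target:
            results.append(words)          # line fills the melody exactly
            return
        if len(words) >= max_words:
            return
        for w in vocab:
            cand = stresses + STRESS[w]
            # keep only candidates still consistent with the melody's stresses
            if cand == target[:len(cand)]:
                extend(words + [w], cand)

    extend([], [])
    return results
```

In the real decoder, the same prefix check acts as a hard constraint during beam expansion, so only rhythmically compatible hypotheses compete on language-model score.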

Acknowledgements

Our team is grateful to our Capstone professors, Uri Schonfeld and Zona Kostic, who supported us with invaluable feedback and the right nudges at the right points of our project. We also took advantage of office hours and are grateful to the TAs who made themselves available for questions.

Finally, we would like to thank our classmates, who provided feedback and encouragement as we tackled some of the more challenging parts of this effort.

 

AI and Music, A report commissioned by GEMA and SACEM

Video

Singable Demo


Last updated:

April 18, 2024