Balancing professionally mastered music with untreated recordings of humans speaking into a phone is easy. Easy, that is, unless you want to understand what’s being said. I learned this lesson the hard way.
As an ambitious beginner to programming for music production, I bagged a project to balance audio for a startup providing live gym sessions. Despite being a hodgepodge of TypeScript, C++, ffmpeg, VSTs, and AWS CDK code, my solution was effective and I got paid.
Today I write this post to save you that painful introduction and to map the intersection of programming, AI, and music production. I hope to accurately describe the landscape of tools at your disposal today as a creative, so you can build the audio tools of tomorrow with AI.
- Evolution of Music Production Tools
  - Brief history of digital audio workstations
  - Current limitations in traditional music production software
  - Potential of AI to Revolutionize Audio Tools
- Fundamental Programming Concepts for Audio Processing
  - Overview of key audio programming languages
    - Python
    - C++
    - SuperCollider
    - Honorable Mentions
  - Basic Audio Processing Algorithms
    - Fast Fourier Transform (FFT)
    - Filtering
    - Delay
  - Translating concepts to music production
- Introduction to AI in Audio Processing
  - Machine learning models relevant to audio
    - Convolutional Neural Networks (CNNs)
    - Recurrent Neural Networks (RNNs)
    - GANs
  - How AI Can Enhance Audio Processing
  - Examples of AI in commercial audio tools
- Advanced AI Applications in Music Production
  - Automated mixing and mastering
  - AI-assisted composition and arrangement
  - Intelligent sample selection and sound design
- Ethical Considerations and Future Outlook
  - Impacts of AI on the music industry and creativity
  - Concerns about AI replacing human creativity
  - My vision for the future of AI in music production
- Conclusion
Evolution of Music Production Tools
Before we start shipping commits and dropping beats, let’s quickly survey the history of music production technology.
Brief history of digital audio workstations
Prior to digital audio workstations (DAWs), music production was an entirely analog art form. Here’s how things evolved from there:
- Late 1970s: First digital recording systems emerge
- 1980s: MIDI standard invented. Software sequencers appear
- 1990s: Early DAWs like Pro Tools, Cubase, and Logic appear
- 2000s: Explosion of software-based instruments and effects. User interfaces and workflows improve. Laptops can now run DAWs
- 2010s: Cloud collaboration, mobile DAWs, low-latency performance
- 2020s: Enhanced VR, ambisonic audio, web-based DAWs, and integration of AI and machine learning in the studio
Although digital audio technology revolutionized the field with faster workflows, cheaper storage, easy collaboration and totally new creative possibilities for artists, it still suffers from limitations.
Current limitations in traditional music production software
At a high level, here are some relevant limitations of current DAWs:
- Time-consuming manual tasks (drum programming, audio editing)
- Complex interfaces pose a steep learning curve for new users
- Lack of advanced expression controls in standard MIDI
These limitations keep many legacy artists hooked on expensive analog gear and gatekeep new artists from recording their own music.
Potential of AI to Revolutionize Audio Tools
While the popular narrative surrounding Artificial Intelligence tends to focus on unrealistic sci-fi doom scenarios and preemptive “they took our jobs” style grousing, there is a very real potential for AI tools to impact musicians positively.
Imagine an intelligent assistant who gets you, your music, and knows exactly how to translate your concepts into bangers in the studio. Imagine never having to edit drums again. Imagine a little guy who sits on your shoulder and tells you what song will bring the girl in the red dress to the dance floor tonight.
How do we get there? Follow me, let’s keep going.
Fundamental Programming Concepts for Audio Processing
Let’s discuss a few concepts you’ll need to know to program audio.
Overview of key audio programming languages
During the tool selection phase of any project, two considerations reign supreme:
- Ecosystem refers to the quality and availability of example code, libraries, documentation, tools and guides using the chosen tool for your specific use case.
- Familiarity refers to the skills and experiences embedded in you and your team. You can often get away with using a strictly inferior tool yet achieve superior outcomes if you know how to use it well.
With these considerations in mind, let’s find the best tools for the job.
Python
Python excels in audio work due to its rich ecosystem of audio and AI libraries, coupled with extensive documentation and community support. Its gentle learning curve and readability make it accessible to many developers, facilitating rapid development and team collaboration. For virtually any AI use case, Python will be the standard choice.
C++
C++ is prized in audio programming for its performance and low-level control, critical for real-time processing. It boasts industry-standard audio libraries and is widely used for developing plugins and audio engines, with a large pool of experienced developers and resources available. For virtually any audio manipulation use case, C++ is standard.
SuperCollider
SuperCollider offers a specialized ecosystem for audio synthesis and algorithmic composition, combining a powerful audio server with a flexible programming language. While less mainstream, it's highly valued in computer music circles for experimental sound design and live coding performances.
Honorable Mentions
I will certainly catch hell online for this, but all the same: you should consider TypeScript. TensorFlow publishes an official JavaScript library, TensorFlow.js, complete with TypeScript types, that can train neural networks for deep learning, the technology powering many recent AI innovations.
With frameworks like Howler, Tone, and even Dolby products now shipping in JS, it’s easier than ever to ship cross-platform AI tools for music. If your use case can be handled in TypeScript, there are huge distribution benefits to doing so.
Now, in order to program with any of these tools, we’re going to need a few algorithms.
Basic Audio Processing Algorithms
We won’t spend too long on theory or computer science, but there are a few algorithms you MUST know in order to work effectively in this field.
Fast Fourier Transform (FFT)
FFT transforms a time-domain signal into its frequency-domain representation, revealing the signal's frequency content. Here’s how we do it:
- Divide the input signal into segments
- Apply window function to each segment
- Pad each segment with zeros if necessary
- Compute the discrete Fourier transform
- Output the frequency domain representation
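To make that concrete, here's a minimal sketch in Python using NumPy. The function name and parameters are my own choices for illustration, and I've skipped the zero-padding step for brevity:

```python
import numpy as np

def fft_frames(signal, segment_size=1024, hop=512):
    """Windowed FFT over segments of a signal, returning magnitude spectra."""
    window = np.hanning(segment_size)  # window function reduces spectral leakage
    frames = []
    for start in range(0, len(signal) - segment_size + 1, hop):
        segment = signal[start:start + segment_size] * window  # divide and window
        spectrum = np.fft.rfft(segment)  # discrete Fourier transform (real input)
        frames.append(np.abs(spectrum))  # magnitude of each frequency bin
    return np.array(frames)

# One second of a 440 Hz sine at 44.1 kHz: expect a peak near bin 440 / (44100 / 1024)
t = np.arange(44100) / 44100
magnitudes = fft_frames(np.sin(2 * np.pi * 440 * t))
print(magnitudes.shape)  # (number of segments, segment_size // 2 + 1)
```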
Filtering
Filtering selectively attenuates or amplifies specific frequency components of a signal, shaping its tonal balance. Here’s how:
- Design the filter (determine coefficients)
- For each input sample:
  - Multiply the sample by the filter's feed-forward coefficients
  - Add the result to previous output samples multiplied by the feedback coefficients
- Output the filtered sample
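Here's a minimal sketch of that loop in Python. I delegate the coefficient design to SciPy's butter, and the direct-form loop mirrors the steps above (written for clarity, not speed):

```python
import numpy as np
from scipy.signal import butter

def iir_filter(x, b, a):
    """Direct-form IIR filter; b are feed-forward, a are feedback coefficients."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        # Multiply current and past input samples by the feed-forward coefficients
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Combine with past output samples scaled by the feedback coefficients
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc
    return y

# Design the filter: a 2nd-order low-pass with a 1 kHz cutoff at 44.1 kHz
b, a = butter(2, 1000, btype="low", fs=44100)
filtered = iir_filter(np.random.randn(4096), b, a)  # hiss above ~1 kHz is attenuated
```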
Delay
A delay effect creates a time-shifted copy of the input signal, producing echoes or serving as a building block for more complex effects. Here’s how:
- Create a buffer to store past samples
- For each input sample:
  - Read the delayed sample from the buffer
  - Write the current sample to the buffer
  - Output the delayed sample
- Update the buffer read/write positions
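And here's a sketch of that loop in Python. The feedback and mix parameters are illustrative extras you'll find on most delay plugins:

```python
import numpy as np

def delay_effect(x, sample_rate=44100, delay_seconds=0.3, feedback=0.4, mix=0.5):
    """Feedback delay built on a circular buffer of past samples."""
    buffer = np.zeros(int(sample_rate * delay_seconds))  # stores past samples
    pos = 0
    out = np.zeros(len(x))
    for n in range(len(x)):
        delayed = buffer[pos]                      # read the delayed sample
        buffer[pos] = x[n] + delayed * feedback    # write the current sample (plus feedback for repeats)
        out[n] = x[n] * (1 - mix) + delayed * mix  # blend the delayed copy into the output
        pos = (pos + 1) % len(buffer)              # update the buffer position
    return out
```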
That’s all you really need to know. But how do we translate these concepts to our productions?
Translating concepts to music production
If you understand FFT, filtering, and delay, then you know the basics of how almost all audio effects plugins are built. That’s because these three form the fundamental building blocks of digital signal processing.
By creatively combining these elements, often with additional modulation and non-linear processing, audio engineers can create complex effects. For instance, a phaser uses all-pass filters and modulation, while a reverb typically employs a network of delays and filters to simulate room reflections.
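For a taste of how these blocks compose, here is a sketch of a damped comb filter, the classic building block of Schroeder-style reverbs: a delay line whose feedback path runs through a simple low-pass filter (the parameter values are illustrative):

```python
import numpy as np

def damped_comb(x, delay_samples=1687, feedback=0.77, damping=0.2):
    """Delay + filter in the feedback loop: one building block of a basic reverb."""
    buffer = np.zeros(delay_samples)
    pos, lowpass = 0, 0.0
    out = np.zeros(len(x))
    for n in range(len(x)):
        delayed = buffer[pos]
        # One-pole low-pass in the feedback path mimics air absorbing high frequencies
        lowpass = delayed * (1 - damping) + lowpass * damping
        buffer[pos] = x[n] + lowpass * feedback
        out[n] = delayed
        pos = (pos + 1) % delay_samples
    return out

# A real reverb sums several combs with different delay lengths, then all-pass filters
```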
Understanding this fundamental relationship allows audio programmers to efficiently design and implement a wide array of effects, paving the way for more advanced AI-driven audio processing tools.
Introduction to AI in Audio Processing
Now that we’ve laid a strong foundation, let’s learn how to incorporate AI skillfully.
Machine learning models relevant to audio
Let’s cover the three deep learning architectures most relevant to audio.
Convolutional Neural Networks (CNNs)
- Specialized for processing grid-like data (e.g., images, spectrograms)
- Excel at feature extraction and pattern recognition
- Use convolution operations to capture local patterns
Often used in audio for tasks like genre classification or instrument recognition.
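As a minimal sketch (in PyTorch, with layer sizes chosen arbitrarily for illustration), a genre classifier over spectrograms might look like this:

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """Toy CNN: treats a (1 x 128 x 128) spectrogram like a grayscale image."""
    def __init__(self, num_genres=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, num_genres)

    def forward(self, spectrogram):
        x = self.features(spectrogram)        # convolutions capture local time-frequency patterns
        return self.classifier(x.flatten(1))  # one logit per genre

logits = GenreCNN()(torch.randn(1, 1, 128, 128))  # batch of one fake spectrogram
```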
Recurrent Neural Networks (RNNs)
- Designed for sequential data processing
- Can handle variable-length input/output
- Maintain internal state to capture temporal dependencies
Useful in audio for tasks like music generation or speech recognition.
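In the same spirit, here's a toy PyTorch sketch of an RNN that predicts the next note of a melody (vocabulary and sizes are arbitrary choices of mine):

```python
import torch
import torch.nn as nn

class MelodyRNN(nn.Module):
    """Toy LSTM: predicts the next MIDI note from a sequence of previous notes."""
    def __init__(self, num_pitches=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(num_pitches, 64)
        self.lstm = nn.LSTM(64, hidden, batch_first=True)  # internal state carries temporal context
        self.head = nn.Linear(hidden, num_pitches)

    def forward(self, notes):
        x, _ = self.lstm(self.embed(notes))
        return self.head(x[:, -1])  # logits for the note that should come next

notes = torch.randint(0, 128, (1, 16))            # a 16-note melody fragment
next_note = MelodyRNN()(notes).argmax(dim=-1)     # most likely continuation
```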
GANs
- Consist of two competing networks: generator and discriminator
- Generator creates synthetic data, discriminator evaluates authenticity
- Learn to produce highly realistic synthetic data
In audio, used for tasks like voice conversion or music style transfer.
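A toy sketch of the two-network setup in PyTorch (raw waveforms and dense layers, purely for illustration; real audio GANs use convolutional architectures):

```python
import torch
import torch.nn as nn

# Toy GAN over one-second clips at 16 kHz, represented as raw waveforms
generator = nn.Sequential(      # noise vector -> synthetic waveform
    nn.Linear(100, 1024), nn.ReLU(), nn.Linear(1024, 16000), nn.Tanh()
)
discriminator = nn.Sequential(  # waveform -> probability it's a real recording
    nn.Linear(16000, 1024), nn.LeakyReLU(0.2), nn.Linear(1024, 1), nn.Sigmoid()
)

fake_audio = generator(torch.randn(1, 100))  # generator creates synthetic data
verdict = discriminator(fake_audio)          # discriminator evaluates authenticity
# Training alternates: the discriminator learns to spot fakes, the generator learns to fool it
```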
How AI Can Enhance Audio Processing
Here are just a few ways AI can uniquely enhance audio processing workflows:
- Personalization. AI can learn user preferences and adapt its processing to your individual tastes, specific genres, or audience.
- Restoration. AI can separate and clean up audio sources more effectively than traditional tools, improving restoration of old recordings and cleanup of noise and defects.
- Stems. AI can split any track into stems quickly, useful in the studio and on stage.
- Creative augmentation. AI can create new sounds, melodies, or arrangements, providing tools for musicians and producers to expand their repertoire easily.
Examples of AI in commercial audio tools
Most importantly, you should know that AI is already used heavily in commercial tools. We’ll look at a few in the next section, and even that will only barely scratch the surface.
Advanced AI Applications in Music Production
Now that we’ve covered the basics, let’s talk about some more advanced applications in the field.
Automated mixing and mastering
I’ve been using tools to automate portions of my post-production process, including mixing and mastering, since 2018. Early tools were heavily manual, inaccurate, and low-quality. By now, I’d say LANDR is better than most mastering engineers, and iZotope’s Ozone and Neutron are among the best channel processing tools for mixing and mastering on the market. The next generation of these tools will be even more powerful.
AI-assisted composition and arrangement
Composition and arrangement follow more precise mathematical structure than most aspects of songwriting and music production. Ironically, the complexity involved means musicians often navigate these duties intuitively, based on vibes. That approach worked for me, especially back when my advanced math and music theory were rather weak.
AI tools that can actually grok the advanced mathematical formulae behind what works will support musicians in making more informed choices during the composition and arrangement phases, which will lead to more masterful material and more confident risk-taking by artists.
Intelligent sample selection and sound design
These days, most musicians don’t need to spend much time designing their own sounds or searching for samples. This is thanks to the advent of digital services like Splice, where a virtually unlimited number of human-crafted and curated samples are available for download at a modest monthly fee.
The modern sample ecosystem primarily presents an issue for the most discerning and visionary creators, who may spend weeks looking for or creating a sound that meets their exact specifications. With prompt-driven, open-source AI tools like Stable Audio Open, any sound you can describe accurately with text is yours to download. Truly, the only limit is your imagination.
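As a sketch of what this looks like in practice: at the time of writing, Hugging Face's diffusers library exposes a StableAudioPipeline for Stable Audio Open. APIs in this space move fast, so treat the details below as an assumption and check the current docs:

```python
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

# Download the model (requires accepting its license on Hugging Face first)
pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")

# Describe the sound you want as plain text
audio = pipe(
    "warm analog kick drum, tight low end, no reverb",
    num_inference_steps=100,
    audio_end_in_s=2.0,
).audios

sf.write("kick.wav", audio[0].T.float().cpu().numpy(), pipe.vae.sampling_rate)
```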
Ethical Considerations and Future Outlook
Although I doubt AI will eliminate more creative jobs than it creates, there are ethical concerns and considerations we must take into account as technologists and creators when using AI.
Impacts of AI on the music industry and creativity
Extrapolating from the trends we observed at the top of the article:
- Manual, time-consuming work will be reduced or eliminated from audio workflows
- More creatives and artists will now have the tools and resources to make art
- Art and music can now be personalized to an all-new degree
But all innovation is a trade-off. What of the costs? What of the role of human creativity?
Concerns about AI replacing human creativity
Countless times in human history, there has been public outcry against new technology on the grounds that it will replace humans and invalidate their labour. And yet, not once has this come to fruition. Let’s consider just a few examples of this mass hysteria:
- Mechanized looms will eliminate textile workers
- Automated Teller Machines will replace bank tellers
- Assembly line automation will eliminate manufacturing jobs
- Personal computers will make many office jobs obsolete
Hopefully you agree that textile, banking, office, and manufacturing jobs are not only still around and doing fine, but that each field experienced explosive growth over the long term as a result of continuous innovation. The work has changed, but the jobs remain.
Furthermore, what is creativity? Where does it come from? What does it do? Machine learning models are trained to recognize and learn patterns from data, and generative models use these learned patterns to create new data in the same pattern. They fundamentally can’t replace human creativity. We will always need humans to innovate, disrupt, and break the mould.
If your role as a musician or artist is limited to replicating known patterns over and over again, I’ve got bad news for you. AI won’t automate your job or replace you, because your job is automated now, and you were replaced long ago. The first duty of an artist is to be original.
My vision for the future of AI in music production
I dream of a future where humans and machines collaborate in peace and harmony. I see a future where our capabilities are extended, not replaced, by machines. Most importantly, I envision badass art made by human and machine collaborators that neither would have a hope of accomplishing alone.
Conclusion
Today we covered the history of digital music production, techniques and tools for applying artificial intelligence, and a few sample applications. Along the way we picked up a little computer science, a little math, and a set of digital signal processing building blocks that will serve you well later.
More than anything, I want you to take what you’ve just learned and apply it in your own art, music, and software. There are unlimited possibilities with digital sound, and the only limitation is your imagination. Take your newfound wealth of knowledge and use it to produce abundance.
And please, tell your friends. Share your knowledge and experiences. AI doesn’t need to be scary, and it doesn’t need to replace humans or human creativity. If we play our cards right, it could be the greatest boon to music and musicians that this world has ever seen.