Balancing professionally mastered music with untreated recordings of humans speaking into a phone is easy. Easy, that is, unless you want to understand what’s being said. I learned this lesson the hard way.
As an ambitious beginner to programming for music production, I bagged a project to balance audio for a startup providing live gym sessions. Despite being a hodgepodge of TypeScript, C++, ffmpeg, VSTs, and AWS CDK code, my solution was effective and I got paid.
Today I write this post to save you that painful introduction and to map the intersection of programming, AI, and music production. I hope to accurately describe the landscape of tools at your disposal today as a creative, so you can build the audio tools of tomorrow with AI.
- Evolution of Music Production Tools
  - Brief history of digital audio workstations
  - Current limitations in traditional music production software
  - Potential of AI to Revolutionize Audio Tools
- Fundamental Programming Concepts for Audio Processing
  - Overview of key audio programming languages
    - Python
    - C++
    - SuperCollider
    - Honorable Mentions
  - Basic Audio Processing Algorithms
    - Fast Fourier Transform (FFT)
    - Filtering
    - Delay
  - Translating concepts to music production
- Introduction to AI in Audio Processing
  - Machine learning models relevant to audio
    - Convolutional Neural Networks (CNNs)
    - Recurrent Neural Networks (RNNs)
    - GANs
  - How AI Can Enhance Audio Processing
  - Examples of AI in commercial audio tools
- Advanced AI Applications in Music Production
  - Automated mixing and mastering
  - AI-assisted composition and arrangement
  - Intelligent sample selection and sound design
- Ethical Considerations and Future Outlook
  - Impacts of AI on the music industry and creativity
  - Concerns about AI replacing human creativity
  - My vision for the future of AI in music production
- Conclusion
Evolution of Music Production Tools
Before we start shipping commits and dropping beats, let’s quickly survey the history of music production technology.
Brief history of digital audio workstations
Prior to digital audio workstations (DAWs), music production was an entirely analog art form. Here’s how things evolved from there:
- Late 1970s: First digital recording systems emerge
- 1980s: MIDI standard invented. Software sequencers appear
- 1990s: Early DAWs like Pro Tools, Cubase, and Logic appear
- 2000s: Explosion of software-based instruments and effects. User interfaces and workflows improve. Laptops can now run DAWs
- 2010s: Cloud collaboration, mobile DAWs, low-latency performance
- 2020s: Enhanced VR, ambisonic audio, web-based DAWs, and integration of AI and machine learning in the studio
Although digital audio technology revolutionized the field with faster workflows, cheaper storage, easy collaboration and totally new creative possibilities for artists, it still suffers from limitations.
Current limitations in traditional music production software
At a high level, here are some relevant limitations of current DAWs:
- Time-consuming manual tasks (drum programming, audio editing)
- Complex interfaces pose a steep learning curve for new users
- Lack of advanced expression controls in standard MIDI
These limitations keep many legacy artists hooked on expensive analog gear and gatekeep new artists from recording their own music.
Potential of AI to Revolutionize Audio Tools
While the popular narrative surrounding Artificial Intelligence tends to focus on unrealistic sci-fi doom scenarios and preemptive “they took our jobs” style grousing, there is a very real potential for AI tools to impact musicians positively.
Imagine an intelligent assistant who gets you, your music, and knows exactly how to translate your concepts into bangers in the studio. Imagine never having to edit drums again. Imagine a little guy who sits on your shoulder and tells you what song will bring the girl in the red dress to the dance floor tonight.
How do we get there? Follow me, let’s keep going.
Fundamental Programming Concepts for Audio Processing
Let’s discuss a few concepts you’ll need to know to program audio.
Overview of key audio programming languages
During the tool selection phase of any project, two considerations reign supreme:
- Ecosystem refers to the quality and availability of example code, libraries, documentation, tools and guides using the chosen tool for your specific use case.
- Familiarity refers to the skills and experiences embedded in you and your team. You can often get away with using a strictly inferior tool yet achieve superior outcomes if you know how to use it well.
With these considerations in mind, let’s find the best tools for the job.
Python
Python excels in audio work due to its rich ecosystem of audio and AI libraries, coupled with extensive documentation and community support. Its gentle learning curve and readability make it accessible to many developers, facilitating rapid development and team collaboration. For virtually any AI use case, Python will be the standard choice.
C++
C++ is prized in audio programming for its performance and low-level control, critical for real-time processing. It boasts industry-standard audio libraries and is widely used for developing plugins and audio engines, with a large pool of experienced developers and resources available. For virtually any audio manipulation use case, C++ is standard.
SuperCollider
SuperCollider offers a specialized ecosystem for audio synthesis and algorithmic composition, combining a powerful audio server with a flexible programming language. While less mainstream, it's highly valued in computer music circles for experimental sound design and live coding performances.
Honorable Mentions
I will certainly catch hell online for this, but all the same: you should consider TypeScript. TensorFlow publishes an official JavaScript library, TensorFlow.js, complete with TypeScript types, that can train neural networks for deep learning, the technology powering many recent AI innovations.
With frameworks like Howler, Tone, and even Dolby products now shipping in JS, it’s easier than ever to ship cross-platform AI tools for music. If your use case can be handled in TypeScript, there are huge distribution benefits to doing so.
Now, in order to program with any of these tools, we’re going to need a few algorithms.
Basic Audio Processing Algorithms
We won’t spend too long on theory or computer science, but there are a few algorithms you MUST know in order to work effectively in this field.
Fast Fourier Transform (FFT)
FFT transforms a time-domain signal into its frequency-domain representation, revealing the signal's frequency content. Here’s how we do it:
- Divide the input signal into segments
- Apply window function to each segment
- Pad each segment with zeros if necessary
- Compute the discrete Fourier transform
- Output the frequency domain representation
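To make that concrete, here's a minimal sketch in Python using NumPy. The function name and parameters are my own choices for illustration, and I've skipped the zero-padding step for brevity:

```python
import numpy as np

def fft_frames(signal, segment_size=1024, hop=512):
    """Windowed FFT over segments of a signal, returning magnitude spectra."""
    window = np.hanning(segment_size)  # window function reduces spectral leakage
    frames = []
    for start in range(0, len(signal) - segment_size + 1, hop):
        segment = signal[start:start + segment_size] * window  # divide and window
        spectrum = np.fft.rfft(segment)  # discrete Fourier transform (real input)
        frames.append(np.abs(spectrum))  # magnitude of each frequency bin
    return np.array(frames)

# One second of a 440 Hz sine at 44.1 kHz: expect a peak near bin 440 / (44100 / 1024)
t = np.arange(44100) / 44100
magnitudes = fft_frames(np.sin(2 * np.pi * 440 * t))
print(magnitudes.shape)  # (number of segments, segment_size // 2 + 1)
```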
Filtering
Filtering selectively attenuates or amplifies specific frequency components of a signal, shaping its tonal balance. Here’s how:
- Design the filter (determine coefficients)
- For each input sample:
  - Multiply the sample by the filter's feed-forward coefficients
  - Add the result to previous output samples multiplied by the feedback coefficients
- Output the filtered sample
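Here's a minimal sketch of that loop in Python. I delegate the coefficient design to SciPy's butter, and the direct-form loop mirrors the steps above (written for clarity, not speed):

```python
import numpy as np
from scipy.signal import butter

def iir_filter(x, b, a):
    """Direct-form IIR filter; b are feed-forward, a are feedback coefficients."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        # Multiply current and past input samples by the feed-forward coefficients
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Combine with past output samples scaled by the feedback coefficients
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc
    return y

# Design the filter: a 2nd-order low-pass with a 1 kHz cutoff at 44.1 kHz
b, a = butter(2, 1000, btype="low", fs=44100)
filtered = iir_filter(np.random.randn(4096), b, a)  # hiss above ~1 kHz is attenuated
```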
Delay
A delay effect creates a time-shifted copy of the input signal, producing echoes or serving as a building block for more complex effects. Here’s how:
- Create a buffer to store past samples
- For each input sample:
  - Read the delayed sample from the buffer
  - Write the current sample to the buffer
  - Output the delayed sample
- Update the buffer read/write positions
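And here's a sketch of that loop in Python. The feedback and mix parameters are illustrative extras you'll find on most delay plugins:

```python
import numpy as np

def delay_effect(x, sample_rate=44100, delay_seconds=0.3, feedback=0.4, mix=0.5):
    """Feedback delay built on a circular buffer of past samples."""
    buffer = np.zeros(int(sample_rate * delay_seconds))  # stores past samples
    pos = 0
    out = np.zeros(len(x))
    for n in range(len(x)):
        delayed = buffer[pos]                      # read the delayed sample
        buffer[pos] = x[n] + delayed * feedback    # write the current sample (plus feedback for repeats)
        out[n] = x[n] * (1 - mix) + delayed * mix  # blend the delayed copy into the output
        pos = (pos + 1) % len(buffer)              # update the buffer position
    return out
```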
That’s all you really need to know. But how do we translate these concepts to our productions?
Translating concepts to music production
If you understand FFT, filtering, and delay, then you know the basics of how almost all audio effects plugins are built. That’s because these three form the fundamental building blocks of digital signal processing.
By creatively combining these elements, often with additional modulation and non-linear processing, audio engineers can create complex effects. For instance, a phaser uses all-pass filters and modulation, while a reverb typically employs a network of delays and filters to simulate room reflections.
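For a taste of how these blocks compose, here is a sketch of a damped comb filter, the classic building block of Schroeder-style reverbs: a delay line whose feedback path runs through a simple low-pass filter (the parameter values are illustrative):

```python
import numpy as np

def damped_comb(x, delay_samples=1687, feedback=0.77, damping=0.2):
    """Delay + filter in the feedback loop: one building block of a basic reverb."""
    buffer = np.zeros(delay_samples)
    pos, lowpass = 0, 0.0
    out = np.zeros(len(x))
    for n in range(len(x)):
        delayed = buffer[pos]
        # One-pole low-pass in the feedback path mimics air absorbing high frequencies
        lowpass = delayed * (1 - damping) + lowpass * damping
        buffer[pos] = x[n] + lowpass * feedback
        out[n] = delayed
        pos = (pos + 1) % delay_samples
    return out

# A real reverb sums several combs with different delay lengths, then all-pass filters
```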
Understanding this fundamental relationship allows audio programmers to efficiently design and implement a wide array of effects, paving the way for more advanced AI-driven audio processing tools.
Introduction to AI in Audio Processing
Now that we’ve laid a strong foundation, let’s learn how to incorporate AI skillfully.
Machine learning models relevant to audio
Let’s cover the three deep learning architectures most relevant to audio.
Convolutional Neural Networks (CNNs)
- Specialized for processing grid-like data (e.g., images, spectrograms)
- Excel at feature extraction and pattern recognition
- Use convolution operations to capture local patterns
Often used in audio for tasks like genre classification or instrument recognition.
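As a minimal sketch (in PyTorch, with layer sizes chosen arbitrarily for illustration), a genre classifier over spectrograms might look like this:

```python
import torch
import torch.nn as nn

class GenreCNN(nn.Module):
    """Toy CNN: treats a (1 x 128 x 128) spectrogram like a grayscale image."""
    def __init__(self, num_genres=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, num_genres)

    def forward(self, spectrogram):
        x = self.features(spectrogram)        # convolutions capture local time-frequency patterns
        return self.classifier(x.flatten(1))  # one logit per genre

logits = GenreCNN()(torch.randn(1, 1, 128, 128))  # batch of one fake spectrogram
```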
Recurrent Neural Networks (RNNs)
- Designed for sequential data processing
- Can handle variable-length input/output
- Maintain internal state to capture temporal dependencies
Useful in audio for tasks like music generation or speech recognition.
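In the same spirit, here's a toy PyTorch sketch of an RNN that predicts the next note of a melody (vocabulary and sizes are arbitrary choices of mine):

```python
import torch
import torch.nn as nn

class MelodyRNN(nn.Module):
    """Toy LSTM: predicts the next MIDI note from a sequence of previous notes."""
    def __init__(self, num_pitches=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(num_pitches, 64)
        self.lstm = nn.LSTM(64, hidden, batch_first=True)  # internal state carries temporal context
        self.head = nn.Linear(hidden, num_pitches)

    def forward(self, notes):
        x, _ = self.lstm(self.embed(notes))
        return self.head(x[:, -1])  # logits for the note that should come next

notes = torch.randint(0, 128, (1, 16))            # a 16-note melody fragment
next_note = MelodyRNN()(notes).argmax(dim=-1)     # most likely continuation
```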
GANs
- Consist of two competing networks: generator and discriminator
- Generator creates synthetic data, discriminator evaluates authenticity
- Learn to produce highly realistic synthetic data
In audio, used for tasks like voice conversion or music style transfer.
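A toy sketch of the two-network setup in PyTorch (raw waveforms and dense layers, purely for illustration; real audio GANs use convolutional architectures):

```python
import torch
import torch.nn as nn

# Toy GAN over one-second clips at 16 kHz, represented as raw waveforms
generator = nn.Sequential(      # noise vector -> synthetic waveform
    nn.Linear(100, 1024), nn.ReLU(), nn.Linear(1024, 16000), nn.Tanh()
)
discriminator = nn.Sequential(  # waveform -> probability it's a real recording
    nn.Linear(16000, 1024), nn.LeakyReLU(0.2), nn.Linear(1024, 1), nn.Sigmoid()
)

fake_audio = generator(torch.randn(1, 100))  # generator creates synthetic data
verdict = discriminator(fake_audio)          # discriminator evaluates authenticity
# Training alternates: the discriminator learns to spot fakes, the generator learns to fool it
```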
How AI Can Enhance Audio Processing
Here are just a few ways AI can uniquely enhance audio processing workflows:
- Personalization. AI can learn user preferences and adapt its processing to your individual tastes, specific genres, or audience.
- Restoration. AI can separate and clean up audio sources more effectively than traditional tools, improving restoration of old recordings and cleanup of noise and defects.
- Stems. AI can split any track into stems quickly, useful in the studio and on stage.
- Creative augmentation. AI can create new sounds, melodies, or arrangements, providing tools for musicians and producers to expand their repertoire easily.
Examples of AI in commercial audio tools
Most importantly, you should know that AI is already used heavily in commercial tools. We’ll look at a few in the next section, and even that will only barely scratch the surface.
Advanced AI Applications in Music Production
Now that we’ve covered the basics, let’s talk about some more advanced applications in the field.
Automated mixing and mastering
I’ve been using tools to automate portions of my post-production process, including mixing and mastering, since 2018. Early tools were heavily manual, inaccurate, and low-quality. By now, I’d say LANDR is better than most mastering engineers, and iZotope’s Ozone and Neutron are among the best channel processing tools for mixing and mastering on the market. The next generation of these tools will be even more powerful.
AI-assisted composition and arrangement
Composition and arrangement follow more precise mathematical structure than most aspects of songwriting and music production. Ironically, the complexity involved means musicians often navigate these duties intuitively, based on vibes. That approach worked for me, especially back when my advanced math and music theory were rather weak.
AI tools that can actually grok the advanced mathematical formulae behind what works will support musicians in making more informed choices during the composition and arrangement phases, which will lead to more masterful material and more confident risk-taking by artists.
Intelligent sample selection and sound design
These days, most musicians don’t need to spend much time designing their own sounds or searching for samples. This is thanks to the advent of digital services like Splice, where a virtually unlimited number of human-crafted and curated samples are available for download at a modest monthly fee.
The modern sample ecosystem primarily presents an issue for the most discerning and visionary creators, who may spend weeks looking for or creating a sound that meets their exact specifications. With prompt-driven, open-source AI tools like Stable Audio Open, any sound you can describe accurately with text is yours to download. Truly, the only limit is your imagination.
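As a sketch of what this looks like in practice: at the time of writing, Hugging Face's diffusers library exposes a StableAudioPipeline for Stable Audio Open. APIs in this space move fast, so treat the details below as an assumption and check the current docs:

```python
import torch
import soundfile as sf
from diffusers import StableAudioPipeline

# Download the model (requires accepting its license on Hugging Face first)
pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")

# Describe the sound you want as plain text
audio = pipe(
    "warm analog kick drum, tight low end, no reverb",
    num_inference_steps=100,
    audio_end_in_s=2.0,
).audios

sf.write("kick.wav", audio[0].T.float().cpu().numpy(), pipe.vae.sampling_rate)
```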
Ethical Considerations and Future Outlook
Although I doubt AI will eliminate more creative jobs than it creates, there are ethical concerns and considerations we must take into account as technologists and creators when using AI.
Impacts of AI on the music industry and creativity
Extrapolating from the trends we observed at the top of the article:
- Manual, time-consuming work will be reduced or eliminated from audio workflows
- More creatives and artists will now have the tools and resources to make art
- Art and music can now be personalized to an all-new degree
But all innovation is a trade-off. What of the costs? What of the role of human creativity?
Concerns about AI replacing human creativity
Countless times in human history, there has been public outcry against new technology on the grounds that it will replace humans and invalidate their labour. And yet, not once has this come to fruition. Let’s consider just a few examples of this mass hysteria:
- Mechanized looms will eliminate textile workers
- Automated Teller Machines will replace bank tellers
- Assembly line automation will eliminate manufacturing jobs
- Personal computers will make many office jobs obsolete
Hopefully you agree that textile, banking, office, and manufacturing jobs are not only still around and doing fine, but that each field experienced explosive growth over the long term as a result of continuous innovation. The work has changed, but the jobs remain.
Furthermore, what is creativity? Where does it come from? What does it do? Machine learning models are trained to recognize and learn patterns from data, and generative models use these learned patterns to create new data in the same pattern. They fundamentally can’t replace human creativity. We will always need humans to innovate, disrupt, and break the mould.
If your role as a musician or artist is limited to replicating known patterns over and over again, I’ve got bad news for you. AI won’t automate your job or replace you, because your job is automated now, and you were replaced long ago. The first duty of an artist is to be original.
My vision for the future of AI in music production
I dream of a future where humans and machines collaborate in peace and harmony. I see a future where our capabilities are extended, not replaced, by machines. Most importantly, I envision badass art made by human and machine collaborators that neither would have a hope of accomplishing alone.
Conclusion
Today we covered the history of digital music production, techniques and tools for applying artificial intelligence, and a few sample applications. Along the way we picked up a little computer science, a little math, and a set of digital signal processing building blocks that will serve you well later.
More than anything, I want you to take what you’ve just learned and apply it in your own art, music, and software. There are unlimited possibilities with digital sound, and the only limitation is your imagination. Take your newfound wealth of knowledge and use it to produce abundance.
And please, tell your friends. Share your knowledge and experiences. AI doesn’t need to be scary, and it doesn’t need to replace humans or human creativity. If we play our cards right, it could be the greatest boon to music and musicians that this world has ever seen.