ASU Learning Sparks
How Generative AI Impacts Creative Industries
Can computers and robots be creative? Generative AI says 'yes'. Much as you or I might write a poem or paint a canvas, creative AI is starting to revolutionize creative industries by identifying and learning patterns in images, text and audio to generate all sorts of creative outputs. Combining human creativity with generative AI means a future full of possibility as well as uncertainty.
Artificial intelligence, which includes machine learning, is a broad branch of computer science focused on programming machines to perform tasks associated with human intelligence – and one of the main ways we define human intelligence is through creativity.
Creativity is often linked with the ability to be expressive artistically, but it can also be defined more broadly - the idea that we as humans can conceive of new ideas and build things that improve our world.
Creative humans have the ability to recombine existing ideas because they have practiced this way of thinking and are working with a lot of discipline-specific knowledge - for example, a composer who has studied music extensively. No one writes music without first learning to play other existing music.
From this perspective, computational AI systems can be quite creative.
Although the idea of AI has been around for decades, it’s enjoying a period of renewed enthusiasm due to recent advances in the ability to collect, store, and analyze massive amounts of data - in the form of images, speech, purchasing habits, social media activity and more. Machine learning finds patterns in this data, which allows AI systems to accomplish specific, impressive tasks such as driving a car, predicting cancer from radiology scans, and transcribing or even translating speech in real time.
Sophisticated, robust algorithms learn these patterns and can also generate new combinations of them. It feels like magic, but it’s not - it’s code. It’s neural networks built from many layers stacked on top of one another - hence the term “deep” learning.
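To make "learn patterns, then generate new combinations" a little more concrete, here is a minimal sketch in Python. It is not a neural network and not how any real generative AI product works: it is a toy word-pair model, and the function names and the tiny corpus are made up for illustration. It only shows the basic idea of recording patterns in existing material and then recombining them into something new.

```python
import random
from collections import defaultdict

def learn_patterns(text):
    """Record, for every word, which words were seen following it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length, seed=0):
    """Recombine the learned word pairs into a new sequence."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:
            break  # this word was never followed by anything in the corpus
        output.append(rng.choice(options))
    return " ".join(output)

# Hypothetical training text, chosen only for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
patterns = learn_patterns(corpus)
sample = generate(patterns, "the", 8)
print(sample)
```

Every pair of neighboring words in the output appeared somewhere in the training text, but the sentence as a whole may never have been written before. Deep learning systems learn far richer patterns than word pairs, but the "nothing comes from nowhere" point is the same.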
You may have recently heard the term deepfake - a term combining “deep learning” and “fake media”. This refers to videos created by AI that superimpose the face and voice of one person onto another. This makes it seem like the person is saying or doing things they never actually said or did. Because this can be disconcerting, the technique has gotten a lot of attention in the news, so you may have even seen one - like “Tom Cruise” on TikTok stumbling around, or “Simon Cowell” singing an audition on America’s Got Talent.
AI can also create still images from text prompts, and the results have recently become incredibly impressive, even photo-realistic. Midjourney is a publicly accessible example you can try yourself, and Google has shared fascinating results from their unreleased model, Imagen. These text-to-image creation systems work by finding patterns in large, labeled datasets of images and text. They then generate new pixels, compare them to the patterns learned from the dataset, and modify them again and again to get closer and closer to those image patterns.
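The generate, compare, modify loop described above can be sketched in a few lines of Python. This is a deliberately tiny toy under loud assumptions: the "dataset" is four invented 4-pixel grayscale images, the "learned pattern" is just an average, and all names are hypothetical. Real text-to-image systems use large neural networks trained on enormous datasets, so treat this only as an illustration of the loop, not of any actual product.

```python
import random

# Hypothetical labeled dataset: each text label maps to tiny 4-pixel "images".
dataset = {
    "sunrise": [[0.9, 0.7, 0.3, 0.1], [1.0, 0.8, 0.4, 0.2]],
    "night": [[0.1, 0.1, 0.2, 0.0], [0.0, 0.2, 0.1, 0.1]],
}

def learned_pattern(label):
    """Average the example images for a label (a stand-in for a trained model)."""
    images = dataset[label]
    return [sum(pixels) / len(images) for pixels in zip(*images)]

def distance(a, b):
    """Mean absolute difference between two images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def generate_image(label, steps=20, step_size=0.3, seed=0):
    rng = random.Random(seed)
    target = learned_pattern(label)
    image = [rng.random() for _ in target]  # generate: start from random pixels
    history = [distance(image, target)]     # compare: how far off are we?
    for _ in range(steps):
        # modify: nudge each pixel part of the way toward the learned pattern
        image = [p + step_size * (t - p) for p, t in zip(image, target)]
        history.append(distance(image, target))
    return image, history

image, history = generate_image("sunrise")
print("distance before:", history[0])
print("distance after: ", history[-1])
```

Each pass through the loop shrinks the gap between the random starting pixels and the pattern associated with the prompt, which is the "closer and closer" behavior in miniature.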
A similar process can be done with audio. My first encounter with AI-generated audio was in GarageBand in the mid-2010s - it included an AI drummer that uses rhythms and structure from your other tracks to create a drumming pattern that goes with you, as a good human drummer would. Now there are some even more impressive tools: AIVA lets you choose parameters for the kind of song you want – like the instrumentation, style, and length – and generates a track for you. OpenAI’s Jukebox does something similar, but can even generate singing.
Some of this might have you feeling worried. What does this mean for composers, artists, and actors in the future? What about the artists whose work is used in the training datasets - should they receive royalties? Will deepfakes help spread misinformation and thus degrade trust in the news media? And will training sets built from real-world data exacerbate systemic biases?
The short answer is yes, it’s reasonable to be worried. With any innovation there are unanswered questions, and there are no easy answers. It will take a lot of effort and long discussions among technologists, policy makers, and citizens to settle on the best practices and ethics for how we adopt this technology into society.