Ruminations on AI, and Dr. Fei-Fei Li’s intimate and brilliant memoir chronicling the story behind the emergence of Generative AI.

A few months ago, during one of our daily conversations, my mom asked me out of the blue: “What is AL?” It took me a few seconds to grasp her question. I quickly realized that she was referring to AI. I responded: “Amma, AI or Artificial Intelligence is something like human intelligence present in machines.” I was not proud of this answer. It was neither technically accurate nor meaningful, but it satisfied my mom. She continued: “Oh, there is a lot of talk about AI in the news, how people are using it to cheat and do a lot of negative things.” I said, “Hmm, that’s true, Amma, the value of AI isn’t properly understood,” and changed the topic. After the call, I realized how pervasive the term AI has become in the last couple of years. My mother, who was born before the term was even coined, was now talking about it. It would never have occurred to her to ask me this question in all these years, because the subject of AI had never come to her attention the way it does now. Something has shifted in the general awareness of AI.

The term “Artificial Intelligence” was coined in a proposal by John McCarthy (Dartmouth College), Marvin Minsky (Harvard University), Nathaniel Rochester (IBM), and Claude Shannon (Bell Telephone Laboratories) for what, in hindsight, looks like an audacious undertaking even for this elite group of academics. All that they proposed was a “2-month, 10-man study of artificial intelligence” to work out the details of AI. The workshop, which took place a year later at Dartmouth between July and August 1956, is generally considered the official birthdate of the new field. It is not clear whether the term artificial intelligence was a deliberate choice or just a casual title for a study the group thought they could wrap up in a couple of months. Would a different name have made a difference to how this study came to be perceived in the world outside? Say, “alternate intelligence” or “computational intelligence”? We will never know. But the name stuck, and ever since, any talk of artificial intelligence has caused uneasiness whenever it is brought into public discourse. However, the question of whether machines can give rise to intelligence goes back even further than the Dartmouth conference. In 1945, Vannevar Bush had already raised it. In 1950, Alan Turing, the great mathematician, after his wartime years decoding German ciphers during the Second World War, turned his attention to the question of whether machines can think. In his brilliant and provocative paper “Computing Machinery and Intelligence,” which opens with the question “Can machines think?”, he reasoned that the act of thinking is a mechanical (computational) process, and that the hypothesis of a subjective self is not necessary for thinking. René Descartes’s dictum “I think, therefore I am” wasn’t, after all, true.

The Turing test, as he described it in the paper, posits that as long as the responses generated by a machine can convince a human interlocutor (a third party) that they are human-like, the conditions of thinking are met. In other words, if I cannot distinguish a response generated by a machine from that of a human, then I cannot reasonably claim that thinking is anything more than mechanical computation. We recognize the effects of thinking through outward articulation. Therefore, if a machine’s behavior can imitate a human’s, we have to consider it intelligent. Turing’s genius was in using this straightforward premise, an imitation game of the kind commonly played among children, to rattle centuries of accepted wisdom. Even today, the Turing test remains a popular yardstick for AI. Chatbots driven by generative AI are eerily close to human speech and thinking. Turing’s prediction, or prophecy as some may call it, looks more prescient than ever. His essay ends with the sentence, “We can only see a short distance ahead, but we can see plenty there that needs to be done.” It threw the challenge open, and a generation of brilliant minds took it up. The explosion in AI that we see all around us today is not a technological tsunami, as many seem to think. The better analogy is that of a volcanic mountain, with heat and momentum building up gradually for years. For decades now, there has been steady progress in how we think about AI, and in the last decade or so, we have hit the right note. To push the analogy further, the AI mountain is currently in the midst of intense volcanic activity. And as with all active volcanoes, the noise and heat around AI will eventually subside and settle into a more meaningful evaluation of how we shape the future.

Since that beautiful fall day in November 2022, when OpenAI released ChatGPT, an AI fever has gripped the world. As one tragic pandemic was ending, another began in a different context. This AI fever, unlike the coronavirus, cannot be fought, subdued, or made to go away. It is here to stay. Within days of ChatGPT becoming public, YouTube proliferated with videos; the news media was full of references to generative AI and transformers; and reputed magazines ran full-length articles that spoke of a tipping point in the use of artificial intelligence and heralded a new age. In an unprecedented wave of adoption, this innocuous chatbot that responded to human queries in flawless natural language became a mysterious tool, at once exciting and foreboding. ChatGPT seemed to exude intelligence. Unlike Google searches, which mechanically spewed out web links for us to click on, ChatGPT generated responses in clear language. “It” understood our questions, and it responded intelligently. Something seismic had happened in the world of technology. For decades, AI research had stayed within computer labs: studied, debated, and researched by groups of highly talented scientists. Their findings fueled self-driving cars, robotics, personal recommendations on websites, and many other innovations – all of which, no doubt, are the fruits of AI. Still, with the emergence of generative AI in the public domain, the matter has become more existential. With GenAI, AI as a field of study has emerged from its chrysalis, ready to fundamentally disrupt the world. How did this happen? What caused this explosion of breakthroughs in AI? Between 2006 and 2020, a confluence of factors expedited the progress of AI. Some of the contributions to this acceleration were serendipitous, others fortuitous, but the net result was that a lot of different strands of study and knowledge came together. The right people found the right connections at the right time. That is the secret to any breakthrough.

In 2012, two young, brilliant research scientists, Ilya Sutskever and Alex Krizhevsky, decided to use a deep convolutional neural network to classify the 1.3 million high-resolution images in the ImageNet training set – the most extensive compendium of images at that time, painstakingly put together over the years by Dr. Fei-Fei Li, a cognitive scientist at Stanford, and her team of graduate students. The irony here is that the convolutional neural network (CNN) wasn’t new at all. Back in the late 1980s, Dr. Geoffrey Hinton, considered the father of modern AI, had helped develop the foundational ideas behind such networks and tried hard to convince the academic community that neural networks were the way to go if the field was to make real inroads into AI. There were few takers for the idea, and more importantly, Dr. Hinton couldn’t convincingly demonstrate that CNNs could work, because there weren’t datasets large enough to train the models on. The concept of big data was a couple of decades into the future, and the computational capacity required to train these neural algorithms wouldn’t be available until Amazon launched its cloud services around 2006. The internet itself would only become ubiquitous in the late nineties. So Dr. Hinton, though fiercely committed to his vision, had to allow his ideas to go into hibernation. There was not much he could do. Other inventions, discoveries, and innovations had to come into being before his ideas could fructify. Such impasses were not new to AI. There had been many breaks in its progress, which scientists euphemistically call “AI winters,” and the period between 1990 and 2010 was one such time. The future of AI was put on hold.

In the first decade of the twenty-first century, three essential elements fell into place: the availability of computational resources, vast amounts of data generated by and flowing through the internet, and the growing realization that if AI was to break out of its impasse, machines had to be made to learn from data, taught the way humans are taught. The model was the human way of learning, primarily through the visual apparatus – the eyes. How does the human brain learn to recognize things when all it receives through the eyes are light signals that, by themselves, mean nothing? What is that mysterious process that runs from the moment a ray of light hits the eyes to the internal processing that produces the vivid and intense imagery we effortlessly experience? Cognitive science, the study of perception in the human brain, has mapped out what happens. At each stage of processing, the brain aggregates the light signals into higher levels of abstraction. The shapes, sizes, and colors triggered by the signals are resolved into categories until the process folds up into a visual field we recognize and experience. The training of visual cognition begins in our earliest days as children. At each moment, we are taught, either explicitly or by inference, to pigeonhole our visual sensations into categories or classes. Either way, our visual repertoire is constantly enriched, not through one-time learning but by continuous training. We learn to see and recognize better as we continuously ingest and classify information. By the time we become adults, our brains are intricately wired to give us rich internal visual lives that we share with others through categories we collectively recognize, and also, miraculously, to conjure up a unique inner life. We will never know whether machines with AI can ever have an inner subjective life as we do, but to pass the Turing test, machines are only expected to respond and behave as humans do. They don’t need an inner life.
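For readers who like to see ideas in code, here is a minimal, illustrative sketch in Python (using PyTorch, and entirely my own toy example rather than anything from the research described here) of the same layered idea: a small stack of convolutional layers, each aggregating the output of the one before it into a slightly higher level of abstraction, loosely echoing the staged visual processing sketched above.

```python
# A toy convolutional stack: each stage aggregates the previous stage's output
# into a more abstract summary, loosely mirroring staged visual processing.
import torch
import torch.nn as nn

layers = nn.Sequential(
    # Stage 1: raw pixels -> simple local patterns (edges, blobs)
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                  # halve the resolution, keep the salient responses
    # Stage 2: simple patterns -> composite shapes and textures
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Stage 3: shapes -> object-like parts, compressed into a compact summary
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),          # collapse the remaining spatial detail
)

image = torch.randn(1, 3, 224, 224)   # a stand-in for one RGB photograph
features = layers(image)
print(features.shape)                 # torch.Size([1, 64, 1, 1]): a compact abstraction
```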

A few days ago, I finished reading Dr. Fei-Fei Li’s beautifully written memoir “The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI.” This is a book for anyone interested in knowing how the field of AI took off in the last fifteen years, where it is today, and how it is shaping itself for the future. Dr. Li’s book is part personal memoir and, in part, the story of AI itself. Born in China, Fei-Fei immigrated to the US when she was fifteen; her life journey is emblematic of what is possible in America. She was educated at a public school in New Jersey, lived in a one-bedroom apartment with her parents, ran a dry-cleaning store with her mom, struggled with English for most of her school days, and found a mentor and friend in her math teacher, Mr. Robert Sabella, who became her second family. She won a full scholarship to Princeton, used the opportunity to get into fields close to her heart – engineering and cognitive science – and moved to Caltech for her doctorate, and later to Stanford, in search of the right idea to work on. Then, finally, in 2006, she channeled her energy into a project to classify images from the internet into a comprehensive dataset for use in machine learning – unglamorous work, much criticized at the time. It is an extraordinary academic and intellectual journey through some of the best institutions in the world, and the flair with which Fei-Fei writes about it makes it even more exhilarating.

Fei-Fei understood the value of data when no one else did. She had studied neural networks and realized that for any model to learn along the lines of human cognition, it needed data. That is precisely what Dr. Li and her team set out to provide. When the team revealed the first version of ImageNet in 2009, the dataset had 15 million images organized under 22,000 distinct categories, selected from over a billion candidate images and annotated by 48,000 contributors from over 165 countries. Fei-Fei used Amazon’s crowd-sourcing platform, Amazon Mechanical Turk, to distribute this work. Her vision for ImageNet was to create nothing short of an ontology of the world for machine learning, especially for computer vision – Fei-Fei’s area of expertise. For computer vision to recognize an image of someone “rowing,” it has to bring together its recognition and understanding of a boat, water, oars, and a person, and then rightly classify the whole image as that of a person rowing. Each piece of this visual landscape, though effortlessly processed by the human eye and brain, is orders of magnitude more computational work for a machine, even given the right dataset. The machine narrows an image down to what it is by trial and error, gradually adjusting its internal parameters (called weights in data science) until its guesses match the labels. The key principles in creating ImageNet were, first, to build the largest dataset of images available in the field and, second, to cover as many categories of images as possible. Anything that is not named is not recognized. This is as true for the human eye as it is for computer vision, at least at the basic level of comprehension. Fei-Fei writes eloquently in the book about how she and her team arrived at the categories. It was her interdisciplinary approach to the problem that triggered the insights. Psychology, cognition, linguistics, management skills, common sense, and, above all, passion and a sense that the team was doing something meaningful to further the frontiers of human understanding all played a role in the creation of ImageNet.
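To make the point that “anything that is not named is not recognized” concrete, here is a toy sketch in Python (the category names and file names are invented; this is not ImageNet’s actual format): a labelled image dataset is, at bottom, a mapping from category names to example images, and that list of names fixes the limits of what a model trained on it can ever call anything.

```python
# An invented, toy stand-in for an ImageNet-style dataset: images filed under
# named categories. The label vocabulary is the machine's visual vocabulary.
from collections import defaultdict

dataset = defaultdict(list)
dataset["rowboat"] += ["img_0001.jpg", "img_0002.jpg"]
dataset["oar"] += ["img_0003.jpg"]
dataset["lakeside"] += ["img_0004.jpg", "img_0005.jpg"]

# A classifier trained on this data can only ever answer with one of these names;
# anything without a category here simply cannot be recognized.
categories = sorted(dataset)
label_index = {name: i for i, name in enumerate(categories)}
print(label_index)   # {'lakeside': 0, 'oar': 1, 'rowboat': 2}
```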

In 2010, Dr. Li instituted a competition built on this dataset, the ImageNet Large Scale Visual Recognition Challenge, to rope in developers who could use it to demonstrate how efficiently their algorithms could process and classify images. Ilya and Alex, from Dr. Hinton’s team, participated in the 2012 edition of the competition. Dr. Hinton’s team, and especially Ilya, believed that if there was ever a chance for neural networks, which had been out of favor for nearly two decades, to shine, it was when they could be trained on the vast quantities of data that ImageNet possessed. It was a courageous insight from the young researcher, and he was proved right. AlexNet (the name of their model) was significantly better at image recognition than any other model up to that time. Convolutional neural networks learned better and performed better when they were exposed to enormous quantities of curated data. Dr. Li’s intuition and passionate belief that the availability of large amounts of data would spur advances in computer vision also stood vindicated in the process.
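For the curious, here is a schematic sketch in Python of what “learning from labelled data” means mechanically. It is a tiny toy classifier trained on random stand-in data, not AlexNet or anything resembling its architecture: the point is only that the model’s weights are nudged, batch after labelled batch, in whichever direction reduces its error, which is why larger and better-curated datasets make such a difference.

```python
# A toy training loop: weights are adjusted step by step to reduce the error
# the model makes on labelled examples. The data here is random and stands in
# for real images and human annotations.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # 10 pretend categories
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                      # each step processes one labelled batch
    images = torch.randn(64, 3, 32, 32)      # stand-in pictures
    labels = torch.randint(0, 10, (64,))     # stand-in human annotations
    loss = loss_fn(model(images), labels)    # how wrong is the model right now?
    optimizer.zero_grad()
    loss.backward()                          # compute how each weight contributed to the error
    optimizer.step()                         # nudge the weights to reduce that error
```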

Without Dr. Li’s commitment, hard work, and belief in assembling ImageNet, neural networks might not have made a comeback in 2012. This is not to say that they never would have; it was only a matter of time before someone connected large quantities of data with neural networks. But immense credit goes to Dr. Li for seeing the value in making that connection happen and for devoting her time and energy to an unglamorous task. She realized, through her grounding in cognitive science, that machines need big data and large amounts of computing power to learn by themselves. Both of these ingredients were available by 2007. For neural networks to perform well, they had to be deeply layered, just like the human brain. The deeper the layering, the more stages the data passes through, each filtering out unnecessary noise, and therefore the more accurate the classification. Then again, there is a delicate, optimal balance between the number of layers a network can contain and its response time, which is equally important. These are considerations that will continue to be refined, but in 2012, with the coming together of data, neural networks, powerful processors, and computational resources, the foundations were laid. The era of deep learning had begun, and it, more than anything, has propelled all the significant advancements in AI since then. More than half a century after the Dartmouth conference, and after many stumbles and periods of desperation, AI now had a clear direction. There is no doubt that neural networks and big data hold the key to the future of this field. ChatGPT is based on a very sophisticated form of neural network called the transformer. More on this in another essay.
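As a rough illustration of the balance between depth and response time mentioned above, the sketch below (layer sizes, depths, and timings are arbitrary and purely illustrative) times a forward pass through networks of increasing depth: more layers mean more stages of abstraction, but also more time to produce an answer.

```python
# Timing toy networks of increasing depth: each added layer buys more capacity
# for abstraction but also adds to the time taken to respond.
import time
import torch
import torch.nn as nn

def make_network(depth: int) -> nn.Sequential:
    blocks = []
    for _ in range(depth):
        blocks += [nn.Linear(512, 512), nn.ReLU()]
    blocks.append(nn.Linear(512, 10))
    return nn.Sequential(*blocks)

x = torch.randn(32, 512)                      # a batch of stand-in inputs
for depth in (2, 8, 32):
    net = make_network(depth)
    start = time.perf_counter()
    with torch.no_grad():
        net(x)
    print(f"{depth:>2} layers -> {1000 * (time.perf_counter() - start):.2f} ms per batch")
```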

I strongly recommend Dr. Fei-Fei Li’s memoir to anyone interested in understanding the state of AI today and how it came to be. Dr. Li’s memoir does what many technical books fail to do: it thoughtfully and stylishly unravels a complex subject for an educated audience without dumbing it down. She juxtaposes her own life, the struggles, the losses, the highs, and the lows, with her professional goal, her North Star – as she refers to it at least a dozen times in the book – of finding a way to solve the problem of computer vision. AI, to Dr. Li, is not a threat to humanity. She advocates “human-centered AI” and not “human-centric AI.” Calling AI human-centric implies that advancements in AI have to be contained, harnessed, and put on a leash so that they don’t spiral out of human control. Being human-centered sounds more benign: it implies that AI should walk hand in hand with us, serving pressing human needs. There is no conflict unless we deliberately want to posit one. After I read her memoir, I watched several of Dr. Li’s conversations and lectures on YouTube. I realized that she is as good a speaker as she is a writer. She is clear in her vision and not afraid to speak her convictions. She comes across as humble and never fails, either in the book or in her talks, to acknowledge her roots as an immigrant to America and how the country was kind to her in ways she cannot express. At heart, Dr. Li is a teacher, and she shines in the company of students.

For a girl who set foot in New Jersey at fifteen, unable to speak a word of English, with nothing more than an inquisitive mind and her parents’ dream, Fei-Fei’s journey and success are a testament to what is possible in the great educational institutions of America. She still firmly believes that academia should drive breakthroughs and innovations, as it has always done in the past. Her two-year stint at Google, during a sabbatical from Stanford, convinced her more than ever that when the role of academia is taken over by tech companies, an uneasy relationship develops between the quality of research conducted in the labs and the ethics of its use. When she returned to Stanford, she co-founded the Stanford Institute for Human-Centered Artificial Intelligence, an institute dedicated to humane research on the uses of AI in fields that matter. Throughout the book, Fei-Fei constantly emphasizes the role of serendipity and the generosity of the people around her, and underplays her own role and achievements. The result is a remarkable memoir that is educative and emotionally satisfying in equal measure.

Please read this book, especially if AI touches your work. It will help put things in perspective.
