Looking at recent advancements in the field of machine learning, it would be hard to ignore neural networks. For the past two decades, neural networks have dominated the field so thoroughly that individual network architectures (patterns of neuron structures) have gone in and out of vogue within that span. Across almost all sub-fields of machine learning, neural networks are now integrated into the core of the endeavor. This is especially interesting from a historical perspective, because it wasn’t always the case.
For the majority of machine learning’s history, neural networks were only one of many frequently used techniques, and not always even a top contender. Other data-oriented, shallow techniques existed, such as Support Vector Machines and the Scale-Invariant Feature Transform, and are still commonly used today, though less so than before the proliferation of neural networks1. Even less popular now are rule-based methods, which for a period of time contended for the future of artificial intelligence. By interrogating how neural networks overtook these competitors, we can gain insight into the conditions necessary for new AI paradigms to arise.
Two factors ushered in the neural network takeover: (1) a shift in computing power and architecture in the 1990s and 2000s, and (2) a long-unfilled niche in computer programming that neural networks could occupy. This should be obvious in retrospect: for any technology to be widely adopted, the technological capacity for its creation must exist (the opportunity for development), and it must perform some task as well as or better than the alternatives (the motive for adoption). A notable feature of neural networks’ history is that both the opportunity and the motive cut across a wide band of programming sectors. Because the advancements in computing power were not new technologies but rather decreases in existing technologies’ prices2, they were available to hobbyists and low-budget developers not long after they became available to large businesses, governments, and research institutions. Their niche, too, was one shared by disparate sub-fields of computing, which made neural networks worth implementing for vastly different endeavors.
The first half of neural networks’ opportunity for adoption dawned in the 1990s, as the continual advancement of computing power reached a point where training a multilayer network was feasible. As the years went on, it took less and less time to train iterative learning models on a traditional CPU, allowing unprecedentedly rapid development of those machine learning techniques. In the mid-2000s, graphics processing units (GPUs) became inexpensive as supply rose to meet the demand from computer graphics (primarily 3D modeling and video games). Programmable GPUs are physically structured differently from traditional CPUs and allow for highly efficient parallel processing, so their proliferation caused a discrete leap in the efficiency of all parallelizable programs, neural networks included. This combination of factors set the stage for neural networks’ consideration as a viable method of everyday machine learning.
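To make the GPU connection concrete: most of the computation in a neural network’s forward and backward passes reduces to large matrix multiplications, which consist of many independent multiply-adds and therefore parallelize naturally. The sketch below is a minimal illustration, run here with NumPy on the CPU; the layer sizes and variable names are arbitrary, and NumPy merely stands in for whatever linear-algebra backend (CPU or GPU) actually executes the computation.

```python
import numpy as np

# A minimal sketch of one dense layer's forward pass. The heavy lifting is a
# single matrix multiplication, made of many independent multiply-adds --
# exactly the kind of work a GPU can spread across thousands of cores at once.
# (Layer sizes here are arbitrary, chosen purely for illustration.)

rng = np.random.default_rng(0)

batch = rng.standard_normal((256, 784))    # 256 input examples, 784 features each
weights = rng.standard_normal((784, 128))  # weights for a 128-unit hidden layer
bias = np.zeros(128)

# Every one of the 256 x 128 outputs can be computed independently, which is
# why moving this workload from a CPU to a GPU yields such a large speedup.
hidden = np.maximum(0, batch @ weights + bias)  # ReLU activation

print(hidden.shape)  # (256, 128)
```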
Of course, a technology’s computational viability can’t be the sole determiner of its popularity. There’s a historic paradox in computer programming relevant to the niche neural networks fill: tasks which humans need to be taught are generally easy to encode into computers, while tasks which come innately to humans have historically been difficult to encode. Humans naturally acquire sight, but it took until the 21st century for computer programs to parse complex images at a usable level; humans need to be taught arithmetic over many years, yet it was one of the earliest skills encoded into computers. Somehow the things we find easiest to do are the hardest to convey to computers. This paradox is a consequence of the fact that we are not consciously aware of the steps involved in the tasks our brains perform most frequently and efficiently. We don’t intuitively know how our eyes convert photons into visual objects. We don’t intuitively know how we process language. We are aware only of these processes’ inputs and outputs, not their internal workings. Attempts to encode these tasks traditionally evade simple, elegant solutions3. Even through intense, prolonged introspection, we haven’t been able to fully decode these black-box mechanisms in our own minds. One explanation is that these tasks are not computed in our brains by rigorous algorithms, but rather by messy associative structures of neurons4,5. This “messy structures hypothesis” is supported by the recent solutions to these seemingly unencodable problems. Deep neural networks, by training broad structures of artificial neurons on large input datasets, have revolutionized the fields that involve mimicking unconscious thought, like computer vision, natural language processing, and data classification. By mimicking unconscious thought’s characteristics, neural networks imitate its function better than any prior technology. With the availability of neural networks, programmers could finally automate tasks that once required a human’s unconscious mind to complete.
With an understanding of how neural networks came to dominate machine learning, we can also understand why their competitors fell by the wayside. Shallow learning models are still used to this day, as they are generally faster to implement and more reliable when little training time is available. Rather, as stated above, the most striking difference in the artificial intelligence field between the 1990s and now, other than the rise of neural networks, is the fall of rule-based methods. Rule-based methods learn by constructing patterns of rules composed of atomic statements (atoms). They were a popular area of research in the late 20th century and encompassed expert systems: pre-programmed rule sets used to encode and distribute automated expertise. Rule-based methods showed promise, but they didn’t seize the zeitgeist when computing power developed. Whereas neural networks filled a niche the field desperately needed filled, rule-based methods automated logical analysis and knowledge encoding, two tasks that programmers are already trained to perform efficiently. So while the technological opportunity for their advancement was increasing, the motive was decreasing relative to the alternatives. As a result, rule-based methods were pushed to the field’s back burner, to the point that many programmers today have never encountered them.
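For readers who have never worked with them, the sketch below shows roughly what a rule-based system looks like in practice: a hand-written knowledge base of atomic facts and if-then rules, applied by a simple forward-chaining loop. The facts, conclusions, and toy domain here are invented purely for illustration and are not drawn from any particular expert system.

```python
# A toy forward-chaining rule engine in the spirit of classic expert systems.
# Facts are atoms (strings); a rule fires when all of its premises are known
# facts. All domain knowledge below is invented purely for illustration.

facts = {"has_fever", "has_cough"}

rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu"}, "recommend_rest"),
]

# Repeatedly apply every rule whose premises are satisfied until nothing new
# can be derived -- the "expertise" lives entirely in the hand-written rules.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))
# ['has_cough', 'has_fever', 'possible_flu', 'recommend_rest']
```

The contrast with the earlier sketch is the point: here a programmer must write down every rule by hand, whereas a neural network learns its internal structure from data.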
This analysis of opportunity and motive not only explains the past, but can help predict the future of artificial intelligence. We can already see a niche forming in the gaps neural networks leave unfilled. As the intractable problems of transparency and modularity become more apparent, we will likely see machine learning paradigms better suited to those demands gain funding and interest.
1. https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063↩
2. https://www.techspot.com/article/2008-gpu-efficiency-historical-analysis/ , https://www.hamiltonproject.org/charts/one_dollars_worth_of_computer_power_1980_2010↩
3. See the fields of computer vision and natural language processing for easy examples of how hard these tasks are to encode by hand.↩
4. While conscious thought is also constructed of messy structures of neurons, it is clear that conscious thought can be organized to follow rigorous algorithms, and thus can be imitated by conventional encoding.↩
5. This is evidenced by the way that these innate traits can be hacked by altering their training data. They are not beautifully pure algorithms handed down from evolution, but rather an innate ability to acquire these skills.↩