Why is AI on a Fast Track Now?

AI has gone through ups and downs in its history. In the past few years it has gained renewed momentum due to the following developments.

Computing Power

Ever since IT took off, computing speed and capacity have been increasing, and the cost of computing has been decreasing, exponentially. At the core of computing power is the CPU (Central Processing Unit), which is responsible for executing the instructions in our application programs. Its capacity is determined by the number of transistors on a single integrated circuit. This count has been following Moore's Law, which states that the number of transistors on an IC doubles approximately every two years.
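The doubling described by Moore's Law can be written as a simple formula. This is a minimal sketch (the function name and starting count are illustrative, not from the source):

```python
def transistors(initial_count, years, doubling_period=2.0):
    """Rough Moore's Law projection: the count doubles every
    `doubling_period` years."""
    return initial_count * 2 ** (years / doubling_period)

# Starting from 1 million transistors, 20 years of doubling every
# two years gives 2**10 = 1024x growth.
print(transistors(1e6, 20))  # 1.024e9
```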

To further improve application performance, multiple processor cores are placed on a single IC, as in dual-core and quad-core designs. For even larger workloads, multiple computers are connected in a distributed environment: more computers mean more computing power and better performance.

Another type of processor, the GPU (Graphics Processing Unit), was developed for faster processing of graphics, particularly in gaming applications. GPUs typically deliver around a 10x performance improvement over comparable CPUs on suitable workloads, and they are complementary to CPUs, since they cannot handle all types of operations. These GPUs are now being repurposed for machine learning workloads. However, they are only well suited to parallel processing, and not all instructions in business applications can be parallelized.

Since GPUs were originally designed for gaming applications, repurposing them for ML applications is sub-optimal. Google has introduced another processor, the TPU (Tensor Processing Unit), which is designed specifically for machine learning applications and is expected to further improve their performance.

Multi-core CPUs, repurposed GPUs and the arrival of TPUs have provided massive computing power, which in turn has enabled large deep learning applications such as the ImageNet contest, speech recognition, machine translation and NLP. Many of these algorithms existed earlier too, but without sufficient computing power such applications were not feasible in the past.

Even now, capsule networks, GANs and memory networks cannot deliver the desired performance with currently available computing capacity. This limitation will hopefully be overcome in due course.

Big Data

With the increased use of the Internet, cloud computing and social media, data in enterprises has grown exponentially. Historically only structured data was used in enterprises, but of late varieties of data such as images, video, audio and text are making their way into business applications. More and more real-time applications are increasing the velocity of data movement within enterprises. Enterprises and consumers have started appreciating the value of data more than ever.

All the characteristics of Big Data can be summarized in five Vs – Volume, Variety, Velocity, Veracity and Value. Veracity refers to the quality of the data.

More often than not, a lack of data or poor-quality data prevents data analysis and decision making in organizations. The arrival of Big Data and the associated infrastructure around the Hadoop ecosystem has enabled much faster adoption of AI applications.


Algorithmic Improvements

Though many of the core machine learning and deep learning algorithms have been around for some time, several incremental improvements have made large-scale applications viable in the recent past.

Neural network parameters need to be initialized before training starts, and these initial values have a strong influence on finding the optimal values that minimize the error between prediction and true value. If the initial values are poorly chosen, training may take a long time to converge, or it may end up in a local minimum that is not the optimal solution. In the recent past many different initialization methods have been formulated, including unsupervised representation learning, to improve accuracy and reduce training time.
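Two widely used initialization schemes scale the random weights by the layer's size so that activation variance stays stable across layers. A minimal NumPy sketch (function names and sizes are illustrative):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=None):
    """Glorot/Xavier uniform initialization: variance scaled by
    fan_in + fan_out; works well with tanh/sigmoid layers."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=None):
    """He normal initialization: variance 2/fan_in; suited to ReLU layers."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(256, 128, rng=np.random.default_rng(0))
print(W.std())  # roughly sqrt(2/256) ≈ 0.088
```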

Neural networks tend to overfit the training data more often than traditional machine learning algorithms. Weight-decay regularization methods have long been used to reduce this effect, but the more recent dropout technique has proved much more effective at mitigating the problem.
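Dropout randomly zeroes a fraction of a layer's activations during training, forcing the network not to rely on any single unit. A sketch of the common "inverted dropout" variant (the function signature is illustrative):

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability drop_prob during
    training and scale the survivors by 1/(1 - drop_prob), so the
    expected activation is unchanged. At test time, do nothing."""
    if not training or drop_prob == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= drop_prob
    return activations * mask / (1.0 - drop_prob)
```

Because of the rescaling, no extra adjustment is needed at inference time.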

For the activation function, neural networks historically used the sigmoid or tanh functions. Recent research has shown that ReLU (Rectified Linear Unit) improves the performance of many applications. However, in recurrent neural networks tanh is still the preferred activation function.
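The difference is easy to see in code: sigmoid saturates for large inputs (its gradient vanishes), while ReLU keeps a constant gradient for all positive inputs. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu(x))  # [0. 0. 0. 1. 5.]

# Sigmoid's gradient vanishes at the tails, which slows learning
# in deep networks; this is one motivation for ReLU.
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))
```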

Historically there have been many optimization algorithms for minimizing the error between prediction and actual value. The RMSProp and Adam algorithms, published in the recent past, have shown much faster convergence as well as better accuracy.
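Adam combines a running average of the gradient (momentum) with a running average of its square (per-parameter step scaling), plus a bias correction for the early steps. A sketch of a single update, demonstrated on the toy problem of minimizing f(x) = x² (the scalar setup and hyperparameters here are illustrative):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (t is the 1-based step count)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment average
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment average
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
print(x)  # converges toward 0
```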

Batch normalization is another recent technique that has helped improve training performance and accuracy. It is similar to normalizing the input training data, except that each layer's activations are normalized, and the normalization parameters are themselves learned during training.
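The training-time forward pass can be sketched in a few lines: normalize each feature over the batch, then apply a learned scale (gamma) and shift (beta). This is a simplified sketch that omits the running statistics used at inference time:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-norm forward pass (training mode) for a (batch, features)
    array: normalize each feature over the batch, then scale and shift
    by the learned parameters gamma and beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(64, 10))      # batch of 64, 10 features
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(), y.std())  # approximately 0 and 1
```

In a real network, gamma and beta are updated by the optimizer along with the weights.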

The Compounding Power of Internet Users

Internet penetration is growing at an astonishing pace. As of 31st Dec 2017, 4.2 billion people were connected to the Internet, about 54.4% of the world's population. As more and more people get connected, collaboration increases and products and services are innovated faster. This is also driving the rise of AI applications: the number of papers submitted to various journals on AI topics has been growing exponentially.

Democratization of AI

More and more open data is being made available by Google, Facebook and Amazon, and leading academic institutions such as Stanford, the University of California and MIT are supporting the increased interest in AI applications.

There is a good ecosystem of open-source tools and technologies available for developing AI applications; Amazon, Facebook and Microsoft are offering most of their machine learning tools and technologies as open source.

AI legends like Geoff Hinton and Andrew Ng offer online courses on ML and deep learning at a nominal cost, and Google and Microsoft offer such ML/DL courses at no cost. Platforms like Kaggle, which conduct competitions on complex ML/DL problems, enable interested participants from across the globe.

Almost all the leading educational institutes across the globe offer both online and on-campus paid courses on ML and DL. All these initiatives are enabling many people to get trained in these algorithms and tools, which in turn is driving faster adoption of AI across both industry and consumer segments.