This blog post is the first in a series on AI-ML in the cloud. It starts with a comprehensive introduction; a second post will dive into the details and specifics of generative AI in the cloud, and a concluding post will cover how to optimize AI-ML workloads in the cloud.
In today's digital age, the convergence of Artificial Intelligence (AI), Machine Learning (ML), and cloud computing is revolutionizing industries and reshaping the way businesses operate. This article aims to provide an in-depth understanding of these technologies, their interplay, and the immense potential they offer when combined.
1. Definitions
What is Artificial Intelligence (AI)? AI is not just about robots. It is a broad field of study that encompasses various sub-domains, from natural language processing to robotics. AI's primary goal is to create systems that can perform tasks that would typically require human intelligence, such as problem-solving, understanding language, and recognizing patterns.
What is Machine Learning (ML)? ML is often considered a subset of AI and is central to its success. Instead of being explicitly programmed to perform a task, ML systems use algorithms and statistical models to analyze and draw inferences from data. Over time, these systems "learn" and improve at their tasks without being explicitly programmed for every eventuality.
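To ground the idea, here is a minimal sketch in Python using scikit-learn (an illustrative library choice, with invented toy data): rather than hand-coding rules, we hand the algorithm labelled examples and let it infer the rules itself.

```python
# Minimal "learning from data" sketch with scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# Toy labelled data: [hours_studied, hours_slept] -> passed (1) or failed (0).
X = [[1, 4], [2, 8], [6, 7], [8, 5], [3, 3], [9, 8]]
y = [0, 0, 1, 1, 0, 1]

model = DecisionTreeClassifier()
model.fit(X, y)                    # the model infers the rules from the data

print(model.predict([[7, 6]]))     # predicts for an unseen example
```

No rule like "pass if hours_studied > 5" was ever written; the model derived its own decision boundary from the examples.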
What is Cloud Computing? Beyond just storing photos or documents, cloud computing means delivering computing services over the internet: databases, servers, networking, and more. The primary benefits include cost savings, global scale, and performance.
What is generative AI? Generative AI is a type of artificial intelligence that can create new content and ideas, including conversations, stories, images, videos, and music. Generative AI is powered by massive machine learning models called foundation models, or FMs. With FMs, customers can use the same pre-trained model to adapt to multiple tasks.
What are Foundation Models? Foundation models (FMs) are the very large machine learning models, pre-trained on vast collections of data, that power generative AI.
What types of foundation models are currently in the market?
There are currently three main foundation model types in the market.
Text-to-text: These natural language processing models can summarize text, extract information, respond to questions, and create content such as blogs or product descriptions. An example is sentence auto-completion.
Text-to-embeddings: These FMs compare user search-bar input with indexed data and connect the dots between the two, yielding more accurate and relevant results. An example is comparing a user's request with catalog data (see the sketch after this list).
Multimodal: These emerging foundation models can generate images based on a user's natural language text input.
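Here is a toy Python sketch of the text-to-embeddings idea. In production the vectors would come from a foundation model; the tiny hard-coded vectors below are invented purely for illustration.

```python
# Toy embeddings search: rank catalog entries by cosine similarity to a query.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings for catalog entries (real ones have hundreds of dimensions).
catalog = {
    "noise-cancelling headphones": np.array([0.9, 0.1, 0.2]),
    "running shoes":               np.array([0.1, 0.8, 0.3]),
    "bluetooth speaker":           np.array([0.7, 0.2, 0.4]),
}
query = np.array([0.85, 0.15, 0.25])   # embedding of the user's search text

# Rank catalog items by semantic closeness to the query.
for name in sorted(catalog, key=lambda n: cosine_similarity(query, catalog[n]), reverse=True):
    print(name, round(cosine_similarity(query, catalog[name]), 3))
```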
2. Benefits of Running AI-ML Workloads in the Cloud:
Innovation: The cloud offers a platform for experimentation and innovation, allowing businesses to test new ideas without significant upfront costs. This is even more true in an AI-ML context, where workloads require latest-generation GPUs to run with decent performance, and procuring such GPUs is easier through a hyperscaler.
Cost-Efficiency: Pay-as-you-go models mean businesses only pay for the computing power they use.
Scalability: Cloud platforms can easily handle vast amounts of data, essential for training robust ML models.
Accessibility: With cloud platforms, businesses can access their data and applications from anywhere, facilitating remote work and global collaborations.
Security: Cloud providers invest in the best security protocols, ensuring data protection and compliance.
3. AI-ML Workloads: Training & Inference
Training refers to the process of teaching a model using labelled data. It is computationally intensive, requires vast amounts of data and compute as the model learns, and can take a long time.
Inference, on the other hand, is the process of using a trained model to make predictions on new data. It is typically faster and far less resource-intensive, since the trained model is simply applied to new data.
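To make the contrast concrete, here is a minimal sketch in PyTorch (an illustrative framework choice, with a toy model and random data): training loops over the data many times and updates weights, while inference is a single, gradient-free forward pass.

```python
# Training vs. inference with a tiny PyTorch model.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                       # toy one-layer "model"
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# --- Training: many passes over labelled data, gradients, weight updates ---
X, y = torch.randn(64, 10), torch.randn(64, 1)
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                            # compute gradients (the costly part)
    optimizer.step()                           # update weights

# --- Inference: one forward pass, no gradients, far cheaper ---
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 10))
```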
4. Neural Networks Unveiled
Various neural network architectures have emerged, each with its unique design and application. From the image-centric Convolutional Neural Networks (CNNs) to the sequence-based Recurrent Neural Networks (RNNs), and the innovative Generative Adversarial Networks (GANs) to the state-of-the-art Large Language Models (LLMs), each has carved its niche in the AI domain. Let's delve deeper into these architectures, understanding their design principles and primary uses.
CNNs (Convolutional Neural Networks)
Architecture: Uses convolutional layers to process data in local receptive fields, making them adept at identifying patterns in images.
Primary Use: While CNNs are primarily for images, they have also been used in Natural Language Processing (NLP) tasks, especially for sentence classification, before the dominance of the Transformer architecture.
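For illustration, a minimal CNN in PyTorch (our own toy model, sized for 28x28 grayscale images such as MNIST): convolutional layers detect local patterns, pooling downsamples, and a linear head classifies.

```python
# A tiny CNN: convolution over local receptive fields, then classification.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # local pattern detectors
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample 14 -> 7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 1, 28, 28))  # one MNIST-sized image
```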
RNNs (Recurrent Neural Networks)
Architecture: Has loops to allow information persistence, meaning they have a kind of memory about previous steps in a sequence.
Primary Use: Before Transformers, RNNs (and their variants like Long Short-Term Memory - LSTMs and Gated Recurrent Unit - GRUs) were the go-to for many NLP tasks. However, they struggle with long-range dependencies due to the vanishing gradient problem.
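A short PyTorch sketch of the recurrence idea (the shapes are our own toy choices): the LSTM carries hidden and cell state from step to step across the sequence, which is the "memory" described above.

```python
# An LSTM keeps state across the steps of a sequence.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
sequence = torch.randn(1, 20, 8)          # batch of 1, 20 time steps, 8 features

outputs, (h_n, c_n) = lstm(sequence)      # h_n/c_n carry "memory" step to step
print(outputs.shape)                      # torch.Size([1, 20, 16])
```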
GANs (Generative Adversarial Networks)
Architecture: Consists of two networks, a generator and a discriminator, that are trained together. The generator tries to produce fake data, while the discriminator tries to distinguish between real and fake data.
Primary Use: GANs can be used for text, but they're more challenging to train for this purpose compared to images. Large Language Models - LLMs, especially the generative ones like GPT, can generate text without the adversarial component.
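A compact sketch of the adversarial setup in PyTorch (toy networks and data of our own; a real GAN alternates these two updates over many iterations):

```python
# Generator maps noise to fake samples; discriminator tells real from fake.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
discriminator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()

real = torch.randn(32, 2) + 3.0                 # toy "real" data cluster
fake = generator(torch.randn(32, 16))           # generator's attempt

# Discriminator objective: push real toward 1, fake toward 0.
d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
         loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))

# Generator objective: fool the discriminator into predicting 1 for fakes.
g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
```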
Large Language Models (LLMs), used for Generative AI
Architecture:
Transformers Architecture: The success of modern LLMs is primarily due to the Transformer architecture, which employs self-attention mechanisms to process input data, allowing it to recognize long-range dependencies in text.
Huge Parameter Count: Notable models, such as GPT-3, boast up to 175 billion parameters, ranking them among the largest neural networks ever built.
Primary Use: LLMs are machine learning models crafted to understand and generate human language, with their core function being to predict subsequent words in a sequence based on preceding words.
Pre-training and Fine-tuning: LLMs typically undergo pre-training on extensive text corpora to grasp language nuances and are later fine-tuned for specialized tasks such as question answering, summarization, or classification.
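At the heart of the Transformer is scaled dot-product self-attention. The sketch below is our own minimal Python/PyTorch illustration, with randomly initialized projection matrices: every token attends to every other token in a single step, which is what enables long-range dependencies.

```python
# Scaled dot-product self-attention (per "Attention Is All You Need").
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # queries, keys, values
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)             # each token attends to all tokens
    return weights @ v                              # long-range mixing in one step

d = 8
tokens = torch.randn(5, d)                          # 5 token embeddings
out = self_attention(tokens, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
```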
The Most Recent Foundation Models for LLMs
Google introduced Bard, Anthropic created Claude 2, and Meta unveiled Llama 2. Each of these models has unique capabilities. A comparison between GPT-4, Claude 2, and Llama 2 suggests that these models are among the top performers in the LLM landscape.
Meta unveiled a new version called Llama 2. In the original LLaMA family, LLaMA-13B already outperformed GPT-3 (175B) on most benchmarks, and LLaMA-65B was competitive with the best models, such as Chinchilla-70B and PaLM-540B.
Claude 2 is the newer version introduced by Anthropic. Claude is available in two modes: Claude, the full, high-performance model, and Claude Instant, a faster model that may trade off some performance. Claude's use cases include conversation automation, question answering, and workflows, built on Anthropic's research into the training of AI systems.
Stability AI released Stable Diffusion, a multimodal model used to generate high-quality images, logos, art, and designs.
5. Silicon Requirements
AI-ML workloads often have specific hardware requirements: some are fit for GPUs only, while others can run on CPUs.
GPU or CPU?
ML workloads, especially deep learning models, often benefit from GPU acceleration due to the parallel processing capabilities of GPUs. However, some lightweight models or inference tasks can run on CPUs. For instance, AWS offers GPU instances such as P4d for demanding training jobs, alongside CPU instance families suited to lighter ML tasks.
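In code, the choice often reduces to a device check. A common PyTorch pattern (illustrative, not specific to any cloud provider) is to use a GPU when one is available and fall back to the CPU otherwise:

```python
# Use a GPU when present; fall back to CPU otherwise.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 1).to(device)
batch = torch.randn(4, 10).to(device)      # data must live on the same device
prediction = model(batch)
```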
AI/ML Models that Typically Require GPUs
Training deep neural networks, especially those with a large number of layers or a large amount of data, is computationally intensive. GPUs, with their parallel processing capabilities, can handle thousands of tasks at once, making them ideal for this:
Convolutional Neural Networks (CNNs): Used primarily in image processing tasks like image recognition, segmentation, and more.
Recurrent Neural Networks (RNNs): Used for sequential data tasks like time series analysis, natural language processing, and more.
Generative Adversarial Networks (GANs): Used for generating new data that is similar to some existing data, like creating art or simulating images.
Reinforcement Learning: Especially deep reinforcement learning where neural networks are used to predict actions.
AI/ML Models that Can Run on CPUs
Traditional Machine Learning: Algorithms like linear regression, decision trees, clustering, and others can run efficiently on CPUs.
Lightweight Models: Some neural networks that are smaller in size or have been optimized for deployment can run on CPUs.
Natural Language Processing (NLP): Basic NLP tasks like tokenization, part-of-speech tagging, and named entity recognition can be handled by CPUs.
GPU or CPU for Inference?
While GPUs can speed up inference, especially for large models, many production systems use CPUs for inference because:
Cost: CPUs are generally cheaper than GPUs.
Optimization: Models can be optimized for deployment using techniques like quantization, pruning, and model distillation, making them lightweight enough for CPUs (see the sketch after this list).
Batch vs. Real-time: While GPUs excel at processing large batches of data simultaneously, many inference tasks in production involve real-time processing of individual data points, which CPUs can handle efficiently.
Model Serving: Deploying and serving the model to end-users, especially in a cloud environment, can be done using CPUs.
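As one concrete example of such optimization, PyTorch's dynamic quantization converts a model's linear layers to int8 for CPU inference. A minimal hedged sketch, using a toy model of our own:

```python
# Dynamic quantization: shrink Linear layers to int8 for fast CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
with torch.no_grad():
    output = quantized(torch.randn(1, 128))   # runs on CPU with int8 weights
```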
AI/ML Model Comparison: CPU vs. GPU

| Model Type | Use Cases | Training (CPU/GPU) | Inference (CPU/GPU) |
|---|---|---|---|
| CNN | Image classification, object detection, image segmentation, facial recognition | GPU preferred for large datasets and complex models; CPU for simpler models or smaller datasets | GPU for real-time processing or large batches; CPU for lightweight models or non-real-time tasks |
| RNN | Time series forecasting, natural language processing, speech recognition, video analysis | GPU preferred due to the sequential nature of the data; CPU for simpler tasks | GPU for real-time processing or complex models; CPU for simpler models or tasks |
| GAN | Image generation, data augmentation, style transfer, super-resolution | GPU almost exclusively, due to adversarial training complexity | GPU for real-time generation or complex models; CPU for pre-trained models generating simpler outputs |
| LLM | Text generation, text classification, question answering, translation, sentiment analysis | GPU preferred, especially for large models like GPT-3; CPU only for very small models or fine-tuning | GPU for real-time responses or large models; CPU for optimized or pruned models |
6. Frameworks for AI-ML Models
An AI/ML framework is a software library or interface designed to provide developers with tools, algorithms, and pre-defined structures to design, train, and deploy machine learning models efficiently and effectively.
The Popular Frameworks
The frameworks that have gained the most traction over the past three years are TensorFlow and PyTorch.
TensorFlow:
Developed by Google, TensorFlow is an open-source library for numerical computation and machine learning. It provides a comprehensive ecosystem of tools, libraries, and community resources to build and deploy ML models.
TensorFlow offers a flexible platform for building and deploying ML models, supports both deep learning and traditional machine learning, and has robust support for production deployment.
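A minimal sketch of the typical TensorFlow/Keras workflow (the toy data and model shape are our own, chosen only to keep the example self-contained):

```python
# Define, compile, and train a small classifier with TensorFlow/Keras.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy data just to make the example runnable end to end.
X, y = np.random.rand(100, 4), np.random.randint(0, 3, size=100)
model.fit(X, y, epochs=5, verbose=0)
```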
PyTorch:
Developed by Facebook, PyTorch is an open-source machine learning library based on the Torch library. It's used for a range of tasks including deep learning and artificial intelligence.
PyTorch is known for its dynamic computation graph, which makes it particularly useful for deep learning research. It has a rich ecosystem of libraries and tools, and strong community support.
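What "dynamic computation graph" means in practice: ordinary Python control flow can change the network's behaviour per input, and the graph is rebuilt on every forward pass. A contrived illustration of our own:

```python
# The forward pass can branch on the data itself; the graph is rebuilt each call.
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(8, 8)

    def forward(self, x):
        # Re-apply the layer a data-dependent number of times.
        for _ in range(int(x.abs().sum()) % 3 + 1):
            x = torch.relu(self.layer(x))
        return x

out = DynamicNet()(torch.randn(1, 8))
```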
Other popular frameworks:
Keras: An open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library.
Caffe: Developed by the Berkeley Vision and Learning Center (BVLC), it's particularly popular for image classification tasks.
Theano: An open-source numerical computation library that allows developers to efficiently define, optimize, and evaluate mathematical expressions, including matrix-valued ones.
Scikit-learn: A tool for data mining and data analysis. It's built on top of libraries like NumPy, SciPy, and Matplotlib and is not specifically for deep learning but for traditional machine learning.
MXNet: A deep learning framework that allows you to define, train, and deploy deep neural networks on a wide array of devices, from cloud infrastructure to mobile devices.
Which one is the best?
While TensorFlow was initially more geared towards production deployment (with tools like TensorFlow Serving), PyTorch gained popularity in the research community due to its dynamic computation graph. However, with the introduction of TensorFlow 2.0, TensorFlow also adopted dynamic features, making it more user-friendly for research as well. Many academic papers provide code implementations in either TensorFlow or PyTorch, and sometimes both. Major tech companies also use these frameworks for their machine learning solutions.
Which framework for LLM?
For large language models (LLMs) like GPT-3 or BERT, the primary frameworks used are:
TensorFlow: Models like BERT have been implemented in TensorFlow and are available for use.
PyTorch: Due to its dynamic computation graph, many researchers prefer PyTorch for developing and experimenting with new LLMs.
Transformers by Hugging Face: This library, developed by Hugging Face, provides thousands of pre-trained models for various NLP tasks, including LLMs like GPT-2, BERT, and many more. It's built on top of both TensorFlow and PyTorch, allowing users to switch between the two.
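Getting started with Transformers usually takes a few lines via its pipeline API; weights are pulled from the Model Hub on first use (the example texts below are our own):

```python
# Hugging Face Transformers pipelines: pre-trained models in a few lines.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")     # uses a default pre-trained model
print(classifier("Running ML in the cloud saves us weeks of setup."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

generator = pipeline("text-generation", model="gpt2")
print(generator("Machine learning in the cloud", max_length=20)[0]["generated_text"])
```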
What is Hugging Face?
Hugging Face is a company specializing in natural language processing and understanding. They have developed a platform that offers a range of tools and resources for working with NLP models, including a model hub, datasets, and more.
Model Hub: A repository of pre-trained models that can be used for a range of NLP tasks.
Datasets: A library of datasets that can be used for training and evaluating models.
Transformers Library: A library that provides thousands of pre-trained models for various NLP tasks.
Hugging Face has been widely adopted in the NLP community thanks to its user-friendly interface and rich repository of pre-trained models. It faces competition from OpenAI, DeepMind, and other companies working in the NLP space, but Hugging Face has carved out a niche in providing accessible NLP resources and tools, and has forged strong partnerships with AWS and NVIDIA.
7. Cloud Services for AI: Self-Managed with IaaS or Managed Services?
Both AWS and GCP offer robust platforms for AI and ML workloads. AWS's Amazon SageMaker provides a comprehensive set of tools covering the entire ML lifecycle. On the other hand, GCP's AI Platform offers tools for building, deploying, and managing ML projects. The choice between managed services and self-managed solutions often depends on the specific needs of the organization and the scale of deployment.
AWS
Infrastructure: AWS offers scalable, high-performance, and cost-effective infrastructure for ML. They provide a choice of processors and accelerators to cater to different needs. Specifically, Amazon Elastic Compute Cloud (Amazon EC2) instances such as P4d and Inf1 are designed to offer high performance and cost-effectiveness for ML training and inference in the cloud.
Choice of ML Frameworks: AWS supports a variety of ML frameworks and libraries, including TensorFlow, PyTorch, Apache MXNet, Hugging Face, and others. These frameworks can be utilized as a fully managed experience in Amazon SageMaker or with AWS Deep Learning AMIs and Containers optimized for performance on AWS.
AI Services: AWS offers purpose-built AI services for various tasks such as:
Computer Vision: Analyze images and videos, detect defects, and automate inspection.
Automated Data Extraction and Analysis: Extract text and data from documents, acquire insights from unstructured text, and control the quality of data extraction.
Language AI: Build chatbots, automate speech recognition, convert text into speech, and more.
Business Metrics: Forecast business metrics, detect online fraud, and identify data anomalies.
AWS's Amazon SageMaker: A comprehensive suite covering the entire ML lifecycle, SageMaker offers tools for building, training, and deploying ML models. However, its vast array of services can be overwhelming for beginners.
Amazon Bedrock lets customers fine-tune a model for a particular task without having to annotate large volumes of data (as few as 20 examples can be enough). None of the customer's data is used to train the underlying base models. Customers can configure VPC settings to access the Amazon Bedrock APIs and provide model fine-tuning data securely, with all data encrypted at rest and in transit.
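Invoking a Bedrock model from Python goes through the boto3 bedrock-runtime client. A hedged sketch: the model ID and request-body schema below are illustrative (each model family defines its own schema), not the only way to call the service.

```python
# Hedged sketch of calling Amazon Bedrock through boto3.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",   # example model ID; yours may differ
    contentType="application/json",
    accept="application/json",
    body=json.dumps({"inputText": "Summarize the benefits of ML in the cloud."}),
)
print(json.loads(response["body"].read()))
```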
GCP
Infrastructure: GCP offers a range of compute options for ML, from managed services that abstract away the infrastructure to deep integrations with their advanced processing hardware:
AI Platform Training: Uses distributed training on a cluster to train ML models using the resources of Google Cloud.
AI Platform Prediction: Provides a serverless platform to host trained ML models in the cloud and make predictions on new data.
Deep Learning VM Image: Pre-configured and optimized for deep learning on Compute Engine.
Choice of ML Frameworks: GCP supports popular ML frameworks such as TensorFlow, scikit-learn, and XGBoost. They also offer deep learning containers that come pre-installed with popular ML libraries and tools.
Specialized Hardware: GCP offers both CPUs and GPUs for ML workloads. They also provide Cloud Tensor Processing Units (TPUs), which are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate ML workloads.
Integrated Tools: GCP integrates with tools like BigQuery and Dataflow to streamline the data engineering process. They also offer AI Hub, a centralized platform to discover, share, and deploy ML resources within an organization.
GCP's Vertex AI is a managed ML platform that provides tools for building, deploying, and managing ML projects, from data preparation to model deployment. The Model Garden repository provides a collection of state-of-the-art pre-trained models and modeling solutions, aiming to facilitate the adoption of ML models by providing high-quality, reusable code.
Generative AI Studio provides a user-friendly UI for tasks like data ingestion, model training, and deployment, making it easier for both experts and novices to work with AI.
Amazon SageMaker vs. GCP AI Platform
Amazon SageMaker
| Pros | Cons |
|---|---|
| Fully managed service that covers the entire ML lifecycle. | Can be expensive, especially for large-scale deployments. |
| Supports TensorFlow, PyTorch, and other popular frameworks. | Some users find it less flexible compared to self-managed solutions. |
| Offers tools for data labeling, model training, and deployment. | |
GCP AI Platform
| Pros | Cons |
|---|---|
| Offers a range of tools for building, deploying, and managing ML models. | Can be complex to set up and manage. |
| Supports TensorFlow, scikit-learn, and XGBoost. | Pricing can be a concern for smaller organizations. |
| Integrates with BigQuery and Dataflow for data engineering. | |
Hugging Face and AWS Alliance:
Hugging Face and AWS have formed a partnership to make it easier for developers to use and deploy Hugging Face models on AWS. Here are some key points:
Deep Integration with Amazon SageMaker: Hugging Face has integrated its Transformers and Tokenizers libraries with Amazon SageMaker, AWS's fully managed service for machine learning. This allows developers to train and deploy Hugging Face models at scale using SageMaker.
Pre-built Containers: AWS offers pre-built containers for Hugging Face, making it easier to get started without dealing with the nuances of setting up the environment.
Performance: With the partnership, users can leverage the infrastructure of AWS to train large models efficiently, utilizing features like SageMaker's distributed training capabilities.
Model Hub on AWS: Hugging Face's Model Hub, a repository of pre-trained models, is also accessible via this partnership, allowing developers to deploy a wide range of models directly on AWS.
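A hedged sketch of what the integration looks like in the sagemaker Python SDK: the script name, IAM role ARN, S3 path, and library versions below are placeholders to be replaced with your own.

```python
# Launch a Hugging Face training job on SageMaker (placeholders throughout).
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="train.py",                  # your fine-tuning script
    instance_type="ml.p3.2xlarge",           # GPU instance for training
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    transformers_version="4.26",             # assumed supported version combo
    pytorch_version="1.13",
    py_version="py39",
)
estimator.fit({"train": "s3://my-bucket/train-data"})    # placeholder S3 path
```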
Managed Services vs. Self-Managed
If your requirements center on flexibility, security, and deep integration, self-managing your AI-ML applications is the way to go.
If your requirements are more about ease of setup and deployment, scalability, cost, maintenance, and security, then managed services are probably the best choice for building and running your AI apps.
In conclusion, the synergy of AI, ML, and cloud computing offers unprecedented opportunities for businesses and researchers alike. As these technologies continue to evolve, their combined potential will undoubtedly lead to groundbreaking innovations and solutions that will shape our future.
Sources:
Attention is All You Need: https://research.google/pubs/pub46201/
Transformer: A Novel Neural Network Architecture for Language Understanding: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Transformer on Wikipedia: https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
What is Temperature in NLP? https://lukesalamone.github.io/posts/what-is-temperature/
Bard now helps you code: https://blog.google/technology/ai/code-with-bard/
Model Garden: https://cloud.google.com/model-garden