Autor: Yeshwanth Reddy
1
Ebook

Modern Computer Vision with PyTorch. A practical roadmap from deep learning fundamentals to advanced applications and Generative AI - Second Edition

V Kishore Ayyadevara, Yeshwanth Reddy

Whether you are a beginner or are looking to progress in your computer vision career, this book guides you through the fundamentals of neural networks (NNs) and PyTorch and how to implement state-of-the-art architectures for real-world tasks.The second edition of Modern Computer Vision with PyTorch is fully updated to explain and provide practical examples of the latest multimodal models, CLIP, and Stable Diffusion.You’ll discover best practices for working with images, tweaking hyperparameters, and moving models into production. As you progress, you'll implement various use cases for facial keypoint recognition, multi-object detection, segmentation, and human pose detection. This book provides a solid foundation in image generation as you explore different GAN architectures. You’ll leverage transformer-based architectures like ViT, TrOCR, BLIP2, and LayoutLM to perform various real-world tasks and build a diffusion model from scratch. Additionally, you’ll utilize foundation models' capabilities to perform zero-shot object detection and image segmentation. Finally, you’ll learn best practices for deploying a model to production.By the end of this deep learning book, you'll confidently leverage modern NN architectures to solve real-world computer vision problems.

2
Ebook

Modern Computer Vision with PyTorch. Explore deep learning concepts and implement over 50 real-world image applications

V Kishore Ayyadevara, Yeshwanth Reddy

Deep learning is the driving force behind many recent advances in various computer vision (CV) applications. This book takes a hands-on approach to help you to solve over 50 CV problems using PyTorch1.x on real-world datasets.You’ll start by building a neural network (NN) from scratch using NumPy and PyTorch and discover best practices for tweaking its hyperparameters. You’ll then perform image classification using convolutional neural networks and transfer learning and understand how they work. As you progress, you’ll implement multiple use cases of 2D and 3D multi-object detection, segmentation, human-pose-estimation by learning about the R-CNN family, SSD, YOLO, U-Net architectures, and the Detectron2 platform. The book will also guide you in performing facial expression swapping, generating new faces, and manipulating facial expressions as you explore autoencoders and modern generative adversarial networks. You’ll learn how to combine CV with NLP techniques, such as LSTM and transformer, and RL techniques, such as Deep Q-learning, to implement OCR, image captioning, object detection, and a self-driving car agent. Finally, you'll move your NN model to production on the AWS Cloud.By the end of this book, you’ll be able to leverage modern NN architectures to solve over 50 real-world CV problems confidently.