Engineering LibreTexts

10-E.9: Cloud Bootstrapping and Storage

  • Page ID
    40959

    Booting with VMs

    We have discussed the booting of Linux. However, booting in the virtual world is a bit different. In the world of virtual machines there are several different ways to control the boot. Most Linux VMs use cloud-init, Anaconda, or Kickstart.

    When an instance configured for cloud-init boots and the service starts (in systemd implementations it is actually four services, to handle dependencies during the boot process), it checks its configuration for a data source to determine what type of cloud it is running in. Each major cloud provider has a data source configuration that tells the instance where and how to retrieve configuration information. The instance then uses the data source to retrieve two kinds of information: configuration supplied by the cloud provider, such as networking and instance-identification information, and configuration supplied by the customer, such as authorized keys to be copied, user accounts to be created, and many other possible tasks.
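
    As a rough illustration of the customer-provided part, the sketch below renders a small cloud-config document that creates one user with an authorized SSH key and installs a package. The function name render_user_data, the user name, the key text, and the package are hypothetical placeholders; how the resulting text is handed to an instance depends on the cloud provider.

```python
def render_user_data(username, ssh_keys, packages):
    """Return cloud-config text that adds one user and installs packages.

    All values passed in are illustrative; nothing here contacts a cloud.
    """
    lines = [
        "#cloud-config",            # header that marks the text as cloud-config
        "users:",
        f"  - name: {username}",
        "    ssh_authorized_keys:",
    ]
    lines += [f"      - {key}" for key in ssh_keys]
    lines.append("packages:")
    lines += [f"  - {pkg}" for pkg in packages]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
user_data = render_user_data(
    "admin",
    ["ssh-ed25519 AAAAEXAMPLEKEY admin@example"],
    ["git"],
)
print(user_data)
```

    The text is built by hand here to keep the sketch free of third-party dependencies; a YAML library would be a reasonable alternative for larger configurations.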

    Anaconda is the installation program used by Fedora, Red Hat Enterprise Linux and some other distributions. Anaconda is a fairly sophisticated installer. It supports installation from local and remote sources.

    During installation, a target computer's hardware is identified and configured, and the appropriate file systems for the system's architecture are created. Finally, Anaconda allows the user to install the operating system software on the target computer. Anaconda can also upgrade existing installations of earlier versions of the same distribution. After the installation is complete, you can reboot into your installed system and continue doing customization using initial setup.

    Kickstart installations offer a means to automate the installation process, either partially or fully. Kickstart files contain answers to all questions normally asked by the installation program, such as what time zone you want the system to use, how the drives should be partitioned, or which packages should be installed. Providing a prepared Kickstart file when the installation begins therefore allows the installation to run automatically, without any intervention from the user. This is especially useful when deploying CentOS on a large number of systems at once.
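
    To make the idea of "answers in a file" concrete, here is a minimal sketch that renders a Kickstart file from a few answers. The function name render_kickstart and all values are hypothetical, and the command set is deliberately small: a real installation usually needs further commands (keyboard layout, network, bootloader, and so on) that vary by release.

```python
def render_kickstart(timezone, packages, root_password_hash):
    """Return Kickstart text answering a few installer questions.

    Sketch only: the set of commands is an assumption and is not
    guaranteed to be complete for any particular distribution release.
    """
    lines = [
        "lang en_US.UTF-8",                          # installation language
        f"timezone {timezone}",                      # time zone answer
        f"rootpw --iscrypted {root_password_hash}",  # pre-hashed root password
        "clearpart --all --initlabel",               # wipe existing partitions
        "autopart",                                  # automatic partitioning
        "reboot",                                    # reboot when finished
        "",
        "%packages",
        "@core",                                     # base package group
        *packages,
        "%end",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical answers; the hash is a placeholder, not a real password hash.
ks = render_kickstart("Europe/Berlin", ["vim-enhanced"], "$6$placeholder$hash")
print(ks)
```

    Generating the file from a template like this is one way to keep a single source of answers for many machines.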

    Kickstart files can be kept on a single server system and read by individual computers during the installation. This installation method can support the use of a single Kickstart file to install CentOS on multiple machines, making it ideal for network and system administrators.

    All Kickstart scripts and the log files of their execution are stored in the /tmp directory to assist with debugging installation failures.

    Virtual System Storage

    When working with virtual systems, a virtual disk is simply a file or set of files that appears as a physical disk drive to a guest operating system. These files can be on the host machine or on a remote computer connected via a network. When a virtual machine is configured with a virtual disk, you can install a new operating system onto the virtual disk without repartitioning a physical disk or rebooting the host.

    In most virtual machines the actual files used by the virtual disk start out small and grow toward their maximum size as needed. The main advantage of this approach is the smaller file size: smaller files require less storage space and make it easier to move the virtual machine to a new location.
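
    A sparse file shows the same grow-on-demand behavior in miniature. The sketch below is not any hypervisor's disk format; it assumes a POSIX system whose file system supports sparse files, and the helper names create_sparse_disk and disk_usage are made up for the example.

```python
import os
import tempfile


def create_sparse_disk(path, size_bytes):
    """Create a file whose apparent size is size_bytes without writing data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def disk_usage(path):
    """Return (apparent size, allocated bytes) for path.

    st_blocks counts 512-byte units and is only available on Unix-like systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "disk.img")
    create_sparse_disk(image, 64 * 1024 * 1024)
    apparent, allocated = disk_usage(image)
    print("before write:", apparent, "apparent,", allocated, "allocated")

    # Writing into the middle of the image makes the file system back
    # only the touched region with real storage.
    with open(image, "r+b") as f:
        f.seek(1024 * 1024)
        f.write(b"x" * 4096)
    apparent_after, allocated_after = disk_usage(image)
    print("after write: ", apparent_after, "apparent,", allocated_after, "allocated")
```

    The apparent size stays fixed while the allocated figure typically grows with the data written; the exact numbers depend on the file system.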

    Thin and Thick Provisioning

    In computing, thin provisioning involves using virtualization technology to give the appearance of having more physical resources than are actually available. If a system always has enough resources to simultaneously support all the virtualized resources, then it is not thin provisioned. The term thin provisioning is applied to the disk layer here, but it could refer to an allocation scheme for any resource. For example, real memory in a computer is typically thin provisioned to running tasks, with some form of address translation technology doing the virtualization. Each task acts as if it has real memory allocated, yet the sum of the virtual memory assigned to tasks typically exceeds the total real memory.

    The efficiency of thin or thick/fat provisioning is a function of the use case, not of the technology. Thick provisioning is typically more efficient when the amount of resource used very closely approximates to the amount of resource allocated. Thin provisioning offers more efficiency where the amount of resource used is much smaller than allocated, so that the benefit of providing only the resource needed exceeds the cost of the virtualization technology used.
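
    The difference between what is promised and what is backed can be sketched with a toy model. The ThinPool class below is purely illustrative and does not correspond to any real storage product: volumes are created with a virtual size, and a physical block is consumed only when a virtual block is first written.

```python
class ThinPool:
    """Toy thin-provisioned pool: physical blocks are mapped on first write."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.volumes = {}  # volume name -> virtual size in blocks
        self.mapped = {}   # (volume name, virtual block) -> physical block

    def create_volume(self, name, virtual_blocks):
        # Creating a volume reserves nothing; it only records the promise.
        self.volumes[name] = virtual_blocks

    def write(self, name, block):
        if not 0 <= block < self.volumes[name]:
            raise IndexError("block outside the volume's virtual size")
        key = (name, block)
        if key not in self.mapped:
            if len(self.mapped) >= self.physical_blocks:
                raise RuntimeError("pool exhausted: over-commitment caught up")
            self.mapped[key] = len(self.mapped)  # next free physical block
        return self.mapped[key]

    @property
    def provisioned(self):
        return sum(self.volumes.values())

    @property
    def used(self):
        return len(self.mapped)


pool = ThinPool(physical_blocks=100)
pool.create_volume("a", 80)
pool.create_volume("b", 80)
for blk in range(10):
    pool.write("a", blk)
pool.write("a", 0)  # rewriting a mapped block consumes nothing new
print(pool.provisioned, "blocks promised,", pool.used, "blocks in use")
```

    A thick-provisioned pool would instead refuse the second volume, because 160 promised blocks exceed the 100 physical blocks available.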

    Just-in-time allocation differs from thin provisioning: most file systems back files just-in-time but are not thin provisioned. Over-allocation also differs from thin provisioning, because resources can be over-allocated (oversubscribed) without using virtualization technology. For example, an airline can oversell seats on a flight without assigning actual seats at the time of sale, so that no customer has a claim on a specific seat number.

    Persistent Volumes

    Managing storage is a distinct problem from managing compute instances. In Kubernetes, the PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed.

    A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
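
    As an illustration of what such an API object might contain, the sketch below builds a statically provisioned, NFS-backed PersistentVolume as a Python dictionary and prints it as JSON, a manifest format that kubectl accepts alongside YAML. The helper name nfs_persistent_volume, the volume name, server address, export path, size, access mode, and reclaim policy are all assumptions chosen for the example.

```python
import json


def nfs_persistent_volume(name, size, server, path):
    """Return a PersistentVolume manifest (as a dict) backed by NFS."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},               # e.g. "5Gi"
            "accessModes": ["ReadWriteMany"],            # assumed access mode
            "persistentVolumeReclaimPolicy": "Retain",   # keep data on release
            "nfs": {"server": server, "path": path},     # implementation detail
        },
    }


# Hypothetical server and export path.
pv = nfs_persistent_volume("pv-example", "5Gi", "nfs.example.internal", "/exports/data")
manifest = json.dumps(pv, indent=2)
print(manifest)
```

    Only the "nfs" entry is specific to the storage backend; a Pod that eventually uses the volume does not need to know about it.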

    Blob and Block Storage

    A Binary Large OBject (BLOB) is a collection of binary data stored as a single entity in a database management system. Blobs are typically images, audio or other multimedia objects, though sometimes binary executable code is stored as a blob. Database support for blobs is not universal. Often multimedia files are stored in this manner because they are quite large.
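
    A small sketch of storing and retrieving a blob, using the SQLite support in Python's standard library with an in-memory database. The table name, column names, and payload are invented for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (name TEXT PRIMARY KEY, content BLOB)")

# Arbitrary binary payload standing in for an image or audio file.
payload = bytes(range(256)) * 4

# Python bytes objects are stored as BLOB values.
conn.execute(
    "INSERT INTO media (name, content) VALUES (?, ?)",
    ("sample.bin", payload),
)

row = conn.execute(
    "SELECT content FROM media WHERE name = ?", ("sample.bin",)
).fetchone()
restored = row[0]
print(len(restored), "bytes read back, identical:", restored == payload)
```

    The database treats the content as a single opaque value: it can be stored and fetched whole, but its internal structure is not interpreted.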

    The data type and its definition were introduced to describe data not originally defined in traditional computer database systems, particularly because it was too large to store practically at the time the field of database systems was first being defined in the 1970s and 1980s. The data type became practical when disk space became cheap. The term is also used in NoSQL databases, especially in key-value stores such as Redis.

    In data storage, a block, sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records and having a maximum length, the block size. Data thus structured are said to be blocked. The process of putting data into blocks is called blocking, while deblocking is the process of extracting data from blocks. Blocked data is normally stored in a data buffer and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data stream.
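
    Blocking and deblocking can be sketched in a few lines. The example assumes fixed-length records padded with zero bytes, which is only one of several possible record layouts; the function names and sample records are made up for illustration.

```python
def block_records(records, record_size, records_per_block):
    """Pack records into blocks of up to records_per_block fixed-size records."""
    blocks = []
    for start in range(0, len(records), records_per_block):
        group = records[start:start + records_per_block]
        for record in group:
            if len(record) > record_size:
                raise ValueError("record longer than record_size")
        # Pad each record with zero bytes so every slot has the same length.
        blocks.append(b"".join(r.ljust(record_size, b"\x00") for r in group))
    return blocks


def deblock_records(blocks, record_size):
    """Extract records from blocks, stripping the zero-byte padding.

    Limitation of this sketch: records that legitimately end in zero
    bytes would lose them.
    """
    records = []
    for block in blocks:
        for offset in range(0, len(block), record_size):
            records.append(block[offset:offset + record_size].rstrip(b"\x00"))
    return records


records = [b"alpha", b"beta", b"gamma", b"delta", b"eps"]
blocks = block_records(records, record_size=8, records_per_block=2)
print([len(b) for b in blocks])
print(deblock_records(blocks, record_size=8) == records)
```

    Reading or writing one 16-byte block here moves two records at once, which is the overhead saving the paragraph above describes.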

    Adapted from:
    "Anaconda" by Multiple contributors, The Fedora Project is licensed under CC BY-SA 3.0
    "How Cloud-init can be used for your Raspberry Pi homelab" by Chris Collins, OpenSource.com is licensed under CC BY-SA 4.0
    "Kickstart Installations" by Multiple Contributors, CentOS Installation Guide is licensed under CC BY-SA 3.0
    "Thin provisioning" by Multiple Contributors, Wikipedia is licensed under CC BY-SA 3.0
    "Persistent Volumes" by Unknown Contributors, Kubernetes is licensed under CC BY 4.0
    "Block (data storage)" by Multiple Contributors, Wikipedia is licensed under CC BY-SA 3.0
    "Binary large object" by Multiple Contributors, Wikipedia is licensed under CC BY-SA 3.0


    10-E.9: Cloud Bootstrapping and Storage is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by LibreTexts.
