Storage Options in Google Cloud Platform - Rundown

Arneesh Aima
Published in DataDrivenInvestor
10 min read · Feb 5, 2020

Introduction

In this blog, I am going to cover the various storage options provided by the Google Cloud Platform (GCP). Choosing an appropriate storage option is essential for making sure your services, apps, and data pipelines perform well. The right choice not only improves performance but also helps keep your project cost-efficient. The running cost of an organization's backend can be kept under control by following a few basic principles and doing adequate research before deploying anything. We often rush into building a service or an app without thinking everything through, and we end up facing issues later on. So the best advice is to set up a game plan before you start working on any project. Rushing may give you results right away, but later on you will end up in the same boat as usual. Don't worry if it takes you some time to settle on the best storage option for your project: the time spent here is nothing compared to the hours you will lose if you select a data store that was not meant for your use case.

Various Storage Options Provided by GCP:

The storage options that GCP provides us are: Google Cloud Storage, Google Cloud Bigtable, Google Cloud SQL, Google Cloud Spanner, Cloud Datastore and Cloud Firestore.

1. Cloud Storage

Google Cloud Storage offers developers and IT organizations durable and highly available object storage. It assesses no minimum fee; you pay only for what you use. Prior provisioning of capacity isn’t necessary.

You must be curious about what object storage is, right? As a developer you are probably familiar with straightforward file storage, in which you manage your data as a hierarchy of folders. That is not the same as block storage, in which your operating system manages your data as chunks of disk. Object storage is different from both. Object storage means this: you say to your storage, “Here, keep this arbitrary sequence of bytes,” and the storage lets you address it with a unique key. In Google Cloud Storage and in other systems, these unique keys are in the form of URLs, which means object storage interacts well with web technologies.

Google Cloud Storage is not a file system, although it can be accessed as one via tools such as Cloud Storage FUSE. The storage objects offered by Google Cloud Storage are “immutable,” which means that you do not edit them in place, but instead create a new version. Google Cloud Storage’s primary use is whenever binary large-object storage is needed: online content, backup and archiving, storage of intermediate results in processing workflows, and more. If you have worked with Amazon S3 before, you may already know that it follows the same concept of immutability. Immutability might seem a bit heavy-handed on storage, because with object versioning turned on you keep the older versions of your files and keep adding more. In an actual development cycle, though, this is very helpful: you often need an older version of a file, for example to pull historic data into a data warehouse or to run a machine learning or data science analysis.
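To make the key-to-bytes idea and immutability concrete, here is a minimal in-memory sketch. It is a toy model, not the Cloud Storage API: the class `ToyObjectStore`, its methods, and the `toy://` URL scheme are all made up for illustration. It also always keeps old versions, whereas in Cloud Storage that depends on Object Versioning being enabled on the bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyObjectStore:
    """In-memory stand-in for a bucket (hypothetical, not the real GCS client)."""
    name: str
    # key -> list of stored byte strings, oldest first
    _versions: Dict[str, List[bytes]] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> int:
        """Store a new version under key and return its generation number."""
        history = self._versions.setdefault(key, [])
        history.append(bytes(data))  # a bytes value cannot be edited in place
        return len(history)

    def get(self, key: str, generation: Optional[int] = None) -> bytes:
        """Return the latest version, or a specific older generation (1-based)."""
        history = self._versions[key]
        return history[-1] if generation is None else history[generation - 1]

    def url(self, key: str) -> str:
        """Build a URL-style address for the object (illustrative scheme only)."""
        return f"toy://{self.name}/{key}"


store = ToyObjectStore("reports")
store.put("2020/sales.csv", b"q1,100\n")
store.put("2020/sales.csv", b"q1,100\nq2,140\n")  # "editing" means writing a new version

latest = store.get("2020/sales.csv")
first = store.get("2020/sales.csv", generation=1)
address = store.url("2020/sales.csv")
```

Note that the "folder" in `2020/sales.csv` is just part of the key; there is no directory tree behind it.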

Key Features of Google Cloud Storage

● High performance, internet-scale

● Simple administration — Does not require capacity management

● Data encryption at rest

● Data encryption in transit by default from Google to endpoint

● Online and offline import services are available

Choosing among Cloud Storage classes

Cloud Storage lets you choose among four different types of storage classes: Regional, Multi-regional, Nearline and Coldline. Multi-regional and Regional are high-performance object storage, whereas Nearline and Coldline are backup and archival storage. All of the storage classes are accessed in analogous ways using the Cloud Storage API, and they all offer millisecond access times.

Regional Storage lets you store your data in a specific GCP region, such as us-central1, europe-west1 or asia-east1. It’s cheaper than Multi-Regional Storage, but it offers less redundancy.

Multi-Regional Storage costs a bit more, but it’s geo-redundant. That means you pick a broad geographical location, like the United States, the European Union, or Asia, and Cloud Storage stores your data in at least two geographic locations separated by at least 160 kilometers. Multi-Regional Storage is appropriate for storing frequently accessed data: website content, interactive workloads, or data that’s part of mobile and gaming applications. People use Regional Storage, on the other hand, to store data close to their Compute Engine virtual machines or their Kubernetes Engine clusters. That gives better performance for data-intensive computations.

Nearline Storage is a low-cost, highly durable storage service for storing infrequently accessed data. This storage class is a better choice than Multi-Regional Storage or Regional Storage in scenarios where you plan to read or modify your data on average once a month or less. For example, if you want to continuously add files to Cloud Storage and plan to access them about once a month, Nearline Storage is a good fit.

Coldline Storage is a very-low-cost, highly durable storage service for data archiving, online backup, and disaster recovery. Coldline Storage is the best choice for data that you plan to access at most once a year, due to its slightly lower availability, 90-day minimum storage duration, costs for data access, and higher per-operation costs. For example, use it if you want to archive data or keep a copy available in the event of a disaster.
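The rules of thumb above can be written down as a tiny helper. This is only a sketch of the guidance in this section: the function name and the exact thresholds (12 reads a year for "once a month or less", 1 read a year for "at most once a year") are my own reading of it, and a real decision should also weigh pricing and minimum storage durations.

```python
def choose_storage_class(reads_per_year: float, geo_redundant: bool = False) -> str:
    """Suggest a Cloud Storage class from expected access frequency.

    Hypothetical helper: thresholds are assumptions taken from the
    descriptions above, not from any official API.
    """
    if reads_per_year <= 1:
        return "Coldline"        # archive, backup, disaster recovery
    if reads_per_year <= 12:
        return "Nearline"        # about once a month or less
    # Frequently accessed data: choose based on redundancy needs.
    return "Multi-Regional" if geo_redundant else "Regional"


archive_class = choose_storage_class(reads_per_year=0.5)
monthly_report_class = choose_storage_class(reads_per_year=12)
website_class = choose_storage_class(reads_per_year=100_000, geo_redundant=True)
compute_scratch_class = choose_storage_class(reads_per_year=100_000)
```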

2. Cloud Bigtable

Bigtable is a NoSQL big data database service; Gmail and Google Maps use it at the backend. A SQL database has rows and columns and a predefined schema, and we have to abide by the rules of that schema: only entries that follow the schema can be inserted, and additional fields can’t be added on the fly. In a NoSQL database, not all rows need to have the same columns, and in fact the database might be designed to take advantage of that by sparsely populating the rows.

Key Features of Cloud Bigtable

Bigtable should be used if one or more of the following apply:
● Large quantities (>1 TB) of semi-structured or structured data
● Data is high throughput or rapidly changing
● Transactions, strong relational semantics not required
● Data is time-series or has natural semantic ordering
● You run asynchronous batch or real-time processing on the data
● You run machine learning algorithms on the data
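Two of the ideas above, sparse rows and data with a natural ordering such as time series, can be sketched with a small in-memory table that keeps its row keys sorted. This is a toy model and not the Bigtable client API; `ToyWideTable`, the `sensor#timestamp` key layout, and the column names are assumptions made for the example.

```python
import bisect


class ToyWideTable:
    """Sparse, key-sorted table (a toy model, not the Bigtable client)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def write(self, row_key, **columns):
        """Set only the given columns; other columns simply do not exist."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan(self, start, end):
        """Yield (key, columns) for start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = ToyWideTable()
table.write("sensor2#2020-02-05T10:00", temp=19.0)
table.write("sensor1#2020-02-05T10:05", temp=21.7, humidity=40)  # extra column
table.write("sensor1#2020-02-05T10:00", temp=21.5)

# All readings for sensor1, in time order ("$" sorts right after "#").
sensor1_rows = list(table.scan("sensor1#", "sensor1$"))
```

Because the key starts with the sensor ID and ends with the timestamp, one range scan returns a single sensor's readings already ordered by time, and rows only carry the columns that were actually written.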

3. Cloud SQL

Cloud SQL is an easy-to-use service that delivers fully managed relational databases. It is a good fit if you want to keep all your focus on building your application and not worry about tedious database management tasks such as applying patches and updates, managing backups, and configuring replication. If you are an early-stage startup without much of a DevOps team and you need a relational database, Cloud SQL is the one for you.

Features of Cloud SQL:

● Offers MySQL and PostgreSQL databases as a service
● Automatic replication
● Managed backups
● Vertical scaling (read and write)
● Horizontal scaling (read)
● Google security

4. Cloud Spanner

Cloud Spanner supports strong consistency, including strongly consistent secondary indexes, SQL, and managed instances with high availability through synchronous and built-in data replication.

Features of Cloud Spanner

You should use Cloud Spanner if you need:

● A SQL RDBMS, with joins and secondary indexes
● Built-in high availability
● Strong global consistency
● Database sizes exceeding ~2 TB
● Many IOPS (Tens of thousands of reads/writes per second or more)
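If "joins and secondary indexes" sounds abstract, the sketch below shows both on a tiny relational schema. It uses Python's built-in `sqlite3` purely as a local stand-in so it runs anywhere; Cloud SQL and Cloud Spanner have their own SQL dialects, clients, and connection setup, and the `customers` and `orders` tables are invented for the example.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a managed SQL service.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    -- Secondary index: lets the database find orders by customer quickly.
    CREATE INDEX orders_by_customer ON orders(customer_id);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)],
)

# Join the two tables and total the orders per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```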

5. Cloud Datastore

Cloud Datastore is a highly scalable NoSQL database for your applications. Like Cloud Bigtable, there is no need for you to provision database instances. Cloud Datastore uses a distributed architecture to automatically manage scaling. Your queries scale with the size of your result set, not the size of your data set.

Features of Cloud Datastore

● Atomic transactions: Datastore can execute a set of operations where either all succeed, or none occur.
● High availability of reads and writes: Datastore runs in Google data centers, which use redundancy to minimize impact from points of failure.
● Massive scalability with high performance: Datastore uses a distributed architecture to automatically manage scaling. Datastore uses a mix of indexes and query constraints so your queries scale with the size of your result set, not the size of your data set.
● Flexible storage and querying of data: Datastore maps naturally to object-oriented and scripting languages and is exposed to applications through multiple clients. It also provides a SQL-like query language.
● Balance of strong and eventual consistency: Datastore ensures that entity lookups and ancestor queries always receive strongly consistent data. All other queries are eventually consistent. The consistency models allow your application to deliver a great user experience while handling large amounts of data and users.
● Encryption at rest: Datastore automatically encrypts all data before it is written to disk and automatically decrypts the data when read by an authorized user. For more information, see Server-Side Encryption.
● Fully managed with no planned downtime: Google handles the administration of the Datastore service so you can focus on your application. Your application can still use Datastore when the service receives a planned upgrade.
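The "either all succeed, or none occur" behaviour of atomic transactions is easy to see in a small model. The sketch below is not the Datastore client library: `ToyEntityStore`, `run_in_transaction`, and `add_to_balance` are hypothetical names, and the staging-copy approach is just one simple way to get all-or-nothing semantics in memory.

```python
import copy


class ToyEntityStore:
    """In-memory entity store with all-or-nothing updates (toy model)."""

    def __init__(self, entities=None):
        self.entities = entities or {}

    def run_in_transaction(self, operations):
        """Apply every operation to a staged copy, then commit in one step."""
        staged = copy.deepcopy(self.entities)
        for op in operations:
            op(staged)            # an exception here aborts before the commit
        self.entities = staged    # commit: all changes become visible together


def add_to_balance(key, delta):
    """Build an operation that changes one entity's balance (hypothetical)."""
    def op(entities):
        new_balance = entities[key]["balance"] + delta
        if new_balance < 0:
            raise ValueError(f"{key} would go negative")
        entities[key]["balance"] = new_balance
    return op


store = ToyEntityStore({"alice": {"balance": 50}, "bob": {"balance": 10}})

# Succeeds: both operations are applied.
store.run_in_transaction([add_to_balance("bob", 30), add_to_balance("alice", -30)])

# Fails on the second operation, so the first one is discarded as well.
try:
    store.run_in_transaction([add_to_balance("bob", 100), add_to_balance("alice", -100)])
except ValueError:
    pass
```

After the failed transfer, bob's staged credit never reaches the store, which is exactly what you want when moving value between two entities.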

Summed Up Comparison of All Storage Options

Consider using Cloud Datastore if you need to store structured objects, or if you require support for transactions and SQL-like queries. This storage service provides terabytes of capacity with a maximum unit size of 1 MB per entity.

Consider using Cloud Bigtable if you need to store a large number of structured objects. Cloud Bigtable does not support SQL queries, nor does it support multi-row transactions. This storage service provides petabytes of capacity with a maximum unit size of 10 MB per cell and 100 MB per row.

Consider using Cloud Storage if you need to store immutable blobs larger than 10 MB, such as large images or movies. This storage service provides petabytes of capacity with a maximum unit size of 5 TB per object.

Consider using Cloud SQL or Cloud Spanner if you need full SQL support for an online transaction processing system. Cloud SQL provides up to 10,230 GB, depending on machine type, while Cloud Spanner provides petabytes. If Cloud SQL does not fit your requirements because you need horizontal scalability, not just through read replicas, consider using Cloud Spanner.
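The comparison above boils down to a short decision list, sketched here as a function. Treat it as a memory aid rather than a rule engine: the function name, the parameter names, and the order of the checks are my own simplification of the four paragraphs above.

```python
def suggest_storage(full_sql: bool = False,
                    horizontal_scale: bool = False,
                    blob_over_10_mb: bool = False,
                    transactions: bool = False) -> str:
    """Suggest a GCP storage service from a few yes/no requirements.

    Hypothetical helper that mirrors the summary above; real projects
    should also consider cost, latency, and capacity limits.
    """
    if full_sql:
        # Full SQL for OLTP: Spanner when read replicas are not enough.
        return "Cloud Spanner" if horizontal_scale else "Cloud SQL"
    if blob_over_10_mb:
        return "Cloud Storage"    # immutable blobs such as images or movies
    if transactions:
        return "Cloud Datastore"  # structured objects, SQL-like queries
    return "Cloud Bigtable"       # large volumes, no multi-row transactions


web_shop_db = suggest_storage(full_sql=True)
global_ledger_db = suggest_storage(full_sql=True, horizontal_scale=True)
video_archive = suggest_storage(blob_over_10_mb=True)
user_profiles = suggest_storage(transactions=True)
sensor_readings = suggest_storage()
```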

Cloud Filestore — Machine Learning-Media Processing

Cloud Filestore is widely used for heavy machine learning tasks, media processing, rendering and similar workloads because of its high throughput. Don’t treat it as a long-term data store, though; think of it as a shared working drive for read-intensive tasks. For example, say you want to run heavy machine learning models across multiple GPU machines in your GCP project. You can create a Filestore instance, keep all your image and text data on it, and then mount that instance on all your GPU machines (if they are on different clusters, you will have to create a Shared VPC network). This way you don’t have to keep copies of the same data on every machine, and because Filestore provides fast read/write operations, it won’t slow down your ML jobs. A lot of people use Google Drive for storing machine learning image and text data, but Google Drive is very slow for this and might not be as time-efficient as you want. If you are low on budget, plain old Google Drive will still work, but Filestore is the best choice so far for such tasks.

Cloud Filestore is a managed file storage service for applications that require a filesystem interface and a shared filesystem for data. Filestore gives users a simple, native experience for standing up managed Network Attached Storage (NAS) with their Google Compute Engine and Kubernetes Engine instances. The ability to fine-tune Filestore’s performance and capacity independently leads to predictably fast performance for your file-based workloads.

Key Features of Filestore:

● Cloud Filestore offers low latency for file operations.

● With Cloud Filestore, you pay a predictable price for predictable performance. You can pick the operations per second and the storage capacity you need with Filestore, which enables you to tune your filesystem for a particular workload. The performance you experience for a particular workload will be consistent over time.

● Cloud Filestore is a fully managed, NoOps service that is integrated with the rest of the Google Cloud portfolio. You can easily mount Filestore file shares on Compute Engine VMs. Filestore is also tightly integrated with Google Kubernetes Engine so your containers can reference the same shared data.

● Leveraging Elastifile, you can scale file storage elastically to suit the evolving needs of your business. When capacity or performance requirements change, easily grow or shrink your cluster(s) accordingly via the GCP-native GUI or via API-based controls.

Thank You !
My LinkedIn : Visit Me on LinkedIn

Experienced Full Stack/ML Engineer and passionate Blogger. Highly skilled in ReactJS, NodeJS, ELK Stack, Kubernetes, Computer Vision, NLP, Statistical Analysis.