Tuesday, April 3, 2012
By Navneet Joneja, Product Manager,
and Ville Aikas, Technical Lead
When evaluating options for cloud storage, customers often wonder, "How can we optimize our storage to get the highest performance possible?" We believe you shouldn't have to, so we do all the optimization for you, enabling you to focus on your application instead of the minutiae of storage optimization.
The performance of cloud storage services (and indeed most web services) depends on two main factors: the network that moves the data between us and the end user, and the performance of the storage service itself.
When you make a request to Google Cloud Storage, one of the key determinants of performance is the network path between you and our servers. This path is critical because if the network is slow or unreliable, it doesn’t really matter how fast the backend is.
There are two main ways to make the network faster:
- Serve the request from as close to the user as possible
- Optimize the network routing between the end-user and the service, including avoiding pockets of network congestion and minimizing the number of network hops between the user and the service.
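To make the first point concrete, here is a minimal sketch of picking the closest site that holds a copy of the data. It is only an illustration: the site names, coordinates, and the `nearest_data_center` helper are hypothetical, and great-circle distance is used as a rough stand-in for real network proximity, which depends on routing and congestion rather than geography alone.

```python
import math

# Hypothetical site names and (latitude, longitude) pairs, for illustration only.
DATA_CENTERS = {
    "us-central": (41.26, -95.86),
    "europe-west": (50.45, 3.82),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_data_center(user_location, replicas):
    """Return the name of the replica site geographically closest to the user.

    `replicas` lists only the sites that actually hold a copy of the object.
    """
    return min(replicas, key=lambda name: haversine_km(user_location, DATA_CENTERS[name]))
```

A user near Paris would be served from the European site when a copy exists there, and from the U.S. site otherwise.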
The other component of system performance is how quickly our servers process your request. The data needs to be managed optimally, and once an end user’s request reaches our servers, we need to serve it as fast as possible. In a sense, Google Cloud Storage is a gigantic filesystem: authorization checks need to happen, the object in question needs to be looked up, and the data requested needs to be read from the physical storage medium and transferred to the end user, all as efficiently as possible.
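The three steps named above (authorization check, lookup, read) can be pictured with a toy in-memory store. This is purely a teaching sketch: the `ToyObjectStore` class and its methods are hypothetical and say nothing about how Google Cloud Storage is actually implemented.

```python
class ToyObjectStore:
    """In-memory stand-in for the authorize, look up, and read steps."""

    def __init__(self):
        self._objects = {}  # (bucket, name) -> bytes
        self._readers = {}  # (bucket, name) -> set of users allowed to read

    def put(self, bucket, name, data, readers):
        """Store an object along with the set of users who may read it."""
        key = (bucket, name)
        self._objects[key] = bytes(data)
        self._readers[key] = set(readers)

    def get(self, user, bucket, name):
        """Serve a read request for `user`, or raise PermissionError."""
        key = (bucket, name)
        # Step 1: authorization check. Unknown objects are also rejected here,
        # so an unauthorized caller cannot tell whether the object exists.
        if user not in self._readers.get(key, set()):
            raise PermissionError(f"{user} may not read {bucket}/{name}")
        # Step 2: look up the object.
        data = self._objects[key]
        # Step 3: read the data and hand it back to the caller.
        return data
```

In a real service each of these steps touches separate systems, which is why the efficiency of every one of them matters to the end-to-end response time.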
So, how do we make sure your requests are served as fast as possible?
- Google Cloud Storage is built on Google’s proprietary network and datacenter technology. We’ve spent more than a decade building out proprietary infrastructure and technology to power Google’s sites (after all, we believe that fast is better than slow). When you use Google Cloud Storage, the same network goes to work for your data.
- We replicate data to multiple data centers and serve an end-user’s request from the nearest data center that holds a copy of the data. We also offer a choice of regions (currently U.S. and Europe) to allow you to keep your data close to where it’s most needed. We then take this one step further. When you upload an object and mark it as cacheable (by setting the standard HTTP Cache-Control header), we automatically figure out how best to serve it using Google’s broad network footprint, including caching it closer to the end-user if possible.
- Finally, you don’t need to worry about optimizing your storage layout (as you would on a physical disk) or your lookups (i.e., directory and naming structure), as you would on most file systems and some other storage services. We take care of all the "file system" optimizations behind the scenes.
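As a rough illustration of marking an object as cacheable, the sketch below builds (but does not send) an HTTP upload request that carries a standard Cache-Control header. The `build_upload_request` helper, the endpoint URL pattern, the bucket and object names, and the one-hour `max-age` are all assumptions made for this example. A real upload would also need an Authorization header with valid credentials.

```python
import urllib.parse
import urllib.request


def build_upload_request(bucket, object_name, data, max_age_seconds=3600):
    """Build a PUT request that marks the uploaded object as publicly cacheable.

    The URL pattern below is an assumption for illustration; check the service
    documentation for the endpoint that applies to your account.
    """
    url = "https://storage.googleapis.com/{}/{}".format(
        bucket, urllib.parse.quote(object_name)
    )
    headers = {
        # Standard HTTP header: any cache may keep the object for max_age_seconds.
        "Cache-Control": "public, max-age={}".format(max_age_seconds),
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Example with hypothetical names; nothing is sent over the network here.
request = build_upload_request("example-bucket", "images/logo.png", b"...")
```

Passing the request to `urllib.request.urlopen` would perform the upload. Choosing a longer `max-age` lets caches hold the object longer, at the cost of slower propagation when the object changes.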
Navneet Joneja loves being at the forefront of the next generation of simple and reliable software infrastructure, the foundation on which next-generation technology is being built. When not working, he can usually be found dreaming up new ways to entertain his intensely curious almost-two-year-old.
Ville Aikas likes to work on tools and services that make developers' lives easier and "just work". When not busy cranking out code, he loves to play soccer with his kids, build robots, and watch F1.
Posted by Scott Knaster, Editor