Categories: Cloud Computing

What is the Google File System?

 

Google File System (GFS)

Google needs a robust, very large storage system to hold vast amounts of data and to let its users access, create and alter that data. Rather than relying on a small number of high-powered computers, Google manages this data through its proprietary Google File System (GFS), a distributed file system based on the principle of using inexpensive commodity components while allowing hundreds of clients to access the data.
Because GFS deals with large data files, the core concerns for its designers were manageability, scalability, fault tolerance and consistency. GFS was designed to manage large data files easily while giving users quick access to the documents they want.

 

 

Structure of Google File System

Manipulating and accessing large data files is time-consuming and uses a great deal of network bandwidth. To handle large files efficiently and reduce access time for users, GFS divides each file into chunks of 64 megabytes (MB). Each chunk has a unique identification number (the chunk handle), and chunks are replicated on different computers to tolerate failures. Each chunk also carries checksums to ensure data integrity.
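The chunking and checksum ideas above can be illustrated with a minimal Python sketch. The 64 MB chunk size comes from the text; the 64 KB checksum block size, the use of CRC32, and the function names `chunk_index`, `chunk_range`, `block_checksums` and `verify` are assumptions made for illustration only and are not GFS's actual interface.

```python
import zlib
from typing import List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk, as described above
BLOCK_SIZE = 64 * 1024         # assumed checksum granularity (illustrative)


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that contains the given file offset."""
    return byte_offset // CHUNK_SIZE


def chunk_range(offset: int, length: int) -> List[Tuple[int, int, int]]:
    """Split a byte range into (chunk index, offset in chunk, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        start_in_chunk = offset % CHUNK_SIZE
        count = min(CHUNK_SIZE - start_in_chunk, end - offset)
        pieces.append((offset // CHUNK_SIZE, start_in_chunk, count))
        offset += count
    return pieces


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Compute one CRC32 per block (CRC32 is a stand-in for the real checksum)."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def verify(data: bytes, checksums: List[int], block_size: int = BLOCK_SIZE) -> bool:
    """Return True if the data still matches the stored per-block checksums."""
    return block_checksums(data, block_size) == checksums
```

A read that crosses a chunk boundary is split into one piece per chunk, so each piece can be requested from a chunk server holding that chunk, and a replica whose data no longer matches its checksums can be detected before the data is returned.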
A GFS deployment consists of clusters of computers, and within each cluster there is one master server, several chunk servers and several clients. Each file chunk is replicated three times on different chunk servers to attain a high level of reliability. One replica is called the primary, while the other two are called secondaries.
The master stores the file system metadata, which includes the mapping from files to chunks, the current chunk locations, the namespace and access control information. The master server communicates with the chunk servers through HeartBeat messages. Clients are applications, such as Google Apps or Google Docs, that issue file requests. File data never passes through the master server: the client asks the master only where a chunk is stored, and the chunk servers then transfer the requested data directly to the client. This keeps the single master from becoming a bottleneck.
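A rough sketch of the master's metadata and the lookup a client performs before reading might look like the following. The class name `Master`, its fields, and the methods `heartbeat` and `lookup` are hypothetical names chosen for this example; the sketch assumes chunk servers report the chunks they hold in their HeartBeat messages and leaves out the namespace and access control information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk


@dataclass
class Master:
    # Mapping from file path to its ordered list of chunk handles.
    file_to_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # Mapping from chunk handle to the chunk servers that hold a replica.
    chunk_locations: Dict[int, List[str]] = field(default_factory=dict)

    def heartbeat(self, server: str, handles: List[int]) -> None:
        """Record the chunks a chunk server reports holding (assumed behavior)."""
        for handle in handles:
            servers = self.chunk_locations.setdefault(handle, [])
            if server not in servers:
                servers.append(server)

    def lookup(self, path: str, byte_offset: int) -> Tuple[int, List[str]]:
        """Return the chunk handle and replica locations for a file offset.

        Only metadata is returned; the client then reads the data
        directly from one of the listed chunk servers.
        """
        handle = self.file_to_chunks[path][byte_offset // CHUNK_SIZE]
        return handle, self.chunk_locations[handle]
```

For example, after `master.heartbeat("cs1", [101, 102])`, a call such as `master.lookup("/logs/a", 0)` returns the handle of the first chunk of that file together with the servers holding it, and the client contacts one of those servers for the bytes.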


Working of Google File System

The Google File System works through two core concepts: mutations and leases. A mutation is a change made to a chunk by a write or append operation. A lease is used to maintain a consistent mutation order across all the replicas. The master server grants the chunk lease to one replica, the primary. The primary picks a serial order for all mutations to the chunk, and the secondary replicas follow that same order. Thus the global mutation order is defined first by the lease grant order chosen by the master and, within a lease, by the serial numbers assigned by the primary. In GFS, a client's write request proceeds through the following steps:

 

1. The client asks the master which chunk server holds the current lease for the chunk and where the secondary replicas are located.
2. The master server replies with the locations of the primary and secondary replicas. The client caches these locations for future mutations and contacts the master again only when the primary becomes unreachable or no longer holds the lease.
3. The client pushes the data to all the replicas and then sends a write request to the primary replica.
4. The primary replica assigns serial numbers to the mutations and forwards the write request, with the same serial order, to the secondary replicas.
5. The secondary replicas reply to the primary, indicating that they have completed the write in the order supplied by the primary.
6. The primary replica then informs the client that the write request has completed and, in case of errors, reports them.
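Steps 3 to 6 above can be sketched as a small in-memory simulation. The classes `Replica` and `Primary` and the function `client_write` are hypothetical names for this example; the sketch assumes the lease has already been granted to the primary (steps 1 and 2) and ignores networking, failures and lease expiry.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer: Dict[str, bytes] = {}           # data pushed but not yet applied
        self.applied: List[Tuple[int, bytes]] = []   # (serial number, data) in order

    def push(self, data_id: str, data: bytes) -> None:
        """Step 3a: receive the data and hold it until a write request arrives."""
        self.buffer[data_id] = data

    def apply(self, serial: int, data_id: str) -> bool:
        """Apply a buffered mutation under the given serial number."""
        self.applied.append((serial, self.buffer.pop(data_id)))
        return True


class Primary(Replica):
    def __init__(self, name: str, secondaries: List[Replica]) -> None:
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id: str) -> bool:
        """Steps 4 to 6: assign a serial number, forward it, collect replies."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        return all(acks)  # reported back to the client


def client_write(primary: Primary, data_id: str, data: bytes) -> bool:
    """Step 3: push the data to every replica, then send the write to the primary."""
    for replica in [primary, *primary.secondaries]:
        replica.push(data_id, data)
    return primary.write(data_id)
```

Because only the primary hands out serial numbers, every replica ends up applying the mutations in the same order, which is the consistency property the lease is meant to provide.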

 





  • Tags: cloud
    Mikel

    • @Mikel Google which is perhaps the largest collector of 'data' has to have system like this for its efficient working. Must say, this is too techie for me to understand but that Google needs this system is perfectly understood (lol)
