For those of us who use the internet on a regular basis, Google is the great answerer of interrogatives. Have a question? Be it common (What is the difference between an acid and a base?) or more obscure (How do monkeys go about peeling bananas?), Google is sure to turn up an answer that, if nothing else, points you in the right direction. There are other search engines, of course, but none of them has become a verb (think “Google it”) or earned its own entry in the Oxford English Dictionary the way Google has. There may have been a time when peers encouraged one another to “Ask Jeeves” or “Yahoo it,” but that time is long gone. Google is the search king in terms of U.S. market share at 65.6% (with Microsoft’s Bing a distant second at 16.5%).
Apart from search, Google operates its own social network (Google+), Gmail, an advertising business that fetched over $43 billion in revenue in 2012, and a host of other products and services. This raises the question: With all of these products and services and the unthinkable amount of data that comes with them, how does a company like Google go about storing its information? If we get a little meta and turn to Google with our question, we learn that the answer lies in thousands upon thousands of servers. In August of 2011, Data Center Knowledge reported that the number was close to 900,000. Pretty remarkable, right?
These servers don’t all serve the same purpose, of course. Instead, each server has designated tasks. Let’s look at some of Google’s server types and the tasks they are responsible for carrying out.
1. Web Servers
Google’s web servers are those that will probably resonate most with the common user, as they are responsible for handling the queries that we enter into Google Search. When a user enters a query, a web server passes it along to other server types (index, spelling, and ad servers, among others), gathers their responses, and returns the results and any ads to the user as an HTML page. Web servers are the ‘results-gathering’ servers, if you will.
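To make that fan-out a little more concrete, here is a minimal Python sketch of a web server gathering results. It is an illustration only and not Google's actual code: the function `handle_query`, its four callable parameters, and the HTML layout are all hypothetical names invented for this example, and the back-end servers are replaced by simple stand-in functions.

```python
from html import escape
from typing import Callable, List, Optional


def handle_query(
    query: str,
    spelling_server: Callable[[str], Optional[str]],
    index_server: Callable[[str], List[int]],
    document_server: Callable[[List[int], str], List[str]],
    ad_server: Callable[[str], List[str]],
) -> str:
    """Fan a query out to the (hypothetical) back ends and build one HTML page."""
    suggestion = spelling_server(query)         # corrected query, or None
    doc_ids = index_server(query)               # IDs of matching documents
    snippets = document_server(doc_ids, query)  # one snippet per document ID
    ads = ad_server(query)                      # may be an empty list

    parts = [f"<h1>Results for {escape(query)}</h1>"]
    if suggestion:
        parts.append(f"<p>Did you mean {escape(suggestion)}?</p>")
    items = "".join(f"<li>{escape(s)}</li>" for s in snippets)
    parts.append(f"<ol>{items}</ol>")
    if ads:
        ad_html = "".join(f"<p>{escape(a)}</p>" for a in ads)
        parts.append(f"<aside>{ad_html}</aside>")
    return "\n".join(parts)


# Stand-in back ends so the sketch runs on its own.
page = handle_query(
    "peeling banannas",
    spelling_server=lambda q: "peeling bananas",
    index_server=lambda q: [7, 42],
    document_server=lambda ids, q: [f"Document {i} snippet" for i in ids],
    ad_server=lambda q: [],
)
print(page)
```

Passing the back ends in as plain functions keeps the sketch self-contained; in a real deployment each one would be a network call to a separate pool of machines.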
2. Data-Gathering Servers
Data-gathering servers do the work of collecting and organizing information for Google. They “spider,” or crawl, the internet via Googlebot (Google’s web crawler), searching for newly added content and revisiting existing content. They are also responsible for indexing that content, keeping the index up to date, and ranking pages based on Google’s search algorithms.
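The crawling step can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not a description of how Googlebot works: the "web" here is a hypothetical in-memory dictionary called `TOY_WEB`, the `.example` addresses are made up, and the `crawl` function simply follows links breadth-first from a seed page.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy "web": each address maps to (page text, outgoing links).
TOY_WEB: Dict[str, Tuple[str, List[str]]] = {
    "a.example": ("acids and bases", ["b.example", "c.example"]),
    "b.example": ("monkeys peel bananas", ["a.example"]),
    "c.example": ("bananas are yellow", ["missing.example"]),
}


def crawl(seed: str, web: Dict[str, Tuple[str, List[str]]]) -> Dict[str, str]:
    """Follow links breadth-first from the seed and collect each page's text."""
    seen = {seed}            # addresses already queued, to avoid loops
    queue = deque([seed])
    fetched: Dict[str, str] = {}
    while queue:
        url = queue.popleft()
        if url not in web:   # dead link: nothing to fetch
            continue
        text, links = web[url]
        fetched[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


pages = crawl("a.example", TOY_WEB)
print(list(pages))
```

The `seen` set is what keeps the crawler from looping forever when two pages link to each other, as the first two toy pages do.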
3. Index Servers
Google’s index servers are where a lot of the “magic” behind Google Search happens. These servers are responsible for returning lists of document IDs that correspond to “documents” (or indexed web pages) wherein the user’s query is present.
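One common way to map a query to document IDs is an inverted index, sketched below in Python. Whether Google's index servers work exactly this way is not something this article establishes, so treat it as an assumption for illustration. The names `build_index`, `lookup`, and `DOCS` are hypothetical, and the choice to require every query word to be present is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return the sorted IDs of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical three-document collection.
DOCS = {1: "acids and bases", 2: "monkeys peel bananas", 3: "bananas are yellow"}
index = build_index(DOCS)
print(lookup(index, "bananas"))          # [2, 3]
print(lookup(index, "monkeys bananas"))  # [2]
```

Note that the lookup returns only IDs, never page content, which is the division of labor described above: the index server answers "which documents?" and leaves "what do they say?" to another server.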
4. Document Servers
Document servers store the document version of web page content. Each page’s content is saved in the form of JPEG files, PDF files, and more, all of which is spread across several servers depending on the type of information. Document servers provide snippets of information to users based on the search terms entered, and they are capable of returning entire documents as well.
The document IDs returned by index servers correspond to documents housed by these servers. Due to the influx of indexed documents each and every day, these servers require more disk space than others. If we were to answer the question – Where does Google store its data? – with one server type, it’d most certainly be the document server.
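The hand-off from document IDs to snippets might look something like the following Python sketch. It is a simplified illustration with made-up names: `STORE` stands in for the documents a document server holds, and `snippet` and `fetch_snippets` are hypothetical helpers that cut a short window of text around the first occurrence of the search term.

```python
from typing import Dict, List

# Hypothetical document store keyed by document ID.
STORE: Dict[int, str] = {
    2: "Field notes: some monkeys peel bananas from the bottom end upward.",
    3: "Ripe bananas are yellow, while unripe ones are green.",
}


def snippet(text: str, term: str, width: int = 20) -> str:
    """Return the text around the first occurrence of term (case-insensitive)."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return text[: 2 * width]  # term absent: fall back to the opening text
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    return text[start:end]


def fetch_snippets(store: Dict[int, str], doc_ids: List[int], term: str) -> List[str]:
    """Look up each document ID and return one snippet per document found."""
    return [snippet(store[doc_id], term) for doc_id in doc_ids if doc_id in store]


for line in fetch_snippets(STORE, [2, 3, 99], "bananas"):
    print(line)
```

Returning the whole document instead of a snippet would simply be `store[doc_id]`, which is why this server type needs the most disk space: it holds the full content, not just pointers to it.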
5. Ad Servers
Ad servers are vital to both Google’s revenue stream and the livelihood of thousands of businesses. These servers are responsible for managing the advertisements that are integral to Google’s AdWords and AdSense services. Web servers interact with these ad servers when deciding which ads (if any) should be displayed for a particular query.
6. Spelling Servers
We didn’t all get A’s in spelling during school, and some of us need a little help when searching. If you have ever searched for something in Google and the results came up with the phrase “Did you mean [correct spelling]?”, know that a spelling server was at work. No matter how search terms are entered, spelling servers work to perform the search anyway, taking advantage of the opportunity to learn, correct and better locate what users seek.
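A very rough idea of how a “Did you mean” suggestion could be produced is sketched below in Python, using the standard library's `difflib.get_close_matches` to find the closest known word. This is a swapped-in, simple technique chosen for illustration, not Google's method. The `VOCABULARY` list, the `suggest` function, and the 0.8 similarity cutoff are all assumptions made for this example.

```python
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical list of known, correctly spelled words.
VOCABULARY = ["acid", "base", "banana", "bananas", "monkey", "monkeys", "peel"]


def suggest(query: str, vocabulary: List[str] = VOCABULARY) -> Optional[str]:
    """Return a corrected query, or None if no word needed correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest vocabulary word, if any is at least 80% similar.
        close = get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(word)  # unknown word with no close match: keep it
    return " ".join(corrected) if changed else None


print(suggest("monkeys peel banannas"))
print(suggest("monkeys peel bananas"))
```

Returning `None` when nothing changed lets the caller decide whether to show a suggestion at all, which matches the experience of only seeing “Did you mean” on misspelled searches.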
In looking at these six server types, we can begin to understand how and where Google stores its data. The infrastructure is fairly complex, of course, and we’ve only scratched the surface. However, the next time you execute a Google search, you might do a little more thinking about what exactly is happening behind the scenes.