WHAT HAPPENS WHEN WE SEARCH GOOGLE.COM ON THE WEB?
Ever wondered what magic goes into reaching a site on the web? It can feel abstract, but none of it is beyond understanding. Here’s a basic explanation of what happens between sending a request to a site or endpoint and getting a response back over the internet.
The Domain Name System (DNS) is, simply put, the phonebook of the internet. To reach a location, you need to know its address, right? Imagine having to reach hundreds of endpoints and memorise every one of their addresses. DNS removes that need. Computers on the internet find each other by Internet Protocol (IP) addresses, so DNS translates human-friendly domain names such as google.com into the IP addresses browsers need to load Internet resources.
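To see the “phonebook” in action, here is a minimal Python sketch that asks the operating system’s resolver to translate a name into IP addresses using the standard `socket.getaddrinfo` call. The helper name `resolve` is our own; looking up `localhost` works offline, while a name like `google.com` needs a network connection and returns whatever addresses your resolver hands back.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Ask the operating system's resolver for the IP addresses of `hostname`."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    # The same IP can appear once per socket type, so de-duplicate with a set.
    return sorted({info[4][0] for info in infos})


# Example (requires network access): resolve("google.com")
```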
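HyperText Transfer Protocol (HTTP) is the protocol browsers and web servers use to exchange data over the internet. When you access a website, your browser sends an HTTP request to the site’s web server, and the server sends back an HTTP response: a status code (for example, 200 OK or 404 Not Found), some headers, and usually a body containing the page itself.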
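A minimal sketch of the request/response exchange, assuming a throwaway local server instead of Google’s real one (which is served over HTTPS on port 443). It uses Python’s standard `http.server` and `http.client` modules; the handler class `HelloHandler` and the `fetch` helper are hypothetical names made up for this example.

```python
import http.client
import http.server
import threading
from typing import Tuple


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """A tiny web server that answers every GET with a fixed HTML page."""

    def do_GET(self) -> None:
        body = b"<h1>Hello</h1>"
        self.send_response(200)                        # status line: 200 OK
        self.send_header("Content-Type", "text/html")  # headers describe the body
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                         # the page itself

    def log_message(self, *args) -> None:
        pass  # keep the example quiet


def fetch(path: str = "/") -> Tuple[int, bytes]:
    """Start the local server, send one GET request, return (status, body)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)          # the browser's side: send the request
        resp = conn.getresponse()          # ...and read the server's response
        status, body = resp.status, resp.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch()` returns the status code together with the page body, which is the same pair a browser receives before it renders the page.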
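Secure Sockets Layer (SSL), now replaced by its successor Transport Layer Security (TLS) although the old name is still widely used, is a security protocol that creates an encrypted link between a web server and a web browser. The server proves its identity with a certificate, which is what people usually mean by an “SSL certificate”.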
Domain Name System (DNS)
Every operating system has its own configuration that tells it which DNS server to ask for lookups (i.e., finding the address for google.com). A DNS server holds a database that works like a key-value store: the key is the website’s name, and the value is the website’s IP address.
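The key-value idea can be sketched as a plain dictionary. The domain names and addresses below are made up for illustration (192.0.2.0/24 is an address range reserved for documentation), and a real DNS server stores several record types (A, AAAA, MX and so on), not just one address per name.

```python
from typing import Dict, Optional

# Hypothetical records: key = domain name, value = IP address.
RECORDS: Dict[str, str] = {
    "example.com": "192.0.2.10",
    "www.example.com": "192.0.2.11",
}


def lookup(name: str) -> Optional[str]:
    """Return the stored IP for `name`, or None if this server has no record."""
    # DNS names are case-insensitive and may be written with a trailing dot.
    return RECORDS.get(name.lower().rstrip("."))
```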
Every domain name has to be registered, and its records are hosted by a DNS provider. Suppose you purchase a domain name through Cloudflare. In that case, Cloudflare becomes the DNS provider, and the records for any subdomain within that domain are stored with Cloudflare. The question now is: how do other DNS providers find this information, so there’s no confusion?
The answer lies in domain name registration. When a domain is registered, it is recorded in a kind of “global dictionary” that every DNS provider can consult, which says which DNS servers are responsible for which domain names. In practice, this dictionary is the hierarchy of root and top-level domain (TLD) name servers: the root servers know who handles each TLD such as .com, and the .com servers know which name servers are authoritative for a domain like google.com.
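The “global dictionary” can be sketched as a chain of lookups, one per level of the hierarchy. This is a simplified model, not real DNS code: the server names are hypothetical, each server holds a single mapping, and it assumes the registered domain is always the last two labels of the name (which is not true for names like example.co.uk).

```python
from typing import Dict, List, Tuple

# Root level: which server handles each top-level domain (hypothetical names).
ROOT: Dict[str, str] = {"com": "tld-com-server"}

# Every other server knows only its own level of the hierarchy.
SERVERS: Dict[str, Dict[str, str]] = {
    # TLD server: which name server is authoritative for each registered domain.
    "tld-com-server": {"example.com": "ns1.example-dns-provider"},
    # Authoritative server: the actual record (documentation-range address).
    "ns1.example-dns-provider": {"www.example.com": "192.0.2.11"},
}


def resolve_iteratively(name: str) -> Tuple[str, List[str]]:
    """Walk root -> TLD -> authoritative server; return (ip, steps taken)."""
    steps: List[str] = []
    tld = name.rsplit(".", 1)[-1]
    tld_server = ROOT[tld]
    steps.append(f"root -> ask {tld_server}")

    domain = ".".join(name.split(".")[-2:])  # simplification: last two labels
    auth_server = SERVERS[tld_server][domain]
    steps.append(f"{tld_server} -> ask {auth_server}")

    ip = SERVERS[auth_server][name]
    steps.append(f"{auth_server} -> {ip}")
    return ip, steps
```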
When you visit ‘google.com,’ your system asks its configured resolver; suppose that is Cloudflare’s public resolver (1.1.1.1). Cloudflare first checks its cache (data kept in memory) to see if someone has looked up ‘google.com’ recently. If not, Cloudflare consults the registration hierarchy to find out which server manages the records for that domain name and asks that server for the address of google.com. The server returns the address to Cloudflare, which caches it and returns it to you. The next time anyone makes this request, Cloudflare answers from its cache instead of repeating the whole round trip.
Every DNS record has a TTL (time to live), chosen by the domain’s owner, which says how long resolvers may cache it. It is often an hour (3600 seconds), but it can be much shorter or longer. Once the TTL expires, the cached entry is discarded, because the domain’s IP address may have changed in the meantime. The next person to request the name after that makes Cloudflare do the full round trip again.
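Caching and TTL expiry together can be sketched as a small resolver that only goes upstream on a cache miss or an expired entry. The `CachingResolver` class, the `upstream` function and the injectable `clock` are assumptions made so the behaviour can be tested without a network or waiting an hour; a real resolver also caches failures and handles many record types.

```python
import time
from typing import Callable, Dict, Tuple


class CachingResolver:
    """Caches answers from `upstream` until their TTL runs out."""

    def __init__(
        self,
        upstream: Callable[[str], Tuple[str, int]],   # returns (ip, ttl_seconds)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.upstream = upstream
        self.clock = clock
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)
        self.upstream_queries = 0                      # how many full round trips we made

    def resolve(self, name: str) -> str:
        now = self.clock()
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]                 # fresh cache hit: no round trip
        self.upstream_queries += 1           # miss or expired: ask upstream again
        ip, ttl = self.upstream(name)
        self.cache[name] = (ip, now + ttl)   # remember it until the TTL expires
        return ip
```

With a one-hour TTL, repeated lookups within the hour are answered from the cache, and the first lookup after the hour triggers a new upstream query.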
When Cloudflare returns the IP address to your system, your operating system (and often your browser) caches it too, so it doesn’t have to ask Cloudflare again until the TTL runs out. That’s how DNS is resolved.
When we visit a site and get a “this site can’t be reached” error, the traffic often never got to the application at all. If the error mentions DNS (Chrome, for example, shows DNS_PROBE_FINISHED_NXDOMAIN), we know the problem came from resolving the name, not from the application we were trying to reach.
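One way to tell the two kinds of failure apart in code is to separate the name lookup from the connection attempt. In this sketch (the `diagnose` helper is hypothetical), Python’s `socket.getaddrinfo` raises `socket.gaierror` when the name cannot be resolved, while a failed connection to an address that did resolve raises a different `OSError`, such as `ConnectionRefusedError` or a timeout.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'dns-failure', 'connection-failure' or 'ok' for host:port."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"          # the name never turned into an IP address
    family, sock_type, proto, _, addr = infos[0]
    try:
        with socket.socket(family, sock_type, proto) as s:
            s.settimeout(timeout)
            s.connect(addr)           # DNS worked; now try to reach the server
    except OSError:
        return "connection-failure"   # e.g. refused or timed out
    return "ok"
```

Names under the reserved `.invalid` top-level domain never resolve, so `diagnose("nonexistent.invalid", 80)` is a handy way to see the DNS branch.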
Now that your system knows the IP address of the server you’re asking for, how does your traffic get to that server?
HyperText Transfer Protocol (HTTP)
Our system first tries to establish a connection with the server. To get there, the traffic has to pass through several networks; each step from one network or router to the next is called a “hop”. The data we want to send is broken down into Transmission Control Protocol (TCP) segments, usually just called packets, which are generated within our system. Each packet has a header and a body: the header carries information such as the source and destination (strictly, the IP addresses sit in the IP header that wraps each TCP segment) and a sequence number saying where the packet’s data belongs, while the body carries a piece of the data we want to send. When the packets arrive at the destination server, they are put back together in order of their sequence numbers to rebuild the complete message.
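Splitting a message into numbered packets and rebuilding it can be sketched like this. It is a toy model, assuming a fixed chunk size and no loss; as in real TCP, the sequence number here is the byte offset of the chunk, but real TCP also handles retransmission, acknowledgements and 32-bit wraparound, which are left out. The names `Segment`, `segment`, `arrive_out_of_order` and `reassemble` are made up for the example.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    seq: int        # byte offset of this chunk within the original message
    payload: bytes  # the chunk of data this packet carries


def segment(data: bytes, size: int) -> List[Segment]:
    """Split `data` into chunks of at most `size` bytes, numbered by offset."""
    return [Segment(seq=i, payload=data[i:i + size]) for i in range(0, len(data), size)]


def arrive_out_of_order(segments: List[Segment], seed: int = 0) -> List[Segment]:
    """Simulate packets taking different routes: return them shuffled."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def reassemble(segments: List[Segment]) -> bytes:
    """Sort the chunks by sequence number and join them back together."""
    return b"".join(s.payload for s in sorted(segments, key=lambda s: s.seq))
```

However the packets are shuffled in transit, sorting by `seq` restores the original bytes.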
Our system’s network doesn’t know the full route to the destination; it only knows the networks it is directly connected to. So it sends the packet one hop (from one router to the next) to a directly connected network that is closer to the destination, that network hands it on to the next nearest network, and so on until the packet arrives.
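In the image above, the hop limit is 64. Every packet carries a hop limit (the IP “time to live” field, not to be confused with the DNS TTL); each router it passes through decrements it by one, and if it reaches zero before the packet arrives, the packet is dropped and the sender gets an error back instead of a response. This limit can be adjusted. Different packets from the same message may also follow different routes.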
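Hop-by-hop forwarding with a hop limit can be sketched as follows. The topology and router names are hypothetical, and each router knows only which neighbour to pass a packet to next (its routing table); the sketch decrements the hop limit at every router and gives up when it reaches zero.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    destination: str
    ttl: int  # hop limit, decremented by every router that forwards the packet


# Hypothetical routing tables: router -> {destination: next hop}.
ROUTING_TABLES: Dict[str, Dict[str, str]] = {
    "home-router": {"google-server": "isp-router"},
    "isp-router": {"google-server": "backbone-router"},
    "backbone-router": {"google-server": "google-edge"},
    "google-edge": {"google-server": "google-server"},
}


def forward(packet: Packet, start: str) -> Tuple[List[str], str]:
    """Follow next hops from `start`; return (path taken, outcome)."""
    path = [start]
    node = start
    while node != packet.destination:
        packet.ttl -= 1                    # each router spends one unit of the limit
        if packet.ttl <= 0:
            return path, "time exceeded"   # dropped; the sender gets an error instead
        node = ROUTING_TABLES[node][packet.destination]  # only the next hop is known
        path.append(node)
    return path, "delivered"
```

With a limit of 64 the packet easily covers this four-hop route; with a limit of 2 it is dropped partway.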
The connection itself is set up with a handshake before any page data flows. Our system sends a SYN packet asking to connect; the destination server replies with a SYN-ACK, acknowledging the request and signalling that it is ready to receive; and our system answers with an ACK. This exchange is known as the TCP three-way handshake, and the acknowledgement packets may travel back by a different route than the ones we sent. Once the connection is established, the browser sends its HTTP request, and the server can begin sending back the webpage we see when we type google.com.
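The three-way handshake can be written out as the three segments exchanged, with their sequence (`seq`) and acknowledgement (`ack`) numbers. This is an illustrative model, not a network stack: real initial sequence numbers are chosen randomly, and in practice the operating system performs this exchange for you when a program calls `connect()` on a TCP socket. The function name `three_way_handshake` is our own.

```python
from typing import List, Optional, Tuple

# (flags, seq, ack); ack is None when the ACK flag is not set.
Segment = Tuple[str, int, Optional[int]]


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the SYN, SYN-ACK and ACK segments for the given initial sequence numbers."""
    syn = ("SYN", client_isn, None)                    # client: let's talk, my numbering starts here
    syn_ack = ("SYN-ACK", server_isn, client_isn + 1)  # server: got your SYN, mine starts here
    ack = ("ACK", client_isn + 1, server_isn + 1)      # client: got yours; data can now flow
    return [syn, syn_ack, ack]
```

Each side acknowledges the other’s initial number plus one, which is how both ends confirm they heard each other before any data is sent.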
Hypertext Transfer Protocol Secure (HTTPS)
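HTTPS tells us that the information being transported is encrypted. To serve a site over HTTPS, the server needs an SSL/TLS certificate. The server keeps a private key secret; the certificate contains the matching public key and is signed by a certificate authority that the browser trusts. During the handshake, the server sends its certificate to the browser, and the two sides use these keys to agree on a shared session key: older versions had the browser encrypt a secret with the server’s public key, which only the private key can decrypt, while modern versions use a key exchange that the server authenticates by signing with its private key. Everything sent afterwards is encrypted with that session key. If there is a man-in-the-middle attack (someone intercepting the packets), the attacker sees only encrypted data and, without the server’s private key, can neither read it nor convincingly impersonate the server. SSL/TLS is essentially an extra layer of security on top of the connection, built on exchanging keys.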
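The public/private key idea can be illustrated with textbook RSA and deliberately tiny numbers (the p = 61, q = 53 example commonly used to teach RSA). This sketch is insecure by design (tiny primes, no padding) and only shows the core property: what the public key encrypts, only the private key decrypts. Real TLS uses far larger keys, proper padding or key exchanges such as ECDHE, and then switches to a symmetric session key. The helper names are made up for the example.

```python
from typing import Tuple

Key = Tuple[int, int]  # (exponent, modulus)


def make_toy_keys(p: int, q: int, e: int) -> Tuple[Key, Key]:
    """Build a textbook RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: modular inverse of e (Python 3.8+)
    return (e, n), (d, n)  # (public key, private key)


def encrypt(message: int, public_key: Key) -> int:
    """Anyone holding the public key can encrypt."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key: Key) -> int:
    """Only the holder of the private key can decrypt."""
    d, n = private_key
    return pow(ciphertext, d, n)
```

For example, `make_toy_keys(61, 53, 17)` gives the public key (17, 3233) and the private key (2753, 3233); encrypting the message 65 yields 2790, and decrypting 2790 gives 65 back.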
Although this is not the typical starter documentation, at least I’ve proved it’s not magic!