Latency and Throughput – System Design Concepts
Performance is a crucial part of any system. The better the performance, the faster requests are processed and the better the user experience. It's also a common interview topic.
Parameters to measure the performance of a system can be latency, throughput, availability, scalability, and reliability.
1 Latency
Latency refers to the delay between the time a request is made and the time a response is received. You have likely experienced it in day-to-day life: some apps respond quickly while others do not, and the same goes for websites. Latency is typically measured in milliseconds.
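Latency of this kind can be measured directly by timing a request. Here is a minimal sketch in Python; the request handler and its 50 ms delay are hypothetical stand-ins for a real backend call:

```python
import time

def measure_latency(fn):
    """Return (result, elapsed_ms) for a single call to fn."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Hypothetical request handler that simulates ~50 ms of backend work.
def handle_request():
    time.sleep(0.05)
    return "ok"

result, ms = measure_latency(handle_request)
print(f"response={result}, latency={ms:.1f} ms")
```

In production you would collect these timings across many requests and look at percentiles (p50, p95, p99) rather than a single measurement.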
1.1 Is latency crucial for System Design?
Latency is important because it directly affects the user experience. As users, we expect a system to be responsive and provide real-time feedback for our queries, and an unresponsive site leads to a poor user experience. High latency also degrades overall system performance, making the system harder to scale.
1.2 What causes latency?
You might be wondering what causes latency. There can be many reasons for delay in a system's response. Let's look at a few of the major ones:
- Network Delays
Network delay occurs due to the time taken by data packets to travel across the network. It can be caused by network congestion, routing (the time routers take to process a packet and forward it along the correct path), and physical distance.
- Processing Time
Processing time is the time a system takes to process requests or data. It can become significant because of slow machines, large data sets, or inefficient algorithms.
- I/O Delays
I/O delays can occur when data is transferred between two components of a system. They can be caused by slow disk access times or inefficient data transfer protocols.
- Software Design
Poorly designed software, inefficient data structures or algorithms, or an improper caching implementation can also introduce latency into a system.
1.3 How to improve latency?
Let’s see some important ways by which we could improve latency.
- Caching
We can store the most frequently used data in a cache (cache memory is made of static random-access memory (SRAM) cells) for a faster response. Caching significantly reduces latency by serving cached data immediately, without waiting for a response from the backend system.
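As a minimal sketch of this idea, the in-memory cache below uses Python's `functools.lru_cache`; the user-profile lookup and its delay are hypothetical stand-ins for a slow backend call:

```python
import functools
import time

call_count = 0  # tracks how often the slow backend is actually hit

@functools.lru_cache(maxsize=128)
def fetch_user_profile(user_id):
    """Hypothetical slow backend lookup; results are cached after the first call."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulate network/database delay
    return {"id": user_id, "name": f"user-{user_id}"}

fetch_user_profile(42)   # slow path: hits the backend
fetch_user_profile(42)   # fast path: served from the cache
print(call_count)        # the backend was called only once
```

A real system would typically use a shared cache such as Redis or Memcached, with an eviction policy and expiry times, but the latency benefit is the same: repeated reads skip the slow backend entirely.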
- Content Delivery Networks (CDNs)
We can use CDNs. A CDN is a geographically distributed network of servers and data centers that work together to deliver web content. When a user requests content from a website or application that uses a CDN, the request is routed to the nearest server or data center in the CDN network, rather than to the origin server where the content is hosted.
CDNs store cached copies of content closer to the end-users, reducing the distance data has to travel and improving latency. This is especially useful for serving static content such as images, videos, and other media files.
- Optimize network infrastructure
We can use better networking equipment (routers and switches) to reduce network latency. It is also advisable to use a load balancer to distribute requests evenly across servers, avoiding overload on any single server and the latency that comes with it.
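The simplest load-balancing strategy is round-robin, which hands each incoming request to the next server in a fixed rotation. A minimal sketch (the server addresses are made up for illustration):

```python
import itertools

class RoundRobinBalancer:
    """Distribute requests evenly across a fixed pool of servers."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # endless rotation over the pool

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
assigned = [lb.pick() for _ in range(6)]
print(assigned)  # each server receives every third request
```

Production load balancers add health checks, weighted distribution, and strategies such as least-connections, but the core idea of spreading load to avoid hot spots is the same.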
- Optimize database queries
If database queries are not optimized, response times suffer. Useful query-optimization techniques include indexing, proper database schema design, optimized join queries, and monitoring query performance.
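Indexing is the most common of these techniques. The sketch below uses SQLite to show the effect: the table name and data are invented for illustration, and the query plan output format is SQLite-specific:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

# Without an index, this lookup has to scan every row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchone()
print(plan)  # typically reports a full scan of the orders table

# With an index on customer_id, the engine jumps straight to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchone()
print(plan)  # now uses idx_orders_customer
```

Indexes speed up reads at the cost of slightly slower writes and extra storage, so they are best placed on columns that appear frequently in `WHERE` clauses and joins.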
- Minimize data transfer
Minimize the amount of data transferred between the client and the server by using compression, reducing the size of images, and avoiding sending unnecessary data.
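Compression is often the easiest win here, especially for text-based payloads such as JSON, which are highly repetitive. A minimal sketch with Python's `gzip` module (the payload is invented for illustration):

```python
import gzip
import json

# Hypothetical JSON payload with many repeated keys and values.
records = [
    {"user_id": i, "status": "active", "region": "us-east-1"}
    for i in range(500)
]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

In practice, HTTP servers and clients negotiate this automatically via the `Accept-Encoding` and `Content-Encoding` headers, so enabling gzip (or Brotli) at the web-server level is usually all that is needed.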
Overall, improving latency requires a holistic approach to system design, including optimizing network infrastructure, improving database performance, and minimizing data transfer.
2 Throughput
Throughput is the amount of data a system can transfer in a given amount of time, commonly measured in bits per second. It can also be described as the rate at which a system can process, produce, or deliver a certain amount of data or work over a given period. In system design, throughput is an important performance metric used to evaluate the efficiency of a system.
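Throughput can be estimated the same way as latency, by dividing work done by elapsed time. A minimal sketch (the per-payload delay is a hypothetical stand-in for real processing):

```python
import time

def process_batch(payloads):
    """Process each payload and return (total_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    for p in payloads:
        time.sleep(0.001)  # simulate per-request work
        total += len(p)
    return total, time.perf_counter() - start

payloads = [b"x" * 1024] * 50  # fifty 1 KiB messages
total_bytes, elapsed = process_batch(payloads)
throughput = total_bytes / elapsed
print(f"throughput: {throughput / 1024:.0f} KiB/s")
```

Note that latency and throughput are related but distinct: a system can have low per-request latency yet low throughput (few requests handled at once), or high latency yet high throughput (many requests processed in parallel).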
2.1 What causes low Throughput and how to improve throughput?
- Inefficient Algorithms
It is one of the most common reasons a system does not perform at its fullest. If the algorithms used in a system are slow or inefficient, less data is processed per unit of time, which lowers throughput.
Optimize algorithms for better throughput by understanding the trade-offs between time complexity and memory usage and choosing algorithms with lower time complexity. Also, choose the data structures that best fit your requirements.
One can also enable parallel processing by distributing the load across machines.
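Within a single machine, the same idea applies to threads or processes. The sketch below uses Python's `concurrent.futures` to process items in parallel; the per-item delay is a hypothetical stand-in for I/O-bound work such as network calls:

```python
import concurrent.futures
import time

def process(item):
    """Hypothetical I/O-bound task taking ~20 ms per item."""
    time.sleep(0.02)
    return item * item

items = list(range(20))

# Serially this would take about 20 * 0.02 = 0.4 s; with 8 workers
# running concurrently, the batch finishes in a fraction of that.
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process, items))
parallel_s = time.perf_counter() - start

print(f"processed {len(results)} items in {parallel_s:.2f}s")
```

Threads help when the work is I/O-bound; for CPU-bound work in Python, a `ProcessPoolExecutor` (or distributing across machines, as noted above) is the better fit.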
- Inadequate Hardware Resources
One obvious reason for lower throughput could be the system having inadequate CPU, memory, and network bandwidth.
Ensure the system has adequate hardware resources to handle large volumes of requests, preventing system bottlenecks and improving system throughput.
- Poor Network Design
You can optimize the network architecture by minimizing excessive hops (the number of routers a packet passes through while traveling from source to destination), using efficient network protocols (e.g., Multiprotocol Label Switching (MPLS), Open Shortest Path First (OSPF)), and increasing network capacity (higher-bandwidth links, fiber-optic cabling, etc.) to maintain smooth data flow and improve throughput.
- Synchronization and locking issues
Mitigate lock contention (which happens when a thread tries to acquire a lock that is already held by another thread) and delays in multi-threaded systems by implementing effective synchronization techniques and minimizing unnecessary locking, enhancing overall system performance and throughput.
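One standard way to reduce lock contention is sharding: instead of one lock that every thread fights over, split the shared state into several independently locked pieces. A minimal sketch of a sharded counter (the shard count and the shard-selection rule are illustrative choices):

```python
import threading

class ShardedCounter:
    """Reduce lock contention by splitting one counter into N shards,
    each with its own lock, so threads rarely collide on the same lock."""

    def __init__(self, shards=8):
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [0] * shards

    def increment(self):
        # Pick a shard based on the calling thread's id, so each thread
        # mostly touches its own lock instead of one global lock.
        i = threading.get_ident() % len(self._locks)
        with self._locks[i]:
            self._counts[i] += 1

    def value(self):
        # Safe to sum without locks once all writer threads have joined.
        return sum(self._counts)

counter = ShardedCounter()
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value())  # 4 threads x 1000 increments = 4000
```

The same pattern appears in production systems as striped locks, per-core counters, and partitioned queues; the trade-off is that reading an exact global total becomes slightly more expensive.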