SQL Server Architecture: An In-Depth Analysis

MS SQL Server Architecture

Microsoft SQL Server follows a client-server database architecture, where the SQL Server process initiates when a client sends a request to the server. The SQL Server then processes the request and, finally, responds to the client with the results of the processed data or any relevant information.

There are three main components of MS SQL Server, and let's explain the each component in detail.

1. SQL Server Network Interface (Protocol Layer)

2. Relational Engine

3. Storage Engine

 1. SQL Server Network Interface (SNI) Layer

It is referred to as the SQL Server Network Interface, and it contains SQL Server protocols. The following are the primary protocols associated with SQL Server:

a.    Shared memory

b.    Named pipes

c.    TCP/IP

d.    TDS (Tabular Data Stream)

a.   Shared Memory

Shared memory is the simplest protocol and is used by default for local connections when the client application and the SQL Server are located on the same machine.

b.   Named Pipes

Named pipes is a protocol that can be used if the client application and the SQL Server are connected via a local area network (LAN). By default, this protocol is disabled and default port is 445 for named pipes.

 c.   TCP/IP

TCP/IP is the primary protocol used to connect to SQL Server from a remote machine using an IP address and a port number. The default TCP port for SQL Server is 1433. This protocol is required when the client application and the SQL Server are on separate machines.

 d.   TDS (Tabular Data Stream)

TDS, or Tabular Data Stream, is a crucial component in the communication process between client and server in SQL Server environments. All three protocols—Shared Memory, Named Pipes, and TCP/IP—utilize TDS packets, which are encapsulated within network packets. This encapsulation facilitates the seamless transfer of data from the client machine to the server machine.


2. Relational Engine

The Relational Engine, also referred to as the Query Processor, is responsible for handling the execution of queries as they are received from client applications.

There are three major components of the relational engine.

a. CMD Parser (Query Parser)

b. Optimizer

c. Query Executor

 a.   CMD Parser (Query Parser)

The CMD Parser is the first component of Relational Engine which known as Query Parser. It receives the query data that sent from the protocol layer. The main task of CMD Parser is to check syntactic and semantic errors and finally generate a query tree.

Syntactic Error: Each database has its own set of keywords and grammar rules. When a query is received by the CMD parser, it first checks the keywords and grammar rules. If the received query doesn't follow to the language syntax or grammar rules, it returns an error.

Semantic Errors: The semantic check analyzes whether the queried table and column exist in the database schema. If they exist, the process of binding associates them with the query. This association is known as binding. The CMS Parser throws an error message if the queried table and column do not exist.

b.   Optimizer

The work of the optimizer is to create an execution plan for the user’s query. This is the plan that will determine how the user query will be executed. Not all query need to optimize only DML commands such as SELECT, INSERT, UPDATE, and DELETE that are marked and sent to the Optimizer. The calculation of their cost is based on CPU and memory usage, as well as input/output needs.

 The Optimizer follow three phases. Details are given below:

Phase 0: Trivial Plan

Initially, the Optimizer checks for a straightforward plan. If there is none available, it moves on to the first phase.

Phase 1: Transaction Processing Plan

In this step, the goal is to find the best transaction processing plan. Initially, a straightforward plan with one index per table is considered. If such a plan isn't sufficient, the Optimizer looks for a more complex plan involving multiple indexes per table.

Phase 2: Parallel Processing & Optimization

If the Optimizer can't speed up the query using the mentioned methods, it turns to parallel processing, considering the capabilities and setup of the user's machine. If that's not feasible, the final optimization kicks in, aiming to explore any other potential options to execute the query in the most cost-effective way.

 c.    Query Executor

Finally, the Query Executor calls the access method, initiating the retrieval of necessary data from the storage engine. Once this data is obtained, it is sent back to the protocol layer. Ultimately, the data is transmitted to the end user.

3. Storage Engine

The Storage Engine's primary task is to manage the storage and retrieval of data from storage systems such as disks or SANs. It includes a buffer manager and a transaction manager, both of which collaborate to interact with data and log files based on the requirements of the query.

Data File Types

SQL Server stores its data, including user and system tables, indexes, and SQL code such as stored procedures, views, and functions, in various types of data files. These data files physically store data in pages, and each page has a size of 8KB, which is the smallest data storage unit in SQL Server.

There are three data files.

a.    Primary file

i. Every database contains one Primary file.
ii. This store all important data related to tables, views, Triggers, etc.
iii. Extension is .mdf usually but can be of any extension.   

Ø      b.    Secondary file

    i. Database may or may not contains multiple Secondary files.
    ii. This is optional and contain user-specific data.
    iii. Extension is .ndf usually but can be of any extension.

c.    Log file

    i. Extension is .ldf
    ii. Used for Transaction Management. 
    iii. This is used to recover from any unwanted instances. Perform important task of Rollback to    uncommitted transactions.         

Access Method

It acts as an interface between the query executor and the Buffer Manager/Transaction Logs, determining whether the query is a SELECT or a non-SELECT statement.

a. If the query is SELECT statement, the query is pass to the Buffer Manager for further processing.
b. And if query if NON-SELECT statement, the query is pass to Transaction Manager. This mostly includes the UPDATE statement.

Buffer Manager

The core functions of buffer manager are given below:

a.  Plan Cache

b. Data Parsing: Buffer cache & Data storage

c. Dirty Page


a.    Plan Cache


The Manager first checks if there is an existing execution plan in the Plan Cache. If it finds one, it uses that plan along with its related data. However, for the first-time execution of a complex query, it's efficient to store the plan in the Plan Cache. This way, the SQL server can quickly access it the next time the same query comes in, ensuring faster availability.


b.    Data Parsing: Buffer cache & Data storage

The Buffer Manager allows access to the needed data. If there is already an execution plan in the Plan Cache, it is a soft parsing scenario. The data is available in the Data Cache and utilized by the Query Executor. This approach offers improved performance due to fewer I/O operations. However, in the absence of an execution plan in the Plan Cache, we encounter hard parsing. In this case, data has to be fetched from the storage.


c.    Dirty Page


 It is saved as a processing logic of Transaction Manager. More details about this will be covered in the Transaction Manager section.


Transaction Manager

 The Transaction Manager is invoked when the query comprises a non-SELECT statement. 

Log Manager

  1. Log Manager keeps a track of all updates done in the system via logs in Transaction Logs.
  2. Logs have Logs Sequence Number with the Transaction ID and Data Modification Record.
  3. This is used for keeping track of Transaction Committed and Transaction Rollback.

Lock Manager

  1. During Transaction, the associated data in Data Storage is in the Lock state. This process is handled by Lock Manager.
  2. This process ensures data consistency and isolation. Also known as ACID properties.

Execution Process

  1. Log Manager Start logging and Lock Manager locks the associated data.
  2. Data’s copy is maintained in the Buffer cache.
  3. Copy of data supposed to be updated is maintained in Log buffer and all the events updates data in Data buffer.
  4. Pages which store the data is also known as Dirty Pages.
  5. Checkpoint and Write-Ahead Logging involve running a process that marks all dirty pages to be written to the disk, although the pages still stay in the cache. This process occurs roughly once per minute. However, before reaching the data page on the log file, the page is initially sent from the Buffer log. This mechanism is commonly known as Write-Ahead Logging.
  6. Lazy Writer handles situations where dirty pages can stay in memory. When the SQL server detects a significant load and requires buffer memory for a new transaction, it releases dirty pages from the cache. It follows the LRU (Least Recently Used) algorithm to clean pages from the buffer pool to disk.
  7. Log Manager keeps a track of all updates done in the system via logs in Transaction Logs.
  8. Logs have Logs Sequence Number with the Transaction ID and Data Modification Record.
  9. This is used for keeping track of Transaction Committed and Transaction Rollback.

Post a Comment