Entity Framework Core SQL Server

Index on SQL Server Guid Column

If you’re working with SQL Server, chances are good that you have come across the need to create an index on a guid column. In some cases, although I don’t advise it, guids are used as the primary keys on every table. This typically results in a clustered index on these columns. Other times you may need to use a unique identifier as a secondary Id on an entity for business reasons, such as an employee id. Regardless of which type of index you use or your reasons for doing so, index fragmentation becomes a bigger issue with guids.

Indexes and Fragmentation

Before we get into what index fragmentation is and why you should care, lets go over quickly how SQL Server stores data. SQL Server uses a unit of data called pages and each page is 8KB. Extents represent eight pages and all pages belong to an extent. When you insert data into a table the data will be added to an existing page that has enough space. If this is not possible, a new page will be created.

The problem of fragmentation arises due to the inserting, updating and deleting of records. There are two types of fragmentation: internal and external. In the case of internal fragmentation, fragmentation is measured by the amount of free space. If a page is 100% full then it has no fragmentation. As we delete records from a page, we increase the fragmentation. When a query is run on an index with high internal fragmentation, the total amount of I/O is greater than it should be. What could have been stored on ten pages is instead spread out across twenty pages and thus more pages are scanned than necessary.

External fragmentation on the other hand is caused by the logical ordering of the pages not matching the physical order. If we have ten pages with free space and we perform inserts, SQL Server will attempt to use pages that have free space. Over time as we update the row, SQL Server may end up adding data to pages that are not sequentially in order with the pages. As an example: We have 5 pages and we insert a record that is spread across pages two and three. Data is deleted from page one. We then update our previous record and instead of adding data to pages four and five, data is added to page one.

The Problem with an Index on a Guid Column

Besides increasing the amount of space needed due to their increased size, guids also cause increased internal fragmentation from their non-sequential nature. In the case of clustered indexes, data is sorted and stored based on the the key value. Since unique identifiers are random and non-sequential by nature the data is not inserted in order which in turn leads to extra page splits.

Non-clustered indexes will result in a smaller record which will in turn result in more data per page and potentially reduce the amount of page splits. However, the random nature of guids will still cause increased fragmentation.

Using Sequential Ids

If you need to use a guid and don’t need a completely random generated value, then using newsequentialid on SQL Server will help to reduce the amount of fragmentation in your indexes. Be aware though, this is not for use in cases where security is concerned due to the sequential nature of the value generated. As you can see in the image below, there is very little difference between each inserted in guid:

index

As you can see, each id has very little difference between the previous. This is a much better choice for indexes, but a terrible choice if you have security concerns since these can obviously be guessed based on previous ids.

If you want to use sequential ids and an index (in this case non-clustered) with Entity Framework Core, you can do so easily with the following code:

builder.Property(p =>p.EmployeeId).HasDefaultValueSql("newsequentialid()");
builder.HasIndex(p => p.EmployeeId).HasName("IX_Employee_EmployeeId").ForSqlServerIsClustered(false);

From there, it would be a good idea to set your C# model Guid to a private set to reduce the chance that it is generated from C#:

public Guid EmployeeId { get; private set; }

If your guid is a business object that has behavior, consider changing it from a primitive to a value object.

Defragmenting SQL Server Index

Start off by running the following script which will pull back the indexes with a fragmentation greater than 30%. I got this script from here. You can of course just remove the where clause to pull back everything if you like, but the rules on defragmentation make this useful as you will soon find out. Change the where clause to WHERE indexstats.avg_fragmentation_in_percent > 5 AND < 30 to check the second rule.

SELECT OBJECT_NAME(ind.OBJECT_ID) AS TableName, 
ind.name AS IndexName, indexstats.index_type_desc AS IndexType, 
indexstats.avg_fragmentation_in_percent 
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats 
INNER JOIN sys.indexes ind  
ON ind.object_id = indexstats.object_id 
AND ind.index_id = indexstats.index_id 
WHERE indexstats.avg_fragmentation_in_percent > 30 
ORDER BY indexstats.avg_fragmentation_in_percent DESC

If the percentage of fragmentation is greater than 30%, then run INDEX REBUILD WITH (ONLINE = ON). If the index fragmentation is greater than 5% and less than 30%, run ALTER INDEX REORGANIZE. For more information on rebuilding and reorganizing indexes I recommend reading the Microsoft page on the subject. To do this in SQL Server Management Studio, just go to the index and right click it. Then choose either Reorganize or Rebuild based on the previous rules.

You can find the code for this post here.

Author

Sean Leitzinger

Comment (1)

  1. Jeff Moden
    December 23, 2021

    Hi Sean,
    First, I appreciate anyone that steps up to the plate to share their knowledge. Thank you for being one of the good guys there.

    Shifting gears to the subject at hand, watch the following video. When you’re done watching, you might want to make a revision to this article. Be wary of time marks 15::00, 30:15, and 45:25. They inserted some sudden, loud, and annoying ads and they are headset lifting moments.

    What’s in the video? I destroy the myth of Random GUID fragmentation, prove that the alternative of using NEWSEQUENTIALID() is probably not the best choice, lay waste to what most people have used as “Best Practice” Index Maintenance (especially for GUIDs) for more than 2 decades, and provide much better alternatives resulting in being able to insert 100,000 rows per simulated day for 58 simulated days for a total of 5.8 million rows with (get this) almost no page splits, less than 1% fragmentation, and no index maintenance for the entire 58 days. Based on your obvious writing skills, I think you’re going to like the ‘tube a whole lot. And, yes… if you have any comments about the presentation, I’d love to hear them.

    Again, nice article based on what we’ve all been taught in the past but I believe you’ll want to change this good article. Thanks for hearing me out. Heh… I aim to please… I sometimes miss but I’m always aiming. 😀

Leave a comment

Your email address will not be published. Required fields are marked *