.NET Core ASP.NET Core C# Dapper Entity Framework Core

Maximizing Entity Framework Core Query Performance

Entity Framework Core query performance is something that comes up often when working on projects that rely on it heavily. I have often heard that Entity Framework is not performant enough which then leads to everything being written as a stored procedure. Usually this happens for two main reasons: developers aren’t familiar with how to write queries in a performant manner and developers that are more comfortable with SQL want to develop everything in their technology of choice. Entity Framework is not a silver bullet for everything. There are times when it simply can’t deliver the performance needed or when it is simply functionally incapable of doing what is necessary due to limitations of the framework. That being said, there is no reason you can’t write the vast majority of your application with it and reap all the benefits it provides.

Handcrafting Queries for Performance

First, let’s start off with one of the more common ways that people write a query with the following:

return new OkObjectResult(await userManager.Users
    .Where(m => m.TenantId == TenantId)
        .AsNoTracking()
    .Include(m => m.Picture)
    .Include(m => m.Claims)
    .Include(m => m.UserGroup)
    .ToListAsync());

By default, Entity Framework Core will not pull back navigation properties unless you include them manually. The call to AsNoTracking() tells Entity Framework to not set up change tracking on the entity which improves performance. In instances where you need to pull back a child property and then pull back the children of the child, you would call ThenInclude() on the child property you included. The problem with this approach is that it will pull back everything on the entity whether you need it or not. To make matters worse, when you call ThenInclude, it tends to then pull back all the navigation properties on it. In my CRM on one of my largest queries I started off by using Include and ThenInclude which resulted in my query pulling back almost a Gigabyte of data as it pulled back all the circular navigation properties.

Needless to say, I ended up rewriting that query by hand to pull back only what I needed. After it was rewritten, the query pulled back 1500 partial Contact objects in less than 2 seconds. This is what it ended up looking like after I removed all the includes:

public async Task GetContacts(Expression<Func<Contact, bool>> where, SalesFusionDbContext context, long tenantId)
{
    var sunday = DateTime.Today.AddDays(-(int)DateTime.Today.DayOfWeek);
    var tenant = await context.Tenant.FindAsync(tenantId);

    return await context.Contact
    .Where(m => m.TenantId == tenantId)
    .Where(where)
    .Select(m => new
    {
        m.Id,
        m.UpdatedBy,
        m.UpdatedByName,
        m.CreatedBy,
        m.CreatedByName,
        m.DateUpdated,
        m.DateCreated,
        m.IsDeleted,
        m.ContactStatusId,
        m.PictureId,
        m.ReferralSourceId,
        m.TerritoryId,
        m.TimeZoneReferenceId,
        m.UserId,
        m.Company,
        m.Email,
        m.Facebook,
        m.FirstName,
        m.Flagged,
        m.GooglePlus,
        m.LastName,
        m.LinkedIn,
        m.OfficeNotes,
        m.Prefix,
        m.PreviousStatus,
        m.RecycleDate,
        m.Suffix,
        m.Title,
        m.Tracking,
        m.Twitter,
        m.Website,
        m.CustomField1,
        m.CustomField2,
        m.CustomField3,
        m.CustomField4,
        m.CustomField5,
        m.CustomField6,
        m.CustomField7,
        m.CustomField8,
        m.CustomField9,
        m.CustomField10,
        m.CustomField11,
        m.CustomField12,
        m.CustomField13,
        m.CustomField14,
        m.CustomField15,
        Activity = m.Activity
            .Select(a => new
            {
                a.Id,
                a.Description,
                a.UserId,
                a.ContactActivityType,
                a.CreatedByName,
                a.CreatedBy,
                a.DateCreated,
                a.Timeline.Color,
                a.Timeline.Icon
            }).OrderByDescending(t => t.DateCreated.Date).ThenByDescending(t => t.DateCreated.TimeOfDay).ToList(),
        BookContacts = m.BookContacts
            .Select(b => new
            {
                b.BookId,
                b.Book,
                b.ContactId
            }).ToList(),
        Locations = m.Locations
            .Select(l => new
            {
                l.Id,
                l.ContactId,
                l.AddressLine1,
                l.AddressLine2,
                l.City,
                l.Country,
                l.Fax,
                l.IsPrimary,
                l.MobilePhone,
                l.Name,
                l.OfficeHours,
                l.PostalCode,
                l.PrimaryPhone,
                l.SecondaryPhone,
                l.State,
                l.CustomField1,
                l.CustomField2,
                l.CustomField3,
                l.CustomField4,
                l.CustomField5,
                l.CustomField6,
                l.CustomField7,
                l.CustomField8,
                l.CustomField9,
                l.CustomField10,
                l.PrimaryPhoneExtension,
                l.SecondaryPhoneExtension
            }).ToList(),
        m.ContactStatus,
        m.Territory,
        CampaignContacts = m.CampaignContacts
            .Select(cc => new
            {
                cc.CampaignId,
                cc.ContactId,
                cc.Campaign
            }).ToList(),
        HasFollowUp = tenant.PatientModeEnabled ? m.Patients.Any(f => f.FollowUp.Value.Date <= DateTime.Now.Date) : false,
        HasOverdue = m.SalesTasks.Any(s => s.DateScheduled.Date < DateTime.Now.Date && !s.IsCompleted),
        HasDueToday = m.SalesTasks.Any(s => s.DateScheduled.Date == DateTime.Now.Date && !s.IsCompleted),
        HasPending = m.SalesTasks.Any(s => s.DateScheduled.Date >= sunday.Date 
            && s.DateScheduled.Date <= sunday.AddDays(6).Date && !s.IsCompleted),
        AssignedTo = m.UserId == null ? "" : $"{m.User.LastName}, {m.User.FirstName}"
    }).Take(1500).ToListAsync();
}

To be sure, this approach takes a lot more work than just calling Include(), but it is infinitely more performant than the former approach. It is also still easier to write and manage than writing a stored procedure. That being said, for small queries on simple objects you can get away with using Include(). Notice I am returning dynamic from this query. There’s a significant performance implication of doing this that we’re going to discuss next.

The True Cost of Entity Framework is the Mapping

We live in the age of JSON and web APIs now. The idea of strongly typed contracts that are published to work with a WCF service are a thing of the past. When you finish your query and send it back through Web API it is going be converted to JSON one way or another. That strongly typed object you were working with on the server side has no meaning on the client side, and as far as the client is concerned, the only thing that exists is the JSON. The true performance cost in Entity Framework Core is the mapping to the entity, not the SQL query itself. That’s not to say that Entity Framework doesn’t mangle the SQL sometimes, but often that’s a result of either the way you wrote the query or your database schema.

If you have no need to manipulate the result of the query before returning it to the client, then you are just wasting a huge performance gain by not returning an anonymous type and crafting the query with only anonymous types. The previous query the I wrote before that was returning 1500 records in less than two seconds was taking between four and six seconds when I was mapping everything to entities. The more complex the object and the more records you are pulling, the greater the hit to performance this causes. If you are improperly using DTOs between your data implementation and your API as I described in this post, then you are doubling up on the mapping and taking an even larger hit to your performance. Do yourself a favor and work against anonymous types rather than entities when working with performance critical queries.

There is no Shame in the Stored Procedure

With all that said about maxing out query performance, don’t be ashamed if you reach for the nuclear option and write a stored procedure. There are instances where you simply can’t do what you need to do in an Entity Framework query and must do it in a stored procedure. Don’t use the stored procedure with Entity Framework though. I would suggest instead that you reach for Dapper given that there is so little to gain by doing it with EF. Plus, you get a small performance boost by using Dapper. Here’s an example of what using Dapper looks like:

public class CheckForDuplicateOnNewContactRequest : Request, IRequest, 
    IRequestHandlerAsync<CheckForDuplicateOnNewContactRequest, IActionResult>
{
    public Contact Contact { get; set; }

    string connectionString;

    public CheckForDuplicateOnNewContactRequest(IOptions<ConnectionStrings> connectionStrings)
    {
        connectionString = connectionStrings?.Value?.SalesFusionConnection;
    }

    public async Task<IActionResult> HandleAsync()
    {
        using (var connection = new SqlConnection(connectionString))
        {
            dynamic results;

            if (Contact.Locations?.Count > 0)
            {
                results = (await connection.QueryAsync("usp_CheckForDuplicateOnNewContact", new
                {
                    TenantId,
                    Contact.FirstName,
                    Contact.LastName,
                    PrimaryPhone = Contact.Locations[0]?.PrimaryPhone ?? "",
                    SecondaryPhone = Contact.Locations[0]?.SecondaryPhone ?? "",
                    Fax = Contact.Locations[0]?.Fax ?? "",
                    AddressLine1 = Contact.Locations[0]?.AddressLine1 ?? "",
                    State = Contact.Locations[0]?.State ?? ""
                }, commandType: CommandType.StoredProcedure)).AsList();
            }
            else
            {
                results = (await connection.QueryAsync("usp_CheckForDuplicateOnNewContact", new
                {
                    TenantId,
                    Contact.FirstName,
                    Contact.LastName,
                    PrimaryPhone = "",
                    SecondaryPhone = "",
                    Fax = "",
                    AddressLine1 = "",
                    State = ""
                }, commandType: CommandType.StoredProcedure)).AsList();
            }

            if (Contact.Locations?.Count <= 1)
                return new OkObjectResult(results);
            else
            {
                for (var i = 1; i < Contact?.Locations.Count; i++)
                {
                    var result = await connection.QueryAsync("usp_CheckForDuplicateOnNewContact", new
                    {
                        TenantId,
                        FirstName = "",
                        LastName = "",
                        PrimaryPhone = Contact.Locations[0]?.PrimaryPhone ?? "",
                        SecondaryPhone = Contact.Locations[0]?.SecondaryPhone ?? "",
                        Fax = Contact.Locations[0]?.Fax ?? "",
                        AddressLine1 = Contact.Locations[0]?.AddressLine1 ?? "",
                        State = Contact.Locations[0]?.State ?? ""
                    }, commandType: CommandType.StoredProcedure);

                    if (result != null)
                        results.Add(result);
                }

                return new OkObjectResult(results);
            }
        }
    }
}

This goes and checks for duplicate contacts using a stored procedure and dapper. I originally tried to create this as an EF query, but ran into too many issues due to limitations within Entity Framework. Dapper is pretty close in performance to raw ADO.NET. If you do need every last millisecond of performance, don’t be afraid to skip Dapper and just use ADO.NET. Reach for Entity Framework first, and if you have to, then go for Dapper and a stored procedure, and finally ADO.NET as an option of last resort.

Author

Sean Leitzinger

Comment (1)

  1. David
    February 10, 2020

    Thank you for your article. It was very helpful. I made some tests and I couldn’t find a case in my use case where the anonymous types were significantly more performant than AsNoTracking but I could see a trend. In my case there is an small performance boost if you use both anonymous types and AsNoTracking. I can’t explain why but it was significative (about 10-15%). So I compared normal AsNoTracking vs. anonymous types + AsNoTracking, and I got this results:

    Single Entity
    26 ms vs. 34 ms

    Single Entity + Navigation Property (Include)
    77 ms vs. 71 ms

    Single Entity + List Navigation Property (Include on List)
    63 ms vs. 62 ms

    Single Entity + List Navigation Property + Navigation Property (Include + ThenInclude)
    81 ms vs. 71 ms

    As you can see as you increase the complexity of the data structure the anonymous type option starts to become the most performant option. I think that until 3 levels you should be fine with Include + ThenInclude. But later you should start looking at the anonymous type alternative.

    Another strategy I’m using to improve performance is to avoid quering the navigatin properties with include and only fill them as I need them with:
    ctx.Entry(entity).Collection(e => e.NavigationListProperty).Load();
    or
    ctx.Entry(entity).Reference(e => e.NavigationProperty).Load()

    Note: You don’t need to use AsNoTracking multiple times in the query. As long as you apply it once it’s applied to all objects returned.

Leave a comment

Your email address will not be published. Required fields are marked *