ORMs, the “Vietnam” of Software Development?…

… not sure myself, but certainly it is a sneaky creature.

I will not talk about all ORMs (as if I could) since I want to exemplify on what gave me grief.
I was using Linq2SQL to perform some “simple” queries in an application.

I wish not to discuss:

  1. whether Linq2SQL is an ORM at all. For me it is and we might discuss its shortcomings and debate o semantics.
  2. why I am using a “walking zombie” of a technology. For me, i is not an “un-dead”. Linq2Sql does the job I am asking it to do efficiently enough (or so I thought) and the bits are not going to vanish all of a sudden from the framework.

My intention is neither debating nor ranting (I do that more than anything apparently), but leaving the grumpy-not-so-old man in me behind and offer the reader some knowledge (for a change).

The Model

Let’s say we have a very simple model:

Model

There is a Subject class that has a Parent relationship and multiple children: Child and OtherChild. Fairly simple if you ask me.

As usual, all code will be provided in Google Code. Since we are dealing with databases you will have to have a SQL Server database named SneakyORMs with the schema specified in SneakyORMs.sql.
For query execution and analysis I am going to use the must-have tool if you are dealing with LINQ (in any of its flavors): LinqPad. If you do not know the tool, go ahead and download it, it is free but you can activate Intellisense for a very reasonable amount of money.
First things first, let’s insert some data in the system, executing initData.linq after the connection to the newly created database has been set up and verifying that there is indeed some data AllEntities.sql.

AllEntities

A Soup of Queries

Alright then, we have all nuts and bolts in place. Let’s do something interesting. Let’s query the Parent_Subject relationship with the query Parent_Subject.linq:

Parents.Select(p => new { P = p, S = p.Subjects }).Dump();

Parents.Where(p => p.Id == Parent.One).Dump();

Subjects.Where(s => s.Id == 11).Dump();

This query displays all Parents and Subjects, but also displays a given Parent and a given Subject. How many queries do you think are going to be executed? Stop guessing and start asking LinqPad:

SELECT [t0].[Id], [t0].[Name], [t1].[Id] AS [Id2], [t1].[Description], [t1].[ParentId], (
    SELECT COUNT(*)
    FROM [Subjects] AS [t2]
    WHERE [t2].[ParentId] = [t0].[Id]
    ) AS [value]
FROM [Parents] AS [t0]
LEFT OUTER JOIN [Subjects] AS [t1] ON [t1].[ParentId] = [t0].[Id]
ORDER BY [t0].[Id], [t1].[Id]

Only one. Thinking about it, it makes sense as we already got all the information required in out first query and Linq2Sql offers some “identity map” capabilities in the DataContext, So it is clever enough to figure out that both the Parent and the Subject have been retrieved before, and it does not go to the database again.

Let’s  move on and place out attention in Subject, by querying the Subject_Children relationship. Let’s execute the Subject_Children.sql:

Subjects
.Where(s => s.ParentId == Parent.One)
.Select(s => new{ S = s, C = s.Children }).Dump();

Subjects.Single(s => s.Id== 13).Dump();

Childs.Single(c => c.Id == 131).Dump();

As before, we have 3 LINQ queries and in the first one we retrieve all the information. Anyone dares to take a guess on how many queries are actually issued? Those of you who thought “one” could not be more wrong (I was one of you, if that gives you any relief, not that it should, though…). Those of you who thought “not one, you are trying to trick us” are closer, but it is a pretty lame thought, mind you. I would only give credit to those who thought “two” and are able to explain it to me because it is beyond my comprehension. Not kidding, leave a comment explaining it to me and we will see what I can do in terms of “credit”. LinqPad says:

DECLARE @p0 UniqueIdentifier = '11111111-1111-1111-1111-111111111111'

SELECT [t0].[Id], [t0].[Description], [t0].[ParentId], [t1].[Id] AS [Id2], [t1].[Data], [t1].[SubjectId], (
    SELECT COUNT(*)
    FROM [Children] AS [t2]
    WHERE [t2].[SubjectId] = [t0].[Id]
    ) AS [value]
FROM [Subjects] AS [t0]
LEFT OUTER JOIN [Children] AS [t1] ON [t1].[SubjectId] = [t0].[Id]
WHERE [t0].[ParentId] = @p0
ORDER BY [t0].[Id], [t1].[Id]
GO

DECLARE @p0 Int = 131

SELECT [t0].[Id], [t0].[Data], [t0].[SubjectId]
FROM [Children] AS [t0]
WHERE [t0].[Id] = @p0

Doh! After all Linq2Sql is not clever enough to figure out that the Child with id 131 was already part of the result-set was fetched before.

Let’s dig a bit deeper and focus on only one Subject and its Parent SingleSubject_Parent.linq:

var subject = Subjects.Single(s => s.Id== 13);
subject.Dump();
subject.Parent.Dump();

Parents.Single(p => p.Id == Parent.One).Dump();

DECLARE @p0 BigInt = 13

SELECT [t0].[Id], [t0].[Description], [t0].[ParentId]
FROM [Subjects] AS [t0]
WHERE [t0].[Id] = @p0
GO

DECLARE @p0 UniqueIdentifier = ‘11111111-1111-1111-1111-111111111111’

SELECT [t0].[Id], [t0].[Name]
FROM [Parents] AS [t0]
WHERE [t0].[Id] = @p0
Uhm, two queries? Makes some sense, since we are asking for something we had not fetched before (the Parent). Therefore it has to go to the database and fetch it in another round-trip.
Couldn’t we disable this lazy behaviour and instruct Linq2Sql to eager fetch more information? Yes we could, let’s try playing with the LoadOptions property of the DataContext in order to retrieve a Subject and its Parent in one go:

DataLoadOptions opt =new DataLoadOptions();
opt.LoadWith<Subject>(s => s.Parent);
LoadOptions = opt;

var subject = Subjects.Single(s => s.Id== 13);
subject.Dump();
subject.Parent.Dump();

Parents.Single(p => p.Id == Parent.One).Dump();

DECLARE @p0 BigInt = 13

SELECT [t0].[Id], [t0].[Description], [t0].[ParentId], [t1].[Id] AS [Id2], [t1].[Name]
FROM [Subjects] AS [t0]
INNER JOIN [Parents] AS [t1] ON [t1].[Id] = [t0].[ParentId]
WHERE [t0].[Id] = @p0
A small victory here. One single query suffices to get both Subject and Parent.

But we are also interested in the Subject_Children relationship, would the behaviour be the same?

var subject = Subjects.Single(s => s.Id== 23);
subject.Dump();
subject.Children.Dump();

Childs.Single(c => c.Id == 231).Dump();

DECLARE @p0 BigInt = 23
SELECT [t0].[Id], [t0].[Description], [t0].[ParentId]
FROM [Subjects] AS [t0]
WHERE [t0].[Id] = @p0
GO

DECLARE @p0 BigInt = 23
SELECT [t0].[Id], [t0].[Data], [t0].[SubjectId]
FROM [Children] AS [t0]
WHERE [t0].[SubjectId] = @p0
GO

DECLARE @p0 Int = 231
SELECT [t0].[Id], [t0].[Data], [t0].[SubjectId]
FROM [Children] AS [t0]
WHERE [t0].[Id] = @p0
What? Three queries!?! Linq2Sql is simply not clever enough to figure out that the Child with id 231 was fetched before. Would eager loading solve this mess?

DataLoadOptions opt =new DataLoadOptions();
opt.LoadWith<Subject>(s => s.Children);
LoadOptions = opt;

var subject = Subjects.Single(s => s.Id== 13);
subject.Dump();
subject.Children.Dump();

Childs.Single(c => c.Id == 131).Dump();

DECLARE @p0 BigInt = 13
SELECT [t0].[Id], [t0].[Description], [t0].[ParentId], [t1].[Id] AS [Id2], [t1].[Data], [t1].[SubjectId], (
SELECT COUNT(*)
FROM [Children] AS [t2]
WHERE [t2].[SubjectId] = [t0].[Id]
) AS [value]
FROM [Subjects] AS [t0]
LEFT OUTER JOIN [Children] AS [t1] ON [t1].[SubjectId] = [t0].[Id]
WHERE [t0].[Id] = @p0
ORDER BY [t0].[Id], [t1].[Id]
GO

DECLARE @p0 Int = 131
SELECT [t0].[Id], [t0].[Data], [t0].[SubjectId]
FROM [Children] AS [t0]
WHERE [t0].[Id] = @p0
Hardly. We moved from three down to two, as we are eager loading the children, but still the identity map is not able to figure out that 131 was already fetched before. Is it terrible? Not, but it could do better.

Conclusions

What we have seen? We have seen that Linq2Sql exhibits an unexpected behaviour (at least for me) in regards to relationships and the necessity to go to the database multiple times to satisfy the demand of data. It was successfully using its identity map powers to save queries when querying the Parent_Subject relationship but failing to do so when querying the Subject_Children or the Subject_OtherChildren. Cause? Unknown. Diagnosis? Twiddle fingers and look upwards.

ORMs are wonderful tools. And as such they can’t be handled recklessly. Their sheer existence does not make them evil or golden.
For myself? They have become a very-nice-to-have weapon in my arsenal. Most of the time they save me a lot of work (and money to my customers) but other times they are just another piece of the puzzle that need to be understood.

Piece of advice: do not forget your SQL skills. They come very, very handy. You may just switch from looking at the execution plan in SQL Server Management Studio to looking at the SQL generated by my LINQ queries.

But remain vigilant… Rogue queries and N+1 are everywhere.

Daniel Gonzalez Garcia
Vertica A/S

Kategorier: Udvikling

Tagged as: ,

Skriv et svar

Udfyld dine oplysninger nedenfor eller klik på et ikon for at logge ind:

WordPress.com Logo

Du kommenterer med din WordPress.com konto. Log Out / Skift )

Twitter picture

Du kommenterer med din Twitter konto. Log Out / Skift )

Facebook photo

Du kommenterer med din Facebook konto. Log Out / Skift )

Google+ photo

Du kommenterer med din Google+ konto. Log Out / Skift )

Connecting to %s