Translated by AI
3 Things You Should Know Before Using LINQ to Objects
Introduction
LINQ to Objects (hereafter LINQ) is a mechanism for declaratively describing filtering, grouping, and processing operations on collections such as List and arrays.
Compared to traditional procedural methods, writing declaratively can improve both productivity and quality.
Let's compare them using code that "extracts even numbers from an array of int and converts them into a List sorted in ascending order."
Below is the code written in a traditional procedural manner.
int[] numbers = { 5, 10, 8, 3, 6, 12 };
List<int> evenNumbers = new();
foreach (var number in numbers)
{
    if (number % 2 == 0)
    {
        evenNumbers.Add(number);
    }
}
// The default comparer sorts in ascending order (and avoids the overflow risk of x - y).
evenNumbers.Sort();
In contrast, below is the code written declaratively using LINQ.
int[] numbers = { 5, 10, 8, 3, 6, 12 };
var evenNumbers =
    numbers
        .Where(x => x % 2 == 0)
        .OrderBy(x => x)
        .ToList();
You can see that it's "written declaratively, as if defining specifications in text," making it clear: "I'm filtering for even numbers with Where, sorting them with OrderBy, and then converting to a List."
There is a common perception that LINQ is difficult, but that is a misunderstanding.
You do need to understand one new concept, the shift "from procedural writing to declarative writing," but once you master LINQ you can improve both productivity and quality.
However, at the same time, I have seen several undesirable implementations common among beginners while reviewing LINQ code.
I believe one reason for this is that many introductory LINQ articles only describe functional implementation and do not sufficiently share the key points to grasp before actual use.
Therefore, in this article, I will explain three points to keep in mind before you actually start using LINQ.
Three things to keep in mind
- Declare the correct specification
- Do not generate collections excessively with ToList, etc.
- Generate collections appropriately with ToList, etc.
Understanding these three points will help you avoid most problems when using LINQ.
Declare the correct specification
In LINQ, there are cases where different implementations can yield the same result.
In such cases, the first thing to consider is "declaring the correct specification."
Let's look at a concrete example. Suppose we have an Item class with a unique Id property.
public class Item
{
    public Item(int id)
    {
        Id = id;
    }

    public int Id { get; }
}
And there is a list of Items with non-duplicate Ids.
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
If we want to retrieve the Item with Id 3 from here, "as a result," it would be the same regardless of which of the following four implementations we use.
// Get exactly one matching item
// - Throws an exception if no matching item exists
// - Throws an exception if multiple matching items exist
var item = items.Single(x => x.Id == 3);

// Get zero or one matching item
// - Returns the default value if no matching item exists
// - Throws an exception if multiple matching items exist
var item = items.SingleOrDefault(x => x.Id == 3);

// Get the first of one or more matching items
// - Throws an exception if no matching item exists
// - Returns the first object if multiple matching items exist
var item = items.First(x => x.Id == 3);

// Get the first of zero or more matching items
// - Returns the default value if no matching item exists
// - Returns the first object if multiple matching items exist
var item = items.FirstOrDefault(x => x.Id == 3);
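To make the comments above concrete, here is a minimal sketch that runs the four methods against one matching Id and one missing Id. The anonymous objects are only a stand-in for the Item class, and the Id 99 is a made-up value that is assumed not to exist in the list.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the article's Item class (illustrative only).
var items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(id => new { Id = id })
    .ToList();

// Exactly one match: all four methods return the same element.
var bySingle = items.Single(x => x.Id == 3);
var bySingleOrDefault = items.SingleOrDefault(x => x.Id == 3);
var byFirst = items.First(x => x.Id == 3);
var byFirstOrDefault = items.FirstOrDefault(x => x.Id == 3);

// No match (99 is a hypothetical missing Id): the ~OrDefault methods return null...
var missingSingleOrDefault = items.SingleOrDefault(x => x.Id == 99);
var missingFirstOrDefault = items.FirstOrDefault(x => x.Id == 99);

// ...while Single and First throw InvalidOperationException.
bool singleThrew = false;
try { items.Single(x => x.Id == 99); }
catch (InvalidOperationException) { singleThrew = true; }

bool firstThrew = false;
try { items.First(x => x.Id == 99); }
catch (InvalidOperationException) { firstThrew = true; }

Console.WriteLine($"Same element from all four: {ReferenceEquals(bySingle, byFirstOrDefault)}");
Console.WriteLine($"OrDefault returned null: {missingFirstOrDefault is null}");
Console.WriteLine($"Single threw: {singleThrew}, First threw: {firstThrew}");
```

As long as the data matches the specification, the four calls are indistinguishable. They only differ once the data breaks the assumption.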
So, which one is "correct" to use?
For example, FirstOrDefault returns null in this particular case when no item matches. If you process the result only when it is not null, you can write a program that is "hard to crash." Also, because the Single series is specified to throw an exception when more than one element matches, it has to keep scanning items after the first match to confirm there is no second one. The First series, in contrast, can stop as soon as a match is found, so it may do less work.
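The difference in work can be observed with a rough sketch that counts how often the predicate runs. The counters are added purely for observation (side effects in a predicate are not something to keep in real code), and the anonymous objects again stand in for Item.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the article's Item class (illustrative only).
var items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(id => new { Id = id })
    .ToList();

// Counters exist only to observe how many elements each method inspects.
int firstCalls = 0;
var first = items.First(x => { firstCalls++; return x.Id == 3; });

int singleCalls = 0;
var single = items.Single(x => { singleCalls++; return x.Id == 3; });

Console.WriteLine($"First evaluated the predicate {firstCalls} time(s).");
Console.WriteLine($"Single evaluated the predicate {singleCalls} time(s).");
```

With Id 3 in the fourth position of six, First should stop at the fourth element while Single goes on through the remaining ones. On a list this small the difference is negligible.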
So, is it always correct to use FirstOrDefault?
The conclusion is "it depends on the context."
Let's make the specification a bit more concrete.
- In a certain application, multiple Items with non-duplicate Ids are registered.
- The application displays a list of the registered Items.
- Selecting an arbitrary row from the list allows viewing the details of the corresponding Item.
- Registered Items are never deleted.
If the specifications are as described above, FirstOrDefault will work as expected.
In fact, even if there is a bug in the Item registration function and an Item with a duplicate Id is registered, it won't crash and will work reasonably well. And it might be faster. So is it still correct to use FirstOrDefault?
No. In fact, FirstOrDefault is the one you should avoid using the most. Why?
Because it temporarily hides the failure.
If it were implemented with Single or SingleOrDefault, an exception would occur if there were duplicate Ids, and at that point, you could detect that a bug exists in the registration function.
However, using the First series delays detection of the failure. In the worst case, it is discovered only after release, after a large amount of inconsistent data has been created, and all of the users' work may be wasted.
Therefore, "if the specifications are as described above," you must not use the First series.
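The following sketch illustrates how the failure gets hidden. The data is hypothetical: it assumes a registration bug has already stored two entries with Id 3, and the Name property is added only so the two entries can be told apart.

```csharp
using System;
using System.Linq;

// Hypothetical data: a registration bug has produced two entries with Id 3.
var items = new[]
{
    new { Id = 3, Name = "first registration" },
    new { Id = 5, Name = "unrelated" },
    new { Id = 3, Name = "duplicate registration" },
};

// FirstOrDefault quietly returns the first match, so the duplicate goes unnoticed.
var hidden = items.FirstOrDefault(x => x.Id == 3);
Console.WriteLine($"FirstOrDefault returned: {hidden?.Name}");

// Single throws InvalidOperationException, which exposes the broken data.
bool duplicateDetected = false;
try
{
    items.Single(x => x.Id == 3);
}
catch (InvalidOperationException)
{
    duplicateDetected = true;
}
Console.WriteLine($"Single detected the duplicate: {duplicateDetected}");
```

The exception is the useful outcome here: it points at the registration bug the first time the broken data is read.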
So which should you use, Single or SingleOrDefault? You probably already know, right?
You should use Single. This is because, "if it is implemented according to the specifications," a matching Item always exists.
In the above case, having a conditional branch for null makes the code unnecessarily complex. It might even cause a misunderstanding for someone looking at the code in the future, wondering, "Could there be a case where the Item doesn't exist?"
Also, if the value being handled is, for example, an int type, ~OrDefault returns 0 if no value matches the condition. This leads to the problem of not being able to determine whether no matching item was found or if an item with a value of 0 was hit.
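A small sketch of that ambiguity, using made-up int arrays. The nullable projection at the end is just one possible way to tell the two cases apart when a default really is required. It is shown as an assumption-laden illustration, not as a recommendation to reach for ~OrDefault.

```csharp
using System;
using System.Linq;

// Made-up sample data: one array contains 0, the other does not.
int[] withZero = { 4, 0, 7 };
int[] withoutZero = { 4, 7 };

// Both calls return 0, so the caller cannot tell "found 0" from "found nothing".
int found = withZero.FirstOrDefault(x => x < 1);
int notFound = withoutZero.FirstOrDefault(x => x < 1);
Console.WriteLine($"{found} / {notFound}");

// Projecting to int? makes "no match" show up as null instead of 0.
int? foundNullable = withZero.Where(x => x < 1).Select(x => (int?)x).FirstOrDefault();
int? notFoundNullable = withoutZero.Where(x => x < 1).Select(x => (int?)x).FirstOrDefault();
Console.WriteLine($"{foundNullable.HasValue} / {notFoundNullable.HasValue}");
```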
Using ~OrDefault when it is not needed can become one of the biggest technical debts in using LINQ.
Furthermore, if the speed of the Single series becomes an issue, you should reconsider using LINQ itself in the first place. LINQ may not be the best option when looking at performance alone. However, in most cases, it's at a negligible level.
When using LINQ, keep in mind to "declare the correct specification."
If you choose a method based on other criteria, you end up postponing the discovery of problems such as bugs. The sound approach is to detect and fix bugs at the earliest possible stage.
Do not generate collections excessively with ToList, etc.
When you've just started learning LINQ, you tend to implement using the following cycle:
- Research implementation methods on the web.
- Verify the behavior through debugging, etc.
You implement the target code by repeating this frequently. Of course, there's nothing wrong with this cycle.
However, I often see code where collections are being generated excessively during such a cycle.
For example, let's say we want to implement a process to "extract items with even Ids from an array of Items and output the Ids to the console in ascending order."
You would probably start with the first half, "extracting the Items with even Ids," and write something like this to check the behavior:
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
var evenItems =
    items
        .Where(x => x.Id % 2 == 0)
        .ToList();
You'll debug, check the variables, and confirm that only even numbers are indeed being retrieved. Success!
Then, for the next step, you'll sort them "in ascending order."
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
var evenItems =
    items
        .Where(x => x.Id % 2 == 0)
        .ToList()
        .OrderBy(x => x.Id)
        .ToList();
After verifying the behavior again, you'll "output it to the console."
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
var evenItems =
    items
        .Where(x => x.Id % 2 == 0)
        .ToList()
        .OrderBy(x => x.Id)
        .ToList();
foreach (var item in evenItems)
{
    Console.WriteLine(item.Id);
}
This code works correctly. However, it is not desirable code. Why?
It's because ToList is being called excessively.
Creating a List consumes a certain amount of both CPU and memory resources. You might see such code even in benchmark articles claiming "LINQ is slow," but often, simply reducing unnecessary list conversions can make it many times faster.
In this case, the following code would be preferable:
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
var evenItems =
    items
        .Where(x => x.Id % 2 == 0)
        .OrderBy(x => x.Id);
foreach (var item in evenItems)
{
    Console.WriteLine(item.Id);
}
In this instance, there is no need to call ToList even once; by passing the IEnumerable<Item> directly to the foreach, the task can be accomplished with minimal cost.
While implementing incrementally through the research-debug-research-debug cycle, it's easy to accidentally leave ToList or ToArray calls in the middle. Be careful to minimize collection generation.
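When cleaning up leftover calls, it helps to confirm that removing them does not change the result. A minimal sketch of such a check follows. The variable names are hypothetical and anonymous objects stand in for Item.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the article's Item class (illustrative only).
var items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(id => new { Id = id })
    .ToList();

// Version left over from incremental debugging: two intermediate lists are built.
var withExtraLists = items
    .Where(x => x.Id % 2 == 0)
    .ToList()
    .OrderBy(x => x.Id)
    .ToList();

// Cleaned-up version: the query is consumed once, without intermediate lists.
var streamed = items
    .Where(x => x.Id % 2 == 0)
    .OrderBy(x => x.Id);

bool sameResult = withExtraLists.Select(x => x.Id).SequenceEqual(streamed.Select(x => x.Id));
Console.WriteLine($"Same result: {sameResult}");
```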
Generate collections appropriately with ToList, etc.
Now, let's look at the opposite case.
Let's add a small specification to the previous example.
- Before change
- Extract items with even Ids from an array of Items and output the Ids to the console in ascending order.
- After change
- Extract items with even Ids from an array of Items and output the count and the Ids (in ascending order) to the console.
Before outputting the Ids, we will output the count of Items that match the criteria. A simple implementation would look like this:
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
var evenItems =
    items
        .Where(x => x.Id % 2 == 0)
        .OrderBy(x => x.Id);
Console.WriteLine($"Count:{evenItems.Count()}");
foreach (var item in evenItems)
{
    Console.WriteLine(item.Id);
}
There is a major problem with this code. To understand it, let's insert log output into the Where clause.
var evenItems =
    items
        .Where(x =>
        {
            Console.WriteLine("\"Where\" was invoked.");
            return x.Id % 2 == 0;
        })
        .OrderBy(x => x.Id);
Note that the Console output above violates the principle of "not writing code with side effects in LINQ," so please use it only for confirming behavior. (Thanks to aetos for pointing this out!)
When you run this, the log line "Where" was invoked. is printed six times before the Count line, and then six more times during the foreach loop.
In other words, Where is executed in two places: when Count is displayed and again during the foreach loop.
This is because LINQ operations such as filtering, sorting, and projection use deferred execution.
In the code above, Where and OrderBy are declared, but they are executed when they are used, not when the evenItems instance is created.
Therefore, if evenItems is used twice (for Count and foreach), the LINQ processing will be executed twice.
This poses problems such as wasting CPU and memory resources, and if the original data source is modified between executions, the output results may become inconsistent.
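Both problems can be reproduced with a short sketch. It uses a plain list of ints instead of Items, a counter instead of console logging, and a hypothetical extra element (2) added between the two uses. All of these are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ids = new List<int> { 5, 10, 8, 3, 6, 12 };
int predicateCalls = 0; // counter used only to observe behavior

// Declaring the query does not run it.
var evenIds = ids
    .Where(x => { predicateCalls++; return x % 2 == 0; })
    .OrderBy(x => x);
int callsAfterDeclaration = predicateCalls;

// First use: the whole query runs.
int countBefore = evenIds.Count();
int callsAfterCount = predicateCalls;

// The source changes between the two uses.
ids.Add(2);

// Second use: the whole query runs again and sees the new element.
var enumerated = new List<int>();
foreach (var id in evenIds)
{
    enumerated.Add(id);
}
int callsAfterForeach = predicateCalls;

Console.WriteLine($"Count reported: {countBefore}, items enumerated: {enumerated.Count}");
Console.WriteLine($"Predicate calls: {callsAfterDeclaration} -> {callsAfterCount} -> {callsAfterForeach}");
```

The count that was displayed and the number of rows that follow no longer agree, which is exactly the kind of inconsistency described above.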
To prevent this, you should appropriately generate a collection using ToList, etc., to fix a snapshot. In this case, it is better to do it as follows:
List<Item> items = new[] { 5, 10, 8, 3, 6, 12 }
    .Select(x => new Item(x))
    .ToList();
var evenItems =
    items
        .Where(x =>
        {
            Console.WriteLine("\"Where\" was invoked.");
            return x.Id % 2 == 0;
        })
        .OrderBy(x => x.Id)
        .ToList();
Console.WriteLine($"Count:{evenItems.Count}");
foreach (var item in evenItems)
{
    Console.WriteLine(item.Id);
}
By calling ToList after OrderBy, you limit the LINQ processing to a single execution: the log line is printed only six times, all of them before the Count line.
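The snapshot effect can be checked with a sketch like the one below. It uses a plain list of ints, a counter in place of console logging, and a hypothetical element (2) added after the snapshot is taken. These are illustrative assumptions, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ids = new List<int> { 5, 10, 8, 3, 6, 12 };
int predicateCalls = 0; // counter used only to observe behavior

// ToList runs the query once, right here, and stores the result.
var evenIds = ids
    .Where(x => { predicateCalls++; return x % 2 == 0; })
    .OrderBy(x => x)
    .ToList();
int callsAfterToList = predicateCalls;

// Later changes to the source do not affect the snapshot.
ids.Add(2);

int count = evenIds.Count;
var enumerated = new List<int>();
foreach (var id in evenIds)
{
    enumerated.Add(id);
}
int callsAtEnd = predicateCalls;

Console.WriteLine($"Count reported: {count}, items enumerated: {enumerated.Count}");
Console.WriteLine($"Predicate calls: {callsAfterToList} -> {callsAtEnd}");
```

The count and the enumerated items stay consistent, and the predicate does not run again no matter how many times the list is read.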

Conclusion
Using LINQ is by no means difficult, and using it appropriately can improve both productivity and quality.
However, there are at least a few things you should consider when using it.
- Declare the correct specification
- Do not generate collections excessively with ToList, etc.
- Generate collections appropriately with ToList, etc.
By keeping these three points in mind, you can avoid many problems in advance.
Understand these points and enjoy a great LINQ life!