Skip to content

Working with data (LINQ)

QueryStorm exposes tables as collections of strongly typed objects, allowing you to easily query and modify data inside them using C# and LINQ.

Video

In case you prefer video to text, here's a video introduction to using C# in Excel.

Download

The examples in this document use a public dataset about salaries in the public sector in San Francisco.

Let's get querying

To connect to the active workbook, click the Connect (C#) icon in the QueryStorm ribbon:

Connect with C#

Once connected, Excel tables show up in the object explorer and are available to your scripts as variables. Each table is a collection of strongly typed rows. The row types are generated dynamically. In this example, there's only one table, and it's called salaries.

Connected with C#

To demonstrate basic querying, let's find people who earn between 100k and 105k:

salaries.Where(s=>s.TotalPay >= 100000 && s.TotalPay <= 105000)

Here are the results:

Sample result

Now let's find the highest paid worker for each JobTitle:

salaries
    .GroupBy(s => s.JobTitle)
    .Select(g => g.OrderByDescending(s=>s.TotalPay).First())

Funky column names

Table columns in Excel can have names that are not legal C# identifiers. Such columns will be represented by normalized property names where non-alphanumeric characters are removed.

For example, if the table has a column named Job title (with a space) we can reference it like so:

salaries.GroupBy(s => s.JobTitle)//space was removed

In the results grid, however, column names will show up with their original names. How come? The properties on the generated row type have the [System.ComponentModel.DisplayName] attribute applied. The results grid and the table writer check for this attribute so you can use it to control how headers look in the results grid, and when written into the workbook.

Note

You can also access the value of any property in a row via indexer, e.g. row["Job Tilte"]. The indexer returns an object of type dynamic.

Locating rows in Excel

In order to find a particular row in Excel, we can include the row address in the results. Each row has a GetAddress() method that can be used to find the row in Excel.

Note

The reason GetAddress() is a method rather than a property is simply to avoid a naming collision in case a user table has a column named Address (which is not unlikely).

Let's find people with the job title "Transit Operator" in Excel.

salaries
    .Where(s => s.JobTitle == "Transit Operator")
    .Select(s => new { s.Id, s.EmployeeName, s.TotalPay, Address = s.GetAddress()})//include Address as a property

Once we have the address in the results, we can double-click the address cell OR the row header to navigate to the row in Excel.

Select row in Excel

Note

If there is no address in the results, double-clicking the row header will have no effect.

Updating rows

To modify data in an Excel table, we need to modify the row objects and then commit the changes. Here's how to modify the TotalPay of all Transit Operators:

//prepare changes
salaries
    .Where(s => s.JobTitle == "Transit Operator")
    .ForEach(s=> s.TotalPay += 10000);

//commit changes into Excel table
salaries.SaveChanges();

Note that changes must be explicitly saved in order to be committed into the Excel table. We can do this in a single command by using the Update() method:

salaries
    .Where(s => s.JobTitle == "Transit Operator")
    .Update(s=> s.TotalPay += 10000)

Deleting rows

We can also delete rows. Let's delete rows where the TotalPay is zero:

salaries
    .Where(s => s.TotalPay == 0)
    .Delete()

Inserting rows

Finally, we can also insert rows:

salaries.Insert(99999, "John Smith", "Journalist", 0, 0, 0, 0, 99999, 0, 2017, null, "San Francisco", "PT");

The Insert method is strongly typed, but all its arguments are optional, allowing you to skip the ones where null should be inserted (as well as calculated columns).

Saving changes to Excel can be slow, especially when inserting thousands of rows. In such cases, all inserts should be staged via the InsertIntoCache() method and finally committed by a single call to SaveChanges():

people.InsertIntoCache(10, "abc");
people.InsertIntoCache(11, "def");
people.InsertIntoCache(12, "ghi");
people.SaveChanges(); //saves the new rows into Excel

C# scripting syntax

The scripting flavor of C# is supported by Roslyn (the C# compiler) and is slightly different from regular C#. Most normal C# syntax is also valid in scripts, but scripts also allow a more relaxed syntax where you can evaluate expressions, without all the ceremony of defining types and methods.

Note

Under the hood, Roslyn preprocesses scripts and turns them into regular C# before compiling to IL.

For example, we can return the current date like so:

DateTime.Now

We didn't need to define a class and Main method. We didn't need an explicit return statement, either. The last statement in the script is assumed to be an expression that needs to be evaluated and returned, unless it is terminated with a semicolon. Closing an expression with a semicolon, however, hides the output.

If you want to return a value from a statement that is not the last statement in the script, you need to explicitly return it with a return statement, e.g.:

return Add(1,2);

public int Add(int a, int b)
{
    return a + b;
}

Another things scripts can do is reference other scripts and libraries. This is supported via #r and #load directives that are placed at the beginning of scripts:

#r "System.IO.Compression"

//...code that uses the System.IO.Compression assembly...
#load "anotherScript"

//...code that uses members from "anotherScript"...

For more on referencing scripts and libraries, click here.

Comments