Skip to content

SQL in Excel

QueryStorm comes with a SQLite database engine that can work with Excel tables as if they were database tables.

In this example, I use a public dataset with data on 5000 movies scraped from IMDB.

Connecting

Clicking the connect button pops up the QueryStorm IDE and opens the SQLite connection that treats Excel tables as database tables.

Connect to workbook

Querying

Once connected, we can start querying. The most important thing to know is that QueryStorm works with Excel tables (not sheets).

Querying

Selecting rows

Let's go through a few simple examples. For a start, let's analyze the budgets and incomes of movies that are about the future:

select avg(budget), avg(gross), avg(gross)/avg(budget) as return
from movies
where plot_keywords like '%future%'

Select example #1

Turns out movies about the future are good business; they return 115% of the budget spent.

Now let's look at how the incomes and scores of movies changed through the decades:

select
    cast (title_year / 10 as int) * 10 as bin,
    count(*) count,
    avg(imdb_score) score,
    avg(budget),
    avg(gross) gross
from
    movies
group by
    bin
order by 
    bin

Select example #2

Since we're here, let's draw some overly-general conclusions from insufficient data:

  • IMDB scores appear to be higher for older movies
  • in 2000's spending in the movie industry peaked, but wasn't profitable
  • return on investment in 1930's was a whopping 40x, but has been declining ever since

Updating rows

We're not limited to just querying. We can also fiddle with the data in Excel tables. Let's put a star next to the highest grossing movie of each director:

Update example

In the above query, I'm looking for all movies such that no movie from the same director grossed more. There were 2695 rows updated, one per director.

Deleting rows

For some movies, we don't have information about the gross income they produced. We're analyzing income, and these rows are in the way, so let's just delete them:

delete from movies where gross is null

The messages pane indicates that 772 rows have been deleted:

Command executed in 220ms, 0 rows returned, 772 rows affected

Inserting rows

Lastly, we can also insert data into tables. I'll insert a row and supply the director name and movie title, and leave the other columns empty:

insert into movies(director_name, movie_title)
values ('Jeff Jackson', 'Sound plan 8')

The row is inserted at the end of the table.

We can also insert the results of a select statement to insert a bunch of rows at once.

insert into movies(director_name, movie_title)
select director, movie from newMovies

Special columns

When using the SQLite engine, all workbook tables get two extra hidden columns: __address and __row. These columns are hidden, meaning that SQLite will not include them in the results if you just specify * in the select list, but you can include them explicitly in the select list when you need them.

Data in a sheet

The __address column

The __address column is for locating the original row in Excel. Double-clicking an address in the results (or the row header, if an address is present in the row) will select the row in Excel. We can also use the address in order to perform formatting operations on the row.

The __row column

The __row column specifies the original index of the row in Excel. We can use this to preserve the original ordering of rows. It's also convenient for some other operations, e.g. self-joins without duplicates.

Comments