SQL in Excel¶
QueryStorm comes with a built-in SQLite database engine that enables using SQL on Excel tables.
The examples in this article use a public dataset containing data on 5000 movies scraped from IMDB.
Clicking the connect button creates an in-memory SQLite database which sees Excel tables as if they were database tables, and opens up the QueryStorm IDE.
Once connected, we can start querying. The most important thing to know for a start is that QueryStorm works with Excel tables [not to be confused with sheets].
Let's run a few queries. For a start, let's analyze the budgets and incomes of movies that are about the future:
select avg(budget), avg(gross), avg(gross)/avg(budget) as return from movies where plot_keywords like '%future%'
It turns out making a movie about the future is good business; they seem to return 115% of the budget spent.
That was pretty simple. For a slightly more complex example, let's take a look at how the incomes and scores of movies changed through the decades:
select cast (title_year / 10 as int) * 10 as bin, count(*) count, avg(imdb_score) score, avg(budget), avg(gross) gross from movies group by bin order by bin
Since we're here, let's draw some (overly-general) conclusions (from insufficient data):
- IMDB scores appear to be higher for older movies
- in 2000's spending in the movie industry peaked, but wasn't profitable
- return on investment in 1930's was a whopping 40x, but has been declining ever since
The SQLite engine is not limited to querying. We can also modify data in Excel tables using SQL
update statements. Let's update all rows and put a star next to the highest grossing movie of each director:
In our query, we're looking for all movies such that no movie from the same director grossed more. There were 2695 rows updated (one for each director), meaning that on average there are ~2 movies per director in the list (4926 movies / 2695 directors).
For some movies, we don't have information about the gross income they produced. We don't care about those, so let's just delete them:
delete from movies where gross is null
The messages pane informs us that 772 rows have been deleted:
Command executed in 220ms, 0 rows returned, 772 rows affected
Lastly, we can also insert data into tables. Let's insert a row and supply just the director and title:
insert into movies(director_name, movie_title) values ('Jeff Jackson', 'Sound plan 8')
The row is inserted at the end of the table.
If we like, we can also insert the results of a select statement to bulk insert many rows at once.
insert into movies(director_name, movie_title) select director, movie from newMovies
When using the SQLite engine, all workbook tables get two extra hidden columns:
__row. These columns are hidden by default, meaning that SQLite will not include them in the results if you just specify
* in the select list, but you can include them explicitly in the select list when you need them.
__address column is useful for locating the original row in Excel. Double clicking it (or the row header, if an address is present in the row) in the results grid will select the row in Excel. Additionally, we can also use the address in order to perform formatting operations on the row.
__row column specifies the original index of the row in Excel. We can use this to preserve the original ordering of rows (and some other things e.g. self-joins without duplicates).