SQL in Excel¶
QueryStorm comes with a SQLite database engine that can work with Excel tables as if they were database tables.
In this example, I use a public dataset with data on 5000 movies scraped from IMDB.
Clicking the connect button pops up the QueryStorm IDE and opens the SQLite connection that treats Excel tables as database tables.
Once connected, we can start querying. The most important thing to know is that QueryStorm works with Excel tables (not sheets).
Let's go through a few simple examples. For a start, let's analyze the budgets and incomes of movies that are about the future:
select avg(budget), avg(gross), avg(gross)/avg(budget) as return from movies where plot_keywords like '%future%'
Turns out movies about the future are good business; they return 115% of the budget spent.
Now let's look at how the incomes and scores of movies changed through the decades:
select cast (title_year / 10 as int) * 10 as bin, count(*) count, avg(imdb_score) score, avg(budget), avg(gross) gross from movies group by bin order by bin
Since we're here, let's draw some overly-general conclusions from insufficient data:
- IMDB scores appear to be higher for older movies
- in 2000's spending in the movie industry peaked, but wasn't profitable
- return on investment in 1930's was a whopping 40x, but has been declining ever since
We're not limited to just querying. We can also fiddle with the data in Excel tables. Let's put a star next to the highest grossing movie of each director:
In the above query, I'm looking for all movies such that no movie from the same director grossed more. There were 2695 rows updated, one per director.
For some movies, we don't have information about the gross income they produced. We're analyzing income, and these rows are in the way, so let's just delete them:
delete from movies where gross is null
The messages pane indicates that 772 rows have been deleted:
Command executed in 220ms, 0 rows returned, 772 rows affected
Lastly, we can also insert data into tables. I'll insert a row and supply the director name and movie title, and leave the other columns empty:
insert into movies(director_name, movie_title) values ('Jeff Jackson', 'Sound plan 8')
The row is inserted at the end of the table.
We can also insert the results of a select statement to insert a bunch of rows at once.
insert into movies(director_name, movie_title) select director, movie from newMovies
When using the SQLite engine, all workbook tables get two extra hidden columns:
__row. These columns are hidden, meaning that SQLite will not include them in the results if you just specify
* in the select list, but you can include them explicitly in the select list when you need them.
__address column is for locating the original row in Excel. Double-clicking an address in the results (or the row header, if an address is present in the row) will select the row in Excel. We can also use the address in order to perform formatting operations on the row.
__row column specifies the original index of the row in Excel. We can use this to preserve the original ordering of rows. It's also convenient for some other operations, e.g. self-joins without duplicates.