MySQL - Basic Tutorial 3 - Summarise ALL the things!
This article is Part 6 in a 7-Part series called MySQL intro.
- Part 1 - MySQL - WhySQL?
- Part 2 - A pain in the MySQL!
- Part 3 - MySQL - Basic Admin
- Part 4 - MySQL - Basic Tutorial
- Part 5 - MySQL - Basic Tutorial 2
- Part 6 - This Article
- Part 7 - MySQL - Basic Tutorial 4 - SHOW TABLES becomes DB_SNAPSHOT
… the exploration continues !
Introduction
Background:
In the previous post of this series, we looked into summarising information about a database to enable us to subsequently utilise its contents. How can you use what you cannot see or understand?
Aim:
The question of the day is how to get an idea of the size of a database? Specifically, how do we efficiently get a summary of the size (rows and columns) of all tables in a database?
The problem:
Previously, we used SHOW TABLES
to summarise all the names of a database’s tables in a single call (instruction/statement). However, we were only able to obtain row and column counts separately for one table at a time. This isn’t much of a problem for very small databases, however, for large databases, this seems to be extremely tedious (and unproductive). The sakila database consists of 23 tables, as we discovered in our previous exploration, therefore, we need a more efficient solution!
Summarise all the columns
This solution is fairly simple, and has its origins in our previous column counting example, but we are going to.
-
i) retrieve more specific information: table names and columns.
-
ii) reduce instruction specificity by removing this statement from the
WHERE
clause:
Let’s see what this looks like…
i) table column list
When we do this, the resulting statement below simply produces a table that lists the names of the tables against the corresponding column names. This is close, but at 132 entries… not the summary we are looking for!
ii) table column count
In order to achieve our desired end, we need a little help from aggregation functions in conjunction with the GROUP BY
` clause.
For the record, this will not work:
However… THIS is what we were searching for:
Summarise all the rows
As with the previous section above, we need to rely upon the information_schema
database to retrieve the information that we need. As always, I am grateful for the revelation provided by StackOverflow, specifically here.
Note: The syntax below uses backticks (`) in the FROM
clause, but refers to the same table as that used for the column queries above. Variety is, after all, the spice of life!
This is a very good result, we get a summary of the number of rows in most of the tables. Most? Yes, apparently there is some accuracy issue that results in some tables being estimated to have NULL
rows. See here for a possible solution (I haven’t tried it yet so… “caveat emptor!”). The point being that there seems to be no simple and accurate solution to this in a single query. even using show table status
doesn’t help accuracy using the default innoDB MySQL database engine !
That said, we are doing pretty well so far !
Putting it all together
Strategy: Temporary tables
From what I have been able to gather, the best solution is to create a temporary table to which the results of our query are assigned via the AS
keyword:
There is an interesting discussion here about potential pitfalls of temporary tables. Temporary tables are fit for our purpose, given that we don’t want to “pollute” the “data space”” by creating unecessary tables, however, it is good to be aware of potential problems that this might cause. Alternatively, one could always CREATE
and subsequently DROP
these tables if need be.
For our purppses, these are the CREATE TABLE
that we will use:
- To count the rows:
- To count the columns:
- Tada!
Note: The contents of these tables are as displayed previously for queries the corresponding sections .
Joined: The cohesive picture
The last step in this process is, naturally, to join (i.e. merge ) the temporary tables that we have crated together to form a coherent, approximate, picture of our database. (Approximate due to the row count accuracy limitations of our chosen method).
- This basic syntax here will do nicely !:
- or to use the (probably more correct) inner join syntax :
The final output of this process (using either join syntax above) is:
Naturally, you can prefix this query with the CREATE TABLE
syntax as outlined in this strategy to store the results as a database table.
Side note: if you write “TABLE_COLS” as “table_cols”, that is how it will appear in the result… FYI . Importantly, we selected all the columns from the sakila_row_counts
and only the TABLE_COLS
column from sakila_col_counts
as this shares the TABLE_NAME
, which would be repeated if all columns from both tables were selected !
Conclusion
Today, we have come very close to achieving our simple goal of capturing a simple snapshot of our database of interest. However, 7 of the 23 tables (~30 %) of the sakila
database have NULL rows (i.e. whose size is partially hidden from us). I am curently looking for a good solution for this, which I hope to cover in later posts. For now, however, we rest (tenuously) upon our laurels, having effectively created for ourselves…
SHOW TABLES PLUS
This article is Part 6 in a 7-Part series called MySQL intro.
- Part 1 - MySQL - WhySQL?
- Part 2 - A pain in the MySQL!
- Part 3 - MySQL - Basic Admin
- Part 4 - MySQL - Basic Tutorial
- Part 5 - MySQL - Basic Tutorial 2
- Part 6 - This Article
- Part 7 - MySQL - Basic Tutorial 4 - SHOW TABLES becomes DB_SNAPSHOT