I was recently asked to perform a technical phone screen of a data warehouse QA candidate. The project manager wanted to ensure that this person knew enough SQL to be able to efficiently test our data warehouse contents (not end-to-end processing testing).
So that I didn't sound like a bumbling fool when I made the call I put together a list of questions. It's by no means exhaustive but I figured it could be helpful to some . . .
When was the last time you used SQL on a day to day basis (how often do they use it and how rusty might they be)?
How would you rate your SQL from 1-10 (sets expectations for how well they will answer questions below)?
What database platforms/versions have you used (Oracle, SQL Server, MS Acess, MySQL, etc.)?
1) What are the two keywords necessary for any SQL statement?
A1) SELECT
A2) FROM
2) How do you limit the results of a query based one or more criterion?
A1) WHERE clause
What is a join?
A1) It Combines records from two or more tables.
4) What types of joins are there and what do they do?
A1) INNER JOIN
A2) LEFT OUTER JOIN
A3) RIGHT OUTER JOIN
A4) CROSS JOIN
A5) FULL OUTER JOIN
5) What is the syntax for joining two tables?
A1) FROM tableA
INNER JOIN tableB ON tableA.column1 = tableB.column2
6) How do you get a single instance of each value of a column in a table?
A1) DISTINCT
7)
How do you get the number of occurrences of each distinct value in a table?
A1) COUNT, GROUP BY
8)
How do you get the largest value of a column in a table?
A1) MAX
9) What is the difference between putting criteria in a WHERE clause or a JOIN clause?
A1) When it is an INNER JOIN it does not affect the results.
A2) When it is an OUTER JOIN, putting the criteria in the WHERE clause effectively makes it an INNER JOIN which is not usually the intended behavior.
A3) When it is an OUTER JOIN, putting the criteria in the JOIN clause allows the criteria to be applied to only those rows that are outer-joined and does not create the INNER JOIN – effect.
A4) Depending on the database platform and version, the performance could be affected.
10) What is the difference between a WHERE clause and a HAVING clause?
A1) The WHERE clause applies the criteria at the row level whereas the HAVING clause applies it at the aggregate level.
11) What is a derived table/inline view?
A1) It is a query that is aliased as a table and used like any other table in a FROM clause.
12) What is a nested subquery?
A1) It is a query that is nested within a SELECT or a WHERE clause.
13) What is a correlated subquery and when are they useful?
A1) It is a query that is used within a parent query and refers to the parent query.
A2) A correlated subquery is evaluated once for each row processed by the parent statement.
A3) They are useful in providing a set of values to be used in a WHERE clause.
Our data warehouse solution consists of the following.
1) Flat files that provide all source data
2) A staging database that the flat files are loaded into
3) A data warehouse database that contains the transformed data based on a documented set of rules.
1) If you could assume that the flat files were loaded to the staging database without any errors, how would you test that the data was correctly transformed from the staging database to the data warehouse? What tools, if any, would you need?
A1) Write queries against the staging database that follow the transformation rules. Then write queries to pull the related rows from the
A2) Some kind of a query execution tool is necessary. Toad, RapidSQL, SQL Server Management Studio, etc.
2) How do you test large sets of data where there could be millions of rows?
A1) Narrow down the results based on specific criteria.
A2) Apply aggregate functions such as SUM to determine if the test query aggregates to the same value as the data warehouse query.
A3) Join the test query/tables to the data warehouse query/tables and apply a WHERE clause that filters on only those values that do not equal each other.
3) What data would you provide to the development team when you found inconsistencies between the documented transformations and what was in the data warehouse?
A1) The transformation rule
A2) The test query
A3) The data warehouse query (if independent from above)
A4) Explanation of expected versus actual results