For a while I’ve been building up a script that uses SQL Servers INFORMACTION_SCHEMA.COLUMNS table to create a query that applies aggregates to each column in a table and then formats the results to allow an at a glance summary of the shape of the data inside of it.
I showed some of the techniques used to build this in a session regarding metadata and data warehousing which I co-presented with Emma Dolling and Ruth Pearson at SQLBits. The one comment that I heard the most after this session was “I want that script”. I did promise to make it available more than a few times and so I’ve put together a github repository to hold it and other resources we built for the session.
The full repository is here and you can access the script directly here or copy it from the end of this post.
Below is an example of the results generated against the OrderLines table of the World Wide Importers sample database.
The script works by generating a query for every table in the database, this then needs to be copied into a new query and run, after that you have your results.
There are other tools out there that will do similar to this but the great thing about this script is that it’s entirely T-SQL based, as long as you can run a query and have permission to see metadata tables you can make use of this. Sometimes a VM and SQL Server Management Studio is all you have to work with and that’s all this needs.
A few comments/warnings:
- Note the need to change the default option in SSMS so that line breaks aren’t removed from the generated query.
- I’ve been asked about performance, this query by necessity produces a lot of aggregates on an entire table. If the table has many rows or columns it may take a while to run. There may be room to adapt this query to run against a sample of the rows within the table, reduce the number of aggregates or switch to he APPROX_COUNT_DISTINCT aggregate to make it run faster.
- There may be issues with some rarer data types, for example I’m aware that as is the generated queries can have issues with binary columns. I intend to fix for these issues in a future version of the script.
I hope you find the script useful and I’d love to hear suggestions of how it could be extended to be even more useful in the future.
/* =============================================================================== Name: Dynamic Table Analyser Author: Barney Lawrence Creation Date: 2023-03-23 Description: This script uses metadata to generate queries to support analysis of table content by using distinct counts, maximums, minimums and other details for all columns and pivoting into a presentable format. Usage: Run this query against an instance of SQL Server and copy the results from the column Query for the table you with to analyse. For the query to format correctly in SSMS please ensure to check the option at: Tools>Options>Query Results>SQL Server>Results to Grid>Retain CR/LF on copy or save Source: The latest version of this script can be found at https://github.com/BarneyLawrence/Sessions-Metadata-Data-Warehouse =============================================================================== */ SELECT TABLE_SCHEMA, TABLE_NAME ,' WITH DistinctCounts AS ( SELECT ' + STRING_AGG(CAST('' AS varchar(max)) + 'COUNT(DISTINCT [' + COLUMN_NAME +']) AS [DISTINCT_' + COLUMN_NAME +']' + CHAR(13) +',COUNT(IIF([' + COLUMN_NAME +'] IS NULL,1,NULL)) AS [NULL_' + COLUMN_NAME +']' + CHAR(13) +',COUNT(IIF(CAST([' + COLUMN_NAME +'] AS varchar(max)) = '''' ,1,NULL)) AS [BLANK_' + COLUMN_NAME +']' + CHAR(13) +',MAX(LEN([' + COLUMN_NAME +'])) AS [MAXLENGTH_' + COLUMN_NAME +']' + CHAR(13) +',MIN([' + COLUMN_NAME +']) AS [MIN_' + COLUMN_NAME +']' + CHAR(13) +',MAX([' + COLUMN_NAME +']) AS [MAX_' + COLUMN_NAME +']' + CHAR(13) +',''' + DATA_TYPE + ISNULL('(' +CAST(COALESCE(CHARACTER_MAXIMUM_LENGTH,DATETIME_PRECISION) AS varchar(5)) + ')','') +''' ' + ' AS [DATA_TYPE_' + COLUMN_NAME +']' + CHAR(13) ,', ') WITHIN GROUP (ORDER BY ORDINAL_POSITION) + ' FROM [' + TABLE_SCHEMA + '].[' + TABLE_NAME + '] ) SELECT ''' + TABLE_SCHEMA +''' AS TableSchema, ''' + TABLE_NAME + ''' AS TableName, V.ColumnName, V.DistinctColumnCount,V.NullColumnCount,V.BlankColumnCount,V.MaxColumnLength,V.MinColumnValue,V.MaxColumnValue,V.ColumnDataType FROM DistinctCounts AS C CROSS APPLY ( VALUES '+ STRING_AGG( CAST('' AS varchar(max)) + '(''' + COLUMN_NAME + ''', C.[DISTINCT_' + COLUMN_NAME + ']' + ', C.[NULL_' + COLUMN_NAME + '] ' + ', C.[BLANK_' + COLUMN_NAME + '] ' + ', C.[MAXLENGTH_' + COLUMN_NAME + ']' + ', CAST(C.[MIN_' + COLUMN_NAME + '] AS varchar(max))' + ', CAST(C.[MAX_' + COLUMN_NAME + '] AS varchar(max))' + ', C.[DATA_TYPE_' + COLUMN_NAME + ']' +')' + CHAR(13) ,', ') WITHIN GROUP (ORDER BY ORDINAL_POSITION) +' ) AS V(ColumnName,DistinctColumnCount,NullColumnCount,BlankColumnCount,MaxColumnLength,MinColumnValue,MaxColumnValue,ColumnDataType) ;' AS Query FROM INFORMATION_SCHEMA.COLUMNS GROUP BY TABLE_SCHEMA, TABLE_NAME