Using Metadata To Dynamically Analyze SQL Server Table Contents

For a while I’ve been building up a script that uses SQL Servers INFORMACTION_SCHEMA.COLUMNS table to create a query that applies aggregates to each column in a table and then formats the results to allow an at a glance summary of the shape of the data inside of it.

I showed some of the techniques used to build this in a session regarding metadata and data warehousing which I co-presented with Emma Dolling and Ruth Pearson at SQLBits. The one comment that I heard the most after this session was “I want that script”. I did promise to make it available more than a few times and so I’ve put together a github repository to hold it and other resources we built for the session.

The full repository is here and you can access the script directly here or copy it from the end of this post.

Below is an example of the results generated against the OrderLines table of the World Wide Importers sample database.

The script works by generating a query for every table in the database, this then needs to be copied into a new query and run, after that you have your results.
There are other tools out there that will do similar to this but the great thing about this script is that it’s entirely T-SQL based, as long as you can run a query and have permission to see metadata tables you can make use of this. Sometimes a VM and SQL Server Management Studio is all you have to work with and that’s all this needs.

A few comments/warnings:

Note the need to change the default option in SSMS so that line breaks aren’t removed from the generated query.
I’ve been asked about performance, this query by necessity produces a lot of aggregates on an entire table. If the table has many rows or columns it may take a while to run. There may be room to adapt this query to run against a sample of the rows within the table, reduce the number of aggregates or switch to he APPROX_COUNT_DISTINCT aggregate to make it run faster.
There may be issues with some rarer data types, for example I’m aware that as is the generated queries can have issues with binary columns. I intend to fix for these issues in a future version of the script.

I hope you find the script useful and I’d love to hear suggestions of how it could be extended to be even more useful in the future.

/*
===============================================================================
Name: Dynamic Table Analyser
Author: Barney Lawrence
Creation Date: 2023-03-23
Description: 
	This script uses metadata to generate queries to support analysis
	of table content by using distinct counts, maximums, minimums and other
	details for all columns and pivoting into a presentable format.
Usage: 
	Run this query against an instance of SQL Server and copy the results from
	the column Query for the table you with to analyse.
	For the query to format correctly in SSMS please ensure to check the option at:
	Tools>Options>Query Results>SQL Server>Results to Grid>Retain CR/LF on copy or save
Source:
	The latest version of this script can be found at
	https://github.com/BarneyLawrence/Sessions-Metadata-Data-Warehouse
===============================================================================
*/

SELECT TABLE_SCHEMA, TABLE_NAME
,'
WITH DistinctCounts AS
(
SELECT '
+ STRING_AGG(CAST('' AS varchar(max)) +
    'COUNT(DISTINCT [' + COLUMN_NAME  +']) AS [DISTINCT_' + COLUMN_NAME +']' + CHAR(13)
    +',COUNT(IIF([' + COLUMN_NAME  +'] IS NULL,1,NULL)) AS [NULL_' + COLUMN_NAME +']' + CHAR(13)
    +',COUNT(IIF(CAST([' + COLUMN_NAME  +'] AS varchar(max)) = '''' ,1,NULL)) AS [BLANK_' + COLUMN_NAME +']' + CHAR(13)
    +',MAX(LEN([' + COLUMN_NAME  +'])) AS [MAXLENGTH_' + COLUMN_NAME +']' + CHAR(13)
    +',MIN([' + COLUMN_NAME  +']) AS [MIN_' + COLUMN_NAME +']' + CHAR(13)
    +',MAX([' + COLUMN_NAME  +']) AS [MAX_' + COLUMN_NAME +']' + CHAR(13)
    +',''' + DATA_TYPE +  ISNULL('(' +CAST(COALESCE(CHARACTER_MAXIMUM_LENGTH,DATETIME_PRECISION) AS varchar(5)) + ')','') +''' '  + ' AS [DATA_TYPE_' + COLUMN_NAME +']' + CHAR(13)
,', ') WITHIN GROUP (ORDER BY ORDINAL_POSITION) + '
FROM [' + TABLE_SCHEMA + '].[' + TABLE_NAME + ']
)
SELECT ''' + TABLE_SCHEMA +''' AS TableSchema, ''' + TABLE_NAME + ''' AS TableName, V.ColumnName, 
V.DistinctColumnCount,V.NullColumnCount,V.BlankColumnCount,V.MaxColumnLength,V.MinColumnValue,V.MaxColumnValue,V.ColumnDataType
FROM DistinctCounts AS C
CROSS APPLY
(
VALUES
 '+
STRING_AGG( CAST('' AS varchar(max)) + 
    '(''' + COLUMN_NAME
    + ''', C.[DISTINCT_' + COLUMN_NAME + ']'
    + ', C.[NULL_' + COLUMN_NAME + '] '
    + ', C.[BLANK_' + COLUMN_NAME + '] '
    + ', C.[MAXLENGTH_' + COLUMN_NAME + ']'
    + ', CAST(C.[MIN_' + COLUMN_NAME + '] AS varchar(max))'
    + ', CAST(C.[MAX_' + COLUMN_NAME + '] AS varchar(max))'
    + ', C.[DATA_TYPE_' + COLUMN_NAME + ']'    +')'
    + CHAR(13)
,', ') WITHIN GROUP (ORDER BY ORDINAL_POSITION)
 +'
) AS V(ColumnName,DistinctColumnCount,NullColumnCount,BlankColumnCount,MaxColumnLength,MinColumnValue,MaxColumnValue,ColumnDataType)
;'
AS Query
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY TABLE_SCHEMA, TABLE_NAME

Using Metadata To Dynamically Analyze SQL Server Table Contents

One thought on “Using Metadata To Dynamically Analyze SQL Server Table Contents”

Leave a comment Cancel reply

Share this:

Related

One thought on “Using Metadata To Dynamically Analyze SQL Server Table Contents”

Leave a comment Cancel reply