Brownstone: Choosing the right distributed tables in Azure Synapse Analytics

Each row in a distributed table stored across multiple distributions which are distributed with a hash or round-robin algorithm. The choice of choosing the correct one significantly affects the performance. If you have a large fact table whose size is more than 2 GB and requires frequent insert, update, and delete operations. Hash-distributed tables work well in this case as the data movement during queries is minimized in dedicated SQL pool to acheive query performance improvement by distributing table rows across the Compute nodes using a deterministic hash function to assign each row to one distribution. Here's the sample showing how to create a hash-distributed table with ProductKey as the distribution column. ```sql CREATE TABLE [dbo].[FactInternetSales] ( [ProductKey] int NOT NULL , [OrderDateKey] int NOT NULL , [CustomerKey] int NOT NULL , [PromotionKey] int NOT NULL , [SalesOrderNumber] nvarchar(20) NOT NULL , [OrderQuantity] smallint NOT NULL , [UnitPrice] money NOT NULL , [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = HASH([ProductKey]) ) ; ``` A round-robin distributed table distributes table rows evenly across all distributions randomly. Even with the same values, the rows may not be distributed to the same distribution. Therefore, if we need to perform joining operations, it may lead to performance issues as the table usually requires reshuffling the rows. It is often used in a temporary staging table and there is no joining operations required. Here's the example showing how to create a round-robin distributed table. ```sql CREATE TABLE [dbo].[Date] ( [DateID] int NOT NULL, [Date] datetime NULL, [DateBKey] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DaySuffix] varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfWeek] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfWeekInMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfWeekInYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfQuarter] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfYear] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [WeekOfMonth] varchar(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [WeekOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [WeekOfYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Month] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MonthName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MonthOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Quarter] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [QuarterName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Year] char(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [YearName] char(7) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MonthYear] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MMYYYY] char(6) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [FirstDayOfMonth] date NULL, [LastDayOfMonth] date NULL, [FirstDayOfQuarter] date NULL, [LastDayOfQuarter] date NULL, [FirstDayOfYear] date NULL, [LastDayOfYear] date NULL, [IsHolidayUSA] bit NULL, [IsWeekday] bit NULL, [HolidayUSA] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX ); ```

Brownstone

Wednesday, 25 November 2020

Choosing the right distributed tables in Azure Synapse Analytics

No comments:

Post a Comment

A Fun Problem - Math

Report Abuse