Wednesday, 25 November 2020
Choosing the right distributed tables in Azure Synapse Analytics
Each row in a distributed table stored across multiple distributions which are distributed with a hash or round-robin algorithm. The choice of choosing the correct one significantly affects the performance.
If you have a large fact table whose size is more than 2 GB and requires frequent insert, update, and delete operations. Hash-distributed tables work well in this case as the data movement during queries is minimized in dedicated SQL pool to acheive query performance improvement by distributing table rows across the Compute nodes using a deterministic hash function to assign each row to one distribution.
Here's the sample showing how to create a hash-distributed table with ProductKey as the distribution column.
```sql
CREATE TABLE [dbo].[FactInternetSales]
( [ProductKey] int NOT NULL
, [OrderDateKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [OrderQuantity] smallint NOT NULL
, [UnitPrice] money NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
)
;
```
A round-robin distributed table distributes table rows evenly across all distributions randomly. Even with the same values, the rows may not be distributed to the same distribution. Therefore, if we need to perform joining operations, it may lead to performance issues as the table usually requires reshuffling the rows. It is often used in a temporary staging table and there is no joining operations required.
Here's the example showing how to create a round-robin distributed table.
```sql
CREATE TABLE [dbo].[Date]
(
[DateID] int NOT NULL,
[Date] datetime NULL,
[DateBKey] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayOfMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DaySuffix] varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayOfWeek] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayOfWeekInMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayOfWeekInYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayOfQuarter] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DayOfYear] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[WeekOfMonth] varchar(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[WeekOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[WeekOfYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Month] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[MonthName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[MonthOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Quarter] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[QuarterName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Year] char(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[YearName] char(7) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[MonthYear] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[MMYYYY] char(6) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FirstDayOfMonth] date NULL,
[LastDayOfMonth] date NULL,
[FirstDayOfQuarter] date NULL,
[LastDayOfQuarter] date NULL,
[FirstDayOfYear] date NULL,
[LastDayOfYear] date NULL,
[IsHolidayUSA] bit NULL,
[IsWeekday] bit NULL,
[HolidayUSA] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
);
```
Subscribe to:
Post Comments (Atom)
A Fun Problem - Math
# Problem Statement JATC's math teacher always gives the class some interesting math problems so that they don't get bored. Today t...
-
SHA stands for Secure Hashing Algorithm and 2 is just a version number. SHA-2 revises the construction and the big-length of the signature f...
-
Contest Link: [https://www.e-olymp.com/en/contests/19775](https://www.e-olymp.com/en/contests/19775) Full Solution: [https://github.com/...
No comments:
Post a Comment