Wednesday, 25 November 2020

Choosing the right distributed tables in Azure Synapse Analytics

Each row in a distributed table stored across multiple distributions which are distributed with a hash or round-robin algorithm. The choice of choosing the correct one significantly affects the performance. If you have a large fact table whose size is more than 2 GB and requires frequent insert, update, and delete operations. Hash-distributed tables work well in this case as the data movement during queries is minimized in dedicated SQL pool to acheive query performance improvement by distributing table rows across the Compute nodes using a deterministic hash function to assign each row to one distribution. Here's the sample showing how to create a hash-distributed table with ProductKey as the distribution column. ```sql CREATE TABLE [dbo].[FactInternetSales] ( [ProductKey] int NOT NULL , [OrderDateKey] int NOT NULL , [CustomerKey] int NOT NULL , [PromotionKey] int NOT NULL , [SalesOrderNumber] nvarchar(20) NOT NULL , [OrderQuantity] smallint NOT NULL , [UnitPrice] money NOT NULL , [SalesAmount] money NOT NULL ) WITH ( CLUSTERED COLUMNSTORE INDEX , DISTRIBUTION = HASH([ProductKey]) ) ; ``` A round-robin distributed table distributes table rows evenly across all distributions randomly. Even with the same values, the rows may not be distributed to the same distribution. Therefore, if we need to perform joining operations, it may lead to performance issues as the table usually requires reshuffling the rows. It is often used in a temporary staging table and there is no joining operations required. Here's the example showing how to create a round-robin distributed table. ```sql CREATE TABLE [dbo].[Date] ( [DateID] int NOT NULL, [Date] datetime NULL, [DateBKey] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DaySuffix] varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfWeek] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfWeekInMonth] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfWeekInYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfQuarter] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [DayOfYear] varchar(3) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [WeekOfMonth] varchar(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [WeekOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [WeekOfYear] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Month] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MonthName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MonthOfQuarter] varchar(2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Quarter] char(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [QuarterName] varchar(9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [Year] char(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [YearName] char(7) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MonthYear] char(10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [MMYYYY] char(6) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [FirstDayOfMonth] date NULL, [LastDayOfMonth] date NULL, [FirstDayOfQuarter] date NULL, [LastDayOfQuarter] date NULL, [FirstDayOfYear] date NULL, [LastDayOfYear] date NULL, [IsHolidayUSA] bit NULL, [IsWeekday] bit NULL, [HolidayUSA] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX ); ```

Friday, 13 November 2020

Copying files to and from Google Compute Engine Virtual Machines via SCP

To specify the project, zone, and recurse all together, run: ``` gcloud compute scp --project "your-gcp-project" --zone "us-east1-b" --recurse ~/local-directory/ gcp-instance-name:~/server-directory/ ```

GCP Compute 101

## Compute Engine - Offers managed virtual machines - No upfront investment - Fast and consitent performance - Create VMs with GCP Console or **gcloud** - Run images of Linux or Windows Server - Pick memory & CPU: use predefined types, or make a custom VM - Pick GPUs if you need them - Pick persistent disks: standard or SSD - Pick local SSD for scratch space too if you need it - Pick a boot image: Linux or Windows Server - Define a startup script if you like - Take disk snapshots as backups or as migration tools - Offers innovative pricing - Per-second billing, sustained use discounts - Preemptible instances - High throughput to storage at no exta cost - Custom machine types: Only pay for the hardware you need - Scales up or scale outs - Use big VMs for memory- and compute-intensive applications - Use Autoscaling for resilient, scalable applications ## App Engine - a PaaS for building scalable applications - makes deployment, maintenance, and scalability easy so you can focus on innovation - esp suited for building scalable web applications and mobile backends - standard environment: - easily deploy your applications - autoscale workloads - free daily quota - usage based pricing - SDKs for development, testing and deployment - sandbox constraints: - no writing to local files - all requests time out at 60s - limits on third-party software - flexible environment: - build and deploy containerized applications with a click - no sandbox constraints - can access App Engine resources - Comparing the App Engine environments ![image](https://user-images.githubusercontent.com/35857179/81493125-65faa780-92d0-11ea-8526-e2146d69d8d9.png) ![image](https://user-images.githubusercontent.com/35857179/81493136-7a3ea480-92d0-11ea-8e33-a70af55fb857.png)

Truncated exponential backoff

Truncated exponential backoff is a standard error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests. ``` min(((2^n)+random_number_milliseconds), maximum_backoff), ``` where - n incremented by 1 for each iteration (request) - random_number_milliseconds is a random number of milliseconds less than or equal to 1000. - maximum_backoff is typically 32 or 64 seconds.

A Fun Problem - Math

# Problem Statement JATC's math teacher always gives the class some interesting math problems so that they don't get bored. Today t...