Azure Synapse Analytics March Update 2022
Welcome to the March 2022 Azure Synapse update! This month, we have SQL, Apache Spark for Synapse, Security, Data integration, and Notebook updates for you. Watch our monthly update video!
Implementing Spark Posexplode() Equivalent in the Serverless SQL Pool
Spark makes it easy to work with nested data in Parquet files. If you have a Parquet file with a complex-type column (array, struct, or map), Spark lets you “unpack” the values from that column and join its elements to the row they belong to.
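The post walks through how to achieve the same result in a serverless SQL pool. Here is a minimal sketch of that pattern, assuming a Parquet source in Azure Data Lake Storage with hypothetical id and tags (array) columns: serverless SQL surfaces complex-type columns as JSON text, so CROSS APPLY OPENJSON can unpack them, with OPENJSON's key column playing the role of posexplode's position index.

```sql
-- Read a Parquet source whose "tags" column is an array (exposed as JSON text),
-- then unpack each element next to the row it belongs to.
-- The storage URL and the column names (id, tags) are hypothetical placeholders.
SELECT
    rows.id,
    tag.[key]   AS pos,    -- zero-based position, like Spark posexplode()
    tag.[value] AS tag     -- the array element itself
FROM OPENROWSET(
        BULK 'https://myaccount.dfs.core.windows.net/mycontainer/data/*.parquet',
        FORMAT = 'PARQUET'
     ) AS rows
CROSS APPLY OPENJSON(rows.tags) AS tag;
```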
Optimize Database Schema in Serverless SQL Pools using QPI Library
Serverless SQL pools enable you to query data stored in Azure Data Lake Storage, the Cosmos DB analytical store, or Dataverse without importing it into database tables. For optimal performance, it is important to apply best practices and to optimize your schema and queries.
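Whichever tool you use to surface the issues, the core of the schema optimization is the same: inspect the data types that serverless SQL infers for your source and replace oversized defaults (for example varchar(8000)) with the smallest types that fit. A minimal sketch using the built-in sp_describe_first_result_set procedure, with a hypothetical storage path and columns:

```sql
-- Inspect the data types serverless SQL infers for a query over the lake.
-- Oversized inferred types (e.g. varchar(8000)) are candidates for an explicit,
-- smaller schema. The storage URL and columns are hypothetical placeholders.
EXEC sp_describe_first_result_set N'
    SELECT TOP 100 *
    FROM OPENROWSET(
            BULK ''https://myaccount.dfs.core.windows.net/mycontainer/data/*.parquet'',
            FORMAT = ''PARQUET''
         ) AS rows';

-- Then pin the optimized schema explicitly in the WITH clause.
SELECT *
FROM OPENROWSET(
        BULK 'https://myaccount.dfs.core.windows.net/mycontainer/data/*.parquet',
        FORMAT = 'PARQUET'
     )
WITH (
    id      int,
    country varchar(5),
    amount  decimal(10, 2)
) AS rows;
```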
Set up an Azure-SSIS Integration Runtime (IR) in Azure Synapse Analytics using PowerShell
The Azure-SSIS Integration Runtime is a fully managed cluster of Azure VMs dedicated to running your SSIS packages. You can bring your own Azure SQL Database or SQL Managed Instance to host the catalog of SSIS projects and packages (SSISDB). To lift and shift an existing SSIS workload, you can create an Azure-SSIS IR to natively execute SSIS packages.
Best Practices for Integrating Serverless SQL Pool with Cosmos DB Analytical Store via Synapse Link
When you integrate serverless SQL pools into your solution, you need to apply some best practices. There are general best practices for serverless SQL pools in the Synapse Analytics workspace, but not all of them apply to the Cosmos DB scenario, so you will likely use only a subset of them. This post covers only the best practices you should apply in a Cosmos DB solution, along with some additional hints that can help you optimize it.
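One recommendation that recurs in this scenario is to query the analytical store with an explicit WITH schema rather than relying on automatic inference, using the smallest types that fit and a UTF-8 collation for string columns. A hedged sketch of that pattern follows; the account, database, container, and column names are placeholders, and in production you would reference a server credential rather than paste an account key inline.

```sql
-- Query the Cosmos DB analytical store from a serverless SQL pool with an
-- explicit schema: smallest suitable types and a UTF-8 collation for strings.
-- Account, database, container, and column names are hypothetical placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
        'CosmosDB',
        'Account=myaccount;Database=mydb;Key=<account-key>',
        Orders
     )
WITH (
    orderId   varchar(36)  COLLATE Latin1_General_100_BIN2_UTF8,
    customer  varchar(100) COLLATE Latin1_General_100_BIN2_UTF8,
    total     decimal(10, 2),
    orderDate varchar(30)
) AS orders;
```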
The Data Lakehouse, the Data Warehouse and a Modern Data Platform Architecture
I am encountering two overriding themes when talking to data architects today about their data and analytics strategy. They take very different sides, sitting at practically opposite ends of the discussion about the future design of the data platform.
Save Money and Increase Performance with Intelligent Cache for Apache Spark in Azure Synapse
Data professionals can now save money and increase the overall performance of repeat queries in their Apache Spark in Azure Synapse workloads using the new intelligent cache, now in public preview. This feature lowers the total cost of ownership by improving performance on subsequent reads of files stored in the cache by up to 65% for Parquet files and up to 50% for CSV files.
CI/CD in Synapse SQL: How to Deliver Your Database Objects Across Multiple Environments
When using Azure Synapse Analytics across multiple environments, you can take advantage of Azure DevOps capabilities to automate the integration and delivery of your work, whether it comes from Synapse Studio (workspace artifacts) or from your SQL pools (database objects).
For more information, view the blog here: March 2022 | Microsoft Azure Synapse Analytics Blog | Microsoft Azure Synapse
Contact us today for a more in-depth conversation around Azure Synapse.