SAP Datasphere and Snowflake: An (im)possible marriage? (Part I)

 

It is no secret that SAP has been pushing its Datasphere solution as not just a cloud data warehouse, but a data integration platform as well. Partnerships with Databricks, Google and Collibra, among others, have been promoted in order to appeal to (potential) customers whose IT landscapes are increasingly vendor-diverse. However, non-SAP solutions that fall outside of this partner ecosystem are often difficult to integrate with Datasphere, and with SAP taking steps to restrict the permitted use of its Operational Data Provisioning (ODP) framework APIs by third-party tools, things are not getting easier. Does that mean that you have no options at all to connect to popular non-SAP solutions such as Snowflake? Not at all, but you do have to take some additional steps. In this blog, my colleague Dirk-Jan will show you exactly how, by demonstrating a data connection from SAP S/4HANA to Datasphere and, finally, Snowflake.

In part 1 of this blog series, Dirk-Jan will achieve this by using a dedicated ETL tool in the form of Azure Data Factory, Azure's cloud ETL service for scale-out serverless data integration and data transformation (this could also be any other supported ETL solution). Part 2 will dive into a direct connection between Datasphere and Snowflake, so stay tuned for that as well!

Getting started: steps to take

  • An SAP S/4HANA system
  • A Cloud Connector configured to connect Datasphere with S/4HANA
  • The necessary configuration and settings in Datasphere
  • (A little bit of) modeling in Datasphere
  • Set up Snowflake
  • Use an external ETL tool to load data into Snowflake
  • Check the data in Snowflake

S/4HANA

First, we want to create a table in S/4HANA that we will import into Datasphere and use in a replication flow. So let’s get started by creating a table in ABAP and putting some data in it. Alternatively, you could create a CDS view or, even easier, select an already existing table or view in SAP Datasphere for the follow-up steps. In our example, we will use this table (ZDJ_TABLE) and its corresponding data in Datasphere later on.

Blog May 2024 Afbeelding1

Datasphere

To connect S/4HANA to Datasphere, we need to set up a Cloud Connector and a Data Provisioning Agent to enable replication flows. The specific technical and connection setup of the Cloud Connector will not be covered in this blog, but a successful connection should give you the following result:

Blog May Afbeelding2

(Model import is disabled because it is not needed in this scenario, which is also why I did not create a tunnel connection.)

In Datasphere, we will create a new view in which we select our S/4HANA connection as a source. Next, select the custom table we created in S/4HANA:

Blog May Afbeelding3
Blog May Afbeelding4

After you have created the view from the table, enable the “Expose for Consumption” option, then save and deploy the view.

Create a database user for your Datasphere space

Within Space Management in Datasphere, create a new database user which will be used to expose data to external tools. Ensure that the database user has the necessary read/write privileges. You will need the host name, database username and password later on to configure Azure Data Factory.

Blog May Afbeelding5
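Before continuing, it can be useful to verify that this database user actually works. The snippet below is a minimal sketch using SAP’s hdbcli Python driver; the host name, user name and password are placeholders that you should replace with the values shown in the database user dialog of your space.

```python
# pip install hdbcli
from hdbcli import dbapi

# Placeholders: take these values from the database user details in Space Management.
conn = dbapi.connect(
    address="<your-tenant>.hanacloud.ondemand.com",  # Datasphere host name
    port=443,                        # Datasphere's SQL endpoint listens on port 443
    user="MYSPACE#ADF_USER",         # the database user you just created
    password="<password>",
    encrypt="true",                  # TLS is mandatory for the HANA Cloud endpoint
    sslValidateCertificate="true",
)

cursor = conn.cursor()
cursor.execute("SELECT CURRENT_USER FROM DUMMY")
print(cursor.fetchone())             # should print the database user name
conn.close()
```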

You also need to allow Azure Data Factory to communicate with Datasphere. Microsoft publishes a JSON file in its documentation library that contains all Azure IP address ranges. Look up the region your ADF instance runs in and make sure those IP addresses are whitelisted in Datasphere.

Blog May Afbeelding6
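Instead of picking the relevant addresses out of that JSON file by hand, you can extract them with a short script. The sketch below assumes you have downloaded Microsoft’s “Azure IP Ranges and Service Tags – Public Cloud” file locally (the file name is a placeholder) and that your Data Factory runs in West Europe; adjust the service tag to your own region.

```python
import json

# Load the Service Tags file downloaded from the Microsoft download center.
with open("ServiceTags_Public.json") as f:
    service_tags = json.load(f)

region_tag = "DataFactory.WestEurope"  # change to the region your ADF instance runs in

prefixes = next(
    entry["properties"]["addressPrefixes"]
    for entry in service_tags["values"]
    if entry["name"] == region_tag
)

print(f"{region_tag}: {len(prefixes)} address ranges to whitelist in Datasphere")
for prefix in prefixes:
    print(prefix)
```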

Azure

The Azure configuration requires a few steps, starting with the storage account, a virtual machine on which we can install our integration runtime and, of course, Azure Data Factory itself.

Storage Account

Create a (general-purpose) V2 storage account in which we can create a Blob storage container for staging purposes.
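
If you prefer scripting over clicking through the portal, the staging container itself can also be created with the Azure Storage SDK for Python. The connection string and container name below are placeholders; the sketch assumes you take the connection string from the storage account’s access keys.

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Placeholder connection string from the storage account's "Access keys" blade.
conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=<account>;"
    "AccountKey=<key>;EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)

# Create the container that ADF will use as the staging area for the copy activity.
container = service.create_container("adf-staging")
print("Created staging container:", container.container_name)
```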

Virtual machine with the Integration Runtime

Blog May Afbeelding7

Integration Runtime

After you have set up the Data Factory, you need to install the Integration Runtime. Because we can’t use the AutoResolveIntegrationRuntime for connections to SAP HANA, we will need to create a self-hosted integration runtime on a virtual machine in Azure.

Blog May Afbeelding8
Blog May Afbeelding9
Blog May Afbeelding10

Install the integration runtime on your newly created VM. The easiest way for me was to open the Azure portal on the virtual machine itself, go to the Data Factory and choose “Option 1: Click here to launch the express setup for this computer”.

After a while, the integration runtime is set up and running.

Blog May Afbeelding11

On the virtual machine, we need to set up an ODBC connection to Datasphere.

Blog May Afbeelding12
Blog May Afbeelding13
Blog May Afbeelding14

Enter the username and password of the Space’s database user that you created earlier during the Datasphere setup.
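
To confirm that the ODBC setup works independently of Data Factory, you can run a quick test on the VM with pyodbc. The DSN name, user and password below are placeholders for whatever you configured in the ODBC Data Source Administrator.

```python
# pip install pyodbc
import pyodbc

# "DATASPHERE" is a placeholder for the DSN you created with the HANA ODBC driver.
conn = pyodbc.connect("DSN=DATASPHERE;UID=MYSPACE#ADF_USER;PWD=<password>")

cursor = conn.cursor()
cursor.execute("SELECT * FROM DUMMY")
print(cursor.fetchone())   # should print ('X',) if the connection works
conn.close()
```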

Azure Data Factory

Now that the integration runtime is set up correctly, we can create Linked Services. Linked Services are much like connection strings: they define the connection information Data Factory needs to connect to external resources.

Linked Services to Azure Blob Storage, Datasphere and Snowflake

Next, we must connect ADF to Datasphere (HANA) and Snowflake. This is set up through Linked Services; we need three of them: one to Datasphere, one to Snowflake and one to a Blob storage for staging purposes.

Let’s first create a Linked Service to an Azure Blob Storage.

Blog May Afbeelding15

Subsequently, we will create one to SAP Datasphere.

Blog May Afbeelding16

Fill in the server name of Datasphere and the username and password of the database user you set up. The server name must be entered together with port number 443.

Blog May Afbeelding17

Press “Test connection” to verify that the connection is successful.

Blog May Afbeelding18

I had some problems saving this Linked Service, but after some tweaks to the JSON configuration I managed to save it. Your mileage may vary, so if you encounter save issues as well, my advice would be to review the JSON configuration settings.

Finally, we establish a Linked Service to Snowflake. For Snowflake we will use the AutoResolveIntegrationRuntime. Fill in your account name, database name, warehouse, username and password. I also had to specify the role, so do this as well if you are asked to. Note that I used ACCOUNTADMIN here, which is of course not best practice; ideally, you create a dedicated role with the appropriate privileges (a sketch of such a role follows below).

Blog May Afbeelding19

Once again, test the connection to validate that everything has been set up correctly.
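
As a rough illustration of what such a dedicated role could look like, the sketch below creates a role with just enough privileges for loading data and assigns it to a service user, using the Snowflake Python connector. All object names (warehouse, database, schema, user) are placeholders for your own environment.

```python
# pip install snowflake-connector-python
import snowflake.connector

# Run this once with a role that is allowed to create roles and grant these privileges.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<admin_user>",
    password="<password>",
    role="ACCOUNTADMIN",
)

statements = [
    "CREATE ROLE IF NOT EXISTS ADF_LOADER",
    "GRANT USAGE ON WAREHOUSE MY_WH TO ROLE ADF_LOADER",
    "GRANT USAGE ON DATABASE MY_DB TO ROLE ADF_LOADER",
    "GRANT USAGE, CREATE TABLE ON SCHEMA MY_DB.MY_SCHEMA TO ROLE ADF_LOADER",
    "GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA MY_DB.MY_SCHEMA TO ROLE ADF_LOADER",
    "GRANT ROLE ADF_LOADER TO USER ADF_SERVICE_USER",
]

cur = conn.cursor()
for stmt in statements:
    cur.execute(stmt)   # create the role, grant privileges and assign it to the service user

conn.close()
```

You can then enter the service user and the ADF_LOADER role in the Snowflake Linked Service instead of ACCOUNTADMIN.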

Now that we have created the Linked Services, we can configure the pipeline in ADF.

Go to Author in ADF, select Pipelines and then New pipeline.

Blog May Afbeelding20

Add “Copy data” under Move and transform in the pipeline.

Blog May Afbeelding21

Name it appropriately:

Blog May Afbeelding22

Under Source create a new dataset:

Blog May Afbeelding23

Next, create a Sink dataset. Please also check whether your Snowflake account is hosted on Azure; otherwise, you will get the error shown below.

Blog May Afbeelding24
Blog May Afbeelding25

We’re almost there; now, create the mapping between the HANA table and the Snowflake table.

Blog May Afbeelding26

Blog May Afbeelding27

If everything has been set up, we can test the pipeline. Press “Add trigger”, then “Trigger now” and, in the next screen, “OK”.

Blog May Afbeelding28

Check the Monitor tab to see whether the activity was successful:

Blog May Afbeelding29
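As an alternative to triggering and monitoring the pipeline by hand in the portal, the same can be done programmatically. The sketch below uses the azure-mgmt-datafactory SDK; the subscription, resource group, factory and pipeline names are placeholders.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"
pipeline_name = "<pipeline-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Equivalent of "Trigger now" in the portal.
run = client.pipelines.create_run(resource_group, factory_name, pipeline_name)
print("Started pipeline run:", run.run_id)

# Equivalent of checking the run in the Monitor tab.
status = client.pipeline_runs.get(resource_group, factory_name, run.run_id)
print("Run status:", status.status)
```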

Snowflake

Finally, let’s have a look at our Snowflake table to see whether the data has been loaded successfully:

Blog May Afbeelding30
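The screenshot above shows the result in the Snowflake UI. If you would rather verify the load from a script, a quick check with the Snowflake Python connector could look like the sketch below; I am assuming here that the sink table carries the same name as the source table (ZDJ_TABLE), so adjust the connection details and object names to your own environment.

```python
# pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="MY_WH",
    database="MY_DB",
    schema="MY_SCHEMA",
)

cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM ZDJ_TABLE")       # the table loaded by the ADF pipeline
print("Rows loaded:", cur.fetchone()[0])

for row in cur.execute("SELECT * FROM ZDJ_TABLE LIMIT 5"):
    print(row)                                      # preview the first few rows

conn.close()
```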

As shown above, the data has been loaded successfully. This concludes part I of this blog series. In the next entry, I will dive into a direct connection between SAP Datasphere and Snowflake. Should you have any questions in the meantime, please do not hesitate to contact us.

Credits

This blog was written by our experts Dirk-Jan Kloezeman and Lars van der Goes.

About the author

Photo of Dirk-Jan Kloezeman
Dirk-Jan Kloezeman

Dirk-Jan is an SAP HANA and SAP Basis Consultant at Expertum.
