Putting the Data Intelligence in SAP Datasphere: a stepwise demo of Replication Flows with S/4HANA, BigQuery and Azure

 

Like all of SAP Datasphere’s regular updates, the most recent 2024.4 release added a slew of new functionalities to the solution. At Expertum, we keep a particularly close eye on the Data Intelligence features that are increasingly covered by Datasphere, given that the former solution will soon no longer receive functionality updates from SAP.

In this blog, Lars will briefly touch upon the most recent outlook for Data Intelligence and the strategic shift of its toolset towards Datasphere. Next, my colleague Dirk-Jan Kloezeman and I put the new replication flows to the test, by following in the footsteps of SAP’s recently published technology blogs. So, if you’re curious how far Datasphere’s flows can take you, be sure to read on!

What will we cover in this blog?

  • How will Data Intelligence transition to Datasphere?
  • Our thoughts on Datasphere Replication Flows
  • Connecting SAP Datasphere to SAP S/4HANA (on-premise)
  • Connecting SAP Datasphere to Google BigQuery
  • Creating a Replication Flow from S/4HANA to BigQuery
  • Creating a Replication Flow from S/4HANA to Azure Data Lake

From SAP Data Intelligence to SAP Datasphere?

Along with the transformation of SAP Data Warehouse Cloud into SAP Datasphere last year, SAP subtly implied that SAP Data Intelligence (DI) would ‘increasingly move towards’ Datasphere in the near future. As stated in the introduction, recent updates to Datasphere have brought more options for data ingestion and replication, different pipeline configuration options and increased support for non-standard operators in data flows within the solution.

More recently, Datasphere’s popular Replication Flow functionality has been extended so that Datasphere can be used as a pipeline between SAP and non-SAP systems, with new nested chains and data previews aimed at improving these flows. We see this as a good step towards incorporating SAP Data Intelligence’s toolset into Datasphere, even though we still recognize a very apparent gap in functionality between the two solutions.

Although more of DI’s functionalities will undoubtedly find their way to Datasphere in the months to come, DI itself will be removed from CPEA contracts as of July 1st 2024. Existing DI customers will be supported (in terms of maintenance, that is) until December 31st 2028, after which they will be fully dependent on Datasphere for DI-like functionalities. This sounds far away, but as stated before, a lot of ground still needs to be covered to bring Datasphere up to the level that Data Intelligence is at today. Datasphere’s Replication Flows, and in particular its premium outbound flows that are capable of connecting to non-SAP systems, could prove a valuable stepping stone in this transition (see the image below for a brief overview, courtesy of SAP).

Datasphere Afbeelding1 1
Datasphere Afbeelding1 2

Our thoughts on Datasphere Replication Flows

Before we take a closer, step-by-step look at Datasphere’s Replication Flows, we want to start off by saying that they do offer a convenient method for moving data from an SAP environment to a non-SAP environment. Replication flows are relatively simple to set up and use; however, this functionality does come at a cost if you decide to move your data to non-SAP targets.

While using Datasphere in this manner does not utilize any storage on the actual system (and thus no storage costs are incurred), customers will have to pay one thousand euros per month per 20 gigabytes of data traffic passing through (premium) replication flows. These costs come on top of the standard Datasphere costs, which you can check in SAP’s new capacity unit estimator here.
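To get a feel for what this means in practice, here is a rough back-of-the-envelope sketch in Python, assuming the pricing stated above and billing in whole 20 GB blocks (do verify the current pricing model with SAP before budgeting):

```python
import math

# Assumption: premium outbound traffic is billed in whole blocks of 20 GB at
# EUR 1,000 per block per month, on top of the standard Datasphere costs.
def premium_outbound_cost_eur(gb_per_month: float,
                              block_gb: float = 20.0,
                              price_per_block_eur: float = 1000.0) -> float:
    blocks = math.ceil(gb_per_month / block_gb)
    return blocks * price_per_block_eur

# Example: 75 GB of replicated data per month -> 4 blocks -> EUR 4,000 per month.
print(premium_outbound_cost_eur(75))
```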

As a possible alternative, we also tried to perform the initial data load from the same SAP Datasphere local table to both Google BigQuery and a Kafka broker, but this unfortunately did not work. Of course, we were not able to find an answer to all of our questions; some of the important ones we still have at the moment include:

  • What happens if multiple Spaces try to replicate data from the same S/4HANA source?
  • To what degree is the metadata of your source (e.g. in terms of lineage) carried over to the target system?
  • Is it possible to replicate data to multiple targets at once?
  • When is the support for REST APIs coming to SAP Datasphere? (as this is a requirement for even some SAP-to-SAP connections)
  • Apart from Tables, will it be possible to use Datasphere’s Analytic Models or Views as a source for replication flows? If so, when? (this is currently not specified on the SAP roadmap)

Despite these open topics, for which we will find answers soon, we hope this blog will give you some insight into both the outlook for Data Intelligence functionalities in Datasphere and the possibilities that the (premium) replication flows offer.

Now that we have a bit more clarity on the future of SAP Data Intelligence and SAP Datasphere, I will give the floor to my colleagues Dennis and Dirk-Jan as they dive into the latter solution’s replication flows.

Setting up the connection to S/4HANA


It stands to reason that you will require an operational SAP Datasphere system if you want to follow along with this guide.

  1. First, connect and configure the SAP Cloud Connector to your S/4HANA on-premise system (we will not go into this specific configuration in this blog).
  2. Next, connect and configure the DP Agent (whose configuration is also outside of this blog’s scope).
  3. Once these steps have been completed, go to the SAP Datasphere -> Connections menu, create a new Local Connection (make sure to select S/4HANA on-premise when prompted) and configure it as follows:
Datasphere blog 1 en 3
Datasphere Afbeelding1 4
Datasphere Afbeelding1 5

4. Next, provide any optional advanced properties and the correct connection information (e.g. a business name, which is not obligatory but quite handy, and an obligatory technical name) and validate the connection:

Datasphere blog 2

If everything has been configured correctly, a message toast should pop up similar to the one below:

Datasphere Afbeelding1 7

If, for example, something is wrong with the virtual hostname used in the Cloud Connector settings, it will show a popup with additional information like this one:

Datasphere Afbeelding1 8

Setting up the connection to Google BigQuery

Here, we will illustrate the required steps to set up a test environment and connection between SAP Datasphere and Google BigQuery:

  1. Create a Google BigQuery (trial) account via the ‘Try BigQuery free’-button: https://cloud.google.com/bigquery
  2. Now log in to your Google account and add your address and credit card information.
  3. After the account creation you will land on an overview page with a project that gets created by default (you can optionally create your own).
  4. Go to Google BigQuery, either through the ‘Products & solutions’ menu or via the direct URL: https://console.cloud.google.com/bigquery
Datasphere Afbeelding1 9

5. Use ‘Create SQL Query’ to create a dataset/schema. The ‘project-id’ prefix is optional.

Datasphere Afbeelding1 10
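If you prefer scripting over the console UI, the same dataset/schema can also be created with the BigQuery Python client. A minimal sketch, assuming local Application Default Credentials and a placeholder project id:

```python
from google.cloud import bigquery

# Assumes you are authenticated locally (e.g. via Application Default
# Credentials) and that "your-project-id" is the project created in step 3.
client = bigquery.Client(project="your-project-id")

# Equivalent of the DDL you would type into the console's SQL editor; the
# project prefix in front of the dataset name is optional here as well.
client.query("CREATE SCHEMA IF NOT EXISTS A_TEST_SCHEMA_NAME").result()
```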

6. Go to Service Accounts either through the ‘IAM & Admin -> Service Accounts’ menu or with the direct URL: https://console.cloud.google.com/iam-admin/serviceaccounts

Datasphere Afbeelding1 11

7. Create a Service Account and grant it the required roles:

Datasphere Afbeelding1 12
  • Note that the BigQuery Job User role is necessary for ‘Data flows’ and ‘Replication flows’.
  • The BigQuery Data Owner role is needed to read from and write to the dataset/schema. Note that another, less privileged role might also suffice here (we have not been able to validate this yet). Later on, in the connection validation from SAP Datasphere, this falls under ‘Remote tables’.

8. Open the newly created service account:

Datasphere Afbeelding1 13

9. Go to the ‘Keys’-tab and create a new private key of the type ‘JSON’. The download will start automatically; save the key file to your computer.
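A quick way to verify that the downloaded key works is to instantiate a BigQuery client with it and list the datasets it can see. A minimal sketch, using a hypothetical filename for the key:

```python
from google.cloud import bigquery

# Hypothetical filename for the JSON key downloaded in this step.
client = bigquery.Client.from_service_account_json("bq-service-account-key.json")

# If the roles from step 7 were granted correctly, this lists the datasets the
# service account can see, which should include A_TEST_SCHEMA_NAME.
for dataset in client.list_datasets():
    print(dataset.dataset_id)
```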

10. Now go to the SAP Datasphere documentation and check the supported drivers:

Datasphere Afbeelding1 14

11. You can download the driver from the URL below. Make sure to use the exact filename stated in the documentation.
https://storage.googleapis.com/simba-bq-release/odbc/SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux.tar.gz
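If you prefer to script the download, something along these lines works; it deliberately keeps the exact filename from the URL so it matches the name referenced in the documentation:

```python
import urllib.request

# Driver URL as listed above; check the SAP Datasphere documentation for the
# exact version and filename that are currently supported.
url = ("https://storage.googleapis.com/simba-bq-release/odbc/"
       "SimbaODBCDriverforGoogleBigQuery_3.0.0.1001-Linux.tar.gz")

# Save the archive under the exact filename from the URL.
urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
```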

12. Open your SAP Datasphere tenant, go to ‘System’ > ‘Configuration’ and, in the ‘Data Integration’ tab under ‘Third-Party Drivers’, upload the previously downloaded file through the ‘Driver File:’-upload box. The system will also remind you that you need to have a valid license for the driver files before proceeding. Depending on the connection speed, the upload may take some time.

13. Once the driver is added make sure to select it and press the ‘Sync’-button under the ‘Third-Party Drivers’-submenu.

14. Go to the SAP Datasphere Connections and create a new Local Connection (like we did before for S/4HANA). However, this time of course select Google BigQuery:

Datasphere Afbeelding1 15

Set up the connection as follows, providing once again the obligatory technical and optional business name of the connection.

Datasphere Afbeelding1 16

15. Select the created connection and validate it (please refer to the image below).

Datasphere Afbeelding1 17

You will once again receive either the ‘Okay’ or the ‘Something is wrong’-message (an example of each below):

Datasphere Afbeelding1 18 en 19 2

Creating the Replication Flows

Now that the connections for both S/4HANA and BigQuery have been set up, we can create a sample replication task from S/4HANA to Google BigQuery:

1. Create a new Replication Flow in the Datasphere Space of your choice (through the Data Builder):

Datasphere Afbeelding1 20

2. Select the Source Connection (in this case S4H_TEST) and the Source Container (in this case CDS), and add one or more Source Objects (in this case ZDV_CDS_VIEW):

Datasphere Afbeelding1 21 en 22

3. Next, select the Target Connection (in this case GBQ_TEST) and a Target Container (in this case the previously created dataset/schema A_TEST_SCHEMA_NAME):

Datasphere Afbeelding1 23
Datasphere Afbeelding1 24

4. Give the Replication Flow a name, save it and finally deploy it.

Datasphere Afbeelding1 25

5. Now run the Replication Flow and optionally check the Data Integration Monitor:

Datasphere Afbeelding1 26 b

6. Finally, return to check Google BigQuery; it should contain the replicated data:

Datasphere blog 4
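If you want to double-check the result outside of the console, you can also query the replicated table with the BigQuery Python client. A minimal sketch, reusing the hypothetical key file from earlier and a placeholder project id:

```python
from google.cloud import bigquery

# Hypothetical key file and placeholder project id; adjust to your environment.
client = bigquery.Client.from_service_account_json("bq-service-account-key.json")

# Assumes the target table kept the name of the source object (ZDV_CDS_VIEW);
# adjust the table reference if you renamed it in the replication flow.
query = """
    SELECT COUNT(*) AS row_count
    FROM `your-project-id.A_TEST_SCHEMA_NAME.ZDV_CDS_VIEW`
"""
for row in client.query(query).result():
    print(f"Replicated rows: {row.row_count}")
```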

SAP Datasphere Replication Flow from S/4HANA to Azure Data Lake

Now that we have seen how connections are set up and how replication flows are created to our BigQuery example, let’s take a short look at how this could work for other targets, for example Azure Data Lake.

First, you will need a configured S/4HANA system and an Azure storage account with Data Lake Storage Gen2 enabled. The next step is to create connections in Datasphere to S/4HANA and, of course, Azure Data Lake. Given that this blog has already gone in-depth on how to do this, we will not go through this (almost identical) setup process again here, and instead start from the point where the relevant connections have already been established:

Datasphere Afbeelding1 28

1. Navigate to the Data Builder -> New Replication Flow.

You will once again land on the ‘You haven’t added any data yet’-screen, so press ‘Select Source Connection’ and select your S/4HANA connection. Now press Select Source Container -> CDS (which is the container we use for both the BigQuery and Azure examples):

Datasphere Afbeelding1 29

2. Select Add Source objects, for example I_BUSINESSPARTNER.

Datasphere Afbeelding1 30

3. Subsequently, press Add Selection and select your Source Object. Now we will add the target connection, so select your Azure Data Lake connection and the required container:

Datasphere Afbeelding1 31

4. Configure the load settings of the replication to your needs, for example:

  • Initial and Delta
  • With a Delta load interval of thirty minutes
Datasphere Afbeelding1 33
Datasphere Afbeelding1 34

5. Save, activate and run your replication flow like we did before for the S/4HANA – BigQuery connection:

Datasphere Afbeelding1 35

6. Now we can once again check the Data Integration Monitor for the run status:

Datasphere Afbeelding1 36

7. In Azure you can subsequently check your files:

Datasphere Afbeelding1 37
Datasphere Afbeelding1 38
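You can also list the generated files programmatically with the Azure SDK for Python. A minimal sketch, with placeholder storage account details and the assumption that the files land in a folder named after the target object:

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder storage account, account key and container name; use your own.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key>",
)
file_system = service.get_file_system_client(file_system="<container>")

# Assumes the replication flow writes its files into a folder named after the
# target object (I_BUSINESSPARTNER in this example); adjust the path if needed.
for path in file_system.get_paths(path="I_BUSINESSPARTNER"):
    print(path.name, path.content_length)
```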

8. Finally, note that you might encounter some errors, for example:

Datasphere Afbeelding1 39

To tackle this issue, implement the respective SAP note in the S/4HANA system via SNOTE.

Datasphere Afbeelding1 41 en 42

For the above issue, truncate the initial data load to circumvent it.

This concludes our blog on the state of SAP Data Intelligence/Datasphere and its replication flows. If you want to know more or have other Datasphere or Data Intelligence related questions, please do not hesitate to contact us. For the official SAP Documentation on Datasphere’s Replication Flows, click here.

Credits

This blog was written by our experts Dennis van Velzen, Lars van der Goes and Dirk-Jan Kloezeman.
