Now in analysis services there is a Cube which contains both Dimension and Measure group data. But the important thing in this Cube is partition. Now basically we can say that partitions are nothing but simple object present inside the Cube which holds the complex measure group data. It is like having sub container inside the Cube which we can call it as big container which contains Dimension data and small partitions having measure group data.
These partitions do not have to be created manually but they get created when Cube gets created, dimensions and the partitions are loaded with data from data ware house when Cube gets processed.
So if we take an example we can see that there a Fact table called “FactCustomer” and only 1 partition is created when Cube got created.
Here we can see the one partition created.
What is the need for Partitions?
Now usually in small examples when we want to load data into the Cube from our Data warehouse we simply process the Cube. It means we go to our Solution Explorer right click and then select process option which will process the Cube.
But processing all the records from our Fact table is not feasible for large database in real time scenario. In the production environment there millions and billions of record and every time processing them will take a very long time. It means that processing Cube means processing our data warehouse again and again.
So now in real time to make this work faster we will deploy the Cube and process it for the first time and then after that in future whenever we want to refresh the Cube or process the Cube for new data, we will not process the whole Cube but we will process parts of Cube which are necessary. For instance we can process the dimensions individually.
So therefore we need partitions in our cube to keep the measure data, so that for large data warehouses we can simply process only those partitions which are needed and that will make our work faster and performance will increase.
Partitions contain mainly measure group data. And in real time environment Facts or measure groups are processed more that the dimensions because dimensions are masters and no one will change the masters every day whereas facts are changed every day so Facts or measure group data are processed more frequently.
Different Deployment options with Processing:
Now usually when we deploy the Cube after the deployment it asks us to process the Cube. Now for large data records it will be very difficult to every time process the Cube. So while deploying we have three kinds of options. We can set those options according to the various real time scenarios.
The three processing options after the deployment are:
- Do Not Process
Now we will understand one by one what these options mean.
Full option means that whenever we deploy the Cube processing will happen after deployment. It means every time the deployment will be continued by processing.
Do Not Process option means the Cube will not process after deployment. It means that every time we deploy the Cube it will not be continued by processing. So for processing the Cube or the individual partition or dimension we will have to explicitly select the option of process and do processing. In real time projects this Do Not Process option is the best as we have to manually deploy and process the Cube separately.
Default option means that if the Cube is already processed then don’t process it and if the Cube is not processed then process it.
But as a best practice in industry we will use the option Do Not Process.
Now how to select these options, here is the guide:
Below is the image snapshot where under Deployment tthere are three processing option “Default”, “Do Not Process” and “Full”. This specifies whether Analysis services object should be processed when the project is deployed.
Below is the MSBI video which covers how to start with SSIS project: -