Definition of role-playing and role dimensions

Introduction

A single dimension can be referenced multiple times in a fact table, with each reference linked to a different conceptual role. These separate dimension views are called role dimensions and the common dimension is called a role-playing dimension.

Depending on the OLAP (On-Line Analytical Processing) query system that we use, we may be interested in having a single physical table and defining views on it (one view for each role) or a different physical table for each role. The rolap package tries to facilitate the operations to achieve the alternative that we decide to implement.

This document shows by means of an example the possibilities offered by the package in this context. First, through a star database, then by implementing a constellation. Finally, the conclusions are presented.

Role-playing and role dimensions in a star database

First of all, we are going to present the starting data, then we will define the star schema to obtain the star database. Next, we will see the different possibilities with the role-playing dimension and the roles.

Flat table with the initial data

We use a dataset based on the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System dataset. Two additional dates have been generated, which were not present in the original dataset.

Reception Year Reception Week Reception Date Data Availability Year Data Availability Week Data Availability Date Year WEEK Week Ending Date REGION State City Pneumonia and Influenza Deaths All Deaths Other Deaths
1962 2 1962-01-11 1962 3 1962-01-15 1962 1 1962-01-06 1 CT Bridgeport 3 46 43
1962 3 1962-01-16 1962 3 1962-01-18 1962 2 1962-01-13 1 MA Boston 11 270 259
1962 2 1962-01-13 1962 2 1962-01-13 1962 2 1962-01-13 1 CT Hartford 2 54 52
1962 3 1962-01-15 1962 3 1962-01-17 1962 2 1962-01-13 1 CT Bridgeport 2 43 41
1962 4 1962-01-27 1962 6 1962-02-07 1962 4 1962-01-27 1 MA Boston 12 285 273
1962 2 1962-01-13 1962 3 1962-01-18 1962 2 1962-01-13 1 MA Cambridge 2 27 25
1962 11 1962-03-14 1962 11 1962-03-15 1962 9 1962-03-03 1 CT Hartford 3 63 60
1962 5 1962-01-29 1962 5 1962-02-01 1962 4 1962-01-27 1 CT Hartford 1 47 46
1962 4 1962-01-27 1962 4 1962-01-28 1962 4 1962-01-27 1 CT Bridgeport 4 46 42
1962 4 1962-01-27 1962 5 1962-02-01 1962 4 1962-01-27 1 MA Cambridge 1 40 39

We have selected a small dataset so that the result can be more easily appreciated. Several columns with time information have been added so that it is justified to define a time dimension that can play several roles.

Star schema

To define the schema, we get the table columns as shown below.

dput(colnames(ft_cause_rpd))
#> c("Reception Year", "Reception Week", "Reception Date", "Data Availability Year", 
#> "Data Availability Week", "Data Availability Date", "Year", "WEEK", 
#> "Week Ending Date", "REGION", "State", "City", "Pneumonia and Influenza Deaths", 
#> "All Deaths", "Other Deaths")

The definition of the star schema is shown below.

when <- dimension_schema(name = "When",
                         attributes = c("Year",
                                        "WEEK",
                                        "Week Ending Date"))
when_available <- dimension_schema(
  name = "When Available",
  attributes = c(
    "Data Availability Year",
    "Data Availability Week",
    "Data Availability Date"
  )
)
where <- dimension_schema(name = "where",
                          attributes = c("REGION",
                                         "State",
                                         "City"))
s <- star_schema() |>
  define_facts(fact_schema(
    name = "MRS Cause",
    measures = c("Pneumonia and Influenza Deaths",
                 "All Deaths")
  )) |>
  define_dimension(when) |>
  define_dimension(when_available) |>
  define_dimension(dimension_schema(
    name = "When Received",
    attributes = c("Reception Year",
                   "Reception Week",
                   "Reception Date")
  )) |>
  define_dimension(where)

As we will define more star schemas with dimensions in common, some of the dimensions have been defined as variables, to avoid repeating their definition later.

Star database

From the star schema and the flat table, we define the star database, as can be seen below.

db <- star_database(s, ft_cause_rpd) |>
  snake_case()

It has been transformed to snake case because it is usually more appropriate if we work in RDBMS (Relational Database Management Systems).

It can be seen that the tables associated with the dimension when (when, when_available, when_received) have the same structure, although the names of their attributes are different because they come from the same flat table and the field names cannot be repeated in the same table.

The instances of the tables of dimensions and facts of the star database are shown below.

when_key year week week_ending_date
1 1962 1 1962-01-06
2 1962 2 1962-01-13
3 1962 4 1962-01-27
4 1962 9 1962-03-03
when_available_key data_availability_year data_availability_week data_availability_date
1 1962 11 1962-03-15
2 1962 2 1962-01-13
3 1962 3 1962-01-15
4 1962 3 1962-01-17
5 1962 3 1962-01-18
6 1962 4 1962-01-28
7 1962 5 1962-02-01
8 1962 6 1962-02-07
when_received_key reception_year reception_week reception_date
1 1962 11 1962-03-14
2 1962 2 1962-01-11
3 1962 2 1962-01-13
4 1962 3 1962-01-15
5 1962 3 1962-01-16
6 1962 4 1962-01-27
7 1962 5 1962-01-29
where_key region state city
1 1 CT Bridgeport
2 1 CT Hartford
3 1 MA Boston
4 1 MA Cambridge
when_key when_available_key when_received_key where_key pneumonia_and_influenza_deaths all_deaths nrow_agg
1 3 2 1 3 46 1
2 2 3 2 2 54 1
2 4 4 1 2 43 1
2 5 3 4 2 27 1
2 5 5 3 11 270 1
3 6 6 1 4 46 1
3 7 6 4 1 40 1
3 7 7 2 1 47 1
3 8 6 3 12 285 1
4 1 1 2 3 63 1

We observe that each of the when dimensions (when, when_available, when_received) has its own set of instances, exclusively those necessary for the concept they represent.

Star database with a role-playing dimension and role dimensions

In the star database, we can define that any one of the dimensions is the role-playing dimension and the other dimensions are various roles of that dimension. The only requirement is that all dimensions must have the same number of attributes. This definition is shown below.

db_1 <- db |>
  role_playing_dimension(
    rpd = "when",
    roles = c("when_available", "when_received")
  )

The result can be seen in the instances of the star database tables, below.

when_key year week week_ending_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 6 1962-02-07
15 1962 9 1962-03-03
when_available_key data_availability_year data_availability_week data_availability_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 6 1962-02-07
15 1962 9 1962-03-03
when_received_key reception_year reception_week reception_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 6 1962-02-07
15 1962 9 1962-03-03
where_key region state city
1 1 CT Bridgeport
2 1 CT Hartford
3 1 MA Boston
4 1 MA Cambridge
when_key when_available_key when_received_key where_key pneumonia_and_influenza_deaths all_deaths nrow_agg
1 6 4 1 3 46 1
5 5 5 2 2 54 1
5 8 6 1 2 43 1
5 9 5 4 2 27 1
5 9 7 3 11 270 1
10 11 10 1 4 46 1
10 13 10 4 1 40 1
10 13 12 2 1 47 1
10 14 10 3 12 285 1
15 3 2 2 3 63 1

We can see that the tables associated with the when dimension contain the same instances and the fact table has been updated appropriately.

In this example, the field names have been preserved, but surely we may want all tables to have the same field names, except for the surrogate key. This can be easily achieved by indicating that we want the names of the fields of the role-playing dimension or by indicating new names in the function. The first alternative is shown below, using the rpd_att_names parameter.

db_2 <- db |>
  role_playing_dimension(
    rpd = "when",
    roles = c("when_available", "when_received"),
    rpd_att_names = TRUE
  )

In this case, the schema is displayed as the result since the content of the tables is the same.

The physical tables are kept in the star database but we work with a RDBMS and we want to work with only one physical table for the dimension, the rest of the tables can be easily defined by views.

Role-playing and role dimensions in a constellation

With several star databases we can define a constellation where stars can share dimensions. Common dimensions must have the same name and structure. In the case of having role-playing dimensions and role dimensions, this must be taken into account since the shared dimensions must be transformed into conformed dimensions, but the package takes care of this automatically, as we will see below.

Second star database

We start with a similar dataset, with dates added to the original dataset, as shown below.

Arrival Year Arrival Week Arrival Date Data Availability Year Data Availability Week Data Availability Date Year WEEK Week Ending Date REGION State City Age Range Deaths
1962 5 1962-02-03 1962 6 1962-02-05 1962 5 1962-02-03 1 MA Cambridge 65+ years 23
1962 4 1962-01-27 1962 6 1962-02-07 1962 4 1962-01-27 1 MA Boston 25-44 years 8
1962 6 1962-02-11 1962 7 1962-02-17 1962 5 1962-02-03 1 CT Hartford <1 year 1
1962 5 1962-02-03 1962 6 1962-02-05 1962 5 1962-02-03 1 CT Bridgeport 45-64 years 15
1962 11 1962-03-14 1962 11 1962-03-15 1962 9 1962-03-03 1 CT Hartford 25-44 years 5
1962 9 1962-02-26 1962 9 1962-03-01 1962 8 1962-02-24 1 CT Hartford 25-44 years 1
1962 2 1962-01-13 1962 2 1962-01-13 1962 2 1962-01-13 1 CT Hartford 1-24 years 1
1962 6 1962-02-08 1962 7 1962-02-14 1962 5 1962-02-03 1 MA Boston 65+ years 142
1962 8 1962-02-25 1962 9 1962-03-01 1962 8 1962-02-24 1 MA Cambridge 45-64 years 8
1962 5 1962-01-29 1962 5 1962-02-01 1962 4 1962-01-27 1 CT Hartford 1-24 years 1
1962 4 1962-01-27 1962 4 1962-01-28 1962 4 1962-01-27 1 CT Bridgeport 65+ years 25
1962 8 1962-02-24 1962 9 1962-02-26 1962 8 1962-02-24 1 MA Boston 45-64 years 85
dput(colnames(ft_age_rpd))
#> c("Arrival Year", "Arrival Week", "Arrival Date", "Data Availability Year", 
#> "Data Availability Week", "Data Availability Date", "Year", "WEEK", 
#> "Week Ending Date", "REGION", "State", "City", "Age Range", "Deaths"
#> )

The definition of the star schema is shown below.

s_2 <- star_schema() |>
  define_facts(fact_schema(
    name = "MRS Age",
    measures = c(
      "Deaths"
    )
  )) |>
  define_dimension(when) |>
  define_dimension(when_available) |>
  define_dimension(dimension_schema(
    name = "When Arrived",
    attributes = c(
      "Arrival Year",
      "Arrival Week",
      "Arrival Date"
    )
  )) |>
  define_dimension(dimension_schema(
    name = "Who",
    attributes = c(
      "Age Range"
    )
  )) |>
  define_dimension(where)

We have reused the definitions of the When and When Available dimensions from the previous schema. A new dimension When Arrived has been added with different names in order to show how the definition works.

The definition of the star database is shown below.

db_3 <- star_database(s_2, ft_age_rpd) |>
  role_playing_dimension(
    rpd = "When Arrived",
    roles = c("When Available"),
    att_names = c("Year", "Week", "Week Ending Date")
  ) |>
  snake_case()

In this case, a role-playing dimension and a role dimension have been defined. The When dimension has not been included in this definition also to show how the system works.

The tables of the new star database are shown below.

when_key year week week_ending_date
1 1962 2 1962-01-13
2 1962 4 1962-01-27
3 1962 5 1962-02-03
4 1962 8 1962-02-24
5 1962 9 1962-03-03
when_available_key year week week_ending_date
1 1962 11 1962-03-14
2 1962 11 1962-03-15
3 1962 2 1962-01-13
4 1962 4 1962-01-27
5 1962 4 1962-01-28
6 1962 5 1962-01-29
7 1962 5 1962-02-01
8 1962 5 1962-02-03
9 1962 6 1962-02-05
10 1962 6 1962-02-07
11 1962 6 1962-02-08
12 1962 6 1962-02-11
13 1962 7 1962-02-14
14 1962 7 1962-02-17
15 1962 8 1962-02-24
16 1962 8 1962-02-25
17 1962 9 1962-02-26
18 1962 9 1962-03-01
when_arrived_key year week week_ending_date
1 1962 11 1962-03-14
2 1962 11 1962-03-15
3 1962 2 1962-01-13
4 1962 4 1962-01-27
5 1962 4 1962-01-28
6 1962 5 1962-01-29
7 1962 5 1962-02-01
8 1962 5 1962-02-03
9 1962 6 1962-02-05
10 1962 6 1962-02-07
11 1962 6 1962-02-08
12 1962 6 1962-02-11
13 1962 7 1962-02-14
14 1962 7 1962-02-17
15 1962 8 1962-02-24
16 1962 8 1962-02-25
17 1962 9 1962-02-26
18 1962 9 1962-03-01
who_key age_range
1 1-24 years
2 25-44 years
3 45-64 years
4 65+ years
5 <1 year
where_key region state city
1 1 CT Bridgeport
2 1 CT Hartford
3 1 MA Boston
4 1 MA Cambridge
when_key when_available_key when_arrived_key who_key where_key deaths nrow_agg
1 3 3 1 2 1 1
2 5 4 4 1 25 1
2 7 6 1 2 1 1
2 10 4 2 3 8 1
3 9 8 3 1 15 1
3 9 8 4 4 23 1
3 13 11 4 3 142 1
3 14 12 5 2 1 1
4 17 15 3 3 85 1
4 18 16 3 4 8 1
4 18 17 2 2 1 1
5 2 1 2 2 5 1

It can be seen that the When dimension is independent, it does not have the same instances as the other dimensions.

Constellation

The definition of the constellation is shown below.

ct <- constellation("MRS", db_2, db_3)

Below we can see the graphic representation of the tables that compose it.

The situation is that the When dimension in this second star database is independent of the other dimensions, but it is a common dimension with the first star database: it has to be a conformed dimension.

Below are the instances of the tables to better appreciate the result.

when_key year week week_ending_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 5 1962-02-03
15 1962 6 1962-02-05
16 1962 6 1962-02-07
17 1962 6 1962-02-08
18 1962 6 1962-02-11
19 1962 7 1962-02-14
20 1962 7 1962-02-17
21 1962 8 1962-02-24
22 1962 8 1962-02-25
23 1962 9 1962-02-26
24 1962 9 1962-03-01
25 1962 9 1962-03-03
when_arrived_key year week week_ending_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 5 1962-02-03
15 1962 6 1962-02-05
16 1962 6 1962-02-07
17 1962 6 1962-02-08
18 1962 6 1962-02-11
19 1962 7 1962-02-14
20 1962 7 1962-02-17
21 1962 8 1962-02-24
22 1962 8 1962-02-25
23 1962 9 1962-02-26
24 1962 9 1962-03-01
25 1962 9 1962-03-03
when_available_key year week week_ending_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 5 1962-02-03
15 1962 6 1962-02-05
16 1962 6 1962-02-07
17 1962 6 1962-02-08
18 1962 6 1962-02-11
19 1962 7 1962-02-14
20 1962 7 1962-02-17
21 1962 8 1962-02-24
22 1962 8 1962-02-25
23 1962 9 1962-02-26
24 1962 9 1962-03-01
25 1962 9 1962-03-03
when_received_key year week week_ending_date
1 1962 1 1962-01-06
2 1962 11 1962-03-14
3 1962 11 1962-03-15
4 1962 2 1962-01-11
5 1962 2 1962-01-13
6 1962 3 1962-01-15
7 1962 3 1962-01-16
8 1962 3 1962-01-17
9 1962 3 1962-01-18
10 1962 4 1962-01-27
11 1962 4 1962-01-28
12 1962 5 1962-01-29
13 1962 5 1962-02-01
14 1962 5 1962-02-03
15 1962 6 1962-02-05
16 1962 6 1962-02-07
17 1962 6 1962-02-08
18 1962 6 1962-02-11
19 1962 7 1962-02-14
20 1962 7 1962-02-17
21 1962 8 1962-02-24
22 1962 8 1962-02-25
23 1962 9 1962-02-26
24 1962 9 1962-03-01
25 1962 9 1962-03-03
where_key region state city
1 1 CT Bridgeport
2 1 CT Hartford
3 1 MA Boston
4 1 MA Cambridge
who_key age_range
1 1-24 years
2 25-44 years
3 45-64 years
4 65+ years
5 <1 year
when_key when_available_key when_received_key where_key pneumonia_and_influenza_deaths all_deaths nrow_agg
1 6 4 1 3 46 1
5 5 5 2 2 54 1
5 8 6 1 2 43 1
5 9 5 4 2 27 1
5 9 7 3 11 270 1
10 11 10 1 4 46 1
10 13 10 4 1 40 1
10 13 12 2 1 47 1
10 16 10 3 12 285 1
25 3 2 2 3 63 1
when_key when_available_key when_arrived_key who_key where_key deaths nrow_agg
5 5 5 1 2 1 1
10 11 10 4 1 25 1
10 13 12 1 2 1 1
10 16 10 2 3 8 1
14 15 14 3 1 15 1
14 15 14 4 4 23 1
14 19 17 4 3 142 1
14 20 18 5 2 1 1
21 23 21 3 3 85 1
21 24 22 3 4 8 1
21 24 23 2 2 1 1
25 3 2 2 2 5 1

We can see that all the tables with information related to When dimension have the same instances, including all the instances of both star databases. In this example, the dimension attributes have been renamed.

The definition of the role-playing dimension and the role dimensions have been updated when integrating the two star databases. It can be queried using the following function.

ct  |>
  get_role_playing_dimension_names()
#> $rpd_1
#> [1] "when"           "when_arrived"   "when_available" "when_received"

The role-playing and role dimensions that share instances are displayed.

Conclusions

The rolap package supports the definition of role-playing and role dimensions in star databases. In this way, the result can be exported to a RDBMS where we can define the dimensions as independent tables or as views of a single physical table, as we need or prefer.

The definition of role-playing and role dimensions is propagated from star databases to constellations, allowing to integrate various definitions into a common one.