From the original attributes included in the dimensions, we can obtain new attributes that facilitate queries or offer new query possibilities.
Enrich the who dimension
Suppose that we are interested in defining some broader age ranges than the existing ones. This operation can be done by enriching the corresponding dimension.
First, we export the attributes to consider in table form, in this case only the age range.
tb_who <-
  enrich_dimension_export(st_mrs_age,
                          name = "who",
                          attributes = c("age_range"))
Next, we can see the result of the export operation. It is a table with the selected attributes where duplicate values have been eliminated if there are any (in this case there are no repeated values).
| age_range |
|---|
| 1: <1 year |
| 2: 1-24 years |
| 3: 25-44 years |
| 4: 45-64 years |
| 5: 65+ years |
We add the columns we want to the table, in this case a single new column that defines the broader age range.
v <-
  c("0-24 years", "0-24 years", "25+ years", "25+ years", "25+ years")
tb_who <-
  tibble::add_column(tb_who,
                     wide_age_range = v)
The new table can be seen below.
| age_range | wide_age_range |
|---|---|
| 1: <1 year | 0-24 years |
| 2: 1-24 years | 0-24 years |
| 3: 25-44 years | 25+ years |
| 4: 45-64 years | 25+ years |
| 5: 65+ years | 25+ years |
We enrich the dimension considering the new data in the table.
st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "who", tb_who)
We can see the result below, where the dimension has the new defined attribute.
| who_key | age_range | wide_age_range |
|---|---|---|
| 1 | 1: <1 year | 0-24 years |
| 2 | 2: 1-24 years | 0-24 years |
| 3 | 3: 25-44 years | 25+ years |
| 4 | 4: 45-64 years | 25+ years |
| 5 | 5: 65+ years | 25+ years |
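The new column above is supplied as a vector whose order must match the rows of the exported table. As a minimal sketch of a less order-dependent alternative, the mapping can be written as a named lookup vector in base R. The data frame `tb_who_example` and the vector `wide_lookup` are hypothetical names introduced only for this illustration; they stand in for the exported table and are not part of the package.

```r
# Illustrative stand-in for the table exported from the "who" dimension.
tb_who_example <- data.frame(
  age_range = c("1: <1 year", "2: 1-24 years", "3: 25-44 years",
                "4: 45-64 years", "5: 65+ years"),
  stringsAsFactors = FALSE
)

# Named lookup: each existing value maps to its broader range.
wide_lookup <- c(
  "1: <1 year"     = "0-24 years",
  "2: 1-24 years"  = "0-24 years",
  "3: 25-44 years" = "25+ years",
  "4: 45-64 years" = "25+ years",
  "5: 65+ years"   = "25+ years"
)

# Index the lookup by value, so the result does not depend on row order.
tb_who_example$wide_age_range <-
  unname(wide_lookup[tb_who_example$age_range])

# Fail early if any exported value has no mapping.
stopifnot(!anyNA(tb_who_example$wide_age_range))
```

Because the lookup is keyed by value, a reordered export would still receive the correct broader range, and an unexpected value is reported instead of being silently mislabeled.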
Enrich the where dimension
For the where dimension we can proceed as we did for the who dimension: export the data, complete it manually, and import it again, as shown below.
tb_where <-
  enrich_dimension_export(st_mrs_age,
                          name = "where",
                          attributes = c("division"))
The exported table contains only the division codes. We look up the names of the divisions and add the regions to which they belong.
tb_where <-
  tibble::add_column(
    tb_where,
    division_name = c(
      "New England",
      "Middle Atlantic",
      "East North Central",
      "West North Central",
      "South Atlantic",
      "East South Central",
      "West South Central",
      "Mountain",
      "Pacific"
    ),
    region = c('1', '1', '2', '2', '3', '3', '3', '4', '4'),
    region_name = c(
      "Northeast",
      "Northeast",
      "Midwest",
      "Midwest",
      "South",
      "South",
      "South",
      "West",
      "West"
    )
  )
st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "where", tb_where)
st_mrs_cause <-
  st_mrs_cause %>%
  enrich_dimension_import(name = "where", tb_where)
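The positional vectors above assume that the exported rows are ordered by division code 1 to 9. As a sketch under that stated assumption about the codes, the same completion can be done with a reference table and `match()` in base R, which keeps the result correct even if the exported rows arrive in a different order. The names `tb_where_example` and `division_ref` are hypothetical, and storing the codes as character strings is also an assumption.

```r
# Hypothetical exported table: division codes, deliberately out of order.
tb_where_example <- data.frame(
  division = c("3", "1", "9"),
  stringsAsFactors = FALSE
)

# Reference table for the nine divisions, using the values given above.
division_ref <- data.frame(
  division = as.character(1:9),
  division_name = c("New England", "Middle Atlantic", "East North Central",
                    "West North Central", "South Atlantic",
                    "East South Central", "West South Central",
                    "Mountain", "Pacific"),
  region = c("1", "1", "2", "2", "3", "3", "3", "4", "4"),
  region_name = c("Northeast", "Northeast", "Midwest", "Midwest",
                  "South", "South", "South", "West", "West"),
  stringsAsFactors = FALSE
)

# Locate each exported code in the reference table.
idx <- match(tb_where_example$division, division_ref$division)
stopifnot(!anyNA(idx))  # every code must be known

# Append the descriptive columns in the order of the exported rows.
tb_where_example <- cbind(
  tb_where_example,
  division_ref[idx, c("division_name", "region", "region_name")]
)
rownames(tb_where_example) <- NULL
```

The completed table could then be passed to the import step in the same way as the manually completed one.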
To add the name of each state and the county to which each city belongs, we could proceed in the same way. However, it is easier to locate existing data and use it directly. These data are available in the ft_usa_states and ft_usa_city_county data sets, respectively.
However, if we import these data sets directly, an error occurs because not every row in the dimension has a match in the imported table. The following function returns the dimension rows that have no match.
tb_missing <-
  st_mrs_age %>%
  enrich_dimension_import_test(name = "where", ft_usa_states)
The result obtained is shown below.
| where_key | division | state | city | division_name | region | region_name |
|---|---|---|---|---|---|---|
| 48 | 3 | Unknown | Unknown | East North Central | 2 | Midwest |
| 78 | 6 | Unknown | Unknown | East South Central | 3 | South |
| 91 | 7 | Unknown | Unknown | West South Central | 3 | South |
| 111 | 9 | Unknown | Unknown | Pacific | 4 | West |
In all cases, the problem occurs for the value “Unknown” in the state attribute. We must add a row to the data before importing it.
tb_where_state <- ft_usa_states %>%
  tibble::add_row(state = "Unknown", state_name = "Unknown")
st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "where", tb_where_state)
st_mrs_cause <-
  st_mrs_cause %>%
  enrich_dimension_import(name = "where", tb_where_state)
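The check and the fix above can be illustrated with a small base R sketch: find the dimension rows whose state has no counterpart in the reference table, add a placeholder row, and check again. This only illustrates the idea and is not how the package implements it; `dim_where`, `states_ref`, and `states_fixed` are hypothetical stand-ins with made-up contents, not the real dimension or data set.

```r
# Illustrative dimension rows (not the real data).
dim_where <- data.frame(
  state = c("CT", "MA", "Unknown"),
  city  = c("Hartford", "Boston", "Unknown"),
  stringsAsFactors = FALSE
)

# Illustrative reference table with no "Unknown" entry.
states_ref <- data.frame(
  state      = c("CT", "MA"),
  state_name = c("Connecticut", "Massachusetts"),
  stringsAsFactors = FALSE
)

# Dimension rows whose state is absent from the reference table.
missing_rows <-
  dim_where[!dim_where$state %in% states_ref$state, , drop = FALSE]

# Add a placeholder row so that every dimension row can find a match.
states_fixed <- rbind(
  states_ref,
  data.frame(state = "Unknown", state_name = "Unknown",
             stringsAsFactors = FALSE)
)

# Repeat the check against the completed reference table.
still_missing <-
  dim_where[!dim_where$state %in% states_fixed$state, , drop = FALSE]
```

In this sketch `missing_rows` holds the single "Unknown" row before the fix, and `still_missing` is empty afterwards.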
The same problem occurs when adding the county data, and we apply the same solution.
tb_where_county <- ft_usa_city_county %>%
  tibble::add_row(city = "Unknown",
                  state = "Unknown",
                  county = "Unknown")
st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "where", tb_where_county)
st_mrs_cause <-
  st_mrs_cause %>%
  enrich_dimension_import(name = "where", tb_where_county)
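For the county data the match involves two columns, city and state, rather than one. As a hedged sketch, assuming the match is made on the columns the two tables share, a left join with base R `merge()` shows how unmatched rows surface as missing values. The tables `dim_where` and `county_ref` are hypothetical stand-ins with a few illustrative rows.

```r
# Illustrative dimension rows (not the real data).
dim_where <- data.frame(
  state = c("CT", "MA", "Unknown"),
  city  = c("Hartford", "Boston", "Unknown"),
  stringsAsFactors = FALSE
)

# Illustrative reference table with no "Unknown" entry.
county_ref <- data.frame(
  city   = c("Hartford", "Boston"),
  state  = c("CT", "MA"),
  county = c("Hartford", "Suffolk"),
  stringsAsFactors = FALSE
)

# Columns present in both tables act as the composite join key.
shared <- intersect(names(dim_where), names(county_ref))

# Left join: keep every dimension row, with NA where no match exists.
enriched <- merge(dim_where, county_ref, by = shared,
                  all.x = TRUE, sort = FALSE)

# Rows that found no county are the ones that need a placeholder row.
unmatched <- enriched[is.na(enriched$county), shared, drop = FALSE]
```

Matching on both columns matters because a city name alone may not be unique across states.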
We can see the first rows of the final result below.
| where_key | division | state | city | division_name | region | region_name | state_name | county |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | CT | Bridgeport | New England | 1 | Northeast | Connecticut | Fairfield |
| 2 | 1 | CT | Hartford | New England | 1 | Northeast | Connecticut | Hartford |
| 3 | 1 | CT | New Haven | New England | 1 | Northeast | Connecticut | New Haven |
| 4 | 1 | CT | Waterbury | New England | 1 | Northeast | Connecticut | New Haven |
| 5 | 1 | MA | Boston | New England | 1 | Northeast | Massachusetts | Suffolk |
| 6 | 1 | MA | Cambridge | New England | 1 | Northeast | Massachusetts | Middlesex |
| 7 | 1 | MA | Fall River | New England | 1 | Northeast | Massachusetts | Bristol |
| 8 | 1 | MA | Lowell | New England | 1 | Northeast | Massachusetts | Middlesex |
| 9 | 1 | MA | Lynn | New England | 1 | Northeast | Massachusetts | Essex |
| 10 | 1 | MA | New Bedford | New England | 1 | Northeast | Massachusetts | Bristol |