what is a junk dimension ?
Answer Posted / sanju
A junk dimension is a convenient grouping of typically
low-cardinality flags and indicators. By creating an
abstract dimension, these flags and indicators are removed
from the fact table while placing them into a useful
dimensional framework.[2] A Junk Dimension is a dimension
table consisting of attributes that do not belong in the
fact table or in any of the existing dimension tables. The
nature of these attributes is usually text or various flags,
e.g. non-generic comments or just simple yes/no or
true/false indicators. These kinds of attributes are
typically remaining when all the obvious dimensions in the
business process have been identified and thus the designer
is faced with the challenge of where to put these attributes
that do not belong in the other dimensions.
One solution is to create a new dimension for each of the
remaining attributes, but due to their nature, it could be
necessary to create a vast number of new dimensions
resulting in a fact table with a very large number of
foreign keys. The designer could also decide to leave the
remaining attributes in the fact table but this could make
the row length of the table unnecessarily large if, for
example, the attributes is a long text string.
The solution to this challenge is to identify all the
attributes and then put them into one or several Junk
Dimensions. One Junk Dimension can hold several true/false
or yes/no indicators that have no correlation with each
other, so it would be convenient to convert the indicators
into a more describing attribute. An example would be an
indicator about whether a package had arrived, instead of
indicating this as “yes” or “no”, it would be converted into
“arrived” or “pending” in the junk dimension. The designer
can choose to build the dimension table so it ends up
holding all the indicators occurring with every other
indicator so that all combinations are covered. This sets up
a fixed size for the table itself which would be 2^x rows,
where x is the number of indicators. This solution is
appropriate in situations where the designer would expect to
encounter a lot of different combinations and where the
possible combinations are limited to an acceptable level. In
a situation where the number of indicators are large, thus
creating a very big table or where the designer only expect
to encounter a few of the possible combinations, it would be
more appropriate to build each row in the junk dimension as
new combinations are encountered. To limit the size of the
tables, multiple junk dimensions might be appropriate in
other situations depending on the correlation between
various indicators.
Junk dimensions are also appropriate for placing attributes
like non-generic comments from the fact table. Such
attributes might consist of data from an optional comment
field when a customer places an order and as a result will
probably be blank in many cases. Therefore the junk
dimension should contain a single row representing the
blanks as a surrogate key that will be used in the fact
table for every row returned with a blank comment field[3]
| Is This Answer Correct ? | 0 Yes | 1 No |
Post New Answer View All Answers
Suppose we have a source qualifier transformation that populates two target tables. How do you ensure tgt2 is loaded after tgt1?
How can we remove the duplicates from flat file source?
Explain lookup transformation source types in informatica
Differentiate between sessions and batches?
explain any diffcult scenario that u have faced in your experience... or explain any complex maping u have developed?
expain about the tune parameters?
How do you load only null records into target? Explain through mapping flow.
What is the command used to run a batch?
How do you use reject loader.
What is the different lookup cache(s)?
Enlist few areas or real-time situations where informatica is required.
Sequence generator, when you move from develoment to production how will you reset
Write the prerequisite tasks to achieve the session partition?
What are the data movement modes in informatica?
How can you differentiate between powercenter and power map?