What is Skew Factor?
Answer / sat!sh
Skew factor describes how unevenly the rows of a table are
distributed among the AMPs.
With a non-unique primary index (NUPI) we generally get
duplicate values. Rows with the same primary index value
have the same row hash, so they all go to the same AMP. The
more duplicate values there are, the more unequal the data
distribution becomes.
One AMP stores more data and the other AMPs store less.
When we access the full table, the AMP holding the most
data takes the longest and keeps the other AMPs waiting,
which wastes processing capacity.
In this situation (unequal distribution of data) the skew
factor is high.
For this type of table we should avoid full table scans.
Example:
AMP0              AMP1
1,000,000 (10%)   9,000,000 (90%)
Here AMP1 holds 90% of the rows, so the skew is very high.
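The answer does not give a formula. As a rough sketch, one commonly quoted definition is (1 - average rows per AMP / maximum rows per AMP) * 100; treating that as the definition is an assumption here, and the function name and row counts below are hypothetical. Under this formula a 10%/90% split across two AMPs works out to roughly 44%, which is different from the 90% share held by the busiest AMP.

```python
def skew_factor(rows_per_amp):
    """Skew factor in percent, assuming (1 - average / maximum) * 100."""
    if not rows_per_amp or max(rows_per_amp) == 0:
        return 0.0  # an empty table has nothing to skew
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / max(rows_per_amp)) * 100


# Hypothetical two-AMP system: an even split and a 10% / 90% split.
even = skew_factor([5_000_000, 5_000_000])
skewed = skew_factor([1_000_000, 9_000_000])

print(f"even split:   {even:.1f}%")
print(f"10/90 split:  {skewed:.1f}%")
```

Because the formula uses only the ratio of average to maximum, multiplying every row count by the same factor leaves the result unchanged.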
Answer / sricharan
Skew factor is a number. It tells you how the data is
distributed among the processors (AMPs). It ranges from 0
to 100.
If the data is distributed evenly among the AMPs, the skew
factor is zero, whether the table is small or large.
Skew factor does not depend on the size of the table, only
on the distribution of its data.
If the skew factor is high, we should access the table
through its primary index columns rather than scanning the
whole table.
Answer / yuvaevergreen
Skew factor indicates how evenly the data is spread across
the AMPs.
A skew factor of 0 means the data is perfectly distributed
across all the AMPs.
Answer / navaneeth reddy
Skew factor measures how the rows of a table are
distributed among the available AMPs.
If your table can use a unique primary index (UPI), it is
always better to do so, because a UPI keeps the skew factor
around 0%.
If the table has no column with unique values, choose as
the primary index (PI) a column with few duplicate values,
which in turn results in a lower skew factor.
The data will then be distributed almost equally (though
not in exactly equal percentages) across all AMPs.
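The effect of duplicate PI values can be illustrated with a small simulation. This is only a sketch under stated assumptions: md5 modulo the AMP count stands in for Teradata's actual row-hash and hash-map mechanism, the system has a hypothetical 4 AMPs, the skew formula (1 - average / maximum) * 100 is assumed rather than taken from the answers, and all names are made up for illustration.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def amp_for(pi_value):
    # Stand-in for the row hash: equal PI values always map to the same AMP.
    digest = hashlib.md5(str(pi_value).encode()).hexdigest()
    return int(digest, 16) % NUM_AMPS


def rows_per_amp(pi_values):
    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


def skew_factor(counts):
    # Assumed definition: (1 - average / maximum) * 100.
    average = sum(counts) / len(counts)
    return (1 - average / max(counts)) * 100


# 10,000 rows with a unique PI value each, versus 10,000 rows
# where 90% of the rows share a single PI value.
upi_counts = rows_per_amp(range(10_000))
nupi_counts = rows_per_amp(["IN"] * 9_000 + list(range(1_000)))

upi_skew = skew_factor(upi_counts)
nupi_skew = skew_factor(nupi_counts)

print("UPI  rows per AMP:", upi_counts, f"skew {upi_skew:.1f}%")
print("NUPI rows per AMP:", nupi_counts, f"skew {nupi_skew:.1f}%")
```

In the second case every row carrying the shared value lands on one AMP, so that AMP holds at least 9,000 of the 10,000 rows regardless of how well the remaining values spread out.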
Answer / yuvaevergreen
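The query below shows how many rows each AMP holds.
Replace pi_column with the primary index column(s) and
tablename with the table to check.
SELECT HASHAMP(HASHBUCKET(HASHROW(pi_column))) AS amp_no
,COUNT(*) AS row_count
FROM tablename GROUP BY 1 ORDER BY 2 DESC;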