Monday, January 25, 2016

Skewness in Teradata


Definition

Skewness is the statistical term, which refers to the row distribution on AMPs. 
If the data is highly skewed, it means some AMPs are having more rows and some very less i.e. data is not properly/evenly distributed. 
This affects the performance/Teradata's parallelism. 
The data distribution or skewness can be controlled by choosing indexes.

The data distribution of table among AMPs is called Skew Factor.....
Generally For Non-Unique PI we get duplicate values so the more duplicate vales we get more the data have same rowhash, so all the same data will come to same amp, it makes data distribution inequality, One amp will store more data and other amp stores less
amount of data, when we are accessing full table, The amp which is having more data will take longer time and makes other amps waiting which leads processing wastage.
A skew factor of 0 indicates that the data is perfectly distributed across all the AMPS.
In this situation (unequal distribution of data)we get Skew Factor High........


Suggestions

For this type of tables we should avoid full table scans Skew factor is distribution of rows of a table among the available no.of AMP's.
If your table has a chance of using unique primary index,it is always better to use UPI which ensures the skew factor around 0%.
If there is no chance of having unique values column in a table choose a column as PI(primary index) which has less duplicate values which inturn results in less skew factor.
That is the data will be distributed almost(not exactly equal percentage) equally to all AMP's.....

ex:
AMP0 AMP1
10000(10%) 9000000(90%)

in this situation skew factor is very high 90%.


Query to find out Skewness in tables in the system 

SELECT  TSIZE.DatabaseName, TSIZE.TableName,
TDEF.CreateTimeStamp AS Created,
TDEF.LastAlterTimeStamp AS LastAltered ,
TDEF.AccessCount,
TDEF.LastAccessTimeStamp AS LastAccess,
SUM(TSIZE.CurrentPerm) AS SCurrentPerm,
SUM(TSIZE.CurrentPerm)/1024/1024 AS SCurrentPerm_MB,
SUM(TSIZE.PeakPerm) AS SPeakPerm,
SUM(TSIZE.PeakPerm)/1024/1024 AS SPeakPerm_MB,
(100 - (AVG(TSIZE.CurrentPerm)/MAX(TSIZE.CurrentPerm)*100)) AS SkewFactor
FROM DBC.TableSize TSIZE JOIN DBC.Tables TDEF
ON TSIZE.DatabaseName = TDEF.DatabaseName
AND TSIZE.TableName = TDEF.TableName
GROUP BY 1,2,3,4,5,6
--ORDER BY SCurrentPerm_GB DESC, skewFactor
HAVING SCurrentPerm_MB > 100 AND skewFactor > 50
ORDER BY SCurrentPerm_MB DESC, skewFactor;

No comments:

Post a Comment