How do you remove duplicates in a Transformer stage running in parallel mode?
Answers were sorted based on users' feedback
Answer / kiran
Partition the data by the key column, sort it, and enable the Unique option. This will automatically drop the duplicate records.
| Is This Answer Correct ? | 20 Yes | 3 No |
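For illustration, here is roughly the same logic in a minimal Python sketch: hash-partition on the key so that all rows sharing a key land in the same partition, sort each partition, and keep only the first row of each key group. The column names (cust_id, city) are made up for the example.

from collections import defaultdict

def remove_duplicates_parallel(rows, key, num_partitions=4):
    # Hash-partition on the key: rows with equal keys always land together,
    # which is what Hash partitioning guarantees in the parallel job.
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    unique = []
    for part in partitions.values():
        part.sort(key=lambda r: r[key])   # per-partition sort
        prev = object()                   # sentinel: matches no real key
        for row in part:
            if row[key] != prev:          # first row of a new key group
                unique.append(row)
            prev = row[key]
    return unique

rows = [{"cust_id": 1, "city": "NY"},
        {"cust_id": 2, "city": "LA"},
        {"cust_id": 1, "city": "NY"}]
print(remove_duplicates_parallel(rows, "cust_id"))  # cust_id 1 is kept once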
Answer / praveen sarva
Step 1) Transformer stage properties --> Advanced --> Execution mode --> Parallel.
Step 2) Transformer stage properties --> Input --> Partitioning --> Partition type --> Hash --> enable Sort --> enable Unique.
It is that simple: you will get only the non-duplicate records.
| Is This Answer Correct ? | 11 Yes | 0 No |
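A quick illustration (toy Python with made-up data) of why step 2 specifies Hash partitioning: with round-robin partitioning, rows sharing a key can land in different partitions, and per-partition duplicate removal then misses them.

rows = [{"id": 1}, {"id": 1}, {"id": 2}]

# Round-robin deals rows out in turn: partition 0 gets rows 0 and 2,
# partition 1 gets row 1, so the two id=1 rows are now separated.
round_robin = {0: rows[0::2], 1: rows[1::2]}

deduped = []
for part in round_robin.values():
    # Duplicate removal within each partition only.
    deduped.extend({r["id"]: r for r in part}.values())

print(deduped)  # id 1 still appears twice: cross-partition duplicates survive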
Answer / kiran
I am not sure who marked my answer as wrong. Could you please be responsible enough to state why it is wrong?
| Is This Answer Correct ? | 1 Yes | 0 No |
Answer / satya
Run your job in sequential mode, sort the source data, and then use stage variables in the Transformer, because in parallel mode the data is partitioned (so duplicates of a key can land in different partitions).
| Is This Answer Correct ? | 1 Yes | 1 No |
Answer / prasad
Take two stage variables in the Transformer stage, defined in this order (stage variables are evaluated top to bottom, so the comparison must run before sV1 is refreshed with the current row's value):
sV2 = If Column_Name = sV1 Then 0 Else 1
sV1 = Column_Name
Put sV2 = 1 in the output constraint to get only the unique records; if you want the duplicates instead, use sV2 = 0. The input must be sorted on Column_Name for this to work.
| Is This Answer Correct ? | 0 Yes | 1 No |
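In Python terms, the stage-variable trick above behaves like the sketch below (an illustration only; it assumes the input is already hash-partitioned and sorted on the key, and the id values are made up):

def dedup_with_stage_variables(sorted_rows, key):
    sV1 = object()  # previous row's key; a sentinel before the first row
    for row in sorted_rows:
        # sV2 is computed BEFORE sV1 is refreshed, mirroring the required
        # top-to-bottom evaluation order of the stage variables.
        sV2 = 0 if row[key] == sV1 else 1
        sV1 = row[key]
        if sV2 == 1:  # the output constraint sV2 = 1
            yield row

rows = sorted([{"id": 10}, {"id": 10}, {"id": 20}], key=lambda r: r["id"])
print(list(dedup_with_stage_variables(rows, "id")))  # ids 10 and 20, once each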
Answer / santhosh
Go to Transformer stage properties -> Input, define any partitioning method there, and enable the Perform Sort checkbox.
Also specify the particular column that needs to be sorted;
this gives you the sorted column in the output view.
| Is This Answer Correct ? | 1 Yes | 6 No |
What is the difference between data warehousing and OLAP?
How and where have you used a hash file?
What are the types of containers, and how do you create them?
A sequential file has two columns; I want to load only one column into the target. How do I do it?
How do you exclude the first and last lines while reading data from a sequential file (of some 1000 records)? I guess it can probably be done with the Unix filter option, but I am not sure which command to use.
I have two tables called MASTER and DETAIL. I want to insert records into both tables, with one condition: only when the insert into the MASTER table succeeds should the records be inserted into the DETAIL table; otherwise, abort the job. How can you design this job?
I have source data from the UK and North America; how can I pass the data to two tables based on the location?
Why is the fact table in normal form?
I have a few records, all with the same structure; I want to store the data in multiple targets. How?
What are the differences between the hash and modulus partitioning methods?
The source contains the metadata hyderabad, chennai, bangalore. Using a nested-loop sequence, when hyderabad is selected as the source, run only the hyderabad target and not the others. Please give an answer (and show how to write the logic using the nested-loop sequence); thanks in advance.
A flat file contains 200 records. I want to load the first 50 records the first time the job runs, the second 50 records the second time, and so on. How can you develop this job?