I have two files: the first contains only duplicate records, the second contains only unique records. Example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file:-- capture all the duplicates in both file with count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3


Answer / subbuchamala

File1,File2====Funnel-----Copy=======1st link AGG, 2nd link JOIN----Filter----OutputFile
1. pass the 2 files to funnel stage and then copy stage.
2. from copy stage 1st link to AGG stage, 2nd link to JOIN stage
3. In AGG stage, Group by Key column say ID, NAME take the count and JOIN based on KEY column
4. Filter on COUNT>1 send the output OutputFile
we get desired output
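The stages above can be mimicked outside DataStage to check the logic. The following is only a minimal Python sketch, not a DataStage job: the function name duplicates_with_count and the tuple layout (ID, NAME, SALARY) are assumptions made for illustration, and the final sort is added only so the rows come out in the same order as the sample output.

```python
from collections import Counter

# Sample data from the question, as (ID, NAME, SALARY) tuples.
file1 = [
    (1, "subhash", 10000), (1, "subhash", 10000),
    (2, "raju", 20000), (2, "raju", 20000),
    (3, "chandra", 30000), (3, "chandra", 30000),
]
file2 = [
    (1, "subhash", 10000), (5, "pawan", 15000),
    (7, "reddy", 25000), (3, "chandra", 30000),
]


def duplicates_with_count(*inputs):
    """Hypothetical helper that mirrors Funnel -> Aggregator -> Join -> Filter."""
    # Funnel: concatenate all inputs into one stream.
    funnelled = [row for rows in inputs for row in rows]
    # Aggregator: count rows per key (ID and NAME).
    counts = Counter((row[0], row[1]) for row in funnelled)
    # Join: attach the count to every row that has the same key.
    joined = [row + (counts[(row[0], row[1])],) for row in funnelled]
    # Filter: keep only rows whose key occurs more than once.
    kept = [row for row in joined if row[-1] > 1]
    # Sort by ID only to match the ordering of the sample output.
    return sorted(kept, key=lambda row: row[0])


result = duplicates_with_count(file1, file2)
for row in result:
    print(*row)
```

With the sample files this yields eight rows: three for subhash, two for raju and three for chandra, each carrying its count in the last column.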


Answer / ankit gosain

Hi,

This problem can be solved by creating a job with the following stages:

          File2                   File2
            |                       |
            |                       |
            |                       |
File1-----Funnel----Aggregator----Join----Filter---Tgt_File
                                    |
                                    |
                                    |
                                  File1

1. Funnel both files (you now have the unique and the duplicate records in one stream).
2. Aggregate on the key column and set the aggregation type to Count Rows (with an output column named, say, row_count).
3. Join the aggregated output back to the input records from File1 and File2 on the key, with join type = Inner Join.
4. In the Filter stage, set the where clause to row_count > 1.
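To see which step actually removes the unique records, here is a small Python sketch of steps 2 to 4, assuming the key is the ID column alone and that the rows are (ID, NAME, SALARY) tuples. It is an illustration of the logic, not DataStage code, and the variable names are made up for the example.

```python
from collections import Counter

# Step 1 result: File1 rows followed by File2 rows, as (ID, NAME, SALARY).
funnelled = [
    (1, "subhash", 10000), (1, "subhash", 10000),
    (2, "raju", 20000), (2, "raju", 20000),
    (3, "chandra", 30000), (3, "chandra", 30000),
    (1, "subhash", 10000), (5, "pawan", 15000),
    (7, "reddy", 25000), (3, "chandra", 30000),
]

# Step 2 (Count Rows): one entry per key with its row_count.
row_count = Counter(row[0] for row in funnelled)

# Step 3 (inner join on ID): every input row finds its key in the
# aggregate, so the join drops nothing; it only adds row_count.
joined = [row + (row_count[row[0]],) for row in funnelled if row[0] in row_count]

# Step 4 (where row_count > 1): this is where the unique rows go away.
target = [row for row in joined if row[3] > 1]
dropped = [row for row in joined if row[3] <= 1]

print(len(joined), len(target), dropped)
```

The join keeps all ten rows; only the filter discards the pawan and reddy records. The target rows come out in input order, so a sort on the key would still be needed to match the ordering shown in the question.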

If you have further doubts or questions, contact me at
ankitgosian@gmail.com

Cheers,
Ankit :)
