
I have two files: the first contains only duplicate records, the second contains unique records. Example:
File1:
1 subhash 10000
1 subhash 10000
2 raju 20000
2 raju 20000
3 chandra 30000
3 chandra 30000
File2:
1 subhash 10000
5 pawan 15000
7 reddy 25000
3 chandra 30000
Output file:-- capture all the duplicates in both file with count.
1 subhash 10000 3
1 subhash 10000 3
1 subhash 10000 3
2 raju 20000 2
2 raju 20000 2
3 chandra 30000 3
3 chandra 30000 3
3 chandra 30000 3
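The requirement can be pinned down with a short Python sketch. It counts each record across both files, keeps every occurrence of a record seen more than once, and appends that count. Treating the whole row (id, name, salary) as the record identity is an assumption, and the function name `duplicates_with_count` is hypothetical.

```python
from collections import Counter

# Sample data from the question, as (id, name, salary) tuples.
FILE1 = [
    ("1", "subhash", "10000"), ("1", "subhash", "10000"),
    ("2", "raju", "20000"), ("2", "raju", "20000"),
    ("3", "chandra", "30000"), ("3", "chandra", "30000"),
]
FILE2 = [
    ("1", "subhash", "10000"), ("5", "pawan", "15000"),
    ("7", "reddy", "25000"), ("3", "chandra", "30000"),
]


def duplicates_with_count(*files):
    """Return every occurrence of a record that appears more than once
    across all files, with its total count appended (assumption: the
    whole row is the record identity)."""
    rows = [row for f in files for row in f]
    counts = Counter(rows)
    dups = [row + (counts[row],) for row in rows if counts[row] > 1]
    # Stable sort on numeric id so that identical records sit together.
    return sorted(dups, key=lambda r: int(r[0]))


for rec in duplicates_with_count(FILE1, FILE2):
    print(*rec)
```

Running it prints the eight rows of the expected output above; pawan and reddy (count 1) are dropped.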

Question Posted / ankit gosain

Asked at: TCS

Answer Posted / ankit gosain

Hi,

This problem can be solved by creating a job with the following stages:

          File2                  File2
            |                      |
            |                      |
            |                      |
File1-----Funnel----Aggregator----Join----Filter---Tgt_File
                                   |
                                   |
                                   |
                                 File1

1. Funnel both files, so that a single stream now holds both the unique and
the duplicate records.
2. In the Aggregator, group on the key input column(s) and set the
calculation type to Count Rows (say, output column row_count).
3. Join the aggregated output with the input files File1 and File2 on the
key, with join type = Inner Join.
4. In the Filter stage, set the Where clause to row_count > 1.
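The four stages can be mimicked in plain Python to check the logic on the sample data. This is a minimal sketch, not DataStage code: the functions `funnel`, `aggregator`, `join`, `filter_stage` and `run_job` are hypothetical stand-ins for the stages, the whole record is assumed to be the join key, and the sort before the join stands in for the key-sorted input the parallel Join stage works on. Feeding File1 and File2 into the join is treated here as joining against the funnelled rows.

```python
from itertools import chain

# Sample data from the question: (id, name, salary).
FILE1 = [
    ("1", "subhash", "10000"), ("1", "subhash", "10000"),
    ("2", "raju", "20000"), ("2", "raju", "20000"),
    ("3", "chandra", "30000"), ("3", "chandra", "30000"),
]
FILE2 = [
    ("1", "subhash", "10000"), ("5", "pawan", "15000"),
    ("7", "reddy", "25000"), ("3", "chandra", "30000"),
]


def funnel(*inputs):
    """Funnel: append all input streams into one stream."""
    return list(chain.from_iterable(inputs))


def aggregator(rows):
    """Aggregator: group on the key (assumed: the whole record), Count Rows."""
    row_count = {}
    for row in rows:
        row_count[row] = row_count.get(row, 0) + 1
    return row_count


def join(rows, row_count):
    """Inner join: append row_count to every input row whose key is found."""
    return [row + (row_count[row],) for row in rows if row in row_count]


def filter_stage(rows):
    """Filter: keep rows where row_count > 1 (row_count is the last column)."""
    return [row for row in rows if row[-1] > 1]


def run_job(file1, file2):
    row_count = aggregator(funnel(file1, file2))
    # Sort the join input on id, mimicking key-sorted input to the Join stage;
    # this also groups identical records together in the output.
    join_input = sorted(funnel(file1, file2), key=lambda r: int(r[0]))
    return filter_stage(join(join_input, row_count))


for rec in run_job(FILE1, FILE2):
    print(*rec)
```

Because every row finds its own key in the aggregated output, the inner join keeps all rows; it is the Filter stage that removes the records with row_count = 1.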

If you have any further doubts or queries, contact me at
ankitgosian@gmail.com

Cheers,
Ankit :)

