One file contains:
col1
100
200
300
400
500
100
300
600
300
From this I want to retrieve only the duplicate values, like this:
tr1
100
100
300
300
300
How is this possible in DataStage? Can anyone please explain clearly?
Answers were Sorted based on User's Feedback
In order to collect the duplicate values:
First, calculate a count output column in an Aggregator stage:
group by the column,
aggregation type: Count Rows,
and name the count output column.
Next, use a Filter stage to separate the keys with multiple occurrences.
Finally, use a Join stage (or Lookup stage) with join type INNER to map the two tables back together.
Then you can get the desired output.
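The Aggregator -> Filter -> Join design above can be sketched in plain Python. This is illustrative only: the sample data comes from the question, and the mapping of each DataStage stage to a Python step is my own assumption about the equivalent logic.

```python
from collections import Counter

# Source file (the col1 values from the question)
rows = [100, 200, 300, 400, 500, 100, 300, 600, 300]

# Aggregator stage: group by the column, count rows per key
counts = Counter(rows)

# Filter stage: keep only the keys that occur more than once
multi = {k for k, c in counts.items() if c > 1}

# Join stage (inner join): map the multi-occurrence keys back to the source rows
duplicates = sorted(r for r in rows if r in multi)
print(duplicates)  # [100, 100, 300, 300, 300]
```

The inner join is what turns the list of duplicate keys (100, 300) back into the full set of duplicate rows.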
Answer / chandu
Use an Aggregator and calculate the count of the source column. After that, use a Filter or Transformer stage with the condition count > 1; it gives only the duplicate records.
Thanks
Chandu-9538627859
Answer / prasad
          Agg--->Filter1------>|
           |                   |
file-->cp--+------------------>Join---->Filter2---->Target1
                                            |
                                            +------>Target2
Agg: use an Aggregator, select Aggregation type = Count Rows, and give the count output column the name Count (user defined).
Count output:
100 -- 2
200 -- 1
300 -- 3
400 -- 1
500 -- 1
600 -- 1
The Agg stage generates these counts. Then:
Filter1: give the condition Count = 1 (you get the unique keys from Filter1).
Join stage: take a Left Outer Join.
Filter2:
where column_name = '' (null) -- you get the duplicate records.
Target1 o/p:
100
100
300
300
300
where column_name <> '' -- you get the unique records.
Target2 o/p:
200
400
500
600
Please correct me if I am wrong :)
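Prasad's design works the other way around: Filter1 keeps the Count = 1 keys, and the left outer join leaves the duplicate rows unmatched (null on the right side). A hedged Python sketch of that logic, using the question's data (the stage-to-step mapping is my assumption):

```python
from collections import Counter

rows = [100, 200, 300, 400, 500, 100, 300, 600, 300]
counts = Counter(rows)

# Filter1: keys that occur exactly once (Count = 1)
unique_keys = {k for k, c in counts.items() if c == 1}

# Join (left outer) + Filter2: a source row with no match on the
# unique-key side is a duplicate; a matched row is unique.
target1 = sorted(r for r in rows if r not in unique_keys)  # duplicates
target2 = sorted(r for r in rows if r in unique_keys)      # unique rows
print(target1)  # [100, 100, 300, 300, 300]
print(target2)  # [200, 400, 500, 600]
```

The null test in Filter2 (column_name = '') corresponds to the "no match found" case of the left outer join.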
Answer / sudheer
Job design:
            |--> Aggregator -->|
seq. file --> Copy ----------> Join --> Filter --> seq. file (output)
In the Aggregator: Count Rows.
In the Join: left outer join, with the seq. file data as the left input.
In the Filter: where condition cnt > 1.
Job Design:
      |----->Agg--->Filter1-->|
      |                       |
file-->cp-------------------->Join---->Filter2---->target
Agg: use an Aggregator, select Aggregation type = Count Rows, and give the count output column the name Count (user defined).
Filter1: give the condition Count <> 1.
Join: select a left outer join.
Filter2: give the condition Count <> 0.
You will get the right output: all of the duplicate records.
If you want the unique records instead, give the condition Count = 0.
Job Design:
          Agg--->Filter1----->|
           |                  |
file-->cp--+----------------->Join---->Filter2--(duplicate)-->Target1
                                           |
                                           +----(unique)----->Target2
Agg: use an Aggregator, select Aggregation type = Count Rows, and give the count output column the name Cnt (user defined).
Filter1: give the condition Where Cnt = 1.
You will get the unique keys: 200, 400, 500, 600.
Join (or Lookup) stage: select a left outer join.
Filter2:
Where column_name = '' -- duplicate values: 100, 100, 300, 300, 300.
Where column_name <> '' -- unique values: 200, 400, 500, 600.
You will get the right output, whatever the duplicate records are.
Please correct me if I am wrong.
Answer / pooja
Follow these steps:
1. Sequential File stage: read the input data from the sequential file input1.txt.
2. Aggregator stage: count the number of rows (say CountRow) for each ID (group = ID).
3. Filter stage: filter the data where CountRow <> 1.
4. Perform a join between the output of step 3 and input1.txt.
You will get the result :)
Answer / me
seq ----> copy
From the Copy stage, one link goes to an Aggregator (apply the Count Rows option) ---> Filter (on the count-rows output > 1), sent as the reference link to the Lookup below.
From the Copy stage, the second link goes to the Lookup.
Then apply the filter.
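This Lookup variant can also be sketched in Python: the count > 1 keys become the reference link, and each source row that finds a match in the reference is a duplicate. (As above, the data is from the question and the mapping is an assumption; note a Lookup preserves the source row order, unlike the sorted join outputs shown earlier.)

```python
from collections import Counter

rows = [100, 200, 300, 400, 500, 100, 300, 600, 300]

# Aggregator + Filter: the reference link holds only keys with count > 1
reference = {k: c for k, c in Counter(rows).items() if c > 1}

# Lookup stage: probe each source row against the reference link;
# a successful lookup means the row is a duplicate
duplicates = [r for r in rows if r in reference]
print(duplicates)  # [100, 300, 100, 300, 300]
```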