I have a source file that contains duplicate data. My
requirement is that unique data should go to one file and
duplicate data should go to another file. How can I do this?
Answer / dilip anand k
It's simple!
All you have to do is link your source to a Sort stage.
Sort the data on the key and generate a Key Change column.
Key Change = 1 marks the first record of each key group,
while Key Change = 0 marks the repeated records (the
duplicates).
Add a Filter stage and route the data to two
different outputs based on the generated Key Change column.
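As a rough illustration outside DataStage, the sort, key change, and filter steps can be sketched in Python. The function name `split_by_key_change` and the sample input are assumptions made for this example only; they are not part of any DataStage API.

```python
def split_by_key_change(rows, key=lambda r: r):
    """Sort rows, flag the first row of each key group, and split on the flag."""
    first_rows, repeat_rows = [], []
    previous_key = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):          # Sort stage
        current_key = key(row)
        # Key change is 1 on the first row of a group, 0 on every repeat.
        key_change = 1 if current_key != previous_key else 0
        # Filter stage: route on the generated flag.
        (first_rows if key_change == 1 else repeat_rows).append(row)
        previous_key = current_key
    return first_rows, repeat_rows


# Hypothetical sample input, chosen for illustration.
first_rows, repeat_rows = split_by_key_change(["A", "B", "B", "C", "D", "D", "D"])
print(first_rows)   # ['A', 'B', 'C', 'D']
print(repeat_rows)  # ['B', 'D', 'D']
```

Note that the first output holds one row per key, including keys that have duplicates, and the second holds only the extra copies.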
Answer / farzana kalluri
Input: 1, 2, 2, 1, 3, 4, 3, 5, 5, 6, 7
Output T1: 4, 6, 7
Output T2: 1, 2, 3, 4, 5
For this:
Seq file ----> Aggregator (key = id) ----> Filter ----> 2 targets
In the Aggregator, use the count rows option.
In the Filter, rows with count = 1 go to target 1,
and rows with count = 2 (more generally, count > 1) go to target 2.
Answer / ramachandra rao
After the source, use an Aggregator stage with the aggregation
type set to count, and count the records per key. Then use a Filter
stage with two where clauses: count > 1 sends the duplicate records
to one target, and count = 1 sends the unique records to
another target.
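A minimal Python sketch of the count-and-filter idea, assuming a single key column. The function name `split_keys_by_count` and the sample input are hypothetical, used only for illustration.

```python
from collections import Counter


def split_keys_by_count(keys):
    """Count rows per key, then split the keys on count = 1 versus count > 1."""
    counts = Counter(keys)  # Aggregator: key -> number of rows
    unique_keys = [k for k, n in counts.items() if n == 1]     # where count = 1
    duplicate_keys = [k for k, n in counts.items() if n > 1]   # where count > 1
    return unique_keys, duplicate_keys


# Hypothetical sample input, chosen for illustration.
unique_keys, duplicate_keys = split_keys_by_count(["A", "B", "B", "C", "D", "D", "D"])
print(unique_keys)     # ['A', 'C']
print(duplicate_keys)  # ['B', 'D']
```

Because the aggregation emits one row per key, the duplicate target receives each duplicated key once, not every original copy of it.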
Answer / sonali s
The above solution doesn't give the required output. The requirement is as below:
Input:
A
B
B
C
D
D
D
Output should have 2 files as below.
File 1
A
C
File 2
B
B
D
D
D
Please provide a solution for this.
Answer / purba
Input:
A
B
B
C
D
D
D
Required output:
A
B
C
D
Solution:
Seq file -----> Sort stage (create a key change column on the input key)
O/p:
A 1
B 1
B 0
C 1
D 1
D 0
D 0
Now use a Filter stage with two conditions, key change = 1 and key change = 0.
We get 2 outputs:
Output 1 (key change = 1):
A
B
C
D
Output 2 (key change = 0):
B
D
D
Answer / riyazahamedmohamed
Use a Copy stage to take two links from your input file. One link carries the input rows unchanged. On the other link, use a Sort stage with the key change column set to true, then keep only the rows with key change = 0 (filtered in a Transformer). Send both links to a Lookup stage, with the key change = 0 rows as the reference, and set the lookup failure option so that unmatched rows go to a reject link. You will get the desired output: the reject link captures the unique records and the output file captures the duplicate records.
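The copy-and-lookup idea can be sketched in Python as follows. The function name `split_rows_by_duplicate_lookup` and the sample input are assumptions for illustration, and a Python set stands in for the reference link, so repeated reference keys are collapsed.

```python
def split_rows_by_duplicate_lookup(rows, key=lambda r: r):
    """Look every input row up against the keys that occur more than once."""
    # Reference branch: sort, then keep keys of rows whose key change would be 0.
    duplicate_keys = set()
    previous_key = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):
        current_key = key(row)
        if current_key == previous_key:
            duplicate_keys.add(current_key)
        previous_key = current_key
    # Lookup: matched rows go to the output, unmatched rows go to the reject link.
    matched = [row for row in rows if key(row) in duplicate_keys]
    rejected = [row for row in rows if key(row) not in duplicate_keys]
    return rejected, matched


# Hypothetical sample input, chosen for illustration.
unique_rows, duplicate_rows = split_rows_by_duplicate_lookup(
    ["A", "B", "B", "C", "D", "D", "D"]
)
print(unique_rows)     # ['A', 'C']
print(duplicate_rows)  # ['B', 'B', 'D', 'D', 'D']
```

Unlike a plain key change filter, this keeps every copy of a duplicated key together in the duplicate output.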
Answer / krishna
As per my knowledge:
Initially the source is in a Sequential File stage. Add an Aggregator
stage, select the grouping option, and select the column
you want to group on. Then go to the options and select
column for calculation, and select the column on which you want
to do the operation. Column for calculation offers
many options; select the missing count column name and give
the column name for the output. Then add a Transformer stage, and within
the Transformer add constraints for the two outputs:
if column name = 1 then 1 else 0
if column name >= 2 then 1 else 0
It will work.