How do we eliminate duplicate records in a flat file without using Sorter and Aggregator?
Answer / kiran
We can use a dynamic cache in a Lookup transformation to eliminate duplicates.
| Is This Answer Correct ? | 11 Yes | 0 No |
Answer / joe
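Option 1: For flat files, use a Unix command to remove the duplicates.
Option 2: Use a checksum function in an Expression transformation to
generate a hexadecimal code for each record, then compare it with the
code of the next record. Identical records get the same code, but this
only catches duplicates that sit on adjacent rows.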
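A rough sketch of Option 2 outside Informatica, in Python. It assumes MD5 as the checksum and assumes the file is already ordered so that duplicate records are next to each other; the function name drop_adjacent_duplicates is made up for illustration.

```python
import hashlib


def drop_adjacent_duplicates(lines):
    """Yield each record whose checksum differs from the previous record's."""
    prev_digest = None
    for line in lines:
        record = line.rstrip("\n")
        # Hexadecimal checksum of the whole record (MD5 is an assumption here).
        digest = hashlib.md5(record.encode("utf-8")).hexdigest()
        if digest != prev_digest:
            yield record
        prev_digest = digest


records = ["1,a", "1,a", "2,b", "2,b", "2,b", "3,c"]
unique = list(drop_adjacent_duplicates(records))
print(unique)  # ['1,a', '2,b', '3,c']
```

If the duplicates are not adjacent they survive, so the input has to be ordered first.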
| Is This Answer Correct ? | 5 Yes | 2 No |
Answer / ankur saini
Solution: Sequence Generator -> Rank -> Filter
Add a Sequence Generator.
Example input:
1 a
1 b
2 a
2 b
After the Sequence Generator:
1 a 1
1 b 2
2 a 3
2 b 4
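Then rank the rows: group by the file ports that identify a duplicate (only the first column in this example; use all file ports to drop fully identical rows) and rank on the Sequence Generator key.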
input seq rank
1 a 1 1
1 b 2 2
2 a 3 1
2 b 4 2
Add a Filter with the condition rank = 1.
enjoy!!!!!
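The same three steps can be sketched in Python to see what each one contributes. This is only an illustration, not Informatica code: dedupe_by_rank and key_of are hypothetical names, and the rank is assumed to be ascending on the sequence number so that the first row of each group gets rank 1.

```python
from collections import defaultdict


def dedupe_by_rank(rows, key_of):
    """Keep the first row of each group, mimicking seq gen -> rank -> filter."""
    # Sequence Generator step: attach a running number to every row.
    numbered = [(row, seq) for seq, row in enumerate(rows, start=1)]

    # Rank step: rows arrive in ascending sequence order, so counting
    # rows per group gives the rank within that group.
    rows_seen = defaultdict(int)
    ranked = []
    for row, seq in numbered:
        group = key_of(row)
        rows_seen[group] += 1
        ranked.append((row, seq, rows_seen[group]))

    # Filter step: keep only rank = 1.
    return [row for row, seq, rank in ranked if rank == 1]


rows = [("1", "a"), ("1", "b"), ("2", "a"), ("2", "b")]
first_per_key = dedupe_by_rank(rows, key_of=lambda r: r[0])
print(first_per_key)  # [('1', 'a'), ('2', 'a')]
```

Passing key_of=lambda r: r groups on the whole row instead, which drops only rows that are identical in every column.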
| Is This Answer Correct ? | 2 Yes | 0 No |
Answer / harish konda
Write a SQL query in the Source Qualifier transformation to sort the
source data.
Then connect it to an Expression transformation and add one more port
(say, flag) that generates a number: when the previous row and the
current row have the same values, increment the number; otherwise set it to 1.
Next, connect to a Filter transformation and give the condition
flag = 1.
Then route the data to the target.
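To make the flag logic concrete, here is a small Python sketch of the sort, flag and filter steps. The name flag_rows and the sample data are invented for the example; it is not how the mapping itself is written.

```python
def flag_rows(sorted_rows):
    """Yield (row, flag): flag restarts at 1 whenever the row value changes."""
    prev = object()  # sentinel that never equals a real row
    flag = 0
    for row in sorted_rows:
        # Same as previous row: increment. Otherwise: start again at 1.
        flag = flag + 1 if row == prev else 1
        prev = row
        yield row, flag


source = ["b", "a", "b", "c", "a", "a"]
flagged = list(flag_rows(sorted(source)))      # sort step, then flag step
target = [row for row, flag in flagged if flag == 1]  # filter on flag = 1
print(target)  # ['a', 'b', 'c']
```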
| Is This Answer Correct ? | 2 Yes | 1 No |
Answer / isha
Select all source rows.
The Dynamic Lookup transformation builds the caches from the target table.
When the lookup evaluates a row from the source that does not exist in the lookup cache, it inserts the row into the cache and assigns the NewLookupRow output port the value of 1. When the lookup evaluates a row from the source that exists in the lookup cache, it does not insert the row into cache and assigns the NewLookupRow output port the value of 0.
The filter in this mapping checks if the row is a duplicate or not by evaluating the NewLookupRow output port from the Lookup. If the value of the port is 0, the row is filtered out, as it is a duplicate row. If the value of the port is not equal to 0, then the row is passed out to the target table.
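The NewLookupRow behaviour described above can be imitated with a dictionary in Python. This is a simplified sketch: it models only the values 0 and 1 mentioned here, starts from an empty cache (that is, an empty target), and the names dynamic_lookup and key_of are made up for illustration.

```python
def dynamic_lookup(rows, key_of, cache=None):
    """Yield (row, new_lookup_row): 1 if the row was inserted into the cache, else 0."""
    cache = {} if cache is None else cache
    for row in rows:
        key = key_of(row)
        if key in cache:
            new_lookup_row = 0      # already cached: duplicate
        else:
            cache[key] = row        # not cached yet: insert it
            new_lookup_row = 1
        yield row, new_lookup_row


source = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
# Filter step: pass only rows whose NewLookupRow is not 0.
target = [row for row, new in dynamic_lookup(source, key_of=lambda r: r) if new != 0]
print(target)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Unlike the previous-row comparisons, this does not need sorted input, because the cache remembers every row seen so far.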
| Is This Answer Correct ? | 1 Yes | 0 No |
Answer / priyank
There are several ways of achieving this. One is through an
Expression transformation; another is a lookup on the
target.
Expression transformation:
Create the ports in this order. Variable ports are evaluated top to
bottom, so Var_CURR_KEY has to pick up the previous row's key before
Var_PREV_KEY is overwritten with the current one. The input must
already be ordered by Key so that duplicates are adjacent.
Var_CURR_KEY=Var_PREV_KEY
Var_PREV_KEY=Key
Var_CHK_DUPLICATE --> IIF(Var_CURR_KEY=Key,'DUP','NODUP')
OUT_DUPLICATE --> Var_CHK_DUPLICATE
Note: I have taken a scenario where the target table
contains only 1 key. In case of multiple keys, we have to
create a few more variable ports for both CURR and PREV, and
in the Var_CHK_DUPLICATE port we need to add those checks
with an 'AND' operator. E.g. for 2 keys,
Var_CURR_KEY1=Var_PREV_KEY1
Var_PREV_KEY1=Key1
Var_CURR_KEY2=Var_PREV_KEY2
Var_PREV_KEY2=Key2
Var_CHK_DUPLICATE --> IIF(Var_CURR_KEY1=Key1 AND
Var_CURR_KEY2=Key2,'DUP','NODUP')
OUT_DUPLICATE --> Var_CHK_DUPLICATE
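For the multiple-key case, the comparison can be pictured in Python as below. It is a sketch under two assumptions: the rows are already sorted on the key columns, and the previous key is updated only after the comparison. mark_duplicates and key_indexes are hypothetical names.

```python
def mark_duplicates(rows, key_indexes):
    """Yield (row, status): 'DUP' if all key columns match the previous row."""
    prev_key = None
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        # Compare first, then remember the current key for the next row.
        status = "DUP" if key == prev_key else "NODUP"
        prev_key = key
        yield row, status


rows = [
    ("1", "x", "first"),
    ("1", "x", "second"),
    ("1", "y", "third"),
    ("2", "y", "fourth"),
]
marked = list(mark_duplicates(rows, key_indexes=(0, 1)))
statuses = [status for _, status in marked]
print(statuses)  # ['NODUP', 'DUP', 'NODUP', 'NODUP']
```

Comparing the keys as one tuple plays the role of the 'AND' between the per-key checks.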
If Informatica is installed on Unix, then in
the pre-session command you can give a Unix command to
remove the duplicates from the file, like
sort <file_name> | uniq > <file_name>.new
Hope it helps.
| Is This Answer Correct ? | 4 Yes | 12 No |