Hi guys,
Please design a job for this.
My input is:
COMPANY,LOCATION
IBM,CHENNAI
IBM,HYDRABAD
IBM,PUNE
IBM,BANGLOORE
TCS,CHENNAI
TCS,MUMBAI
TCS,BANGLOORE
WIPRO,HYDRABAD
WIPRO,CHENNAI
HSBC,PUNE
My required output is:
COMPANY,LOCATION,COUNT
IBM,chennai,hydrabad,pune,bangloore,4
TCS,chennai,mumbai,bangloore,3
WIPRO,hydrabad,chennai,2
HSBC,pune,1
Thanks
Answer Posted / ankit gosain
Hi All,
Create a job design like below:
SeqFile--->SortStage--->Transformer--->RemoveDup--->SeqFile
Steps:
-----
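1. In the Sort stage, set sort key = Company and sort key mode
= Don't Sort (Previously Grouped), and set
Create Cluster Key Change Column = True. This adds a
clusterKeyChange column that is 1 on the first row of each
Company group and 0 on the other rows.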
2. In the Transformer stage, create two stage variables:
temp of integer type with 0 as the initial value,
temp1 of varchar type.
Now write their derivations (derivation ---- stage variable):
if clusterKeyChange=1 then 1 else temp+1 ---- temp
if clusterKeyChange=1 then Location else temp1:',':Location ---- temp1
Create one extra output column (say Count).
Now set the output column derivations (derivation ---- output column):
Company ---- Company
temp1 ---- Location
temp ---- Count
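3. In the Remove Duplicates stage, set key = Company and
Duplicate To Retain = Last, so that only the final row of each
Company group (the one with the full location list and count) is kept.
Now just drag and drop the input columns to the output
and you will get the desired result.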
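If you want to sanity-check the logic outside DataStage, here is a rough Python sketch of the same three steps. It is only an illustration, not DataStage code. The function names (add_key_change, transformer, retain_last) are made up for this example, and it assumes the rows arrive already grouped by Company, as in the sample input.

```python
def add_key_change(rows):
    """Mimic the Sort stage flag: 1 on the first row of each group, else 0."""
    out = []
    prev = None
    for company, location in rows:
        out.append((company, location, 1 if company != prev else 0))
        prev = company
    return out


def transformer(rows):
    """Mimic the stage variables: temp (running count), temp1 (running list)."""
    out = []
    temp = 0
    temp1 = ""
    for company, location, key_change in rows:
        temp = 1 if key_change == 1 else temp + 1
        temp1 = location if key_change == 1 else temp1 + "," + location
        out.append((company, temp1, temp))
    return out


def retain_last(rows):
    """Mimic Remove Duplicates with key = Company, Duplicate To Retain = Last."""
    last = {}
    for row in rows:
        last[row[0]] = row  # a later row overwrites the earlier one for the same key
    return list(last.values())


rows = [
    ("IBM", "CHENNAI"), ("IBM", "HYDRABAD"), ("IBM", "PUNE"), ("IBM", "BANGLOORE"),
    ("TCS", "CHENNAI"), ("TCS", "MUMBAI"), ("TCS", "BANGLOORE"),
    ("WIPRO", "HYDRABAD"), ("WIPRO", "CHENNAI"),
    ("HSBC", "PUNE"),
]

result = retain_last(transformer(add_key_change(rows)))
for company, locations, count in result:
    print(f"{company},{locations},{count}")
```

The sketch keeps the locations in their input case. The lower-case locations shown in the required output would need an extra conversion, which the steps above do not cover.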
For further queries, mail me at ankitgosain@gmail.com
Cheers,
Ankit :)