1. The fundamental concept of the Orchestrate
framework is the Data Set. Data Sets are the inputs and
outputs of Orchestrate operators.
2. As a concept, a Data Set is like a database table,
insofar as it is a collection of identically-defined
rows. It is the only structure on which Orchestrate
operators operate. Each operator (i.e., stage) accepts
input from one Data Set and sends its output to another
Data Set.
3. A Data Set exists on all the processing nodes
defined for the job that is currently processing it. The
subset of rows in a Data Set that is located on a single
processing node is referred to as a "partition" of the Data
Set. Technically, a partition is a subset of the rows in a
Data Set (or File Set) earmarked for processing on the same
processing node.
4. A control file is associated with each Data Set.
The control file contains the record schema that defines
the row structure (effectively its column definitions).
5. Within a Data Set, data are stored in internal, or
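The points above (identically-defined rows, one partition per processing node, a control file holding the schema) can be sketched in Python. This is a minimal illustration under assumed conventions, not Orchestrate's real on-disk format: JSON stands in for the internal storage format, and the names `validate`, `hash_partition`, `write_data_set`, `read_data_set` and the `.node<N>.data` / `.ds` file naming are all hypothetical.

```python
import json
import os
import tempfile
import zlib
from typing import Dict, List

# Hypothetical schema: column name -> Python type. Every row must conform.
SCHEMA: Dict[str, type] = {"name": str, "company": str}

def validate(rows: List[dict], schema: Dict[str, type]) -> None:
    """Reject any row that is not defined identically to the schema."""
    for row in rows:
        if set(row) != set(schema) or not all(
            isinstance(row[c], t) for c, t in schema.items()
        ):
            raise ValueError(f"row does not match schema: {row}")

def hash_partition(rows: List[dict], key: str, num_nodes: int) -> Dict[int, List[dict]]:
    """Split rows into one partition per processing node. Rows with the same
    key value always land on the same node (crc32 is stable across runs)."""
    parts: Dict[int, List[dict]] = {n: [] for n in range(num_nodes)}
    for row in rows:
        node = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        parts[node].append(row)
    return parts

def write_data_set(directory: str, name: str, rows: List[dict],
                   schema: Dict[str, type], key: str, num_nodes: int) -> str:
    """Write one data file per partition plus a control file that records
    the schema and lists the data files. Returns the control file path."""
    validate(rows, schema)
    data_files = []
    for node, part in hash_partition(rows, key, num_nodes).items():
        path = os.path.join(directory, f"{name}.node{node}.data")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(part, f)
        data_files.append(path)
    control = {"schema": {c: t.__name__ for c, t in schema.items()},
               "data_files": data_files}
    control_path = os.path.join(directory, f"{name}.ds")
    with open(control_path, "w", encoding="utf-8") as f:
        json.dump(control, f)
    return control_path

def read_data_set(control_path: str) -> List[dict]:
    """Read the control file first, then every partition it lists."""
    with open(control_path, encoding="utf-8") as f:
        control = json.load(f)
    rows: List[dict] = []
    for path in control["data_files"]:
        with open(path, encoding="utf-8") as f:
            rows.extend(json.load(f))
    return rows

rows = [{"name": "krish", "company": "IBM"},
        {"name": "pooja", "company": "TCS"},
        {"name": "nandini", "company": "WIPRO"}]
with tempfile.TemporaryDirectory() as d:
    ctl = write_data_set(d, "customers", rows, SCHEMA, key="name", num_nodes=2)
    back = read_data_set(ctl)
    with open(ctl, encoding="utf-8") as f:
        control_schema = json.load(f)["schema"]
```

Reading always goes through the control file, which is why the schema and the list of partitions travel together; the partition a row lands in depends only on its key, not on the order rows arrive.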
1. The File Set stage allows you to read data from or write data to a
file set.
2. The stage can have a single input link or a single
output link, and a single reject link.
3. It executes only in parallel mode.
4. The data files and the file that lists them are
called a file set. This capability is useful because some
operating systems impose a 2 GB limit on the size of a file,
and you need to distribute files among nodes to prevent
any single file from exceeding that limit.
5. The only advantage of using a File Set over a Sequential
File is that it preserves the partitioning scheme.
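The "data files plus a file that lists them" idea from items 4 and 5 can be sketched as follows. This is a hypothetical illustration, not the real File Set layout: it caps each data file at `max_bytes` (20 bytes in the demo instead of 2 GB), writes a plain-text listing file, and does not model which node each data file lives on. All function and file names are made up.

```python
import os
import tempfile
from typing import List

def chunk_lines(lines: List[str], max_bytes: int) -> List[List[str]]:
    """Group lines so no chunk exceeds max_bytes (a single oversized line
    still gets a chunk of its own)."""
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for line in lines:
        n = len((line + "\n").encode("utf-8"))
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks

def write_file_set(directory: str, name: str, lines: List[str], max_bytes: int) -> str:
    """Write each chunk to its own data file, then a listing file naming them."""
    data_files = []
    for i, chunk in enumerate(chunk_lines(lines, max_bytes)):
        path = os.path.join(directory, f"{name}.part{i}")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(chunk) + "\n")
        data_files.append(path)
    listing = os.path.join(directory, f"{name}.fs")
    with open(listing, "w", encoding="utf-8") as f:
        f.write("\n".join(data_files) + "\n")
    return listing

def read_file_set(listing: str) -> List[str]:
    """Follow the listing file and concatenate every data file it names."""
    with open(listing, encoding="utf-8") as f:
        paths = [p for p in f.read().splitlines() if p]
    lines: List[str] = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            lines.extend(f.read().splitlines())
    return lines

lines = ["krish,IBM", "pooja,TCS", "nandini,WIPRO"]
with tempfile.TemporaryDirectory() as d:
    listing = write_file_set(d, "customers", lines, max_bytes=20)
    back = read_file_set(listing)
```

With a 20-byte cap the first two 10-byte lines share one data file and the third starts a new one, yet reading through the listing file returns the rows in their original order.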
A Data Set is a file/stage whose data can be read
directly by DataStage, whereas a File Set's data must first be
converted into a DataStage-readable format.
In simple words, data can be read faster from a Data Set
than from a File Set.
1) A Data Set is stored in DataStage's native format, so its data can be viewed only internally (within DataStage), whereas a File Set is stored in a binary format that can be converted into human-readable form, so its data can be viewed anywhere.
2) A Data Set does not support a reject link, whereas a File Set does.
3) A Data Set uses the copy operator; a File Set uses the import and export operators.
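Points 2 and 3 are related: a File Set's data has to be exported to and imported from an external representation, and a record that fails to convert is a natural candidate for a reject link, whereas a Data Set is copied as-is. The sketch below is a hypothetical Python illustration of that import/export step using a made-up `SCHEMA`, `export_rows` and `import_lines`; it is not Orchestrate's import/export operator.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: column name -> type used when importing text.
SCHEMA: Dict[str, type] = {"name": str, "company": str, "count": int}

def export_rows(rows: List[dict], schema: Dict[str, type], sep: str = ",") -> List[str]:
    """Export: turn typed records into delimited text lines."""
    cols = list(schema)
    return [sep.join(str(row[c]) for c in cols) for row in rows]

def import_lines(lines: List[str], schema: Dict[str, type],
                 sep: str = ",") -> Tuple[List[dict], List[str]]:
    """Import: parse text lines back into typed records. Lines that do not
    fit the schema go to a reject list instead of stopping the load."""
    cols = list(schema)
    good: List[dict] = []
    rejects: List[str] = []
    for line in lines:
        fields = line.split(sep)
        if len(fields) != len(cols):
            rejects.append(line)
            continue
        try:
            good.append({c: schema[c](v) for c, v in zip(cols, fields)})
        except ValueError:
            # e.g. int("two") fails, so the whole line is rejected
            rejects.append(line)
    return good, rejects

text = export_rows([{"name": "krish", "company": "IBM", "count": 1}], SCHEMA)
good, rejects = import_lines(text + ["pooja,TCS,two"], SCHEMA)
```

Every row pays for string formatting on the way out and parsing on the way in, which is one way to see why reading a Data Set is described as faster.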
I have loaded a Data Set in UAT using a 2-node configuration, then imported the job into the PROD environment, which uses a 4-node configuration, and I use this Data Set as the source for another job. Will the job run fine or give errors? If it runs fine, on how many nodes: 2 or 4?
Hi everyone,
I have a couple of scenarios; please suggest solutions:
1) Every day we receive several huge files, all with the same
metadata, and we have to load them into a target table. How can we
do this?
2) One column has 10 records at run time, and we have to send the
5th and 6th records to the target at run time. How can we do that?
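For the first scenario, one option commonly used in DataStage is the Sequential File stage's "File Pattern" read method, so that a single stage reads every matching file. The same idea is sketched below in Python, assuming comma-separated files with a header row and hypothetical names matching `daily_*.csv`; `load_matching_files` is a made-up helper, not a DataStage API.

```python
import csv
import glob
import os
import tempfile
from typing import Dict, List

def load_matching_files(pattern: str) -> List[Dict[str, str]]:
    """Read every CSV file matching pattern, check that each has the same
    header (metadata), and return all rows combined for one target load."""
    combined: List[Dict[str, str]] = []
    header = None
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path} has different metadata: {file_header}")
            combined.extend(dict(zip(header, row)) for row in reader)
    return combined

with tempfile.TemporaryDirectory() as d:
    for fname, body in [("daily_1.csv", "name,company\nkrish,IBM\npooja,TCS\n"),
                        ("daily_2.csv", "name,company\nnandini,WIPRO\n")]:
        with open(os.path.join(d, fname), "w", encoding="utf-8", newline="") as f:
            f.write(body)
    loaded = load_matching_files(os.path.join(d, "daily_*.csv"))
```

Checking the header of each file guards the assumption that "all files metadata is same" before the rows reach the target table.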
Please help me with the above scenarios. If anyone has a
Job Sequence example, kindly send it along with its scenario to
my mail ID (email@example.com).
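For the second scenario, a common DataStage approach is a Transformer constraint on the input row number, for example `@INROWNUM = 5 Or @INROWNUM = 6`. As far as I know, that counter is kept per partition, so the Transformer would need to run in sequential mode for "5th and 6th" to mean positions in the whole input. The selection logic itself is sketched in Python below; `select_positions` and the `rec1`..`rec10` values are hypothetical.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def select_positions(rows: Iterable[T], first: int, last: int) -> List[T]:
    """Return the rows at 1-based positions first..last (inclusive)."""
    return list(islice(rows, first - 1, last))

records = [f"rec{i}" for i in range(1, 11)]  # 10 hypothetical records
print(select_positions(records, 5, 6))  # ['rec5', 'rec6']
```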
The source file has columns like:
If the first row is repeated, I want the result to look like this:
name     company  count
krish    IBM      1
pooja    TCS      1
nandini  WIPRO    1
krish    IBM      2
pooja    TCS      2
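The desired output is a running occurrence count per (name, company) pair, in the original row order. Because the source column list is missing, the sketch below assumes the source has just `name` and `company` columns and infers the input rows from the desired output; `running_count` is a hypothetical helper, not a DataStage function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def running_count(rows: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Append how many times each (name, company) pair has appeared so far,
    keeping the original row order."""
    seen: Dict[Tuple[str, str], int] = defaultdict(int)
    out = []
    for name, company in rows:
        seen[(name, company)] += 1
        out.append((name, company, seen[(name, company)]))
    return out

# Assumed source rows, inferred from the desired output above.
source = [("krish", "IBM"), ("pooja", "TCS"), ("nandini", "WIPRO"),
          ("krish", "IBM"), ("pooja", "TCS")]
for name, company, count in running_count(source):
    print(name, company, count)
```

Keeping a per-key counter avoids sorting, so the output stays in input order as in the requested result; a sort-based approach would group the repeated rows together instead.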