Sunday, May 27, 2012

DTM Buffer

The DTM buffer is a temporary storage area used to cache the data that moves through the transformations in a session. The buffer is divided into blocks, and both the buffer size and the block size are tunable per session.
We can find the number of blocks using the formula:
(Number of Blocks) = 0.9 * ((DTM Buffer Size) / (Buffer Block Size))
A slow reader, writer, or transformation thread can be an indication that the DTM buffer needs tuning.
                 
    Tuning the DTM buffer:               

  • Buffer block size:
      • At least 100 rows per block is recommended.
      • The block size should be calculated from the maximum (largest) row size moving through the session.
      • The number of blocks can then be calculated using the formula above.
  • The DTM buffer size should be increased in multiples of the buffer block size.
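For example (the figures are illustrative only, not Informatica defaults): if the largest row is about 640 bytes, then with the recommended 100 rows per block the buffer block size should be at least 100 * 640 = 64,000 bytes (roughly 64 KB). With a DTM buffer size of 12,000,000 bytes, the session gets approximately 0.9 * (12,000,000 / 64,000) ≈ 168 blocks.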

DTM buffer size configuration in an Informatica session:
 

Configure DTM buffer block size:


To make use of the increased DTM buffer, we should also increase the commit interval to a larger value so that the number of writes to the target database is reduced.

Friday, May 25, 2012

Build an Informatica mapping: A step by step guide

Building a sample mapping in Informatica is not a very difficult task. All you need is basic knowledge of the Informatica client tools and of what a mapping should consist of.


Prerequisites to build an informatica mapping:
  • An informatica server installed and set up on a server machine.
  • An informatica repository created.
  • Informatica client tools, i.e. PowerCenter Mapping Designer, PowerCenter Workflow Manager, and PowerCenter Workflow Monitor, installed on the client machine.
The first and foremost task in developing a mapping in Informatica PowerCenter is to open the Designer tool. You can find it with the following icon under Start Menu > All Programs:
After starting the Designer, you need to add the domain where the Informatica repository is located, so that all development work is saved to that repository.
To add a domain in informatica client:
  1. In informatica designer, click on Repository option in the menu and select Configure Domain.

     
  2. Click on the small box shown in red to add a new domain.


     
  3. Enter the host and port details of the domain gateway node of the Informatica server. Because Informatica supports load balancing, there can be more than one server node, but a client connects to only one node, called the gateway node.




  4. After this, a repository is added to the client and all work can now be saved in it. You will see the repository in the left pane of the Designer client.
  5. Right-click or double-click the repository and enter the repository username/password set during repository setup.

     
  6. A mapping in Informatica is an ETL process: it extracts data from a source, transforms it by applying some processing, and loads it into a target. So a basic mapping needs at least one source, one target, and a transformation process in between.
  7. For a source, Informatica uses a source definition, i.e. the structure of the source. The columns in a source or target definition are called ports.
To create or import a source definition in informatica:

    1. In the Designer, click Tools > Source Analyzer to open the Source Analyzer.



     2. Double-click the Sample folder to view its contents.

    3. Every folder contains nodes for sources, targets, schemas, mappings, mapplets, cubes,  dimensions and reusable transformations.

    4. Click Sources > Import from Database in the Designer menu.

    5. Select the ODBC data source used to access the database containing the source tables. (To import, you must first define an ODBC data source on the system, from Control Panel > Administrative Tools > Data Sources.)

              To define an ODBC data source:

                     · Go to Control Panel > Administrative Tools > Data Sources

                     · Click on System DSN

                     · Click Add

                     · Select the driver used to connect to the database (usually an Oracle wire protocol driver, if the database is Oracle)

                     · Give the necessary credentials such as the host/port/username/password of the database server

                     · Click OK to create the ODBC data source


    6. Enter the user name and password to connect to this database. Also enter the name of the source table owner, if necessary. (In Oracle, the owner name is the same as the user name.)

    7. Click Connect.

    8. In the Select Tables list, expand the TABLES heading.

    9. Select the tables from the list and click OK. You will now have a source definition created in the Source Analyzer of the Informatica Designer client.


    To import or create a target definition:

    The next step is to create a target definition. A target definition describes the structure of the target table.

   
Steps that need to be followed to create a target definition:

    1. In the Designer, click Tools > Target Designer or click the icon shown in the picture below to open the Target Designer.

    2. In the menu, click Targets > Create. Give the target a name, typically the target table name.

 

    Create Target Definition

    Note: If you need to change the database type for the target definition, you can select the correct database type when you edit the target definition.

    3. You will see a box-like structure in the Target Designer.

 

    4. Right-click the definition and click Edit.

 

    5. In the Columns tab, click the icon shown in the figure to add columns to the target.

 

    6. Alternatively, you can simply import the target definition as you did for the source definition.

    7. Click Targets > Import from Database. Select the ODBC data source from which the target definition can be imported, click Connect, and select the target table from the list under the Tables heading.

    8. After creating the source and target definitions, we need to create the mapping, which is the transformation part of ETL (Extract, Transform, Load).

    Creating a mapping:

    To create a simple pass-through mapping, we need to carry out the following steps:

    1. Click Tools > Mapping Designer to open the Mapping Designer.

    2. In the Navigator, expand the Sources node in the Sample folder, and then expand the node for the ODBC source that contains the source definitions.

    3. Drag the TEST source definition into the Mapping Designer workspace.

    4. The Designer creates a new mapping and prompts you to provide a name. Give the mapping a meaningful name and click OK.

    5. The source definition appears in the workspace. The Designer creates a Source Qualifier transformation and connects it to the source definition.

    6. Expand the Targets node in the Navigator to open the list of all target definitions.

    7. Drag the ABC target definition into the workspace.

    Link the source and target definition to complete a mapping:




    1. From the Source Qualifier transformation, drag each column and drop it on the corresponding column in the target.

    (in this image source is taken as EMPLOYEES, Target as T_EMPLOYEES).

    2. Now press Ctrl+S to save the mapping.

    To create a session for the mapping:

    A session is a set of instructions that tells the Integration Service how to move data from sources to targets.

    1. In the Workflow Manager Navigator, double-click the sample folder to open it.

    2. Click Tools > Task Developer to open the Task Developer.

    3. Click Tasks > Create.

    4. The Create Task dialog box appears.

 

    5. Enter the name of the session (generally the same as the mapping name, though this is not required). Click Create.

    6. You will be prompted to select the mapping that the session will run. Select the mapping you developed and click OK. You will see the session icon in the workspace.


 
    7. Right click on the session and click Edit.



    8. In the Mapping tab, select the source in the left pane and set the DB Connection for the source.

         You can create a DB Connection in the Workflow Manager for connecting to the source and target databases. To create a DB Connection in Workflow Manager:

                    · From the menu click Connections > Relational

                    · Select Oracle if the connection is to be made to an Oracle database

                    · Give the details such as username, password, and TNS entry (this TNS entry must exist on the Informatica server machine)

                    · Click OK.

    9. Similarly, select the target and set the DB Connection for the target.

    10. Now click OK.

    Create a workflow for the session:

    1. Click Tools > Workflow Designer.

    2. In the Navigator, expand the sample folder, and then expand the Sessions node.

    3. Drag the session created to the Workflow Designer workspace.

    The Create Workflow dialog box appears.

 

    4. Enter a name for the workflow. A consistent naming convention should be followed for mappings, sessions, and workflows.

    5. Click the Browse Integration Services button to choose an Integration Service to run the workflow.

    The Integration Service Browser dialog box appears.


 
    6. Select the appropriate Integration Service, and click OK.

   7. By default, a Start task appears in the workspace along with the session you dragged in. This task is required; the Integration Service starts workflow execution from it.

    8. To connect the start task with the session task you created, Click Tasks > Link Tasks.

    9. Drag from the Start task to the Session task.

    10. Click Repository > Save to save the workflow in the repository.
 
Your workflow is now ready to run. To run it, right-click the workflow in the Navigator pane and click Start Workflow. You can monitor the progress of the run in the Workflow Monitor.


Monday, May 21, 2012

Concatenation of Oracle DAC parameters on the fly

In the Oracle Data Warehouse Administration Console (DAC), a text parameter used for dynamic data loading cannot be longer than 2000 characters. When a parameter such as $$SET_OF_BOOKS_ID_LIST in Oracle BI Apps needs to exceed that limit, the parameter concatenation feature of DAC can be used to produce a longer value. A DAC parameter can be referenced inside another parameter using the @DAC_<parameter_name> syntax. Suppose there are two parameters, SET_OF_BOOKS_ID_LIST1 and SET_OF_BOOKS_ID_LIST2; we can concatenate them and assign the result to a third parameter, $$SET_OF_BOOKS_ID_LIST. As the value of $$SET_OF_BOOKS_ID_LIST, put

@DAC_SET_OF_BOOKS_ID_LIST1,@DAC_SET_OF_BOOKS_ID_LIST2

and you are done.

Thursday, May 10, 2012

Creating histograms for table columns that contain skewed data

Using DBMS_STATS we can gather histogram statistics for table columns that have skewed data.

Syntax:

DBMS_STATS.GATHER_TABLE_STATS(ownname=>'owner', tabname=>'tablename',
estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE, degree=>DBMS_STATS.AUTO_DEGREE,
method_opt=>'FOR COLUMNS <<column_name>> SIZE SKEWONLY');

Here <<column_name>> should be replaced with the name of the skewed column (or the expression on that column that is used in the SQL query).
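As a concrete illustration (the schema SALES_OWNER, table ORDERS, and column ORDER_STATUS below are hypothetical placeholders), the call could look like this:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SALES_OWNER',                            -- hypothetical schema
    tabname          => 'ORDERS',                                 -- hypothetical table
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    degree           => DBMS_STATS.AUTO_DEGREE,
    method_opt       => 'FOR COLUMNS ORDER_STATUS SIZE SKEWONLY'); -- histogram only on the skewed column
END;
/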

Gathering statistics of a table in Oracle

With the advent of the cost-based optimizer (CBO) in Oracle, every Oracle professional understands the importance of database object statistics. Accurate statistics are the most important requirement for the CBO to choose the best plan. Oracle provides the DBMS_STATS package for gathering them.

The GATHER_TABLE_STATS procedure of the package has many parameters that control the type of statistics being gathered, although Oracle recommends using the AUTO values for many of these parameters to get the best-quality statistics.

Syntax for GATHER_TABLE_STATS:

DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT to_estimate_percent_type(get_param('ESTIMATE_PERCENT')),
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT get_param('METHOD_OPT'),
   degree           NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
   granularity      VARCHAR2 DEFAULT get_param('GRANULARITY'),
   cascade          BOOLEAN  DEFAULT to_cascade_type(get_param('CASCADE')),
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT to_no_invalidate_type(get_param('NO_INVALIDATE')),
   force            BOOLEAN  DEFAULT FALSE);

However, we seldom use all of these parameters. The important ones are:

ownname          -- Name of the owner of the table, i.e. the schema in which the table resides.
tabname          -- Name of the table for which statistics need to be gathered.
partname         -- Name of the partition, if the table is partitioned.
estimate_percent -- The estimate percentage, i.e. the sample size used for gathering statistics. DBMS_STATS.AUTO_SAMPLE_SIZE is recommended.
degree           -- The degree of parallelism with which statistics are gathered. DBMS_STATS.AUTO_DEGREE is the recommended option.

Apart from these, a few other parameters can be used to control the quality of the statistics:

granularity      -- For a partitioned table, it is recommended to collect statistics for each partition locally as well as global statistics for the table as a whole, because, depending on the SQL, the optimizer will use either the local or the global statistics. If the value is set to AUTO, Oracle gathers statistics both locally and globally. When the table's INCREMENTAL preference is set to TRUE, Oracle refreshes global statistics by re-gathering only the partitions whose data has changed (by more than 10% by default); when it is FALSE, statistics for the whole table and all partitions are gathered from scratch.
method_opt       -- The method_opt parameter can be used to create histogram statistics for tables containing skewed data. Oracle recommends FOR ALL COLUMNS SIZE AUTO, which lets Oracle decide which columns need histograms and which need only basic statistics.
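Putting the recommended AUTO settings together, a typical call might look like the sketch below (the schema and table names are placeholders):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SALES_OWNER',                      -- placeholder schema
    tabname          => 'ORDERS',                           -- placeholder table
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,        -- let Oracle choose the sample size
    degree           => DBMS_STATS.AUTO_DEGREE,             -- let Oracle choose the parallel degree
    granularity      => 'AUTO',                             -- partition-level and global statistics
    cascade          => TRUE,                               -- also gather statistics on the table's indexes
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO');       -- Oracle decides which columns need histograms
END;
/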


Monitoring table statistics:
It can take Oracle a few minutes to propagate table monitoring data (the DML change counts used to detect stale statistics) to the data dictionary views. To see this information immediately, call DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO.
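For example, assuming a hypothetical table named ORDERS in the current schema, the monitoring information can be flushed and then queried like this:

BEGIN
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;  -- push in-memory DML counters to the dictionary
END;
/

SELECT table_name, inserts, updates, deletes
  FROM user_tab_modifications
 WHERE table_name = 'ORDERS';                 -- hypothetical table name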


Monday, May 7, 2012

Bulk binding in PL/SQL

If you are a PL/SQL developer, you have certainly faced the challenge of improving the performance of your code. Managers are getting more and more obsessed with the response time of reports and of the processes used to load the data behind them. In the field of analytics, the ETL layer takes a large share of the time spent loading and refreshing the data that supports an organization's decision support systems.

With the introduction of bulk binding, developers now have a powerful weapon for tackling performance issues in PL/SQL code. When we talk about bulk binding in PL/SQL, cursors are the first thing that comes to mind. In a PL/SQL block that uses a cursor FOR loop to fetch records and do some processing, every iteration makes the PL/SQL engine switch to the SQL engine to fetch records from the database. This context switching puts extra load on the database server and slows the code down. With bulk binding, the number of switches can be reduced dramatically.

Eg:

/*Code without bulk binding */
DECLARE
    cursor csrTest is SELECT distinct order_no,customer_no FROM order_detail;
BEGIN
    FOR rec in csrTest LOOP
    /*Do some processing with cursor records fetched*/
    NULL;
    END LOOP;
END;

Here, if there are say 100,000 records to fetch, there will be 100,000 context switches.


/*Code with bulk binding */
DECLARE
    cursor csrTest is SELECT distinct order_no,customer_no FROM order_detail;
    TYPE csr_type is TABLE of csrTest%ROWTYPE;
    csrTT csr_type;
BEGIN
  OPEN csrTest;
  LOOP
    FETCH csrTest BULK COLLECT into csrTT LIMIT 100;
    EXIT WHEN csrTT.COUNT = 0;  /* checking %NOTFOUND here would skip the last, partial batch */
    FOR i in 1..csrTT.COUNT LOOP
      /*Do some processing with the pl/sql table records */
      NULL;
    END LOOP;
  END LOOP;
  CLOSE csrTest;
END;

Here the number of fetches is reduced to 100,000 / 100 = 1,000, so the context switching is reduced considerably.
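The FORALL half of bulk binding works in the other direction: when the per-row processing is a single DML statement, the whole collection is handed back to the SQL engine in one context switch per batch. A minimal sketch, assuming a hypothetical target table order_summary whose columns match the cursor's order_no and customer_no:

/*Code with BULK COLLECT and FORALL */
DECLARE
    cursor csrTest is SELECT distinct order_no,customer_no FROM order_detail;
    TYPE csr_type is TABLE of csrTest%ROWTYPE;
    csrTT csr_type;
BEGIN
  OPEN csrTest;
  LOOP
    FETCH csrTest BULK COLLECT into csrTT LIMIT 100;
    EXIT WHEN csrTT.COUNT = 0;
    /* One switch to the SQL engine inserts the whole batch */
    FORALL i in 1..csrTT.COUNT
      INSERT INTO order_summary VALUES csrTT(i);  /* order_summary is assumed to have matching columns */
  END LOOP;
  CLOSE csrTest;
  COMMIT;
END;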

Oracle analytics - Different pricing schemes

OBIEE on premise Licensing: Component based licensing: The pricing is based on the user base or volume(COGS/revenue). It is a buy as ...