BI Foundation: June 2011

Thursday, June 30, 2011

Indexes in data warehouse:

A telephone directory without an index. Can you imagine, what will happen in such a situation. It will take hours to search a
name in the directory. This is a very simple example of importance and use of index. Similarly an index on a table with millions of data
is helpful in reducing the search time.

An index is a database object that contains an entry for each value that appears in the indexed column(s) of the table or cluster and
provides direct, fast access to rows.

A normal index in Oracle database is created using B- Tree(balanced tree) structure. While there can be different types of indexes that
can be created in Oracle, the one's that are mainly used in data warehousing are normal and bitmap indexes.

Now before going forward, let us get a clear idea of how an index works in Oracle.

In Oracle, everything accessed is in terms of block.

What is a block?

Oracle stores data in blocks or logical blocks. One data block corresponds to a specific number of bytes of physical database space
on disk.It is the smallest unit of disk that oracle reads. As rows in a table are very small, so many rows can fit into
a block in oracle.

The size of a block in Oracle can be configured but generally is of size 4kb, 8kb,16kb, 32kb.

What is a B-Tree structure?

A B Tree or a balanced tree is a database tree structure, in which each node can contain multiple childs.
If d is the depth of the tree, then usually the number of childs vary between d and 2d. If a node starts filling
then it is split to balance nodes.

In Oracle, a B - Tree contains database blocks as nodes in the tree.

How an index works?

For example, in a telephone directory you want to search for a name James. You will move to index and see that
page 5 starts with Jack and page 10 starts with Jaqueline. So James would be some where in between page 5 and
page 10.

Now on page 5, you have another sub index containing entries for names between Jack and Japper which are in pages 5,6,7,8,9.
From this index you get that page 8 starts with Jame and ends with Jamy. So now you know that James would be in this
page only. Now you can easily search for james in this page.

Similarly in a b tree index, each child blocks gives us next block to search for a particular row in a table. And finally
leaf node or block gives us the required row.

Eg.

In main index,
Page 5 starts with Jack.
Page 10 starts with Jaqueline.

In Page 5 sub index,
Page 5 starts with Jack and page 9 starts with Japper.
Page 8 starts with jame and ends with Jamy.

So Page 8 is the target page.

Wednesday, June 8, 2011

Surrogate keys

Surrogate key is a unique identifier used to uniquely identify records in a table. Surrogate keys are generally used in data warehousing.

Then what is the difference between surrogate key and a primary key?

Well, it depends on the type of database model being used. For a current database, where only current data are kept, there will be one to one correspondence between surrogate key and primary key.

But in a temporal database, where there are lots of data with same primary key value, as historical data are also saved.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, the surrogate key identifies the unique entity in the modeled world; these two keys are not the same. For example, record containing a customer location details when he lived in US and now when he lives in UK. Both will have same customer_id which is the primary key. But in a temporal database where historical database are stored, surrogate key will be required to uniquely identify record between these two records. For such kind of situations, surrogate keys are required.

Example:

A customer table containing customer details.

Customer_id	Customer_name	Customer_address	Customer_city	Phone_number	Email_id
1002	John Smith	10, Wall Street	New York	1123301111	john.smith@abc.com
1002	John Smith	12, Grand Street	London	1122331111	john.smith@abc.com

As you see here, we cannot identify difference between the two records. As the customer_id are same. But as first record was input 10 years before and now the customer has moved to a new location. So, new record also contain the record. Such situation can be dealt using surrogate keys.

Customer_key	Customer_id	Customer_name	Customer_address	Customer_city	Phone_number	Email_id
1	1002	John Smith	10, Wall Street	New York	1123301111	john.smith@abc.com
2	1002	John Smith	12, Grand Street	London	1122331111	john.smith@abc.com

Now you can see that surrogate keys are unique even if there is temporal data in a database.

A surrogate key are generally generated using database sequence objects.

Advantages of a surrogate key:

Immutability

Surrogate keys do not change while the row exists. This has the following advantages.

Requirement changes:

Attributes that uniquely identify an entity might change, which might invalidate the suitability of the natural, compound keys.

Performance

Surrogate keys tend to be a compact data type such as a four-byte integer. This allows the database to query the single key column faster than it could multiple columns.

BI Foundation

Thursday, June 30, 2011

Indexes in data warehouse:

Wednesday, June 8, 2011

Surrogate keys

Oracle analytics - Different pricing schems

Followers