Sunday 13 July 2014

Dimension Table Vs Fact Table

Dimension Table features

  •  It provides the context (descriptive information) for fact table measurements.
  •  It provides entry points to the data.
  •  Structure of a dimension: a surrogate key, one or more fields that make up the natural key (NK), and a set of attributes.
  •  A dimension table is usually smaller than a fact table.
  •  A schema usually contains more dimension tables than fact tables.
  •  A surrogate key is used to avoid primary key (PK) violations when historical data is stored.
  •  Field values can be numeric or text.

Fact Table features 

  •  It provides the measurements of an enterprise.
  •  A measurement is an amount determined by observation.
  •  Structure of a fact table: foreign keys (FK), degenerate dimensions, and measurements.
  •  A fact table is usually larger than a dimension table.
  •  A schema usually contains fewer fact tables than dimension tables.
  •  A composite of the degenerate dimension fields can act as the primary key.
  •  Field values are always numeric.

The main difference between loading a dimension and a fact table is that a dimension preserves historical data (as in a Type 2 dimension), so we have to use an Update Strategy and other transformations to make that happen. A fact table, by contrast, is a direct load with one or more lookups on the dimensions. Also, since the fact and dimension tables have a foreign key relationship, the dimension has to be loaded before the fact.
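As a rough illustration of the Type 2 behaviour described above, here is a minimal sketch in Python using the built-in sqlite3 module rather than an Informatica mapping. The table customer_dim, its columns and the helper scd2_upsert are hypothetical; in PowerCenter the same steps would typically be built with Lookup and Update Strategy transformations.

```python
import sqlite3

def create_dim(conn: sqlite3.Connection) -> None:
    # Hypothetical dimension: customer_sk is the surrogate key,
    # customer_nk is the natural key coming from the source system.
    conn.execute("""
        CREATE TABLE customer_dim (
            customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_nk TEXT NOT NULL,
            city        TEXT NOT NULL,
            is_current  INTEGER NOT NULL DEFAULT 1
        )""")

def scd2_upsert(conn: sqlite3.Connection, nk: str, city: str) -> None:
    """Apply one source row to the dimension using Type 2 logic (hypothetical helper)."""
    row = conn.execute(
        "SELECT customer_sk, city FROM customer_dim "
        "WHERE customer_nk = ? AND is_current = 1", (nk,)).fetchone()
    if row is not None and row[1] == city:
        return  # attribute unchanged: nothing to do
    if row is not None:
        # Attribute changed: expire the current version instead of overwriting it.
        conn.execute("UPDATE customer_dim SET is_current = 0 WHERE customer_sk = ?",
                     (row[0],))
    # New customer or new version: insert a row with a fresh surrogate key.
    conn.execute("INSERT INTO customer_dim (customer_nk, city) VALUES (?, ?)", (nk, city))

conn = sqlite3.connect(":memory:")
create_dim(conn)
scd2_upsert(conn, "C001", "Chennai")
scd2_upsert(conn, "C001", "Chennai")    # unchanged, ignored
scd2_upsert(conn, "C001", "Bangalore")  # changed, new version inserted
history = conn.execute(
    "SELECT customer_sk, customer_nk, city, is_current "
    "FROM customer_dim ORDER BY customer_sk").fetchall()
print(history)
```

Because C001 now has two rows, a primary key on the natural key alone would be violated; the surrogate key customer_sk is what allows the history to be kept.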

Apart from that, I think there is no difference in the mapping logic used to load a dimension table and a fact table. We can load a dimension table directly, but we can't load the fact table first: the dimension tables have to be loaded before it. While loading the fact table, we perform a lookup on each dimension table, because the fact table contains the measures (facts) and the foreign keys, which are the primary keys of the dimension tables surrounding that fact table. We can load both the dimension table and the fact table in one mapping by using the Target Load Order (Target Load Plan) in Informatica.

Target 1 (Dimension Table)
Target 2 (Fact Table)
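The Target 1 / Target 2 order above can be sketched in Python with sqlite3: the dimension is loaded first, then each fact row looks up its surrogate key from the dimension by natural key before it is inserted. The tables product_dim and sales_fact, their columns and the sample rows are hypothetical. SQLite foreign key enforcement is switched on with PRAGMA foreign_keys, so a fact row without a matching dimension row is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the fact -> dimension relationship
conn.execute("""CREATE TABLE product_dim (
    product_sk   INTEGER PRIMARY KEY,
    product_nk   TEXT UNIQUE NOT NULL,
    product_name TEXT)""")
conn.execute("""CREATE TABLE sales_fact (
    product_sk INTEGER NOT NULL REFERENCES product_dim(product_sk),
    order_no   TEXT NOT NULL,          -- degenerate dimension
    amount     REAL NOT NULL)""")

source_products = [("P10", "Pen"), ("P20", "Pencil")]
source_sales = [("P10", "SO-1", 25.0), ("P20", "SO-2", 10.0), ("P10", "SO-3", 5.0)]

# Target 1: load the dimension first.
conn.executemany("INSERT INTO product_dim (product_nk, product_name) VALUES (?, ?)",
                 source_products)

# Target 2: load the fact, looking up the surrogate key for each natural key.
lookup = dict(conn.execute("SELECT product_nk, product_sk FROM product_dim"))
conn.executemany(
    "INSERT INTO sales_fact (product_sk, order_no, amount) VALUES (?, ?, ?)",
    [(lookup[nk], order_no, amount) for nk, order_no, amount in source_sales])

totals = conn.execute("""
    SELECT d.product_name, SUM(f.amount)
    FROM sales_fact f JOIN product_dim d ON d.product_sk = f.product_sk
    GROUP BY d.product_name ORDER BY d.product_name""").fetchall()
print(totals)

# A fact row whose dimension row does not exist is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 'SO-4', 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```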
Dimension Table - A pure dimension table is a collection of primary keys.
Fact Table - A pure fact table is a collection of foreign keys.
A fact table contains numeric facts, i.e. key performance indicators. A dimension table is related to the fact table through a primary key/foreign key relationship.



Repository tables

What are repository tables?

All objects that we create in Informatica PowerCenter (sources, targets,
transformations, mappings, sessions, workflows, command tasks, etc.) are stored in a
set of database tables. These database tables are known as repository tables,
metadata tables, or OPB tables.


List of some important Repository tables
There are around a couple of hundred OPB tables in version 7.x of PowerCenter, but in
8.x this number crosses 400.

OPB_SUBJECT - PowerCenter folders table
This table stores the name of each PowerCenter repository folder.
Usage: Join any repository table that has a SUBJECT_ID column with the SUBJ_ID
column of this table to get the folder name.

OPB_MAPPING - Mappings table
This table stores the name and ID of each mapping and its corresponding folder.
Usage: Join any repository table that has a MAPPING_ID column with the MAPPING_ID
column of this table to get the mapping name.
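A minimal sketch of these two joins, run against an in-memory SQLite stand-in rather than a real repository database. Only the columns named in the text (SUBJ_ID, SUBJECT_ID, MAPPING_ID) come from the description above; SUBJ_NAME, MAPPING_NAME and the sample rows are assumptions, and the real OPB tables have many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the repository tables (assumed columns, sample data).
conn.executescript("""
CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER, SUBJ_NAME TEXT);
CREATE TABLE OPB_MAPPING (MAPPING_ID INTEGER, MAPPING_NAME TEXT, SUBJECT_ID INTEGER);
INSERT INTO OPB_SUBJECT VALUES (1, 'SALES_DW'), (2, 'HR_DW');
INSERT INTO OPB_MAPPING VALUES (10, 'm_load_customer_dim', 1),
                               (11, 'm_load_sales_fact', 1),
                               (12, 'm_load_employee_dim', 2);
""")

# Folder name for every mapping: OPB_MAPPING.SUBJECT_ID = OPB_SUBJECT.SUBJ_ID
rows = conn.execute("""
    SELECT s.SUBJ_NAME AS folder_name, m.MAPPING_NAME
    FROM OPB_MAPPING m
    JOIN OPB_SUBJECT s ON s.SUBJ_ID = m.SUBJECT_ID
    ORDER BY s.SUBJ_NAME, m.MAPPING_NAME""").fetchall()
for folder, mapping in rows:
    print(folder, mapping)
```

Against a real repository, the same SELECT would normally be run read-only from the repository database's own SQL client.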

OPB_TASK - Tasks table (sessions, workflows, etc.)
This table stores the name and ID of each task, such as a session or workflow, and
its corresponding folder.
Usage: Join any repository table that has a TASK_ID or SESSION_ID column with the
TASK_ID column of this table to get the task name. Note that sessions and workflows
are both stored as tasks in the repository: TASK_TYPE is 68 for a session and 71
for a workflow.

OPB_SESSION - Session & Mapping linkage table
This table stores the linkage between each session and its mapping. As mentioned
in the previous paragraph, you can join the SESSION_ID in this table with the
TASK_ID of the OPB_TASK table.
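Continuing the same idea, this sketch lists each session with its folder and mapping by filtering OPB_TASK on TASK_TYPE = 68 and joining through OPB_SESSION. The join columns and type codes are the ones given in the text; TASK_NAME, SUBJ_NAME, MAPPING_NAME and all rows are assumptions made for illustration.

```python
import sqlite3

SESSION_TYPE, WORKFLOW_TYPE = 68, 71  # TASK_TYPE values quoted in the text

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER, SUBJ_NAME TEXT);
CREATE TABLE OPB_TASK    (TASK_ID INTEGER, TASK_NAME TEXT, TASK_TYPE INTEGER, SUBJECT_ID INTEGER);
CREATE TABLE OPB_SESSION (SESSION_ID INTEGER, MAPPING_ID INTEGER);
CREATE TABLE OPB_MAPPING (MAPPING_ID INTEGER, MAPPING_NAME TEXT, SUBJECT_ID INTEGER);
INSERT INTO OPB_SUBJECT VALUES (1, 'SALES_DW');
INSERT INTO OPB_TASK VALUES (100, 's_m_load_sales_fact', 68, 1),
                            (101, 'wf_daily_sales', 71, 1);
INSERT INTO OPB_SESSION VALUES (100, 10);
INSERT INTO OPB_MAPPING VALUES (10, 'm_load_sales_fact', 1);
""")

# Session -> mapping: OPB_SESSION.SESSION_ID = OPB_TASK.TASK_ID
sessions = conn.execute("""
    SELECT s.SUBJ_NAME, t.TASK_NAME, m.MAPPING_NAME
    FROM OPB_TASK t
    JOIN OPB_SESSION se ON se.SESSION_ID = t.TASK_ID
    JOIN OPB_MAPPING m  ON m.MAPPING_ID  = se.MAPPING_ID
    JOIN OPB_SUBJECT s  ON s.SUBJ_ID     = t.SUBJECT_ID
    WHERE t.TASK_TYPE = ?""", (SESSION_TYPE,)).fetchall()
print(sessions)

# Workflows are tasks too, just with a different TASK_TYPE.
workflows = [r[0] for r in conn.execute(
    "SELECT TASK_NAME FROM OPB_TASK WHERE TASK_TYPE = ?", (WORKFLOW_TYPE,))]
print(workflows)
```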

OPB_TASK_ATTR - Task attributes table
This is the table that stores the attribute values (such as the session log name) for tasks.
Usage: Join the ATTR_ID of this table with the ATTR_ID of the OPB_ATTR table to
find what each attribute in this table means. OPB_ATTR is described later in this
section.

OPB_WIDGET - Transformations table
This table stores the names and IDs of all the transformations with their folder
details.
Usage: Join the WIDGET_ID of this table with the WIDGET_ID of any other repository
table to get the transformation name and the folder details. Use this table together
with OPB_WIDGET_ATTR or OPB_WIDGET_EXPR to learn more about each
transformation.

OPB_WIDGET_FIELD - Transformation ports table
This table stores the names and IDs of all the transformation fields for each of the
transformations.
Usage: Match the FIELD_ID from this table against the FIELD_ID of tables such as
OPB_WIDGET_DEP to get the corresponding information.

OPB_WIDGET_ATTR - Transformation properties table
This table stores the property details of each transformation.
Usage: Join the ATTR_ID of this table with the ATTR_ID of the OPB_ATTR table to
find what each attribute of the transformation means.

OPB_EXPRESSION - Expressions table
This table stores the details of the expressions used anywhere in PowerCenter.
Usage: Use this table together with OPB_WIDGET/OPB_WIDGET_INST and
OPB_WIDGET_EXPR to get the expressions in the Expression transformations of a
particular mapping or set of mappings.

OPB_ATTR - Attributes
This table lists the attributes and their default values, if any. You can get the
ATTR_ID from this table and look it up in any table that stores attribute values.
Note the ATTR_TYPE and OBJECT_TYPE_ID before you pick an ATTR_ID: the same
ATTR_ID can appear in this table more than once with a different ATTR_TYPE or
OBJECT_TYPE_ID.
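The caution about OBJECT_TYPE_ID can be illustrated with a small mock in which the same ATTR_ID is defined for two object types, so the lookup also has to match the object type. The extra columns (ATTR_NAME, TASK_TYPE, ATTR_VALUE), the attribute names and all values are hypothetical; only ATTR_ID, ATTR_TYPE and OBJECT_TYPE_ID come from the text, and 68/71 are the task types quoted earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OPB_ATTR (OBJECT_TYPE_ID INTEGER, ATTR_ID INTEGER, ATTR_TYPE INTEGER, ATTR_NAME TEXT);
CREATE TABLE OPB_TASK_ATTR (TASK_ID INTEGER, TASK_TYPE INTEGER, ATTR_ID INTEGER, ATTR_VALUE TEXT);
-- Hypothetical definitions: ATTR_ID 1 means different things for object types 68 and 71.
INSERT INTO OPB_ATTR VALUES (68, 1, 1, 'Session Log File Name'),
                            (71, 1, 1, 'Workflow Log File Name');
INSERT INTO OPB_TASK_ATTR VALUES (100, 68, 1, 's_m_load_sales_fact.log');
""")

# Joining on ATTR_ID alone returns one row per matching definition (2 rows here).
naive = conn.execute("""
    SELECT a.ATTR_NAME, ta.ATTR_VALUE
    FROM OPB_TASK_ATTR ta JOIN OPB_ATTR a ON a.ATTR_ID = ta.ATTR_ID""").fetchall()

# Also matching the object type picks the single correct definition.
correct = conn.execute("""
    SELECT a.ATTR_NAME, ta.ATTR_VALUE
    FROM OPB_TASK_ATTR ta
    JOIN OPB_ATTR a ON a.ATTR_ID = ta.ATTR_ID
                   AND a.OBJECT_TYPE_ID = ta.TASK_TYPE""").fetchall()
print(len(naive), correct)
```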

OPB_COMPONENT - Session Component
This table stores component details such as the Post-Session-Success-Email and the
pre-session and post-session commands.
Usage: Match the TASK_ID with the SESSION_ID in the OPB_SESSION table to get the
SESSION_NAME. To get the shell or batch command defined for the session, join this
table with the OPB_TASK_VAL_LIST table on TASK_ID.

OPB_CFG_ATTR - Session Configuration Attributes
This table stores the attribute values for Session Object configuration like "Save
Session log by", Session log path etc. 

PMCMD - Power Mart Command

PMCMD stands for “PowerMart Command”. It is used to perform Informatica tasks from a command prompt rather than through the graphical user interface.

pmcmd is a command-line utility that communicates with the Informatica servers.
It works much like running SQL commands at a command prompt: each command is sent as a request and its result is shown in the prompt. In the same way, pmcmd sends commands to the Informatica server, for example “to start and stop the services”.

pmcmd can be used to:

  •  Start and stop Informatica services.
  •  Start and stop batches and sessions.
  •  Recover sessions.
  •  Start workflows.
  •  Start a workflow from a specific task.
  •  Stop or abort workflows and sessions.
  •  Schedule workflows.
  •  Schedule sessions and run them using commands.

pmcmd initiates Informatica workflows and batch files that execute mappings, and more.
It can also check whether the Informatica services are running.

Location of pmcmd: <Installed Drive>\Informatica\server\bin\pmcmd

Syntax:
pmcmd command_name [-option1] argument_1 [-option2] argument_2
Some commonly used pmcmd commands (interactive mode, at the pmcmd> prompt):
pmcmd>connect -sv Service -d domain -u username -p password
pmcmd>startworkflow -f 'folder' workflow
pmcmd>getworkflowdetails -f 'folder' [-rin run_instance_name] workflow
pmcmd>gettaskdetails -f 'folder' -w workflow task
pmcmd>stoptask -f 'folder' -w workflow task
pmcmd>getsessionstatistics -f 'folder' -w workflow session
pmcmd>scheduleworkflow -f 'folder' workflow
pmcmd>unscheduleworkflow -f 'folder' workflow
pmcmd>disconnect

Informatica commands

pmrep command : pmrep is used to perform repository metadata administration tasks, such as listing repository objects. 

infacmd command : infacmd is used to perform service-related functions, such as creating or removing a Repository Service.

infasetup command : infasetup is used to back up, restore, define, and delete domains, and to define and update nodes.

pmcmd command : pmcmd is used to communicate with PowerCenter Integration Services in order to manage the Informatica workflows (start, stop, recover, …) 

Ping integration service : 
pmcmd pingservice -sv <Integration service name> -d <Domain_name>

Ping repository service  :
pmrep connect -r <Repository service name> -h <hostname> -o <port> -n Administrator -x <Admin_passwd>

Create Backup file:
pmrep backup -o <file_name.rep>

Restore Backup file:
pmrep restore -i <file_name.rep> -u <Domain_User> -p <Domain_Passwd>

List the dependent objects of a workflow:
pmrep listobjectdependencies -n <workflow_name> -o workflow -f <Workflow_Folder_name> -d session -p children

Run Workflow:
pmcmd startworkflow -sv <Integration service_name> -d <Domain_Name> -u Administrator -p <Admin_Passwd> -f <Workflow_Folder_name> -lpf <Parameter file full path> <WORKFLOW_NAME>
Option: -lpf specifies a parameter file on the local machine where pmcmd runs. Otherwise, use the -paramfile option with a path on the Integration Service machine (for example, under $PMRootDir).
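When pmcmd is called from a scheduler or wrapper script, building the argument list programmatically avoids quoting mistakes. The sketch below uses only the options shown in the command above (-sv, -d, -u, -p, -f, -lpf). The helper names build_startworkflow and start_workflow and the sample values are hypothetical; running it for real assumes pmcmd is on the PATH and that an exit status of 0 means success.

```python
import subprocess
from typing import List, Optional

def build_startworkflow(service: str, domain: str, user: str, password: str,
                        folder: str, workflow: str,
                        param_file: Optional[str] = None) -> List[str]:
    """Return the argv list for 'pmcmd startworkflow' (hypothetical helper)."""
    argv = ["pmcmd", "startworkflow",
            "-sv", service, "-d", domain,
            "-u", user, "-p", password,
            "-f", folder]
    if param_file:
        argv += ["-lpf", param_file]  # parameter file on the machine running pmcmd
    argv.append(workflow)             # the workflow name comes last
    return argv

def start_workflow(argv: List[str]) -> bool:
    """Run pmcmd without a shell; assumes exit status 0 means success."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0

cmd = build_startworkflow("IS_DEV", "Domain_DEV", "Administrator", "secret",
                          "SALES_DW", "wf_daily_sales", "/opt/params/wf_daily_sales.par")
print(" ".join(cmd))
```

Passing a list (not a string) to subprocess.run means no shell is involved, so folder or file names with spaces need no extra quoting. pmcmd also offers -uv and -pv, which take the names of environment variables holding the user name and password, so the password does not appear on the command line.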

Start Session Task:
pmcmd starttask -uv Administrator -pv <Admin_Passwd> -s <HOSTNAME:PORT> -f <Folder_Name> -w <Workflow_name> -lpf <Parameter_File> <Task_Name>

Get status of the session:
pmcmd getsessionstatistics -sv <Integration_service> -d <Domain_Name> -u Administrator -p <Admin_Passwd> -f <Folder_name> -w <Workflow_name> <Session_Task_Name>

Change node properties:
infasetup updateGatewayNode -na <hostname:port> -hs <https_port> -kf "$INFA_HOME\tomcat\conf\Default.keystore" -kp <passwd>

Create domain manually:
infasetup defineDomain -dn <domain_name> -ad <Administrator> -pd <Administrator_passwd> -ld \"$INFA_HOME\infa_shared\log\" -nn <node_name> -na <hostname:port> -mi <min_port> -ma <max_port> -sv <service port> -rf nodeoptions.xml

infasetup.sh defineDomain -dn Domain_hostname_901 -ad Administrator -pd Administrator -ld \"$INFA_HOME\infa_shared\log\" -nn node_hostname_901 -na hostname.us.oracle.com:6005 -mi 6003 -ma 6113 -rf nodeoptions.xml -da localhost:1521 -du INFA_901_IDM -dp INFA_901_IDM -dt ORACLE -ds orcl

Difference between Delete and Truncate in Oracle?

Delete
--------

It is a DML statement

Can be rolled back

Can delete selective records

It fires database triggers.

It does not require disabling referential constraints

Deletes perform normal DML. That is, they take locks on rows, they generate redo (lots of it), and they require segments in the UNDO tablespace. Deletes clear records out of blocks carefully. If a mistake is made, a rollback can be issued to restore the records prior to a commit. A delete does not relinquish segment space, so a table from which all records have been deleted retains all of its original blocks.
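The DML behaviour above (a selective delete that can be rolled back before commit) can be tried with Python's sqlite3 module. SQLite is only a convenient stand-in for Oracle here, and it has no TRUNCATE statement, so only the DELETE side of the comparison is shown; the emp rows are sample data.

```python
import sqlite3

# isolation_level=None: autocommit mode, so we issue BEGIN/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(7369, 'SMITH', 20), (7499, 'ALLEN', 30), (7521, 'WARD', 30)])

conn.execute("BEGIN")
conn.execute("DELETE FROM emp WHERE deptno = 30")   # selective delete
after_delete = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
conn.execute("ROLLBACK")                             # undo it before any commit
after_rollback = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(after_delete, after_rollback)
```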


Truncate
----------

It is a DDL statement

Can’t be rolled back

Can’t delete selective records; it deletes all the records in the table

Doesn't fire DELETE (DML) triggers

It requires disabling the referential constraints (foreign keys) that reference the table

Truncate resets the high water mark of the table. No row-level locks are taken, and only minimal redo and undo are generated.

Difference between subquery and correlated subquery in SQL?

Subquery :- The inner query is executed only once. It runs first, and its output is used by the outer query. The inner query does not depend on the outer query.

Eg:- SELECT cust_name, dept_no FROM Customer WHERE cust_name IN (SELECT cust_name FROM Customer);

Correlated subquery :- The inner query refers to columns of the outer query, so it is executed once for every row processed by the outer query: the outer query reads a row, the inner query runs with that row's values, and its result decides whether the outer row qualifies. This means the inner query depends on the outer query.

Eg:- SELECT cust_name,dept_id FROM Cust
WHERE cust_name in (SELECT cust_name FROM dept WHERE cust.dept_id=dept.dept_id);
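To see the difference in practice, this sketch runs both kinds of query through Python's sqlite3 module on a small, made-up emp table. The plain subquery computes the company-wide average once; the correlated subquery references e.deptno from the outer query, so it computes the average of each outer row's own department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("KING", 10, 5000), ("CLARK", 10, 2450),
    ("SMITH", 20, 800), ("FORD", 20, 3000),
    ("ALLEN", 30, 1600), ("JAMES", 30, 950)])

# Subquery: independent of the outer query, evaluated once.
above_company_avg = [r[0] for r in conn.execute("""
    SELECT ename FROM emp
    WHERE sal > (SELECT AVG(sal) FROM emp)
    ORDER BY ename""")]

# Correlated subquery: refers to e.deptno, so it is evaluated per outer row.
above_dept_avg = [r[0] for r in conn.execute("""
    SELECT ename FROM emp e
    WHERE sal > (SELECT AVG(sal) FROM emp d WHERE d.deptno = e.deptno)
    ORDER BY ename""")]
print(above_company_avg, above_dept_avg)
```

Here the company average is 2300, while the department averages are 3725, 1900 and 1275, which is why ALLEN qualifies only in the correlated version.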

Complex Queries in SQL (Oracle)

These are among the most frequently asked questions in interviews.
1. Select the bottom n rows from an Oracle table.
Sql>select * from (select * from emp order by rowid desc) where rownum<=&n order by rowid;
2. Select values whose count is greater than 1 (values that appear more than once).
Sql>select col1 from abc group by col1 having count(col1)>1;
3. Count the vowels in each ENAME of the EMP table (this also works for names made only of vowels, such as AAA or AEIOU).
Sql>SELECT ENAME, NVL(LENGTH(ENAME),0) - NVL(LENGTH(REPLACE(TRANSLATE(UPPER(ENAME),'AEIOU','A'),'A')),0) VOWEL_COUNT FROM EMP;
4. Select employees whose salary is greater than the average salary of their department.

Sql>select empno,sal,deptno from (select empno,sal,deptno,avg(sal) over (partition by deptno) asal from emp)
where sal > asal;
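The analytic-function approach of query 4 can be checked with Python's sqlite3 module, assuming the SQLite library bundled with Python is version 3.25 or later (the first release with window functions). The emp rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, sal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (7839, 5000, 10), (7782, 2450, 10),
    (7369, 800, 20), (7902, 3000, 20),
    (7499, 1600, 30), (7900, 950, 30)])

# AVG(sal) OVER (PARTITION BY deptno) attaches the department average to every row,
# so the outer query can compare each salary with it without a GROUP BY join.
rows = conn.execute("""
    SELECT empno, sal, deptno
    FROM (SELECT empno, sal, deptno,
                 AVG(sal) OVER (PARTITION BY deptno) AS asal
          FROM emp)
    WHERE sal > asal
    ORDER BY empno""").fetchall()
print(rows)
```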

5. To fetch ALTERNATE records from a table (EVEN NUMBERED).

Type-1: select * from emp where rowid in (select decode(mod(rownum,2),0,rowid, null) from emp);
Type-2: select * from (select e.*, rownum rn from (select empno,ename,sal from emp order by empno) e) where mod(rn,2)=0;
Type-3: select * from emp where (rowid,0) in (select rowid,mod(rownum,2) from emp);

Note: ROWNUM is assigned before ORDER BY is applied in the same query block, so Type-2 sorts in an inner query and numbers the rows in the query around it.

6. To fetch ALTERNATE records from a table (ODD NUMBERED).

Type-1: select * from emp where rowid in (select decode(mod(rownum,2),0,null ,rowid) from emp);
Type-2: select * from (select e.*, rownum rn from (select empno,ename,sal from emp order by empno) e) where mod(rn,2)<>0;
Type-3: select * from emp where (rowid,1) in (select rowid,mod(rownum,2) from emp);
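A portable way to get alternate rows in a defined order is ROW_NUMBER() over an explicit ORDER BY, which does not depend on ROWNUM or on the physical order of rows. This sketch uses Python's sqlite3 (SQLite 3.25+ assumed for window functions); the helper alternate and the empno values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    (7900, "JAMES"), (7369, "SMITH"), (7499, "ALLEN"), (7521, "WARD"), (7566, "JONES")])

def alternate(parity: int) -> list:
    """parity 0 -> even-numbered rows, 1 -> odd-numbered rows, numbered by empno."""
    return [r[0] for r in conn.execute("""
        SELECT empno FROM (
            SELECT empno, ROW_NUMBER() OVER (ORDER BY empno) AS rn FROM emp)
        WHERE rn % 2 = ?
        ORDER BY empno""", (parity,))]

even_rows, odd_rows = alternate(0), alternate(1)
print(even_rows, odd_rows)
```

The rows were inserted out of order on purpose: the numbering follows ORDER BY empno, not insertion order.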

7.  Find the 3rd MAX salary in the emp table.

Sql>select distinct sal from emp e1 where 3 = (select count(distinct sal) from emp e2 where e1.sal <= e2.sal);

8.   Find the 3rd MIN salary in the emp table.

Sql>select distinct sal from emp e1 where 3 = (select count(distinct sal) from emp e2 where e1.sal >= e2.sal);

9.   Select FIRST n records from a table.

Sql>select * from emp where rownum <= &n;

10. Select LAST n records from a table?

Sql>select * from emp minus select * from emp where rownum <= (select count(*) - &n from emp);

11. List dept no., Dept name for all the departments in which there are no employees in the department.

Sql>select * from dept where deptno not in (select deptno from emp);  
alternate solution:  select * from dept a where not exists (select * from emp b where a.deptno = b.deptno);
alternate solution: select empno,ename,b.deptno,dname from emp a, dept b where a.deptno(+) = b.deptno and empno is null;
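The alternatives above are not always interchangeable. If emp.deptno contains a NULL, `deptno NOT IN (subquery)` evaluates to unknown for every department and returns no rows, while NOT EXISTS still returns the empty department. This sketch shows that standard SQL behaviour (Oracle handles NULL in NOT IN the same way) using Python's sqlite3 module and sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER, dname TEXT);
CREATE TABLE emp  (empno INTEGER, deptno INTEGER);
INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
INSERT INTO emp  VALUES (7839, 10), (7369, 20), (7999, NULL);  -- one employee with no dept
""")

# NOT IN: the NULL in the subquery makes the condition unknown for every row.
not_in = conn.execute(
    "SELECT deptno FROM dept WHERE deptno NOT IN (SELECT deptno FROM emp)").fetchall()

# NOT EXISTS: only checks whether a matching row exists, so NULLs do not interfere.
not_exists = conn.execute("""
    SELECT deptno FROM dept a
    WHERE NOT EXISTS (SELECT 1 FROM emp b WHERE a.deptno = b.deptno)""").fetchall()
print(not_in, not_exists)
```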

12. How to get 3 Max salaries ?

Type-1: select * from (select sal from emp order by sal desc) where rownum<=3;
Type-2: select distinct sal from emp a where 3 >= (select count(distinct sal) from emp b where a.sal <= b.sal) order by a.sal desc;
Type-3: select * from (select ename,sal,rank() over (order by sal desc) ranking from emp) where ranking<=3;
Type-4: select * from (select ename,sal,dense_rank() over (order by sal desc) ranking from emp) where ranking<=3;
Type-5: select * from (select ename,sal,row_number() over (order by sal desc) ranking from emp) where ranking<=3;

13.  How to get 3 Min salaries ?

Sql>select distinct sal from emp a  where 3 >= (select count(distinct sal) from emp b  where a.sal >= b.sal);

14. How to get nth max salaries ?

Type-1:
select distinct sal from emp a where &n =  (select count(distinct sal) from emp b where a.sal <= b.sal);
or
select min(sal) from (select distinct sal from emp order by sal desc) where rownum<=&n;
Type-2:
select * from emp e1
where (&N-1) = (select count(distinct(e2.sal)) from emp e2 where e2.sal>e1.sal);

Type-3: using dense_rank() function // this will show all rows with that same salary

SELECT empno,ename,sal,deptno FROM (SELECT e1.*, DENSE_RANK() OVER (ORDER BY sal DESC) rnk FROM emp e1) WHERE rnk = 3;
select * from (select ename,sal,dense_rank() over (order by sal desc) ranking from emp) where ranking=&n;
select * from (select ename,sal,rank() over (order by sal desc) ranking from emp) where ranking=&n;
Note: rank() leaves gaps after ties, so the rank() version can return no rows for some values of n; dense_rank() does not skip numbers.

Type-4: using row_number() function // this will not give multiple records if there are employees with same salaries.
select * from (select ename,sal,row_number() over (order by sal desc) ranking from emp order by sal desc) where ranking=&n;

Type-5: using rownum // this will not give multiple records if there are employees with same salaries.

Note: ROWNUM is assigned to rows before ORDER BY is applied in the same query, so this returns the nth record in the order the rows were retrieved from the table, not the nth highest salary.
Sql>select * from (select ename,sal,rownum ranking from emp order by sal desc) where ranking=&n;
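The difference between the rank(), dense_rank() and row_number() variants only shows up when salaries tie. This sketch computes all three side by side with Python's sqlite3 (SQLite 3.25+ assumed for window functions) on sample data with a tie at 3000; the helper nth_max_salary is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975)])

rows = conn.execute("""
    SELECT ename, sal,
           RANK()       OVER (ORDER BY sal DESC)        AS rnk,
           DENSE_RANK() OVER (ORDER BY sal DESC)        AS drnk,
           ROW_NUMBER() OVER (ORDER BY sal DESC, ename) AS rn
    FROM emp
    ORDER BY sal DESC, ename""").fetchall()
for row in rows:
    print(row)

def nth_max_salary(n: int) -> list:
    """All employees with the nth highest distinct salary (DENSE_RANK)."""
    return [r[0] for r in conn.execute("""
        SELECT ename FROM (
            SELECT ename, DENSE_RANK() OVER (ORDER BY sal DESC) AS drnk FROM emp)
        WHERE drnk = ? ORDER BY ename""", (n,))]
```

With RANK() nobody gets rank 3 here (the tie jumps from 2 to 4), whereas nth_max_salary(3) returns JONES, and nth_max_salary(2) returns both FORD and SCOTT.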

15. Select DISTINCT RECORDS from emp table.

Sql>select * from emp a where  rowid = (select max(rowid) from emp b where  a.empno=b.empno);

16. Select Duplicate records/rows in emp table?

Sql>Select * from (select d.*, count(*) over (partition by empno) cnt from emp d) where cnt>1;

17. How to delete duplicate rows in a table?

Sql>delete from emp a where rowid != (select max(rowid) from emp b where  a.empno=b.empno);
Sql>delete from emp a where rowid > (select min(rowid) from emp b where  a.empno=b.empno);
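SQLite also exposes a rowid, so the keep-one-row-per-key idea can be tried with Python's sqlite3. This sketch uses the equivalent `rowid NOT IN (SELECT MAX(rowid) ... GROUP BY empno)` form instead of the correlated subquery with a table alias; the emp rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    (7369, "SMITH"), (7499, "ALLEN"), (7369, "SMITH"), (7369, "SMITH"), (7499, "ALLEN")])

# Keep the row with the highest rowid for each empno; delete the others.
deleted = conn.execute("""
    DELETE FROM emp
    WHERE rowid NOT IN (SELECT MAX(rowid) FROM emp GROUP BY empno)""").rowcount
remaining = conn.execute("SELECT empno, ename FROM emp ORDER BY empno").fetchall()
print(deleted, remaining)
```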

18. Count the number of employees in each department.

Sql>select count(EMPNO), b.deptno, dname from emp a, dept b  where a.deptno(+)=b.deptno  group by b.deptno,dname;
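The (+) outer-join syntax is Oracle-specific; the same count can be written with an ANSI LEFT JOIN, which this sketch runs with Python's sqlite3 on sample data. Counting EMPNO rather than * is what makes an empty department show 0 instead of 1, because COUNT(column) ignores the NULL produced by the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER, dname TEXT);
CREATE TABLE emp  (empno INTEGER, deptno INTEGER);
INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
INSERT INTO emp  VALUES (7839, 10), (7782, 10), (7369, 20);
""")

counts = conn.execute("""
    SELECT COUNT(a.empno), b.deptno, b.dname
    FROM dept b LEFT JOIN emp a ON a.deptno = b.deptno
    GROUP BY b.deptno, b.dname
    ORDER BY b.deptno""").fetchall()
print(counts)

# COUNT(*) counts the outer-joined row itself, so department 40 would show 1.
count_star_40 = conn.execute("""
    SELECT COUNT(*) FROM dept b LEFT JOIN emp a ON a.deptno = b.deptno
    WHERE b.deptno = 40""").fetchone()[0]
```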