Redshift is a columnar database provided by Amazon AWS as a cloud database service. Redshift uses a cluster architecture with a leader node responsible for cluster and node management and compute node for data storage and computation. Redshift is based on postgre sql database however with some relaxed constraints and rules.
Redshift database cluster can be created by specifying the number of compute nodes required in the cluster and customers are only charged for compute nodes. Leader nodes usage is completely free. A redshift database architecture is as given below.
Physical Architecture
Logical Architecture
For schema to be used when a user logs in and queries in a database search_path is used to resolved the schema. To add a schema in search_path use set command.
set search_path to '$USER',pg_temp,pg_catalog,...;
------------------
pg_temp and pg_catalog is used first by default in order and then any schema specified by search path is scanned
To set the timeout of queries in session use statement_timeout.
set statement_timeout to;
----------------------
This specifies the timeout for any statement being executed on the cluster. 0 turns off the timeout.
To set the timezone of a session use timezone.
SET timezone time_zone
-----------------------
TO set the timezone of current session. Run "select pg_timezone_names();" to view list of timezones. to set the timezone for database user used
"ALTER USER SET timezone to 'America/New_York';"
WLM: Workload management
You can use workload management (WLM) to define multiple query queues and to route queries to the appropriate queues at runtime.
When you have multiple sessions or users running queries at the same time, some queries might consume cluster resources for long periods of time and affect the performance of other queries. For example, suppose one group of users submits occasional complex, long-running queries that select and sort rows from several large tables. Another group frequently submits short queries that select only a few rows from one or two tables and run in a few seconds. In this situation, the short-running queries might have to wait in a queue for a long-running query to complete.
You can improve system performance and your users’ experience by modifying your WLM configuration to create separate queues for the long-running queries and the short-running queries. At run time, you can route queries to these queues according to user groups or query groups.
You can configure up to eight query queues and set the number of queries that can run in each of those queues concurrently. You can set up rules to route queries to particular queues based on the user running the query or labels that you specify. You can also configure the amount of memory allocated to each queue, so that large queries run in queues with more memory than other queues. You can also configure the WLM timeout property to limit long-running queries.
SET query_group TO group_label
------------------------------
This is used to set query group to a specific group for the session; This can be used to drive WLM to assign a query to a queue based on group assignment for the session.
create group admin_group with user admin246, admin135, sec555;
Redshift database cluster can be created by specifying the number of compute nodes required in the cluster and customers are only charged for compute nodes. Leader nodes usage is completely free. A redshift database architecture is as given below.
Physical Architecture
Logical Architecture
For schema to be used when a user logs in and queries in a database search_path is used to resolved the schema. To add a schema in search_path use set command.
set search_path to '$USER',pg_temp,pg_catalog,...;
------------------
pg_temp and pg_catalog is used first by default in order and then any schema specified by search path is scanned
To set the timeout of queries in session use statement_timeout.
set statement_timeout to
----------------------
This specifies the timeout for any statement being executed on the cluster. 0 turns off the timeout.
To set the timezone of a session use timezone.
SET timezone time_zone
-----------------------
TO set the timezone of current session. Run "select pg_timezone_names();" to view list of timezones. to set the timezone for database user used
"ALTER USER
WLM: Workload management
You can use workload management (WLM) to define multiple query queues and to route queries to the appropriate queues at runtime.
When you have multiple sessions or users running queries at the same time, some queries might consume cluster resources for long periods of time and affect the performance of other queries. For example, suppose one group of users submits occasional complex, long-running queries that select and sort rows from several large tables. Another group frequently submits short queries that select only a few rows from one or two tables and run in a few seconds. In this situation, the short-running queries might have to wait in a queue for a long-running query to complete.
You can improve system performance and your users’ experience by modifying your WLM configuration to create separate queues for the long-running queries and the short-running queries. At run time, you can route queries to these queues according to user groups or query groups.
You can configure up to eight query queues and set the number of queries that can run in each of those queues concurrently. You can set up rules to route queries to particular queues based on the user running the query or labels that you specify. You can also configure the amount of memory allocated to each queue, so that large queries run in queues with more memory than other queues. You can also configure the WLM timeout property to limit long-running queries.
SET query_group TO group_label
------------------------------
This is used to set query group to a specific group for the session; This can be used to drive WLM to assign a query to a queue based on group assignment for the session.
create group admin_group with user admin246, admin135, sec555;