Search This Blog

Thursday, January 12, 2012

How to read a data page ?

-- Through this small demo we will see how data is stored in a page and how to decode it. Create a database ExplorePage as shown below
CREATE DATABASE ExplorePage
GO
USE ExplorePage
GO

-- Create a table that has atleast one fixed length and one variable length column

CREATE TABLE Page
(
IntType Int NOT NULL PRIMARY KEY
,CharType Char(2)
,VarcharType Varchar(5)
)
GO

-- Insert 2 rows in this table as shown below

INSERT INTO Page VALUES (1, 'AB', 'TEXT');
INSERT INTO Page VALUES (2, NULL, 'MORE');
GO

-- As shown below 2 rows have been inserted into the table



-- Let us see the pages that has been allocated to this table. As can be seen in the snapshot below page 150 is an IAM page (PageType = 10) and page 147 is -- data page (PageType = 1)

DBCC IND (ExplorePage, Page, 1)



-- Using DBCC PAGE command let's read this page as show below. For this example we will not focus on the header as it is already explained in my previous -- blogs. Let us focus on the data section

DBCC TRACEON (3604)
DBCC PAGE (ExplorePage, 1, 147, 3)






Slot 0 Offset 0x60 Length 21

Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 21
Memory Dump @0x000000000E20A060


0000000000000000: 30000a00 01000000 41420300 00010015 †0.......AB......
0000000000000010: 00544558 54††††††††††††††††††††††††††.TEXT


Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

IntType = 1

Slot 0 Column 2 Offset 0x8 Length 2 Length (physical) 2

CharType = AB

Slot 0 Column 3 Offset 0x11 Length 4 Length (physical) 4

VarcharType = TEXT

Slot 0 Offset 0x0 Length 0 Length (physical) 0

KeyHashValue = (8194443284a0)
Slot 1 Offset 0x75 Length 21


Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 21
Memory Dump @0x000000000E20A075


0000000000000000: 30000a00 02000000 00000300 02010015 †0...............
0000000000000010: 004d4f52 45††††††††††††††††††††††††††.MORE


Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4

IntType = 2

Slot 1 Column 2 Offset 0x0 Length 0 Length (physical) 0

CharType = [NULL]

Slot 1 Column 3 Offset 0x11 Length 4 Length (physical) 4

VarcharType = MORE

Slot 1 Offset 0x0 Length 0 Length (physical) 0

KeyHashValue = (61a06abd401c)







-- From this extract let's read the data (hex dump) for the first record, which is something as shown below. All the explaination below uses this dump unless -- specified otherwise

Record 1 hex dump :___________________
___________________
30000a00 01000000 41420300 00010015 †0.......AB......
00544558 54††††††††††††††††††††††††††.TEXT

Byte 0 expands as shown below
Byte 0 = 30 = 0011 0000 (binary)
-- First 3 bits here can have the follwing possible value
000 - Primary record
001 - Forwared record (when the data becomes large enough to fit into its original page
010 - Forwarding record
011 - Index record
100 - Blob fragment
101 - Ghost index record (deleted row)
110 - Ghost data record (deleted row)
111 - Ghost version record

In our example the first 3 bits are 000, thus it is a PRIMARY RECORD

Byte 1 = can either be 0x00 or 0x01. If it is 0x01, that means the record type is ghost forwarded record

Byte 2 & 3 = 0a 00 = a0 00 (Reversing bytes to read) = 000a (Complete Reverse (hex)) = 10 (decimal). It says that the NULL BITMAP is at offset 10 from the byte 2. In this case the value is 00 (first 2 bits in 00010015). The value 00 signifies none of the column has a NULL value as expanded below

00 = 0000 0000 = Table has 3 columns and since first 3 bits are 0 which means all columns have value.

Byte 3 to 7 = 01 00 00 00 = 10 00 00 00 (Reversing bytes to read) = 00 00 00 01 (Complete Reverse (hex)) = 1 (decimal). This is the value of the first column for the first row as inserted by us. Since int takes 4 bytes, byte 3-7 are used by it.

Byte 8 & 9 = 41(HEX) 42(HEX) = 65(DEC) 66(DEC) = A (CHAR(65)) B (CHAR(66)) = AB. As can be seen this is the second column AB. Since char data is fixed width is lies along side int and takes 2 bytes (1 byte per character).

Byte 10 & 11 = 03 00 = 30 00 (Reversing bytes to read) = 00 03 (Complete Reverse (hex)) = 3 (decimal). This shows the number of columns in this record. Since there are 3 columns, so is the value 3.

Byte 12 = 00 (we have already discussed as the NULL BITMAP (offset 10 from Byte 2)

Byte 13 & 14 = 01 00 = 10 00 (Reversing bytes to read) = 00 01 (Complete Reverse (hex)) = 1 (decimal). This is the count of variable length columns in the record. In this case there is only one as VarcharType

Byte 15 & 16 = 15 00 = 51 00 (Reversing bytes to read) = 00 15 (Complete Reverse (hex)) = 21 (decimal). It is the variable length column offset array. This says that the variable length column ends at offset 21. Notice that the start offset of the variable length column value isn't stored - it doesn't need to be
because by definition it must begin right after the last offset in the variable length column offset array.In this case the variable length column offset array ebds at Byte 16, so Byte 17 to 20 is 54455854 as expanded below

54(HEX) 45(HEX) 58(HEX) 54(HEX) = 84(DEC) 69(DEC) 88(DEC) 84(DEC) = T (CHAR(84)) E (CHAR(69)) X (CHAR(88)) T (CHAR(84)) = TEXT

Record 2 hex dump :___________________
___________________
0000000000000000: 30000a00 02000000 00000300 02010015 †0...............
0000000000000010: 004d4f52 45††††††††††††††††††††††††††.MORE

Byte 0 expands as shown below
Byte 0 = 30 = 0011 0000 (binary)
-- First 3 bits here can have the follwing possible value
000 - Primary record
001 - Forwared record (when the data becomes large enough to fit into its original page
010 - Forwarding record
011 - Index record
100 - Blob fragment
101 - Ghost index record (deleted row)
110 - Ghost data record (deleted row)
111 - Ghost version record

In our example the first 3 bits are 000, thus it is a PRIMARY RECORD

Byte 1 = can either be 0x00 or 0x01. If it is 0x01, that means the record type is ghost forwarded record

Byte 2 & 3 = 0a 00 = a0 00 (Reversing bytes to read) = 000a (Complete Reverse (hex)) = 10 (decimal). It says that the NULL BITMAP is at offset 10 from the byte 2. In this case the value is 02 (first 2 bits in 02010015). The value 02 signifies second column has a NULL value as expanded below

02 = 0000 0010 = since second bit from left is 1 which means column has a null value (indeed true!!).

Byte 3 to 7 = 02 00 00 00 = 20 00 00 00 (Reversing bytes to read) = 00 00 00 02 (Complete Reverse (hex)) = 2 (decimal). This is the value of the first column for the second row as inserted by us. Since int takes 4 bytes, byte 3-7 are used by it.

Byte 8 & 9 = 00(HEX) 00(HEX) = 00(DEC) 00(DEC) = Blank (CHAR(00)) Blank (CHAR(00)) = Blank. As can be seen this is the second column AB. Since char data is fixed width is lies along side int and takes 2 bytes (1 byte per character).

Byte 10 & 11 = 03 00 = 30 00 (Reversing bytes to read) = 00 03 (Complete Reverse (hex)) = 3 (decimal). This shows the number of columns in this record. Since there are 3 columns, so is the value 3.

Byte 12 = 02 (we have already discussed as the NULL BITMAP (offset 10 from Byte 2).

Byte 13 & 14 = 01 00 = 10 00 (Reversing bytes to read) = 00 01 (Complete Reverse (hex)) = 1 (decimal). This is the count of variable length columns in the record. In this case there is only one as VarcharType

Byte 15 & 16 = 15 00 = 51 00 (Reversing bytes to read) = 00 15 (Complete Reverse (hex)) = 21 (decimal). It is the variable length column offset array. This says that the variable length column ends at offset 21. Notice that the start offset of the variable length column value isn't stored - it doesn't need to be
because by definition it must begin right after the last offset in the variable length column offset array. In this case the variable length column offset array ebds at Byte 16, so Byte 17 to 20 is 4d4f5245 as expanded below

4d(HEX) 4f(HEX) 52(HEX) 45(HEX) = 77(DEC) 79(DEC) 82(DEC) 69(DEC) = M (CHAR(84)) O (CHAR(69)) R (CHAR(88)) E (CHAR(84)) = MORE

It's awesome to be ables to decode it!!!

Partition based index rebuild with data compression

While doing compression on a partitioned table, be careful of rebuilding multiple index partitions in a single go. Lets explain with an example

USEmaster

GO

-- Create a database called partition

CREATE DATABASE Partition;

USE Partition

GO

-- Create a table test

CREATETABLE Test

(

c1 int identity NOT NULL

,c2 int NOT NULL

,c3 date NOT NULL

,c4 char(7000)

);



Create a partition function with 3 partitions

  • partition 1 : all data less than today
  • partition 2 : all data for today
  • partition 3 : all data for tomorrow and afterwards

CREATE PARTITION FUNCTION pf_Test(date) AS RANGE RIGHT FOR VALUES (GETDATE(),GETDATE()+1)

-- Create the partition scheme with all partitions mapped to PRIMARY file group

CREATE PARTITION SCHEME ps_Test AS PARTITION pf_Test ALL TO ([PRIMARY]);

-- Create the clustered index on column c3. This will be the partitioning column

CREATE CLUSTERED INDEX idx_c3_Test ON Test(c3) ON ps_Test(c3);

-- Insert data in all 3 partitions

INSERT INTO Test VALUES (1,GETDATE()- 1,'a')

INSERT INTO Test VALUES (1,GETDATE(),'b')

INSERT INTO Test VALUES (1,GETDATE()+ 1,'c')





At this stage I want to take a snapshot of the pages allocated for this table.

  • First partition has one data page = 110
  • Second partition has one data page = 115
  • Third partition has one data page = 119

DBCC IND(Partition, Test, 1);



-- Now using the below command I will rebuild only partition 2,3

ALTER TABLE dbo.Test

REBUILD PARTITION=ALL

WITH (DATA_COMPRESSION = PAGE ON PARTITIONS(2,3))

What I see when I fire DBCC IND is that as a result of rebuild the data pages have changed. Starngest thing to note is that even though we didn't specify partition 1 to be rebuilt, it has been and thats why the page allocation for partition 1 has also changed as mentioned below.

  • First partition has one data page = 121
  • Second partition has one data page = 127
  • Third partition has one data page = 151

DBCC IND(Partition, Test, 1);



Thus it has actually done a complete index rebuild (all partitions), which could be very expensive if done on a large table. So what is that I should have done. You guessed it right, I should rebuild the index 1 partition at a time

ALTER TABLE dbo.Test

REBUILD PARTITION= 2

WITH (DATA_COMPRESSION = PAGE ON PARTITIONS(2))

ALTER TABLE dbo.Test

REBUILD PARTITION= 3

WITH (DATA_COMPRESSION = PAGE ON PARTITIONS(3))

A DBCC IND would show that this time page allocation changed for only partition 2 &3, the index rebuild happened only on partition 2 and 3 as in the snapshot below.

DBCC IND(Partition, Test, 1);

Clustered table data pages are connected but heap's are not

As BOL says and depicts : In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. The pages in each level of the index are linked in a doubly-linked list.

Let’s verify this using a small example


-- Create a database Struct.

USE Master

GO

IF EXISTS (SELECT 1 FROM sys.databases WHERE name = 'struct')

DROP DATABASE struct;

GO

CREATE DATABASE struct;

GO

USE Struct


-- Create a clustered table, such that each row in the table occupies 1 page

CREATE TABLE Clust (col1 int identity primary key, col2 char(8000))


-- Insert 10 rows in this table

INSERT INTO Clust VALUES (REPLICATE('a',1))

GO 10


Using DBCC IND let's see the pages and identify the tree structure. As can be seen the root page of the B*tree is page 90. I derived this by looking into the IAM page 89 (page type 10). You can used DBCC PAGE (Struct, 1, 89, 1) to get the same information

DBCC IND (Struct, Clust, 1)

Lets see how the pages are connected. We will start with the first data page (pagetype = 1) 80. Using the DBCC PAGE command we can get this infomation. As shown below previous page (m_prevPage) is 0, whereas next page (m_nextPage) is 93

DBCC TRACEON (3604)

DBCC PAGE (Struct, 1, 80, 0)

/* m_prevPage = (0:0) m_nextPage = (1:93) */

Now let's read page 93. previous page is 80 and next is 94 and similarly the doubly linked list goes on. Check yourself

DBCC TRACEON (3604)

DBCC PAGE (Struct, 1, 93, 0)

/* m_prevPage = (1:80) m_nextPage = (1:94) */

Let's see how a heap differs from this. BOL says and depicts a heap is a table without a clustered index. The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages.



-- Create a heap table and populate 10 rows into it

CREATE TABLE Heap (col1 int identity, col2 char(8000));

GO

INSERT INTO Heap VALUES (REPLICATE('a',10))

GO 10


Using the DBCC IND command to check the number of pages allocated. This table is not a B*Tree and thats why there is no index page (pagetype = 2) as for table Clust created previously. Lets see if the data pages (pagetype = 1) are connected to each other. The first page is 118

DBCC IND (Struct, Heap, 1)


Using DBCC PAGE we will extract the header from page 118 as shown below. This is not the complete header but only the information that we are interested in. Clearly the previous and the next page links are 0 as also shown pictorially in the diagram above from BOL


DBCC TRACEON (3604)

DBCC PAGE (Struct, 1, 118, 0)

/* m_prevPage = (0:0) m_nextPage = (0:0) */

Handy query to find allocation units for a table.

Find below a handy query to find allocation units

-- Create table Test
CREATE TABLE Test (col1 int identity, col2 varchar(7000), col3 varchar(2000), col4 nvarchar(max))

-- Insert a single row in this table to ensure the page size exceeds 8060 bytesINSERT INTO Test VALUES (REPLICATE('a',7000),REPLICATE('b',2000),REPLICATE('c',8070));

-- Fire the query and see the result
SELECT o.name
,i.index_id
,i.name
,a.type_desc
,a.data_pages
,p.partition_number
,a.type
,a.total_pages
FROM sys.allocation_units a
INNER JOIN
sys.partitions p
ON a.container_id = p.partition_id
INNER JOIN
sys.objects o
ON o.object_id = p.object_id
INNER JOIN
sys.indexes i
ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE o.name = 'Test'
ORDER BY o.name, i.index_id

How does a non clustered index looks like ?

Through this blog I am going to show you how a non clustered index looks like in a heap table as well as a cluster table.

USEMaster

GO

-- Create a database NLCStruct

CREATE DATABASE NLCStruct;

USE NLCStruct

GO

Create a heap table such a way that each page has 2 rows. The col1 is identity and will start from 1 and increment by 1

CREATE TABLE Heap(col1 int identity, col2 varchar(4000));

Now that the heap is created, let's insert few rows in this table and see the how the pages allocated looks like

INSERT INTO Heap VALUES (REPLICATE('a',4000))

GO 8

-- So the rows in the pages are distributed like this

  • Page 80 = 50 (hex) = 2 rows (col1 = 1 and col1 = 2)
  • Page 90 = 5A (hex) = 2 rows (col1 = 3 and col1 = 4)
  • Page 93 = 5D (hex) = 2 rows (col1 = 5 and col1 = 6)
  • Page 94 = 5E (hex) = 2 rows (col1 = 7 and col1 = 8)

DBCC IND(NLCStruct, Heap, 1)





We now create a non clustered index on this table and see the page allocated to that index using DBCC IND. The page is 109

CREATE NONCLUSTERED INDEX idx_Heap ON Heap(col1);

DBCC IND(NLCStruct, Heap, 2)



Now lets see what does the index page 109 holds for us. This can be explored using the DBCC PAGE command as shown below

DBCC TRACEON(3604)

DBCC PAGE(NLCStruct, 1, 109, 3)





Lets explore this output from DBCC PAGE. We will focus only on the column HEAP RID (key)

  • Row 1 => 0x5000000001000000 => 0x50000000 is page number in hex (80 decimal) => 0100 = 1000 (byte reversed) = 0001 (read reverse) - is file id 01 => 0000 = 0000 (byte reversed) = 0000 (read reverse) - is row number 0
  • Row 2 => 0x5000000001000100 => 0x50000000 is page number in hex (80 decimal) => 0100 = 1000 (byte reversed) = 0001 (read reverse) - is file id 01 => 0100 = 1000 (byte reversed) = 0001 (read reverse) - is row number 1
  • Row 3 => 0x5A00000001000000 => 0x5A000000 is page number in hex (90 decimal) => 0100 = 1000 (byte reversed) = 0001 (read reverse) - is file id 01 => 0000 = 0000 (byte reversed) = 0000 (read reverse) - is row number 0
  • Row 4 => 0x5A00000001000100 => 0x5A000000 is page number in hex (90 decimal) => 0100 = 1000 (byte reversed) = 0001 (read reverse) - is file id 01 => 0100 = 1000 (byte reversed) = 0001 (read reverse) - is row number 1

Similarly you can decode the rest of the data. This is what the BOL also says "The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following". "If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID)."



Lets explain the same thing for a clustered index by creating a cluster table such a way that each page has 2 rows. The col1 is identity and will start from 1 and increment by 1. Also col1 has a primary key and since there is no other index the primary key will be the clustered index also.

CREATE TABLE Cluster(col1 int identity primary key, col2 varchar(4000));

Now that the cluster is created, let's insert few rows in this table and see the how the pages allocated look like. There will be 8 rows in the table with col1 = 1 to col1 = 8

INSERT INTO Cluster VALUES (REPLICATE('a',4000))

GO 8

-- So the rows in the data pages are distributed like this

  • Page 120 = 78 (hex) = 2 rows (col1 = 1 and col1 = 2)
  • Page 127 = 7F (hex) = 2 rows (col1 = 3 and col1 = 4)
  • Page 148 = 94 (hex) = 2 rows (col1 = 5 and col1 = 6)
  • Page 151 = 97 (hex) = 2 rows (col1 = 7 and col1 = 8)

DBCC IND(NLCStruct, Cluster, 1)



We now create a non clustered index on this table and see the page allocated to that index using DBCC IND. The page is 173

CREATE NONCLUSTERED INDEX idx_Cluster ON Cluster(col1);

DBCC IND(NLCStruct, Cluster, 2)



Now lets see what does the index page 109 holds for us. This can be explored using the DBCC PAGE command as shown below

DBCC TRACEON(3604)

DBCC PAGE(NLCStruct, 1, 173, 3)





I don't think there is much to explain here. We can clearly see that each row in the non clustered index leaf node has the corresponding cluster key against it. One thing I would like to point out is that in this case we create a unique clustered index, however if there was a non unique clustered index created then BOL says "If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index."

How table scan happens in heap ?

Though this blog I want to explain how table scan happens in case of a heap table

USE master

GO

-- Create database HeapAlloc

CREATE DATABASE HeapAlloc;

Use HeapAlloc

GO

Create a clustered table. We will populate this table with 100 rows each row occupying a page. Also as col1 is an identity column it will assign each row a unique value starting from 1. Thus each page has 1 row with col1 haveing values between 1 - 100 for each of the 100 pages that will be allocated. We will then take a snapshot of the pages allocated to this table

CREATE TABLE Cluster(col1 int identity primary key, col2 char(8000))

-- populate this table to allocate 102 pages = 100 data pages + 1 IAM + 1 Index page

INSERT INTO Cluster VALUES ('a')

GO 100

Lets check the pages allocated to this table. As is visible the minimum page allocated is 80 and max is 276

DBCC IND(HeapAlloc, Cluster, 1)








Now we create a heap table to which we will allocate few rows (at least 1 uniform extent)

CREATE TABLE Heap(col1 int identity, col2 char(8000))

Populate this table to allocate 16 pages = 8 in mixed extent and rest 8 in uniform extent. This also means that col1 will have values from 1 - 16 for the 16 allocated pages

INSERT INTO Heap VALUES ('b')

GO 16

Lets check the pages allocated to this table. As is visible the minimum page allocated is 118 and max is 287

DBCC IND(HeapAlloc, Heap, 1)



-- Now we drop the cluster table and release all its pages

DROP TABLE Cluster;

At this point we insert another 1 row (col1 = 17) in the Heap table and see where the page allocation happens using the DBCC IND command

INSERT INTO Heap VALUES ('b')

DBCC IND(HeapAlloc, Heap, 1)



Clearly we see that page 184 has been allocated for this new row (17) which was not there previously. When I fire a select, this is how I expect the output to come 1 2 3 4 5 6 7 8 17 9 10 11 12 13 14 15 16.

SELECT * FROM Heap



Indeed it is. This is so because BOL clearly says this "Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap. Because the IAM represents extents in the same order that they exist in the data files, this means that serial heap scans progress sequentially through each file. Using the IAM pages to set the scan sequence also means that rows from the heap are not typically returned in the order in which they were inserted."

Required indexes during a partitioning operation switch in and switch out ?

Through this example let's see the indexes that are required at the source and the target during the switch out and switch in operation of partitioning.

-- Create a database called partition

CREATE DATABASE Partition;

USE Partition

GO

-- Create a table test

CREATE TABLE Test

(

c1 int identityNOTNULL

,c2 intNOTNULL

,c3 dateNOTNULL

,c4 char(7000)

);



Create the partition function with 2 boundary values. 2 boundary values means 3 partitions

  • 1st partition = all values less than GETDATE() (excluding time)
  • 2nd partition = all values >= GETDATE() (excluding time) and < GETDATE() + 1 (excluding time)
  • 3rd partition = all values >= GETDATE() + 1 (excluding time)



CREATE PARTITION FUNCTION pf_Test(date) AS RANGE RIGHT FOR VALUES (GETDATE(),GETDATE()+1)

-- Create the partition scheme with all partitions mapped to PRIMARY file group

CREATE PARTITION SCHEME ps_Test AS PARTITION pf_Test ALL TO ([PRIMARY]);

-- Create the clustered index on column c3. This will be the partitioning column

CREATE CLUSTERED INDEX idx_c3_Test ON Test(c3) ON ps_Test(c3);

-- Create the primary key on the column c1.

ALTER TABLE Test ADD CONSTRAINT pk_Test PRIMARY KEY (c1,c3) ON ps_Test(c3);

-- Create a non clustered index on column c2

CREATE NONCLUSTERED INDEX idx_c2_Test ON Test(c2) ON ps_Test(c3);

-- Insert data into 2nd and 3rd partition

DECLARE @i int= 1;

WHILE (@i <= 10)

BEGIN

INSERT INTO Test VALUES (@i,GETDATE(),'a');

SET @i = @i + 1;

END



Lets see what sys.partitions table has for us. As can be seen clearly we have 10 rows inserted into the second partition

SELECT * FROM sys.partitions(nolock) WHEREobject_id=OBJECT_ID('Test') AND index_id = 1;



-- Now we will insert another 10 rows in the 3rd partition. Note that I have increased the date by 1

DECLARE @j int= 1;

WHILE (@j <= 10)

BEGIN

INSERT INTO Test VALUES (@j,GETDATE()+ 1,'b');

SET @j = @j + 1;

END

-- sys.partitions shows us. Last partition now has 10 rows

SELECT * FROM sys.partitions(nolock) WHEREobject_id=OBJECT_ID('Test') AND index_id = 1;





-- I will now switch out the data from partition 2 to a temp table. In order to do that I need to create a temp table as follows

SELECT * INTO Temp_Test FROM Test(nolock) WHERE 1 = 0;

-- Now that the temp table is created I need to have the clustered index on this table same as in Test so that the switch out of data can happen.

CREATE CLUSTERED INDEX idx_Temp_Test ON Temp_Test(c3);

-- I now switch out partition 2. For the swich out no primary key or non clustered index was required. Only thing that was required was the clustered index

ALTER TABLE Test SWITCH PARTITION 2 TO Temp_Test;

-- Lets check the data in the Temp_Test table

SELECT * FROM Temp_Test;



-- Lets see how sys.partitions look for Test table. You can see after switch out the data count in partition 2 has gone back to 0

SELECT * FROM sys.partitions(nolock) WHERE object_id=OBJECT_ID('Test')AND index_id = 1;



-- I will now try to switch in this data in the Temp_Test table into another table Test_Arch. Let's create this Test_Arch table

CREATE TABLE Test_Arch

(

c1 intidentityNOTNULL

,c2 intNOTNULL

,c3 dateNOTNULL

,c4 char(7000)

);

-- Since even the archived table is partitioned. I will create the partitioning function partitioning scheme for this table. I will use some old dates for the boundary values

CREATE PARTITION FUNCTION pf_Test_Arch(date)AS RANGE RIGHT FOR VALUES (GETDATE(),GETDATE()+ 1)

CREATE PARTITION SCHEME ps_Test_Arch AS PARTITION pf_Test_Arch ALL TO ([PRIMARY]);



CREATE CLUSTERED INDEX idx_Test_Arch ON Test_Arch(c3) ON ps_Test_Arch(c3);

CREATE NONCLUSTERED INDEX idx_c2_Test_Arch ON Test_Arch(c2) ON [PRIMARY];



Now that we have created the archived table, the only thing that we need before we switch in the data is the constraint on the temp table on partitioning column. I have used the hard coded dates, but you can choose the date when you execute this



ALTER TABLE Temp_Test ADD CONSTRAINT ck_Temp_Test CHECK (c3 >='2010-03-17'AND c3 <'2010-03-18');

-- Finally we are ready to switch in the data. What we want to check is, do we need to disable the non aligned indexes in the Temp_Test before we switch in or it works fine

ALTER TABLE Temp_Test SWITCH PARTITION 1 TO Test_Arch PARTITION 2;

-- No it doesn't work. I get the error that the index in Test_Arch is not partition aligned. This means I will have to align the index idx_c2_Test_Arch to the partition scheme

'ALTER TABLE SWITCH'statement failed. The table'Partition.dbo.Test_Arch'is partitioned whileindex'idx_c2_Test_Arch'isnot partitioned.

-- I drop the index and recreate it aligned to the partitioning scheme. Will I be able to switch in data now

DROP INDEX Test_Arch.idx_c2_Test_Arch;

CREATE NONCLUSTERED INDEX idx_c2_Test_Arch ON Test_Arch(c2) ON ps_Test_Arch(c3);

-- Lets fire the SWITCH IN statement again. I get another error, this time it says that not all the indexes available in the Test_Arch table, are present in the Temp_Test table. Indeed index idx_c2_Test_Arch is not present in Temp_Test

ALTER TABLE Temp_Test SWITCH PARTITION 1 TO Test_Arch PARTITION 2;



-- I will create this index now

CREATE INDEX idx_c2_Temp_Test ON Temp_Test(c2)

-- At this point I also want to see the page allocation for table Temp_Test for both the clustered and the non clustered index as shown below

DBCC IND(Partition, Temp_Test, 1);-- clsutered index

DBCC IND(Partition, Temp_Test, 2);-- non clsutered index









-- Finally I will switch in the data now..

ALTER TABLE Temp_Test SWITCH PARTITION 1 TO Test_Arch PARTITION 2;



Switching happened successfully as can be seen below. First one is the Temp_Test and second Test_Arch.



SELECT * FROM Temp_Test;

SELECT * FROM Test_Arch;



Lets see the page allocation for Test_Arch now. As can be seen below. The pages that were of Temp_Test have now been allocated to Test_Arch (Look at object_id). This shows that during the switch out and switch in operation it is the metadata change only that happens and that is also the reason why switch should happen in the same filegroup

DBCC IND(Partition, Test_Arch, 1);-- clsutered index

DBCC IND(Partition, Test_Arch, 2);-- non clsutered index








Points to note here

  • Just like while switching out we need to disable all the nonclustered index not aligned to the
    partitioning scheme, simiarly while switching in we need to disable an non partition aligned indexes in the target table.
  • Also while switching in, the temp table (source) should have the same indexes as the target table unless disabled at the target.
  • There are lots of other constraints that should be taken care of apart from these e.g. switch operation has to happen within the same filegroup or for xml column any xml schema collection should be the same in the source and the target. I haven't spoken about them here as I wanted to focus more on this aspect. Will talk about it in future blogs