Thursday, July 21, 2011

Redgate schema compare

We have a system of 300+ databases which, because of the way the applications are designed, need to be identical. Our releases include deploying objects to all the databases. But once in a while we have a production requirement or issue and have to roll out a hotfix to one particular database. You know how exciting DBA jobs get once in a while, and so we tend to forget to release those few objects to all the other databases.

For that reason we wanted a database schema compare job that could be run once a week or month.

Recently we bought the Redgate developer suite and loved its ability to integrate with SSMS.
Its database schema compare was also much faster than any other tool I have used previously. Along with that, it offered source control and database compare capabilities, something we had been looking for for a while.

So, to automate the compare process, we decided to use the Redgate SQL Compare command-line features.
Here is a sample of which options were used/ignored and which switches were used to include and exclude objects:

1. echo Start Compare %date% -%time% >>C:\testlog.txt

2. cd "C:\Program Files\Red Gate\SQL Compare 9"
SqlCompare /S1:server1 /db1:baseDatabase /S2:server2 /db2:databaseToCompare /Exclude:Schema /Exclude:User /Exclude:Role /Options:Default,IgnoreComments,IgnoreConstraintNames,IgnorePermissions,IgnoreFillFactor,IgnoreWhiteSpace,IgnoreQuotedIdentifiersAndAnsiNullSettings,IgnoreWithNocheck >>C:\testlog.txt

3. echo End Compare %date% -%time% >>C:\testlog.txt

As you can see, the 1st and 3rd statements are used to get time estimates for the comparison. For us it took only about 40 minutes to go through 300+ databases, which is impressive for the kind of work this tool does in each database.

The 2nd statement is the actual command, which can either be run in the Windows command prompt or put in a .bat file. As you can see, it first changes directory to where sqlcompare.exe resides and then runs the compare across server1-database1 and server2-database2.

/Exclude is a switch that lets us exclude certain objects from the comparison, in our case users, roles and schemas. But you can also exclude down to the level of an individual table name, which is a really nice aspect.
More information on the switches can be found in Red Gate's SQL Compare command-line documentation.

/Options corresponds to the options tab you see in the SQL Compare GUI. You can use it to ignore permissions, white space, fill factors, etc. during comparisons, things that may not be that important during the first run.
More information on the options is in the same documentation.

The way we designed it, I ran SQL Compare across two identical databases, from the GUI and from the command line, using the same set of options each time, then added more options as needed. This way I could get the exact same result from the command line as from the GUI.

We wanted this as a step-by-step process, so initially we compared only procedures, functions, views and tables, fixing them across all databases.
In the second iteration we compared only tables and constraint names, because we had hundreds of system-named default constraints and renamed them so they were appropriate and identical across all databases.
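As an illustration only, here is a minimal sketch of renaming one such system-named default constraint to a consistent name. The table dbo.Orders, the OrderDate column and the target name DF_Orders_OrderDate are made-up placeholders, not objects from our system; the system-generated name differs in every database, so it is looked up first.

-- Find the system-generated name of the default constraint on dbo.Orders(OrderDate)
-- and rename it to a predictable, consistent name.
DECLARE @OldName sysname, @Sql nvarchar(500);

SELECT @OldName = dc.name
FROM sys.default_constraints dc
JOIN sys.columns c
  ON c.object_id = dc.parent_object_id
 AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID('dbo.Orders')
  AND c.name = 'OrderDate';

SET @Sql = 'EXEC sp_rename ' + QUOTENAME('dbo.' + @OldName, '''')
         + ', ''DF_Orders_OrderDate'', ''OBJECT'';';
EXEC (@Sql);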

Needless to say, before making any of these object changes across all databases, we sent them to our software developers for a quick review and released them using the regular release process.

Saturday, July 9, 2011

Tempdb on separate drive on server

Case study:
How many times have we heard that tempdb should be on a fast, cached drive? So to start with, we had put the tempdb files along with the mdf files on a fast drive, while the ldf's for the databases were on a separate drive. This worked very well until the SEs started writing queries to process data for internal use, growing the tempdb mdf, or until we had to load months of data into a database to resolve some issue, growing its mdf out of limits and choking that drive.

So we decided to shift tempdb to its own fast, cached read/write drive, T:. On average our tempdbs required only about 5 GB of space, but once in a while one shot up to 100 GB during some manual processing. What our operations team decided was to let the tempdb of each server live on that server's T: drive, with all the T: drives sharing a common pool of space of about 100 GB. The logic behind this was that when data was being processed for one database, the other databases would not necessarily be processing that amount of data as well. So even if one tempdb grew a lot, it could use up 60-80 GB and shrink back once the processing was over.
That saved a lot of space compared to keeping each tempdb separate and giving each one 100 GB. It also helped that, with tempdb on a separate drive, it no longer caused a bottleneck on the mdf drives and so did not slow down other production queries. This resulted in faster data retrieval for client applications.
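For reference, here is a minimal sketch of relocating tempdb to a T: drive, assuming a single data and log file with the default logical names tempdev and templog; the new paths only take effect after the SQL Server instance is restarted.

USE master;
GO
-- Point the tempdb data and log files at the new drive. The files are created
-- at these paths on the next instance restart; the old files can then be deleted.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\templog.ldf');
GO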

Friday, July 8, 2011

Deleting data from heap table

We started running out of space on a few of our servers. Adding more space is an easy getaway, but everyone knows the perils of throwing more and more hardware at space issues. Determined to use the available space, we decided to look for any particular tables that held a lot of data which could be deleted or archived. We found that there was a heap table used in every database, in some as big as 200 GB, which held data from when the system had started, and new data was being added to it every day. Since we had flat files of the data over many years which could be reloaded if required, after careful consideration we decided to keep up to one year of back data and delete the rest from the heap.
The data was deleted in batches since the table was continuously used and we didn't want to lock it down for a long time. But as everyone knows, the major issue with deleting data from a heap is that the data is deleted but the pages are not deallocated, unless you rebuild the table, have an index over it, or delete with a TABLOCK hint.
Creating an index, though an option, was not going to work for us, since it would take nearly as much space as the heap and we did not have that much space on the server. Rebuilding also takes a lot of space, so with only a few GB remaining even that was not an option.

So for the time being we decided to delete 500 rows at a time with the TABLOCK hint. With TABLOCK, the delete acquires a lock on the table and deallocates the pages as it deletes the data. We planned to run and monitor this so that other queries would not wait on it, and to detect any blocking early on.
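As an illustration, here is a minimal sketch of that kind of batched delete; the table name dbo.BigHeap, the LoadDate column and the one-year cutoff are assumptions, not the actual objects from our system.

-- Delete old rows from the heap 500 at a time; the TABLOCK hint lets the
-- delete deallocate the emptied pages instead of leaving them allocated.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (500) FROM dbo.BigHeap WITH (TABLOCK)
    WHERE LoadDate < DATEADD(YEAR, -1, GETDATE());

    SET @rows = @@ROWCOUNT;
END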

Once we got a lot of space back and the heap held only one year of data in all the databases, we decided to create a new table with an index, have the queries use that, and delete the heap.
Another option while doing so was to use a compressed table, so that the space required for a year's data would be much less.
We decided to drop the heap once the new table was in place and the data from the heap had been transferred to it.
Next, we put a job in place to run weekly or monthly and delete old data from the table, so that it remained manageable with only one year's data preserved.
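Purely as an illustration of that approach, here is a minimal sketch, assuming the same hypothetical dbo.BigHeap source, a LoadDate column, a made-up column list, and page compression (which requires an edition that supports it):

-- New indexed, compressed table to replace the heap.
CREATE TABLE dbo.BigData
(
    Id       bigint        NOT NULL,
    LoadDate datetime      NOT NULL,
    Payload  varchar(4000) NULL,
    CONSTRAINT PK_BigData PRIMARY KEY CLUSTERED (Id)
)
WITH (DATA_COMPRESSION = PAGE);

-- Carry over the last year of data from the heap, then drop the heap.
INSERT INTO dbo.BigData (Id, LoadDate, Payload)
SELECT Id, LoadDate, Payload
FROM dbo.BigHeap
WHERE LoadDate >= DATEADD(YEAR, -1, GETDATE());

DROP TABLE dbo.BigHeap;

-- Weekly/monthly purge step (wrapped in a SQL Agent job) to keep only one year of data.
DELETE FROM dbo.BigData
WHERE LoadDate < DATEADD(YEAR, -1, GETDATE());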

These were a few options we used to sort out our space issues, other than deleting unused indexes, creating required indexes, etc. If you know of any other options, please do share your thoughts.

Transactional Replication

The way our system is designed, we have used transactional replication for quite a while now. It sometimes gets confused with snapshot replication, since transactional replication also creates a log reader job, a snapshot job and a distributor job.

The snapshot job is required only the very first time, when the articles in the database need to be published from the publisher to the subscriber. We found that this job is no longer required after that and is best turned off at other times. We realized this when we started having blocking in the middle of the day. Investigation revealed that it was because of replication. But replication ran at other times too, and the blocking did not happen then. So we looked closely and found another job running at the same time as the snapshot creation and distribution; since the snapshot job locked the tables, the other job got blocked, causing blocking and deadlocks. We shifted the other jobs to run 5 minutes after the replication job and disabled the snapshot job since it was not necessary.

As we experimented we realized that the log reader runs continuously, reading any new changes to the published articles from the transaction log, and the distributor, which can be run either continuously or, say, every hour or once a day according to the requirement, picks up all the data and schema changes and distributes them to the other servers.
Our initial assumption that the distributor can distribute only data changes and not schema changes was wrong.
Thus we disabled the snapshot job.

The snapshot is only required when a new article (table) is added to the published set and needs to be newly created at the subscriber. Since we have streamlined releases and client downtime during releases, when we notice such a scenario (which is rare) we plan to run the snapshot job manually during the release, and hence reduce the blocking caused by the snapshot job.
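For what it's worth, here is a minimal sketch of disabling the Snapshot Agent job and kicking it off manually during a release. The job name used here is a made-up placeholder; the actual Snapshot Agent job name depends on your publisher, database and publication.

USE msdb;
GO
-- Disable the Snapshot Agent job so it no longer runs on its schedule.
EXEC dbo.sp_update_job
    @job_name = N'SERVER1-PubDatabase-MyPublication-1',  -- placeholder job name
    @enabled  = 0;
GO
-- During a release that adds a new article, start it manually once.
EXEC dbo.sp_start_job
    @job_name = N'SERVER1-PubDatabase-MyPublication-1';
GO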