Video streamVideo streamVideo stream |
 |
 |
|
|
|
Question indexQuestion indexQuestion index |
 |
 |
|
|
|
Interview transcriptInterview transcriptInterview transcript | Discuss this interviewDiscuss this interviewDiscuss this interview |
|---|
|
Hi Paul. Thank you for having me. My name is Kimberly Tripp. I've been working with SQL Server for about, I am always embarrassed to say, 15 years, but I started when I was 12, so it's been a long time. For the last 10 years, I have been doing training and consulting and speaking at conferences. I write for SQL Server magazine. I write white papers for the SQL team a lot and I do some consulting, but I have been filling a lot of my schedule with writing and speaking at conferences. So I have not done as much consulting, but I do training and I've learned so much from the students that I have had. It is just amazing some of the real world scenarios that you've got from students and training and prior to that that's been for the last 10 years. Before that, I used to work for Microsoft. I was on the SQL Server team and I did user education actually. The books online was my last nine months, but I did a lot of testing with the SQL team before that. I did two sessions today on indexes and I even made a joke today when I started my session that it seems like how can I possibly talk for two hours about nothing but indexes and amazingly in one of the workshops that I do which is a four-day course, we spend about two days on nothing but indexes. I jokingly said we're really just going to hit the tip of the iceberg today even though we are spending two hours on indexes. I did one session on indexing best practices on SQL 2000 and then my next session What are the new features in SQL 2005 related to indexing and then tomorrow I am doing a post conference for VSLive, which is all about server side best practices and performance. We're looking at logging, recovery, stored procedure optimization. You've seen me do some of that and some of the other events. So stored procedure optimization, locking, blocking, basically how to make server side code a lot more efficient and optimal. It is a dev event, so I focused more on the dev side and what developers should look at and evaluate in the database, not necessarily the coding. The coding is more on the database side objects, so today was indexes. In terms of indexes, I think the most important thing that people can do is learn what the base structures look like and that means determine whether or not your table should have a clustered index or not. In many cases, the clustered index being put on an identity column can be a really good choice for performance because it minimizes fragmentation and all of the secondary uses of nonclustered indexes can be a lot more efficient when the clustered index is static, narrow, and unique. If I just say start with the base structure that's you're clustered index, probably choose a clustered index on an identity column and then help your performance through secondary indexes, which is nonclustered, kind of my best practices. If I just tell in a nutshell what I would do is just define the clustering key, define unique keys because you need those for data integrity, manually index your foreign keys and then capture a workload and use ITW. The good news I said this today since going through an hour of best practices on 2000 is really hard to get through everything and the whys. I can kind of say what you need to do but not necessarily convince you with all the whys, but I did two MSDN webcast that are free to download online and one is on finding the right balance between having two many indexes and too few indexes and I also talk about how to use Index Tuning Wizard and how to use profilers to effectively get information out of Index Tuning Wizard. That is a great way to start and then index maintenance is the next best practice. In addition to creating right indexes and having the right structures, maintaining the structures is really important and really critical. My second webcast on MSDN is an hour and a half for nothing but index defrag and they are both free and they are all on MSDN and I do have links from them off my web site. Maybe we can put something on TheServerSide.NET so that they can get to them easier. It is all free.
That is a great question. I'd have to start with where and what are we looking at in terms of problems, but in terms of stored procedures, just to start there. A lot of issues that I see is that people have parameters and are not necessarily handling recompilation issues very well. They're either recompiling too much or not enough and it seems like the way that the store procedures are written is really important. In 2000, the thing that I recommend today is to break down some of your larger more complex store procedures into smaller more manageable chunks and it's more likely you'll get better plan. That is not entirely true, you have to do a little bit more than that, you may want to actually force recompilation at certain sub procedures, but simply speaking you might be able to get better performance by smaller procedures and then in SQL 2005 just to compare and contrast that, they have statement-based recompilation and you can even do some interesting things in the way that they optimize the statements. There's actually an optimized clause where you can actually say optimize this particular statement as if it had a value of X and you can put in what you think the most common value is going to be so that the plan that is generated is for the common values. So if somebody sends in an uncommon value, you won't a bad plan saved. Stored procedures is a big area. Probably the other area that is the common is indexes and maintenance; improperly maintained. I went to a customer once and I said, "My you have a lot of indexes" and they had a ton of indexes and I said, "It seems like you know you've quite a few." "Oh we have an indexing strategy" they responded and I said "Oh what's your strategy, I am always interested in strategies." He said, "We just automatically index every column" and I said "No don't do that." Generally speaking, SQL Server does better with fewer wider indexes as opposed to a whole bunch of narrow indexes. A little bit of strategy goes a long way and tools like ITW can actually really help you if you do not know how to create the indexes manually. Creating indexes, managing indexes, stored procedure recompilation, disaster recovery preparedness in terms of being able to recover and minimizing data loss. Oh a really common one is transaction log handling. A lot of people will be like "What's this log thing? It's filling, its getting really large, what happened my log is 54 gig" and they learn about the ability to turn off logging to set something called the simple recovery model, but the problem is that minimizes certain capabilities in terms of recovery and you're more prone to data loss. It depends on I guess what I am focusing on. I can probably take 18 different areas of SQL Server and say, "Well if they're talking about this is the problem I see," but I think overall understanding of how SQL Server logs and recovers and deals with transactions is a really common misconception and especially locking and isolation and recovery, very very common. That is actually my workshop for tomorrow. I'm spending the whole day on logging, recovery, stored procedure optimizations and just how do you get a better holistic approach, really if your database is designed and structured you can recover it without data loss. It's a great question actually and I am a big fan of stored procedures, especially because I can control that. You can really look at a variety of different ways in which data and objects and queries are cached, that's kind of the first thing to look at. When you send a statement ad hoc to SQL Server what does SQL Server do and SQL Server evaluates that statement and determines whether or not it is safe. There is this whole process where SQL Server will look at the statement and try to figure out if it is safe enough to parameterize and save in cache so that subsequent users can benefit from a precompiled statement. The other option is to do forced statement caching, which is often done through sp_executesql, which is a parameterized statement where when its sent to the server, SQL Server automatically chooses its plan and caches it and subsequent executions use that same plan and then there are stored procedures. When you execute a stored procedure, the first execution generates this plan and subsequent users get that same plan. With sp_executesql and stored procedures, they both re-use plans and they both have the same negative, which is if the plan that's chosen on the first execution isn't good, then subsequent users suffer through that bad plan as well. So when it comes to which one would I use, they both have the same pros and cons and often I can control better the performance on the server side with stored procedures where I can take a statement that is kind of an offensive statements in terms of performance and I can extract it into a sub procedure, I can force even that statement to be recompiled for having this sub procedure that I call. You get a lot more control out of stored procedures than I do with sp_executesql. While that can be really nice to do forced statement caching and its simple, I tend to prefer stored procedures. You get the centralization of re-use out of it and I like the control that you get out of stored procedures. I don't think that everything needs to be in the stored procedures, I don't think that everything needs to be on the server. There have been very heated debates on where code should live, should it be in the client, should it be in the middle tier, should it be on the server, and I don't think there is one answer to any of those questions. I think different code needs to be in different places for different reasons, but when it is doing data manipulation, when it is on the server, I am more fond of stored procedures, they are closer to the data, they can be centralized, you can secure them, you can give access to the stored procedures without access to the underlying data and I just think they're probably one of the best ways to go from performance and usability. That's a great question and I think probably the best way to minimize destroying performance is getting a little bit more of a feel for how things work on the server side and doing maybe more analysis when writing different stored procedures and working with DBAs. I have these series of talks that I am doing these days called Development and DBA bridging the gap and kind of what is the gap between development and DBA and what I see often, especially today in SQL 2000 and I think you mentioned this that developers tend to, on the server side, do nothing but store procedures, maybe triggers, but stored procedures, functions, views as an interface into the data. They don't really look at anything else in terms of the database or the server or in terms of general design in a lot of cases and if development and DBA could work together earlier on in the project and kind of development telling DBA what their thoughts and DBA saying well, "Okay if you do, its XYZ" and kind of working together to build the project and doing testing sooner, I think there can be a lot more, I don't know. I was chatting with a few people last week and this reminds me where projects were completely and totally developed without operation staff involved and sure enough they designed something that operation would not support and in fact a year and a half of development time was completely and totally thrown away because literally they absolutely would not support it. I guess I look at stories like that and I kind of think if development DBA operation, if they were at least stake holders in some of the design meetings and development meetings kind of watching and giving some insight even if it is just a separate pair of eyes that has a different focus, someone else will notice things that you would not necessarily notice and they could later sabotage your application if you did not know them. So I just think working together and a little bit better teamwork, that is probably one of the biggest problems is dev and DBA not really knowing what the left hand is doing, when the right hand is doing something, and then not working together. So I don't know if that is really the number one thing, but I would say that is pretty high up there.
I could make a joke that nobody knows best practices yet. We haven't seen anything in production yet, but I've certainly been working with a huge number of the features and I have been looking at, in dev, some features more than others and most of them are availability related and just to kind of make sure that makes sense. A lot of people when they hear availability, they immediately say, "Oh clustering," but in fact I haven't been doing much with clustering in Yukon at all. I've been looking mostly at some of the new features, database mirroring, database snapshots, piece meal back up restore, partitioning and snapshot isolation, to be honest, is also very focused on availability. When I say availability I don't think in terms of clustering, I think in terms of how can a user get whatever it is that they need to get done effectively all the time. How do I keep that database, that server, that table even more available. In terms of features SQL Server 2005 is actually amazing in terms of availability. I will give you one example. I mentioned index maintenance is really important to keep a database performance. Well, to do that in 2000, your tables are taken off line, whereas in SQL 2005, certain types of tables, and most to be honest, can be rebuilt in an online fashion, keeping the table available even while you are doing important maintenance operations. That is even at the table level, but we can go down to another level, the row level. You can have users making modifications to rows and those rows are locked in 2000 and readers have to wait. Well if you choose to implement snapshot isolation, you can end up accessing rows that are currently being modified by using a row version, a version of the row before the modification occurred in the form of something called snapshot isolation. So you get better data availability with snapshot isolation, you get better table availability with online index operation, then you can start going to things like partitioning and partitioning can give you better availability to large tables if you have to do lets say arrange the reads. If you have the archive data and you want to get rid of, like right now we are lets say February 2005, and you want to get rid of January 2004's data. If that's in one table, doing a large scale delete is really expensive, but if you partition that table into smaller, more manageable chunks, you can get rid of a partition in literally milliseconds, whereas the delete could take minutes and you end up not having to worry about significantly impacting performance with that large delete and you just get rid of that partition and you get better performance and better availability in terms of management, because you don't have to worry about waiting for index being manipulated during the delete. Partitioning and then if you separate those different pieces of the database into separate file groups on different disks and one of the disks fails, like in 2000 today if you have even one page that's damaged, I don't know if you've ever had this, if you have even one page that's damaged the whole database is off-line. I mean the whole database is unavailable and you're talking about a 20 terabyte database, if you are talking about a VLDB and you got one damaged page, whole database is off-line. Whereas in 2005 your availability is essentially at the file group level and the file groups depend on the file's state. Imagine a file group that just has one file and imagine a table, like a partition table that separates each month onto separate disks. Again lets say we're February 2005, so we've got this data and lets say that active current we want to it be highly available. We put it on RAID 10, we use triple mirroring whatever, so we keep it extremely highly available in terms of disk, but we don't care so much about January 2002. Maybe we've five years of data, so we put that on RAID 0, we don't want to waste disk, we want better performance whatever. Well if that crashes, the good news is that file is unavailable, that file group is unavailable. But you end up partitioning your object in such a way that does not matter, because the database stays available even when file groups are unavailable and the best part is you can even restore an old image of that data, because it hasn't changed, while the database was being used. So you get online access to a damaged database, and you can keep that database available and you can even recover that database while it is online. In terms of availability 2005, just has a great number of features that are allowing your objects to be more available, more of the time and even do disaster recovery online. I mean, I haven't even touched on things like security and some of the new features, but it is just amazing and there is database mirroring, which is a great new feature that allows you to have secondary database I should say, a copy of the database at a secondary site, so if you have site failure you mirror can take over. I am really over simplifying, in fact I am not doing justice to that feature at all, but I am just saying that there are so many new features in terms of availability. I am gonna stop and let you go to the next question, but it is just really exciting right now. I am working on white papers and resources on these and courses and materials and there is so much new stuff, I think it is really exciting. I think everybody is going to love it.
Well actually, this is funny because anyone who knows me knows that I am more kind on the IT side, operations, DBA, recovery, fault tolerance that kind of thing and I am getting a lot more into things like the CLR, because it is now in the server. I have to deal with it. But it is not just that, actually amazingly and people look at me funny when I say this, I am excited about the CLR being in the server because the implementation of having SQL's OS, SQL's operating system essentially SQL Server manage the CLR means that you could be running managed code in a protected safe manner on the server, close to the data in the same memory space as SQL Server meaning SQL Server can better manage resources. If you were running managed code on SQL Server before in 2000, it could get out of control, memory leaks that could bring your server down. When you're running this by SQL Server managed by SQL Server and the resources are owned and managed by SQL Server then you're not going to have CLR code getting wildly out of control in terms of resources, so a lot of people would say, "Oh well, so you like it because you can control it" and it is not just that, although there is little of that, but it is not just that. It is just the sheer fact that it is a lot safer of an environment in which that managed code can run. It doesn't mean necessarily that you want to run anything and you mention this. When do you want to use the CLR, when do you want to use T-SQL. I think it is the most important point that most of us are trying to point out when we talk about using the CLR, it's not just a way for people who know C# or VB to put all their code in the server and just run it in the server, but it is a way for you to take things that maybe aren't data manipulations, they're not joins, they are not really complex, very easily handled with SQL types of statement. A lot of string manipulation, a lot of complex computations that aren't necessarily great in SQL that you can then bring into a function or aa stored procedure. Put it on the server, run it in the same memory space and have it managed by SQL Server and close to the data and that is where you can get some incredible gains. I've actually written, embarrassingly, I don't do a lot of this, but I have written some table valued functions that use web services and SQL CLR based functions. I did some demos at some conferences and I am really having fun with that. I really do like the new feature. I think it's a great feature. I can't read every line of C# and know it inside and out. I don't think every DBA should, if I start looking at DBA and dev I think what devs need to do is make sure that they find proper use of the CLR and make sure that their DBAs know at least the benefits of CLR and give them some insight into what the code is doing, but a DBA, if I look that audience as well they don't need to know every line of code because just in the way that the CLR is managed in SQL Server it is a lot safer than it was before. So I am actually more excited about that feature than most people think. They are like you probably don't want to talk about the CLR at all I am actually enjoying it. I am having fun learning C#. TSS: Resistance is futile, you will join the development group. We've already adopted you. All my friends are .NET, I can't escape. It is a great question. It is really amazing what SQL CLR is allowing you to do in terms of exposing web services for example, inside a SQL Server through table valued functions. One of the demos that I did was using Amazon web services for the table valued function so that you can literally do a query against the last ten DVDs, for example, that a particular actor had done. So that is kind of taking external web services and bringing them into the server and now kind of the opposite, which is taking something like a stored procedure and exposing it through an end point. How do I see that used and I've had this question, people have said, "Do I see SQL Server turning into an application server" and no I don't. I don't see that all. Where I see the ability to access the SQL Server through an HTTP end point would be where, you can't use a real SQL client. Where you want to access SQL Server from some client that is an not a native SQL client where you want to do Unix whatever. You can expose it and access it through any client that can communicate with an HTTP end point and the beauty of it is that it's not IIS based. The real beauty of it is that it is secure, just that end point you can grant specific login access to that particular end point and you can end up securing just one direct path into the server, just one stored procedure that can be exposed. You are paying a penalty for performance, I'll give you that. I mean exposing end points is going to be more expensive than if you are writing a native application to the server, but the flexibility and the fact that you can expose it to anyone is really powerful. So I think it is a great feature. As far as misuse I think that some people are going to end up taking every stored procedure exposing all of them through end points and making applications actually more complex to work with as opposed to kind of directly connecting and working with objects. I am excited about that feature too. It is really exciting, I don't use much in that space, I think for me a kind of combination dev and DBA I have been looking at how to properly use these features, and I think that is really critical. I am really excited about it. I am waiting for it to be misused too, like expose everything with 87 end points, is like lets not do that. That 's what I think will be misused. Well, thank you very much for talking to us today. Thank you.
|