Interview transcript
My name is Jeffrey Richter, as you said, and I have been writing Win32 books and .NET books for about 12 or 13 years now. I have written about 12 books in total so far. I also do a lot of consulting and programming work. I have consulted at Microsoft on various teams over the years: Windows, a little bit in Office, and most recently, since October 1999, the common language runtime. I am also a co-founder of Wintellect, a company dedicated to helping customers build better software faster. We have consulting, debugging and training services. I mostly do training stuff, but I do some consulting stuff as well. We have been in business for about five years now.
Yeah, a lot of people do believe that managed code is slower than unmanaged code, and in some cases that's definitely true. In some cases managed code is as fast as, and in some cases even faster than, unmanaged code. You do have to realize that when your code runs, your IL code, your intermediate language code, is JIT compiled into native x86 code, and the JIT compiler is an optimizing compiler that ships with the common language runtime engine. The native code it produces is optimized native code, very similar, if not identical, to what would have been produced by your unmanaged C compiler. So when your method is running, it should be running just as fast. There are some times when that is not the case, but many times it is the case. I know the runtime team in version 2.0 has spent a lot of time improving the performance of the runtime. They have really done a lot to try to reduce working set, which means the amount of memory that is required while your program runs. In general, programmers know that when you optimize code you can optimize for speed or size.
What we have learned with Windows over the years, and this applies to managed code running under Windows as well, is that in general when you optimize for size, reducing the amount of memory needed, you are also optimizing for speed, because if your program needs less memory the operating system will do less paging. Paging means that the operating system is going out to the disk and loading things off disk, which reduces speed significantly when it happens. So the runtime team has spent a lot of time trying to optimize for space, memory usage, and that in turn will give them improvements in speed. I know a lot of people have complained about the startup time of the runtime. In versions 1.0 and 1.1 of the runtime, when the CLR initializes, it loads this huge XML file, several thousand lines long, I can't remember exactly, maybe it is like 7000 lines or something like that, and it parses that XML content. That XML contains various settings that the runtime uses, like for security and various things of that nature, where to find files, that kind of stuff. If you look at that XML file under version 2.0, it's now just a couple of hundred lines. They reduced it by orders of magnitude. So now, when the runtime initializes, it just loads in a couple of hundred lines and parses that. You might ask what has happened to all that data that was in the other XML file. Those default values have now been hard coded inside the runtime, so that if you don't change any of the defaults, and most people did not change the defaults, then they are just already there, intrinsic, if you will, inside the runtime engine, and so now startup is way, way faster in version 2.0 of the runtime. In addition, the team has also done a lot of things to improve operations like calling through a delegate to call other methods.
They made a lot of improvements to Reflection, to make Reflection go quicker, and there have been lots of other improvements to Collection classes and things like that to iterate over them more quickly, some improvements to the garbage collector and so on, just to make the system overall run faster.
A lot of people do believe that ngen is kind of a panacea, a way of saying, "Well, managed code runs slower because of the JIT compilation and verification stage and so on. We can get rid of that if we just ngen it first." Ngen is the action of taking your IL code and compiling it to native code, usually when a program is installed on an end-user machine; then, when the user runs the application, the runtime finds the ngened code and just starts using it. There are a bunch of problems with ngen. It sounds great in theory, but when you start looking at the details, there are some issues with it. First, you need to ngen at the customer's site, not at the developer's site, because ngen will compile code specifically for whatever CPU you have in the machine, and whether it's a multiprocessor or a single-processor machine, it will need different code based on that. You also have to ngen for a particular version of the runtime. You also need to ngen based on the exact DLLs your code depends on. There are a lot of things that have to line up just right, and in fact, when the runtime loads an ngened file, if it sees that it was ngened for a different version of the processor, a different number of processors in the machine, a different set of dependent assemblies, or a different version of the runtime than is actually being used, then the runtime can't use that file and falls back to regular JITting. In that case it will actually slow down your application, because it had to load the file, test these values, see that they do not all match, and then unload it.
Another problem with ngen is that at runtime the runtime can make more assumptions about your code as it is executing, which it can't do prior to it executing. So ngened code is actually inferior to the code produced by the JIT compiler at true runtime, and your program can actually end up running 5-10% slower. The real benefit of ngen is in application startup time. If you've noticed that the startup time of your application is very slow and you would like to improve upon that, then you might want to try running ngen, then run your program, and the startup time will probably be a lot faster. But in the steady state, once your program has been initialized and is running, it is probably going to run slower than it would have if you had just JITted it all along.
A pretty controversial topic for a lot of people is IL code and how easy it is to reverse engineer intermediate language. As I am sure everybody knows, intermediate language is this high-level assembly language. It doesn't have any CPU registers in it. It is all virtual stack based. It even has object-oriented constructs in it, like newing up objects and calling their constructors and calling methods virtually; in fact, there is a callvirt IL instruction, and there are IL instructions for throwing and catching exceptions. If you write your code in C# or VB.NET or whatever your language of choice is and you compile it, the compiler produces this intermediate language, and then there are a bunch of tools available, even Microsoft makes an IL disassembler, that let you look at the IL code and reverse engineer it to see what it is doing. There are also people who make decompilers, which will reverse engineer it into C# or VB.NET or Delphi or some other language so you can see what it looks like in source code form. You don't get the comments, but you get pretty close to what the original code looked like. This has a lot of people worried: you can spend an enormous amount of time building algorithms that set you apart from your competition, compile them, distribute your code, and then people can go and easily reverse engineer it and see what you're doing. There are several ways to counter this problem of reverse engineering. First, if you are writing web services or web form applications, then your DLLs reside on your company's servers, and the end users who are using your web site or your web service do not have access to your files at all. Since they don't have access to the files, they can't see the IL code, and therefore they can't reverse engineer it.
As long as you trust the people in your company who are running the web servers and so on, there should be no problem with reverse engineering of the IL code. Another thing to be aware of is that really most applications don't have intellectual property worth protecting. Nobody cares how your File Exit menu item works, or your File Open, your File Save, your Help About, your Edit Copy, your Edit Cut, your Edit Paste. Most of these things that are very common in a Windows Forms application are pretty standard code, so there is no real intellectual property there that is worth protecting. If you do have some parts of your program that have IP worth protecting, you could consider writing those in unmanaged code and writing the rest of your application in managed code. You will have a real easy life for almost 99% of the stuff you are doing, and the couple of things that you think will really set you apart from the competition you write in unmanaged code and use the runtime's interop facilities to get out to it in order to execute that code. Finally, the real solution for this problem is some form of digital rights management. Microsoft, I know, has a team of people who are working hard on digital rights management problems, and this of course is the same way you would protect video content or audio content or any other content, like email contents and things like that. That's the way we would protect intermediate language content, because you can consider it content that's inside a DLL or executable. While Microsoft is working on this, they have no solution today, at least in regard to managed assemblies, but they will have that solution sometime in the future.
Well, I do know that when Longhorn ships, which is sometime in 2006, it is supposed to ship with the Whidbey runtime. That means that a new version of the runtime is probably about a year after Longhorn ships.
We are probably talking about something that may be two years out. For me, personally, the stuff that I have had involvement with that is not going to make it into the Whidbey version of the runtime, but would make it into the next version, is some thread synchronization things. I personally created a new reader-writer lock that gives higher performance than the one that currently ships with the framework class library. I was told it would actually make it into Whidbey, but it got cut from Whidbey, so now I am hoping it will go into the next version. I have also created another, new form of thread synchronization lock that I actually sold the patent rights to Microsoft for, and they are hoping to build an implementation of that, or maybe I will give them my implementation of that invention, and that will go into the next version of the runtime. As for other stuff that I work on, I work with the version architecture team at Microsoft. We are coming up with a way to improve versioning hell. We got rid of DLL hell, but now everybody says the runtime has versioning hell, and I completely agree with that. The version architecture team is trying to make it so that versioning of assemblies, rollback, backward compatibility and things like that are much easier and the defaults are much better in the future. So that will be in the next version of the runtime. Finally, the version architecture team also concerns itself with an add-in model. We expect there will be a lot more people building host executable applications where third parties create components that plug into those applications. Right now, the runtime offers a pretty easy way of doing that: you can create an AppDomain and secure it, you can load components into it, and you can unload the AppDomain when you are all done. But right now the runtime offers no standard way for people to discover what components are installed on the machine that can be used by my host.
There is no standard discovery mechanism or registration mechanism for these kinds of components, and that should come in the next version of the runtime as well.
That's a question that I ask myself pretty much on a daily basis, and I would say that one of the hardest things that I have been faced with over my lifetime is figuring out a proper balance between life and work and various things, and I am still working on that. Now I am married and I have a two-year-old son, and trying to balance that all has been very difficult. It tends to go in waves. Right now, I am very focused on the Whidbey edition of my .NET book, and I also do a lot of training currently for Wintellect and a lot of other Wintellect work, signing contracts and things like that, working the trade booth here at Tech-Ed and so on. I am mostly doing that. In the recent past, I have done periodic work with the runtime team. I go in and I have meetings and things like that, but it's been much less of late. There is another group at Microsoft that I've been doing a lot of work with, which is the Connected Services Framework team, the CSF team, and that's been fun for me because there has been a lot of programming. The balance is hard. I struggle with it every day. When the book is done, though, it will probably be a couple of years before I rev the book, and then I will probably have more time to focus on the CLR side. Certainly right now the book is a high priority for me, Wintellect is a high priority for me, and Microsoft consulting is a lesser priority. I like the Microsoft consulting a lot. I really enjoy that and working with the people there, and when the book is done there will be a more even balance. My family fits in somehow, weekends, evenings.
There is this group at Microsoft, and they are building a platform that is mostly used, or will be used, by telecommunication companies to enable scenarios where people can do more things with their cell phones. One example would be: you go to a bookstore and maybe you see a book that you would like to buy. You can use your cell phone to take a picture of the barcode, and that gets sent back to the telecom company. They call a web service made by another company that can look at the barcode and parse it to figure out what the ISBN of the book is. Then they can send that request off to other web services, let's say for Amazon, Barnes & Noble, and Borders, and ask them what the price of the book is, or maybe which stores have it in stock, or something like that. That information can come back to your cell phone, you can place the order directly there on your cell phone, and then the book shows up at your house at the cheapest price possible. Again, they are building the platform for this. That was just one scenario. Lots of scenarios exist. They are building it all presently on version 1.1 of the runtime; our latest version of it is supposed to ship this month, in June, and then we are going to focus on the next version, which will run on top of Whidbey. They are currently using WSE 2.0, but we are probably going to move to either WSE 3.0 or Indigo. We are researching which one we think is best for us right now. We have spent a phenomenal amount of time on scalability and performance. You can imagine there are millions of cell phone customers out there, and they might all be trying to do this simultaneously. We need to create a session state object on the server for each of these people. We want to get a lot of scalability on these servers, so having 64-bit Windows, and Whidbey supporting 64-bit, is going to be huge for us.
That's really going to make it so we can support millions and millions and millions of customers on one box very efficiently. We are really looking forward to switching to Whidbey for that purpose. Also, with WSE, we found its performance to be not so great. We are hoping that Indigo's performance is significantly better, and our preliminary tests show that it is. We think that will improve our performance a lot. Today, I have done a lot of work with them to try to work with the garbage collector. The garbage collector today is really designed to do a garbage collection every 256K or so; a GC kicks in after 256K of allocations, and the reason is to try to keep your working set small, which goes back to what I said earlier in the interview about reducing your working set to keep your performance higher. We kind of had a different goal. We wanted to use as much memory as was possible on the machine and keep as much session state in memory as we could, so our program runs fast. So we wanted certain objects to be able to stay in memory for a long time even though we were not accessing them, because we thought a customer might come back into the web service shortly, but we wanted other objects to be garbage collected right away. We had to write a lot of really fuzzy heuristic code to ask the garbage collector, "How much memory do you have now?" We had to write a lot of our own collection classes that use weak references to track objects, so that if an object has not been used in the past half hour it is okay to get rid of it, but if it has been used more recently we want to hold on to it. We had to do a lot of work there. That was really fun and enjoyable for me, although I felt like I was fighting against the runtime a little bit and the way the garbage collector was designed.
I do know that one feature that was kind of planned for the garbage collector is that you could, in the future, register for an event, and when the garbage collector is about to do a collection it would fire this event to let you know, and that would help us in a future version. We would definitely use that feature.
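The idea of collection classes that keep recently used session state alive but let idle state become collectable can be sketched roughly as follows. This is a minimal illustration in Java rather than .NET, using `java.lang.ref.WeakReference` as an analogue of the CLR's weak references; the `SessionCache` class, its method names, and the caller-supplied timestamps are all hypothetical and are not the code described in the interview.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: entries used recently are held strongly; idle entries
// are held only weakly, so the collector is free to reclaim them.
final class SessionCache<K, V> {
    private static final class Entry<V> {
        V strong;                     // non-null while the entry counts as recently used
        final WeakReference<V> weak;  // cleared by the collector once the value is unreachable
        long lastUsedMillis;

        Entry(V value, long now) {
            this.strong = value;
            this.weak = new WeakReference<>(value);
            this.lastUsedMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long maxIdleMillis;

    SessionCache(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    synchronized void put(K key, V value, long now) {
        entries.put(key, new Entry<>(value, now));
    }

    // Returns the value if it is still alive and re-promotes it to a strong reference.
    synchronized V get(K key, long now) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        V value = e.weak.get();
        if (value == null) {          // already reclaimed by the collector
            entries.remove(key);
            return null;
        }
        e.strong = value;
        e.lastUsedMillis = now;
        return value;
    }

    // Drops the strong reference for entries idle longer than the limit.
    synchronized void demoteIdle(long now) {
        for (Entry<V> e : entries.values()) {
            if (now - e.lastUsedMillis > maxIdleMillis) {
                e.strong = null;
            }
        }
    }

    synchronized boolean isStronglyHeld(K key) {
        Entry<V> e = entries.get(key);
        return e != null && e.strong != null;
    }

    public static void main(String[] args) {
        SessionCache<String, String> cache = new SessionCache<>(30L * 60L * 1000L); // half an hour
        cache.put("customer-1", "session state", 0L);
        cache.demoteIdle(31L * 60L * 1000L);
        System.out.println(cache.isStronglyHeld("customer-1"));
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a real cache would more likely read a clock and run `demoteIdle` on a timer. When a demoted entry is actually reclaimed is up to the collector.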
It's a good question. The garbage collector for most people is a big black box, and for many people, if they ask you, "Does it work?" and you say, "Yes," they are pretty content with that. But the runtime team has spent a phenomenal amount of time analyzing the garbage collector and improving its performance. There are a lot of hard-coded values inside the garbage collector, and those values are not publicly exposed to anybody, mostly because to tweak them manually you would really have to understand the implementation of the garbage collector and what those things do. In addition, when you call a method on a class, like Console.WriteLine, you don't know whether inside there it allocates no objects at all or allocates a 10K object to do the work. But the garbage collector knows, whenever an allocation occurs, who is doing it, what size it is, where it is occurring, how important it is, and whether it is living for a long time or a short time, and the garbage collector algorithm is constantly being tweaked. Normally what they do is take a bunch of existing applications, run them on top of the garbage collector, and use tools like Intel VTune to monitor performance and see the various things that are going on. Then they tweak a bunch of numbers inside the garbage collector algorithm, rerun the applications, and say, "Hey, did it improve? Did it make it worse?" The odds are it probably makes some applications better and some applications worse, and they are trying to build a generic memory management system that is going to work for the widest variety of applications. Certainly if you built your own memory allocation system that was fine-tuned for your application, nobody would do better than you doing it yourself.
But a lot of people don't want to do it themselves, and that's what the garbage collector is there for, and it's just constantly being tweaked from version to version.
Well, from a garbage collection standpoint, the garbage collector is fine-tuned for objects that live a very short lifetime. So you should feel very comfortable newing up an object that you're only going to do one or two things with right away and then don't need anymore. The garbage collector is really well tuned for that scenario. You should also feel comfortable creating an object in memory that you are going to keep around for a very long time. So the garbage collector is fine-tuned for objects that have a very long lifetime and objects that have a very short lifetime. Objects that have a medium-sized lifetime, like when you create an object that lives for 5 minutes, then get rid of it, then create another that lives for 5 minutes, then get rid of it, will work, but you're not going to get the best performance possible. Also, again, with the garbage collector, try to keep your thread stacks as short as possible, because when a garbage collection occurs, the collector code has to walk up your thread stacks to find all the variables that refer to objects in the heap. If the stack is really short, that is, you don't call deeply recursive functions, then walking the stack to find those variables is very, very fast, and for most applications that happens anyway. In a Windows Forms app, the thread usually sits in a GetMessage loop; the user moves the mouse or hits a key on the keyboard, the thread wakes up, calls something, and comes right back to the top. For a Web Forms or Web Service application, you have these thread pool threads that are sitting idle; a request comes in to the server, a thread wakes up, handles the response, and then goes back to sleep. So you just want to try to keep that model going: don't have threads that are running very deeply recursive functions, just keep everything short. Usually the biggest thing for people is fine-tuning an algorithm. Using StringBuilder over string is a good tip.
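As a rough illustration of the StringBuilder tip, here is a minimal sketch in Java, whose `java.lang.StringBuilder` plays a role similar to .NET's `System.Text.StringBuilder`. The `ConcatDemo` class and its method names are hypothetical. Repeated `+=` on an immutable string copies the accumulated text on each pass through the loop, while the builder appends into a growable buffer.

```java
// Hypothetical sketch comparing repeated concatenation with a builder.
final class ConcatDemo {
    // Each += creates a new String holding everything accumulated so far.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // The builder appends into one growable buffer and creates the final String once.
    static String joinWithBuilder(String[] parts) {
        StringBuilder builder = new StringBuilder();
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        // Both approaches produce the same text; only the allocation pattern differs.
        System.out.println(joinWithConcat(parts).equals(joinWithBuilder(parts)));
    }
}
```

For a handful of short strings the difference is negligible; it matters when the loop runs many times or the strings are large.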
I see a lot of people do string concatenations. They are very expensive in terms of time. Using a StringBuilder to do concatenations is usually beneficial performance-wise. Reflection, while they have done a lot of things to improve it in CLR 2.0, is always going to be somewhat slow, because it is doing string comparisons against stuff that is in metadata. Metadata has to be loaded in, which increases working set, and string comparisons are always slow; that's the way it's always going to be. That's three, right?
Well, I'm glad you asked. I think anybody who has monitored the CPUs coming out from hardware manufacturers like Intel and AMD over the past few years will notice that CPU speed has certainly not been increasing at the rate of Moore's law. Moore's law was that power doubles every year and a half or so, and that did happen with CPUs for a while, but now they are pretty much maxed out at about 3.2 GHz or so, and we should be seeing CPUs that are much faster by now. We're up against the physical limits of hardware and what we're able to do in pushing electrons through metal, and we've pretty much hit the physical limit. So what the hardware manufacturers are doing is they're now finding it much cheaper and easier to produce single chips that have multiple processors on them. Hyperthreaded CPUs have already been shipping for years, which is kind of a cheap way of doing that, and I don't know how much I'm actually in love with hyperthreading, because it actually makes performance worse and it's hard to program against in an efficient way. But now, just last month, Intel and AMD announced multicore chips, and it is expected that in the future they will produce CPUs that have maybe 30-32 processors on a single chip, and that will be mainstream. The GHz, the raw speed of them, is all the same, but you have more processors now.
What this means to people building applications and servers is that the way you're going to get scalability or improve performance is by making your applications multithreaded. That's the only way it's going to happen. We are no longer going to get our program to run faster just by putting it on a faster machine. That means that application developers are going to have to be more concerned with threads. In order to make the program run faster, you will have to have some work done on a separate thread from what this thread is doing over here. That means there will probably be thread synchronization that has to occur as well in order to make that happen. So there are a lot of people at Microsoft, and me personally, who are looking into this now. In fact, MSDN Magazine contacted me a few months ago and asked me if I would start writing a new column for them, which is called the Concurrency column, and I agreed to do that. I just submitted the first column to them this week, and it will come out in a few months. The whole column is just about how to build high-performance, scalable applications that properly use multiple threads and thread synchronization to get the best performance possible. The runtime currently has a thread pool in it, which programmers can take advantage of to request that multiple threads do things, and that is a really good start, and people should be using it more than they are, I believe. Also, the runtime has this awesome thing called the asynchronous programming model, where you call Begin and End methods, like BeginRead and EndRead or BeginWrite and EndWrite on a FileStream or NetworkStream, to do asynchronous I/O to hardware. Programmers should be doing that. In version 2.0, Whidbey, ADO.NET now supports the asynchronous programming model, which is the best place to use it.
In fact, it's kind of embarrassing that ADO.NET didn't offer the asynchronous programming model in earlier versions of the runtime, but if you are doing database stuff and you care about performance and scalability, you definitely want to use the asynchronous programming model against that. People should start looking at that now. The runtime has a bunch of thread synchronization primitives in it, and people should learn about them and the pros and cons of the various locks that are in there. There is this reader-writer lock in there, which I think is really terrible, actually. I think the performance of it is bad. I think it also has a policy decision where it tends to favor readers over writers, which starves writers so they do not get in. As I mentioned earlier, I have my own reader-writer lock and I have given it to the runtime team. It was supposed to make it into Whidbey but it didn't, but mine is way faster. I can say mine is way faster than theirs, and it has a policy that I certainly prefer. I am also thinking of making it available on the Wintellect web site. I have a power-threading thing that I have been building. I haven't posted it just yet, but it has a bunch of new locks in it that are very high-speed, very high-performance locks. I also have this lock that I sold the patent rights to Microsoft for; it's part of that. And I have built my own asynchronous programming model that works just like the one in the runtime, but mine is about eight times faster than the one that ships with the runtime. I think this is something that people are now just starting to care about, because they are realizing that processor speeds are not getting faster and that multicore chips are coming out, and I think people are really going to start taking this more seriously and start using threading much more inside their applications.
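The combination of thread pool threads and a reader-writer lock can be sketched roughly as follows. This is a minimal illustration in Java, not the .NET thread pool or either of the locks discussed in the interview. It uses `java.util.concurrent`'s fixed thread pool and `ReentrantReadWriteLock`, constructed in fair mode as one possible way to keep waiting writers from being starved by a stream of readers. The `SharedCounter` class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: pool threads update and read a value guarded by a reader-writer lock.
final class SharedCounter {
    // 'true' requests the fair ordering policy, so lock requests are served in
    // roughly arrival order rather than letting readers barge past a waiting writer.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private long value;

    void increment() {
        lock.writeLock().lock();      // exclusive: no readers or other writers
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    long read() {
        lock.readLock().lock();       // shared: many readers may hold this at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs 'threads' tasks on a pool; each task increments and reads the counter.
    static long runTasks(int threads, int incrementsPerTask) {
        SharedCounter counter = new SharedCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    for (int i = 0; i < incrementsPerTask; i++) {
                        counter.increment();
                        counter.read();
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has finished; get() surfaces task failures.
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return counter.read();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(4, 1000)); // prints 4000
    }
}
```

Because every increment happens under the write lock, the final count is always the number of tasks times the increments per task, regardless of how the pool schedules the work. Fair mode generally costs some throughput, so whether it is the right policy depends on the workload.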