David DeWitt
SQL Down Under Show 61 - Guest: David DeWitt - Published: 12 Nov 2013
SDU Show 61 features Jim Gray Systems Lab leader Dr David DeWitt discussing the database industry and the upcoming SQL Server 2014.
Details About Our Guest
Dr David DeWitt manages the Jim Gray Systems Lab for Microsoft. David was a University of Wisconsin Computer Science faculty member from 1976 to 2008, when he became a Technical Fellow at Microsoft. David has been one of the most popular speakers at recent PASS summits.
Show Notes And Links
Jim Gray Systems Lab is here: (http://gsl.azurewebsites.net/Home.aspx)
Show Transcript
Greg Low: Introducing Show 61 with guest Dr David DeWitt.
Welcome. Our guest today is Dr David DeWitt. David is originally from the University of Wisconsin, where he was a faculty member from 1976 to 2008. He has been a Technical Fellow at Microsoft since 2008 and manages the Jim Gray Systems Lab. So welcome, David.
David DeWitt: Well, thank you for having me.
Greg Low: What I get everyone to do first up is tell me how you came to be first involved with SQL Server and in your current role. What led to this?
David DeWitt: So it’s a long, pretty complicated story. Soon after I joined the University of Wisconsin I switched from working in computer architecture to databases and did some early work on parallel database systems in the late 70s and early 80s. At some point in the early 80s the department recruited a student from forestry, a guy named Peter Spiro. Peter had come to Wisconsin looking at getting a degree in forestry. He took a relational database class from our department, got very interested in relational database systems and went on to DEC. He worked at DEC for a number of years on Rdb and was recruited by Microsoft when Microsoft decided to get seriously into databases. He was one of the first hires, and Peter and I remained in contact for many years. A number of years ago Peter decided that Microsoft should in some sense give back to the university, because we’d trained not only Peter but dozens and dozens of graduate students who then went on to have pretty successful careers at Microsoft. There are a lot of them; Goetz Graefe was one of the optimiser experts in SQL Server 7. A large number of students went to Microsoft, and Peter said we should start a lab there, hire some people, fund some graduate students and do a cooperative venture between Microsoft and the University.
We tried doing this once, maybe 8 or 9 years ago, but we never managed to get the logistics worked out. The second time we tried it we got the logistics worked out; I retired from the University and started this up for Microsoft. A lot of people are surprised I’m not part of Microsoft Research, but really we are a small research and advanced development group inside the SQL Server group at Microsoft.
Greg Low: I love to see that sort of collaborative arrangement. I must admit I’m old enough that I grew up through the heyday of working at HP in the 1980s, just at the tail end of the time when both Bill Hewlett and Dave Packard were there. It was just after they had retired and John Young was running the company, but one of the things that most impressed me at the time was the close association between the company and the University there at Stanford. They seemed to have arrangements at the time where, if you were doing a Ph.D., you could be working in the HP systems lab and doing that at the University as well. Conversely, it gave opportunities to the University. It is just something I don’t see here in Australia much at all.
David DeWitt: Well, actually it is pretty rare in the United States too. It has gotten somewhat tougher because of the extraordinary interest in owning intellectual property these days.
Greg Low: Yes.
David DeWitt: Both on the part of companies and on the part of universities. The University of Wisconsin, Madison has a very large patent portfolio: the process of putting vitamin D in milk by irradiation was patented here, and the blood thinner warfarin was patented here. It really took the head of the University and Bill Gates working through their lawyers together to come up with an intellectual property sharing arrangement that suited both parties.
Greg Low: Yes.
David DeWitt: The IP makes it a challenge, but we have a facility which houses both graduate students and full-time Microsoft employees. The graduate students aren’t interns; they are normal graduate students, paid not by Microsoft but through a grant from Microsoft to the University. The grad students and the staff frequently do projects together. The students get a chance to do their projects inside the SQL Server source code, and we have already turned some of those ideas over into product that will ship as part of SQL Server 2014.
Greg Low: Yes, I think that is one of the wonderful aspects: the fact that you get access to the real source to be able to make serious contributions.
David DeWitt: I think it is also eye-opening for the students. A lot of the students in previous generations had done their work inside either MySQL or Postgres, and I think it is really eye-opening for the students to see the difference between an open source system and a system as sophisticated and as big as SQL Server. It is one thing to have an idea and put it into MySQL; it is another to have an idea and put it into a product that has been honed and tuned by very high quality engineers for years and years. There is a lot inside, and maybe there is too much inside, but it has been a very good experience for the graduate students.
Greg Low: Yes, it is one of the things I certainly remember with HP at the time, where people would talk about the products being almost over-engineered. It was no small thing to say that. I remember in particular one of the large disk drives that shipped at the time, the 7935, and it was a work of art. What was interesting was just the number of Ph.D. theses that had come out of it; in fact one person had done his complete dissertation on the plastics design of the disk drive. It was just astonishing to look at the work that went into designing that.
The products that came out certainly were very sophisticated. So in terms of SQL Server, what is your feeling about where it is sitting in the market and where it is heading, just in general?
David DeWitt: Well, you know, I guess there are a couple of things I would say. We are by far the market leader in terms of seats at this point among the big commercial three: ourselves, Oracle and DB2. We are obviously still trailing Oracle in terms of revenue, but our revenues have gone up significantly in the last couple of years. I think in certain aspects we are no longer chasing Oracle’s taillights. With the column store that came as part of SQL 2012, and now with PDW and SQL Server 2014, we really are way ahead of Oracle when it comes to column store.
The OLTP engine is coming out as part of CTP2 of SQL 2014. We are on a totally different path from other people doing high-performance OLTP engines. We don’t have a separate code base. We don’t have a separate API. We are fully integrated into the product. So I actually think from a technology point of view we are a market leader, and from a share point of view we are doing extremely well. What is interesting, as I reflect back on my 5+ years at Microsoft, is that when I first joined everyone was worried about the LAMP stack. The sky is falling, the sky is falling, MySQL is going to take over. MySQL has not taken over. That is not to say it is not a great product; it has its uses. But I think enterprises understand that systems like SQL Server, Oracle and DB2 are really where you want to put your high-value data.
Greg Low: Yes.
David DeWitt: We have a lot of competitors in the data space, and I think we have got some interesting things going on there. But as a product, everybody always feels, you know, we could do more.
Greg Low: Yes.
David DeWitt: One of the things that has been really eye-opening for me has been what it means. I was part of the Vertica start-up, the Vertica column store, and I worked on that for over a year.
Greg Low: Yes.
David DeWitt: I was on a split sabbatical for a year. It has really been eye-opening to me what it means to ship software to hundreds and thousands of customers, and the quality is extraordinary. Sure, we ship bugs in our products, but the drive to zero defects as it ships is just relentless. It is always a challenge: in order to make ship dates you are always cutting features that you know some customers would love, in order to ship a product out the door of super quality. That has been fascinating for me to witness and be part of.
Greg Low: Yes, and as part of that, the practical nature of it. Actually, you mentioned Postgres along the way too; I was just interested in your thoughts on that one. I was going to say I am seeing more of it in recent times than MySQL.
David DeWitt: Yes, I think it has a very long history. It has gone on from being written in Lisp, and Mike and I have been acquaintances for more than 40 years; it is 43 years since Mike and I first became friends. You know, it’s gone on from being just an academic project to a widely embraced, very nice product, and I think it is quite a good system.
Greg Low: Yes.
David DeWitt: I think of the open source systems it is by far the best, though I’m sure the MySQL people will...
Greg Low: Oh yes, I’m very sure about that.
David DeWitt: You know, Mike really started that. The name came from “post”, after Ingres.
Greg Low: After Ingres.
David DeWitt: Yes, and it was Postgres, not PostgreSQL. It was Mike’s attempt to introduce some ideas that we had from Ingres: abstract data types, extensibility, user-defined functions. He tried some pretty wild things in terms of a no-overwrite storage manager and time travel, and it was just one of Mike’s many really important contributions to the field. It is a great system, though certainly not a high-performance OLTP engine if that’s what you are interested in. I mean, through Greenplum there is a scalable offering. It certainly has its niche.
Greg Low: Oh look, and one of the things I think people do like about some of those is that they are able to respond perhaps more quickly to changes in the industry. Probably the most notable thing I keep hearing from a lot of clients at the moment is JSON support. It is kind of like we got XML support in 2005 and the industry pretty much moved to using JSON the following year. They keep asking whether there is some sort of native support for that.
David DeWitt: You know, I think, through .PB, I don’t know if you are familiar with what that is. We have a JSON project that we are working on inside Microsoft. XML in SQL Server is probably widely viewed as a mistake.
Greg Low: Yes, yes. It was a fascinating addition to the product; it felt like it came out of left field at the time.
David DeWitt: Yes, I think XML as a column type made a whole lot of sense, with a bit of support for XPath. But extending the language to XQuery complicated the optimizer. It complicated the code base, and then the world moved on, not that XML isn’t used. You are right, JSON has certainly caught on, and as far as SQL Server and PDW go, sure, we will support JSON as part of our big data story. But what is going to be the next format? Do we go to a lot of trouble to add JSON as a column type, which would be pretty trivial to do? Do we then have to run a JavaScript interpreter inside the SQL engine?
Greg Low: I think the thing that would probably do it is just some native support, with functions and things prebuilt. The thing is, you can roll your own. At a site I was on yesterday they were asking about this; they really just wanted to go INSERT, table, list of columns, VALUES, and here is a chunk of JSON.
David DeWitt: As you said, you can roll your own. You can always do this by having a table where column A holds the path of the JSON field and column B holds the actual value. So you have done a vertical decomposition of the table. Is it pretty? No. You can also simply embed JSON inside markers. There are a lot of things you could do, but when are we going to add JSON as a column type? Who knows. That is not part of SQL 14; that is all I can really say.
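The path-and-value table described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything shown on the show or shipped in SQL Server: the table name `json_leaf`, the `$.a.b[0]` path notation, and the use of SQLite as a stand-in database are all choices made for the example.

```python
import json
import sqlite3


def flatten(value, path="$"):
    """Yield (path, leaf_value) pairs for every scalar in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value


def store_document(conn, doc_id, text):
    """Shred one JSON document into (doc_id, path, value) rows; return the row count."""
    rows = [(doc_id, p, json.dumps(v)) for p, v in flatten(json.loads(text))]
    conn.executemany(
        "INSERT INTO json_leaf (doc_id, path, value) VALUES (?, ?, ?)", rows
    )
    return len(rows)


# SQLite in memory stands in for the real database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE json_leaf (doc_id INTEGER, path TEXT, value TEXT)")

document = '{"name": "Ann", "tags": ["a", "b"], "address": {"city": "Madison"}}'
count = store_document(conn, 1, document)

# Column A (path) locates the field; column B (value) holds it, JSON-encoded.
city = conn.execute(
    "SELECT value FROM json_leaf WHERE doc_id = 1 AND path = '$.address.city'"
).fetchone()[0]
```

Each scalar becomes one row, so a query on a single field is a plain lookup on the path column. The trade-off is the one mentioned above: it works, but it is not pretty, and empty objects or arrays leave no rows at all in this simple version.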
Greg Low: No indeed. What are your thoughts on chasing standards compliance? That is another one…
David DeWitt: What standards?
Greg Low: Yes, exactly. I read the Postgres material, and one of the things they did comment on is the fact that they chase standards more aggressively than some others. But yes, the standards are pretty weak.
David DeWitt: I would say every company gives some lip service to standards, but everybody has their own proprietary programming language and their own proprietary APIs. I think everybody implements some subset of the standards and nobody implements the full standard.
Greg Low: Yes.
David DeWitt: You know, people pick and choose. Personally I think the standardization effort is off the rails; I don’t think it is ever really going to get there. You are not going to get all the variations. No vendor is interested in having its product completely compatible with someone else’s product, unless you are a start-up, so…
Greg Low: Yes. Listen, the other one is a little bit of database history. I was watching a session yesterday where they were talking about the fact that, not all that many years back, in fact in the mid-90s, the other big push at the time was that everybody was saying object databases.
David DeWitt: Oh yes, I know. That was right before XML, and it went about just as far. You know, it was actually as part of that that I had a project, one year in Paris. I was part of Objectivity; Jim Gray and I were both on the Objectivity board, and I was on the board for a long time. I built the storage engine for O2, and the last time I programmed was in the late 80s, early 90s. It was driven by the desire to have the type system of the programming language that you are writing applications in match the type system of the database system.
We always traditionally talked about what we called the impedance mismatch. In a programming language you had arrays as data types, or in some other language you had sets as data types. I think people were motivated by the right thing, and unfortunately they had Stonebraker as an enemy. Mike made it very clear, and this was someone who started this in Postgres and then did it with Illustra, that it was easier to add object orientation and extensible data types to a relational database system than it was to start over from scratch with an object-oriented system.
I remember one particular meeting at Objectivity, where Jim Gray and I were both on the board. It was started by a bunch of operating system types from Berkeley. They thought they knew how to build a recovery system, and nobody in the world at that time knew more about recovery systems than Jim. Jim said, “This is not going to work; you are going to have some customers who have corrupted databases,” and these guys said, “Oh no, no, we are operating system types, we know better than you.” Sure enough, soon after release Objectivity crashed and corrupted some customers’ databases. The idea was good; they just underestimated what it took to build a query optimizer. People wanted SQL. They did not want to do navigation; they did not want to go back to the CODASYL days. They had forgotten the lessons of history. All the relational vendors beefed up their products with some type of extensibility, making it easier to define functions. They all went away.
Greg Low: On that note, I have often seen it commented that it is hard to imagine somebody writing a new operating system from scratch nowadays, because it is such a large effort that it would be hard to ever justify doing it. What are your thoughts on database engines? Are they sufficiently complex that it is unlikely we will see too many more substantial ones?
David DeWitt: Yes, I think it is very unlikely. It is a good point to talk about what is happening in the so-called big data field, because this is a place where people are trying to write new engines. We started with MapReduce, and I would encourage listeners to go to Bing or Google for what Mike and I wrote about MapReduce being a giant step backwards. You know where we are really coming from. But we started with MapReduce, and Hive got added.
Now we have Hortonworks building a new relational engine. We have Cloudera building a new relational engine from scratch. Don’t get me wrong: the scalability and fault tolerance of MapReduce are a really remarkable engineering achievement that Google did and that Hadoop followed on with. But people realize that managers don’t want their people writing low-level map and reduce functions. They know that SQL is the right way to go. Now, with Hive being popular, we have Cloudera trying to write an engine entirely from scratch. We have Hortonworks responding with a rewrite of Hive in Stinger, and this week Facebook announced their new SQL-like language called Presto. How long will it take for the Cloudera people and the Hortonworks people to have the same functionality that the standard relational products currently have? It will be years and years and years. And yes, maybe they don’t need all of the OLTP enhancements that a product like SQL Server or Oracle has. But they are going to build a relational engine, and they are going to discover they need a cost-based query optimizer. Building a relational engine is easy compared to building a cost-based query optimizer. In terms of updates, I’ve noticed that Presto can’t store the results back into the database, so. Maybe we will get another relational engine at some point, but there is a lot of code there.
Greg Low: It is a massive undertaking.
David DeWitt: Yes, it is a really massive undertaking, and I think there is something to be said about the complexity of the code. We can even talk about the XQuery stuff. At some point we probably should rip all the XQuery stuff out of SQL Server. Maybe it has already happened and I don’t know about it, or maybe it will happen, but there are probably pieces of SQL Server that rarely get exercised that should be pulled out. I think companies should really focus on SQL as the main way that they do data processing, and until something very different from SQL comes along I don’t think we will see any new engines. Go back to the object-oriented paradigm, or to where the XML people thought XQuery was going to replace relational database systems: it didn’t happen. And if you go back to the operating system, we have Windows, we have Linux, and when you think of Linux it is pretty remarkable that somebody did rewrite Unix.
Greg Low: Exactly!
David DeWitt: And built a great ecosystem around a rewrite of Unix. Why that is, who knows? Maybe because they didn’t open source Unix quickly enough, so Berkeley Unix was there. Bill Joy did Berkeley Unix, and that was basically a rewrite of Unix from scratch that Bill did in the early to mid 80s. But I don’t think we will see another relational engine.
Greg Low: Yes, I am trying to remember the name of the one. I was teaching at a university myself in the 1980s, and there was also the one that was in the red book. I was trying to remember the guy who wrote a small Unix clone that was used in a lot of academic courses at the time. Again, it was just published and pushed out for free, and it was great for the university to have something where people could try ideas out and plug them in.
David DeWitt: I think that has happened, and that is really the case with Postgres especially, and to some extent MySQL. If you look at the number of companies that have taken Postgres and started products, it made a great contribution to the field that that piece of code exists. Maybe that is why Facebook put out Presto. They wanted to tell the community, ‘OK, we are truly open source; we are going to be the platform to build upon for scalable query processing over large amounts of data.’
Greg Low: But I must admit, even there, I was noting again yesterday that I saw forum discussions from some of the guys building Postgres, and they were discussing whether or not they should have, for example, a plan cache. You go, OK.
David DeWitt: There is stuff people do. I mean, SQL Server PDW doesn’t have a plan cache yet. It mainly runs long-running decision support queries, and the cost of parsing and re-optimizing something that is going to run for minutes or hours is probably not too significant. For OLTP applications, obviously, a plan cache and stored procs are absolutely required.
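The reasoning here, that caching plans pays off when the same short statement runs over and over, can be made concrete with a toy Python sketch. It is a deliberately simplified illustration and not how SQL Server or any real engine implements its cache: the `PlanCache` class and the `fake_optimize` function are hypothetical names invented for this example, and the cache key is simply the exact query text.

```python
class PlanCache:
    """Toy plan cache: maps query text to an already-optimized plan."""

    def __init__(self, optimize):
        self._optimize = optimize  # expensive step: SQL text -> plan
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql_text):
        plan = self._plans.get(sql_text)
        if plan is None:
            # First time this text is seen: pay for parsing and optimization.
            self.misses += 1
            plan = self._optimize(sql_text)
            self._plans[sql_text] = plan
        else:
            # Same text again: reuse the stored plan.
            self.hits += 1
        return plan


def fake_optimize(sql_text):
    # Hypothetical stand-in for parsing plus cost-based optimization.
    return ("PLAN", sql_text)


cache = PlanCache(fake_optimize)
query = "SELECT balance FROM accounts WHERE id = ?"

# An OLTP-style workload: the same parameterized statement, many times.
for _ in range(1000):
    cache.get_plan(query)
```

With a parameterized OLTP statement the optimizer runs once and the other 999 calls are cache hits. For a decision support query that runs for minutes or hours, skipping one optimization saves comparatively little, which is the contrast drawn above.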
Greg Low: What had me fascinated, though, was that it was actually one of the core people doing development on the product. It wasn’t so much whether or not they had it in the product as yet; the comment that I thought was most telling was that he said he wasn’t sure what one was, and he was trying to come to a conclusion as to whether it was something they should have. I thought, hmm, okay. That sounded like something they might have considered, shall we say, a long time ago.
David DeWitt: Yeah, but you know, I think not everybody in the open source community is familiar with the commercial products, and sometimes they have blinders on about what commercial products have to offer. I doubt that anybody in the Linux community understands how sophisticated Windows Server is, and I don’t mean just Microsoft’s operating system product. Windows Server 2012 has Storage Spaces, or whatever; I never remember the official product name. We did a lot of work in our appliances to get rid of the SAN. We use Windows shared volumes, so we don’t have to have a SAN anymore; we can share volumes among a bunch of nodes in a rack. I doubt the people in the Linux community are aware of the advances that have been made in the Windows operating system. I know Windows 8 didn’t get a great reception; I stuck with Windows 7 for quite a while. But I think Windows Server really does have a lot of pretty neat features.
Greg Low: Oh look, it absolutely does. The thing the Linux folk never really got is that they were always focused on the cost of the operating system. The bottom line is that in any project I was involved with, the cost of the operating system compared to the cost of the project was irrelevant. I simply wanted an operating system that did all that stuff for me. If you have a $400,000 project, whether $1,000 is going to an operating system or not is really not the biggest question. It is much more about what that is going to buy you.
David DeWitt: Right. Frankly, with the move to the cloud, the type of operating system is really pretty irrelevant, and so is even the cost of the database system for those customers that favour the cloud, and more and more will. The cost of the database system is diminishing, especially compared to the cost of people. In the United States, and I am sure in Australia, the burdened cost of hiring a top-notch developer is $200,000 to $300,000. Sure, a SQL Server license is expensive, but if you have a dozen developers, the cost of even a clustered database system is probably lost in the noise of the project.
Greg Low: Yes indeed. In terms of the lack of awareness in the open source community of a lot of what the commercial products do, do the universities bear a bit of the blame for that? I had a young cousin who went through university in the UK, and he was so proud that one of the things he had learnt was an enormous amount of HTML syntax: which attributes went with which nodes and tags and so on. The first day I showed him IntelliSense in Visual Studio, where it just suggested the appropriate ones to me, he nearly cried.
David DeWitt: That may have been cost driven. I spent 32 years there, and we did a lot of Linux stuff right up to that point. We did have some classes that ran on Windows, but for the most part people used editors and wrote their code without modern development environments. Most students graduated without ever seeing a modern development environment. IntelliSense is really pretty phenomenal when you start to use it. Academia is somewhat to blame, but maybe Microsoft deserves some blame too, because it didn’t say we were going to... I think one aspect of open source was cost. When you are trying to teach hundreds and thousands of students, you need lots of seats. Cost was an issue, whether or not there was a licensing agreement. It is no different now: when you go to any academic department you find some small fraction using Windows laptops and everyone else using Macs.
I don’t think they are doing it because underneath there is a command shell, I mean some people are.
Greg Low: No, no, that’s right, there will be the odd person who does, but apart from that, no. Listen, with SQL Server 2014, what are the things you are most excited about in terms of concepts?
David DeWitt: Obviously I’m really excited about Hekaton, or In-Memory OLTP in marketing speak. It was a project where, soon after it was launched in incubation, I actually managed five or six of the early developers as we went from conception to working prototype, although I am not a straight-line dev manager. I think some of the press has been, “Oh, this is just something Microsoft made up; they really don’t have any in-memory OLTP engine.” That is really not the case. I have been working on it for the last five years, and we decided not to do it the way others have, where it was done as a separate database engine. We felt that it was important to integrate it inside SQL Server so that people could easily migrate their applications. There are a lot of limitations; it is really focused on OLTP at this point. I think there is no reason we can’t have it as a great in-memory relational engine over time, but the initial focus is OLTP. There are features missing, but we have had one customer, a betting company, in production for almost 2 years at this point, which is pretty amazing: we let a customer go into production with something that wasn’t even CTP1.
Where Hekaton satisfies the need, whether it be for caching, because there are some people not using it for persistent data but just for caching purposes, with what we call schema-only durability, I think for some set of customers it will really be an extraordinary product. But some customers will look at it and it will run slower. There are limitations. When scanning a large amount of data, the rows in the table are spread all over memory, not stored together inside physical memory, so in what we call interop mode we get a lot of instruction cache and data cache misses. We also don’t have parallelism in interop mode against Hekaton tables.
I think for some customers it will be a great experience, but some customers will find it not useful at all. And because of the way we have done it, customers can migrate just their hot tables into Hekaton and keep the bulk of their tables in SQL Server, with SQL Server clustered indexes or even Apollo. We now have the ability to have very hot in-memory data; Apollo column store tables both in memory and on disk; SQL Server tables; and, with PDW, data sitting in HDFS in Hadoop. I think it is really going to blow the minds of some people, and again we did it differently. Sometimes I wonder whether we should have built a separate product; it would probably have been easier. But we wanted it to be part of standard SQL Server. We wanted people to be able to migrate things and not lose their investment in BI or any of their investments in application programs.
Greg Low: That is right. I think it is a masterstroke, the idea of being able to migrate table by table, because as you say the alternative is that you go to a completely separate product and the whole thing is then complete sink or swim. It must do every single thing you are after in that case, and the chances of that are very low at this point.
David DeWitt: Yes, and there is the fact that you get the same high availability mechanisms through AlwaysOn and HADRON. The total customer experience we are aiming for is to be very friendly, and people have to understand that it is a V1 product. We will continue to have enhancements, continuous enhancements. It is very much now on a short-term release cycle rhythm. It isn’t like 2000 to 2005. We are two years since the last release; next time it might be only 12 months. We certainly will not be three years to the next release. For sure we are trying to crank really, really quickly, and we are updating SQL Azure capabilities all the time.
Greg Low: Actually, that is an interesting question. With the Azure ones, there were discussions at the beginning of last year where they were quite proud that they had pretty much merged the code bases, and the suggestion was that things would start to appear there probably even before the on-premises boxed product. Do you think the intent is to flesh that out properly? A good example is the wonderful windowing functions in T-SQL that were added to the on-premises product in 2012. The Connect item where people asked, “Hey, can we have those in Azure SQL Database?” actually got closed because they didn’t see the need to do it. We were sort of expecting it might be the other way round, and that things might have been tried out there first.
David DeWitt: I think we are back to one code base. There are even things, if you look at the Apollo clustered columnstore indexes, that came out in the appliance before they came out in the boxed product. They are part of 2014, they were not part of 2012, and they appeared in PDW V2 in April. I think for SQL Azure and SQL Server 2012 and 2014 the language surfaces are still different.
Over time we will get the language surfaces aligned. PDW has a slightly different language surface; we don’t have text indexing in PDW at this point, but we will add it at some point. Eventually you will see the case that things appear sooner in the cloud than in the boxed product.
Right now our focus is getting SQL Azure as cost-effective and as stable as absolutely possible. The last time I heard, there were 500,000 SQL Azure databases, growing by some ungodly number like 10,000 or 20,000 a month. We are learning to be a service, and there are challenges to being a service.
Greg Low: Oh yes.
David DeWitt: When stuff goes wrong, it is really noticeable. You see it not just with Microsoft services but when there is a Gmail outage, Google outages.
Greg Low: Yes. At the site I was on yesterday, I was basically embedded in Amazon Web Services for the day. What was intriguing was that one of the people had considered using SQL Database in Azure but was instead looking at SQL Server in RDS, Amazon’s managed database service. One of the reasons they were heading in that direction was simply the compatibility of that. I thought there was an amazing irony there, that Microsoft was being beaten by their own product in somebody else’s environment.
David DeWitt: We are making money on that product, don’t be confused. This should probably be off the record, but we could probably be more aggressive in squeezing Amazon on price.
Greg Low: It is interesting that their managed version of SQL Server is more compatible...
David DeWitt: yes we know that
Greg Low: Than the database one.
David DeWitt: We could do something to respond; well, we probably will. An IaaS version of SQL Server, where we don’t manage it for you. The cloud world is weird. Oracle has come to Microsoft. We are going to sell people Oracle in Windows Azure.
Greg Low: That is amazing.
David DeWitt: SAP. I think one thing that is important to keep in mind is, if you look at the companies that really operate at scale, Google, Microsoft and Amazon, they truly, truly operate millions of machines. I think Microsoft has dozens and dozens of data centres around the world. You look at the intersection of people that operate at scale, data centres at that scale, and people that have first-rate relational database products. The intersection is pretty small. Sure, Google has BigQuery, but nobody is going to run OLTP on BigQuery.
Greg Low: No they are not.
David DeWitt: We are probably a little bit late to the cloud business, but we are learning and we are getting better. I am actually paying; I am running a small SQL Azure database for a swim club. I used to have an on-prem SQL Server box and I would have to do the patches, and I was the DBA and system administrator as well as the application developer. When SQL Azure came out, well, it is a small database and it just manages accounts. I stopped worrying about patches for SQL Server, I stopped worrying about patching Windows, and I focused on the application development. Yes, there are some challenges, like losing connections. We were having to rewrite every single thing to make sure that the connection is still alive. What a pain. But for people who made a living managing SQL Server instances, the future could be challenging.
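The connection-loss problem described above is usually handled by wrapping each database call in retry logic. The following is only a minimal sketch of that idea in Python, not the code used for the swim club application. `TransientConnectionError`, `run_with_retry` and `flaky_query` are hypothetical names invented for illustration; a real application would catch whatever exception its database driver raises for a dropped connection.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's 'connection dropped' error."""


def run_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a transient failure, wait and try again.

    The wait doubles after each failed attempt (exponential backoff).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated query that loses its connection twice before succeeding.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientConnectionError("connection lost")
    return "42 accounts"


# Record the requested delays instead of really sleeping, to keep the demo fast.
delays = []
result = run_with_retry(flaky_query, sleep=delays.append)
print(result, calls["count"], delays)
```

Keeping the retry in one helper means individual queries do not each need their own "is the connection still alive" check.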
Greg Low: Exactly.
David DeWitt: There are some applications which would never move to the cloud. Health care in the United States is one of those because of tougher regulations. Some banking applications would probably never move to the cloud. The bottom line is when you go and look at one of these machine rooms that we run or Amazon runs or Google runs, you have got one or two people running 100,000 servers and patching. I was skeptical for a long time but I think there are going to be more and more people saying, you know, it is just more cost-effective to let someone else manage the hardware.
Greg Low: Oh look, I have been a complete convert for a long time. In fact the idea of offering a platform as a service, just having a service endpoint that talks TSQL, that suits me down to the ground. In fact I was a bit disappointed with Reporting Services, in that they have now nominated that the platform version of that is going away in favor of people going back to running VMs instead. That was a challenging service; the pricing and licensing were wrong. I think the functionality was wrong, things like that, but it seems a pity that they didn’t just push on and try to turn it into a better platform offering rather than just saying, hey, you know, go and run it in a VM. I really think that sort of platform direction is much more where we need to be heading.
David DeWitt: Interesting, interesting. Personally I sometimes wonder why we didn’t push IaaS, and maybe an RDS-style version of that, because we went the PaaS route and there are challenges in terms of packing. We have some customers who have literally thousands and thousands of Azure instances. One of them is in Australia; I am not going to name names because I might get in trouble. It is hard, you know, when you have customers on a PaaS offering. Some bad customers consume all the resources on an instance, whereas if you put them in a VM you have more control over that. They don’t try and get scale through a PaaS offering. Again, I think, looking back, I have been in this business for 40 years. When I first started teaching, we taught people CODASYL databases, and at that point we could run Ingres on the PDP-11/45.
Greg Low: Indeed and we did!
David DeWitt: And we did. I think, you know, the shift now from on-prem to cloud is as big as the shift from CODASYL to relational.
Greg Low: Yes I completely agree.
David DeWitt: Who knows what it will look like in 10 or 15 years, we are certainly not providing a perfect service but we are certainly putting a lot of focus on our failings and we think it is really important to do a really, really bang up job.
Greg Low: Yes.
David DeWitt: On providing this service.
Greg Low: As I said, I see these things as a set of services, and I look at the rest of the platform and it is morphing into a series of services. I think that is a really, really interesting direction. But in terms of the development of the product, I must admit when I look at all the features coming in 2014, the clustered columnstore index, the Apollo stuff, is the one that really appeals to me.
David DeWitt: Maybe you are a data warehouse person rather than an OLTP person. What can I say!
Greg Low: Oh no, I do a bit of both.
David DeWitt: I agree. I think for data warehousing people, we have seen it in PDW. My lab mainly works on the parallel database appliance, and that is where most of the focus of my lab has been, because I am a long-term parallel database person. I like to ship it in an appliance, I like to touch it. We have seen it since it came out; it was available in PDW v2, which shipped in April this year. I know there are not thousands of PDW appliances out there, but for the customers that are using PDW for data warehousing, the clustered columnstore indexing has been a huge performance boost.
It has made us very performance competitive. Maybe, you know, there are still places where we are not competitive with Teradata, in terms of manageability, and Teradata has been in this business since the 80s, so they really have a super, super product. But the customers doing data warehousing on PDW that are seeing the clustered columnstore index are super happy with the performance. For the people that have played with CTP2 of SQL Server 2014, we have got a great product. We have been working on it for a while, through a few releases, to get it properly right. Two releases, I guess.
Greg Low: I look at the 2012 one, the nonclustered columnstore indexes, and that kind of left me cold. There are a couple of spots where you could sort of use it, but an updatable clustered columnstore index is a completely different story. That is a world of goodness. To me it feels like these compression technologies are probably the biggest driving force at the moment. We seem to have gone from the days when memory was limited and everything we lived on was on disk. You just brought into memory the things that you were working on at the time. The push now seems to be to just compress the life out of everything, and if it then fits in memory you can just do everything by brute force. Everything is much simpler.
David DeWitt: Yes, I think the trend has also been that we have, in some sense, excess CPUs on our hands, and we can look forward to having the cycles to do the decompression. The compression has enabled us to fit more in memory. It has enabled us to cut the number of disk IOs.
Greg Low: Does it parallelize well? The compression, because I am guessing it would, across lots of...
David DeWitt: Oh yes, because all the columns are compressed individually. The columns are stored...
Greg Low: So when we do get 80- or 100-core processors, that actually lends itself really well.
David DeWitt: We are nearly there. The DL380, that is 10; the DL580 is four-way by eight, which is thirty-two-way; and the DL980, I don’t know if it is 80-way or 160-way. For not very much money, the single-box SMPs that HP have been producing are really, really superb. Superbly engineered boxes, you know.
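The point that columns are compressed individually, and so can be compressed in parallel, can be sketched in a few lines of Python. This is an illustration only. The transcript does not say which encodings the columnstore uses, so run-length encoding is assumed here as a simple stand-in, and the `table` data and function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def rle_encode(values):
    """Run-length encode one column: [a, a, b] -> [(a, 2), (b, 1)]."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def compress_table(columns):
    """Compress every column independently.

    No column needs data from any other column, so each one can be handed
    to a separate worker. Threads are used only to show that independence;
    they are not meant as a performance claim for Python.
    """
    with ThreadPoolExecutor() as pool:
        encoded = pool.map(rle_encode, columns.values())
        return dict(zip(columns.keys(), encoded))


# Hypothetical column-oriented table: one list per column.
table = {
    "region": ["AU", "AU", "AU", "US", "US", "AU"],
    "year": [2013, 2013, 2013, 2013, 2013, 2013],
    "qty": [5, 5, 7, 7, 7, 7],
}

compressed = compress_table(table)
print(compressed["year"])  # a constant column collapses to a single run
```

Columns with long runs of repeated values shrink the most, which is the kind of saving that lets more data fit in memory at the cost of some CPU cycles to decode.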
Greg Low: Yes, indeed. So do you think there is a time when databases are heading to eventually be like a commodity? I sort of...
David DeWitt: We will always be able to charge money for it.
Greg Low: Oh yes, indeed. Actually for example in the case of TSQL, there seems to be, I keep getting this sort of feeling that there is a lot of people in the development area that seem to think that TSQL is sort of like done or something like that. I just sort of wonder your thoughts on, do you see this still being an area where you think there would be significant things evolving. For example in 2014, it is just interesting there is not a single TSQL enhancement. Do you imagine we are still going to see much in the way of the language itself evolving?
David DeWitt: You know, again, I really cannot speak to that. I don’t really know what people have in mind and I won’t pretend to think that I know. I would say, going back to one of your earlier questions, might we do JSON as a column type at some point? Possibly.
I think right now the organization is focused on performance, high availability, performance of both Hekaton and Apollo, and cloud. I think any resources that we might have put into expanding the language surface are being put into our cloud offering. Until we get to the point where the cloud offering is highly competitive, I think it will be a while before we go back to thinking about language surface.
Do you think it has changed for Oracle? Do you think anything has changed?
Greg Low: Yes, I think it does a bit. I suppose one of the things that does concern me is that I spend a lot of time with software houses. When they don’t see developments surface in terms of how they build better or simpler applications, that sort of thing, I do worry they start to see it as “my grandfather’s database”. I just wonder, is there a need to somehow return some sort of coolness factor or something? I find with a lot of the developers, one of the things they are looking at is not just how do I cut the cost at the back end or how do I do high availability. They are interested in what you are doing to make them able to write code better or simpler, all those sorts of questions.
David DeWitt: Yes, you know, I guess I would like to know more about what they would like. I think we are also very focused, through the BI tools, Power View and Power Query, on giving a very first-class end-user experience, and maybe that is what the developers are feeling. We are trying to enable end users to use these advanced analysis tools in Excel, which may feel like cutting the developers out of the loop. There is not much more I can say to respond to that.
Greg Low: Indeed, I think it was more just around the idea that the language has become a fairly. There are advantages in the language becoming very static.
David DeWitt: We might be better off deprecating some things like XML, XQuery.
Greg Low: Actually, what are your thoughts on CLR integration, I suppose, while I have got you on the topic? I know it was incredibly topical when it appeared, but it always felt to me that if they were actually doing it, they didn’t take it far enough. Now it is kind of caught in a funny limbo land.
David DeWitt: Again, product renewal would be great, but we have X amount of resources and we are putting those resources into the areas of performance and cloud and...
Greg Low: I was wondering more in terms of should it be a candidate to come back out again?
David DeWitt: You know, I wouldn’t think so. I have never heard taking CLR integration back out being discussed.
Greg Low: I must admit when I look at it, I actually thought the ability to build data types was kind of interesting when it appeared, but the thing that seemed most missing to me was the ability to build a custom index type based on the data type. I had people who had Oracle data cartridge projects and things, and they were interested in storing different things to your typical OLTP. They were looking at storing something like chemical properties and looking at reactions and so on. Their whole idea was that if you store something a little more complicated, it is not much use unless you can index it appropriately. It was interesting that at the time they said, oh no, you don’t need that capability, but when they then used it themselves to build the spatial data types, which they did an astonishingly good job of, they immediately introduced their own custom index type to support the spatial.
David DeWitt: You know, that was certainly the motivation when Mike started the Postgres project. There was the GiST structure, developed for generalized index structures or something like that. I can’t remember if Joe Hellerstein was part of this or wasn’t.
Greg Low: I know that the Postgres one that was wickedly complex.
David DeWitt: I think the challenge is, it is one thing to do extensible indexes in a read-only setting. But extensible indexes in a setting where you have got to worry about updates are really hard for users to get right. It would be really nice to have not just extensible types but extensible indexes, but again I don’t hear anybody talking about that.
Greg Low: I can’t imagine it even being on the radar.
David DeWitt: It is not on anybody’s radar screen at this point.
Greg Low: What keeps you excited on a daily basis now? In terms of things coming up?
David DeWitt: Most of my team has been... what I can talk about, okay, there are some things I can’t talk about because we haven’t announced them. As I said earlier, I have a very small team, there are only eight of us, and we have been very involved with the SQL Server PDW appliance team since the DATAllegro acquisition happened. We were instrumental in the 3 release, which brought query optimization to PDW, and we are working on releasing the second version of what we call PolyBase, which is the integration of PDW with Hadoop. The first version of that is part of the April PDW v2 release, and we are putting the finishing touches on the AU1 release, which will allow the system to push predicates into Hadoop as jobs.
If you are not familiar with it, basically it extends PDW with the notion of an external table. You create an external table through PDW for data that sits in Hadoop, and you can use the standard TSQL interface to query that data. You can combine data in Hadoop with data in a standard SQL Server table, and you can join two Hadoop tables. Other companies have this notion of an external table, which we have taken a little bit further, in that if you have a PDW appliance connected to a large Hadoop cluster and you write a query with a predicate or some functions over data in Hadoop, we will actually push that predicate as a MapReduce job.
This is really different compared to what Oracle has done with its external tables. We are trying to provide a very, very high performance, TSQL-compatible interface with what we call SQL Server semantics, which is why we don’t compile into Hive as the interface to big data. So companies that have SQL Server installations and Hadoop installations can combine data from both worlds and get business value out of it. Standard BI tools work, blah, blah, blah. We are just finishing what we call the AU1 release of the V2 version of PDW.
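The benefit of pushing a predicate to the external system, rather than pulling every row across and filtering afterwards, can be shown with a toy Python sketch. This is not PolyBase code and makes no claim about how PDW or MapReduce implement it. `external_scan`, `hadoop_logs`, `customers` and the `bytes > 1000` predicate are all hypothetical names and data chosen for illustration.

```python
def external_scan(rows, predicate=None):
    """Hypothetical external source (stand-in for data held in Hadoop).

    If a predicate is pushed down, filtering happens at the source and
    only matching rows are shipped back to the caller.
    """
    for row in rows:
        if predicate is None or predicate(row):
            yield row


def is_large(row):
    """The query's predicate: keep transfers over 1000 bytes."""
    return row["bytes"] > 1000


# Made-up external data and a made-up local (relational) table.
hadoop_logs = [
    {"customer_id": 1, "bytes": 120},
    {"customer_id": 2, "bytes": 5000},
    {"customer_id": 1, "bytes": 2500},
    {"customer_id": 2, "bytes": 80},
    {"customer_id": 1, "bytes": 40},
]
customers = {1: "Contoso", 2: "Fabrikam"}


def query(push_down):
    """Join large log rows to customers; return (result, rows shipped)."""
    if push_down:
        shipped = list(external_scan(hadoop_logs, is_large))
        matching = shipped
    else:
        shipped = list(external_scan(hadoop_logs))
        matching = [row for row in shipped if is_large(row)]
    joined = [(customers[row["customer_id"]], row["bytes"]) for row in matching]
    return joined, len(shipped)


with_pushdown, shipped_pushed = query(push_down=True)
without_pushdown, shipped_all = query(push_down=False)
print(shipped_pushed, shipped_all)
```

Both plans return the same joined rows; the difference is how many rows have to leave the external system before the join runs.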
Greg Low: That is great!
David DeWitt: It is exciting to see customers using it in all sorts of different ways. Some companies are required to keep data online for auditing purposes. They are rolling monthly data out into Hadoop and yet they are still able to run their reports on really cold data. Some people are using Hadoop to cleanse data before it gets loaded into the appliance. I see a lot of really positive feedback on that. When we will bring it to the cloud, when we will bring it to the boxed product, these are all questions we are trying to answer.
Greg Low: Yes no indeed. Listen so the PASS summit, you seem to single handedly turn the morning keynotes to something that people want to attend. Which is great, that has been fairly recent. Where might people see you upcoming anytime, or anything else coming up in the near future?
David DeWitt: One a year of those is plenty.
Greg Low: One a year, okay.
David DeWitt: You know those talks, I know they have been very well received and I am very appreciative of the audience. It is such a wonderful group to speak to. Unlike a bunch of undergraduates, who hate 9 in the morning, they are not all out there updating their Facebook status. They are tweeting and not spending enough time paying attention. Those talks take me 2-3 months of my year to prepare, between conception of the talk, what is going to be in the talk, and the PowerPoint. They are enormously time consuming. One a year is all I can take. They are very stressful, not the presentation part; I have discovered it is no different talking to 5000 people than 500 people.
Greg Low: Yes that is right, it is no different at all.
David DeWitt: The expectations are very high, making sure the expectations are met is a challenge. One a year, I have no ideas for next year, I am not sure I will do one next year. But so far I have no ideas. If I am going to do another one I better start thinking of an idea.
Greg Low: That is good. Well listen so thank you very much today for your time David. It has been most interesting!
David DeWitt: Yes, it has been fun talking and I appreciate the interest in Microsoft and SQL Server. I am always interested in new ideas and things we should be doing in the product, so I encourage your listeners, if they have got ideas or they have rants they want to send to me, to pass them on and I will pass them on.
Greg Low: That is great.
David DeWitt: Negative feedback is always appreciated because it can always improve our products.
Greg Low: It is funny, actually when they talk about science, I saw someone the other day commenting that a scientist must have been upset when he was shown to be wrong on something, and I always think any time a scientist is shown to be wrong they just love it.
David DeWitt: Yes, absolutely. I think it is part of doing science; sometimes you get it right and sometimes you get it wrong. Okay.
Greg Low: Thank you so much David!
David DeWitt: You bet, have a good day.