Itzik Ben-Gan
SQL Down Under Show 6 - Guest: Itzik Ben-Gan - Published: 5 Sep 2005
SQL Server MVP Itzik Ben-Gan discusses techniques for using the T-SQL language to solve common problems, shows how considering a negative point of view might help and discusses the upcoming enhancements to T-SQL in SQL Server 2005.
Details About Our Guest
Itzik Ben-Gan is a mentor and one of the founders of Solid Quality Learning. He teaches, lectures, writes and consults about SQL Server internationally. He has been working with databases and data warehouses since 1992 and has been focusing on SQL Server specifically since 1998. He writes monthly columns in SQL Server Magazine, and has co-authored the book Advanced Transact-SQL for SQL Server 2000. He founded the Israeli SQL Server and OLAP Users Group in 1999 and has been managing it since. He is an MCT, an MCDBA and a current SQL Server MVP.
Show Notes And Links
Show Transcript
Greg Low: Introducing show number six with guest, Itzik Ben-Gan.
Our guest this evening is Itzik Ben-Gan. He is a mentor and one of the founders of Solid Quality Learning. He teaches, lectures, writes and consults about SQL Server internationally. He has been working with databases and data warehouses since 1992 and has been focusing on SQL Server specifically since 1998. He writes monthly columns in SQL Server Magazine, and has co-authored the book Advanced Transact-SQL for SQL Server 2000. He founded the Israeli SQL Server and OLAP Users Group in 1999 and has been managing it since. He is an MCT, an MCDBA and a current SQL Server MVP. Welcome Itzik Ben-Gan.
Itzik Ben-Gan: Thank you, glad to be here.
Greg Low: First, tell us how you came to be involved with SQL Server at all?
Itzik Ben-Gan: I guess it started about 12 years ago; it was a course in the army. Not SQL Server specifically, more SQL. That's my main interest even though I'm focused on SQL Server. I was in a course in the army, and it was the first time I saw the language, on IBM mainframes. The language is standard, and I fell in love. The whole idea behind relational programming and the concept of the SQL language was amazing. Then I started working in various fields within computing, in the army and after. I tried many things to gain a range of experience: programming, system administration, networking, mainframes, and so on. Then I chose the part I felt I liked best and was also good at, and started drilling down vertically, diving into it.
Greg Low: The first time I came across you was many years ago, on the Microsoft trainer newsgroups.
Itzik Ben-Gan: Right. Basically I used to train… I had a lot of field experience before training. I always wanted to train. I didn't know that training was what I was best at, but I liked doing it. I could have guessed, because all my family are teachers: my father teaches mathematics, my sister teaches English and my brother teaches architecture. At some point I thought of taking a part-time job in training. When I started, it was such a good fit for me that I decided to focus on training and do the other stuff less extensively. At the beginning I trained networking and the other aspects of computing and programming. Then, when I started to train SQL exclusively, I learned this is what I want to focus on.
Greg Low: Your father’s interest in mathematics, did that lead to your interest in puzzle-solving?
Itzik Ben-Gan: Definitely. My father loves puzzles; many of the ones I have I got from him. SQL itself is so logic-oriented, and pure mathematical and logic puzzles have so much in common with it, in the thought processes involved and the patterns of thinking. I'm sure it definitely had something to do with it.
Greg Low: Recently you were in Australia, traveling around. One of the things we got to see was some of the sessions where you were covering puzzle-solving; clever ways to make use of T-SQL.
Itzik Ben-Gan: The idea is that I try to show different aspects of logical problems and how they relate to SQL problems. Maybe the topic sounds very theoretical, but it's very practical. You take any problem and realize that within it is a pure logical problem. If you practice those logical problems and puzzles, you improve your SQL problem-solving capabilities.
Greg Low: Give us an example?
Itzik Ben-Gan: Sure. So let's take the idea of what I call reverse logic, or negative thinking. A pure logical puzzle is a very ancient one where two gods stand in front of two doors. One door leads to sudden death and one leads to gold and a lot of money. One god always lies and one always tells the truth. You have only one chance to ask one question, and you don't know who is the liar and who is sincere. Without reverse logic you can't be 100 percent sure which door leads to the gold. The question to ask is: "If I asked the other god which door leads to the gold, what would he say?" Either god you ask will point to the incorrect door. If you ask the liar, he knows the sincere one would point to the correct door, but he wants to lie, so he points to the incorrect door. If you ask the sincere one, he knows the other is a liar and would point to the wrong door, so he truthfully reports the wrong door. Either way, you go to the other door. In SQL there are many types of problems where you can apply similar logic. Relational division is one. You have a certain set of elements and you want to figure out whether the set contains a whole subset of elements. For example, you have order lines and you want to know which orders contain a certain basket of products. It's not as simple as having the different attributes you need to investigate in a filter; here, each product appears in a different order line. In order to figure out which orders contain the whole basket of products you have to apply these relational-division techniques, and some apply reverse logic. Which orders contain products A, B, and F, for example? You can say "give me all orders for which you cannot find any product in the list that does not appear in the order." By applying the negative idea twice, two negatives give you a positive: you only get the orders that contain the whole list of products. In some cases you'll see the technique of applying reverse logic gives similar performance to applying positive logic, in some cases worse, in some cases better. It's another way of thinking, another tool in your toolbox you can always use, and sometimes it's dramatically faster and simpler.
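A rough sketch of that double NOT EXISTS idea, assuming hypothetical Orders and OrderDetails tables and a small table variable holding the wanted basket of products:

    -- Which orders contain every product in the basket?
    DECLARE @Basket TABLE (productid INT PRIMARY KEY);
    INSERT INTO @Basket (productid) VALUES (1);
    INSERT INTO @Basket (productid) VALUES (2);
    INSERT INTO @Basket (productid) VALUES (6);

    SELECT O.orderid
    FROM Orders AS O
    WHERE NOT EXISTS            -- there is no basket product ...
          (SELECT *
           FROM @Basket AS B
           WHERE NOT EXISTS     -- ... that is missing from the order's lines
                 (SELECT *
                  FROM OrderDetails AS OD
                  WHERE OD.orderid = O.orderid
                    AND OD.productid = B.productid));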
Greg Low: Any other straightforward example?
Itzik Ben-Gan: There are cases when you need to apply the simplification idea from logic. Whenever you get a problem, your mind is used to thinking in one way. There are cases where the solution is so simple it's under your nose but you don't see it. The idea in logic here is to try to relax limitations and think in simpler terms; don't get distracted by details. For example, you have a chocolate bar made of five columns and eight rows. You're supposed to split the chocolate by making straight cuts, either horizontally or vertically, without stacking one chunk on another, and each cut is applied to a single chunk. What is the minimal number of cuts you need to make to split it into the individual pieces? Most people try to think "let's cut in the middle first, horizontally, and then again in the middle of each chunk, then start vertically." You start getting confused by the count of the cuts you've made; every time you add up the cuts you get a different number, which makes you believe there are different answers. If instead you think of the chocolate as a single piece or chunk, after you make one cut you have two pieces. You have to deal with each piece individually. Take one of the pieces and make a cut; now you have three pieces. To get to the 40 individual pieces there's no other way than 39 cuts. It's a matter of simplification. In terms of SQL, there are many types of problems where, looking at the request, you start thinking in more traditional terms: nesting sub-queries, using iterative logic, and so on. In many cases the solution is so simple you didn't think about it. For example, you need to count the number of occurrences of a string within a string. Most people will think in terms of a loop, using iterative logic, identifying the position of the first occurrence, then going on to the next, and so on. Instead you can use the REPLACE function and substitute all occurrences of the substring with an empty string. The difference in length between the two strings is the total length of all the occurrences you removed. Take the difference between the original length and the new length, divide by the length of the substring you removed, and you have the number of occurrences. It's simply not the way you would intuitively come up with originally, but once you figure it out it makes a lot of sense and sounds very simple.
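A minimal sketch of the REPLACE trick, with made-up values:

    -- Count occurrences of a substring without a loop.
    DECLARE @str VARCHAR(100), @find VARCHAR(10);
    SET @str  = 'abcabcaXbc';
    SET @find = 'abc';

    -- Length removed by REPLACE, divided by the substring length = occurrence count.
    SELECT (LEN(@str) - LEN(REPLACE(@str, @find, ''))) / LEN(@find) AS occurrences;  -- 2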
Greg Low: Do you think you potentially run the risk of making it more difficult for somebody else to follow what you’ve done?
Itzik Ben-Gan: Not necessarily. Actually I think many of these techniques are a matter of getting used to this sort of pattern. The first time you hear of a concept it's very tricky, very confusing, it sounds strange. This is what I like to call a fundamental key technique. My main approach when people ask me "how do you learn SQL? How do you become good at programming in T-SQL?" is this: the approach I find powerful is to be able to identify key elements within the problem, then come up with the fundamental key technique to solve it, then work on it from various angles, polish it, tune it, and then it's another new technique in your toolbox. For example, the relational division and reverse-logic approach I mentioned is implemented in SQL as two nested NOT EXISTS predicates. Very tricky to figure out the first time you see it. Once you get used to the idea it simply becomes part of your vocabulary; it's natural. Take another tough problem. You have an array of elements, a comma-separated list of values, one per row, and you need to normalize this data, split it into individual elements. The technique to solve this is to join the table that contains the arrays with an auxiliary table of numbers. The auxiliary table of numbers is one of the most powerful key techniques I use in so many solutions. It's a sequence of integers, one and on, as many as you need. Join the arrays table with the numbers table and make the join condition based on the character at the Nth position being equal to a comma; you get a match for each element. In general, you use such an auxiliary table of numbers to generate duplicates. With the separating-elements problem, when you look at it for the first time, you need to identify that out of each base row you will end up generating multiple duplicate rows, and in another step you will isolate each individual piece. Once you realize that the key element is generating duplicates, and that you do it using an auxiliary table of numbers where N is less than or equal to some number and a character is equal to something, it becomes part of your vocabulary. In another problem, when you identify a step that involves generating duplicates, you'll know it calls for a join to an auxiliary table of numbers.
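A sketch of the split technique he describes, assuming hypothetical Nums(n) and Arrays(arrid, arr) tables, where Nums holds the integers 1, 2, 3, … and arr holds a comma-separated list:

    -- Split comma-separated lists into rows using an auxiliary table of numbers.
    SELECT A.arrid,
           SUBSTRING(A.arr, N.n,
                     CHARINDEX(',', A.arr + ',', N.n) - N.n) AS element
    FROM Arrays AS A
    JOIN Nums   AS N
      ON N.n <= LEN(A.arr)
     AND SUBSTRING(',' + A.arr, N.n, 1) = ','   -- keep only positions that start an element
    ORDER BY A.arrid, N.n;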
Greg Low: I noticed when you were doing a lot of the puzzles that you tended to have an auxiliary table of numbers; you just build a table with a whole sequence of numbers, use that in lots of different problems.
Itzik Ben-Gan: Right. It helps in simply an amazing number of problems. One is separating elements, which I just mentioned; another is a very practical problem where you need to identify what are called gaps and islands. It might be in integers, but more typically it appears in sequences of dates. A process reports periodically that it's online, once a day or once every four hours, and then for a certain period of time the process goes down. You need to figure out when the process was down and when it was up, and return this information as ranges of dates, from date to date. This is another example where people usually tend to think in traditional terms. Once you figure out that you can use numbers here, in this case not really an auxiliary table of numbers but something called row numbers, you can get a fantastically simple solution. The technique that solves the problem in a very simple manner using row numbers applies only to the islands problem, where you figure out when the process was available. When you want to figure out when the process was unavailable, use an auxiliary table of numbers and request all numbers from the auxiliary table that do not exist in the current table. If these are dates, you can use a certain base date plus the N value minus one to generate an auxiliary table of dates. Generate the inverse data: grab everything from the auxiliary table where the value does not exist within the current table and is within the two extreme points. Now you're back to the islands problem and you can implement the solution that uses the row numbers. That's another example. I use this auxiliary table for so many things I could easily dedicate a whole day to the auxiliary table of numbers alone.
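One common way to express the islands part with row numbers (2005 syntax), assuming a hypothetical Heartbeats(logdate) table with one row per day the process reported in:

    -- Islands: collapse consecutive dates into ranges using row numbers.
    -- For consecutive dates, (logdate - row number) is constant, so it forms a group.
    WITH D AS
    (
      SELECT logdate,
             DATEADD(day, -ROW_NUMBER() OVER(ORDER BY logdate), logdate) AS grp
      FROM Heartbeats
    )
    SELECT MIN(logdate) AS range_start, MAX(logdate) AS range_end
    FROM D
    GROUP BY grp;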
Greg Low: That's great. One thing I would like to ask you about while we're talking on the subject of T-SQL… There's been a bit of discussion around the country on coding standards. I'm interested to hear your thoughts in general about any coding standards you apply to T-SQL.
Itzik Ben-Gan: Right. First of all, I don't think that everyone should use the same coding standard. Standards are good, don't get me wrong, but I think this should be some sort of mix between a standard and what you feel is best and most natural for you. It's very personal. I find it similar to the way you speak: everyone uses English, which may be the standard language, but everyone has their own vocabulary and their own personal way of saying things. It's important that people understand you, but it's also important to have your own personality and your own character within it. Coding style in SQL is very similar. The most important thing is that it should be easy to read and easy to understand, not just for you but for all other people. For example, some people I've seen use only uppercase, all across the code. This is terrible; I can't believe someone would feel this is natural or obvious. Maybe it's easy to make everything uppercase, but it's hard to read; it hurts your eyes. I can give you examples of what I use that, with time, I figured out is both convenient for me and is readable and easy to maintain by others. For keywords I use uppercase: SELECT, FROM, WHERE and so on, exclusively uppercase. For functions, the same thing, uppercase. For expressions using column names I use exclusively lowercase; all column names are lowercase. If I have to deal with existing tables I would use the casing that the table has, because of case-sensitivity issues if the environment is case-sensitive.
Greg Low: If you have multi-words in a column, do you then use underscores?
Itzik Ben-Gan: Yes. The question is, if it's just a two-word column and it's fairly short, and I feel that it's readable without underscores, I won't use them; I try to avoid them. If it's made of three words and they're lengthy (I try to avoid very lengthy strings) and I feel it contributes to readability, I will use them. That's my guide. Indentation is also very important. Typically, if the query is lengthy, I will put the SELECT on its own line and the FROM on its own line; each table in a join appears separately with some indentation in front of it, and the join condition appears below the table name with additional indentation. For, say, the select, group by, or order by list: if the list is short I'll place everything on one line along with the keyword; if it becomes too long I break it into several lines. There is one exception, even though it's not good education: sometimes when I answer questions in the newsgroups I'm just in a hurry and want to give a reply, so I just make everything lowercase. It's not something people should use as an example. If I have the time I will use proper casing like I'm used to.
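As a rough illustration of the conventions described above, using hypothetical Northwind-style tables and columns:

    -- Uppercase keywords and functions, lowercase column names,
    -- each joined table on its own indented line, join condition indented below it.
    SELECT C.customerid, C.companyname, COUNT(*) AS numorders
    FROM dbo.Customers AS C
      JOIN dbo.Orders AS O
        ON O.customerid = C.customerid
    WHERE C.country = N'Australia'
    GROUP BY C.customerid, C.companyname
    ORDER BY numorders DESC;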
Greg Low: That’s great, thanks for your thoughts on that. What we might do then is take a short break; we’ll talk about changes coming up in T-SQL when we get back.
Greg Low: What I'll do next is get you to tell us a little about yourself and your family before we talk about the things in T-SQL that are coming in 2005.
Itzik Ben-Gan: I'm based in Israel but spend most of my time traveling; mainly the U.S., also Europe, and I've just started in Australia. I live in a small country place called Tel Mond. We moved here recently, and since then I've really gotten into gardening and dogs. My wife always kept us from having dogs until we moved to a place with a small piece of land. Now I have the opportunity to garden and I have two dogs. They are called Canaan dogs; not "canine" as most English speakers hear it, but Canaan dogs, an Israeli breed. They were originally wild dogs; at some point the defense forces in Israel, before the country was formed, were looking for a dog that could work and adapt to the very extreme weather in the deserts, and they found this wild dog.
Greg Low: It’s named after the region?
Itzik Ben-Gan: Yes, Canaan is the biblical name for Israel.
Greg Low: Okay. Great. Well, there are quite a number of things. When people first started discussing the idea that you'd be able to write stored procs in VB and C# and so on, there was discussion about whether T-SQL was dead. That seems far from the case; there's quite a lot of work being put into T-SQL in this edition. I'm interested in your thoughts on which parts of that you think are significant.
Itzik Ben-Gan: I think there are several very important or crucial issues in all this .NET integration within the product. There's a lot of potential and a lot of risk in it; good potential and bad potential. Basically, T-SQL is by far the strongest language to deal with data manipulation or set-oriented activities, any data-access activities, especially those that query large portions of data. There's nothing faster than a relational language. But there are many things programmers try to achieve using T-SQL that T-SQL was never designed to do. Those have more to do with complex algorithms, iterative logic, formatting issues, math calculations, and so on; T-SQL was never designed to deal with these. You can generalize this as CPU-intensive manipulation. Handling all those elements using the CLR or .NET integration would be fantastic. That's one issue: whether programmers will keep this in mind and not try to use one tool for the other purpose, trying to force SQL to do what the CLR is intended to do and the other way around. That's the risk I was talking about. Let's say a programmer without knowledge of T-SQL and relational programming tries to write a C# stored procedure that does a simple update by opening a cursor, iterating through the records one after the other and updating them, instead of doing a simple UPDATE. This will happen, and it will be terrible in terms of performance. There's room for education here. Another big issue is the concern of people who don't have the skill set: all those DBAs and all those programmers who deal exclusively with the relational environment and never had anything to do with programming in VB or C# and so on. This is a great concern for them.
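To make the contrast concrete, this is the kind of single set-based statement that a row-by-row cursor loop would be reinventing (hypothetical table and column names):

    -- One set-based UPDATE replaces a cursor that walks the rows one by one.
    UPDATE Employees
    SET salary = salary * 1.10      -- e.g. a 10 percent raise
    WHERE departmentid = 5;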
Greg Low: Yeah. So, what about the changes to the language itself? To T-SQL?
Itzik Ben-Gan: This is very interesting. First of all, there are many enhancements to T-SQL in 2005, but two are the most significant. The one I like best is the one that probably most people won't immediately realize is so powerful: the ROW_NUMBER function. Most of these features are ANSI SQL compliant, completely standard. ROW_NUMBER is simply a function that assigns a sequential integer to rows in a row set, based on a given ORDER BY; you determine the sort by which those row numbers are assigned. Optionally you can specify a PARTITION BY clause. If I want to assign row numbers to each customer's orders separately, the numbering would start from one and go on for each customer separately. It looks very simple.
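For example, something along these lines against a hypothetical Orders table:

    -- Sequential row numbers restarting for each customer, ordered by date.
    SELECT custid, orderid, orderdate,
           ROW_NUMBER() OVER(PARTITION BY custid
                             ORDER BY orderdate, orderid) AS rownum
    FROM Orders;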
Greg Low: The point is that the order of the results is not necessarily related to the order of the row numbers that come out.
Itzik Ben-Gan: It doesn't have to come out in the same order. If you think in terms of the optimizer, the data needs to be pre-sorted in order to calculate those row numbers, and if you have an index it will use the index. Typically, since the data was already sorted when the numbers were assigned, you would get the results in that order; but not necessarily, of course. Maybe you calculate the row number based on one column but added other elements to the query that caused the optimizer to access the data in a different order. You're right, it's not guaranteed; there will just be cases where, because of the access methods, the rows come back in that order. The number of applications that this ROW_NUMBER function has is astonishing, far beyond the obvious scoring and ranking applications. I already gave one example when I talked about the islands and gaps problems: without row numbers it's complex to solve them; with row numbers it's a no-brainer once you figure out the technique. I use them to optimize sorting of structures (graphs, trees, hierarchies) and for so many different applications, paging for example. I could keep going on and on. It's simply fantastic, and so fast in SQL Server 2005 because of the specialized operators and techniques that calculate row numbers by scanning the data only once; if it's pre-sorted in an index there's no need to sort it. Today in SQL Server 2000, if you're looking at standard techniques to calculate row numbers, they are amazingly slow; they have N-squared performance degradation as the table grows larger. Try to calculate row numbers using a sub-query in SQL Server 2000: for 100,000 rows it could take half an hour, while doing it with the ROW_NUMBER function takes a split second. There are non-standard techniques in 2000 to calculate those: you can use the IDENTITY property, create a table with an identity column and do an INSERT SELECT, but then you need to do this in two phases and materialize the results. That's slower than row numbers, but faster than the standard technique in 2000.
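The slow pre-2005 pattern he refers to looks roughly like this correlated COUNT sub-query (hypothetical Orders table, unique orderid assumed):

    -- SQL Server 2000 style row numbers: each row counts how many rows precede it,
    -- which gives roughly N-squared work as the table grows.
    SELECT O1.orderid, O1.orderdate,
           (SELECT COUNT(*)
            FROM Orders AS O2
            WHERE O2.orderid <= O1.orderid) AS rownum
    FROM Orders AS O1
    ORDER BY O1.orderid;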
Greg Low: I was going to ask, at what stage during query processing is the row number calculated?
Itzik Ben-Gan: It really depends on where you place it in the query. There are only two elements within the query where you can place this ROW_NUMBER and the other analytical ranking functions. We should generalize this to what are called windowing functions; there are other elements within the language that have the same logic. We're talking about an OVER clause, which is available for scalar aggregates and is similarly available for the analytical ranking functions; same thing with ROW_NUMBER. The OVER clause applies the calculation to a window; it's as if a whole window of elements is available to the function, not just the current row, without the need to say GROUP BY. The only places where it's allowed are the select list and the order-by list. If you're thinking in terms of the logical processing phases, it's almost at the end: after the data is joined, filtered, grouped, and after the groups were filtered. Now we're finalizing the results that will be returned to the caller, in the select list or the order-by list; this is the phase where it applies. It's tricky to talk in terms of "when" with the optimizer, because in many cases it takes shortcuts; you think it's applied in a certain phase but in practice it found a way to do it earlier and still return the correct results. Logically, the rest of the query is evaluated first, then the select list is processed, then the order-by clause is processed; these are the last two phases in logical processing. If you need to filter or group, logically you do those before, and in many cases physically you do those before calculating the row numbers, because the row numbers sit on top.
Greg Low: I think that's the thing I was getting at; they're not applicable in a WHERE clause.
Itzik Ben-Gan: Right. It's a very simple issue to deal with. Technically they are not available in the WHERE clause, so you use a derived table. This is true even today when you want to re-use an alias that you create in the select list: it's not re-usable because the select is the last logical phase applied, so you cannot re-use the alias in the WHERE clause. You create a derived table that only has the SELECT and FROM clauses, and an outer query can re-use the alias. In similar terms, you calculate the row number, then you create a derived table, or use a new construct in 2005 called a common table expression, which is similar to a derived table. In an outer query it's available to you: filter by it, group by it, do whatever you want with it.
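A sketch of that pattern, paging through a hypothetical Orders table by wrapping the row number in a CTE and filtering in the outer query:

    -- Row numbers cannot appear in the WHERE clause directly,
    -- so wrap the query in a CTE (or derived table) and filter outside.
    WITH NumberedOrders AS
    (
      SELECT orderid, custid, orderdate,
             ROW_NUMBER() OVER(ORDER BY orderdate, orderid) AS rownum
      FROM Orders
    )
    SELECT orderid, custid, orderdate
    FROM NumberedOrders
    WHERE rownum BETWEEN 21 AND 30;   -- e.g. page 3 with a page size of 10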
Greg Low: In fact, that is probably a good point to discuss: as well as the ROW_NUMBER function you mentioned, we've also got the RANK, DENSE_RANK and NTILE functions there. That leads into common table expressions, a new area.
Itzik Ben-Gan: Right. This is another construct that is based on SQL:1999. A common table expression has two forms. One is non-recursive, which is similar to a mix of the existing table expressions in SQL Server 2000: derived tables, views, and inline table-valued functions. Take the advantages of all three and you have a common table expression. On one hand it's inline, available only to the query; it can accept arguments, variables and so on, and you use them within your code. And you don't have to declare the CTE multiple times if you need to refer to it multiple times in the outer query, like you have to do with derived tables. That's basically the non-recursive form. The main power of this construct is its recursive capabilities: you get recursion within pure set-based queries. All the manipulation of graphs and trees uses a lot of recursive activity, and all of it can utilize this new construct. It's very elegant and requires very little code to achieve something you would need a lot more code for in 2000. It's very fast also; it isn't implemented behind the scenes the way some other platforms do it.
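A minimal recursive CTE sketch, assuming a hypothetical Employees(empid, mgrid, empname) table and returning one employee plus all subordinates:

    WITH Subs AS
    (
      SELECT empid, mgrid, empname          -- anchor member: the root employee
      FROM Employees
      WHERE empid = 3
      UNION ALL
      SELECT E.empid, E.mgrid, E.empname    -- recursive member: direct reports of the previous level
      FROM Employees AS E
        JOIN Subs AS S
          ON E.mgrid = S.empid
    )
    SELECT empid, mgrid, empname
    FROM Subs;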
Greg Low: I noticed the default nesting level for that is 100 as well. From my reading it can go much higher; you can have quite large nesting levels with that.
Itzik Ben-Gan: If you set MAXRECURSION to zero…
Greg Low: So it can go deeper than that?
Itzik Ben-Gan: If you set MAXRECURSION to zero there is no limit. By default it's set to 100, just because you want to avoid an infinite recursive loop caused by an unintended cycle within some graph or tree. If a cycle is intended, there are techniques to identify it and stop the traversal when you detect it. There are cases where you have bugs in your data and you want to avoid these infinite cycles; this is simply a safety measure.
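The limit is controlled with the MAXRECURSION query hint; a small sketch using the same hypothetical Employees table:

    WITH Subs AS
    (
      SELECT empid, mgrid FROM Employees WHERE empid = 3
      UNION ALL
      SELECT E.empid, E.mgrid
      FROM Employees AS E
        JOIN Subs AS S ON E.mgrid = S.empid
    )
    SELECT empid, mgrid
    FROM Subs
    OPTION (MAXRECURSION 0);   -- 0 removes the default 100-level safety limit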
Greg Low: Speaking of bugs and errors and things, another thing is the structured exception handling.
Itzik Ben-Gan: Oh, right. This is another of my favorite elements.
Greg Low: Favorite things?
Itzik Ben-Gan: One of them. ROW_NUMBER is my favorite, then common table expressions, then exception handling. For exception handling in SQL Server 2000 I have no other word than "awkward." It was non-structured. You had to check the @@ERROR function after every statement that was suspect and switch between the different possibilities, maybe with GOTO statements and an error-handling label. This is terrible, and even worse, in SQL Server 2000 there are so many errors that are not considered severe errors yet still terminate your batch. Even a conversion error: you try to insert a character into an integer data type, this causes a conversion error that terminates your batch, so your error-handling code doesn't have a chance to run. Similarly with deadlocks. There are many problems you want to be able to trap that you can't with T-SQL. Both these issues with error handling were solved by a new construct in 2005. It's an elegant construct, similar in some sense to try/catch in an object-oriented environment, but without the object-oriented elements of throwing objects. You have a BEGIN TRY and END TRY block where you place your suspect code, and a BEGIN CATCH and END CATCH block you go to as soon as an error is detected. Almost all errors are trapped, not 100 percent; the ones that aren't are the ones you can understand can't be trapped, problems that would terminate your connection and not just the batch, which is reasonable. Even compilation and resolution errors are trappable at higher levels with TRY/CATCH. In the CATCH block there are three cases. One is no transaction at all: you get to the CATCH block and you identify with the XACT_STATE function that there is no active transaction. Another case is an open transaction that is still open and active; what you want to do with it, continue and commit or roll back, is your decision. The third case is for errors with a high severity level: these will enter a doomed state, where the transaction is not committable, but you still hold all the locks so no one else can touch the data, and you can still query and investigate it. You can't do anything that would write to the transaction log; if you want to do something that writes data, you have to roll back, and then do whatever writes you need in a new transaction. You can trigger this doomed state for any error, even a non-severe one, if you set the XACT_ABORT session option on; that causes all errors, without exception, to enter the doomed state.
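A sketch of the construct, with a hypothetical table T1 and a deliberate error:

    BEGIN TRY
      BEGIN TRAN;
      INSERT INTO T1 (keycol, datacol) VALUES (1, 'A');
      INSERT INTO T1 (keycol, datacol) VALUES (1, 'B');  -- assume this violates T1's primary key
      COMMIT TRAN;
    END TRY
    BEGIN CATCH
      IF XACT_STATE() = -1
      BEGIN
        -- Doomed transaction: it still holds its locks but cannot commit; roll back first.
        ROLLBACK TRAN;
      END
      ELSE IF XACT_STATE() = 1
      BEGIN
        -- Transaction is open and committable; here we choose to undo the work.
        ROLLBACK TRAN;
      END
      -- XACT_STATE() = 0 means no open transaction.
      PRINT ERROR_MESSAGE();
    END CATCH;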
Greg Low: I noticed also that the equivalent of throw looks to be RAISERROR.
Itzik Ben-Gan: Yes. If you want to raise an error you use RAISERROR.
Greg Low: Inside a TRY block.
Itzik Ben-Gan: Yes.
Greg Low: What's your thought on the need for a finally block?
Itzik Ben-Gan: There are many elements of object-oriented programming people are asking for; we discussed this when we met the developers at one of the summits. This is a huge product and it has to be released at some point. I do find the need for a finally block, and the need to be able to throw specific types of errors and have different catch blocks, one for each. Many elements that are very nice in object-oriented exception handling could be helpful here. But the scope of what they did is so huge that I don't want to complain at this point.
Greg Low: Come a long way.
Itzik Ben-Gan: They've opened the door; first this feature goes in, then later we'll get the rest. Some things were pulled out of the product just to meet the deadlines.
Greg Low: What are your thoughts on the PIVOT and UNPIVOT support that they've added?
Itzik Ben-Gan: I think it might, we'll see, do more damage than good. The way it was implemented is lacking in one thing: the PIVOT operator doesn't have the capability to do a dynamic calculation of the target column list. You can't…
Greg Low: Similar to the transform in Access.
Itzik Ben-Gan: Exactly. TRANSFORM does a completely dynamic pivot, in the sense that you don't have to know ahead of time which list of target columns you're going to generate. In T-SQL, people need to understand that it only supports a static list of target columns; you have to know ahead of time the list of attributes you are about to rotate. The problem is, first, that there is a technique to do pivoting without using the new PIVOT operator that is not more complicated and not slower; the way PIVOT is implemented internally is basically the exact same technique used today in SQL Server 2000 underneath the covers: a GROUP BY and a series of MAX(CASE) or SUM(CASE) expressions, depending on the aggregate. A series of such expressions is how people do it today behind the scenes. I have to say that even though I like the fact that all these features were implemented in a standard manner, PIVOT itself is a very, very confusing operator. For example, it has an implicit grouping element. You don't specify a GROUP BY clause; you just specify what you want to pivot and the target list of columns you are going to generate. Implicitly, all the columns you didn't specify are used in an internal grouping element, as if you grouped by those columns. People who aren't aware of this will end up grouping by a list of columns that they don't want in the grouping for the pivot, so you have to isolate all of the relevant elements in a derived table or CTE first. I find this very confusing. I think it's simpler to just use the technique from 2000; there's no real advantage in the technique in 2005. Not performance-wise, not clarity of code, and not what people will really expect. I feel it might do more damage: people who see PIVOT expect dynamic lists, realize it's not possible and will complain.
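For comparison, a sketch of both forms against a hypothetical Orders(custid, orderdate, qty) table; the first is the pre-2005 technique, the second is the new PIVOT operator with its implicit grouping:

    -- SQL Server 2000 style: explicit GROUP BY plus CASE expressions per target column.
    SELECT custid,
           SUM(CASE WHEN orderyear = 2003 THEN qty END) AS [2003],
           SUM(CASE WHEN orderyear = 2004 THEN qty END) AS [2004],
           SUM(CASE WHEN orderyear = 2005 THEN qty END) AS [2005]
    FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty
          FROM Orders) AS D
    GROUP BY custid;

    -- SQL Server 2005 PIVOT: the grouping is implicit, so the derived table D
    -- isolates only the columns that should take part.
    SELECT custid, [2003], [2004], [2005]
    FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty
          FROM Orders) AS D
      PIVOT (SUM(qty) FOR orderyear IN ([2003], [2004], [2005])) AS P;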
Greg Low: In fact, the thing I find about it is the way they've used the IN clause; it almost suggests you could have a sub-select there.
Itzik Ben-Gan: That's the first question everyone asks. Some people don't even ask; they just try to use a sub-query there.
Greg Low: That's lovely. Anything else you can think of of real note with changes to the language?
Itzik Ben-Gan: There are so many other features as well; these are the ones I think are of most importance, the major ones. For example, there are also trigger-related enhancements, such as DDL triggers.
Greg Low: What do you think you'd use those for?
Itzik Ben-Gan: For myself, first of all, I would have different applications than third parties; there are advantages here for third parties too. Let's start with what I would do with this. You can enforce company policies for naming, or require having a primary key within a table. Maybe all column names must be less than 30 characters, or all table names must be a certain length, or even a certain case. All those policies you can enforce within those triggers, not to speak of all the other things; it's something DBAs have been looking for for a long time. Third parties will be able to develop their change-management software based on these triggers.
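A sketch of one such policy trigger, checking that newly created tables have a primary key (names and the exact policy are illustrative):

    -- Database-level DDL trigger: reject CREATE TABLE statements without a primary key.
    CREATE TRIGGER trg_require_primary_key ON DATABASE
    FOR CREATE_TABLE
    AS
    BEGIN
      DECLARE @data XML, @tablename NVARCHAR(256);
      SET @data = EVENTDATA();
      SET @tablename = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'NVARCHAR(256)');

      IF NOT EXISTS (SELECT * FROM sys.key_constraints
                     WHERE type = 'PK'
                       AND OBJECT_NAME(parent_object_id) = @tablename)
      BEGIN
        RAISERROR('Policy: every table must be created with a primary key.', 16, 1);
        ROLLBACK;   -- undo the CREATE TABLE
      END
    END;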
Greg Low: One of the things I looked at, and talked to the product group about: I wish you had the opportunity to change the DDL statement inside the trigger, similar to the way we can change data in a DML trigger. The thing is, at the moment the event data that is passed, which contains the DDL, is read-only. I talked to the product group about that; it sounded like it was too hard to make that easy to change, and they were talking about giving us an instead-of trigger as well. I'm imagining simple things.
Itzik Ben-Gan: We got after triggers now; maybe later at some point we'll see before triggers. Also, because only after triggers were implemented first, there will be performance issues in some cases: if the DDL activity is very slow and you are about to roll it back, it will be an issue. We need something like an instead-of trigger, maybe in the future if enough people request it. I agree with you.
Greg Low: The thing I was imagining here is that I would like to be able to, if I send a statement that said CREATE PROCEDURE for example, apply nice formatting to it before it was put into the database. Simple things.
Itzik Ben-Gan: Very interesting. That's just one; there are so many other enhancements. For example, the APPLY operator, which you can use to apply a certain table expression to every row from an outer table. Think of it in terms of table operators like the JOIN, PIVOT or UNPIVOT operators. With a join, for example, the two inputs to the join operation don't have any precedence in terms of their role; it's like taking two inputs and applying some activity between them. With APPLY there is some sort of precedence between them. There's a left input, and to each and every row of the left input you apply a whole table expression. It can be a derived table, it can be just a table, it can be a function that returns a table. The interesting thing here is that, because it's as if the left input was already evaluated, the right input has access to all of the attributes of the left input's rows. It's like having a table expression that can correlate, which is not available today. You can do fantastic things with it. "I need the three most recent orders for each customer." How do you do this today? Not very simply, and when you come up with the solution it's not going to scan the data only once. Here you can do APPLY and SELECT TOP 3 from the orders table where, let's say, the customer ID is equal to the customer ID in the outer customers table.
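A sketch of that query, assuming hypothetical Customers and Orders tables:

    -- Three most recent orders per customer using CROSS APPLY.
    SELECT C.custid, C.companyname, A.orderid, A.orderdate
    FROM Customers AS C
      CROSS APPLY
        (SELECT TOP (3) O.orderid, O.orderdate
         FROM Orders AS O
         WHERE O.custid = C.custid          -- the right input can refer to the outer row
         ORDER BY O.orderdate DESC, O.orderid DESC) AS A;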
Greg Low: It looks very powerful.
Itzik Ben-Gan: There are also the TOP enhancements. SQL Server 2000 can only accept a constant as the TOP input; SQL Server 2005 can accept any self-contained expression, including variables and self-contained sub-queries, and TOP can now be applied to modification statements. If you want to talk about other enhancements there are many more.
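A couple of sketches of those TOP enhancements, with hypothetical tables and values:

    -- TOP with a variable (any self-contained expression works in 2005).
    DECLARE @n INT;
    SET @n = 3;

    SELECT TOP (@n) orderid, orderdate
    FROM Orders
    ORDER BY orderdate DESC;

    -- TOP in a modification statement, e.g. purging a large table in batches.
    DELETE TOP (1000) FROM BigLog;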
Greg Low: That’s running us up to time. One thing I might get you to do, tell us what’s coming up in your world in terms of where we can see you. Books, presentations, things that are happening?
Itzik Ben-Gan: I am currently in the process of writing a T-SQL book for Microsoft Press for SQL Server 2005. This is basically… You're probably familiar with Kalen Delaney's Inside SQL Server books?
Greg Low: When we interviewed her she said she was getting you involved.
Itzik Ben-Gan: There are three volumes planned for SQL Server 2005: one on the storage engine, one on T-SQL, which I'm writing and am one-third of the way through, and the third will be on query planning. What I was talking about and preaching about is what I have in this book. I am showing how to identify fundamental key elements within problems, then polishing them, tuning them, optimizing them; I have done this already. It's teaching the thought process and the different techniques; there are hundreds of key techniques: using an auxiliary table of numbers, using row numbers and their various applications, dealing with graphs and hierarchies and trees. That's the idea behind the book. I am calling it "the book I should have always written"; I feel very strongly about it. I'm also going to have another round in Australia soon. Starting October 10 in Sydney I'm going to deliver an advanced T-SQL course for 2000 and 2005. I deal with problem-solving mainly; it's not just showing a new feature. I show them 2000 and show them 2005. October 10 will be in Sydney, October 17 in Melbourne; you can find a lot of details on my website, Solidqualitylearning.com.au. We'll probably try to arrange presentations at user groups like last time. I'll keep in touch; we can arrange something, it would be nice.
Greg Low: Certainly look forward to seeing you around the country. Thank you for your time, we’ll talk again soon.
Itzik Ben-Gan: Thank you Greg.