Category Archives: Architecture
Enterprise Architecture
Enterprise Architecture should be a way to ensure that software development aligns with business strategy – that all stakeholders are collaborating on business outcomes – but whether it achieves this isn’t always obvious to developers.
Hairball Plotter and the Half-Baked Stints
Host: Ladies and Gentlemen, I’m your host here with Hairball Plotter-
Hairy: Call me Hairy
Host: (Pause) Oookay, I’m here with Hairy Plotter and he’s going to tell us about some of his – er -
Hairy: Innovations
Host: Innovations? I thought they were just the opposite -
Hairy: Well, eccentric innovations anyhow. You know, the kind of stuff that gets the job done, but with a lot
more moving parts!
Host: Please explain
Hairy: Well, you see, it’s not incumbent upon me as a consultant to work myself out of a job. I need a way
to stay employed and ensconced, and complexity is just the ticket
Host: How so?
Hairy: The more complex the environment, the better job security I have
Host: We have a caller from Ontario on the line, a question for Mr. Plotter?
Caller: Yes, I’m an IT manager and I’ve found that when people try to embed themselves as you describe, they create risk
for me, and I don’t like that.
Hairy: Thanks for that input. I’ll file that.
Caller: Wait a second, are you blowing me off?
Hairy: What difference does it make? If you like it or you don’t, you’re still stuck with me.
Caller: Unless I find something better.
Hairy: I’ll just milk your budget so there’s nothing to spend on something better. Works for me.
Caller: But that doesn’t work for me.
Hairy: Noted.
Caller: But -
(click)
Host: We have another caller from a financial services firm in New York. Caller you’re on the air.
Caller: Thanks, I agree with Hairy on this. Contractors are punted around and treated like fodder. We need a way to
keep our jobs and pay our bills. Working ourselves out of a job doesn’t fit that mold. We need more ways to
make work for ourselves, even if it’s artificial.
Host: Artificial?
Hairy: Well said!
Host: Wait a second, artificial?
Hairy: Well, of course it’s artificial. All of us are smart enough to do it better, faster, smarter. Heck, I could deploy a
high-reuse/low maintenance implementation that basically lets you run things lights-out.
Host: Good for you!
Hairy: Oh no, so not good for me. Once it’s deployed I get a final paycheck and have to ride into the sunset. Seems romantic
but the sunset doesn’t pay the bills!
Host: So you find a way to make yourself useful.
Hairy: No, I find a way to make myself necessary. Usefulness is for suckers.
Host: Now wait a second -
Hairy: Now listen. Everyone does it. At one of my favorite clients, all of the contractors are on a perpetual time-and-materials contract. When we
had some folks show up on a fixed-price gig, they practically pulled their eyeballs out when they couldn’t get any urgency out of anyone.
Host: Why not?
Hairy: Because all those time-and-materials folks had a vested interest in protracting the work until the next day or the next week. Practically
every time we said we needed something right away because our clock was ticking, they would offer it up for delivery “next Friday” or some such.
But you know, if they ever looked like they were wrapping up, it could spell curtains for their cushy little tushy.
Host: (laughs) I see what you mean.
Hairy: It’s just how the game is played.
Host: So IT staffing is just a game?
Hairy: Musical chairs. In so many ways.
Host: So what are some of the ways that you – well – set this up?
Hairy: Invest in a lot of wool.
Host: Sorry?
Hairy: We’ll be pulling it over the manager’s eyes.
Host: Oh, I see, that’s kind of funny.
Hairy: Thought you would like that. But seriously, we take the simplest, most direct way to implement something, so that we
know exactly what it would look like, then do the opposite.
Host: Seriously?
Hairy: Well, of course. If we take the simplest approach, there’s no room for a hero to step in and save the day. Lots of little
virtual terrorists running around in a complex system. Think about how the 9/11 terrorists were able to evade
electronic surveillance – they stayed off the grid and used cash, and were able to fool the other systems into thinking
they were out of the country.
Host: So you use the same technique?
Hairy: Well, the same forms of deception, of course. If we built over-complex systems accidentally, people would say that we’re
clever, but if we do it deliberately, we’re clearly diabolical. I would never deliberately paint myself to look bad, so we have to be a little
deceptive, right?
Host: Hmm, so this is a little disturbing. I want to know – oh we have a caller from Orlando.
Caller: Hey man, I want to come work with you.
Hairy: Shoot me a resume.
Host: Stay on the line and we’ll take your contact information. There’s another caller from Dallas Texas. Dallas, you’re on the air.
Caller: Hey, this whole deliberate-snowstorm thing is really a different way to look at things.
Hairy: Snowstorm, I like that. When one of my first mentors realized this technique, he said in a gruff voice – “You’re a blizzard, Hairy.”
At least I’m not slithering around, you know, like in those snakepits where the idea-mongers hang out. Ideas are good for a spell, then they wear off.
Almost like- Like snake-charming, or snake-whispering.
Host: But I mean, doing it deliberately seems, well -
Hairy: I know what you’re about to say. But think about it. Whether we do it deliberately or accidentally, the outcome is
the same. They are interested in the system, but we’re interested in self-preservation. If I have to choose, I say, don’t
do self-preservation by accident. Do it deliberately. This means set yourself up to be necessary.
Host: Explain necessary?
Hairy: Of course. The more complex the system the more they need you. If it’s simple, what do they need you for?
Host: But if they figure out what you’re doing, you’re cooked, right?
Hairy: But if I work myself out of a job, the outcome is the same. No paycheck.
Host: So you’re saying that most data warehousing is just accidental brilliance?
Hairy: Accidental or deliberate, the outcome is the same.
Host: Oh, please, don’t go fatalist on me.
Hairy: Come on, man. Most folks can’t pull this off whether accidental or deliberate. The fact is, we still stand up a functional,
operating environment. Consumers are being fed, users are getting what they want. Forget the fact that the
environment is stood up on pallets and serviced by swarming people on roller skates.
Host: I- suppose.
Hairy: Again, if someone stood up the working environment and had all this stuff in it, you’d call it clever. Only when
you realize that the complexity is artificial would you take exception to it. If you never realize this, I will always
have a paycheck and you will always have an operational system. You’ll never fire me, because I’m your most visible hero.
Host: Ever heard of Munchausen-by-proxy syndrome?
Hairy: Not familiar with it, no.
Host: Where a care-giver deliberately harms their child in order to be seen as the savior that delivers them from harm.
Hairy: I see a pattern, sure.
Host: You said it yourself, diabolical.
Hairy: Only if they realize it. Perception is the key.
Host: But you’re admitting to it here. On the air.
Hairy: Nobody will remember it. For one good reason: It’s so outrageous that it simply cannot be true.
Host: Unbelievable.
Hairy: ExACTLY!
Host: But what about scalability? When volumes increase, invariably the complex systems are crushed.
Hairy: Oh please, only about five percent of all implementations have that issue, so if I stay away from those, I’m gold.
Host: But what if one of your implementations grows into this? Seems like you’d have some explaining to do.
Hairy: What are the odds? I can easily place myself in the lower-scale zone and make a good living at it.
Host: But you would agree that the larger the scale, the more the need for simplicity?
Hairy: I don’t do scale, so why would I care? I blow smoke into a manager’s face, nose or other orifices and take a paycheck. What
does scale mean to me?
Host: You have a lot of disparaging things to say about these decision-makers. Aren’t they your customers?
Hairy: Look, you have people like me, who work the magic, and people who sit in the office, smug in their confidence that
they know exactly what to do and how to do it. My objective is to make sure they stay on the outside looking in,
snuggled next to their branded-coffee cup and blissfully unaware of my agenda. We call them the Smuggles.
Host: Smuggles?
Hairy: Yeah, Smug people who snuggle with coffee. Their users are the Smuggees. Smug people who know data but don’t know beans
about how to make it operational. Always offering opinions. Who cares what they think?
Host: Clearly not you.
Hairy: Well, I care what they think to an extent. As long as their reports are running to spec, it doesn’t matter how we pulled
it off, only that they get the data they want.
Host: The “how” and “what” question. I’ve heard that before.
Hairy: And what I’ve heard before, is the endless droning of users dictating to us how we should deploy the systems.
Host: And what do you do with that?
Hairy: If their suggestion will make the environment more complex and difficult to manage, we’re all over it. If it makes the
environment easier to operate, we push back. We call it the “you asked for it” policy. It’s a theme, you see.
Host: Yes, I can see that.
Hairy: Oh, I love ’em. Nothing is better than deliberately choosing a platform that can’t cut the mustard. I mean, they’re not really
hard to pick, you know. Imagine getting almost to the end of the project and hitting that hard
wall – flying right into it like a blind witch on a runaway broom. It’s hysterical to watch all these
folks running around like headless chickens. You can’t pay for that kind of entertainment. But of course,
they do pay for it, and handsomely.
Host: Until they realize that you’re in the center of it?
Hairy: Eye of the storm you mean, where the seas are calm. I never lose my cool, so they always think things
are under control until someone pulls the single thread that unravels it all. I have plausible deniability. Keeps
me working and the paychecks coming. Sweet.
Host: Don’t you think this is a bit – you know – underhanded?
Hairy: The mass layoffs of the turn of the century were underhanded. They created a 1099-Culture that basically means all of us are
mercenaries. Soldiers of fortune. We go to the highest bidder no matter what. There’s no conscience in that existence, especially when we could
be working for one company today and their competitor tomorrow. Those companies treat the 1099s like batteries – plug ’em in, burn ’em out and toss ’em.
Host: Seems a bit cynical.
Hairy: One of the better parts about this kind of consulting is that I can propose the solution without
actually producing anything. Then I can flit from flower to flower, pollinating these ideas. They pay
me for the ideas, not the actual work, you see.
Host: So you propose, but you don’t actually execute?
Hairy: Execution is someone else’s problem. Why should I stick around to see if the proposals actually work? If
they don’t, there’s another feather in my hairball cap, or rather another notch in my mayhem gun. But
if the solution works, good grief, all that work for nothing? May as well stay home and play video
games.
Host: What would you like to leave our audience with?
Hairy: Oh, I suppose, don’t worry, be happy and all that. I have a new book coming out called Managing Expectations.
Host: So tell us about that.
Hairy: It’s all about how to give the user a false sense of security while we do whatever we want. I’m all about what’s expedient
for me, but most of the time when the client sees what I’m doing, they love it because it makes things expedient for them as well. Soon
everyone is following the beat of the same drummer.
Host: Which would be?
Hairy: Get it done no matter what the cost. That way, I can charge whatever I want. Cost is no limit.
Host: I see. I think.
Hairy: When you think about it, most IT folk want to do things the expedient way anyhow. Over-thinking the problem seems so stodgy to them.
In fact, most of the time when we’re going through the analysis phase, I can see it written in their expression and practically popping from
their eyeballs.
Host: What’s that?
Hairy: The desire to start coding! Heck, practically every conversation is punctuated with how they intend to do it – long before the analysis
is complete. That’s because IT folk just don’t have the patience for analysis. They want to get coding. I just set the expectation that we’ll
code first and analyze later.
Host: So when does the analysis come into play?
Hairy: What analysis? If we start coding the analysis never happens. There’s never enough time. After all, the herald cry of
expedience is – “If you don’t have time to do it right, when will you have time to do it over?” What do you get when you start coding
without any analysis? Hairball city. That’s my stomping ground.
Host: Thank you for your time Mr. Plotter. This has been very – uh – enlightening.
Hairy: My pleasure
(pause)
Host: That was Hairy Plotter with his half-baked stints. Be with us next time when Jason Statham joins us for
“Data Transporter 2”, where he kicks a bunch of back-ends with ETL tools. Until then, happy computing and keep that data flowing!
Posted in Architecture, Configuration., expedient, hairball, Netezza, organized, scalable
Tagged control, Netezza
Netezza architecture
IBM has published the Netezza architecture on its Redbooks site. Nothing is fundamentally different; IBM talks about “FAST”, which possibly points to an improved architecture and/or improved algorithms. (Figure: Netezza architecture from IBM.)
Teradata Architecture (Hardware)
This is a high-level architecture for the Teradata 5xxx series; the 5625 series employs smaller hard drives, so its architecture is slightly different. Note that the number of arrays corresponds to the number of nodes; in this case, only two arrays are shown for tw…
Netezza data types and functions
Here is a list of data types along with associated info. Note that this was generated by the “SQuirreL SQL” tool using the Netezza JDBC driver. These data types are likely applicable to both 5.x and 6.x.
TYPE_NAME  DATA_TYPE      PRECISION  NULLABLE  CASE_SENSITIVE  MAXIMUM_SCALE
NVARCHAR   -9 [NVARCHAR]  16000      true      true            0
NCHAR      -8 [ROWID]     16000      true      true            0
BOOLEAN    -7 [BIT]       1          [...]
Posted in Architecture, Netezza, Netezza architecture, Netezza data types
Tagged Netezza
Rewiring our thinking – a view to a kill
Over the past several projects, the question of views has come up again and again: when to use them, when not to, and what to expect. Views are one of those mainstay workhorses that we love to hate and sometimes hate to love, but used correctly, they can save a world of hurt and lost development time.
So we would ask the question – why use a view at all? Isn’t the table definition good enough? And what of a synonym? Isn’t this just as good?
Well, synonyms are handy for configuration management and invaluable for testing. For BI, however, they don’t pass through their underlying relation’s metadata for consumption by the BI tool, so this can be problematic. We also cannot refresh a synonym easily: it has to be dropped and then re-created in two operations, whereas a view gives us the single, concurrency-protected “create or replace view” operation, which is muy bien. More on synonyms in another blog entry.
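To make the refresh difference concrete, here is a minimal sketch in the same scripted-DDL style used later in this post. It only prints SQL rather than running it, the object names (cust_syn, cust_v, customer_v2) are hypothetical, and it assumes Netezza’s DROP SYNONYM, CREATE SYNONYM ... FOR and CREATE OR REPLACE VIEW syntax.

```shell
#!/usr/bin/env bash
# Sketch only: prints DDL rather than executing it. Object names are hypothetical.

# Repointing a synonym takes two separate statements; between them the
# synonym does not exist, so a concurrent query against it can fail.
synonym_refresh_ddl() {
  local syn="$1" target="$2"
  printf 'drop synonym %s;\n' "$syn"
  printf 'create synonym %s for %s;\n' "$syn" "$target"
}

# Repointing a view is a single statement.
view_refresh_ddl() {
  local view="$1" target="$2"
  printf 'create or replace view %s as select * from %s;\n' "$view" "$target"
}

synonym_refresh_ddl cust_syn customer_v2
view_refresh_ddl cust_v customer_v2
```

Running it prints two statements for the synonym and one for the view; the gap between the synonym’s drop and create is the window a concurrent query could fall into.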
Views can easily reach across databases, giving us the ability to stand up a consumption-point database that contains part tables and part views without having to push data around (very handy for, say, a reference database that we want as an on-demand source of fresh information). I’m a big fan of setting up consumption-point databases so that a user comes to a pre-designated place, not the master repository, to fulfill their information needs. This decouples the user from the master repository and gives us enormous freedom in the ongoing enhancement of their user experience. Views are the vehicle towards this goal.
Views also let us do on-demand case/when conversions and typecasting that are completely hidden from the consuming process.
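Here is a minimal sketch of a consumption-point view that reaches into a reference database and does its case/when conversion and typecasting in place, so the consumer only ever sees clean columns. It assumes Netezza’s database..object notation for cross-database references; the database, view and column names (refdb, country, rpt_country, region_id and so on) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: prints DDL for a consumption-point view instead of running it.
# All names are hypothetical; "refdb..country" assumes Netezza's
# database..object notation for reaching into another database.

consumption_view_ddl() {
  local view_name="$1" ref_db="$2"
  # Unquoted EOF so ${view_name} and ${ref_db} are expanded into the DDL.
  cat <<EOF
create or replace view ${view_name} as
select c.country_code,
       cast(c.population as bigint) as population,
       case when c.region_id = 1 then 'AMERICAS'
            when c.region_id = 2 then 'EMEA'
            else 'OTHER'
       end as region_name
from ${ref_db}..country c;
EOF
}

consumption_view_ddl rpt_country refdb
```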
And a really cool part about Netezza views is that we can include as many columns as we want in the view’s “select” clause; the view will not fetch them all, only the ones mentioned in the select that consumes the view. This is a win-win, because otherwise it would fetch all of the columns and then drop the majority on the floor to deliver a few.
Views have the lightweight nature of a single SQL statement that can be easily installed, whereas a stored proc often contains multiple SQL statements. Both of these mechanisms serve to hide the logic from the BI tool. But think about this: would we use the stored proc as part of another join? Or would we expect to just select from the stored proc and consume an answer? The more complex the operation, the more we need to just select-and-consume, and relieve the BI tool of having to know more than it must.
A pernicious part of integrating BI tools is just that: expecting that the tool will know all it needs to know to interact with the Netezza MPP. This is, as you may painfully discover, a false expectation. Case in point: we might have a very creative intersection table between two large fact tables, and we can formulate a query that will browse the information-we-want in mere seconds. Then we plug in our BI tool and ask it to manufacture a query to do the same thing, but it struggles. Now we have to make a call: do we deploy the BI tool in the hopes that later releases will resolve this, or do we install a view or stored procedure that adapts the BI tool to our data model, and then wait for the BI tool to get better in a later release? You see, we can always toss the adaptation when our BI tool gets better. But we cannot allow our user experience to languish in the meantime. More on this in another essay.
So before I jump into a lot of other things we like about views, I’ll address some of the above in their more malignant form.
I’ll loosely divide views into two buckets – simple and complex. The simple view consumes a single table and may have columnar transforms on it. A complex view, simply put, has more than one table in the join logic.
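The simple/complex split can be roughed out mechanically. The sketch below is a heuristic only, assuming explicit JOIN syntax: it counts whole-word JOIN keywords in a view’s select text and calls the view complex if there is at least one. Comma-style joins, subqueries, and JOIN appearing inside comments or string literals will fool it; it is not a SQL parser.

```shell
#!/usr/bin/env bash
# Heuristic sketch: "simple" = no JOIN keyword, "complex" = at least one.
# Assumes explicit JOIN syntax; comma joins and subqueries are not detected.

classify_view() {
  local joins
  # -o: one match per line, -i: ignore case, -w: JOIN only as a whole word
  # (so a column such as join_date does not count).
  joins="$(printf '%s\n' "$1" | grep -oiw 'join' | wc -l)"
  if [ "$joins" -gt 0 ]; then
    echo complex
  else
    echo simple
  fi
}

classify_view 'select a.col1, a.col2 from mytable a'
classify_view 'select a.col1 from mytable a join yourtable b on a.id = b.id'
```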
A simple view cannot be easily misused, but a complex view can be misused so easily it will make your best troubleshooter’s head spin. For example, I cannot count the times I’ve seen a master query join on a view, which in turn joined on a view, which in turn joined on a view, and so on. How deep can you go? That is not the issue at all. The issue is treating the view as though it is a reusable, inheritable object rather than a standalone select-and-consume capability. So where do we draw the line?
Transactional thinking – that is, the notion that we can install nested (inherited) views because they handle a transaction at a time anyhow, and any given instance of them will carry a negligible performance cost – is completely washed away when dealing with multi-billion-row scales on a Netezza platform. It’s not a transactional platform, so each view potentially initiates a full table scan. Multiply these nested-upon-nested views and we have nested table scans, sometimes several separate scans on the same table. Which is more efficient: to look at a multi-billion-row table once, or multiple times?
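To see how nested views multiply the work, the sketch below expands a hypothetical nest of views (v_top, v_mid and v_low over big_fact and dim_small) and counts how many times each base table is reached. It treats every reference as a potential scan, which is the assumption the paragraph above makes for Netezza; the actual plan depends on the optimizer. It needs bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Sketch: expand a nest of views and count how often each base table is reached.
# Every reference is treated as a potential full scan (no indexes on an MPP).
# Requires bash 4+ for associative arrays. All names are hypothetical.

declare -A DEPS   # view name -> space-separated objects it selects from
DEPS[v_top]="v_mid big_fact"
DEPS[v_mid]="v_low big_fact"
DEPS[v_low]="big_fact dim_small"

# Print every base table reachable from $1, once per path that reaches it.
expand_tables() {
  local obj="$1" dep
  if [ -z "${DEPS[$obj]+set}" ]; then
    echo "$obj"   # not a known view, so treat it as a base table
    return
  fi
  for dep in ${DEPS[$obj]}; do   # unquoted on purpose: split the dependency list
    expand_tables "$dep"
  done
}

# Count how many times base table $2 is reached from top-level object $1.
count_scans() {
  expand_tables "$1" | grep -cxF "$2"
}

count_scans v_top big_fact   # prints 3
```

Here big_fact is reached three times through a single top-level view, which is the repeated-scan pattern described above.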
One customer had a query that started running very slowly one day. We went through a process of discovery to find out what had changed. It seems that a new version of an existing view had been installed, and the bad query was consuming this view deep under the covers. Both the bad query and the view accessed one of the largest tables in the database, so the bad query was now scanning the big table twice: once through the view and once for the master query itself. Even worse, the changed view did not leverage the big table’s zone maps or its distribution key. So a change in one place dramatically affected the unchanged functionality of a master query.
Because we are embracing economies of extraordinary scale, dynamic objects have a propensity to lose performance integrity over time. What worked yesterday may not work today, so we have to tune it. Netezza is so efficient that this tuning necessity may not arise until years after the implementation (in one case, four years afterward). By that time, knowledge of the system’s dependencies is no longer fresh in anyone’s mind, so it is easy to make a spot-fix on the view and deploy it. In so doing, we may create a cascading effect for all the other places that consume the view and expect its original behavior. In short, the latent nested-view architecture is a minefield. We should not implement it, because it creates trouble from day one, even though nobody has stepped on a mine just yet.
At one customer site we had to sift through six levels of view logic to find the performance problem. The customer wanted to know what they should do to fix the problem, but “the problem” was in the overall implementation and the nested views, not in the one bad view or, for that matter, in the recent performance symptoms of a minefield implementation.
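When sifting through levels like this, it helps to know the depth before opening any DDL. The sketch below computes nesting depth from “view dependency” pairs; the edges are hypothetical and, in practice, might be extracted from the DDL scripts or the system catalog. It assumes there are no cycles and needs bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch: report how many levels of views sit on top of the base tables.
# Edges are "view dependency" pairs (hypothetical names); assumes no cycles.
# Requires bash 4+ for associative arrays.

declare -A CHILDREN   # view name -> space-separated objects it selects from
while read -r view dep; do
  [ -n "$view" ] && CHILDREN[$view]+="$dep "
done <<'EOF'
rpt_sales v_sales_enriched
v_sales_enriched v_sales_clean
v_sales_enriched customer_dim
v_sales_clean v_sales_base
v_sales_base sales_fact
EOF

# Base tables have depth 0; a view is one level deeper than its deepest input.
view_depth() {
  local obj="$1" dep d max=0
  if [ -z "${CHILDREN[$obj]+set}" ]; then
    echo 0
    return
  fi
  for dep in ${CHILDREN[$obj]}; do
    d="$(view_depth "$dep")"
    if [ "$d" -gt "$max" ]; then
      max="$d"
    fi
  done
  echo $((max + 1))
}

view_depth rpt_sales   # prints 4
```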
Views can behave as traditional objects if they are single-table views or if they leverage additional tables that are small and inconsequential to performance. Don’t ever include a big, fat table in a view as part of a performance-boosting strategy unless you can designate that the view is in fact a standalone entry point and not something that can arbitrarily participate in the JOIN clause of another master query. Why is this? Invariably we will forget the complexity of the view and then attempt to join it in another operation. For a BI tool, this could be highly problematic as well, because a view that was once simple could spontaneously go complex, and if it affects performance, we’ll be pulling our hair out to find the problem through what reduces to a scavenger hunt, or worse, a submarine hunt.
Many BI tools simply choke on automatically forming a “complex” Netezza query because they implicitly assume indexes via primary keys, and if these don’t exist, the BI tool does the best it can, which in many cases is the least-common-denominator of a query structure. This doesn’t play well on the SPUs for large-scale queries. I cannot count how many times I’ve seen a convoluted query that we de-engineered and simplified run an order of magnitude faster than the one conjured by the BI tool, yet nothing the tool folks could do seemed to make the BI tool form it the same way. To the rescue: a view that did the right thing, and that was that.
What’s that? Putting together a view diminishes the flexibility of the query? Only marginally, and since we’re dealing with billions of rows, we don’t have much runway for “ultimate” flexibility anyhow. The larger the datasets, the more we need to make sure the queries are as efficiently formed as possible. And since this means as simply formed as possible, we’re not talking about BI-tool query engineering, but query de-engineering.
To avoid pain and injury, don’t treat all views the same. If we have a complex view, we should tune it and designate it as standalone. No matter how much we like its results, it is better not to arbitrarily include it in another join. The primary reason: most views are not set up with a distribution in mind, so when we include one in our other join, the distribution may resolve to the least-performing, lowest-common-denominator form. We don’t want that.
One alternative, oddly, is CTAS: execute such a view in context and insert its data into a temporary table, then use the temporary table in the master join. This affords us the option to (a) leverage the view’s normally small output, (b) preserve the distribution, (c) align the distribution to the next operation, or (d) simplify the implementation. Of course, your BI tool may not support this, or may support it in an inefficient fashion. Most of the major BI tools will accommodate advanced scenarios, so get your product support rep on the wire and have a heart-to-heart.
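A minimal sketch of the CTAS route, in the same scripted-DDL style: it prints a CREATE TEMP TABLE ... AS SELECT ... DISTRIBUTE ON (...) statement that materializes the complex view, followed by a master join against the temp table. The names are hypothetical, the distribution column is assumed to match big_fact’s distribution key so the join can stay co-located, and both statements have to run in the same session because temp tables are session-scoped.

```shell
#!/usr/bin/env bash
# Sketch only: prints the CTAS-then-join SQL rather than running it.
# Names are hypothetical. Assumes big_fact is distributed on the same column,
# and that both statements run in one session (temp tables are session-scoped).

ctas_then_join_sql() {
  local complex_view="$1" dist_col="$2"
  cat <<EOF
-- 1) Run the complex view once, standalone, and keep its (usually small) output.
create temp table tmp_${complex_view} as
select * from ${complex_view}
distribute on (${dist_col});

-- 2) Join the master fact table to the temp table instead of to the view.
select f.${dist_col}, sum(f.tran_amt) as total_amt
from big_fact f
join tmp_${complex_view} t on t.${dist_col} = f.${dist_col}
group by f.${dist_col};
EOF
}

ctas_then_join_sql v_active_customers customer_id
```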
Yet another alternative is to use the view like an in-line view, except in the where-clause as a correlated sub-query. This often takes the form of a where-not-exists clause or the like, and can also be very efficient.
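Sketched the same way, here is the correlated where-not-exists form. The fact table, view and key column (big_fact, v_blocked_accounts, account_id) are hypothetical; the point is that the view’s logic is evaluated as a filter on the master query rather than joined in as a row source.

```shell
#!/usr/bin/env bash
# Sketch only: prints a query that uses a view as a correlated sub-query
# in the WHERE clause instead of joining to it. Names are hypothetical.

not_exists_sql() {
  local fact="$1" excl_view="$2" key="$3"
  cat <<EOF
select f.*
from ${fact} f
where not exists (
  select 1
  from ${excl_view} x
  where x.${key} = f.${key}
);
EOF
}

not_exists_sql big_fact v_blocked_accounts account_id
```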
Another alternative is to break apart the view’s logic and assimilate it into the larger view so that all logic is preserved. But you’ll be maintaining that logic in two places, right? Not necessarily. We have a lot of view DDL executables that do not directly spawn from a modeling tool. Several of these are BASH scripts, which allow the logic to be parameterized. If we put the logic into a parameter, then produce the views by including the parameterized logic, we maintain the core logic in one place (the script) but actually deploy two views that leverage it. This is essentially what happens under the covers in many object-oriented environments anyhow: multiple objects consume another class and deploy an instance that includes that class, so this approach embraces that inheritance pattern. Not in the dynamic run-time of the view, but in the view’s initial DDL-level deployment.
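# Core filter logic, maintained in one place and shared by both views
MYLOGIC=$(cat <<'EOF'
a.limit1 between 50 and 60 and
a.limit2 between 1000 and 50000 and
a.tran_amt < 10000
EOF
)
# view1: single-table view; only alias "a" exists here, so the shared logic must reference only "a"
view1="create view view1 as select a.col1, a.col2, a.col3 from mytable a where $MYLOGIC ;"
# view2: folds the same logic into its own join instead of joining on view1;
# the predicate on yourtable (alias b) lives here because view1 has no alias b
view2="create view view2 as select a.col1, a.col2, a.col4 from mytable a join yourtable b on a.id = b.id where a.col1 = b.col1 and b.employee_id <> 9999 and $MYLOGIC ;"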
If our modeling tool supports this capability, we should leverage it before simply bolting a view into a join. If our modeling tool does not support it, this scripted DDL scenario is easy enough to formulate and leverage without a lot of overhead. The objective: two views that both behave as optimized joins, rather than one view that behaves as a join-with-a-view.
Either way, there is a theme here: simply including the complex view as part of another join’s logic, as though it were a table, is risky and can, even at the outset, offer up such bad performance as to be a non-starter. So a plain-vanilla practice should be to make the complex view behave in a standalone query-and-consume fashion by default. Make no assumption that it is okay to arbitrarily include it in a larger query’s join clause.
The further downside is that a de facto join-with-a-view can work really well at the outset, but the scale of the data can catch up to even the most robust of implementations, and wiring up complex view dependencies creates a problem that will not scale but will only become obvious over time (a minefield).
One group adopted a standard for view naming conventions. The simple views would have no prefix at all, so they would look like tables to the casual user. Fair game and all that. The complex views were labeled as v_<viewname> as a cue to a user or report builder: don’t use it in the join of a larger query. You’d think that with an implicit rule to avoid joining on anything with a “v_” prefix, people would play nicely. But not so, since your reporting users may have come from an RDBMS background where it’s perfectly okay to mix views into the master query. Awareness of the standard is one thing, but actually embracing it is another. We cannot protect our systems from people who either don’t know the rules, don’t understand them, or cannot map their experiences from an RDBMS to an MPP.
So a suggestion here would be to name the view in a manner that departs from common view nomenclature. Calling it sp_(NAME) might draw the ire of your admins, who want stored procs named for what they are and don’t want other objects obscuring that. But if our views are not really common views, and have caveats on their usage, we need a safer naming convention, one that aligns with the goal we are trying to achieve: adapting the BI tool to the MPP. One group used a naming convention of “bi_”, while another used “rpt_”, and still another used the common acronym for their given BI tool. The point is to adopt a convention that is somewhat unconventional, so that those with conventional thinking are able to transform their thinking without finding themselves in a minefield.
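A naming convention is easier to keep when something checks it. The sketch below is a hypothetical guardrail, not a feature of any BI tool: given query text, it flags queries that reference a prefixed standalone view (bi_ by default, or whatever prefix you pass) and also contain a JOIN. It is a heuristic grep, so comma joins will slip past it, and any other object whose name starts with the prefix will trip it.

```shell
#!/usr/bin/env bash
# Hypothetical guardrail: warn when a standalone (prefixed) view shows up in a
# query that also contains a JOIN. Heuristic only; comma joins are not detected.

check_standalone_views() {
  local sql="$1" prefix="${2:-bi_}" views joins
  # Whole-word, case-insensitive names starting with the prefix, de-duplicated.
  views="$(printf '%s\n' "$sql" | grep -oiwE "${prefix}[a-z0-9_]+" | sort -u | tr '\n' ' ')"
  joins="$(printf '%s\n' "$sql" | grep -oiw 'join' | wc -l)"
  if [ -n "$views" ] && [ "$joins" -gt 0 ]; then
    echo "WARN: standalone view(s) used in a join: ${views% }"
    return 1
  fi
  echo "OK"
}

check_standalone_views 'select * from bi_sales_summary'
check_standalone_views 'select f.amt from fact f join bi_sales_summary s on s.id = f.id' || echo "review this query before it ships"
```

The function returns a non-zero status on a warning, so it could sit in a report-publishing script and stop the offending query before it ships.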
Nothing is worse than overlooking a minefield – it’s a scary view – a view to a kill.
Posted in adapt, adaptation, Architecture, business_intelligence, business_objects, consumption_point, fit_for_purpose, microstrategy, Netezza, stored_procedure, views
Tagged Netezza, performance, reporting, users, view