02.14.10
Posted in General development at 9:26 am by Kyoryu
Been reading Steve Yegge’s stuff. He’s smart. You want him working for you.
This post reminded me of Rob’s Grand Theory of Good Object Oriented Programming (I’m the Rob in question, BTW).
If you’re asking what the hell compilers have to do with object oriented programming, read on.
Fundamentally, what a compiler does is transform data. Each phase of the compiler makes a different transformation of the data. Most steps will spit out the data in an entirely different form, which is neat because it means that any given phase generally doesn’t have to know about anything except for the data format it gets, and the format it spits out. Phases are usually one-way.
A simple compiler might take in a text file, and transform it into a token stream (lexing phase). Then, it will take the token stream and turn that into an intermediate representation such as an AST (parsing phase). After that things get fun. Usually you’ll then mess with the AST a few times, simplifying it, and then transforming it for perf reasons. This may or may not be the same tree format. Once you’re done with that, you’ll start the process of transforming it into something resembling the native format of the machine you’re actually compiling it to.
And so on, and so forth.
What we have here is a series of things that take data in, mangle it, and spit it out the other side, often in a different format.
So, again, what the hell does this have to do with object-oriented programming?
Well, it turns out you can model just about any damn thing you want using this programming model. For instance, say you want to write a calculator program. You might be tempted to write that by checking to see if a button was pressed, if so adding a value to an expression tree, and then calling calculate on the expression tree to get a value, and then setting the text on your output. This could all be done as a single large method, but you’d probably realistically break it into multiple functions if not objects (though the actual code path would likely be roughly equivalent).
You could do that. Or, you could treat it like a compiler.
In our calculator-compiler, we’d start with the input being the coordinate of a mouse click. We would transform that into a symbol, much like a token in a lexer.
This token would then be passed into an expression builder (much like a parser). As a result of this, our parser would then output a current value, which could either be a number, an error condition, or perhaps some other kind of message.
The next piece in line would take the output message and transform it into some kind of data structure representing formatted text (possibly including colors and whatnot).
And the final piece would take that formatted data structure and turn it into a bitmap to be displayed in our output area.
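Here’s a rough Python sketch of that chain, just to make the shape concrete. The keypad layout, the 50-pixel cell size, the `receive` method name, and the left-to-right evaluation are all assumptions made for the example, and the last stage records formatted text instead of drawing a bitmap.

```python
import operator

# Hypothetical 4x4 keypad; each button is a CELL x CELL pixel square.
KEYS = [
    ["7", "8", "9", "/"],
    ["4", "5", "6", "*"],
    ["1", "2", "3", "-"],
    ["0", "C", "=", "+"],
]
CELL = 50
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class ClickToSymbol:
    """Lexer-like stage: turns an (x, y) click into a button symbol."""

    def __init__(self, next_stage):
        self._next = next_stage

    def receive(self, click):
        x, y = click
        self._next.receive(KEYS[y // CELL][x // CELL])


class ExpressionBuilder:
    """Parser-like stage: turns symbols into ("value", n) or ("error", text)."""

    def __init__(self, next_stage):
        self._next = next_stage
        self._acc = None    # running result; private to this class
        self._op = None     # operator waiting for its right-hand operand
        self._digits = ""   # digits typed since the last operator

    def receive(self, symbol):
        if symbol.isdigit():
            self._digits += symbol
            message = ("value", int(self._digits))
        elif symbol == "C":
            self._acc, self._op, self._digits = None, None, ""
            message = ("value", 0)
        else:
            message = self._apply(symbol)
        self._next.receive(message)

    def _apply(self, symbol):
        if self._digits:
            operand, self._digits = int(self._digits), ""
            if self._op is None or self._acc is None:
                self._acc = operand
            else:
                try:
                    self._acc = OPS[self._op](self._acc, operand)
                except ZeroDivisionError:
                    self._acc, self._op = None, None
                    return ("error", "divide by zero")
        self._op = None if symbol == "=" else symbol
        return ("value", 0 if self._acc is None else self._acc)


class Formatter:
    """Turns a message into a formatted-text record (text plus color)."""

    def __init__(self, next_stage):
        self._next = next_stage

    def receive(self, message):
        kind, payload = message
        color = "red" if kind == "error" else "black"
        self._next.receive({"text": str(payload), "color": color})


class Display:
    """End of the chain. A real one would draw a bitmap; this one records frames."""

    def __init__(self):
        self.frames = []

    def receive(self, formatted):
        self.frames.append(formatted)


display = Display()
calculator = ClickToSymbol(ExpressionBuilder(Formatter(display)))
# Clicks on the buttons 1, 2, +, 3, = in the layout above.
for click in [(10, 110), (60, 110), (160, 160), (110, 110), (110, 160)]:
    calculator.receive(click)
```

Each stage only knows the data it receives and the data it hands on, so any middle stage can be driven directly with hand-made input and a recording stage plugged in behind it.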
Now, I’d write the calculator program in such a way that each of the pieces I just described was a separate class (at least... maybe more than one!). That’s a lot of classes. Why would I do this?
Well, it turns out that doing things like that has a lot of benefits. First, it means that any of your code, except for the parts at either the very beginning of the chain (where you read the mouse input) or the very end (where you actually display the bitmap) can be tested in isolation. That’s pretty cool.
Secondly, it means that except for data, no part of the system really knows anything about any other part of the system. That makes changing things much, much easier.
Third, the system becomes almost entirely stateless. And what state exists is completely localized to a single class. State is only used to modify the outputs of a given class, not to be queried by other classes.
Fourth, locking becomes really easy. A given class locks when it is modifying its local, internal state, and releases the lock before it calls the next object.
Fifth, multithreading becomes easy. If this is all modeled as objects making void calls on each other, then you can make calls to objects on pretty much arbitrary threads. With no return value, and no real assumption of state modification, it gets easy.
Sixth, your error scope becomes smaller. Since any given class is only responsible for a specific, defined data transformation, it’s really irrelevant what the next guy in the chain does. It’s not really your responsibility.
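The locking and threading points can be sketched in a few lines of Python. This is only an illustration under assumed names (`RunningTotal`, `Collector`, `receive`): the stage holds its lock just long enough to update its own state, takes a snapshot, and lets go before making the void call on the next object.

```python
import threading


class Collector:
    """Last stage in the chain: records every value it is handed."""

    def __init__(self):
        self._lock = threading.Lock()
        self.seen = []

    def receive(self, value):
        with self._lock:
            self.seen.append(value)


class RunningTotal:
    """Keeps a private total and pushes each new total downstream."""

    def __init__(self, next_stage):
        self._lock = threading.Lock()
        self._total = 0
        self._next = next_stage

    def receive(self, amount):
        with self._lock:              # lock only around our own state
            self._total += amount
            snapshot = self._total
        self._next.receive(snapshot)  # lock already released; nothing returned


collector = Collector()
total = RunningTotal(collector)


def worker():
    for _ in range(100):
        total.receive(1)


# Eight arbitrary threads all calling into the same object.
threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the downstream call happens outside the lock, a slow or re-entrant next stage can’t hold up other callers or deadlock against this one.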
Okay, but does this work for interactive apps? Or apps with complex logic, or multiple inputs or outputs?
Yep! It works fantastic! It just means that your chain isn’t linear, that you have some areas where multiple objects talk to one, or where one object outputs data to many. Hell, it even works in cases where objects can cause chains that result in calls back to themselves.
It doesn’t preclude more “typical” OO programming, either. In fact, I often use more typical techniques when creating and describing data structures, rather than program flow.
One of the other interesting things that happens in this style of programming is that you often realize that what you’re doing is very analogous to creating an AST of your own. Which means that you can, conceivably, create a generic form of whatever it is you have that wires your objects together that works on some input. Some input that might be a tree-like structure, which could be generated from a token stream…
See where this is going?
There are other advantages, too. “Normal” OO programming often runs into a number of common problems. First, you’ve got the efficiency problem. Calling “User.Save()” will perhaps make a call to a database, and looping over 20 Users will result in 20 calls. That sucks. Looking at the problem from this approach, you’d transform that array of 20 pieces of data into a SQL query, which would then be passed to something which could hand it to the SQL server itself.
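A minimal sketch of that idea in Python, using the standard `sqlite3` module as a stand-in for the SQL server. The `users` table, its columns, and the function names are invented for the example; the point is only that one stage turns the data into a statement and a separate stage is the one thing that touches the database.

```python
import sqlite3


def users_to_insert(users):
    """Transform a non-empty list of (name, email) pairs into one statement."""
    placeholders = ", ".join(["(?, ?)"] * len(users))
    sql = f"INSERT INTO users (name, email) VALUES {placeholders}"
    params = [field for user in users for field in user]
    return sql, params


def run(connection, statement):
    """The only piece that talks to the database."""
    sql, params = statement
    connection.execute(sql, params)
    connection.commit()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, email TEXT)")
users = [(f"user{i}", f"user{i}@example.com") for i in range(20)]
run(connection, users_to_insert(users))  # one round trip instead of twenty
```

The transformation step needs no database at all to test; you just check the statement and parameters it produces.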
Another common problem in typical OO systems is increased complexity due to either over-generalization, or overuse of inheritance. This problem typically manifests itself as code that jumps across multiple different classes through a maze of calls that is impossible to trace. With the ability to draw strong demarcations of responsibility at object boundaries, the code inside a given object tends to be very specific. Since a single object isn’t trying to manage the entire data flow of a single piece of data, you end up avoiding a lot of coupling problems.
By the way, I don’t think I’m really alone in this. If you look at the GoF book from this perspective, a whole lot of the patterns make more sense than if you look at them from a more typical viewpoint. Also, what I’ve just described is, in many ways, the Actor model that is starting to become somewhat in vogue.
It also works exceptionally well with test-driven development, given that 95% of your code ends up stateless, and not interacting with the system at all.
06.19.09
Posted in General development at 12:39 am by Kyoryu
Robustness is one of those things that we can chase forever. Many developers think that “robustness” means never crashing. A more experienced developer will realize that there are many, many things worse than crashing. Continuing to run while in an invalid state is a much worse option, as it opens up the possibility of corrupted data – a far, far worse problem than a simple crash.
Even past that, we have to look at error conditions that can occur, what compensating actions we can take, and what the impact to the user is.
There seem to be a few general levels of robustness in applications.
- In cases where no system failure occurs, and all input data is correct, the system should work. This is the basic level of correctness. Now, the catch here is knowing what the system should do for any set of valid input…
- User input should be appropriately validated and sanitized to prevent failure. Again, sometimes you can’t just nicely recover, and the only thing you can do is throw an exception or other error code. That’s fine.
- The program should continue to work in case of reasonable system failures – a file being open unexpectedly, a remote system not being available.
- The program recovers in the case of extreme failures – out of memory, full hard drive, hard drive unexpectedly removed. In many cases, catching these failures may not be worth the effort. It is unlikely that you can do any reasonable recovery, so do minimal recovery to try not to corrupt any data, and then get out. If you don’t know that you can even do minimal recovery, just fail and hope for the best.
- In the case of users undermining the system by deleting files you require, I don’t know that it’s even worth bothering. If something you require is gone, you’re broken. Don’t even try to run, exit as quickly as you possibly can to prevent data loss in the future. This scenario is no different than somebody deliberately deleting files from the Windows directory.
And, that’s my view. I’m sure some will disagree, but that’s fine. Attempting to recover from an unrecoverable scenario that is unlikely to ever happen in reality, and that, if it does, will almost certainly be accompanied by other failures, has little value. It is likely that the time could be spent on other things that will have a higher value to your consumers.
05.14.09
Posted in General development at 1:14 pm by Kyoryu
Design is always a touchy subject. There are those who believe everything should be designed out beforehand, and those who believe you should design as you go.
I’m pretty firmly in the latter camp, as I’ve never seen the former plan actually work. But, there are a few caveats to that.
I don’t believe in the idea of designing every class, method, and interface before you start coding. Many of those details will become obvious as part of coding, or improvements will be found, and having to add committees and approval processes to making changes (especially if they’re internal only) seems like a really bad idea. When I talk about big design up front, this is generally what I’m referring to – Big “Design Up Front.”
On the other hand, you need to do some level of design up front. I think the XP folks call this the system metaphor. You need to know the big pieces in your design, and what the general flow of the data is. If it’s a distributed application, who connects to who?
Specific technologies don’t need to be a part of this conversation. If you know that process A will send data to process B when the data is ready, then how that takes place is mostly an implementation issue. The important decisions are things like whether A connects to B or vice versa (especially if distributed), and whether A pushes data or B pulls it via polling. Even in a single process, where are your component boundaries, and are they really boundaries? What’s your threading model?
These are the kinds of decisions that you have to make early, as they shape the system as a whole. These are the big pieces of design that need to be hammered out. I’d call this “Big Design” Up Front, and I’m firmly in favor of it.
04.06.09
Posted in General development at 10:12 pm by Kyoryu
One of the hardest things in development is learning to let go. It’s something most developers fall victim to – you get some idea for a system that will simply fix ALL of the problems, walk the car, and wash the dog!
And then you find some use case that your system doesn’t quite cover. So you fudge around the use case, or scope it out. And so on and so forth. And you end up with some nasty piece of code that barely works, is horribly mangled to the point of unmaintainability, and that nobody wants to deal with, ever.
The problem here is letting go. As developers, we are in the job of creating solutions for problems. Any piece of code is a solution to a problem. And most developers are pretty smart, and hate admitting that they’re wrong.
But sometimes we are wrong. And when our use cases (the problem) start conflicting with our code (the solution), it should be the code that loses. We should tailor our solutions to the problems we are presented, not the other way around.
When we’re wrong, we have to let go of our wrong solution, and learn to do it quickly and easily and without ego. And that can be very hard.
01.29.09
Posted in General development at 1:45 pm by Kyoryu
While most people quote Occam’s Razor as “the simplest thing is most likely correct,” the actual quote is “do not multiply entities needlessly.”
I’m not sure that this is good programming advice. I do, however, think it’s an accurate description of how most developers behave. Typically, a developer will create as few discrete entities as possible. They will use one class rather than two. They will create a single large interface rather than multiple small interfaces. They will create a single large function rather than break it down into multiple, smaller functions.
It doesn’t seem to be a matter of typing, or of saving characters. The behavior seems to suggest that developers will prefer a single, very large method to two smaller methods, even if the total lines of code is fewer using two smaller methods.
This probably boils down to perceived overhead – creating an ‘x’ may be perceived as managerial type overhead, as opposed to “lines of code” which are real work. If so, it would suggest that the less overhead that’s required to create an entity, the more likely it is that multiple entities will be created.
This is something to keep in mind when designing APIs, user experiences, languages, or other tools that you expect others to use.
10.10.08
Posted in General development at 5:41 am by Kyoryu
Using an interface seems like one of those rules. Everybody knows they should do it, because it makes your code more abstracted and… stuff.
However, if you have a gigantic class with a ton of methods, or methods that are very specific to its implementation, then simply providing an interface that mirrors the public methods of the class is of little value. To successfully swap one implementation for another, you would need to understand the behavior of the first implementation so well that you could accurately mimic it – and, frankly, that’s not very likely unless you’re the one to write the first implementation anyway (and probably not too likely even then). While you’ve avoided implementation coupling, you’ve got a kind of conceptual coupling in its place. This is doubly true if the interface specifies that it returns objects that implement another interface (likely as thick as the first).
So, using interfaces as a kind of header doesn’t really help us too much in this case. We’re still realistically tied to an implementation, and now we have interface versioning issues to deal with (which, for C# at least, are worse than class versioning issues, as adding a member to a class does not break backwards compatibility – but it does for an interface).
That doesn’t mean that I’m against interfaces. In fact, I love interfaces. I just think that there are better ways to use them than as sim-headers.
Interfaces should be used to define questions that, as the class you’re writing, you want to ask of your dependencies. This is a bit of an inversion – typically, interfaces are defined from the POV of the class implementing them, not the class using them. But, by controlling the interfaces you use, there’s less chance of them breaking and causing major headaches throughout your codebase.
Interfaces should also be as small as possible, and represent a single aspect of what you can do with an object. IEnumerable<> is a great example – it only lists things that you need to do to enumerate a collection. And because of that, it can be very stable. The more things an interface does, the more likely it is to need to change, and the more code that will be broken when it does.
So, if you’re using interfaces in this way, how do you implement them? Especially if the class you’re dealing with didn’t own the interface to begin with? This isn’t too hard – write a small adapter class that implements the interface, and calls the underlying object in the appropriate way. This has the added advantage of keeping all your dependent code in one spot, making it much, much easier to fix if the dependency ever changes underneath you (assuming that you don’t control it).
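To make that concrete, here’s a small sketch in Python, with `typing.Protocol` standing in for a C# interface. Every name in it (`NameLookup`, `Greeter`, `VendorDirectory`, `VendorNameLookup`) is made up for the example; `VendorDirectory` plays the part of a class you don’t control.

```python
from typing import Protocol


class NameLookup(Protocol):
    """The one question the greeter wants to ask of its dependency."""

    def display_name(self, user_id: int) -> str: ...


class Greeter:
    """The consuming class owns NameLookup and knows nothing else."""

    def __init__(self, names: NameLookup):
        self._names = names

    def greet(self, user_id: int) -> str:
        return f"Hello, {self._names.display_name(user_id)}!"


class VendorDirectory:
    """Stand-in for a third-party class we do not control (hypothetical)."""

    def fetch_record(self, key: str) -> dict:
        return {"key": key, "first": "Ada", "last": "Lovelace"}


class VendorNameLookup:
    """Adapter: all knowledge of VendorDirectory lives here."""

    def __init__(self, directory: VendorDirectory):
        self._directory = directory

    def display_name(self, user_id: int) -> str:
        record = self._directory.fetch_record(f"user:{user_id}")
        return f"{record['first']} {record['last']}"


greeter = Greeter(VendorNameLookup(VendorDirectory()))
message = greeter.greet(42)
```

If the vendor class changes its record format, the adapter is the only code that has to change; `Greeter` and its tests stay as they are.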
06.20.08
Posted in General development at 5:51 pm by Kyoryu
“You can’t make a silk purse out of a sow’s ear.”
“It’s not a global, it’s a singleton!”
Description:
An “acceptable” design pattern is placed on top of a concept that is generally avoided.
Symptoms:
- Problems associated with a known poor development practice start cropping up.
- Problem areas are defended by spouting off the name of the design pattern that they superficially resemble.
- Patterns are used when the problem that the pattern solves is not demonstrated, but for tangential reasons.
Examples:
The most common form of silk-pursing is singletonitis. Too often, globals are wrapped up in singletons, because that somehow makes them “okay.” Inappropriate use of Service Providers seems to be the next version of singletonitis, in that it is often used to propagate globals rather than the actual purposes (extensibility, etc.)
Usage of try/catch/finally to simulate goto is another example.
Silk-pursing is related to cargo-cult-programming. In both cases, a useful pattern is used inappropriately. Cargo-cult-programming usually involves adding patterns/structures/algorithms/etc. for no apparent reason whatsoever. Silk-pursing is different in that the pattern being abused is applied solely for the purpose of hiding a practice that is frowned upon.
Silk-pursing may seem like gold-plating, but it is different. Gold-plating involves putting extra, unnecessary features into code. Silk-pursing simply hides poor development practices.
Silk-pursing may or may not be a deliberate attempt to conceal. In many cases, developers will actively believe that because they are using something they’ve heard is a beneficial practice, what they are doing is actually better.
Fixing:
Treat the code as if it were the underlying development practice – treat silk-purse singletons as globals, etc.
Educate developers on the purpose of the design patterns that are being abused.
Educate developers on the fact that concealing a bad practice doesn’t make it any better.
Educate developers on ways to design code that doesn’t involve using the poor practice.
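A tiny Python illustration of the singleton case, with invented names (`Settings`, `price_label_singleton`, `price_label`). The singleton version behaves exactly like a global: any code anywhere can change what the function returns. The last function shows one possible alternative, which is simply passing the value in.

```python
class Settings:
    """A classic singleton: one shared, mutable instance reachable from anywhere."""

    _instance = None

    def __init__(self):
        self.currency = "USD"

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


def price_label_singleton(amount):
    # Reads shared state exactly as it would read a global variable.
    return f"{amount} {Settings.instance().currency}"


def price_label(amount, currency):
    # Alternative: the value arrives as an argument; nothing shared is touched.
    return f"{amount} {currency}"


before = price_label_singleton(10)
Settings.instance().currency = "EUR"   # some unrelated code, somewhere else
after = price_label_singleton(10)      # same call, different answer
```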
05.23.08
Posted in General development at 3:04 pm by Kyoryu
Dependencies and coupling seem to cause the greatest pain in software development. I think it’s useful to look at the types of dependencies that can exist.
Contained Dependencies
A contained dependency is a dependency that a class has, but which is not communicated externally. The class uses the contained object, and so is dependent upon it, but does not propagate the dependency. This is the most benign dependency, as if the dependency breaks, the class may break, but it cannot (directly) break other objects.
Direct Dependency
A direct dependency is a dependency which is exposed directly by the class, either in a return value, a parameter, or a base class. Direct dependencies are worse than contained dependencies, as they can directly break classes that use the class under discussion, both in terms of compilation and functionality.
Indirect Dependency
An indirect dependency is a dependency exposed by another dependency. This is worse than direct dependencies, as this is how dependencies propagate, causing the system to become brittle.
Hidden Dependency
A hidden dependency is arguably the worst kind of dependency. A hidden dependency is a contained dependency that can cause side effects, causing other components to fail. Globals are, generally, hidden dependencies.
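The four kinds can be lined up side by side in a short Python sketch. The classes and the `DEFAULT_INDENT` global are invented purely to illustrate the categories; they are one reading of the definitions above, not a canonical taxonomy.

```python
import json
from datetime import date

DEFAULT_INDENT = 2  # module-level global, shared by everything in the process


class ConfigReader:
    """Contained: json is used inside, but only plain dicts cross the boundary."""

    def read(self, text):
        return json.loads(text)


class Invoice:
    """Direct: callers receive a datetime.date, so they depend on it too."""

    def due(self):
        return date(2008, 6, 1)


class Customer:
    """Indirect: callers who reach through Customer pick up Invoice and date."""

    def latest_invoice(self):
        return Invoice()


class CompactExporter:
    """Hidden: looks self-contained, but quietly rewrites the shared global."""

    def export(self, data):
        global DEFAULT_INDENT
        DEFAULT_INDENT = None
        return json.dumps(data)


def pretty(data):
    """An unrelated component that trusts the global."""
    return json.dumps(data, indent=DEFAULT_INDENT)


first = pretty({"a": 1})
CompactExporter().export({"b": 2})
second = pretty({"a": 1})   # same call, different result
```

Nothing in the signature of `CompactExporter.export` hints that calling it will change what `pretty` returns, which is what makes the hidden kind so unpleasant to track down.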
04.26.08
Posted in General development at 3:11 pm by Kyoryu
Ahhh, probably the most controversial subject in development. I don’t know of any single issue that is more likely to get people riled up, either for or against it.
My experience is pretty simple. I’ve never done pair programming “full-time.” But, like most programmers, I’ve done it at times – working with another programmer over a problem.
When I’ve done that, I’ve generally found that a few things happened:
- My knowledge increased
- Hopefully, the knowledge of the other guy increased
- We both understood the system better
- We generally produced better quality code than I was used to seeing from either of us, individually
- We remained more highly focused
There have been a number of studies done on pair programming. Most of them use similar methodologies, and reach similar conclusions. I’ll dig up references later.
In general, they take developers or students, and divide them into two groups. One of the groups will pair up and work on a project, while the other group will approach the project individually.
In general, these projects are rather small. In one experiment, they were class assignments, over a period of time.
The results generally found were that the pair took less clock-time, but more man-hours to complete the task. Generally, the results from pairs were of higher quality.
One particular study continued the experiment over time, and found that the tax on man-hours dropped from 40% at the beginning, to only about 15% at the end of the experiment.
Now, that’s pretty impressive by itself. If a project would normally take 40 hours for a single developer, and with two, you can get it done in about 23 hours (46 man-hours at the 15% overhead) and with higher quality, I think that’s a win.
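As a rough check on that arithmetic, here’s a tiny Python calculation. It assumes the overhead percentage applies to total man-hours and that a pair’s clock time is simply half its man-hours; the function name is made up.

```python
def pair_clock_hours(solo_hours, overhead_percent):
    """Clock hours for a pair, given the solo estimate and the man-hour overhead."""
    man_hours = solo_hours * (100 + overhead_percent) / 100
    return man_hours / 2


at_start = pair_clock_hours(40, 40)   # 40% overhead: 56 man-hours
at_end = pair_clock_hours(40, 15)     # 15% overhead: 46 man-hours
```

Under those assumptions a 40-hour solo task comes out at 28 clock hours for the pair at the start of the experiment and 23 at the end.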
But, I think that the experiments described are testing the wrong things.
I see the real benefit of pair programming not as coming from initial productivity, but from the ability to sustain productivity over time, and to allow higher levels of scaling.
One effect noted was that for pairs to become highly effective, it took some amount of time for both the developers to get used to pair programming, as well as to get used to each other. When doing an experiment on a small or micro basis (projects taking <1 day to complete), the initial cost of this can easily outweigh any benefits.
Secondly, one advantage that I see is increased knowledge of systems. By pairing, especially if pairs are not static, developers will work on multiple areas of the project, and will gain understanding of the “big picture” of what they’re doing, rather than their isolated area. This should increase overall quality, as well as remove the “hit-by-a-bus” category of risks. When working on small projects that can be completed in under a week of work, this is mostly irrelevant.
Third, by pairing developers, you will remove some level of the communication tax. Given four developers, if they work individually, they must all coordinate. If they are paired off, you now only have two groups that need to coordinate, instead of 4 individuals. Because the “individual” developers worked as exactly that, they had zero communication tax. Comparing paired developers to two developers collaborating as individuals would be an interesting metric.
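One way to put a number on that, assuming coordination cost scales with the number of pairwise links between parties (the usual n(n-1)/2 count, which is an assumption layered on top of the argument here):

```python
def channels(parties):
    """Pairwise coordination links among a number of parties."""
    return parties * (parties - 1) // 2


four_individuals = channels(4)   # every developer coordinates with every other
two_pairs = channels(2)          # each pair treated as a single party
solo = channels(1)               # the lone developer in the studies
```

By that count four individuals have six links to maintain, two pairs have one, and the solo developers in the experiments have none, which is why they paid no communication tax at all.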
Fourth, maintenance is very important. While “passing tests” is an important measure of quality, the ability to maintain and expand code is also extremely important, if hard to measure. If pairing increases quality, then this additional quality should (in theory) allow code to be more easily changed in the future – especially when a larger number of developers have insight into it. Again, on small projects, this benefit is unimportant.
None of these benefits can be measured on toy or small projects.
So, how would I design a pair programming test, in an ideal world?
I think you’d only need to change a few things.
First, projects need to be somewhat longer – at least 40 hours to completion. And that is a minimum.
Secondly, to be fair, the test needs to compare equal-sized groups of developers to each other, working individually or in pairs. In my experience, I’m not sure that adding a single additional developer actually increases productivity, but this is a more realistic comparison.
And last, ideally you’d want to scale out even further – four developers or more, working in pairs, against a team of the same size working as individuals. Generally, in a realistic environment, the question is not “should we hire twice as many developers and have them pair?” The typical question is “we have this many developers – should they work alone, or in pairs?”
I honestly don’t know how these results would turn out. But, I suspect that they’d turn out very well for pair programming, especially if combined with other practices (such as TDD).
Posted in General development at 1:43 am by Kyoryu
In skiing, there’s a concept called the “fall line.” The fall line just means down. But it doesn’t mean “towards the bottom of the hill,” it just means what is immediately down from the exact location you’re at – if you dropped a ball, or poured some water, which way would it go?
This is important in skiing, because you have to stand perpendicular to the fall line if you don’t want to keep moving.
There’s also a fall line in development. The fall line in development is simply what is the easiest thing to do at the time to solve the immediate problem. This can also vary by individual knowledge level.
In C, if you want to add some function, the fall line is just to declare it before you need it, and go on your merry way. If you need it multiple places in the same file, put a declaration at the top of the file. Only if you actually need it in another file is there any reason to expose it via header.
C++ is kind of opposite in this aspect – if you want something to be a class member, it has to be included in the header. So a lot of little internal methods that, in C, may have been completely hidden from the world become exposed, at least to the extent that they’re in your header. Yes, you can do things like separate implementation classes, but that’s, again, adding more work.
Because of this, I firmly believe that in any kind of API development, you want the easy thing to be the safe thing to do. If something takes more effort to do, people will avoid it unless they absolutely need to do it.
Sometimes, the easy thing to do in a language, or application, isn’t the right thing to do. Sometimes it’s the wrong thing to do. And the typical answer to that is to add process, and force people to do the right thing.
I’m going to suggest that all process can do is add inefficiency, and by doing so, change the fall line of development.
That’s not necessarily a bad thing – if the fall line leads you to write bad code, then adding inefficiencies to gain later benefits can be very useful!
But, there are two things to keep in mind:
First, if people don’t understand the purpose of the process, they’ll just follow the steps of the process while continuing, essentially, the same underlying behaviors. Imposing a process alone will generally not change the mindset of anybody. If you’re really looking to get a change in behavior, you’ll have to do that through education.
Secondly, the reason that process works is because it adds inefficiencies, and people will avoid inefficiencies. The more heavyweight a process is, the less it will be used. If your checkin process takes a day to navigate, then people will avoid checkins, and do them in large batches. And that might partially defeat the purpose of your checkin process in the first place.
You should always use the minimum process that you can, and use process deliberately to add inefficiencies. The goal of a process should never be to get people to do what they should do, but rather to get people to not do things that they shouldn’t.
I’m a big TDD proponent. I would never suggest that somebody institute a TDD process that was mandatory. If you don’t “get” TDD, then all you’ll do is write bad tests, and make the unit test suite less usable to me, without even getting any of the benefits.
Instead, I’d make a process (probably as part of the build) that ran the appropriate suites, and make some kind of check to make sure they were either run prior to checkin, or immediately afterwards.
Yes, I want people to write tests. But the best way to get them to do that is by example, and showing them how useful the tests can be. What I don’t want is for people to break the tests, rendering them useless. And, I don’t want poorly written tests that are likely to fail due to configuration issues, or take forever to run.
If I make users manually run tests, it’s another process that they have to do – they’ll put it off as long as they can. And if there’s a break, they’ll probably have worked well past the original error, and have to rework a lot of code in order to get everything to work again. They’ll end up hating the unit tests, and the quality of the suite will degrade.
If the tests are in another directory, developers will have to sync two directories, and switch directories to build/run tests. Again, this will make them run them less often.
By making the tests run as part of the build process, I make it hard to break the tests. If the tests break, you don’t get a successful build. Now, you have to circumvent the build to get anything done. If it’s your change that broke the tests, it’s probably easier to just fix your code, since you’ll have to do it anyway.
And, if the tests are part of the build, slow or failure prone tests will become painful for everyone – as long as it’s easier to move the tests to the appropriate suite than it is to circumvent the build, they’ll do that.