Friday, October 06, 2006

Invert your logic part 2: For loops and while loops

I'm not sure how many people I've convinced or not convinced in my previous post about inverting your logic (I don't really think this is a good term, but so far I don't have a better one). The comments that were left lead me to believe I wasn't 100% successful in my quest.

It may have been that my example was too limited. I don't just apply this style to a single method, but every block of code I write.

Here's a real code sample from my Parallel Web Spider project. In the first version of the code, there was only one pre-condition test that checked to see if the link depth was too great.

Since posting the code early this week, I have made some minor updates to it, including the two extra condition checks you see below that prevent the spider from following binary content and URL protocols that aren't supported.

Here's the code:


...
while (processed != toDo) {
// get result
try {
future = service.take();
processed++;
links = future.get();
} catch (Exception e) {
e.printStackTrace();
continue;
}

// process result
System.out.println("Processing " + links.size() + " link results.");
for (PageLink link : links) {
if (link.getDepth() >= config.getMaxDepth()) {
continue;
}

// don't follow binary content links
if (link.getUrl().endsWith(".pdf") ||
link.getUrl().endsWith(".gz") ||
link.getUrl().endsWith(".tgz") ||
link.getUrl().endsWith(".zip") ||
link.getUrl().endsWith(".doc") ||
link.getUrl().endsWith(".ppt")) {
continue;
}

// we're only going to support http and file for now
if (!link.getUrl().startsWith("http") &&
!link.getUrl().startsWith("file")) {
continue;
}

submit(service, link.getDepth(), link.getUrl());
}
}


The style I am suggesting doesn't just apply to methods, it's for any form of a block.

The benefits to writing code in this stlye are
  • loose coupling - blocks are not dependent. You can add/subtract sections of code without having to worry about the surrounding logic
  • linear flow - mentally, it's easier to grok links in a chain because they require small, focused amounts of brain power. A stack (nested code blocks) requires you to keep state, so the brain power required is additive for each level of nesting you introduce.
Finally, extra credit if you noticed one more application of the same style -- the try/catch block. Notice how much more cleanly I can handle the Exceptions thrown by the take() and get() methods by judiciously using a continue. If I had left the continue out, I would have had to handle the exception way down in the code, nearly 60 lines away.

9 comments:

Curt said...

Break and continue are probably underused in Java. Still, in the example you give, a clearer solution would be:

for (PageLink link : links) {
process(link);
}

Then the original method is clearer and the continues become returns.

Taylor said...

Agreed - I think the art of writing code is where and when to create methods.

If I suspected I might run that bit of code from somewhere else, I would have made it a method. If it gets too long, I would also make it a method.

But it doesn't quite fall into either of these categories -- yet -- even though it is quite close. So for the moment I am leaving it all in one while loop.

The advantage is that the logic is right there, not tucked away in some method.

The disadvantage of course is if it is too long, it's hard to understand the entire scope of what's being done.

Ricky Clarkson said...

It's hard to take someone seriously who writes 'catch (Exception'.

Anyway..

What you're doing in this code is filtering based on a set of conditions.

Roughly,

for (final PageLink link: Collections.filter(links,condition))
submit(service, link.getDepth(), link.getUrl());

I find this clearer.

condition would be defined as:

Function<PageLink,Boolean> condition=new Function<PageLink,Boolean>()
{
public Boolean run(PageLink link)
{
boolean part1=link.getUrl().endsWith(".pdf") ||
link.getUrl().endsWith(".gz") ||
link.getUrl().endsWith(".tgz") ||
link.getUrl().endsWith(".zip") ||
link.getUrl().endsWith(".doc") ||
link.getUrl().endsWith(".ppt");

boolean part2=!link.getUrl().startsWith("http") &&
!link.getUrl().startsWith("file")

return !(part1 && part2);
}
};

This could be simplified further, I'd work with this some more if it was in my own code.

The necessary classes, Function and Collections, are available in fpeas - http://code.google.com/p/functionalpeas/

Cheers.

Taylor said...

Ricky,

Thanks for the comments. True the Exception is a bit broad, but it's demo code, not production - point taken regardless.


I like your suggestion, although it looks a lot more like functor style code so I would think you'd have to make a systemic change to support this style more broadly, or it would look very out place.

The issue I have is that you're solving my problem for this particular instance, but I am trying to point out a more generic use case.

It seems to me that while your solution works well with Collections, it may or may not lend itself to a more generic solution when for example the logic involved doesn't use or iterate over Collections.

Ricky Clarkson said...

"The issue I have is that you're solving my problem for this particular instance"

I'm not sure, but I think I would solve any problem in a different way. I find it hard to generalise about this yet, which is why I haven't said "functional is better" or anything so far.

Rather, I'm learning by implementing. If you provide another (non-straw-man) case then we might have something to talk about. ;)

If I am to generalise based on what I know now, I'd say that safety doesn't have to come at the expense of expressiveness. By safety I mean type safety, etc.

MAG said...

Hi Taylor,

I think you are getting a pretty rough ride here - especially as your point is a good one.

Basically, separate out pre-conditions at the start of the logic not embed them within the block. Great - especially in a imperative language like Java.

Sure, functional is better and it would be great to write functional code, but Java isn't Haskell and sometimes needs must!

alexhiggins732 said...

Let me start of by saying I am a VB programmer and I probably have no right even getting involved here.

However, I tend to agree with the author as I am not familiar with Java. Yet, I am still able to read his examples and extract a general concept that can be applied to several other languages including, VB, C#, C++, PHP, JavaScript, etc.

Additionally, I tend to agree with testing for the null value first as once the object is created, its type can be return (in this case null) faster than testing other conditions such as comparing the list.size to index integer. Although the machine code executes these test super fast think, the extra operations consume more time. For example, lets say we handed coded the operator functions for the statements for the following line of code, but the variables are custom data types.

if (index < 0 || index >= list.size())
We would first have to retrieve the index’s value from the class object, and repeat the same for the integer 0. Then our code needs to compare the two and evaluate the second half of the operator if the first half is valid. Again going onto the second half of the statement we retrieve the index value, the then size of our list data type and decide whether to return true. That is a lot of steps as compared to testing for a single value of the list’s type.

Even the built in operator uses the same logic the author is suggesting
If (index < 0 || ... returns true, and doesn’t even begin to evaluate the rest of the expression

But again, I am a VB programmer and you guys can wipe the floor with me when it comes to this topic. But to reinforce the author’s point I participated in Google’s Code Jam, which is basically an algorithm writing contest. All methods are timed, and the first thing I did to speed up the execution of my algorithms was to return as soon as possible.

Being that VB is a high level programming language, it really can not compete with C++ or Java in terms of execution speed. However, I found several of my methods outperformed Java and C++ written by some of the world’s best programmer’s by returning as soon as possible as opposed to having a single return value.

I would suggest you unit test your theories, and see which perform the best. When it comes to writing algorithms, each additional step of code execution really matters. You might not notice the performance impacts looping through 100 or 100 operations but once you start factoring tens or hundred’s of thousands of variables you will notice the difference.

And even above all that is my first statement. I found the author’s examples readable and understandable enough to post this comment. As where I find this statement hard to follow:
Function<:PageLink,Boolean>: condition=new Function<:PageLink,Boolean>()
{
public Boolean run(PageLink link)
{
boolean part1=link.getUrl().endsWith(".pdf") ||
link.getUrl().endsWith(".gz") ||
link.getUrl().endsWith(".tgz") ||
link.getUrl().endsWith(".zip") ||
link.getUrl().endsWith(".doc") ||
link.getUrl().endsWith(".ppt");

boolean part2=!link.getUrl().startsWith("http") &&
!link.getUrl().startsWith("file")

return !(part1 && part2);
}
};

Avoa said...

Your code is not fully factored -- there is repetition. All the continuing blocks have the same purpose, yet they duplicate the jumping.

The treatment for overlong if-conditions is not to break them into separate ifs, but to factor-out the conditional expressions:

for( PageLink link : links )
{
...if( (link.getDepth() < config.getMaxDepth()) &&
.......!isLinkContentBinary() &&
.......isLinkProtocolOk() )
...{
......submit( service, link.getDepth(), link.getUrl() );
...}
}

When reading your code I first see three blocks. I don't know they all do the same thing until I have read through and seen the continue in each. But if there is only one if, the single purpose is immediately obvious.

Anonymous said...

gclu I like your blog. Thank you. They are really great . Ermunterung ++ .
Some new style Puma Speed is in fashion this year.
chaussure puma is Puma shoes in french . Many Franzose like seach “chaussure sport” by the internet when they need buy the Puma Shoes Or nike max shoes. The information age is really convenient .
By the way ,the nike max ltd is really good NIKE air shoes ,don’t forget buy the puma mens shoes and nike air max ltd by the internet when you need them . Do you know Nike Air Shoes is a best Air Shoes . another kinds of Nike shoes is better . For example , Nike Air Rift is good and Cheap Nike Shoes .the nike shox shoes is fitting to running.
Spring is coming, Do you think this season is not for Ugg Boots? maybe yes .but this season is best time that can buy the cheap ugg boots. Many sellers are selling discounted. Do not miss . Please view my fc2 blog and hair straighteners blog.
.thank you .

I like orange converse shoes ,I like to buy the cheap converse shoes by the internet shop . the puma shoes and the adidas shoes (or addidas shoes) are more on internet shop .i can buy the cheap nike shoes and cheap puma shoes online. It’s really convenient.
Many persons more like Puma basket shoes than nike air rift shoes . the Puma Cat shoes is a kind of Cheap Puma Shoes .
If you want to buy the Cheap Nike Air shoes ,you can buy them online. They are same as the Nike Air shoes authorized shop. Very high-caliber Air shoes and puma cat shoes . the cheap puma shoes as same as other.