Wednesday, August 24, 2011

More Fun with Scala "Dependent" Types

In my last post I wrote about lists that encode their length in their type and the possibilities this gives us to ensure the correctness of our code. In this post I want to expand on these ideas.

Let's assume that we have a list with an even number of elements. Knowing that the length is even, we could do various things like extracting pairs. An odd length is not very suitable for this… no shit, Sherlock!

To make this happen we first need boolean values in the type system to represent that a proposition is actually true and we need to be able to negate them.

sealed trait Bool {
  type Not <: Bool
}
sealed trait True extends Bool {
  type Not = False
}
sealed trait False extends Bool {
  type Not = True
}

In the next step we have to extend our type-level natural numbers. Let's give them a predecessor and a property that indicates whether a number is even.

sealed trait Nat {
  type IsEven <: Bool
  type Pred <: Nat
}
sealed trait Z extends Nat {
  type IsEven = True
  type Pred = Z
}
sealed trait S[A <: Nat] extends Nat {
  type IsEven = A#IsEven#Not
  type Pred = A
}

To make it a little nicer to read and write, we add a witness that represents that a natural number is even. We check the evenness with the =:= type constraint, which witnesses that two types are equal.

sealed trait Even[A <: Nat]
object Even {
  implicit def isEven[A <: Nat](implicit e: A#IsEven =:= True): Even[A] =
    new Even[A]{}
}
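A quick sanity check before we wire this into the list (a sketch of my own; the underscore aliases are not part of the original code):

type _2 = S[S[Z]]
type _3 = S[_2]

implicitly[Even[_2]]    // compiles: 2 is even
// implicitly[Even[_3]] // would not compile: no Even witness can be derived for 3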

Now the fun part. We start building our list with a getPairs method that only works on lists with an even length. Here is our interface for this NList:

sealed trait NList[A <: Nat, +B] {
  type Length = A
  def getPairs(implicit e: Even[Length]): List[(B, B)]
  def head: B
  def tail: NList[Length#Pred, B]
}

The first case again is trivial: we are dealing with an empty list, so the only thing we can extract is an empty list of pairs.

case object NNil extends NList[Z, Nothing] {
  def getPairs(implicit e: Even[Length]): List[(Nothing, Nothing)] =
    Nil
  def head = sys.error("empty list")
  def tail = sys.error("empty list")
}

Now this one is a little more tricky. We might know that a natural number A is even, but what the type system does not know is whether A#Pred#Pred is also even when we call getPairs recursively. We need to create a witness that A#Pred#Pred is even whenever A is even. We can do this with another method that produces that witness when you hand it the witness for A's evenness; let's call that method predPredEven.

sealed case class NCons[A <: Nat, B](h: B, t: NList[A, B])
  extends NList[S[A], B] {
  private def predPredEven[A](e: Even[Length]): Even[Length#Pred#Pred] =
    new Even[Length#Pred#Pred]{}
  def getPairs(implicit e: Even[Length]): List[(B, B)] =
    (head, tail.head) :: tail.tail.getPairs(predPredEven(e))
  def head: B = h
  def tail: NList[Length#Pred, B] = t
}

That's all. How does this look when we put it into action?

scala> val lo = NCons(1, NNil)
lo: nlist.NCons[nlist.Z,Int] = NCons(1,NNil)
scala> val le = NCons(1, NCons(2, NNil))
le: nlist.NCons[nlist.S[nlist.Z],Int] = NCons(1,NCons(2,NNil))
scala> le.getPairs
res0: List[(Int, Int)] = List((1,2))
scala> NNil.getPairs
res1: List[(Nothing, Nothing)] = List()
scala> lo.getPairs
:12: error: could not find implicit value for parameter e: nlist.Even[lo.Length]
lo.getPairs
^

Very nice! getPairs only works on lists with an even length; otherwise we get a compile error. Once again: if you want to play with this code, check out this gist.

In future blog posts I'd like to show you how a language designed with these techniques in mind handles these problems.

Monday, August 22, 2011

A Taste of Dependent Types in Scala

One thing that has started to cross my path very frequently is dependent typing. While I'm not an expert on this topic, I found some very interesting things in various papers that I would like to be able to do in Scala. Dependent types help to verify that your program does what it's supposed to do by proving theorems.
Since Scala is not dependently typed we need to get a little tricky, but let's give it a try.

Think about the type signature of the following method called zap (which stands for zip-apply):


def zap[A, B](l: List[A], fs: List[A => B]): List[B]


By looking at the signature it should become obvious what the method does. You have a list l that provides elements of type A and a list fs that provides a function from A to B for each element in l. Let's look at the implementation:


def zap[A, B](l: List[A], fs: List[A => B]): List[B] =
  (l, fs) match {
    case (_, Nil) | (Nil, _)  => Nil
    case (a :: as, f :: rest) => f(a) :: zap(as, rest)
  }


That's a pretty straightforward example, but there is a problem with this code. What if we provide lists of different lengths? In one case we would not execute some of the functions, in the other case we would lose data. So what can we do? We could throw an exception, but that's already too late: our program is already in a state where things don't make sense anymore. We have to avoid this situation altogether, and the best way to achieve that is to make sure such code does not compile at all.
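To make the problem concrete, here is what the zap above does with lists of mismatched length (a small sketch, not REPL output from the post):

zap(List(1, 2, 3), List((x: Int) => x + 1))
// yields List(2): the elements 2 and 3 are silently dropped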

In a dependently typed language we could create a list that encodes its length within its type and write zap in a way that it would only accept lists of the same length. But since we are not in a dependently typed language we cannot do that, which concludes this blog post; see you in another episode… OK, just kidding. We actually CAN do that in Scala. Here is how:

The first thing we need is: natural numbers in the type system to encode the length of our list. That is easy:


sealed trait Nat
sealed trait Z extends Nat
sealed trait S[A <: Nat] extends Nat


Z is our zero here and S[A <: Nat] is the successor of the natural number A, so we can create the natural numbers recursively: Z == 0, S[Z] == 1, S[S[Z]] == 2 etc.
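Just to make the encoding concrete, the first few numbers could be written down as aliases (my own shorthand, not used in the rest of the post):

type _0 = Z
type _1 = S[_0]
type _2 = S[_1]
type _3 = S[_2]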

Now let's start with the fun part, the list itself.

The first thing we need is the trait for our NList.


trait NList[A <: Nat, +B] {
  type Length = A
  def zap[C](l: NList[Length, B => C]): NList[Length, C]
}


We can now see that the type of zap enforces that both lists have the same length A. We can now create our first case, where we are zapping two lists of length 0 (or Z). The implementation is rather trivial.


case object NNil extends NList[Z, Nothing] {
  def zap[A](l: NList[Length, Nothing => A]): NNil.type =
    NNil
}


The next case is not as simple (if someone has a better idea of how to implement this, I would love to see it. Please drop a comment).


case class NCons[A <: Nat, B](head: B, tail: NList[A, B])
  extends NList[S[A], B] {
  def zap[C](l: NList[Length, B => C]): NList[Length, C] =
    l match {
      case NCons(f, fs) =>
        NCons(f(head), tail zap fs)
    }
}


We can see that our NCons is actually an NList that is one element longer
than its tail, which has length A. The pattern match inside our zap method only has to handle the cons case, since the type signature does not allow anything else.

Now let's put that little bugger to the test:

First let's define some NLists:


scala> val l1 = NCons(1, NCons(2, NCons(3, NNil)))
scala> val l2 = NCons(1, NCons(2, NNil))
scala> val f = (_: Int) + 1
scala> val f1 = NCons(f, NCons(f, NCons(f, NNil)))
scala> val f2 = NCons(f, NCons(f, NNil))


Zapping lists with the same length:


scala> l1 zap f1
res1: nlist.NList[l1.Length,Int] = NCons(2,NCons(3,NCons(4,NNil)))

scala> l2 zap f2
res2: nlist.NList[l2.Length,Int] = NCons(2,NCons(3,NNil))


This works just fine. Now let's see what happens if we zap two lists of different lengths:


scala> l1 zap f2
:14: error: type mismatch;
found : nlist.NCons[nlist.S[nlist.Z],Int => Int]
required: nlist.NList[l1.Length,Int => ?]
l1 zap f2


Exactly what we wanted! This code does not even compile since it doesn't make any sense.
This is a very basic example of what we can do with the Scala type system. Stay tuned for more!

If you would like to play around with the code, check out my gist.

Sunday, July 10, 2011

From Functions to Monads in Scala

"Monad" that's a word you hear quite often these days. There is a lot of unjustified FUD whenever this word pops up. People seem to be downright scared about it. You can tell by the amount of rhetorical trick words people use to scare others away from trying to understand monads at all. "Academic", "no use in real world programming" etc etc. I find it quite frustrating to hear something like that even within the Scala community from time to time (even though this is getting rarer and rarer), since these are the "arguments" people bring up when they try to scare others away from Scala.

First of all, I'm not a category theorist. These things just fascinate me, and I enjoy reading about this in my free time. I'm your ordinary "Joe, the programmer" working at a small software company.

Still, I find these things useful in my day job. Why? Because as a programmer I'm building mathematical machines (namely software) all day long, and these concepts help me to order my thoughts and make it easier to reason about my software and my designs.

In this blogpost I want to show how to derive Monads from what you already know and give names to things you maybe never even thought about.

Categories

Since monads come from Category Theory we first have to define what a category is.

A category C basically consists of 2 sets ob(C) and hom(C) where we call ob(C) the objects of C and hom(C) the morphisms of C. Morphisms are maps that go between objects.

Never saw a Category in your everyday programming? Sure? There is a Category called Hask: its objects are the Haskell types and its morphisms are Haskell functions. So you can think about that in a Scala context:

Let f be a function of type Int => String. Int and String are both elements of ob(C) and f is an element of hom(C).

There's more to a Category: we need to be able to compose morphisms, and there has to be a special element called identity. Let's look at composition first.

Let f and g be functions:


f: A => B

g: B => C

We can compose those 2 functions in Scala by using the compose method to get a new function.


g compose f // yields a function of type A => C

OK, that's half the battle. Now we just need the special element called identity, which basically is a function that does nothing: it just takes a value and returns it unmodified. So when we compose a function f with identity we get back a function that is equivalent to f, and it should not matter whether we compose f with identity or identity with f.

In short: The following 3 functions are equivalent (which means: when fed with the same input values they will yield the exact same result)


f // A => B

f compose id // A => B

id compose f // A => B
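Scala already ships such an identity function as Predef.identity; spelled out as a value it is just this (a small sketch to make the equivalences above concrete):

def id[A]: A => A = a => a

val f: Int => String = _.toString

(f compose id[Int])(1) == f(1)    // true
(id[String] compose f)(1) == f(1) // true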

Now we know what a Category is (well, at least we know enough to start; a real category theorist can take you on a breathtaking journey from here :)). Let's move on!

Enter Higher-Kinded-Types

Next "scary word", but again: You use them all the time. First we need to know what a kind is. We know what types and values are. You can think of them as levels. A value like 1 and x => x (that's a function value, yes those are values too!) live a the value level while types like Int and Int => Int live at the type level. We group values into types. true and false are both values of type Boolean (think of types as sets of values).

OK, if we group values into types, can we group types? Yes, we can! We can group types into… *drumroll* KINDS! Now we have a third level and our world looks something like this:


  kinds  

---------

  types

---------

  values

In most languages we only deal with one kind called * (pronounced type) that groups all types, but there are languages that allow infinite levels (I might cover one of them in future blog posts).

We now have a basic idea of what kinds are, so what are higher-kinded types? Like higher-order functions (functions that take functions as arguments and yield functions, e.g. compose),
higher-kinded types are type constructors that take type constructors as arguments and yield types (or maybe even type constructors).

"Wait a minute… are you talking about generics?"

Think about List. It's not a type by itself; you need to feed it a type as an argument to get a type. This is called a type constructor. List takes one type and returns a type, hence List has the kind * -> * (this notation is borrowed from Haskell). The Scala notation for a type constructor like List is List[_].

So a higher kinded type looks something like this: trait Functor[F[_]] where F could be a List, Option or any other type constructor that takes one argument.

Functors

Now that we have higher-kinded types, let's take a look at our functions f and g from the beginning and let's add a type constructor F[_] to the mix (this is Scala notation and means: takes one type and yields a type). We can now produce a whole new set of types with this type constructor (e.g. F[Int], F[String] etc.). We could now create functions that work with these types… hey, I know this. This sounds like… a Category! Yes, it is! It's actually a subcategory of the Hask Category we introduced at the beginning. Let's call this new Category D (so we don't confuse it with the type constructor F). So elements of ob(D) would be something like F[A], and elements of hom(D) would be something like F[A] => F[B].

Now wouldn't it be cool if there was a morphism between those two categories, one that preserves composition and the identity rule? Something that could map a function of type A => B to a function of type F[A] => F[B]? That's what we call a Functor: it's a morphism between categories.

"Why is this useful?" you might ask. Just think about the kind of code reuse you could achieve. Let's look at Option as a concrete example. It's a type constructor. So how do we map a function of type Int => Int to a function of type Option[Int] => Option[Int]? Can't you tell by now? The answer is the map method of Option ;).

Let's check it out:


scala> val f = (_: Int) + 1

f: (Int) => Int = <function1>



scala> Some(1) map f

res0: Option[Int] = Some(2)

We could now reuse our function f with Option; there was no need to write a special version of f that works with Option.
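What Option's map gives us is exactly the functor's action on morphisms. Written out as a type class it could look roughly like this (my own naming, not a library API):

trait Functor[F[_]] {
  def fmap[A, B](fa: F[A])(f: A => B): F[B]
}

object OptionFunctor extends Functor[Option] {
  def fmap[A, B](fa: Option[A])(f: A => B): Option[B] = fa map f
}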

The second ingredient we need to make our Functor complete is a morphism that maps a value of type A to a value of type F[A]. In the case of Option this is just the factory function:


scala> Some(_: Int)

res0: (Int) => Some[Int] = <function1>



scala> Option(_: Int)

res1: (Int) => Option[Int] = <function1>



scala> Option(1)

res2: Option[Int] = Some(1)



scala> Some(1)

res3: Some[Int] = Some(1)


Endofunctors

Now, let's assume that we want something like this:


scala> Some(1) map {

     | case x if x % 2 == 0 => Some("YES!")

     | case _               => None

     | }

res1: Option[Option[java.lang.String]] = Some(None)

We want to use a function of type Int => Option[String] with Option's map method. But what's that? We get an Option[Option[String]]. That's stupid, now I have to unpack one Option to get the result I actually wanted.

What happened here? Remember how we mapped a function of type A => B to a function of type F[A] => F[B] in the previous section about functors? What we did now is: we mapped a function of type Int => Option[String] to a function of type Option[Int] => Option[Option[String]]. Yikes, we went a little bit over the top here, huh? We kind of mapped something into an Option, and then into an Option again; this is how we ended up with an Option[Option[String]]. You can think of this as the result of a composition of two Option Functors.

This is a special kind of Functor, one that maps a category into itself, and it is called an Endofunctor.

Natural Transformations

Now that we have morphisms between objects of a Category (which are Scala functions) and Functors (which are basically morphisms between categories), it's time to introduce morphisms between Functors.

Why do we need this? Well, in the last example we ended up with an Endofunctor (we mapped a Category into itself), and we need to get rid of one of the Options. We do this with a morphism that maps the composition of two Functors F (F composed with F) to the Functor F.

Let's do a concrete example that involves List:


scala> List(1,2,3) map { x => List(x + 1) }

res0: List[List[Int]] = List(List(2), List(3), List(4))

It happened again, but this time we ended up with a List[List[Int]]! We now need a Natural Transformation to join those lists together; this Natural Transformation is called flatten in Scala.


scala> List(1,2,3) map { x => List(x + 1) } flatten

res1: List[Int] = List(2, 3, 4)

List comes with a method that does both in one step, it's called flatMap.


scala> List(1,2,3) flatMap { x => List(x + 1) }

res2: List[Int] = List(2, 3, 4)

This enables us to chain functions together in a very convenient way.


scala> Some(1) flatMap { x => Some(2) flatMap { y => Some(x + y) } }

res3: Option[Int] = Some(3)

or a little more concisely:


scala> Some(1) flatMap { x => Some(2) map { y => x + y } }

res4: Option[Int] = Some(3)


Monads

This is actually what makes up a Monad: it's an Endofunctor with two Natural Transformations called unit and join (the latter is called flatten in Scala). Monads are such a fundamental concept that Scala features syntactic sugar for them, the for-comprehension. Do NOT confuse this with a mere for-loop; this is a different animal. The last example above can be written in the following way (a small sketch of the whole structure follows after the example):


scala> for {
     |   x <- Some(1)
     |   y <- Some(2)
     | } yield x + y

res5: Option[Int] = Some(3)



//translates to



scala> Some(1) flatMap { x => Some(2) map { y => x + y } }

res6: Option[Int] = Some(3)
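Here is the promised sketch of the whole structure: an endofunctor (map) plus the two Natural Transformations unit and join, with flatMap derived from them. The trait and its names are my own illustration, not a standard library API:

trait Monad[F[_]] {
  def unit[A](a: A): F[A]                  // A => F[A]
  def map[A, B](fa: F[A])(f: A => B): F[B] // the functor part
  def join[A](ffa: F[F[A]]): F[A]          // "flatten"
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B] = join(map(fa)(f))
}

object OptionMonad extends Monad[Option] {
  def unit[A](a: A): Option[A] = Some(a)
  def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa map f
  def join[A](ffa: Option[Option[A]]): Option[A] = ffa getOrElse None
}

With that in place, OptionMonad.flatMap(Some(1))(x => Some(x + 1)) yields Some(2), just like the flatMap calls above.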



Conclusion

This is an introduction to how to think about monads and how to derive them from simple building blocks. They are a very powerful abstraction that pops up all the time; you use them constantly without knowing it. Since programming is all about abstraction, it's useful to recognize those underlying patterns. Don't let others fool you into believing that this is just some bizarre functional stuff that has no use in the real world. This is just the tip of the iceberg; an amazing journey lies ahead. Let's learn!

Saturday, July 9, 2011

DTrace and its impact on JVM performance

So, I did this blog post that gave a very shallow introduction to what DTrace can do with the JVM, and I got amazing feedback after that. Actually I'm quite flattered that it had such an impact.

One message I received pointed out that DTrace probes have a significant impact on the JVM's performance, that he (the author) would never recommend using the DTrace JVM integration, and then he pointed me to… well, a product by his own company (which shall remain nameless, since I don't want to advertise commercial products on this blog). To be fair: I admit that I would do the same. When you put a lot of effort into a piece of software you do it for reasons you believe in.

However, he gave me a link to some benchmarks that showed some impressive results and diagrams about how much the performance of the JVM suffers when DTrace probes are enabled. The bar diagrams were scary; DTrace looked pretty bad compared to the "rival" technology (you can't really call it a rival since DTrace has a completely different objective). But something was fishy about this: the axes of the diagrams were not labeled. They showed a small blue bar and a big green bar and nothing else. The code for the test case was provided as a GIF (no copy and paste to reproduce the results nice and easy). The numbers were not put into any perspective. And blog comments were disabled.

Nonetheless, this was interesting enough to start a little bit of research on the topic.

The benchmarks seemed to focus on how much enabled probes slow down method calls. I personally don't like benchmarks that use extremely unrealistic cases as a foundation (look how fast I can increment them integers in a tight loop suckaz! Mah language iz teh fast!), but this time I will just go with the flow because this is pretty much what they did in that benchmark (don't try this at home, kids; use realistic conditions to benchmark your stuff). I'm not using the same test code here but the idea seems to be pretty much the same.

The system I'm running this on is a Thinkpad X201 with a Core i7 and 8 GB of RAM (yeah I know, I'm just compensating, get over it ;)). The operating system is OpenIndiana build 151-beta with DTrace 1.7. Java is at 1.6.0_25.

The stupid test case I will be using is this Java program:

class Test {

    public int callTest(int i) {
        if (i != 0)
            callTest(i - 1);

        return i;
    }

    public static void main(String[] args) {
        Test t = new Test();

        long starttime = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++)
            t.callTest(100);
        long endtime = System.currentTimeMillis();

        System.out.println(endtime - starttime);
    }
}

Once again, this is not a realistic example. I will just use it to demonstrate that there really is an impact.

> java Test
118
> java -XX:+ExtendedDTraceProbes Test
4106

Wow, this really seems to hurt… the program is about 35 times slower with DTrace probes enabled.

To put this into perspective I will commit a capital crime in programming: I will compare this stupid program in language A to the same stupid program written in language B. Let language B be Python in this case. To be crystal clear about this: this is NOT a comparison of Python and Java performance. I just need some landmarks to make all those numbers meaningful (at least to some extent, since this is a very badly chosen scenario to begin with).

Here is our Python test case.

import time

def callTest(i):
    if not i == 0:
        callTest(i - 1)

    return i

r = range(0, 1000000)

starttime = time.time()

for i in r:
    callTest(100)

endtime = time.time()
print (endtime - starttime)

And the result:

> python test.py
21.9892270565


OK, our software runs slower with probes enabled, but we are still faster than Python, and Python's performance is acceptable for a lot of use cases. We now have a slower JVM that can be instrumented. So I'd say: no real harm done.

Now let's use those probes and aggregate some real data. For this test I will use a slightly modified version of j_methodcalls.d, a script by Brendan Gregg that is shipped with the DTrace Toolkit. The script is licensed under CDDL but I did remove the license header here to make it more concise and blog-friendly.

#!/usr/sbin/dtrace -Zs

hotspot$target:::method-entry
{
    this->class = (char *)copyin(arg1, arg2 + 1);
    this->class[arg2] = '\0';
    this->method = (char *)copyin(arg3, arg4 + 1);
    this->method[arg4] = '\0';
    this->name = strjoin(strjoin(stringof(this->class), "."),
        stringof(this->method));
    @calls[pid, this->name] = count();
}

Let's run it!

> pfexec ./j_methodcalls.d -c "java -XX:+ExtendedDTraceProbes Test"
[…snip…]
249126

OK, this is A LOT slower, even slower than Python. 4 Minutes!

We are now aggregating data, and that means we are copying data from userspace into kernelspace, from where it can be fetched by DTrace consumers (in our case the dtrace command line tool). What data did we get in this amount of time? Actually: A LOT. It flooded my terminal and I had to pipe the result into less to be able to read all of it. DTrace recorded every method call that happened in the JVM while the program was running. It counted the calls per method, and the class the method belongs to. Keep in mind that we did not need to modify our code to get these results. We don't even need to restart a running JVM to enable the probes; we can activate them by using the jinfo command. And we could have used DTrace to gather system-wide data in the same script (something I might demonstrate sometime on this blog).
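For reference, enabling the probes on an already running JVM looks like this (the same invocation I use in the DTrace and Scala post; pgrep is just one way to get the PID):

jinfo -flag +ExtendedDTraceProbes $(pgrep java)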

Now let's use the most naive debugging technique on earth: we will print out "CALLED!" every time our callTest method gets called (if you ever did this before you know how disastrous the result will be). This gives us pretty much no information; we just know that a particular method has been called, and we need to modify our code, recompile it and load it into the running JVM.

> time java Test
[…snip…]
CALLED!
5958514

real 1h39m18.15s
user 3m53.29s
sys 7m55.68s

As we expected, the result is a disaster. Calling print in a tight loop is an extremely stupid thing to do. We could have used a counter that gets incremented with every method call, proxy objects, interceptors etc. (all of them would have been significantly faster).

To do something similar to the print example with DTrace I just add another clause to the script:

tick-1s {
    printa(@calls);
    trunc(@calls);
}

This addition prints out what happened, in 1-second intervals:

1 75779 :tick-1s
4028 Test.callTest 400379

1 75779 :tick-1s
4028 Test.callTest 404720

1 75779 :tick-1s
4028 Test.callTest 402135

1 75779 :tick-1s
4028 Test.callTest 398934

253064
dtrace: pid 4028 has exited


real 4m14.23s
user 4m13.89s
sys 0m0.46s

The performance impact stays pretty much the same with DTrace: we are done in 4 minutes, while being presented with a readable stream of information.

There are a lot of ways to generate similar data, but most of them require code changes, are not able to do system wide tracing, are limited to one process and/or just one specific runtime.

Conclusion

Tracing the JVM costs (this shows especially in this pathological use case), but DTrace provides us with a very broad spectrum of probes. The JVM ones are just one source of data. We can actually instrument every part of the system with our DTrace script. Maybe a problem is not even related to our program at all, maybe it's NFS misbehaving, something is wrong with the database or there is some heavy IO going on. With DTrace the whole system becomes transparent. This changes the whole "blame game" and that's the whole point of DTrace. Looking at the system as a whole.

The bottom line is: trace the JVM only if you need to and be aware of the performance impact. This tool is for hunting down problems that are very hard or even impossible to analyze with traditional tools. I did use it to trace obscure memory leaks and dead-locks (both in non-Java contexts) and I was able to exactly pinpoint the culprit.

Don't use DTrace when there is a tool that does a better job for this specific task. Use it wisely. Anyway, it's a great utility to have in your toolbox.

Last but not least: use realistic use cases for benchmarking, label your diagram axes, and compare tools that have the same objective.

Wednesday, July 6, 2011

DTrace and Scala

DTrace is one of my favorite technologies ever, it absolutely redefined my view on software and how it can be debugged. In a lot of situations you are basically forced to debug your software by vomiting print-statements just all over the place. To make things worse you have to take them out when you are actually shipping your software, leaving you blind to errors to come. In other cases you probably are missing some print-statements to trace down a very nasty bug, or the bug is hidden in some 3rd-party lib that would take hours to recompile with your new print statement (don't start thinking about libs that you don't have the code of…). In most cases this can become a very unpleasant situation (to put it mildly). DTrace takes a lot of the hassle away by providing you with a mechanism that can modify your program while it's running without stopping it.
Pretty amazing, huh? Wait, it gets better. You can even trace into the runtime of some high level languages (if they provide the probes). This is also true for the JVM, and that means we can instrument a running Scala program.

In this post I will walk you through a very basic script that shows which methods from Predef get called by your program. DTrace is a tool that reaches deep down into the lower levels of your machine. Sounds scary? Nah, not really: one of the design goals of DTrace is safety. Your scripts run in a managed environment that keeps them from doing harmful things.

OK, enough sweet-talk. Time to get our hands dirty by writing a very simple Scala program:

import scala.annotation.tailrec

object Main {
  @tailrec
  def sayHallo() {
    println("HALLO!")
    Thread.sleep(1000)
    sayHallo()
  }

  def main(args: Array[String]) {
    sayHallo()
  }
}

This just prints out "HALLO!" in 1 second intervals (not exactly rocket science, but I put a little sugar on top of it by replacing the while loop with a tail recursive function for fun and profit).

What's that? When running my program DTrace is not showing me ANY probes!!?!?! FFFFUUUUUUUU! That's because we have to enable them first; we can instruct a running JVM to do this by using jinfo. Since I've only got one JVM running on this box I will fetch the PID with pgrep.

jinfo -flag +ExtendedDTraceProbes $(pgrep java)

The JVM probes are now armed and "dangerous" (just kiddin') and you will have access to the hotspot provider.

Now let's write the DTrace script. Keep in mind: this script runs in kernel space, so we have to copy in some information from userspace. We do this by using copyin, and we have to NULL-terminate the strings ourselves. Yep, this is what it feels like to program low-level stuff; it's not as pretty as FP, but aaaaaaaaaanyway, here is the little bugger.

#!/usr/sbin/dtrace -s

#pragma D option quiet

hotspot$$target:::method-entry
{
    this->class = (char *) copyin(arg1, arg2 + 1);
    this->class[arg2] = '\0';
    self->tracefunc = stringof(this->class);
}

hotspot$$target:::method-entry
/self->tracefunc == "scala/Predef$"/
{
    this->method = (char *) copyin(arg3, arg4 + 1);
    this->method[arg4] = '\0';
    printf("%s %Y\n", stringof(this->method), walltimestamp);
    self->tracefunc = 0;
}

hotspot$$target:::method-entry
/self->tracefunc/
{
    self->tracefunc = 0;
}

This thing will fire whenever a function from Predef is called and will give us the function name (in our test case this is just println) and the time when this function was called. I run this on OpenIndiana build151-beta by issuing pfexec dtrace -s ./tracescript.d -p $(pgrep java) after enabling the hotspot provider on the JVM. (pfexec is kind of like sudo; just use whatever gives you the permission to run dtrace on your box.)
The output will look like this:

println 2011 Jul 6 21:27:34
println 2011 Jul 6 21:27:35
println 2011 Jul 6 21:27:36
println 2011 Jul 6 21:27:37
println 2011 Jul 6 21:27:38
println 2011 Jul 6 21:27:39
println 2011 Jul 6 21:27:40
println 2011 Jul 6 21:27:41
println 2011 Jul 6 21:27:42
println 2011 Jul 6 21:27:43
println 2011 Jul 6 21:27:44
println 2011 Jul 6 21:27:45
println 2011 Jul 6 21:27:46
println 2011 Jul 6 21:27:47
println 2011 Jul 6 21:27:48
println 2011 Jul 6 21:27:49
println 2011 Jul 6 21:27:50
println 2011 Jul 6 21:27:51
println 2011 Jul 6 21:27:52
println 2011 Jul 6 21:27:53
println 2011 Jul 6 21:27:54
println 2011 Jul 6 21:27:55


WTF IS THIS I DON'T EVEN!?!?!? TL;DR

OK, this is not even the tip of the iceberg, but I think I will wrap it up because there is a lot of ground to cover when it comes to DTrace. If you are hungry for more you should check out the "DTrace Review" by @bcantrill (this stuff will blow your mind, seriously, WATCH IT!) or buy the book by @brendangregg. I will make sure to dig deeper into the topic, so stay tuned. Tell me what you think: good tool or best tool? :P

English, please!

It took me quite some time to make up my mind about this… should I switch the language of this blog to English or should I leave it just the way it is? I think it's worth giving this a try, since Twitter has brought me a mostly non-German audience which I would also like to address. This is also a great opportunity to polish my language skills a little, since they have been kind of neglected over the last few years (so bear with me ;)

So, to all my new readers, here is a little overview of what this blog is about. I work as a developer at a German software company, so quite a lot of the stuff you will find here will focus on programming and debugging. In my free time I like to play around with different operating systems (mainly Illumos, FreeBSD and DragonFlyBSD), so this place will be stuffed with information about OS-specific wizardry :3

OK, folks! Let's give this a shot!

Wednesday, February 23, 2011

Dynamic Dispatch in Scala

Every now and then I get really envious of Clojure's multimethods, but I don't really want to leave my statically typed world (even though Clojure would probably be the one language I would pick out of all the dynamic ones).

Long story short: I want multimethods in Scala! At first glance this sounds like bending a language into something it simply isn't, but in Scala the whole thing is surprisingly painless.

For dispatching I simply use a couple of partial functions that can be chained together arbitrarily with orElse. The advantage is that I don't have to crack open existing code when I want to add another case; composition gets me there.

trait Dynamic

class A extends Dynamic
class B extends Dynamic

object Dynamic {
  // two dynamic arguments
  type Dyn2 = (Dynamic, Dynamic)

  // infix notation for PartialFunction
  type ~>[A, B] = PartialFunction[A, B]

  val f: Dyn2 ~> String = {
    case (a: A, b: A) => "A A"
  }

  val g: Dyn2 ~> String = {
    case (a: A, b: B) => "A B"
  }

  // d is the dispatch function
  def dispatch[A, B](d: A ~> B, a: A): Option[B] =
    if (d.isDefinedAt(a)) Some(d(a))
    else None
}

In action the whole thing looks like this:


scala> :l ./dynamic.scala
Loading ./dynamic.scala...
defined trait Dynamic
defined class A
defined class B
defined module Dynamic

scala> import Dynamic._
import Dynamic._

scala> dispatch(f, (new A, new A))
res0: Option[String] = Some(A A)

scala> dispatch(f, (new A, new B))
res1: Option[String] = None

scala> dispatch(f orElse g, (new A, new B))
res2: Option[String] = Some(A B)

As you can nicely see, the cases that aren't covered are handled as well because we simply use an Option; that also means we could use our dispatch inside a for-comprehension (monads for the win!).
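For instance, something like this should work (a quick sketch of my own that uses the f, g and dispatch defined above):

import Dynamic._

val result: Option[String] =
  for {
    x <- dispatch(f orElse g, (new A, new B))
    y <- dispatch(f, (new A, new A))
  } yield x + " / " + y
// result == Some("A B / A A")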

Have fun playing around with it ;)

Here's a little addendum on what multimethods are good for. For everyone who has always hated the visitor pattern ;)

Monday, February 21, 2011

Scala: Better Pattern Matching with Sealed Classes

The other day I stumbled upon a rather interesting mail in the jvm-languages Google group. A comparison of Scala and OCaml! How exciting!
I found the following example particularly interesting:

# let foo = function
| true, _ -> 0
| _, false -> 1;;
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
(false, true)
val foo : bool * bool -> int = <fun>

OCaml emits a warning at compile time here: the pattern matching is not exhaustive. On top of that you get an example of a value that is not matched. Great stuff! The Scala version looks rather sobering in comparison.

scala> def foo(b: (Boolean, Boolean)): Int = b match {
| case (true, _) => 0
| case (_, false) => 0
| }
foo: ((Boolean, Boolean))Int

scala> foo(false, true)
scala.MatchError: (false,true)
at .foo(<console>:4)
at .<init>(<console>:5)
at .<clinit>(<console>)
at java.lang.Class.initializeClass(libgcj.so.81)
at RequestResult$.<init>(<console>:3)
at RequestResult$.<clinit>(<console>)
at java.lang.Class.initializeClass(libgcj.so.81)
at RequestResult$result()
at java.lang.reflect.Method.invoke(libgcj.so.81)
...

Apparently Scala isn't able to figure out which cases are possible here and therefore can't check whether the match really covers all possibilities. But exactly this effect can be achieved with sealed classes.

As an example I define my own Bool type with one subtype each for true and false (I use objects here since we never need more than one instance of True or False). The same example now fails with a warning similar to OCaml's:

abstract sealed class MyBool
case object True extends MyBool
case object False extends MyBool

object MyBool {
  def test(a: (MyBool, MyBool)): Unit = a match {
    case (True, _) => ()
    case (_, False) => ()
  }
}

MyBool.scala:6: warning: match is not exhaustive!
missing combination False True

def test(a: (MyBool, MyBool)): Unit = a match {
^
one warning found

Making the class sealed means we can't add further subtypes outside of our source file, so the compiler now has all the information it needs to check the match.
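And to actually make the warning go away, we can simply cover the combination the compiler complained about (a small sketch, not part of the original mail):

object MyBool {
  def test(a: (MyBool, MyBool)): Unit = a match {
    case (True, _)     => ()
    case (_, False)    => ()
    case (False, True) => () // the missing combination from the warning
  }
}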

More on this topic can, once again, be found over at Mario Gleichmann's blog.

Monday, January 31, 2011

Scala: Fun with Streams

I was recently asked whether I could blog a bit about streams in Scala (even though that question was more about IO). Since I was lacking a good example, I put it off for a while (somehow I think I'm blogging a little too rarely at the moment :/).

But OK, thanks to an excellent post by Mario Gleichmann I finally came up with a quite usable idea. (If only for the sake of completeness I recommend reading Mario's post!)

The following code is simply about determining the season that follows a given one.

sealed abstract class Season
case object Spring extends Season
case object Summer extends Season
case object Fall extends Season
case object Winter extends Season

object Main {
  lazy val seasons: Stream[Season] =
    Spring #:: Summer #:: Fall #:: Winter #:: seasons

  def nextSeason(now: Season): Season = {
    @scala.annotation.tailrec
    def getNextFromStream(s: Stream[Season]): Season =
      s match {
        case a @ x #:: y #:: _ =>
          if (now eq x)
            y
          else
            getNextFromStream(a.tail)
      }
    getNextFromStream(seasons)
  }

  def main(args: Array[String]): Unit = {
    println(nextSeason(Spring))
    println(nextSeason(Summer))
    println(nextSeason(Fall))
    println(nextSeason(Winter))
  }
}

To model the seasons I use a Stream here (i.e. an infinite list) that I walk through recursively until I've found the current season, and then I simply return the next one.

Dieses Beispiel wirkt zugegebenermaßen etwas wie mit Kanonen auf Spatzen zu schiessen, zeigt aber die Ausdrucksstärke von Streams und Lazy-Evaluation. Ein sehr realer Anwendungsfall ist hier z.B. ein Roundrobin-Verfahren. Anstelle also stumpf eine Liste durchzugehen und am Ende wieder an den Anfang springen in Zukunft einfach mal über einen Stream nachdenken ;)