Friday, September 7, 2007

Lesson 2: Input and Output, Variable Binding, and more

Overview


Today we will be examining this simple program:
module Main where

main =
   do putStrLn "What is your name?"
      name <- getLine
      putStrLn ("Hello, " ++ name ++ ". I think you will really like Haskell!")

Copy this code into a file named HelloYou.hs, and then run it in GHCi (C-c C-l, and then run the main function), or compile it and run it from a shell (ghc --make -O2 HelloYou.hs -o helloYou, and then ./helloYou).

Files containing Haskell source code will almost always end with the extension .hs. You should follow this convention as well, because the compiler expects it.

do notation


The first new thing we see is the do keyword. In this context, the do keyword indicates that we want to perform several IO (input/output) actions in a row.

Notice how each action is indented the same amount. In Haskell, the layout of the code is significant. The do statement automatically ends when a line that is indented less is encountered.

For example, in this code:

main =
   do putStrLn "What is your name?"
      name <- getLine
      putStrLn ("Hello, " ++ name ++ ". I think you will really like Haskell!")

cheese = "cheddar"

Because cheese = "cheddar" starts at the first column, it is not part of the previous do statement. The blank line before cheese is ignored.

If a line is indented more, then it is considered to be a continuation of the previous line. For example, we can reformat our program so that the third action is split across two lines:

main =
   do putStrLn "What is your name?"
      name <- getLine
      putStrLn ("Hello, " ++ name ++
                ". I think you will really like Haskell!")

The meaning is not changed at all.
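Under the hood, the layout rule is shorthand for explicit braces and semicolons: the compiler inserts them based on indentation. As a sketch, here is the same program with the braces and semicolons written out by hand; inside explicit braces the indentation should no longer matter to the compiler. The greeting helper is a hypothetical name, split out only so the text can be checked on its own.

```haskell
module Main where

-- The lesson's program with the braces and semicolons that the layout
-- rule would normally insert for us written out explicitly.
main :: IO ()
main = do { putStrLn "What is your name?"
          ; name <- getLine
          ; putStrLn (greeting name)
          }

-- Hypothetical helper: builds the same reply as the lesson's program.
greeting :: String -> String
greeting name = "Hello, " ++ name ++ ". I think you will really like Haskell!"
```

Most Haskell code relies on layout instead, but knowing the braces are there makes indentation errors easier to understand.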

Variable Binding


The next new thing we see is this line:

name <- getLine

getLine is an IO action that reads a line of input from stdin. The <- operator binds the variable name to the value read by getLine.
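To see that each <- simply puts a new label on a value we can use later, here is a small variation that reads two lines and refers to both of them afterwards (the first one twice). The names first, lastName and reply are hypothetical, chosen for this sketch; lastName avoids clashing with the Prelude function last.

```haskell
module Main where

-- Hypothetical helper: builds the reply from the two labelled values.
reply :: String -> String -> String
reply first lastName =
   "Hello, " ++ first ++ " " ++ lastName ++ ". Nice to meet you, " ++ first ++ "!"

main :: IO ()
main =
   do putStrLn "What is your first name?"
      first <- getLine          -- label the first box "first"
      putStrLn "What is your last name?"
      lastName <- getLine       -- label the second box "lastName"
      putStrLn (reply first lastName)
```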

What Does bind Mean?

You can imagine that the value returned by getLine is a cardboard box with a String inside. Binding name to the value is like putting a label on the cardboard box. This makes it easy to refer to that value later, because we can just use the variable name.

Not Like Variables You Have Seen Before


If you have used other programming languages, you are probably familiar with a different kind of variable. For example, in C, you would declare a variable i and assign it the value 9 like this:

int i;
i = 9;

This is called a destructive update, and it is different from binding a variable. If we think about the box analogy, the statement int i; creates a new cardboard box with the label i already attached to it. The statement i = 9; opens up the box, destroys the current contents, and then puts 9 in the box.

This difference is pretty substantial -- just imagine how you would feel if you were still using the old contents of the box! We will cover this concept more in a few lessons, and see why the difference is so exciting.
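We will get to the details later, but here is a tiny sketch of what "changing" a value looks like with bindings: instead of opening the box labelled i and overwriting it, we make a new box with a new label. The names boxes, i and j are just for illustration.

```haskell
-- Instead of destructively updating i, we bind a new name j to a new value.
-- The box labelled i still holds 9 afterwards.
boxes :: (Int, Int)
boxes =
   let i = 9
       j = i + 1   -- a brand new box; i is untouched
   in (i, j)

main :: IO ()
main = print boxes   -- prints (9,10)
```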

The ++ operator


The next new thing we see is the ++ operator. ++ concatenates two lists together and returns a new list. In Haskell, a String is just a list of characters.
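Because a String is a list of Char, the same ++ works on Strings and on any other kind of list. A short sketch (the names hello, sameString and numbers are only for illustration):

```haskell
-- ++ on Strings
hello :: String
hello = "Hello, " ++ "world"

-- A list of characters written with list syntax is the same thing as a String.
sameString :: Bool
sameString = ['H', 'i'] == "Hi"

-- ++ on a list of numbers
numbers :: [Int]
numbers = [1, 2] ++ [3, 4]

main :: IO ()
main = do putStrLn hello      -- Hello, world
          print sameString    -- True
          print numbers       -- [1,2,3,4]
```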

Where did those () come from?


In the first lesson, we noted that Haskell does not require you to use parentheses when calling a function; but now we have some parentheses, so what's the deal? The parentheses in Haskell are used to group operations, in the same way you would in math.

So in this line:

putStrLn ("Hello, " ++ name ++ ". I think you will really like Haskell!")

The parentheses indicate that we want to concatenate the strings first, and then apply putStrLn to the new string. If we did not use parentheses, the compiler would add implicit parentheses like this:

(putStrLn "Hello, ") ++ (name ++ ". I think you will really like Haskell!")

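which does not make any sense; it says we want to print the string "Hello, " and then append the String (name ++ ". I think you will really like Haskell!") to the value returned by putStrLn. putStrLn returns an IO action, not a String, so the compiler rejects this version with a type error.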
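Since putStrLn returns an IO action, that mistake is a compile error, so we cannot run it. To watch the grouping rule change a result, here is a sketch with a function that returns a String. shout is a hypothetical helper built from map and Data.Char's toUpper; the point is only that function application binds tighter than ++.

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: upper-cases every character of a String.
shout :: String -> String
shout s = map toUpper s

-- Function application binds tighter than ++, so this means
-- (shout "hello, ") ++ "world".
withoutParens :: String
withoutParens = shout "hello, " ++ "world"

-- The parentheses make ++ happen first; shout is applied to the result.
withParens :: String
withParens = shout ("hello, " ++ "world")

main :: IO ()
main = do putStrLn withoutParens   -- HELLO, world
          putStrLn withParens      -- HELLO, WORLD
```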

No Variable Declarations, But Still Safe


In our program, we use a variable called name. You may have noticed that we did not declare that variable before we started using it. That saves us some typing, which is nice, but what happens if we make a typo and spell it nmae? Let's load this code into GHCi and see:

module Main where

main =
   do putStrLn "What is your name?"
      name <- getLine
      putStrLn ("Hello, " ++ nmae ++ ". I think you will really like Haskell!")

GHCi reports the following error:

*Main> :load "/root/n-heptane/docs/haskell-lessons/HelloYou.hs"
[1 of 1] Compiling Main ( /root/n-heptane/docs/haskell-lessons/HelloYou.hs, interpreted )

/root/n-heptane/docs/haskell-lessons/HelloYou.hs:6:30:
Not in scope: `nmae'
Failed, modules loaded: none.
Prelude>

Nice! GHC tells us that the identifier `nmae' at line 6, column 30, is not defined. Notice that we have not tried to run the code yet; the bug was detected at compile time. So, we get the best of both worlds: we don't have to tell the compiler about our variables before we use them, but the compiler can still tell us if we accidentally use an undefined variable.

Cool Stuff We Learned Today


Today's code example was pretty simple, but we managed to avoid four extremely common bugs that have been responsible for thousands of security holes and program crashes.

Safe from Buffer Overflows

The first two bugs we avoid are buffer overflows. Buffer overflows are an extremely common source of security holes and program crashes. Buffer overflows occur when a string is too big to fit in the space allocated for it, or when some code thinks a string is longer than it really is, and tries to read characters beyond the end of the string.

Two common places to encounter buffer overflows are when you are reading input and receive more input than you expected, or when you are concatenating strings and don't allocate enough space, or you accidentally copy too much data.

In our code sample of the day, we read input with getLine and concatenate Strings with ++. But, we never had to worry about how long the Strings were, it was all handled automatically for us.
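As a small illustration, here is a sketch that builds a 100,000-character String (an arbitrary size picked for the example) and joins it to other Strings with ++. Nowhere do we choose a buffer size or compute how much space the result needs.

```haskell
-- A very long String, built without ever choosing a buffer size.
longName :: String
longName = replicate 100000 'x'

-- ++ produces a result as long as it needs to be.
greeting :: String
greeting = "Hello, " ++ longName ++ "!"

main :: IO ()
main = print (length greeting)   -- prints 100008 (7 + 100000 + 1)
```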

Safety from buffer overflows is nice, but does not really set Haskell apart. Many other languages, such as Java, Python, Perl, Ruby, etc., are also safe from buffer overflows in this way (as far as I know). So let's look at the next two bugs we managed to avoid, which are a bit more interesting.

Safe from Uninitialized Variables

We saw earlier that when we misspelled name as nmae, the compiler caught our mistake. It noticed that we were trying to use the variable nmae, but nmae had not been bound to anything yet.

This is really nice! In many languages that do not require you to declare your variables in advance, you would not notice this bug until you ran the program. In some cases the program would die when it tried to use nmae. In other cases, it would just assume that nmae was equal to the empty string "", so you might not even notice the problem!

Even in languages where you have to declare your variables in advance, you are still susceptible to a similar bug. Consider the following C code:

#include <stdio.h>

int main(void)
{
    int a;
    int b;
    int c;

    b = 1;
    c = a + b;   /* bug: a was never given a value */
    printf("%d\n", c);
    return 0;
}

Even though we declared our variables a, b, and c, I forgot to assign a value to a. This means that when we try to use a to calculate c, we have absolutely no idea what will happen. On my system c was equal to -1077263575 the first time I ran it, and -1080545591 the second time.

Fortunately, Haskell goes the extra step and requires that all the variables we use are actually bound to a value, so we will not be seeing any of those nasty behaviors.
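Here is a sketch of the C example translated to Haskell. The value 0 for a is an arbitrary choice of mine, since the C code never says what a should be. If you delete the line a = 0, GHC refuses to compile the program and reports that a is not in scope, instead of running with whatever garbage happened to be in memory.

```haskell
-- Every name must be bound to a value before it can be used.
-- Removing the "a = 0" line turns this into a compile-time error.
c :: Int
c = let a = 0
        b = 1
    in a + b

main :: IO ()
main = print c   -- prints 1
```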
How It Looks Is Meaningful

We saw in this lesson that the formatting of the code is important to the compiler. This is nice because the code looks pretty, and we don't have to type in lots of extra characters like (){};. But it also helps us avoid bugs. Almost all programmers try to format their code so that you can understand the flow of the code by the way it looks. But, in most languages, the compiler does not care how the code looks. Consider the following C code:

if (something)
    doSomething();
    doSomethingElse();

doSomethingToo();

Even though doSomethingElse() looks like it is inside the if statement, it is not. The compiler reads it like this:

if (something)
    doSomething();

doSomethingElse();
doSomethingToo();

Ouch! Since Haskell does care about the formatting, you are far less likely to see things differently than the compiler does.

Summary


Well, I think I went over my time again, but there was a lot of new material, even though today's program was only two lines longer than yesterday's. Next time we will learn about the case statement. This will allow us to do different things depending on what the user enters.

You don't need to memorize or perfectly understand everything in this lesson. We will be exploring these concepts more in the upcoming lessons, which should help you to remember and understand them.

emacs corner: TAB indenting



In the previous lesson, we saw that using emacs makes it easy to load programs into GHCi. You probably also noticed that emacs colored the source code for you. In this lesson, we learned that the indentation of each line is significant. emacs can help here too. If you copied and pasted the code into emacs, try typing it in by hand instead. When you need to indent a line, press the TAB key a couple of times in a row. You will see that emacs cycles through the different possibilities. Personally, I find this feature extremely useful.

14 comments:

Anonymous said...

Great blog!

I'm eager to read lesson3.

Unknown said...

I'm trying to run this in Textmate (with the Haskell-bundle). It worked great for lesson 1, but here I get the following output:

What is your name?
*** Exception: <stdin>: hGetChar: end of file

Any idea on how to fix that? Is it a Textmate-problem that it can't read from STDIN?

Great lesson so far by the way!

Jeremy Shaw said...

martijn:

It sounds to me like it is indeed a problem with textmate not being able to read stdin. I don't know anything about Textmate, but hopefully someone who does can help you.

Unknown said...

Thank you for the tutorial, it is quite enlightening for a noob like me.

However, I can't seem to be able to compile the code in the example. GHCi spits out an error on line 4:

do putStrLn "What is your name?"

No matter what I do (different indentations), I always get a "parse error in pattern" message from the compiler. (I'm running emacs 23 in Ubuntu Feisty).

Any help will be greatly appreciated.

Jeremy Shaw said...

Enrique:

You should be able to just copy and paste the code directly from your web browser into emacs and run it.

The error you are seeing could be due to something being wrong in the previous line. For example, if you did not put an equals sign after main you would get the error:

/tmp/Test.hs:4:4: parse error on input `do'

The problem could also be in the next line. If I indent the next line name <- getLine too much, then I get the error:

/tmp/Test.hs:4:9: Parse error in pattern.

This is because indenting the next line too much, makes it a continuation of the previous line. So, the compiler reads it as if you had written a single line like,

putStrLn "What is your name?" name <- getLine

(Ugh, that might show up as two lines due to the column width of the blog. If so, pretend it is just one line).

The compiler reports the line number where it ran into the symptoms. But, sometimes the problem may be somewhere else. Since the compiler does not know what you are trying to do, it can only report the symptoms. With a little experience, you will be able to find the cause quickly.

In these lessons, I am trying to show lots of errors and what they mean, since that is often the hardest part of learning a new language. So, thanks for asking!

If you still have difficulties, please email your code to me. jeremy at n-heptane.com.

Unknown said...

Jeremy,

Thank you for the detailed response. Indeed, the parse errors I was getting were due to the fact that I was not properly indenting the function block after the "do" statement.

Beginners take note of the way the lines that start with 'putStrLn', 'name' and 'putStrLn' again are indented in relation to the 'do' statement. Unfortunately emacs' haskell mode has a tendency to over-indent, and when I tried to correct that behaviour, I ended up under-indenting, rather.

Once again thank you, Jeremy, for sharing.

Tom said...

In one of the examples: "main =
do putStrLn "What is your name?"
name <- getLine
putStrLn ("Hello, " ++ nmae ++ ". I think you will really like Haskell!")"

The last line has a spelling error, nmae > name

Jeremy Shaw said...

Tom,

Thanks -- but, that is the whole point of that example. It demonstrates that when you make a typo like that, the compiler will catch it for you at compile time. Did the writeup not make that clear?

- jeremy

Abhimanyu said...

sir my name is Abhimanyu Sharma. I am a student of Centre for Converging Technologies, India.

I have keen interest in Haskell and just started my work on it. your blog has helped very much. i want to be in your contact to sort out my daily problems related to haskell.
i dont find your email address anywhere. i am sending u my mail please add me in your contacts.
thanks
Abhimanyu

frances said...

i try to get an integer or convert user input to integer below but i keep getting this strange error

getFontSize =
   do
      fontsize <- getLine
      let defaultSize = (40::Integer)
      let size = read(fontsize::Integer)

      {-ERROR: Couldn't match expected type `Integer'
               against inferred type `String'
        In the first argument of `read', namely `(fontsize :: Integer)'
      -}
      if size < 1
         then return defaultSize
         else return size

dont quite know what i'm doing wrong here
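The error comes from where the type annotation sits. (fontsize :: Integer) claims that fontsize itself is an Integer, but getLine gave us a String. The annotation belongs on the result of read instead: read fontsize :: Integer. Here is a sketch of that fix; parseFontSize is a hypothetical helper split out so the logic can be checked without typing input. Note that read still fails at run time if the input is not a number.

```haskell
-- Hypothetical helper: turn the user's text into a font size,
-- falling back to 40 for sizes below 1.
parseFontSize :: String -> Integer
parseFontSize input =
   let defaultSize = 40
       size        = read input :: Integer   -- annotate read's result, not its argument
   in if size < 1 then defaultSize else size

getFontSize :: IO Integer
getFontSize =
   do fontsize <- getLine
      return (parseFontSize fontsize)

main :: IO ()
main = do size <- getFontSize
          print size
```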

MCAndre said...

The following code refuses to run. Yes, I use hard tabs. And even with soft tabs, this fails 90% of the time.

module Main where

main = let x = 1
[TAB][TAB]y = 2
[TAB][TAB]z = 3
[TAB]in putStrLn $ "X = " ++ show x ++ "\nY = " ++ show y ++ "\nZ = " ++ show z
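The Haskell report places tab stops 8 columns apart. In this code, x sits in column 12 (right after "main = let "), but two tabs move y and z to column 17. Because y starts to the right of x, the compiler reads it as a continuation of the x = 1 line, which is a parse error. A sketch of a fix is to use spaces only and line every binding up exactly under the first one; summary is a hypothetical name used so the text can be checked separately.

```haskell
module Main where

-- Every binding in the let block starts in the same column as x.
summary :: String
summary = let x = 1
              y = 2
              z = 3
          in "X = " ++ show x ++ "\nY = " ++ show y ++ "\nZ = " ++ show z

main :: IO ()
main = putStrLn summary
```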

Anonymous said...

I have copied the lesson sample but I got this error

Prelude> :load "testtwo.hs"
[1 of 1] Compiling Main ( testtwo.hs, interpreted )

testtwo.hs:3:5: parse error on input `<-'
Failed, modules loaded: none.

did i miss something?

Jeremy Shaw said...

@Anonymous

The whitespace in the source code is meaningful. Based on your error message, I believe that in your source code the line,

name <- getLine

is all the way to the left of the line (i.e., the first column). The word 'name' must be lined up directly under 'putStrLn' in the previous line.

With name in the first column, the compiler thinks you are trying to define a new function called 'name' and that is why it does not understand what to do with the <-.

Hope this helps!

Paula said...

Thank you sweetheart, people like you make the world a better place ;*