Sci-Fi FilmRank

Sweet circumstance! The huge positronic brain that I built in my cellar to compute the algorithmic movie ranking FilmRank is still warm. So when I saw this list of

how could I resist submitting the question to the electronic abomination? After all, left to its own devices, the machine might otherwise become sentient in its idle time and decide to take over the Internet. And then. The. World.

“Can you give me the 25 most important sci-fi movies, ranked by FilmRank?”

There was a moment’s expectant pause whilst panels slowly came to life on the front of the console. Lights flashed on and off experimentally and settled down into a businesslike pattern. A soft low hum came from the communication channel.

“Difficult. What is this Sci-Fi of which you speak?”

To decide what is Sci-Fi and what isn’t, I again fell back on IMDB, namely on the genre labelling. Now, many movies list dozens of genres, so I arbitrarily chose to only recognise the two or three that make it to the front page.
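In code, the filter is trivial. Here is a minimal sketch of the idea, with a made-up toy dataset standing in for the real IMDB dump (the data layout and the `is_scifi` helper are my invention, not IMDB’s):

```python
# Toy stand-in for the IMDB data: each movie maps to its genre list
# in front-page order. Titles and genre orders are illustrative only.
movies = {
    "Star Wars (1977)": ["Action", "Adventure", "Sci-Fi"],
    "The Rocky Horror Picture Show (1975)": ["Comedy", "Musical"],
}

def is_scifi(title, top=3):
    """Count a movie as Sci-Fi only if the label appears among
    its first `top` genres."""
    return "Sci-Fi" in movies[title][:top]

scifi = [t for t in movies if is_scifi(t)]  # keeps Star Wars, drops Rocky Horror
```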

This leads to some decisions that I disagree with: For example, the hive-mind that is IMDB users doesn’t classify Der Golem, wie er in die Welt kam (1920) as Sci-Fi, so it doesn’t make the list. On the other hand, Superman (1978) is in there. Even more unbearably, The Rocky Horror Picture Show would have made a respectable 24th place, had it not been classified as Comedy/Musical.

But of course, the conceit of this whole project is to minimise my influence, so the list is what it is. I defer completely to my algorithmic overlord, and the human life-forms it has absorbed into its computational matrix.

So here it is, presented with infinite majesty and calm. Those movies that appear on both the FilmRank list and on io9’s I have marked with a star.

  1. Star Wars (1977)
  2. 2001: A Space Odyssey (1968) [*]
  3. Metropolis (1927) [*]
  4. Frankenstein (1931)
  5. A Clockwork Orange (1971)
  6. The Terminator (1984)
  7. E.T.: The Extra-Terrestrial (1982) [*]
  8. Star Wars: Episode V – The Empire Strikes Back (1980) [*]
  9. Alien (1979) [*]
  10. The Matrix (1999) [*]
  11. THX 1138 (1971)
  12. Blade Runner (1982) [*]
  13. Aliens (1986)
  14. Superman (1978)
  15. Back to the Future (1985) [*]
  16. Gojira (1954)
  17. Bride of Frankenstein (1935)
  18. Terminator 2: Judgment Day (1991) [*]
  19. Star Wars: Episode VI – Return of the Jedi (1983)
  20. Planet of the Apes (1968) [*]
  21. Jurassic Park (1993)
  22. The Day the Earth Stood Still (1951) [*]
  23. Close Encounters of the Third Kind (1977)
  24. RoboCop (1987) [*]
  25. Invasion of the Body Snatchers (1956)

Comments? The FilmRank list is a lot more “classical,” which is an artefact of the algorithm. (The damping factor is 0.50.) For example, the spectacularly good District 9 doesn’t have a chance. I note with surprise that Back to the Future appears on both lists, even though it probably wouldn’t make my own. Oh, and FilmRank has Gojira, so it’s full of win. Now, I wonder which parameters I must tweak to get Reptilicus (1961) on the list…

P versus NP versus PR

In August 2010, an enormous public relations opportunity fell into Computer Science’s lap, when Vinay Deolalikar’s attempt to separate P from NP resulted in unprecedented internet interest and media coverage.

I think it was a big success. (The outreach. Not the proof.) There are lots and lots of people who are interested in this stuff, and we should tell them more about it. Here are some reflections on my own tiny involvement.

Timeline

  • On Sunday, 8 August, an email from a former student pointed me to Greg Baker’s blog, which had announced Deolalikar’s manuscript the day before. Within two days, and perhaps largely due to Richard Lipton’s enthusiasm, hundreds of blog comments are written, and it is clear to me that this story could make it into regular news. Not because of the proof, but because of the collaborative model.
  • Tuesday, 10 August, I write a crisp but comprehensive summary to the communications department at ITU Copenhagen, explaining the case, and my reasoning for why and how this could attract the attention of journalists. Luckily, they bite, and start pitching the story to Danish media.
  • Wednesday, 11 August. The communication department gets back to me. Both venues I had in mind accepted: Danish public radio DR P1 Morgen and the weekly paper Weekendavisen. (I have appeared in both of these places previously, listen to the former and subscribe to the latter, so they are aimed at the demographic I am in.)

    Meanwhile, the story itself got even better: a wiki page at polymath has been created to discuss the proof. On the other hand, Deolalikar’s proof doesn’t fare too well because of problems with the structure of random k-SAT.

  • Thursday, 12 August. I spend well over an hour in the morning talking to a journalist from DR P1 Morgen on the phone, to prepare for a radio interview on Saturday. This is how it works: she spends a lot of time and energy asking very good questions and exploring various angles, really trying to understand what’s going on, scientifically. She then writes a brief intro and a list of possible questions and angles for the host of the radio show, who will conduct the broadcast interview.

    In the afternoon, I do basically the same hour-long phone interview with a journalist from Weekendavisen. WA’s deadline is Wednesday evening, so we aim for a longer piece in the paper’s science section next week.

    Meanwhile, Neil Immerman points to big problems with the finite model theory part of Deolalikar’s proof.

  • Saturday, 14 August. I take an early train to Copenhagen, and enter the DR building. I’ve done this part once before, so I already feel like a jaded member of the chattering classes, going through the motions like a frequent airplane traveller through security. I announce my presence at the reception, receive a badge, and an assistant picks me up to lead me to a waiting area. The assistant gives me a brief chance to summarise what I’m going to talk about, probably just to loosen my tongue. There’s coffee and morning papers. (Nothing about Deolalikar. I had hoped some major newspaper would falsely run the angle that “Outsider solves major math riddle, embarrassing reluctant ivory tower academics.” But no such luck.) After half an hour of waiting I’m ushered into the studio and get a bit more than the previously agreed 8 minutes to say the word algorithms as often as I can.

    I think it went well, though I still haven’t had the guts to hear it.

  • Tuesday, 17 August. I should have had the Weekendavisen article for review by now, but WA has changed its overall design, including the software, so everything is running very late. (But they did change the body typeface to Garamond, making WA officially the best typeset paper on the planet. So all is forgiven, and my subscription renewal is guaranteed.)
  • Wednesday, 18 August. I get the WA article, and I’m quite impressed. The angle is via Wiles and Perelman, then to collaborative software, outsourcing, and what I call the triumph of amateurism. Even some of my concerns about basic research funding are briefly touched upon, mathoverflow and polymath are mentioned. I know full well that I can’t change the angle anyway, but there are of course numerous problems with the details. Predictably, the article writes “mathematics” whenever it should write “computer science.” I spend some more time editing and send my changes back, fingers crossed.
  • Thursday, 19 August. I get WA with my regular Swedish paper in the morning. The article is the front piece for the science section. (WA has no electronic edition I can link to. You have to buy it.)

Even though this all took a lot of time and energy, I’m pretty happy. The angle, i.e., collaborative math, turned out to be just right, and is the same that several other media have chosen. But remember this is all just a platform to say P, NP, algorithms, and complexity in public media as often as possible.

But it’s Wrong!

When doing something like this, know from the beginning that you have no bandwidth. You cannot explain anything. You have no time, and you have no visual aids. News media are neither very good at, nor even interested in, explaining concepts. So don’t. And refrain from unnecessary precision. I wish I would learn.

The summary of the radio show includes the following (in my translation):

P versus NP is a scientific hypothesis that can be translated, very simplified, to the claim that there are problems that can be solved, and problems that can’t be solved.

Now, this is of course false. There are problems that cannot be solved, whole classes of them. Everything in UNDECIDABLE is undecidable no matter how much time you invest, and the EXPTIME-complete problems are infeasible in the sense that even though algorithms exist, none of them runs in polynomial time; we know P differs from EXPTIME by the time hierarchy theorem.

Would it have been worth correcting? No. The whole P versus NP problem loses a lot of its appeal once you start with “even though this claim is proved for some problems, there is a certain class of problems called NP for which this is not yet known.” Of course, the trick is instead to establish NP via verification versus computation, and I tried that both in the radio and the paper using Sudoku.
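The verification half of that trick is easy to make concrete. Here is a small checker (my own sketch, not anything from the radio or the paper) that verifies a completely filled n×n Sudoku in time polynomial in the grid size, in contrast to the presumably exponential effort of finding a solution:

```python
def is_valid_sudoku(grid):
    """Check a completely filled n-by-n Sudoku (n a perfect square):
    every row, column and box must be a permutation of 1..n."""
    n = len(grid)
    b = int(n ** 0.5)                    # side length of a box
    want = set(range(1, n + 1))
    rows = all(set(row) == want for row in grid)
    cols = all({grid[r][c] for r in range(n)} == want for c in range(n))
    boxes = all(
        {grid[br + i][bc + j] for i in range(b) for j in range(b)} == want
        for br in range(0, n, b) for bc in range(0, n, b)
    )
    return rows and cols and boxes

# A filled 4-by-4 instance that passes the check:
grid = [[1, 2, 3, 4],
        [3, 4, 1, 2],
        [2, 1, 4, 3],
        [4, 3, 2, 1]]
```

Verification touches each cell a constant number of times; no search is involved.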

But the important thing is that none of this is the point. My mission is not to ventilate the correct definition of NP; anybody really interested can look it up on Wikipedia. I’ve had students who took whole classes from me who blissfully live in this misunderstanding. Those I try to correct, and sometimes successfully. But for the general public, I think this is unnecessary complication that only weakens the story.

I did get them to sneak “very simplified” in the summary, out of scientific integrity, and left it at that.

I can see that the New York Times currently has the same problem. A pretty good article from 16 August now carries the following correction:

Correction: August 21, 2010.
An article on Tuesday about a proposed solution of a classic mathematical formula called P versus NP described NP incorrectly and misidentified the location of the CERN physics research center. NP stands for the class of problems for which solutions can be verified relatively quickly, not the class of problems that are difficult to solve but easy to verify. And the CERN laboratory is near Geneva, not in Zurich.

Got it? I don’t.

The WA article contains a bigger mistake, and one I feel bad about. In my translation:

A 9×9 Sudoku is just big enough to entertain us during our morning coffee. But if we played Sudoku with letters instead of numbers, the number of combinations would be so huge, that all supercomputers on the planet would need until the end of the Universe to find a solution. “Unless, of course, there’s a really clever way to solve Sudokus that we just haven’t discovered yet,” adds Thore Husfeldt.

The usual Sudoku is a P-problem: it’s possible to write a recipe, an algorithm, to solve it. The big Sudoku is an NP-problem, one of the tasks mathematicians called Herculean, where you can use infinite time to check all combinations without ever finding the solution, but where you could immediately verify one if you had found it.

Vader voice: Noooo! This is the scientifically most ambitious part of the article, and contains crippling mistakes. All I can say is that it was worse when I got the first draft. Some of my edits made it (for example, it now no longer says that an 18×18 Sudoku contains twice as many cells as a 9×9 Sudoku.) Others, as you can see, didn’t. For completeness:

  1. Sudoku is always NP-hard. The small ones are merely small enough that systematic exploration is still feasible. But for NP-hard problems, solution time presumably grows exponentially with problem size, so even the seemingly harmless increase to 25×25 alphabet Sudoku destroys any hope of solving it. (There’s more to say about the hardness of Sudoku. Maybe for another post.)
  2. Even a 25×25 Sudoku still does not use infinite time. It’s a finite problem, after all. To a mathematician, 25×25 Sudoku is an easy problem, because the number of steps it takes is a small integer “close to zero”. But to a computer scientist, it’s important that this finite number of steps exceeds the computational power that remains in the universe for the rest of its lifetime. Yes, even if you install Ubuntu.
  3. “Herculean” was a term suggested in Donald Knuth’s Herculean efforts to give NPC a better name.
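To see why the “seemingly harmless” increase is fatal, a back-of-the-envelope computation suffices. Brute-force search over an n×n grid with n symbols touches on the order of n^(n²) candidate fillings (this crudely ignores the given clues, which only shave off a few orders of magnitude):

```python
from math import log10

def naive_search_space(n):
    """log10 of n**(n*n), the crude brute-force candidate count
    for an n-by-n Sudoku with n symbols."""
    return n * n * log10(n)

print(round(naive_search_space(9)))    # about 10^77 candidates
print(round(naive_search_space(25)))   # about 10^874 candidates
```

For comparison, the observable universe contains roughly 10^80 atoms; the 25×25 figure is comfortably beyond anything its remaining lifetime can absorb.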

Would another round of edits have made a difference? Probably. But because of the narrow schedule imposed by WA’s new layout engine, it was not to be. On the other hand, the new layout means I get to see my name in beautiful Garamond. I retain a weak spot for Aestheticism from my younger days, so I guess I will let the pleasures of form triumph over the constraints of content.

What I Should have Done

Next time, I will try with Impagliazzo’s five worlds, or rather two of them.

When trying to explain P versus NP, talk about Algorithmica versus Cryptomania instead. Morally, it’s the same problem; you need to worry about average-case complexity to see the difference. But it’s so much nicer to talk about. Algorithmica is a country in which you have Star Trek computers (or Skynet, depending on how much of an optimist you are), no mathematicians, etc. In Cryptomania you have crypto. We know that we live in one of these worlds (there are a few more), but we don’t know which. I think that’s a good way of putting it, because it conjures images. So that’s what I’ll try next time.

I just have to wait until the next proof attempt.

The Most Important Movies Of All Time

Does this sound familiar to you? You are watching a Simpsons episode with friends. Bart, waking up from an out-of-body experience where he visited Hell, says,

“I was miles and miles and miles away, writhing in agony in the pits of Hell! And you were there, and you and you and you.”

Everybody snickers knowingly and lets out a satisfied sigh. Ah, the subtext!

But you? You just missed another movie reference! Social ostracism awaits.

But fret not! There are not that many movies, and in fact most of them tend to cite the same classics over and over again. So, as a special service, here is the list of the most important movies ever made, ordered by a carefully constructed principle I call FilmRank. The ranking is the result of an algorithm, similar to what Google uses to rank web pages, and based on the Internet Movie Database’s list of movie references. I haven’t tinkered with the outcome. If you want to read more about the methodology, look at my related post

So, without further ado, here’s the list:

  1. The Wizard of Oz (1939)
  2. Shadow of a Doubt (1943)
  3. King Kong (1933)
  4. The Merry Widow (1934)
  5. Psycho (1960)
  6. Cabiria (1914)
  7. Star Wars (1977)
  8. Metropolis (1927)
  9. Snow White and the Seven Dwarfs (1937)
  10. Casablanca (1942)
  11. 2001: A Space Odyssey (1968)
  12. The Birth of a Nation (1915)
  13. Snow White (1916/I)
  14. Chang: A Drama of the Wilderness (1927)
  15. Frankenstein (1931)
  16. Vertigo (1958)
  17. Jaws (1975)
  18. The Godfather (1972)
  19. Gone with the Wind (1939)
  20. The Lost World (1925)
  21. Dr. No (1962)
  22. The Searchers (1956)
  23. A Clockwork Orange (1971)
  24. Citizen Kane (1941)
  25. Nosferatu, eine Symphonie des Grauens (1922)
  26. Rosemary’s Baby (1968)
  27. Night of the Living Dead (1968)
  28. Bronenosets Potyomkin (1925)
  29. The Exorcist (1973)
  30. The Terminator (1984)
  31. La dolce vita (1960)
  32. The Scarecrow (1920)
  33. Un chien andalou (1929)
  34. Tarzan the Ape Man (1932)
  35. Taxi Driver (1976)
  36. The Wizard of Oz (1925)
  37. The Poor Little Rich Girl (1917)
  38. The Battle at Elderbush Gulch (1913)
  39. Dracula (1931)
  40. Il buono, il brutto, il cattivo. (1966)
  41. Shichinin no samurai (1954)
  42. College (1927)
  43. Apocalypse Now (1979)
  44. His Majesty, the Scarecrow of Oz (1914)
  45. La grande illusion (1937)
  46. The General (1926)
  47. The Lost Patrol (1934)
  48. Singin’ in the Rain (1952)
  49. E.T.: The Extra-Terrestrial (1982)
  50. The Great Train Robbery (1903)

Amazingly, it’s pretty good! There are some really old movies in there because the methodology will tend to favour them in an ever-decreasing pool of available references. But, if I use the same damping factor that Google uses for ranking web pages, this is what you get.

For fun, we can decrease this mysterious factor from .85 to .45. This will decrease the value of long chains of references, giving films like Star Wars more of a chance. Star Wars has tons of references (half a thousand other movies reference it, many more than Oz), but most of them are recent, and haven’t yet amassed the authority that comes from having generations of other movies refer to you. So here’s the second, more modern list:

  1. The Wizard of Oz (1939)
  2. Star Wars (1977)
  3. Psycho (1960)
  4. King Kong (1933)
  5. Shadow of a Doubt (1943)
  6. Casablanca (1942)
  7. 2001: A Space Odyssey (1968)
  8. The Godfather (1972)
  9. Snow White and the Seven Dwarfs (1937)
  10. Metropolis (1927)
  11. Jaws (1975)
  12. Gone with the Wind (1939)
  13. Frankenstein (1931)
  14. Cabiria (1914)
  15. The Birth of a Nation (1915)
  16. The Merry Widow (1934)
  17. Vertigo (1958)
  18. A Clockwork Orange (1971)
  19. Night of the Living Dead (1968)
  20. Citizen Kane (1941)
  21. The Exorcist (1973)
  22. Taxi Driver (1976)
  23. The Terminator (1984)
  24. Rosemary’s Baby (1968)
  25. Apocalypse Now (1979)
  26. E.T.: The Extra-Terrestrial (1982)
  27. Il buono, il brutto, il cattivo. (1966)
  28. Dr. No (1962)
  29. Snow White (1916/I)
  30. Nosferatu, eine Symphonie des Grauens (1922)
  31. Bronenosets Potyomkin (1925)
  32. The Searchers (1956)
  33. Dracula (1931)
  34. Chang: A Drama of the Wilderness (1927)
  35. Tarzan the Ape Man (1932)
  36. Scarface (1983)
  37. Un chien andalou (1929)
  38. Shichinin no samurai (1954)
  39. La dolce vita (1960)
  40. The Shining (1980)
  41. Singin’ in the Rain (1952)
  42. The Texas Chain Saw Massacre (1974)
  43. Star Wars: Episode V – The Empire Strikes Back (1980)
  44. The Lost World (1925)
  45. Pulp Fiction (1994)
  46. Rocky (1976)
  47. Halloween (1978)
  48. It’s a Wonderful Life (1946)
  49. The Wizard of Oz (1925)
  50. Deliverance (1972)

As you see, little Dorothy still wins, but Darth Vader breathes down her neck! I like this list a bit better than the previous one, so .45 seems to be the right damping factor for this particular network. Of course, there are many other PageRank-like algorithms in the literature, each with obscure constants to fiddle with. (Google spent a long time fiddling with these things before they went public.) But I assume you’ll end up with pretty much the same movies as on the FilmRank lists.
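The effect of the damping factor on long chains is easy to quantify: each hop along a chain of references multiplies the inherited rank by a factor α, so a chain of length k contributes on the order of α^k. A two-line experiment shows how much faster .45 crushes chains than .85:

```python
# Rank surviving a chain of k successive references, for the two
# damping factors: each hop damps the contribution by a factor alpha.
for k in (1, 5, 10):
    print(k, round(0.85 ** k, 4), round(0.45 ** k, 4))
```

After ten hops, .85 still passes along about a fifth of the original rank, while .45 passes along less than a thousandth; hence the authority of old, much-chained-to classics fades.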

I find the FilmRank lists just as inspiring and authoritative as many similar lists compiled by movie critics or fan votes. But this one is produced completely by machine, no human heart has acted as a mediator. Maybe that’s why Metropolis scores so high?

How FilmRank Works

Ranking—in particular, the ranking of information such as web pages—is one of the most visible effects of algorithms on people’s lives. Imagine navigating the internet like we did in 1994, before Google’s PageRank algorithm!

I enjoy explaining how Google works; in fact, I have a popular science presentation that I use quite a lot.

I think I’ve found a new way of explaining the essence of the PageRank idea, using movies. The outcome, a ranking of all films ever made, is interesting in itself.

Here’s how it works.

Movies can refer to other movies. When Marty McFly in Back to the Future Part III finds himself alone with a gun and a mirror, he quotes De Niro’s character in Taxi Driver,

“You talkin’ to me? You talkin’ to me? You talkin’ to me? Then who the hell else are you talkin’ to? You talkin’ to me? Well I’m the only one here. Who the fuck do you think you’re talking to?”

In fact, the line has an entire Wikipedia article.
And De Niro’s Army jacket has a patch reading “King Kong Company 1968–1970”, referencing the classic 1933 movie about another monster on the rampage in New York. (This reference is quite deliberate; the patch is described on page 1 of the script, rather than being a random detail inserted by a costume designer.)

Following the basic idea of PageRank, we can use these references as a measure of quality. A film must be important if other filmmakers want to reference it. Like all measures of quality, it is open to ridicule and skepticism. But it is different from other measures, such as box office success or even critical reception. With FilmRank, I put my trust in the good taste of other filmmakers, just as PageRank puts its trust in other web pages: PageRank is based on incoming links, not on number of visitors.

If you just want to see the results and don’t care about the methodology, read my companion post

Weighted References

What I like about FilmRank is that I can explain the weighting. Assume first that I just count the number of references to a movie. Which film wins? What is the most referenced film of all time? Well, according to the Internet Movie Database, here are the top three:

  1. Star Wars (1977), 570 references
  2. The Wizard of Oz (1939), 554 references
  3. Psycho (1960), 360 references

The oft-cited shower scene from Alfred Hitchcock’s Psycho


In this list, every reference counts the same. E.T.’s reference to Star Wars counts just as much as Eragon’s, even though the former is a much more important movie than the latter.

This is where the weighting comes in. FilmRank works in rounds, as follows. Initially, all films (there are well over 60,000) receive the same rank. Then, each film inherits rank from the films referencing it. For example, if A references B and C, it confers rank in equal measure to B and C, increasing their rank. If these films in turn refer to other films, the rank gets passed on and on and on.

An equivalent way of seeing this is the random moviegoer model. Start viewing some random movie. After it’s done, randomly select one of the movies it referenced and go see that one. Keep doing that hundreds of thousands of times. (If there were no references in the last movie, start over.) In the end, you end up watching certain movies again and again. This frequency is the FilmRank. There is also a certain probability that you grow tired of your current project and start all over anyway; the complementary probability of following a reference (85% for Google’s PageRank) is called the damping factor.

More formally, the behaviour is a Markov chain modelling a random walk on the directed graph whose vertices are movies, and whose edges are references. The FilmRank is the stationary distribution of that random process.
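The random moviegoer is easy to simulate directly. Here is a Monte Carlo sketch on a hypothetical four-movie graph (titles invented for illustration); the visit frequencies approximate the FilmRank:

```python
import random

# out[v] lists the movies that v references; "Oz" is a dangling
# vertex with no references of its own. Toy data, not from IMDB.
out = {"A": ["Oz"], "B": ["Oz", "C"], "C": ["Oz"], "Oz": []}
alpha, steps = 0.85, 100_000
random.seed(0)

visits = {v: 0 for v in out}
v = random.choice(list(out))
for _ in range(steps):
    visits[v] += 1
    if out[v] and random.random() < alpha:
        v = random.choice(out[v])        # follow a random reference
    else:
        v = random.choice(list(out))     # grow tired: restart anywhere

filmrank = {v: visits[v] / steps for v in out}
```

On this toy graph, “Oz” collects references from everyone and comes out with the highest visit frequency, as expected.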

Comparison with PageRank

Dorothy in stunning colour, but not in Kansas

Remember that this is only for fun, and to explain how PageRank works. (Though I like the results.) The huge difference between PageRank and FilmRank is that PageRank uses no human manipulation. The links between web pages are already there, and not subject to interpretation. In contrast, it is a matter of opinion whether a reference to another movie is intentional or not. “You’re not in Kansas anymore” is certainly deliberate, but are all quotes of “I am your father” really meant to remind you of Darth Vader?

Thus, FilmRank loses some of its purity, as well as its efficiency. It can also be gamed by maliciously editing the underlying database. Moreover, FilmRank is gloriously unnecessary: there are not very many films (only tens of thousands, with few added every day), so human intervention suffices to rank them and make authoritative lists of top films based on individual movie critics’ taste. Not so with the web: its size made it impossible to rank by the middle of the 1990s, so a fully automatic system like PageRank is essential if we are to be able to navigate it.

Finally, there’s a huge topological difference between the network of linked web pages and the network of film references: The web is very cyclic, whereas films cannot refer to films “in the future”. You can refer to The Time Machine, but you can’t build one. This means that FilmRank will tend to favour older movies, simply by the fact that they are old enough to have assembled lots of references. However, I find this quite refreshing; many other formal measures of quality (such as box office success) will artificially favour newer movies because of inflation.

Implementation

So here’s what I did. The Internet Movie Database maintains exactly the list of movie references we need, and it is freely available for download. I decided to throw out TV shows and video games, and to honour the connections “references”, “features”, and “remake of”. Then it was a simple matter of coding a mindless implementation of the algorithm in Python:


alpha = .85  # damping factor

n = len(G)                    # G: the set of movies; out[v] lists v's references
p = {v: 1.0/n for v in G}     # start from the uniform distribution
for i in range(20):
    q = {v: 0.0 for v in G}
    r = 0                     # will be sum of alpha*p[v] over all dangling vertices
    for v in G:
        q[v] += p[v] * (1-alpha)
        if dout[v] == 0:                   # dangling v? then all w get some..
            r += alpha * p[v]
        else:                              # ..else v's successors share
            delta = alpha * p[v] / dout[v]
            for w in out[v]: q[w] += delta
    for v in G: q[v] += r/n   # dangling mass, damped too, so p stays a distribution
    p = q

Probably not very good Python; comments are welcome. The reason to accumulate the contributions of the dangling vertices (instead of spreading them out over the rest of the graph) is of course to avoid quadratic running time. Also, PageRank seems to stabilise to its final distribution after a handful of iterations, so 20 iterations should be plenty for our relatively small graph. I think there’s no need for fancy-shmancy sparse matrix algebra.
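If one distrusts the fixed 20 rounds, a convergence check is cheap: compare successive distributions in L1 distance and stop early. A sketch, with variable names as in the listing above:

```python
def l1_distance(p, q):
    """Total (L1) distance between two rank vectors over the same keys."""
    return sum(abs(p[v] - q[v]) for v in p)

# Inside the iteration, after computing q from p:
#     if l1_distance(p, q) < 1e-9:
#         break        # the power iteration has stabilised
```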

ICALP 2010

I’m at ICALP 2010 in sunny Bordeaux. I have been very busy working on two papers with colleagues who happen to attend ICALP as well, so I have missed a bunch of interesting talks on the first two days already.

I did, however, attend the business meeting, an event that combines various reports from ICALP committees with the general assembly of EATCS, the “European SIGACT.”

I normally enjoy business meetings, but ICALP business meetings are relatively stiff and mind-numbingly boring, with no discussion and no free beers. Still, let me give an incomplete and skewed summary of some of the issues that were mentioned.

ICALP publication

EATCS views ICALP as its flagship conference, and ICALP has published with Springer’s LNCS proceedings series for many years – if DBLP is to be trusted, the proceedings from the 4th ICALP (Turku, Finland, 1977) appear as LNCS volume 52. A few years ago, Springer started an LNCS subline called ARCoSS (Springer page about ARCoSS) “in cooperation with” EATCS and ETAPS. These have a nicer cover.

However, it has been unclear to what extent this publication model coincides with the current preferences of the members of EATCS. Just to mention an alternative, the STACS conference now publishes with the LIPIcs series, which is Open Access in the sense that the proceedings are available online and free of charge.

Very commendably, and to my pleasant surprise, EATCS actually implemented a member poll in the Spring of 2010 to determine our opinions on open access publication models, and in particular the outlet for ICALP’s proceedings. The results are online ([poll results at EATCS website]), but for mysterious reasons only accessible to EATCS members. It’s fair to say that a large majority of members would prefer another publication model. For example, only 10% want EATCS to “Publish the proceedings of ICALP as before in a traditional print publication?” To the question “What is the publication model for scientific papers in theoretical computer science that you would prefer to see gain prominence in the near future?”, 83% respond “Open access publication as for LMCS, LIPIcs and EPTCS”.

EATCS chairman Burkhard Monien reported that the EATCS Council has now approached Springer about the ICALP proceedings. According to him, Springer has made a substantially better offer, “very close to open access. We have discussed this carefully in the council.”

I think this is good news. However, I am less than impressed by the speed with which the process will continue. According to Monien, the results of the May 2010 poll were not sufficiently conclusive, so “We’ll ask your opinion again.” This time, there will be mercilessly concrete choices between either (1) going with LIPIcs, or (2) staying with Springer, under specific (and, I assume, new) terms. Monien explained that the EATCS council wants to get this right: ICALP is a big steamer, it should not go in a zig-zag course, and it should stick to its decisions. He was unsure whether the next poll will be implemented in the Fall, but promised “beginning of next year, at the latest.” I’m not holding my breath.

Still, all in all I think this is splendid news, and I’m particularly happy to see the EATCS council involving its members in an important decision process.

ICALP 2010 Organization

The report from the ICALP 2010 organizers was the usual list of statistics. I have become increasingly interested in these details after arranging ALGO 2009 myself. In total, ICALP 2010 has 292 participants, including 55 locals (the latter strikes me as a high number). 200 of these are registered for ICALP itself, rather than one of the affiliated workshops, and 147 pay the full registration fee. Only 3 Swedes and 1 Dane are registered; I wonder which I am.

This ICALP is held in the conference facilities of a large, soulless hotel in downtown Bordeaux. I’m not a big fan of this kind of conference, and the contrast in atmosphere to the locally organised SWAT I attended in June is enormous. There’s only a small lobby to hang out in, with few chairs or tables (let alone blackboards), so interaction between attendees is minimal, except within the cliques one is already in. Furthermore, ICALP provides no lunches, so people scatter into small groups and look for a place to eat. Pretty much like the large US conferences.

Also, the Friday banquet is included only for full participants, meaning that many students will not be able to get the 90 Euro dinner ticket reimbursed, and consequently won’t attend. In my experience, this decision is de rigueur for Southern European ICALPs, but I think it is fundamentally misguided. The food had better be spectacular.

On the other hand, the proceedings are included in the registration, which is another mistake. Many people never even remove the shrink wrap anymore; paper proceedings are just ballast. Moreover, proceedings are easy to reimburse, even for those students who honestly prefer to pay for the privilege of reading the absurdly formatted, amputated 12-page LNCS paper rather than reading the full version on the web.

The budget for the whole event is slightly over 100,000€, and the organizer received 2,500 emails. Apparently, EATCS maintains some form of organizers’ manual for ICALP; I’d like to see that. (I have seen the ESA/ALGO manual.) These things should be online, and allow for feedback.

Programme Committees

The reports from ICALP’s various programme committees included the usual absurd statistics, which were useless even by the low standards we are used to. The best paper award (Track A) went to Leslie Goldberg and Mark Jerrum’s result about Approximating the partition function of the ferromagnetic Potts model. This is, of course, utterly splendid work. In total, ICALP Track A received 229 submissions (6 were withdrawn) and accepted 60. Denmark submitted 3.33 papers and had 1.17 accepted; Sweden submitted 1.83 and had 1.17 accepted. Again, I wonder which I am.

The programming for ICALP’s Track A has left me, and many others, completely puzzled, and looks as if it’s the result of a grep script based on paper titles. For example, ICALP had two strongly related results about Cuckoo hashing; but the results appeared in different sessions:

  • Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh and Michael Rink’s Tight Thresholds for Cuckoo Hashing via XORSAT appeared in session 3 on “Sorting & hashing,” inconceivably scheduled in parallel with session 3 on “Data structures”. These are the only sessions on data structures in a 5-day, 3-track conference! In parallel!
  • Nikolaos Fountoulakis and Konstantinos Panagiotou’s Orientability of Random Hypergraphs and the Power of Multiple Choices appeared in session 5, “Graphs and hypergraphs.”

ICALP 2011

Michael Hoffmann presented the organisation of ICALP 2011 in Zürich (icalp11.inf.ethz.ch). This seems to be an ICALP model closer to my preferences. The conference will be held in university facilities, in a building called CAB dating from the late 19th century, with several large lecture rooms for workshops or parallel tracks, an even larger lecture hall, WLAN, and plenty of working space.

Also, there is no banquet, but rather a number of receptions with plenty of finger food and drinks. As an added benefit, registration fees are expected to be low. Of course, Zürich is “not among the cheapest places”, and prices are even higher than shown last year because of the exchange rate between the Swiss franc and the euro, which shows high variance.

I will certainly attend, as I am one of the invited speakers. Apparently, I’m speaking on the 5th.

ICALP 2012

Artur Czumaj presented the plans for ICALP 2012, which (somewhat unusually) are already fixed two years in advance. The dates are July 9–13, 2012, in Warwick, for the centenary of Alan Turing’s birth. Thus, ICALP is one of many events in the UK for the Alan Turing Year; see the list of events. ICALP will be held on Warwick’s campus, including accommodation and lunches. Satellite events were said to include Wimbledon right before ICALP and the London Olympic Games somewhat later.

Possibly this is the right time to start thinking about what we in the greater Copenhagen area (e.g., Sweden and Denmark, or all of Scandinavia) should be doing for the Turing year. Computer science has nowhere near the pop-sci clout that biology enjoys, but shouldn’t we be able to drum up at least one hundredth of the attention that Darwin got? Ideas are welcome.

For example, SWAT 2012 falls in the Turing year. We should start branding all our CS-related activities in 2012 as Turing-related, and hopefully concoct some more.

SWAT 2010 (part 2)

The SWAT conference dinner involved a boat trip to an island near Bergen, with a spectacular seafood dinner of halibut and wolffish. Also, at the initiative of Fedor Fomin, SWAT reverted to its tradition of pressuring attendees to sing (partitioned into small groups by nationality or country of residence) after the main course. Again, the organisers managed to get the atmosphere exactly right.

Invited talks

  1. On Monday, Sanjeev Arora talked about semidefinite programming and approximation algorithms. A lot has happened in this field since what he called “first generation” SDP algorithms. Some of the most recent work concerns re-evaluating the role played by the Unique Games Conjecture.
  2. On Tuesday, Prabhakar Raghavan, now at Yahoo! Research, talked about quantitative models for user behaviour. He spent some time on models for presenting search results (in particular, images) in two dimensions – which is much less obvious than the top-to-bottom ordering that web pages are presented in. (Left-to-right, row-by-row, is not the right answer for laying out pictures in a grid, because the eye doesn’t move like that.) He used this example to also communicate broader points about the interplay between quantitative sociology, cognitive psychology, and theoretical computer science. Great stuff, and highly interesting to me both because of the “algorithmic lens” propaganda, and because I sometimes give general audience talks about the computer science behind search engines, social networks, etc.
  3. On Wednesday, Dana Randall talked about phase transitions in sampling algorithms and underlying random structures. This was right in the middle of the interplay between statistical physics and theoretical computer science that I am currently fascinated by. The talk was an algorithms-friendly introduction filled with rapidly mixing Markov chains, crisp combinatorics, Ising and Potts models, and colloids (which I hadn’t seen before).

SWAT 2010 (part 1)

I am attending SWAT in surprisingly sunny Bergen.

The biennial SWAT is the oldest European conference devoted to algorithms. (ICALP is a lot older, and STACS slightly, but both have a wider focus.) It rotates around the Scandinavian countries (with a single exception in 2006), and in 2010 Bergen is, again, the host.

Bergen has a large algorithms group with a bunch of postdocs and graduate students, and this makes a large difference for organising a successful conference. A lot of heart and energy has gone into this, and the atmosphere is absolutely splendid. Even the weather in Bergen seems to, so far, play along.

The Sunday evening reception was on a roof terrace, with live music, wine, and snacks. Several activities were arranged for Monday evening, and I was lucky to be among the handful of people who went sailing on a small sailing boat, late in the evening, under the Northern sun, with a view of beautiful Bergen and the surrounding coastline. Brilliant.

Business meeting

At the business meeting, Magnús Halldórsson reported on the steering committee’s thoughts about SWAT. Four changes have been implemented at SWAT 2010:

  1. Name change. SWAT’s name is now “Scandinavian Symposium and Workshops on Algorithm Theory,” rather than “Scandinavian Workshop on Algorithm Theory.” The “workshop” moniker in the original name has been a recurring problem for SWAT attendees trying to secure travel money, looks bad on CVs outside our narrow community, and has even made it difficult for previous local organisers to attract outside funding.
  2. Economy. SWAT registration fees are seldom cheap because of the high prices in Scandinavia, and Norway is currently extra expensive due to currency exchange rates. Still, the steering committee wants SWAT to remain economically viable for a large number of attendees. The local organisers managed to secure significant outside funding, so that registration fees make up only 30% of the total revenue. Most of the money came from the Norwegian Research Council and Bergen University.
  3. Time slot. SWAT 2010 takes place around Midsummer, rather than in the traditional week “before ICALP,” typically the first week of July. I think the SWAT 2010 date makes a lot of sense, but it is unclear whether this will be a long-term change. A show of hands among the attendees at the business meeting was slightly in favour of the earlier time slot.
  4. International conference. The steering committee made it clear that SWAT is an international conference with an international PC and no special attention paid to Scandinavian papers. (I note that this SWAT has a total of zero papers from Sweden and Finland.) Moreover, the programme committee and the organising committee are completely decoupled.

I think these decisions represent a very welcome re-alignment of how SWAT views itself. The attendees of the business meeting seemed to be happy as well.

PC chair Haim Kaplan gave a short overview of the work of the programme committee. The most interesting point is that SWAT 2010 attracted 78 submissions this year, which is markedly fewer than for previous SWATs. This year, the SWAT submission deadline fell before the notification of SOCG 2010 because of the June time slot, which may account for a decrease in the number of submissions in computational geometry, traditionally a thematic focus of SWAT.

A brief discussion was devoted to publication models for contributed SWAT papers; currently the conference is published in Springer’s LNCS series.

Petteri Kaski successfully invited SWAT 2012 to Helsinki.

Motivation for 2010

Log Factors. “A goal is a dream with a deadline.” — Napoleon Hill
2009 was a great year, both for our research group and for theoretical computer science in general, with important new conferences like SLOGN. I firmly believe in the power of good leadership through positive thinking, so in that spirit I have applied my considerable artistic talent to a motivational poster. Happy new year!

[High-resolution PDF version.]

The original image is from Wikimedia Commons, which implies that the poster itself is released under the same CC-SA license.

Post ALGO post: Etiquettes and etiquette

This is part 3 in a series of 3 posts on ALGO 2009, largely repeating the organiser’s report from the business meeting. This is the least serious part.

Name tags

One of the small details that I tried to micro-manage is name tags. For some reason, name tags at many conferences display the attendee’s name in very small letters, and leave most of the tag’s area either white or taken up by the logo of the conference or venue. (Professional name-tag designers typically optimise something other than ease of identification of the attendee. That’s their job, and they are good at it. But it’s not the job I want done.) ALGO had nice, legible name tags, and we also tried to double-check various diacritics.

On the other hand, the batch-driven production from a spreadsheet to the final tag messed up some affiliations, for reasons that remained a mystery. Incidentally, I don’t really understand the value of displaying the affiliation on the tag anyway; it seems to serve mostly as an ice-breaker for conversations. (Maybe we should put people’s hobbies on the tags instead.) From that perspective, a wrong affiliation works at least as well.

Closely related to the layout of the name tag are the mechanics of the badge holder. After obsessing over this for many years I have concluded that the pin–clip combo is the way to go, so that’s what we got. We even told people not to clip the badge to their trousers. It was perfect, nothing could go wrong, everything was prepared for the one conference where you wouldn’t have to squint or use X-ray vision to read the name of what’s-his-name-I-think-we-met-at-ICALP-last-year.

Alas, it was not enough. A seemingly unrelated decision destroyed all my careful planning: We decided to hand out multi-ride tickets to the Copenhagen subway system, attached to ITU-branded key bands. (This was to prevent the tickets from being accidentally lost in the pile of written material we handed out to attendees.)

The effect?

Driven by their desire to remain anonymous, and a perverse predilection for craftsmanship, many enterprising attendees carefully removed the ticket from the key band and attached their name tag instead, thereby constructing the mothership of all that is evil about name badges: they dangled at belly-button height, typically obscured behind tables or under jackets, and they swivelled around their single, ill-constructed joint, displaying the badge’s empty back side more often than the laws of probability theory predict.

Next time I’ll tattoo the name on people’s foreheads.

Computer use

The etiquette around open laptops during talks is very much unclear in the CS community. We politely asked people to avoid accessing the internet during sessions, and, failing that, to do so from the back row so as to annoy the speaker and the rest of the audience as little as possible. Several attendees told me they were happy about this decision; the auditoriums were certainly relatively free of open laptops. I remain uncertain about how to handle this – I can get a lot of work done while half-listening to a talk, and the alternative is to stay out of the lecture room…

Speaker instructions

One thing I decided against, for fear of committing a social blunder, was sending instructions to the speakers about how we would have liked the talks at ALGO: namely, more accessible.

Talk quality at our conferences is generally high: clear, readable, well-presented, entertaining. However, most speakers vastly overestimate the audience’s familiarity with the subject. This makes it very difficult to attend a presentation in order to learn something, in particular about a field in which you aren’t already an expert.

One of the problems is that talks are grouped into thematically related sessions. This seems to imply that everybody in the room is already an expert on “latest derandomisation tricks in quantum I/O kernelisation,” so why bother explaining the basics again? An earlier rumination by Michael Mitzenmacher about the STOC 2009 schedule already touched on the bold idea of mixing sessions at random. (But this may introduce more problems than it solves.)

The smaller events at ALGO, such as WAOA and IWPEC, are technically highly specialised workshops in the first place, pitched at an expert audience. So how do we turn this around? It would be great if ALGO attendees could “stay for IWPEC” to finally find out what “this FTP-stuff” is all about – but the atmosphere has the exact opposite effect. Similarly, I talked to a WAOA attendee who was frustrated about the O-part being largely incomprehensible. (The attendee was there for the A-part.) How do we solve this? Clearly, not every IWPEC talk can define treewidth again, but my ambition with ALGO was to reduce balkanisation, not contribute to it.

Many of the “too difficult” talks are given by students. There are a number of reasons for this: students assume that everybody knows what they do (I did), students need to demonstrate that they’re smart and did a difficult thing, some students lack the broad overview needed to place their subfield in a broader ensemble, previous talks have only been given at departmental seminars where everybody else is equally interested in derandomising quantum I/O kernelisation, and so on. Maybe for this target audience, a “Here’s what we’d like your talk to be” letter could have done a lot of good.