Category Archives: Uncategorized

Maths on the web solved!

Why didn’t anybody tell me?

I remember staring at the first drafts of the MathML definition almost two decades ago and obsessively clicking “reload” to check whether the next browser version would implement it. After the web had facilitated communication around the globe in various human languages and alphabets, I eagerly awaited support for that single, universal language that made it all work: mathematics.

Alas, it didn’t happen. Since then, I’ve revisited current standards and adoption states for maths on the web every few years. The situation was dismal.

For years, everybody has been able to read, write, and communicate “square root of two” with all kinds of funny squiggles: “roten ur två”, “квадратный корень из двух”, “השורש הריבועי של שְׁתַּ֫יִם”, “二的平方根”. But try to put the symbol “2” under the radical √ on the web? No way. As the information age progressed, we became unable to communicate in its underlying language, because the web suddenly became the playground of nonmathematical people.

Turns out: instead of bitching about it, somebody solved it. Just like that.

The technology is called MathJax, written in JavaScript. It produces beautiful, scaling, copyable maths on the web. And it’s been around for a while, apparently. I just didn’t notice it—I seem to work too much and spend too little time procrastinating on the web.
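
For instance, the “square root of two” that eluded the web for so long is simply typed in ordinary TeX notation, and MathJax renders it in the browser. A minimal illustration (by default MathJax looks for \(...\) and \[...\] delimiters, though that is configurable):

The square root of two, \(\sqrt{2}\), finally looks like maths:
\[
  \sqrt{2} = 1.41421\ldots
\]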

And it recently started working on Wikipedia! As of May 2012: log in, go to “my preferences”, choose the “appearance” tab, and, at the bottom, select MathJax as your default rendering engine for maths. Then go to some page like Tutte Polynomial and feast your eyes. Increase the font size just because you can!

On the MathJax page you can see which other websites support it. This looks like a very healthy system.

My only grumble: since it requires JavaScript, this solution doesn’t work with my current blogging platform. The easiest fix seems to be to host a wordpress.org blog on a separate server.

Panmnemonicon

“Computer scientist, public debater.” Next stop: “International man of mystery.”

As the photographic evidence to the right can attest, I am now a “public debater.” Well, I guess the recent passings of Christopher Hitchens and Steve Jobs have left an intellectual vacuum in the public space that is eager to be filled by somebody who combines the modesty of the one with the intellectual honesty of the other. Somebody like me.

In fact, the epithet comes from a poster for a panel debate about the blessings of the Internet arranged by the Social Sciences at Lund University, Debatt i Lund. I was really happy to be invited and had a good time; I hope this sets a precedent for more involvement of academics from the technical and natural sciences. The nerds.

In preparing for the debate, I spent some time reading or re-reading a few general-audience books about various societal aspects of the Internet, including the relatively recent The Filter Bubble and Program or be Programmed. I remain impressed by Blown to Bits, which combines a minimal amount of alarmism with an honest ambition to actually explain what is going on. It’s written by nerds, so the facts are correct. For nonspecialists, this should be the one book to read.

I also refreshed some sociological aspects. As a newly minted public intellectual I feel it behooves me to drop Foucault into conversation at regular intervals, instead of my usual spiel of an intertextual hodgepodge of Star Trek, Monty Python, and fantasy novels.

Foucault is topical because of his treatise Discipline and Punish, which includes a famous description of modern society in terms of prison architecture. This design goes back to the English utilitarian Jeremy Bentham and is called the Panopticon: the prison is circular, with a centrally placed warden able to inspect all the cells on the perimeter. This is clever not so much for the actual surveillance as for the inmate’s constant knowledge of being potentially watched. It’s a correctional facility by design.

Foucault uses this design to make some point about modern society, with several institutions (such as schools) playing the role of the warden. This is good stuff; I had a chance to reacquaint myself with these notions recently when reading the spectacular The Judging Eye by R. Scott Bakker. (Fantasy novel reference duly dropped.)

As with all dystopias, Foucault’s Panopticon is of course old hat by now. We’re living it, just as the London of Orwell’s 1984 is in fact the London of today. Some media theorists have taken Foucault’s Panopticon to the next level and speak of a Panspectron to describe our society: instead of us surrounding a centrally placed warden who sees us, we’re now surrounded by countless sensors that measure us in many more ways than just optically. I think Branden Hookway is the one to read about this. (Which I haven’t done.)

While I think this is a valid concept, I’m not too fond of the name. Panspectron. It’s basically the same word as Panopticon, with Latin replacing the Greek.

Thus, wearing my “public intellectual” mantle (which is, in fact, just a well-poured glass of Scotch) I will release a new word into memetic space that I find superior to Panspectron. Here it comes:

Panmnemonicon n. 1. A device or set of devices that records and stores everything. 2. A society operating under the influence of such a device.

Panmnemonicon translates to something like “all-remembering”, which I think is the point of our brave new world. After all, nobody actually watches you or me; mostly because nobody cares about you or me. Instead, everything is stored with the possibility of later being retrieved and mined, should you or I suddenly become interesting. That’s the worrying point, and the one that has potential impact on behaviour.

So there it is. A new word minted. Remember it, as it surely will remember you.

At the time of writing, Panmnemonicon has a total of 0 hits on Google. Panspectron does have some hits, but not enough for me to Google Trend it. We’ll see how it goes. It’s so on, Panspectron.

Cahen’s Constant and the Ghostbusters

Relationship between Cahen’s constant and who you’re gonna call

I’m teaching introductory algorithms and data structures at ITU and having a ball.

For a while now I’ve been using small, well-defined weekly programming exercises in all my courses. It’s a lot of work to develop these, but also a lot of fun.

The book I’m using, Sedgewick and Wayne’s Algorithms (4th ed.), makes it natural to have an exercise about the four sum problem early on, involving a bit of coding, parsing, simple experimental and theoretical running time analysis, and a clever idea or two. Rasmus Pagh had a brilliant idea, inspired by xkcd 687, for turning this exercise into something fun. After all, finding deep-looking product formulas is just a question of looking (after taking logarithms) for values that add to 0. Given enough interesting numbers, some pattern has to emerge.
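
The idea in a nutshell: collect a pile of interesting constants, take logarithms, and search for small signed combinations that sum to nearly zero; every hit corresponds to a product formula that looks profound. Here is a minimal brute-force sketch of that idea in Python; it is not the actual exercise code, and the toy table of constants and the tolerance are made up for illustration:

from itertools import combinations
from math import log, pi, e, sqrt

# A toy table of interesting numbers; the real exercise uses a much larger data file.
constants = {
    "pi": pi,
    "e": e,
    "sqrt(2)": sqrt(2),
    "golden ratio": (1 + sqrt(5)) / 2,
    "Cahen's constant": 0.643410,
    "Ghostbusters": 5552368.0,
}

logs = {name: log(value) for name, value in constants.items()}
TOL = 1e-3  # how close to zero a sum of logarithms must be to count as a "law of physics"

# Look for a*b close to c*d, i.e. log a + log b - log c - log d close to 0
# (a four sum on logarithms).
for (a, b), (c, d) in combinations(combinations(logs, 2), 2):
    if {a, b} & {c, d}:
        continue  # insist on four distinct constants
    if abs(logs[a] + logs[b] - logs[c] - logs[d]) < TOL:
        print(f"{a} * {b} = {c} * {d}  (approximately)")

# With half a dozen constants nothing may turn up; with hundreds, near-collisions abound.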

I’ve played around with Rasmus’s exercise a bit more since yesterday, added some spit and polish, and made it slightly crazier. The best new law of physics I’ve discovered this way is shown at the top. It relates

  • the energy equivalent m_u c² of the atomic mass constant in MeV,
  • the energy equivalent m_d c² of the mass of a deuterium atom in joules,
  • Cahen’s constant (roughly 0.643410), defined by a sum involving the Sylvester numbers a_k, and
  • G, the phone number of the Ghostbusters, 5552368.

You can’t make that up. It’s correct up to rounding errors and a cavalier attitude to dimensions.
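
For the curious, the value of Cahen’s constant is easy to reproduce: it is the alternating sum of 1/(s_k − 1) over Sylvester’s sequence 2, 3, 7, 43, 1807, …, where s_{k+1} = s_k² − s_k + 1. A quick sketch of my own, not part of the exercise:

from fractions import Fraction

def cahen_constant(terms=7):
    """Alternating sum of 1/(s_k - 1) over Sylvester's sequence 2, 3, 7, 43, 1807, ..."""
    s, total = 2, Fraction(0)
    for k in range(terms):
        total += Fraction((-1) ** k, s - 1)
        s = s * s - s + 1  # next Sylvester number
    return total

print(float(cahen_constant()))  # ~0.643410..., matching the rounded value above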

The finished exercise, with data files, is at the course blog at ITU; I’m slowly getting myself organised to publish these things on GitHub to make them more useful for other teachers. (That would include the TeX file for the exercise description, for example.) Stay tuned.

Edit to add: I guess the real conclusion of this formula is that the world finally knows the unit and dimension of US phone numbers, assuming that Cahen’s constant is dimensionless. Phone numbers express inverse energy squared, measured in mega-electron-volt joules. You heard it here first.

Edit to add (25 May 2012): xkcd 1047: approximations

π out of 4 EATCS Members Don’t Care About Publication Models

Bummer.

The European Association for Theoretical Computer Science has around 1000 members, including many European TCS researchers and everybody who attended conferences like ICALP and ESA in a particular year. Traditionally, EATCS has had a close relationship to scientific publisher Springer, who publishes the proceedings of EATCS’s “flagship conference” ICALP.

A recurring theme at the EATCS general assembly meetings for the past many years has been the publication model for ICALP. Should “we” continue with Springer, or move to an open-access series like LIPIcs? After considerable deliberation, a ballot among EATCS members was held in late 2011.

For more background:

  1. EATCS general assembly at ICALP 2010 on my blog.
  2. EATCS general assembly at ICALP 2011 on my blog.
  3. EATCS ballot on Luca Aceto’s blog.

I have been absolutely thrilled that the EATCS council decided to send this question to its members. Well done. I want to see more of this.

Unfortunately, the outcome is not what I would have hoped. The EATCS council required a quorum of 25% of EATCS members for the ballot to be valid. I think this is a reasonable requirement – the ICALP publishing model is an important decision that should be safely moored.

Alas, the quorum was not reached. On 9 November 2011, EATCS president Burkhard Monien informed the EATCS that out of the 1094 members, only 260 had voted. And that’s just slightly less than 1 out of 4. It’s 23.8%.

Informative graphic explaining the magnitude of the value 13 in relation to the value 1094


I am, of course, somewhat miffed that 834 EATCS members don’t seem to think that the way their research is published is sufficiently important to execute three or four mouse clicks to make their voice heard one way or the other. After all, it’s only about publication counts, accessibility, dissemination, ethics, prestige, jobs, CVs, and most everything else by which we are evaluated.

Monien adds,

However, I also want to let you know that the majority of the 260 votes were in favor of LIPIcs. EATCS will observe the further development carefully.

So at least we know the opinion of those EATCS members who care. Next time we just need to remember to inform intelligent adults of the importance of actually voting.

Sorting Networks Activity

A 9-input sorting network in action

Computer Science Unplugged is a wonderful initiative that collects activities that communicate some basic ideas of computer science – without using computers.

IT University of Copenhagen opened its doors to the public last Friday in connection with the city-wide “Culture Night”, so I used the opportunity to implement the Sorting Networks activity.

ITU has a big staircase in its atrium, which turned out to be just the right size for a 9-input sorting network. I measured everything up, found a good-looking network in Knuth’s The Art of Computer Programming, and sketched the layout for the friendly folks at Facilities Management.

The network was “drawn” on the stairs using adhesive tape, which took a few hours on the day before the event. It looked absolutely splendid. The activity was manned by enthusiastic undergraduate students for several hours, who herded eager and intrigued guests of all ages through the maze of comparators. A smashing success.

Also, a student volunteered to translate the Wikipedia page on sorting networks into Danish, so the outreach activity had a permanent, digital result as well: Sorteringsnetværk.

Originally, I had also promised myself to produce a large poster or two next to the activity, which should explain what’s actually going on, why sorting is important, and (in particular) show a few more concrete networks of various sizes. This would have given the whole activity a bit more scientific muscle. Alas, I failed to get this done in time. Better next time.

I also failed to show up in time for the kick-off to instruct the student volunteers in how the whole thing was supposed to work. Apparently, they spent a good number of attempts running the network in the reverse direction (from the top of the stairs and down, which arguably is the obvious direction) and trying to find the bug! So now they know that sorting networks don’t necessarily work in both directions. (But insertion sort does!) This may be a fresh research question: find an optimal bidirectional sorting network for 9 inputs.
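
If you want to replay the activity at home, here is a small sketch of my own (it uses a standard five-comparator network on 4 inputs rather than the 9-input network from the stairs). It runs a comparator network on an input and uses the zero-one principle to check whether the network sorts, both in the forward direction and in reverse:

from itertools import product

def run(network, values):
    """Apply a comparator network (a list of index pairs) to a sequence of values."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def sorts_everything(network, n):
    """Zero-one principle: a network sorts all inputs iff it sorts all 0/1 inputs."""
    return all(run(network, bits) == sorted(bits) for bits in product((0, 1), repeat=n))

net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]  # a standard 4-input sorting network
print(sorts_everything(net4, 4))        # True
print(sorts_everything(net4[::-1], 4))  # False: run backwards, this one no longer sorts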

All in all, a splendid event. Thanks to everybody who helped, or just wanted to get themselves sorted out.

ICALP 2011

I am back from sunny Zürich and ICALP 2011.

Attendee at ICALP 2011 business meeting. The image shows the free ice cream and name tag.

This was a very well-organised event hosted at ETH. Michael Hoffmann had everything under complete control and remained confident, alert, calm and accessible. I tried to take a glimpse behind the curtains, such as looking at the two-page to-do list for technical staff in the lecture halls, and the level of detail and relevance is just off the scale. I hope Michael can summarise his preparation and tips for everybody’s benefit.

Future Publication of the ICALP Proceedings

The EATCS general assembly is by tradition a long and dry event without free beer. To break new ground, the organisers handed out free ice cream. ’Tis not beer, but a welcome step in the right direction. I hope the organisers of next year’s ICALP in Warwick take note.

Burkhard Monien’s EATCS Annual report 2010-2011 [PDF] is available on-line, so there’s no need to summarise it here.

The exciting question is the publication model for ICALP: will the EATCS flagship conference stay with Springer’s LNCS series (or some transformation of it) or not? I blogged about this last year, from the 2010 assembly:

Monien explained that the EATCS council wants to get this right; ICALP is a big steamer, should not go on a zig-zag course, and should stick to its decisions. He was unsure if the next poll will be implemented in the Fall, but promised “beginning of next year, at the latest.” I’m not holding my breath.

Well, the Fall of 2010 has come and gone. The EATCS council had another meeting during ICALP 2011, and Monien summarised the decision. Here’s what I was able to jot down quickly from his presentation:

  • The decision is between Springer and the LIPIcs series and will be decided by the members of EATCS by a ballot “at the end of the summer.”
  • The EATCS Council recommends to go with Springer.
  • The poll will include the following documents: The proposals from both Springer and LIPIcs. The recommendation of the EATCS council. A dissenting opinion from a minority in the Council. A list of pros and cons.
  • The final decision is made by simple majority among EATCS members. A quorum of 25% is necessary.

For a quick calculation, the report currently lists 787 EATCS members. So some 200 members would need to respond. Get out the vote!

As a sign that times are changing, the ICALP proceedings were online and digitally available to all conference attendees at the time of the conference. Splendid.

Name Tags

I obsess about name tags.

ICALP 2011 chose the neck band solution, which I derided in a brilliant piece of investigative journalism, Post ALGO 2009.

In the comments of that post, G. Philip suggested printing names on both sides of the tag, so as to push the probability of legibility close to 1. Remarkably, this was exactly what the organisers had done!

(Again, enterprising conference attendees managed to defeat the good intentions. The lunch tickets happened to fit exactly in the name tag pocket, so that with probability close to 1/2, the tag showed the name badge, and with probability close to 1/2, it showed lunch tickets. Incidentally, lunch tickets also had your name on them, but only printed on one side. Thus, in total, attendees were identifiable with probability around 3/4.)

But the big problem with neck bands is that the name badge ends up below table height during lunch and also otherwise is awkward to inspect.

The layout of the ICALP 2011 badge is fair. A good part of the real estate is actually taken up by the attendee’s name. Also, accents were correct as far as I could tell.

In fact, the organisers chose to display first names in larger size, giving the event a friendly touch. I like this. However, during a thoughtful lunch conversation, somebody (I don’t know who because his name tag was dangling below the table) mentioned that the last name is more useful for striking up conversations. Especially newcomers would prefer to know that they’re talking to Turing instead of Alan.

I’m undecided about the name size issue. Maybe another EATCS poll can decide it.

Exponential Time Algorithms at ESA 2011

The list of accepted papers for ESA 2011 is online. Below is my own quick take on which papers are about exponential time algorithms.

ESA 2011 is colocated with IPEC under the ALGO 2011 umbrella, so there will be plenty of exciting results that week.

  • Dimitrios Thilikos. Fast sub-exponential Algorithms and Compactness in Planar Graphs.
  • Gwenaël Joret, Christophe Paul, Ignasi Sau, Saket Saurabh, Stéphan Thomassé, Hitting and Harvesting Pumpkins, arXiv:1105.2704.
  • Fedor Fomin, Ioan Todinca and Yngve Villanger, Exact algorithm for the maximum induced planar subgraph problem, Todinca’s slides from Worksh. Graph. Decomp. 2010.
  • Marek Cygan, Marcin Pilipczuk, Michal Pilipczuk and Jakub Wojtaszczyk, Scheduling partially ordered jobs faster than 2^n.

I’m probably missing some results, but without online abstracts it’s hard to tell. I cannot judge a paper from its title alone. Comments and corrections are welcome, in particular links to online manuscripts.

Filter Bubble in Weekendavisen

I managed to excite Danish weekly Weekendavisen about the societal impacts of internet personalisation, along the lines of Eli Pariser’s recent book The Filter Bubble. The result is a nice, meaty two-page article starting on the front page of the paper’s science section Idéer.

This is part of my ongoing effort to unleash the meme of the Algorithmic Lens in the public discourse. Two years ago I initiated an article about the algorithmic lens on the sciences in the same paper.

The Social Science Filter Bubble

The creation of such an article is a give-and-take between me and the journalist. I’d collaborated with him previously on the production of a TV programme about how Google’s PageRank algorithm works, so we had established a level of common trust. Still, we come from vastly different epistemological backgrounds.

Here’s a detail that made me smile.

I originally suggested a formulation along the following lines:

Dewey and Habermas are fine, but today the public sphere is shaped by algorithmic processes, and we’re just in the beginning of that development. And just like we turn to Newton to understand gravity, we must turn to Turing and his disciples to understand some of the forces that influence the behaviour of individuals and groups in the information society.

Plenty of dropped names, and it was rejected for being too opaque. Consequently, in the final version, the references to philosophers Dewey and Habermas remain, but Newton and Turing have been removed. So the readership of Weekendavisen is of course expected to know about the public sphere and civil society (as rightly they should). But they are protected from having to google who the other two guys are.

This is an example of the “old” filter bubble, where media through self-selection and editorial policy shape a common frame of reference, an understanding of what a Citizen is supposed to educate herself about.

In the Brave New World to come, an algorithm would have processed my quote. It would perhaps have replaced “Dewey and Habermas” by “Philosophers” or “Some social scientists” or “Dead White Males” on a per-reader basis, based on the epistemological priors, background knowledge, and ideological preferences of the reader.

Further reading

If you came here from Weekendavisen because you googled my name – provided your personalisation settings allowed me to burst your filter bubble – and want to read more, check out thefilterbubble.com. That’s Eli Pariser’s blog, including links to his book, his TED talk video, and a list of 10 ways to pop your filter bubble.

Alternatively, check out my recent (Danish) TV production How Google works, which explains the PageRank algorithm. Now, the Good Olde Days.

Or you can hire me to give a talk; I have several ready, such as Hvordan Google virker or The Algorithmic Lens. See the full list at Popular Science talks.

Update: I also appeared on Danish public radio about this: P1 Morgen 4 Jun 2011, 7m45s, around 9:30.

Update: And even in the local paper Sydsvenskan: De digitala skygglapparna (4 July 2011) (in Swedish).

Update: And even in Svenska Dagbladet:
Isolerade i nätbubblan (15 August 2011) (in Swedish).

Image source: Wikimedia Commons

Including Git Revision Identifiers in LaTeX

I keep most of my LaTeX source files under the revision control system Git.

Stupid git


Git is opinionated software. It allows you to do many things, but others run counter to its dogmas. In particular, Git does not modify source files.

Other systems do. For example, you can ask Subversion to replace the keyword $Revision$ in the committed source file with the latest revision number. This is useful for displaying revision numbers or dates in your LaTeX documents, for example.

From Git’s perspective, modification of source files is pure evil. Not only is this feature absent from Git, but the very request betokens moral corruption.

From Git’s perspective, the client programme (in this case, LaTeX) is responsible for including revision information in the output. Git humbly makes that information available. But it does not modify the source.

Stephan Hennig’s vc bundle consists of a number of scripts that extract revision information from various version control systems, including Git, and make it available as LaTeX macros. The information is taken from a separate, automatically generated input file vc.tex, so that the main LaTeX source file remains untouched by Git.

This shifts the problem to the generation of vc.tex. I’ve played around with various solutions based on Makefiles, and here is my current setup.

The vc.tex file is automatically generated and defines three LaTeX macros. Typically, it looks like this:

%%% This file is generated by Makefile.
%%% Do not edit this file!
%%%
\gdef\GITAbrHash{f61c739}
\gdef\GITAuthorDate{Fri May 13 10:34:51 2011 +0200}
\gdef\GITAuthorName{Thore Husfeldt}

The main LaTeX source includes the vc.tex file in the preamble and can now freely use these macros. For example, the revision information can be included in a footnote on the title page.

\documentclass{article}
...
\input{vc.tex}
...
\begin{document}
\maketitle
\let\thefootnote\relax
\footnotetext{Base revision~\GITAbrHash, \GITAuthorDate, \GITAuthorName.}
...

The responsibility of producing an up-to-date vc.tex rests on the Makefile:

latexfile = main

all: $(latexfile).pdf

$(latexfile).pdf : $(latexfile).tex vc.tex
	while (pdflatex $(latexfile) ; \
	grep -q "Rerun to get cross" $(latexfile).log ) do true ; \
	done

# Regenerate vc.tex whenever the repository has a newer commit.
vc.tex: .git/logs/HEAD
	echo "%%% This file is generated by Makefile." > vc.tex
	echo "%%% Do not edit this file!" >> vc.tex
	echo "%%%" >> vc.tex
	git log -1 --format="format:\
		\\gdef\\GITAbrHash{%h}\
		\\gdef\\GITAuthorDate{%ad}\
		\\gdef\\GITAuthorName{%an}" >> vc.tex

The interesting rule is the last one. It runs git log to produce vc.tex. The hardest thing for me was to get the dependencies right. I think I’ve got it now. The input file vc.tex needs to be regenerated whenever it predates the last commit. As far as I understand the Git internals, the modification time of .git/logs/HEAD should give me a reliable timestamp for when the last commit happened, so I made my rule for vc.tex depend on that.

Of course, it’s a cheap operation, so we could generate vc.tex anew every time we run pdflatex. But then every call to make would recompile the source (because vc.tex has changed). To avoid that, we could leave vc.tex out of the dependencies for $(latexfile).pdf. But then a commit (which modifies the revision number but not any source files) would not lead to an automatic recompile. The LaTeX document would only display new revision information whenever it is edited after that revision.

If there’s a cleaner way of checking for “is vc.tex outdated compared to the Git repository”, please tell me.

TODO: Make the LaTeX document reflect that it corresponds to uncommitted edits after the latest revision. This should be doable by comparing the modification times of the LaTeX source files and .git/logs/HEAD. A cruder way is to let git status tell the Makefile whether the working directory is “clean”.
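
Here is a sketch of that cruder option, to be appended to the vc.tex recipe above. This is my own untested guess: the macro name \GITDirty is made up, and it assumes a git version that understands git status --porcelain.

	if [ -n "$$(git status --porcelain 2>/dev/null)" ]; then \
		echo "\\gdef\\GITDirty{ (with uncommitted changes)}" >> vc.tex ; \
	else \
		echo "\\gdef\\GITDirty{}" >> vc.tex ; \
	fi

The footnote could then read \footnotetext{Base revision~\GITAbrHash\GITDirty, \GITAuthorDate, \GITAuthorName.}. Note that these lines only run when vc.tex is regenerated, that is, right after a commit, so edits made later still go unnoticed; this does not by itself resolve the TODO.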

UPDATE (21 Feb 2012): Since this post was written, various other approaches have appeared. (Thanks to the commenters for pointing them out.) The idea of using a post-commit hook instead of a Makefile is now on CTAN: gitinfo package.

Generation Z and the Alphabet

I teach generation Z, people who are now in their early twenties.

Generation Z follows Generation Y, which follows Generation X.

What will we call the next generation? We’ve run out of letters! “Generation [”?

Well, it doesn’t matter. Read on…

I was reviewing old exam questions in my introductory algorithms and data structures class. Here’s the question:

This looks innocent enough. However, several students were openly annoyed by this question.

What’s the problem? It has nothing to do with priority queues or heap order or anything else algorithmic. If you aren’t generation Z, you’ll never guess.

They asked me to please use numbers instead of letters. Why? It turns out that comparison between letters is no longer constant time! As one student put it, with a straight face, it’s really hard to determine whether, say, Q is higher or lower than some other letter. Helpful students sagely suggested that their fellow students simply start by writing out the alphabet on a separate piece of paper for this type of exercise. This was met with earnest nodding.

It was quite clear that I had made this question needlessly difficult by making it about letters.

This is a sterling example of a skill that is utterly natural to my generation, which has looked things up alphabetically countless times. I have no harder time comparing M and S than I have comparing 5 and 13. But, of course, Generation Z has never looked anything up alphabetically. It’s an utterly useless skill honed in the olden days of outdated information technology, like knowing how a slide rule works or typing on a T9 mobile phone keypad. Generation Z finds this as hard (and as useless) as I find comparing Ψ and Φ. I can do it, because I memorised the Greek alphabet when I was eight or so, and can still rattle it off in the right order. But it takes linear time in the size of the alphabet.
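
In algorithmic terms (a toy illustration of my own): for someone who has the order hard-wired, a comparison is a single constant-time operation; for someone who does not, it degenerates into a linear scan of the alphabet.

GREEK = "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"

def compare_native(a, b):
    # Constant time: the ordering is hard-wired (here approximated by character codes).
    return a < b

def compare_by_scanning(a, b, alphabet=GREEK):
    # Linear time in the alphabet size: locate both letters by scanning.
    return alphabet.index(a) < alphabet.index(b)

print(compare_native("M", "S"))       # True
print(compare_by_scanning("Ψ", "Φ"))  # False: Φ comes before Ψ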

So, from now on, I guess I’ll use plain old numbers in this type of exam question.

Also, the generations following Generation Z can be safely called Generation W, V, U, etc. Nobody will notice.

Also, I feel old.