Fun with Wikipedia Networks

So the mouse-over text of today’s xkcd (“Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at ‘Philosophy.'”) has inspired a little playful procrastination. I’d love to put together one of those fun xkcd-style info graphics (the ones with results of interesting little Internet experiments, e.g. “Numbers,” “Regrets,” “Dangers,” etc.) with the results of some collective poking around. Data so far (from myself, Katy “Southside” Huff, Matt Waldron, and Eric “Wolfman” Howell):

“xkcd”: 19 clicks
“Kadevu”: 21 clicks
“Walker Percy”: 27 clicks
“Kevin Bacon”: 13 clicks
“Wisconsin Badgers”: 27 clicks

Also, can someone who knows more about graph theory than I do give us some vocabulary to flesh out the kinds of data we can gather (or wish we could gather)? For instance, Matt Waldron asks via Twitter “I wonder what the longest non-loop answer is (i.e. was the furthest ‘point’ from Philosophy)?” His point about loops (graph theory: “cycles”) is an interesting one. Has anyone found a cycle yet? I thought I had one in the Percy chain, but it turns out there are separate articles for “Meaning (philosophy of language)” and “Meaning (linguistic).” (This is one of those moments where I wish I were a better programmer and could just start writing code to explore all these questions. I’d also need to not be on the clock with someone else’s money, which may actually be all that is stopping me.)

Anyway, if you’re looking for a few minutes off from whatever you doing (I myself am determined to finish my Walker Percy paper for the upcoming Christian Scholars Conference), please consider checking out a few articles’ paths to “Philosophy” and report back!

Go With The (Nework) Flow, Part I

Some preliminaries:

(1) I couldn’t resist posting a link to this New York Times piece about eHarmony, et al. The “Algorithms of Love” in the headline alone made it worth it. (By the way, I love it when copy editors choose to force “EHarmony” and the like when these ridiculously capitalized words come up at the beginning of a sentence. It’s like a little “screw you and your trademark” from the folks for whom sloppy capitalization is almost an affront. Speaking of which, sorry for the up-style headlines on this blog. I abhor up style, but I somehow backed myself into this corner and am not about to back down now.)

(2) My friend Rachel just let me know that you can hear David Foster Wallace reading “The View from Mrs. Thompson’s” from Consider the Lobster on “KCET Podcast: Hammer Conversations” (Episode 16), which is available on iTunes. I listened to it this evening, and it’s terrific. Copy snobs will love the little explanation about his use of em dashes, but anyone will almost certainly be moved by the story. Plus Wallace’s reading voice matches his “authorial voice” really well, in my opinion.

OK, on to the main event. I mentioned a couple posts ago that I hope to use this space as a sort of whiteboard for trying out ideas, and I’m expecting to need such a space in the coming weeks. I’m getting ready to start working on the algorithms for matching material offers and requests in GENIUS and as such am learning about solving network flows problems. Wanna learn a little bit about them with me? If so, read on.

We’ll start with the basic first lesson, which I sat through just the other day. The gist of flow networks is that you’ve got a collection of nodes with material traveling between them along directed connections called arcs. Nodes are either sources (supply nodes that create material), sinks (demand nodes that consume material), or transshipment nodes that simply send a material along.

What we try to solve for in these problems is an optimal flow vector, which is just a fancy name for a long list that says how much of the material flows along each arc. The vector is optimal in the sense that it represents the flow for which the problem constraints are met in the cheapest way possible (there’s a cost associated with moving a unit of material along each arc). The problem constraints are flow bounds (upper and lower limits on how much flow must move along an arc) and conservation of flow, which says that the outflow minus the inflow at each node must equal either zero (for transshipment nodes) or the supply or demand of the node (for sources and sinks, respectively). The second set of constraints are also called divergence equations.

Brief mathematical note for those who are interested: network flow problems are special cases of linear programs, albeit much easier to solve ones (via the network simplex method, rather than general linear programming’s modifier-less simplex method). There are also, apparently, special algorithms for solving various special-case problems that can be posed as network flow problems, including Euler’s famous Konigsberg Bridge Problem.

What does all this have to do with the nuclear fuel cycle? Stay tuned as I try to figure that out.