On the Marq, Finally

A while back, I lamented all the difficulties involved in establishing a paperless seminary workflow. Lots of people chimed in, but in the end we didn’t locate an ideal way to do the main task: mark up PDFs (with highlighting, marginalia, etc.). During that process (though not in the comments–perhaps via Twitter?), someone told me about Marqed.com, an online service that provides tools for doing most of the things we had discussed. To my great frustration, however, it was really buggy (perhaps just on my system–Firefox 3.0.17 on Ubuntu 9.04).

However, highlighting at least seems finally to be working well enough to make this a legit go-to tool for now. You get ten PDF uploads per month with the free version, or you can upgrade to unlimited. (The paid account allows you to upload MS Office documents as well, though don’t ask me why you’d want to involve a Web tool to edit a document that can already be marked up natively. Maybe for read-only files?) Of course, I’m not wild about being dependent on an Internet connection in order to view my files, but all our classrooms here at VTS have Wi-Fi, so I guess I can deal with this for now. Anyway, I feel like I can finally recommend it. Check it out at www.marqed.com.

In other news, we got some beautiful snow last night here in Northern Virginia. I posted a few quick pictures to Facebook here.

The Ongoing Pursuit of a Paperless Seminary Reading Workflow

In seminary, we read a lot. Like, probably more than we do anything else–including playing intramural sports (a surprising but deeply rewarding time sink), praying (though we’ve received tremendous support in this respect), sleeping (at least it feels that way), and complaining (a necessary thing sometimes, let me tell you).

And–as any humanities major knows but we engineering students are always too busy with problem sets to notice–retaining even a small fraction of that reading is a matter of no small challenge or importance. The old middle school “reading notes” model is an almost laughable prospect due to the sheer number of pages we’re talking about here. The highlighter, I’ve been told, is my friend. I have come to agree wholeheartedly.

However, because this school thankfully realizes that part of being good stewards of God’s creation is to learn to use less paper (and because–let’s be honest–who reads paper copies of anything these days, except maybe for actual books?), I find myself with a quandary: how do you highlight PDFs?

You may know that this is a maddeningly difficult question to answer. Trying to do so may be the one thing I’m spending more time on than the actual reading. The problem, as I see it, is that it’s impossible to justify spending the money on programs like Adobe Acrobat or Foxit Editor when all you want to do is highlight some text in any damn document you please. I’m not an expert in digital copyright or fair use, but I really don’t think this is too much to ask.

In a move we’re apparently supposed to interpret as magnanimous, Adobe now allows Reader users (people like me who aren’t willing to pay for Acrobat) to do some basic markup on files with “document rights…enabled.” The problem–and surely the people at Adobe know this–is that I have never, ever, been given a PDF course reading with document rights enabled. Again, some of this may be a matter of legitimate intellectual property concern. But if these files are being used for educational use (and clearly that’s why my professors are allowed to distribute them as PDFs via course management software in the first place), it seems like merely applying a “highlight filter” to a local copy of the document ought to be fair game. Am I off base here?

Anyway, enough complaining…let me tell you what I’ve converged on and then put out a plea for anyone who finds this post and has a better solution to please help me out. After playing quite a bit with PDFedit and finding it summarily difficult to use (or maybe the Ubuntu distribution is just buggy?), I’ve settled on the more user-friendly but still unsatisfactory flpsed. Basically, this program lets you do text annotation. As you can see in the screenshot below, the text manages to remain persistent even if you view the re-converted PDF in a program like Evince, which is handy. But this workflow still requires a lot of typing, when all I really want to be able to do is highlight. I’m encouraged by early experiments with Scribus, but I’m still fighting the learning curve.

Am I overlooking a simpler free (or cheap) solution? It wouldn’t be the first time. If so, please enlighten me. Is anyone else as perplexed as I am about this stunning lack of obviously useful functionality?

More Funny Found Science

Man, colloquia abstracts are a seemingly endless source of buried jokes. Check out the grad student dig in the following summary of a talk on using machine learning to study human and animal learning:

Machine learning studies the principles governing all learning systems. Human beings and animals are learning systems too, and can be explored using the same mathematical tools. This approach has been fruitful in the last few decades with standard tools such as reinforcement learning, artificial neural networks, and non-parametric Bayesian statistics. We bring the approach one step further with some latest tools in machine learning, and uncover new quantitative findings. In this talk, I will present three examples: (1) Human semi-supervised learning. Consider a child learning animal names. Dad occasionally points to an animal and says “Dog!” (labeled data). But mostly the child observes the world by herself without explicit feedback (unlabeled data). We show that humans learn from both labeled and unlabeled data, and that a simple Gaussian Mixture Model trained using the EM algorithm provides a nice fit to human behaviors. (2) Human active learning. The child may ask “What’s that?”, i.e. actively selecting items to query the target labels. We show that humans are able to perform good active learning, achieving fast exponential error convergence as predicted by machine learning theory. In contrast, when passively given i.i.d. training data humans learn much slower (polynomial convergence), also predicted by learning theory. (3) Monkey online learning. Rhesus monkeys can learn a “target concept”, in the form of a certain shape or color. What if the target concept keeps changing? Adversarial online learning model provides a polynomial mistake bound. Although monkeys perform worse than theory, anecdotal evidence suggests that they follow the concepts better than some graduate students. Finally, I will speculate on a few lessons learned in order to create better machine learning algorithms. (Source, but ultimately via Eric Howell on the Hacker Within mailing list.)

Not exactly stand-up material, but I love that the guy was playful enough to put it in the abstract. I guess I shouldn’t be surprised, though, given what I found on this project’s spring 2009 schedule page. We actually had that same xkcd hanging in our office for a while.

THW on the Radio

A couple weeks back, The Hacker Within’s fearless leader Milad Fatenejad and I did an interview with Matthew McCormick of Hacker Public Radio. I got notification today that it recently went live. Aside from having to suppress the occasional wince at my usual longwindedness, I had fun re-listening and think it turned out pretty well. If you’re interested in programming/computing or would just like to hear about what we’re up to and why, you can check out the interview here (I had to download it). Thanks very much to Matt for his help as we continue to try to get the word out about the organization and its work.

Another Sweet Google Tool

Three events recently converged to respark my interest in a little mini-project I tried to do some time ago:

(1) At yesterday’s Python subgroup meeting of The Hacker Within, our resident Pythonista got me all excited about developing easy web applications in that language. I write a lot of Python for pre- and post-processing of nuclear fuel cycle systems data, but I’ve never done any web-related Python work except for fixing a bug or two in some Trac instances. Nico got me pumped about the prospect.

(2) I started helping the Diocese of Milwaukee with their new Website, for which we’re using Google Sites in an attempt to improve the ease of collaboration and maintenance. I think Google Sites is pretty terrific, but it does have some limitations, and I’m interested in identifying some Google-compatible solutions. The Python-based Google App Engine seems like a promising direction.

(3) My friend Ryan re-activated pangramaday, which I’ve mentioned here before and is now available via Twitter (@pangramaday).

As it did during my short-lived interest in learning to develop Java Applets, the pangramist’s quandary motivated a little mini-project a few steps more complex than Hello, World! and perfect for learning a new set of interfaces. And this time I can actually publish the result (such as it is), because the Google App Engine framework is just so frickin’ easy to use.

So if for pangram-, crossword- or Wheel-of-Fortune-related purposes you ever need a list of words that all contain some given collection of letters, look no further than pangramhelper. It’s currently both ugly and slow, but if my interest in learning these APIs doesn’t wane too much, that may change.

It’s actually kind of fun to enter random (or not so random–can you tell I’m getting ready for the Easter Vigil?) letters and see what you get:

You wrote:

Christos anesti

We found:

anchorites
characterizations
chlorinates
cinematographers
interscholastic
orchestrating
orchestration
orchestrations
overenthusiastic
rhetoricians
stenographic
theoreticians
thermodynamics

It only took a few hours and about a hundred lines of Python (and most of those are just longhand HTML inside of function calls). Seriously, check out the App Engine.
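
For the curious, the core lookup can be sketched in a few lines of plain Python. To be clear, this is not the actual pangramhelper source, just a minimal illustration of the idea under a couple of assumptions: the names letters_of, words_containing, and SAMPLE_WORDS are made up for this example, the tiny word list stands in for a real dictionary file, and repeated letters in the input are treated as counting only once (which is what the sample output above suggests, but I haven't checked every case).

```python
def letters_of(text):
    """Return the set of distinct lowercase letters in text (spaces etc. ignored)."""
    return {ch for ch in text.lower() if ch.isalpha()}


def words_containing(query, words):
    """Return, in sorted order, the words that contain every distinct letter in query."""
    needed = letters_of(query)
    # needed <= letters_of(w) is a subset test: every needed letter appears in w.
    return sorted(w for w in words if needed <= letters_of(w))


# Hypothetical stand-in word list; a real run would read a full dictionary file.
SAMPLE_WORDS = ["anchorites", "orchestration", "thermodynamics", "pangram", "seminary"]

print(words_containing("Christos anesti", SAMPLE_WORDS))
```

Using sets keeps the test to a single subset comparison per word. If you wanted repeated letters to matter (say, requiring three S’s), something like collections.Counter would be the natural swap.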

The Difference Is Maintainability

So I write a lot of Python, and one of the claims promoters of the language usually make is that it helps you write more maintainable code. I think they’re right in that claim, and I think they’re right to stress the centrality of the issue.

We’ve discovered over the years at St. Francis House (and in my research group, for that matter, and at Wisconsin Engineer, if I remember correctly) that maintainability is also essential–and difficult–on the Web (of course, this is really just another kind of source-code-maintenance problem). In a high turnover organization, it’s especially hard to cultivate a continuous Web presence.

Say what you will about the low-powered solution offered by Google Sites, I think they’re on to something, and I’m super-excited that we’ve ported the St. Francis House website over to this system. Sure, I wish it were a little more flexible and powerful. But I think you’ll agree that it lets you construct reasonably attractive and well organized sites (nearby St. Andrew’s uses the system as well), and I can attest to the relative ease of use over other options (and I like screwing around with webpages and have learned a lot about XHTML/CSS in preparation for taking over for the semester as editor of this site about engineering education). Most importantly, no FTP or SCP is required (we computer geeks take these tools for granted, but I think they can be just as much a barrier as HTML).

I think Google’s got another winner here, at least for a presumably significant market niche (groups who want a good site but can’t afford to pay professionals, especially for maintenance and updating). I’ll keep you posted as to whether the feature-set improves in the coming months.

Hacker Within Meeting Friday

I doubt I have too many UW-Madison computer geek readers who don’t already know about this (if indeed I have any at all, which is also doubtful given my dire posting record of late), but on Friday at 2:15 in 414 Engineering Research Building, the Hacker Within computational science interest group that a few of us started this summer is going to be hearing from Tim Tautges:

Component interfaces or APIs should a) have the right level of abstraction, so they can handle new kinds of data without needing to be modified, and b) should be callable from multiple languages, and c) should not get in the way of good performance. I’ll describe the ITAPS mesh interface, which has been designed to meet these constraints.

Sound cool? More importantly, does this look cool?:

If so, you should come by. What better way to celebrate the end of the semester? 😉

Miscellaneous Updates

Let me surface from my digital dormancy (which one of these days I’ll get around to writing a post to explain) for a couple of quick updates.

First, I went with some other UW-Madison folks to UW-Platteville Friday for a conference of the North Midwest region of the American Society for Engineering Education. We didn’t stay for the evening banquet and keynote (nor obviously for the second day of the conference), but a lot of what we saw was interesting and encouraging. I was especially intrigued by Haiyan Zhang’s paper “A Model-Based Multidisciplinary Correspondent Methodology for Design-by-Analogy” and frankly touched by the important work reported in Dale Buechler’s fascinating “An Electrical Engineering Program for Place-Bound Students: The First Two Years.” If you’re interested in our paper, which was about ASEE student sections, you can read it here.

Second, you may notice that the above URL points to a non-UW-Madison domain. I’m trying to get untied from doing all my hosting on UW computers, and as a consequence you can now find this blog at blog.kyleoliver.net. I gotta admit, it’s going to take a little getting used to being a domain owner. One early bummer: Blue Host servers don’t have svn installed. Still, I’m excited to have a reasonably sustainable option for implementing that Holy Grail of personal file organization: putting your entire electronic life under version control (which, as my friend Matt points out, gives you superpowers).