New modes of interaction for Flip: Annotating streaming video!

At the end of every Spring Semester, the extended ITP community gathers round for a solid week (Monday-Friday, 9:40AM-6PM) of thesis presentations. It’s part gauntlet and part send-off for the graduating class.

This year, with the help of Shawn Van Every (the original architect, builder and maintenance man of the ITP Thesis Streaming Service), I had the opportunity to continue my run of informal experiments in video annotation by trying it out on a handful of thesis presentations.

For the third year running, Thesis Week has been accompanied by the Shep chatroom, created by ITP alum Steve Klise. Shep immediately turned itself into a great place for backchannel commentary on the presentations…or not. I’ve always felt it would be great to see aggregations of Shep activity attached to the timecode of the presentation videos. Unfortunately, Shep conversations aren’t logged. I also wondered if turning Shep into some kind of “official record” of audience reactions to thesis would be something of a killjoy.

With the Ponder-Thesis experiment, I wasn’t exactly sure where “annotating thesis presentations” would fall on the work-fun spectrum.

It might be seen as some sort of “semi-official record.” That would make it “work,” and perhaps a bit intimidating, like volunteering to closed-caption T.V. programs.

But annotating with Ponder requires more thinking than closed-captioning, which presumably will soon be replaced by voice recognition software if it hasn’t been already. So maybe it can be fun and engaging in the same sort of “challenging crossword puzzle” way that Ponder for text can be.

Either way, the end goal was clear: I was interested in some sort of readout of audience reactions: Cognitively (Did people understand the presentation?); Analytically (Was it credible, persuasive?); and Emotionally (Was it funny, scary, enraging, inspiring?).


We were able to get Ponder-Thesis up and running by Day 3 (Wednesday) of Thesis Week.

I sent out a simple announcement via email inviting people to come help test this new interface for annotating thesis presentations as a group, to create an annotated record of what happened.

Unlike previous test sessions, there was no real opportunity to ask questions.

Results: One Size Does Not Fit All

Annotating live events is a completely different kettle of fish than annotating static video.

I had made several changes to the static video interface in preparation for thesis, but in retrospect, they weren’t drastic enough.

All of the things I’ve mentioned before that make video annotation harder work and generally less enjoyable than text annotation are magnified tenfold when you add the live element, because now slowing down to stop and reflect isn’t just onerous, it’s not an option.

As a result, the aspect of Ponder that makes it feel like a “fun puzzle” (figuring out which sentiment tag to apply) can’t be done because there simply isn’t time.

It was challenging even for me (who is extremely familiar with the sentiment tags) to figure out at what point to attach my annotation, which tag to apply *AND* write something coherent, all quickly enough so that I’d be ready in time for the next pearl of wisdom or outrageous claim in need of a response.

There were also hints of wanting to replicate the casual feel of the Shep chatroom. People wanted to say “Hi, I’m here” when they arrived.

Going forward, I would tear down the 2-column “Mine v. Theirs” design in favor of a single-column chat-room style conversation space, but I will go into more detail on next steps after reviewing the data that came out of thesis.

Donna Miller Watts presenting: Fictioning, or the Confession of the Librarian


The Data

  • 36 presentations were annotated. However, 50% of the responses were made on just 6 of them.
  • 46 unique users made annotations. (Or at the very least 46 unique browser cookies made annotations.)
  • 266 annotations in total, 71 of which were { clap! }.
  • 30 unique sentiment tags were applied.
    • ???, Syntax?, Who knew?
    • How?, Why?, e.g?, Or…, Truly?
    • Yup, Nope
    • Interesting, Good point, Fair point, Too simple
    • { ! }, Ooh, Awesome, Nice, Right?
    • Spot on!, Well said, Brave!
    • { shudder }, { sigh }, Woe, Uh-oh, Doh!
    • HA, { chuckle }, { clap! }
  • At peak, there were ~19-20 people in Ponder-Thesis simultaneously.

Broken down by type, there were 39 Cognitive annotations (questions or requests for more information), 69 Analytical annotations and 158 Emotional annotations, although almost half of the Emotional ones (71) were { clap! }.

Over half of the non-clap! responses had written elaborations as well (113).
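The shares quoted above are easy to double-check with a few lines of arithmetic. This is only a sketch that reuses the counts reported in this post; the variable names are mine, chosen for illustration.

```python
# Counts exactly as reported in this post.
by_type = {"Cognitive": 39, "Analytical": 69, "Emotional": 158}
claps = 71               # { clap! } annotations, all counted under Emotional
with_elaboration = 113   # non-clap responses that also had written text

total = sum(by_type.values())
non_clap = total - claps

# "Almost half" of the Emotional annotations were claps.
clap_share_of_emotional = claps / by_type["Emotional"]
# "Over half" of the non-clap responses had written elaborations.
elaboration_share = with_elaboration / non_clap

print(f"total annotations: {total}")
print(f"claps as share of Emotional: {clap_share_of_emotional:.0%}")
print(f"non-clap responses with elaboration: {elaboration_share:.0%}")
```

The three type counts sum to the 266 total, claps come to roughly 45% of the Emotional annotations, and about 58% of the 195 non-clap responses carried written text.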

  • Natasha Dzurny had the most applause at 10.
  • Sergio Majluf had the most responses at 26.
  • Kang-ting had the most emotional responses at 18.
  • Talya Stein Rochlin had the most emotional responses if you exclude applause at 14.
  • Sergio Majluf racked up the most eloquence points with 3 “Well saids!”
  • Talya Stein Rochlin had the most written comments with 15 and the most laughs at 3.
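For anyone curious how superlatives like these might be tallied, here is a rough sketch. It assumes each annotation is stored as a (presenter, tag, text) record, which is my guess at a convenient shape rather than Ponder's actual data model, and the sample records are placeholders, not the real thesis data.

```python
from collections import Counter, defaultdict

# Hypothetical records: (presenter, tag, written elaboration or "").
# Presenter names and comments are placeholders, not actual thesis data.
annotations = [
    ("Presenter A", "{ clap! }", ""),
    ("Presenter A", "Well said", "Clear framing of the problem."),
    ("Presenter A", "Interesting", "How does it scale?"),
    ("Presenter B", "HA", "That demo!"),
    ("Presenter B", "{ clap! }", ""),
    ("Presenter B", "{ clap! }", ""),
    ("Presenter B", "{ clap! }", ""),
]

def top(counter):
    """Return the (presenter, count) leader, or None if nothing was counted."""
    return counter.most_common(1)[0] if counter else None

def leaders(records):
    """Compute a few simple per-presenter roll-ups."""
    totals = Counter()              # all responses per presenter
    per_tag = defaultdict(Counter)  # responses per presenter, keyed by tag
    written = Counter()             # responses with a written elaboration
    for presenter, tag, text in records:
        totals[presenter] += 1
        per_tag[tag][presenter] += 1
        if text.strip():
            written[presenter] += 1
    return {
        "most responses": top(totals),
        "most applause": top(per_tag["{ clap! }"]),
        "most written comments": top(written),
    }

print(leaders(annotations))
```

With the placeholder records, Presenter B leads on responses and applause while Presenter A leads on written comments. Ties are not handled specially here; a real roll-up would need a rule for them.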

Below are roll-ups of responses across the 36 presenters categorized by type.

  • Cognitive: Yellow
  • Analytical: Green
  • Emotional: Pink

Below is a forest-for-the-trees view of all responses. Same color-coding applies.

Forest-for-the-Trees view of responses.


Interaction Issues

I made a few design changes for thesis and fixed a number of interaction issues within the first few hours of testing:

  • Reduced the overall number of response tags and made them more straightforward. For example, Huh., which has always been the short form of “Food for thought…”, became Interesting!
  • Replaced the 3rd-person version of the tags (xxx is pondering) with the 1st-person version (Interesting!) because, after the last test session, a long list of 3rd-person responses felt a bit wooden.
  • Added a { clap! } tag for applauding.
  • Made the “nametag” field more discoverable by linking it to clicking on the roster (list of people watching). Probably giving yourself a username should be an overlay that covers the entire page so people don’t have an opportunity to miss it.
  • As responses came in, they filled up the “Theirs” column below the video. Once there were more comments than would fit in your window viewport, you wouldn’t see new comments unless you explicitly scrolled. We deliberately decided not to auto-scroll the list of responses for static video to keep in time with the video because we thought it would be too distracting. For streaming, however, auto-scroll would have been one less thing to do while you’re trying to keep pace with the video and thinking about how to comment.

Other issues didn’t become apparent until after it was all over…

  • People didn’t see how to “change camera view.” The default view was a pretty tight shot of the presenter. Really the view you want is a wider shot that includes the presentation materials being projected on the wall behind the speaker.
  • The last test session helped me realize that for video the elaboration text field should stay open between submits. But really, it should probably start off open as it’s too small to be something you “discover” on your own while the video is going.
  • The star button, which is meant to allow you to mark points of interest in the video without having to select a tag, was never used. I’m not sure how useful it is without the ability to write something.


The obvious first step is to go in and squish the remaining interaction issues enumerated above. But there are more systemic problems that need to be addressed.


  • People wanted to say “hey” when they logged on. The “live” nature of the experience means social lubrication matters more than it does when annotating text or video on your own. ITP Thesis is also a special case because the people annotating not only know the presenters personally but are likely sitting in the same room (as opposed to watching online). One person said they were less likely to speak their mind on Ponder if they had something critical to say about a presentation.
  • There is also general trepidation over attaching a comment to the wrong point in the timecode. One person who is also familiar with the Ponder text response interface described the problem as “I don’t know what I’m annotating. With text, I know exactly what I’m selecting. With video, I’m doing it blind.”

Solution: Chatroom Layout

Replace the 2-column design with a unified “chatroom” window that encourages more casual chatter. If the timecoding feels incidental (at this point in the presentation, someone happened to say such-and-such) then you worry less about attaching your annotation to the precisely correct moment.
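To make “incidental timecoding” concrete, here is a minimal sketch of a chat log that stamps every message with the current stream offset on its own, so nobody has to pick a moment by hand. Everything in it (ChatRoom, post, near, the 30-second window) is a hypothetical illustration, not how Ponder is actually built.

```python
from dataclasses import dataclass, field

@dataclass
class ChatMessage:
    author: str
    text: str
    offset_s: float  # seconds since the stream started, filled in automatically

@dataclass
class ChatRoom:
    stream_start: float  # wall-clock time (in seconds) when the stream began
    messages: list = field(default_factory=list)

    def post(self, author, text, now):
        """Append a message, stamping it with the current stream offset."""
        offset = max(0.0, now - self.stream_start)
        msg = ChatMessage(author, text, offset_s=offset)
        self.messages.append(msg)
        return msg

    def near(self, offset_s, window_s=30.0):
        """Messages posted within window_s seconds of a point in the video."""
        return [m for m in self.messages if abs(m.offset_s - offset_s) <= window_s]

# Usage with made-up people and times.
room = ChatRoom(stream_start=1000.0)
room.post("ana", "Hi, I'm here", now=1005.0)
room.post("ben", "Is that sensor off the shelf?", now=1130.0)
print([m.text for m in room.near(120.0)])
```

Looking up chatter “around” a moment with a tolerance window, rather than at an exact timestamp, is the part that takes the pressure off: a comment that lands ten seconds late still shows up next to the thing it was about.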

Problem: Too many tags.

The sentiment tags got in the way of commenting. There were simply too many to choose from. People knew what they wanted to write, but the step of selecting a tag slowed them down. This was also true of static video for those watching STEM instructional videos.

Solution: Reduce and Reorder

  • Slim down the tag choices, in effect trading fidelity of data (we’ll lose a lot of nuance in what we can aggregate) for lowering the bar for engagement. There should probably be something like 3, but alas I can’t figure out how to chuck out 2 of the following 5.
    • Question?!
    • Interesting
    • Are you sure about that?
    • HAHA
    • { clap! }
  • Reorder the workflow so that you can write first and then assign tags after, not dissimilar to how people use hashtags in Twitter.

This rearrangement of steps would turn the live video workflow into the exact inversion of the text annotation experience, which is tag first then elaborate for very specific reasons that have more or less worked out as theorized.
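One way the write-first, tag-after flow could work is to let people type hashtags inline and pull the tags out afterwards. The sketch below assumes a made-up set of short hashtags mapped onto the five candidate tags listed above; the hashtag names and the split_comment helper are my own inventions for illustration.

```python
import re

# Hypothetical short hashtags mapped to the five candidate tags.
HASHTAGS = {
    "q": "Question?!",
    "interesting": "Interesting",
    "sure": "Are you sure about that?",
    "haha": "HAHA",
    "clap": "{ clap! }",
}

def split_comment(raw):
    """Split a raw comment into (text, tags); unknown hashtags stay in the text."""
    tags = []

    def pull(match):
        name = match.group(1).lower()
        if name in HASHTAGS:
            tags.append(HASHTAGS[name])
            return ""            # drop recognized hashtags from the text
        return match.group(0)    # leave anything unrecognized untouched

    text = re.sub(r"#(\w+)", pull, raw)
    return " ".join(text.split()), tags  # collapse leftover whitespace

print(split_comment("Where does the training data come from? #q"))
```

The comment is complete and postable before any tag is chosen, and a comment with no hashtag at all simply comes back with an empty tag list.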


The modest amount of data we gathered from this year’s thesis presentations was enough to validate the original motivation behind the experiment: Collect audience reactions in a way that can yield meaningful analysis of what happened. However, there remains a lot of trial and error to be done to figure out the right social dynamics and interaction affordances to improve engagement. There are clear “next steps” to try and that is all you can ever ask for.

The only tricky part is finding another venue for testing out the next iteration. If you have video (live or static) and warm bodies (students, friends or people you’ve bribed) and are interested in collaborating with us, get in touch! We are always on the lookout for new use cases and scenarios.
