Research Interests

NodeXL

2008 - Present

I am part of the NodeXL team! NodeXL is a tool for Network Overview, Discovery and Exploration, which uses Excel 2007. Get it for free and find out more here!

We will be publishing a book on Social Media Analysis with NodeXL later this year!

Summer Work

Summer '09

I spent the summer at the Palo Alto Research Center (PARC), working with Nicolas Ducheneaut on impression management in Facebook status updates. Facebook is, among other things, a platform for the presentation of self, and the status update mechanism allows Facebook users not only to keep in touch with their friends but also to form impressions of their friends based on the contents of the updates. I developed a Facebook application called Rate Your News Feed that asked users to report their impressions of their own and their friends' status updates, and collected these impressions in a non-identifiable manner, in accordance with Facebook's data collection guidelines.

An early analysis of the impression data shows that Facebook users generally make a positive impression upon their friends, but occasionally come across as self-important (a negative impression) without realizing it. This finding suggests that users are not always adept at performance management in online settings, and merits further research. Nic Ducheneaut and I expect to publish our results at ICWSM '10 (Barash et al., forthcoming).

Summer '08

I spent the summer at Microsoft, working for Marc Smith on online communities. We studied the kinds of contributions made to online communities, and investigated whether we can predict if an author is going to make a factual, opinionated or chatty contribution based on her past activity, especially her patterns of interaction with other authors. We have published the results of our work in ICWSM 2009 and C&T 2009.

Summer '07

I've developed a tool for small-scale crawls of the Internet Archive and the live Web! The interface is currently hosted by Cornell and is intended for research purposes only, so I'm afraid I can't post the link on this site. If you are a researcher and interested in using this tool, please email me: vdb5@cornell.edu

Note: the project is currently plagued by strange encoding problems, and crawling the Internet Archive is not possible at this time (crawling the live Web still works). I am working on the issue, and will post when it is resolved.

 

Diffusion of Complex Contagions

May '07 - present

I am working with Chris Cameron and Michael Macy at Cornell Sociology to investigate the diffusion of rumors, social movements and products in networks. We are using the Centola and Macy ('07) complex contagion model to simulate the diffusion of costly behavior (such as joining a social movement) on a variety of network structures, including rewired lattices.

The major finding so far is the great importance of network structure in determining the extent of diffusion for costly behaviors. In "small-world" network structures that contain both local communities and shortcut ties (making them good models of empirical networks), costly behaviors can spread throughout the entire network, but are just as likely to "die out" very early on and reach only a small fraction of the population. The critical success factor is the network structure through which the behavior spreads initially, i.e. the ties between the "early adopters" of the behavior. We show analytically that, if the behavior spreads through as little as 2-4% of the entire population, it is extremely likely to spread through the rest.

The crux of the analysis is the ability of complex contagions, which represent costly behaviors, to take advantage of shortcut ties in the network. This is a very surprising finding, as the complex contagion model requires redundant ties between infected and uninfected nodes for the contagion to spread, and shortcut ties are not likely to be redundant.

We have begun empirical investigations of costly behavior adoption in Flickr and Twitter to test whether our analytic results hold for real-world contagions. We can show that an increase in the number of redundant ties between infected and uninfected nodes does, in some cases, precede an increase in behavior adoption. Interestingly, we can also show that adoption in the network precedes widespread knowledge of the behavior: Google Trends data for products lags behind the network adoption of those products in online communities by a few days.
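The complex contagion dynamics described in this section can be illustrated with a small simulation. What follows is only a minimal sketch, not the code used in this project: it assumes a ring lattice, a fixed adoption threshold of two adopting neighbors, and synchronous updates, and the function names (ring_lattice, rewire, spread) are made up for illustration.

```python
import random

def ring_lattice(n, k):
    """Ring of n nodes; each node is tied to its k nearest neighbors on each side."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0} for i in range(n)}

def rewire(adj, p, rng):
    """With probability p, replace a tie (u, v) with a tie from u to a random
    node it is not yet tied to. Rewired ties act as long-range shortcuts."""
    n = len(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].remove(v)
            adj[v].remove(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj

def spread(adj, seeds, threshold=2):
    """A node adopts once at least `threshold` of its neighbors have adopted
    (assumed threshold; a simple contagion would use threshold=1)."""
    adopted = set(seeds)
    while True:
        new = {v for v in adj
               if v not in adopted and len(adj[v] & adopted) >= threshold}
        if not new:
            return adopted
        adopted |= new

if __name__ == "__main__":
    rng = random.Random(42)
    for p in (0.0, 0.05, 0.2, 0.5):
        network = rewire(ring_lattice(200, 2), p, rng)
        reached = spread(network, seeds={0, 1})
        print(f"rewiring p={p}: {len(reached)} of 200 nodes adopted")
```

Running the script prints how far the behavior spreads from two adjacent seed nodes at several rewiring probabilities; varying the seed set and the threshold gives a feel for how sensitive the outcome is to the ties among early adopters.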

 

Online Communities

2009 Upcoming: last.FM How do authors become popular in a music-oriented community? This work investigates diffusion models, including Michael Macy's complex contagion model, in the context of last.FM, a site for purchasing and discussing music. This work has not begun yet, but I am very much looking forward to it!
December '08 - present Amazon.com: a new project with Michael Macy, Patrick Park, Yongren Shi and Fedor Dokshin at Cornell is looking at patterns of review scores at Amazon.com. Motivated by earlier studies, we are looking more closely at the question of whether earlier review scores affect later review scores. We have found strong patterns of positive path dependence between reviews (later review scores closely resemble earlier review scores of the same book). We think that the driving mechanism behind path dependence is an influx of "enthusiasts" who are both the most likely to leave a glowing review of a book and the most likely to review earlier than the general populace. We have done a comparative analysis of the US and Japanese Amazon book collections, which confirms the validity of our results in different cultural contexts. We have a manuscript ready and hope to publish this work later this year.
May '08 Wikipedia: one of my final class projects considers a set of statistical models inspired by Ted Welser's work on roles in Wikipedia. I've found that certain patterns of past user activity that correspond to highly social roles also predict future social activity in Wikipedia. I don't know right now when / where this research will be published, but the manuscript is completed and just awaiting a good opportunity :)
May '07 Wikipedia: I have been involved with the Institute for Social Sciences at Cornell, doing research on Wikipedia. This semester saw several very interesting projects, including studies of the diffusion of innovation on Wikipedia, examinations of Wikipedia contributor roles, studies of article quality, and so on. I was involved in the data collection end of a couple of these projects, and got a lot of experience in parsing data. The projects will hopefully culminate in several papers to be finished in the Fall; I will add them to my Publications when I have them.

 

Respondent-Driven Sampling

September '08

I've joined Doug Heckathorn's project on Respondent-Driven Sampling. This project investigates a unique sampling method, developed by Doug Heckathorn in the late 1990s for reaching hard-to-access populations (such as drug users, MSM and other groups). One of the advantages of Respondent-Driven Sampling is that it uses network structure to construct the sample. Chris Cameron and I are building a simulation framework to test the robustness of the samples created by this method, and doing some analysis of the network properties revealed during the sampling process.
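To give a rough idea of what simulating this kind of sampling involves, here is a minimal sketch of chain-referral recruitment on a network. It is not the project's framework: the function name rds_sample, the coupon limit of three, and the rule that recruits are chosen uniformly at random among not-yet-sampled neighbors are all assumptions made for illustration.

```python
import random
from collections import deque

def rds_sample(adj, seeds, coupons=3, target_size=100, rng=None):
    """Simulate chain-referral recruitment on a network.

    adj maps each node to the set of its neighbors. Each sampled node
    recruits up to `coupons` neighbors who have not been sampled yet
    (assumed: chosen uniformly at random), until `target_size` is reached
    or the recruitment chains die out. Returns nodes in recruitment order.
    """
    rng = rng or random.Random()
    sampled = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        recruiter = queue.popleft()
        eligible = sorted(v for v in adj[recruiter] if v not in seen)
        rng.shuffle(eligible)
        for recruit in eligible[:coupons]:
            if len(sampled) >= target_size:
                break
            seen.add(recruit)
            sampled.append(recruit)
            queue.append(recruit)
    return sampled
```

Repeating the procedure many times on a network whose properties are known, and comparing each sample with the full population, is one simple way to probe how robust such samples are.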