May
29

Redprint Experiment Result

Thank you for all the suggestions. They were really helpful.

No code-marking:

I made some major changes to the experiment setup for this round of the code-marking experiment. The new code-mark experiment can be seen at the links below:

Session 0: http://projects.abhardwaj.org/redprint-experiment/codemark/ps0.php

Session 1: http://projects.abhardwaj.org/redprint-experiment/codemark/ps1.php

Session 2: http://projects.abhardwaj.org/redprint-experiment/codemark/ps2.php

I ran this experiment with 6 more subjects. After discussing the results with Scott (only 1 out of 6 code-marked), I decided to focus on the other aspects of Redprint that looked more promising. As I said earlier, “instant example display” was the feature people liked the most, so Scott suggested that I run some experiments comparing “instant search through the editor window” with “search through the search bar”.

Instant Example Display Experiment Setup:

Participants were randomly assigned to either the instant-enabled or the instant-disabled setup.

The first session (Session 0) was a practice session (see the URL below):

Session 0: http://projects.abhardwaj.org/redprint-experiment/instant-enabled/ps0.php

In this session, we first asked them to get familiar with the IDE by doing various tasks. Then we gave them sample code, which they executed. The sample code was designed to introduce them to the APIs that would be useful in the next session.

The second session (Session 1) was the actual coding:

Session 1: http://projects.abhardwaj.org/redprint-experiment/instant-enabled/ps1.php

We gave all the participants starter code and asked them to complete it so that it computes the mean, median and mode of a given dataset. We also gave them the algorithms for computing the mean, median and mode.
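For readers who want a feel for the size of the task, here is a minimal sketch of the three computations. It is written in Python purely for illustration (the participants worked in PHP), it is not the starter code or the algorithm handout from the experiment, and the sample dataset is made up.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the two middle values if the count is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """All values that occur most often, sorted (a dataset can have several modes)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


dataset = [4, 1, 2, 2, 3, 5, 2]  # made-up sample data, not from the experiment
print(mean(dataset), median(dataset), mode(dataset))
```

Returning a list from mode() is a design choice of this sketch; it avoids picking an arbitrary winner when two values are tied.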

The experiment results seem very promising. A summary of the results can be seen at:

https://spreadsheets0.google.com/spreadsheet/viewanalytics?formkey=dHZob2k5TWkwTlRVdzBzUkpaUnRmRWc6MQ

 

 

 

May
20

Redprint Experiment (Need Suggestions)

In this post, I will discuss the results of the first controlled experiment I conducted for Redprint. I will also touch upon various issues I see with my experiment setup and some of the difficulties I have in simulating certain scenarios. At the end, I will talk about a few other observations that I hadn’t initially considered part of my experiment but that are worth considering.

Participants Background:

This controlled experiment involved 4 subjects over three different sessions. Over the next week, I will be conducting two more experiments with 8 more subjects. Of the four subjects who participated in the experiment, three are Stanford CS students (two MS students and one PhD student). The fourth participant is a recent graduate (he completed his MS in CS at NYU last year and is currently a software developer).

Experiment Tasks:

The participants did one of the two tasks below:

Task #1 (the 3 Stanford students worked on it)

Session 1: Create an array, add some values to it, and print the array

Session 2: Implement a stack as an ADT (push() and pop())

Session 3: Implement a queue as an ADT (enqueue() and dequeue())
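To give an idea of what sessions 2 and 3 asked for, here is a minimal Python sketch of the two ADTs. It is only an illustration: the participants wrote PHP, and the class layout and error handling below are my assumptions, not anything they were given.

```python
from collections import deque


class Stack:
    """Last in, first out."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()  # remove from the same end we add to


class Queue:
    """First in, first out."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()  # remove from the opposite end
```

The two classes differ only in which end an item is removed from, so a stack written in session 2 can be turned into a queue with a very small change.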

Task #2 (the NYU student worked on it)

Session 1: Write a function to create the md5 checksum of a string

Session 2: Write a function that takes a username and password as arguments and stores them in a global array (the password should be stored as an md5 hash)

Session 3: Write an authentication model that can add a user account (username and password) and authenticate a given username and password.
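The three sessions of Task #2 build on each other, which a short sketch makes visible. This is a Python illustration under my own assumptions (the participant wrote PHP; the names users, md5_checksum, add_user and authenticate are hypothetical). It uses md5 only because the task specified it; unsalted md5 is not considered safe for storing real passwords.

```python
import hashlib

# Hypothetical global store (session 2): username -> md5 hex digest of the password.
users = {}


def md5_checksum(text):
    """Session 1: md5 checksum of a string, as a hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def add_user(username, password):
    """Session 2: store the username with the md5 hash of the password."""
    users[username] = md5_checksum(password)


def authenticate(username, password):
    """Session 3: compare the stored hash with the hash of the supplied password."""
    stored = users.get(username)
    return stored is not None and stored == md5_checksum(password)


add_user("alice", "secret")
print(authenticate("alice", "secret"), authenticate("alice", "wrong"))
```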

The time gap between sessions differed from participant to participant (mainly because of their schedules).

Observations:

Although I had told them about the code-mark feature, only one of the four subjects actually code-marked in the first session. I deleted the first-session code for all subjects, so they were forced to start afresh. Before the start of the second session, I told them that we might delete any of their files and that if they really wanted to reuse their old code, they must code-mark it.

After that, everybody code-marked their code in session 2, and they reused their code-marked snippets in session 3.

Question: Do you think this was natural? It seems like I forced it on them. What could I change for the other subjects?

 

Most people code-marked the entire file. Only one of the four participants code-marked individual snippets or functions. After looking at the diff of the code between session 2 and session 3, I think most of them just modified their code-marked snippet to get things working in session 3 (probably because the task in session 3 differed very little from the task in session 2). The quality of the code varied between individuals: one participant wrote really high-quality code, while the other three wrote code of average quality.

I couldn’t measure the time for the first session. For the second session, the mean time was 30 minutes with a standard deviation of 10 minutes; for the third session, the mean time was 11 minutes with a standard deviation of 4 minutes.

Post Experiment Survey:

All participants were asked to fill out a post-experiment survey. Please see the link below for details:

https://spreadsheets.google.com/spreadsheet/ccc?key=0AlQqshOTBvVndE9FZGlJdFhHYWVReDd0SFMyNGFzbmc&hl=en_US&authkey=COCnm4YB

 

Usage of different components of the IDE:

I logged the number of requests made to different modules of Redprint.

Note: every search (in the Reminisce search bar) calls three methods: searchCodemark(), searchExamples() and searchHelp().

Also, this count only includes search requests that users made by typing a query in the search box. It doesn’t include requests made for instant example display while typing code in the editor window.
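The note about the three methods matters when reading the request counts, and a small sketch shows why. The Python below is a hypothetical model of the logging, not the Redprint code: the method names mirror searchCodemark(), searchExamples() and searchHelp(), while request_counts, search() and the empty return values are placeholders I made up.

```python
from collections import Counter

# Hypothetical request log: module name -> number of calls.
request_counts = Counter()


def search_codemark(query):
    request_counts["searchCodemark"] += 1
    return []  # placeholder: the real module would return matching code-marks


def search_examples(query):
    request_counts["searchExamples"] += 1
    return []  # placeholder: matching examples


def search_help(query):
    request_counts["searchHelp"] += 1
    return []  # placeholder: matching documentation


def search(query):
    """One search-bar query fans out to all three modules."""
    return {
        "codemarks": search_codemark(query),
        "examples": search_examples(query),
        "help": search_help(query),
    }


for q in ["md5 checksum", "array push"]:
    search(q)
print(dict(request_counts))
```

In this model, each search-bar query adds one to all three counters, so the three per-module counts always move together and each equals the number of queries typed.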

 

Conclusion

I am still unsure what to conclude from this data. I also think I made an error in my experiment (I shouldn’t have deleted the files from session 1). The problem is that, had I not deleted them, nobody would have used code-marks at all: the file would be visible in the participant’s workspace, and they could simply modify that same file. To show the usefulness of code-marks, I must simulate a situation where a user can’t find their old code. I am not sure what I should change for my remaining experiments. Any suggestions would be welcome.

There are some other findings from this experiment, though, which are very encouraging. The Redprint IDE is unusual in that it shows instant example results while you type code in the editor window. Some of the participants really liked it; they said it is the thing they like best about Redprint. Do you think it would be worth comparing variables like code reuse and completion time with instant example display enabled vs. disabled? I think that would be a good experiment, and it could also make a very good hypothesis. As far as I know, no other editor has instant example display. What do you suggest?

 

May
14

Blueprint PHP is now called – “Redprint”

First things first: the name of the editor has been changed to Redprint (earlier we called it Blueprint-PHP). Scott thinks that this is a completely different system now and should have its own unique name. So, let’s call it “Redprint”.

Check it live: http://projects.anantbhardwaj.com/redprint/

It’s almost a fully functional editor now (with syntax highlighting and indentation support for HTML, JavaScript and PHP). It supports the entire PHP stack (PHP, HTML, JavaScript) and all PHP extensions.

Over the last week, we parsed a whole lot of the web, and we now have 6000 distinct APIs from the PHP core and PHP extension modules. The Redprint IDE has examples and documentation support for all of these 6000 APIs.

We have added some really nice features like code walking (keep the cursor on any line of PHP code in the editor window, and it will figure out the keywords on that line and present you with useful examples). We have added the ability to do a natural-text search (it’s difficult to remember APIs). And last but not least, you can execute your code from the IDE itself (you don’t need to set up a web server).
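As a rough idea of what code walking has to do with a line, here is a minimal Python sketch that pulls candidate API names out of one line of PHP. It is not Redprint’s implementation: the regular expression, the list of language constructs to skip, and the function name keywords_on_line are all assumptions made for illustration.

```python
import re

# An identifier immediately followed by "(" is treated as a function call.
CALL_PATTERN = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")

# Language constructs that look like calls but are not APIs worth looking up
# (an assumed, incomplete list).
PHP_CONSTRUCTS = {"if", "elseif", "while", "for", "foreach", "switch",
                  "function", "array", "echo", "return", "isset", "empty"}


def keywords_on_line(line):
    """Return the distinct function names called on the line, in order of appearance."""
    names = []
    for name in CALL_PATTERN.findall(line):
        if name.lower() not in PHP_CONSTRUCTS and name not in names:
            names.append(name)
    return names


print(keywords_on_line('$hash = md5(trim($password));'))
print(keywords_on_line('if (strlen($s) > 0) { echo strtoupper($s); }'))
```

The names found this way could then be used as lookup keys into the example and documentation index for the line under the cursor.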

Can you imagine an IDE that lets you execute a small, independent snippet of code in the editor itself? You can do that in Redprint. And finally, you can create, retrieve and manage your code-marks, which is the focus of this experiment.

I am planning to start my actual user testing this week. I can’t write about the tasks here because a potential subject might read this blog (and would miss the surprise), which could also bias my results.

Finally, I will talk a little bit about my previous post on Google Desktop search. Many people were wondering how that could be possible. Here is my justification:

When I say that precision and recall for Google Desktop search are low, this shouldn’t be generalized; it applies only to searching code on the desktop. We store code-marks in a format suited to our indexer, and because our indexer knows the format of the bookmarked code, it can optimize the index with zone indexing. The Google Desktop indexer can’t do zone indexing because there is no standard way of storing source code that it can rely on.
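To make the zone-indexing argument concrete, here is a minimal Python sketch of a zone-weighted index. It is an illustration, not our indexer: the code-mark format (title, tags, code), the weights, and the names build_index and search are all assumptions.

```python
import re
from collections import defaultdict

# Assumed zones of a code-mark and assumed weights: a match in the title
# counts more than a match somewhere in the code body.
ZONE_WEIGHTS = {"title": 3.0, "tags": 2.0, "code": 1.0}


def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_index(codemarks):
    """Build term -> zone -> set of code-mark ids."""
    index = defaultdict(lambda: defaultdict(set))
    for mark_id, mark in codemarks.items():
        for zone in ZONE_WEIGHTS:
            for term in tokenize(mark.get(zone, "")):
                index[term][zone].add(mark_id)
    return index


def search(index, query):
    """Rank code-marks by the summed weights of the zones in which query terms occur."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for zone, mark_ids in index.get(term, {}).items():
            for mark_id in mark_ids:
                scores[mark_id] += ZONE_WEIGHTS[zone]
    return sorted(scores, key=lambda m: (-scores[m], m))


# Made-up code-marks for illustration.
codemarks = {
    "login": {"title": "login form", "tags": "auth session",
              "code": "if ($_POST['email']) { check_login(); }"},
    "upload": {"title": "file upload", "tags": "files",
               "code": "// redirect to login page after upload"},
}
index = build_index(codemarks)
print(search(index, "login form"))
```

A general desktop indexer sees a source file as one undifferentiated blob of text, so it has no way to tell the "login" in a title from a "login" buried in a comment. Knowing the zones is what makes the weighting possible.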

A question might come to your mind: “then why does Google web search still get you better results for code search?” The answer is that Google web search uses lots of optimizations, such as fixed credibility scores (it boosts the weight of examples from PHP.net or StackOverflow for PHP code searches), which Google Desktop can’t do. Also, Google web search uses lots of signals and relevance feedback (it improves its search by learning from hidden signals from billions of users), which might not be applicable to desktop search, where there is only one user.

I will try to keep the blog updated. Stay tuned!


May
08

A quick update on Reminisce

  • We have completed our pilot study exercise.
  • We have completed an experimental study of the effectiveness of Google Desktop Search for searching source code on the desktop.
  • We have completed developing our own optimized indexing engine for code-marking.

 

 

May
06

Why Google Desktop search is not a good idea for searching source code on your desktop

When doing experiments with code-marking (Ref: http://blog.abhardwaj.org/2011/04/21/reminisce-a-tool-for-code-marking/ ), it is a good idea to compare the results with Google Desktop search. Why would I need to code-mark if I can get the same results from Google Desktop search?

I have had Google Desktop running on my machine for the last 6-7 months. In that time, I have done many PHP projects, and most of them use login, sign-up, file operations, MySQL operations, etc.

For a new project, I wanted to reuse my old login form, and I thought I could find it through Google Desktop. Interestingly, Google Desktop couldn’t find any of the relevant code files in its top 10 results. Furthermore, the top 10 results were not relevant at all. (Ref: Figure 1)

Another interesting point is that Google Desktop performed poorly when searching a local copy of the PHP.NET documentation. For the query “PHP file read”, it didn’t return any relevant result in the top 10. (Ref: Figure 2)

[Image: google-desktop-search-1]

Figure 1.

[Image: google-desktop-search-2]

Figure 2. Google Desktop Search result for the query "PHP file read"

 

May
06

Blueprint in BASES Product Showcase

Good news to share: my Blueprint-PHP project got accepted for the showcase at the BASES 150K Challenge.

Bases Link: http://bases.stanford.edu

Blueprint Link: http://projects.anantbhardwaj.com/blueprint/

 

May
05

Pilot Experiment: re-using source code.

I conducted an experiment to understand how much people reuse source code and, if they do, whether it really helps them complete a task faster.

We had 4 participants, all CS grad students. They all used PCs at the Blackweilder cluster.

Day 1:

Task: We asked them to create a login page that authenticates users logging in to the Zippings home page.

Scenario: We gave them the Zippings database (MySQL), which had a table called User with three fields: id, email, password. The table was already populated with a few records. We allowed them to use the Internet or any other aid.

Observations: All four participants searched on Google and copied code for login. All of them modified the code after copying it. The time taken by each participant is below:

P1A – 35 minutes

P2A – 48 minutes

P1B – 42 minutes

P2B – 46 minutes

Day 2:

Task: We asked them to write a new page to register a new user for the same application (Zippings).

Scenario: We gave the Group A participants (P1A and P2A) access to their code from the previous day. We gave Group B (P1B and P2B) a raw setup.

Observations: Group A (who had access to their previous day’s code) didn’t start with a Google search as they had on the first day. They copied their previously written login form and renamed it to register in the login UI page, then created a copy of login-exec and renamed it to register-exec. Later they modified register-exec to make it work. In between, they did a Google search to get PHP code for a MySQL insert query. The time taken by Group A is below:

P1A: 12 minutes

P2A: 17 minutes

Group B went straight to Google. They copied code after searching the web and later modified it to work for them. The time taken by Group B is below:

P1B: 19 minutes

P2B: 21 minutes

Analysis: Although the sample size is very small and it is difficult to draw any conclusion, one thing is clear: people who have access to their previous code tend to use it in their current project when it is relevant. Also, in this experiment, the people who had access to their previous day’s code completed their tasks faster than those who didn’t.
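For reference, the group averages behind that comparison can be worked out in a few lines of Python. The sketch takes the Day 2 times exactly as they are listed under each group’s heading above; the variable names are mine, and with two people per group this is arithmetic, not a significance test.

```python
def average(minutes):
    return sum(minutes) / len(minutes)


day1_minutes = [35, 48, 42, 46]  # all four participants, login page
day2_group_a = [12, 17]          # times listed under Group A (had their Day 1 code)
day2_group_b = [19, 21]          # times listed under Group B (raw setup)

print(average(day1_minutes))  # 42.75
print(average(day2_group_a))  # 14.5
print(average(day2_group_b))  # 20.0
print(average(day2_group_b) - average(day2_group_a))  # 5.5
```

Both groups were much faster on Day 2 than the Day 1 average of about 43 minutes, and the group with access to its own code was faster by about 5.5 minutes on average.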

 

 

May
05

Accurately interpreting clickthrough data as implicit feedback

Paper link: http://portal.acm.org/citation.cfm?id=1076063

It is common for search engine algorithms to use user clicks as an input signal to improve search ranking. This paper argues that it can be misleading to rely on raw user clicks as an input to rank-refinement algorithms.

This paper demonstrates that clicking decisions are biased in at least two ways. First, there is a “trust bias” which leads to more clicks on links ranked highly, even if those abstracts are less relevant than other abstracts the user viewed. Second, there is a “quality bias”: the users’ clicking decision is not only influenced by the relevance of the clicked link, but also by the overall quality of the other abstracts in the ranking. This shows that clicks have to be interpreted relative to the order of presentation and relative to the other abstracts.
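One way to read clicks relative to the order of presentation, and as far as I understand it one of the strategies the paper examines, is the “click > skip above” rule: a clicked result is taken to be preferred over every higher-ranked result the user passed over without clicking. The Python below is my own minimal sketch of that rule; the function name and the sample ranking are made up.

```python
def click_skip_above(ranking, clicked):
    """Return (preferred, less_preferred) pairs implied by the clicks.

    ranking: result ids in the order they were presented.
    clicked: result ids the user clicked.
    """
    clicked_set = set(clicked)
    preferences = []
    for position, doc in enumerate(ranking):
        if doc in clicked_set:
            # Every unclicked result ranked above a clicked one was skipped.
            for skipped in ranking[:position]:
                if skipped not in clicked_set:
                    preferences.append((doc, skipped))
    return preferences


print(click_skip_above(["a", "b", "c", "d"], ["a", "c"]))  # [('c', 'b')]
```

Note that the click on the top result "a" produces no preference at all. That is exactly how this reading sidesteps the trust bias: a click on a highly ranked link, taken alone, says little about its relevance.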

I think the trust bias is pretty obvious: users believe that the most relevant documents are at the top (which is usually the case). The “quality bias” argument can be further substantiated by the concept of “information scent” described by Pirolli et al. in “Exploring and Finding Information”. Information scent concerns the user’s use of environmental cues (in this case, the abstract summaries presented by Google) in judging information sources and navigating through information spaces.

I liked the experiment, especially the use of eye-tracking to understand users’ decision process. There is another important point to note, though: did the abstracts clearly reflect relevance? I believe that users decide whether or not to click a link primarily based on what is presented in the link summary (abstract). I think the data would have been more insightful had the authors reported the quality of the abstracts for the different links, along with some statistics on how effectively the abstracts communicated relevance to the users.

But overall, I think the experiments presented in this paper are convincing enough to believe that raw click data is misleading, and that it must be interpreted with the current ordering of the results for a given query in mind before it is used to refine the search ranking.

 

May
05

Taking the Time to Care: Empowering Low Health Literacy Hospital Patients with Virtual Nurse Agents

http://portal.acm.org/citation.cfm?id=1518891

A great piece of HCI work. I was like: wow. It’s not just the idea; it is a wonderful interaction design and a great demonstration of how HCI research can bring the benefits of technology to under-represented communities.

I really liked the way they have thought of every minor detail that can improve the interaction between a patient and the system. Mark Weiser, in his paper “The Computer for the 21st Century”, points out that “machines that fit the human environment instead of forcing humans to enter theirs will make using a computer as refreshing as taking a walk in the woods”. If you look at some of the choices made in the design of the Virtual Nurse (for example, the look and the character, or whether to repeat the information when a patient fails the comprehension test), these are very important design considerations for this scenario. Even subtle things, like the inclusion of social chat in the conversation, contribute a lot to the satisfaction patients reported after using the system.

About the experiment design: they mention that the final VN system will talk to patients every day they are in the hospital, while this study evaluated a single interaction at the time of discharge. I am not sure, but I think that might affect the actual result. There is a school of thought that says you pay more attention when you are listening to somebody for the first time. In this experiment, all the subjects used the system for the first time at discharge.

I was also a little confused about one thing until I read the conclusion: what counts as success for this system? (Can being very easy to use with less than a minute of training, plus high levels of satisfaction with the system, be considered a complete success?) I would say it is a minor success. I think it would be a major success only if the VN leads to fewer re-hospitalizations. I would be really interested in that finding (they mention it as “Future Work”).

But I would still give it 6/5 (1 extra point), even though I think there is a minor flaw in the experiment design, because I feel it is great HCI research and a wonderful interaction design.

May
05

Example-Centric Programming: Integrating Web Search into the Development Environment

Paper link: http://portal.acm.org/citation.cfm?id=1753402

Google has been around for more than a decade. All of us (programmers) search the web for code examples. I just can’t imagine what programmers’ productivity would be if there were no web search. And yet it took so long for people to realize that web search can be integrated into the programming environment.

I read this paper about six months ago and was so inspired by it that I decided to write an open-source version of Blueprint for one of the most widely used web languages: PHP. One major issue I encountered is extracting examples from the unstructured web. It is very difficult to take a user query, modify it to fit the context, send it to Google, post-filter the Google results, extract the relevant snippet of code and reformat it, all in real time.
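To illustrate just the extraction step of that pipeline, here is a minimal Python sketch that pulls the contents of pre blocks out of an HTML page and keeps the ones mentioning a keyword. It is a simplification under stated assumptions, not how Blueprint or my PHP version works: real pages put code in many other containers, and the class name, function name and sample HTML are made up.

```python
from html.parser import HTMLParser


class PreBlockExtractor(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # how many <pre> tags we are currently inside
        self._buffer = []  # text pieces of the <pre> block being read
        self.snippets = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.snippets.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_snippets(html, keyword):
    """Return the <pre> blocks of the page that mention the keyword."""
    parser = PreBlockExtractor()
    parser.feed(html)
    parser.close()
    return [s for s in parser.snippets if keyword.lower() in s.lower()]


# Made-up page for illustration.
page = '<p>Intro</p><pre>&lt;?php echo md5("x"); ?&gt;</pre><pre>ls -la</pre>'
print(extract_snippets(page, "md5"))
```

Even this toy version hints at the difficulty: every site marks up its code differently, so a filter that works for one source tells you little about the next.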

The Blueprint system presented in this paper minimizes this problem by limiting its search to the Adobe Community Help website. However, for many popular languages (like HTML, PHP, etc.), the search can’t be limited to one website. It is also more difficult to filter out noise, as one needs to parse documents in different formats and with different structures.

Another issue I observed concerns trust. People tend to trust search engines like Google more than searching through the Blueprint web interface. As this paper mentions, people used Blueprint mainly when they encountered tasks they had already done before (they can just look at the Blueprint result and know that it is the right code). Whenever programmers need to write new code, they search the web rather than Blueprint.

But I think Blueprint is a great idea. Most of the problems with Blueprint are engineering problems that will eventually be solved. As for trust, I think that as more and more people start benefiting from it, they will come to trust it over time.
