Hi, my name is Tom White, and I'd like to help develop a few specific features for LimeSurvey.

I've been coding for over 30 years, including 10+ years each of Java/J2EE, SAS, C/C++, Perl, YACC, and JavaCC, plus 4 years of PHP and a smattering of other web, programming, and statistical languages. My resume, CV, and LinkedIn profile can be found here: www.tomwhitemd.com

My interest in helping is partly altruistic and partly selfish. I've always been a fan of open source, and I would like to see LimeSurvey compete with the top commercial applications; it's already well on its way. Selfishly, I'd like to stop hosting my own survey platform, so I'm looking for a way to migrate my current customers over to LimeSurvey.

About a decade ago, I created a free survey tool for researchers called Dialogix. You can find some details about it here: http://www.tomwhitemd.com/dialogix, and more detail here: http://www.tomwhitemd.com/files/AMIA%202007%20Tuturial%20T3-With%20Cover.pdf. I no longer own the dialogix.org domain, but I have a few active instruments hosted here: http://www.cet-surveys.org/Dialogix/, although the main entry page is here: http://www.cet.org/eng/Tools_ENG.html. I also have a handful of researchers who are using my tool to conduct field work in several languages, including English, Spanish, and Hebrew. The CET site hopes to host its instruments in about 15 languages.

My career has moved on, and I can't afford to continue supporting my platform. I've thought of open-sourcing it, but the code is pretty ugly at this point because I didn't take an MVC approach at the start, so I'm not confident it would attract ongoing support. However, I don't want to abandon my users, so I'm trying to find a viable migration strategy for them. So far, the platforms that come closest to supporting their needs are LimeSurvey and REDCap (http://project-redcap.org/). Both are missing some of the key functionality I need, but LimeSurvey may be easier to modify since you already have strong multilingual support, plus matrix-style questions. I'd rather help extend your platform than recommend that my users custom-program their surveys.

My research colleagues mostly built semi-structured diagnostic and epidemiologic interviews ranging in length from 10 minutes to 3 hours (roughly 50 to 3,500 questions). Their interviews were multilingual (e.g. English/Spanish, English/Hebrew) and were a mix of self-administered and interviewer-administered, so they needed to switch languages dynamically. They needed complex branching/skip logic (e.g. asking the minimum number of questions needed to rule in or rule out each psychiatric diagnosis, and conditionally asking follow-up questions based on which diagnoses a person had). They also had to dynamically tailor questions, answer choices, and messages (e.g. reports) based on the information provided by the users. My users authored their instruments in an Excel template, which was then imported into Dialogix to build the needed databases. Those are the main features I need to replicate to support them.

For what it's worth, we also wanted the platform to support advanced psychometrics, i.e. to help us make our surveys better. So we recorded response latency and path analysis data. We kept track of every answer a person gave, even if they went back, changed their answer, and navigated down a different path. We used the timing/latency data to detect questions that systematically took too long (suggesting they might be hard to understand or answer). We used a mix of the timing and path analysis data to detect suspicious data on a case-by-case basis (e.g. surveys, or sections of surveys, that were completed faster than should be possible). We also had to support NullFlavors: special codes or flags indicating that a user Refused to answer, Didn't Know the answer, or Didn't Understand the Question, plus a flag indicating which questions were Not Applicable due to the branching logic. Although it would be nice to replicate those features, they are not critical for my current customers.
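The two timing checks described above could be sketched roughly as follows. This is a hypothetical illustration, not the Dialogix implementation: it assumes latency is stored as (respondent ID, variable name, seconds) tuples, flags questions whose median latency exceeds a chosen threshold, and flags respondents whose total time falls below a minimum plausible completion time. The function names and both thresholds are placeholders a researcher would have to choose.

```python
from statistics import median

# Hypothetical record layout: (respondent_id, question_name, seconds_spent).

def slow_questions(records, threshold_seconds):
    """Return variable names whose median response latency exceeds threshold_seconds."""
    by_question = {}
    for _respondent, question, seconds in records:
        by_question.setdefault(question, []).append(seconds)
    return sorted(q for q, times in by_question.items() if median(times) > threshold_seconds)

def suspiciously_fast(records, min_total_seconds):
    """Return respondent IDs whose total time across all questions is below min_total_seconds."""
    totals = {}
    for respondent, _question, seconds in records:
        totals[respondent] = totals.get(respondent, 0.0) + seconds
    return sorted(r for r, total in totals.items() if total < min_total_seconds)

records = [
    ("r1", "HasChild", 2.0), ("r1", "Gender", 30.0),
    ("r2", "HasChild", 3.0), ("r2", "Gender", 40.0),
    ("r3", "HasChild", 0.5), ("r3", "Gender", 0.5),
]
print(slow_questions(records, 10))     # ['Gender']
print(suspiciously_fast(records, 5))   # ['r3']
```

The median is used instead of the mean so that a single respondent who left the survey open does not make a question look systematically slow; the same per-respondent totals could be computed per section to catch sections completed too quickly.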

Now that I've spent some time reviewing the LimeSurvey functionality and code, here's my understanding of the gap between what LimeSurvey supports and what my current customers need:
(1) New Question-Type for Calculations: This would support arbitrary mathematical or string calculations, often referring to prior answers. The result would be stored in the database so that it could be used in later conditions or piping.
(2) Conditional Tailoring/Piping:  This is needed to change the phrasing of a question, conjugate verbs/decline nouns, and form complex reports.
(3) Alternative Way to Reference Questions: Rather than using the SGQA naming system, my users are used to referencing questions by the question's variable name. This makes it easier for the head researcher (who isn't a programmer) to verify the accuracy of the branching logic, validation logic, and micro-tailoring, since they all consistently refer to the variable names. It also avoids locking the survey into a specific question order. My users often had to add, remove, or re-order questions and groups during both the design and production stages of their research, so the only safe way to refer to their questions was via the variable name.
(4) Import from Excel or Other Flat-File Format: Ideally with one row per question. Although the LimeSurvey user interface is nice, it becomes cumbersome once you get past a few dozen questions, and with 3,000 questions in a survey it would be painful.
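A flat-file import along the lines of item (4) might look like the following sketch. The column names (name, text, answers) and the "Label,code|Label,code" answer-list style, borrowed from the examples later in this post, are assumptions for illustration only, not an existing LimeSurvey format; Question, parse_answers, and load_questions are hypothetical names. Questions are keyed by variable name and duplicates are rejected, reflecting item (3).

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Question:
    name: str                                     # variable name: the stable reference (item 3)
    text: str                                     # question text, may contain back-tick expressions
    answers: list = field(default_factory=list)   # list of (label, code) pairs

def parse_answers(spec):
    """Parse an answer list like 'Yes,1|No,0' into [('Yes', '1'), ('No', '0')]."""
    if not spec.strip():
        return []
    pairs = []
    for item in spec.split("|"):
        label, code = item.rsplit(",", 1)         # the code follows the last comma
        pairs.append((label.strip(), code.strip()))
    return pairs

def load_questions(csv_text):
    """Read one question per row, preserving file order; column names are hypothetical."""
    questions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"].strip()
        if name in questions:
            raise ValueError(f"duplicate variable name: {name}")
        questions[name] = Question(name, row["text"], parse_answers(row.get("answers") or ""))
    return questions

data = (
    "name,text,answers\n"
    'HasChild,Do you have any children?,"Yes,1|No,0"\n'
    'Gender,What gender is your oldest child?,"Male,1|Female,0"\n'
)
qs = load_questions(data)
print(list(qs))                # ['HasChild', 'Gender']
print(qs["Gender"].answers)    # [('Male', '1'), ('Female', '0')]
```

Because later logic refers to the variable names rather than row positions, rows can be added, removed, or re-ordered in the spreadsheet without breaking references.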

I think each of those enhancements can be addressed without too much effort. I'll open tickets in the bug tracker with a proposed strategy so that these ideas can be discussed.

The common theme of the strategy I'll be proposing is to allow a new class of text substitution for questions and answers (like {INSERTANS:SGQA}, but more generic). I happen to use inline back-ticks, but any consistent syntax would be OK. For example, using an embedded Excel-like syntax for true/false conditions, here are three questions that show a simple example of micro-tailoring:

(1) HasChild:  Do you have any children?  Yes,1|No,0
(2) Gender:  What gender `if(HasChild="1","is your oldest child","might you want your oldest child to be")`?  Male,1|Female,0
(3) Name: What `if(HasChild="1","is","might you want")` `if(Gender="1","his","her")` name `if(HasChild="1","","to be")`?

Operationally, a string parser would break up the text, separating out the parts that are surrounded by back-ticks. References to variable names would be recognized and expanded to PHP lookup syntax for that question's answer value. If desired, function names can also be validated against a table of allowable functions. Alternatively, you could build a simple YACC grammar, but that might be overkill. I happened to need a library of about 90 functions to support the range of mathematical and text manipulation my users needed, but since I only have a handful of users at this point, I'd only need a subset of those functions. Looking up attributes of a question was pretty common, though, such as the text of the question as seen by the user, the answer text, and the answer code.
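A minimal sketch of such a parser follows, written in Python rather than PHP purely for illustration. It assumes answers are available as a dict from variable name to answer code, and it supports only string literals, variable names, "=" comparisons, and a single whitelisted if(cond, then, else) function; FUNCTIONS, Evaluator, evaluate, and render are hypothetical names. A small hand-written recursive-descent parser stands in for the YACC grammar mentioned above.

```python
import re

# Whitelist of allowed functions; anything else is rejected at parse time.
FUNCTIONS = {"if": lambda cond, then, other: then if cond else other}
TOKEN_RE = re.compile(r'"([^"]*)"|([A-Za-z_]\w*)|([(),=])')

def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        string, name, punct = m.groups()
        if string is not None:
            tokens.append(("str", string))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("punct", punct))
        pos = m.end()
    return tokens

class Evaluator:
    def __init__(self, tokens, answers):
        self.tokens, self.i, self.answers = tokens, 0, answers

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.i += 1
        return tok[1]

    def expr(self):
        # expr := primary ('=' primary)?
        left = self.primary()
        if self.peek() == ("punct", "="):
            self.i += 1
            return left == self.primary()
        return left

    def primary(self):
        # primary := STRING | NAME | NAME '(' expr (',' expr)* ')'
        kind, value = self.peek()
        if kind == "str":
            self.i += 1
            return value
        name = self.take("name")
        if self.peek() == ("punct", "("):
            if name not in FUNCTIONS:
                raise NameError(f"function not allowed: {name}")
            self.i += 1
            args = [self.expr()]
            while self.peek() == ("punct", ","):
                self.i += 1
                args.append(self.expr())
            self.take("punct", ")")
            return FUNCTIONS[name](*args)
        # A bare name is a variable reference: look up that question's answer code.
        if name not in self.answers:
            raise NameError(f"unknown variable: {name}")
        return self.answers[name]

def evaluate(src, answers):
    ev = Evaluator(tokenize(src), answers)
    result = ev.expr()
    if ev.i != len(ev.tokens):
        raise SyntaxError("trailing tokens after expression")
    return result

def render(text, answers):
    """Replace each back-tick segment with its evaluated value."""
    parts = text.split("`")
    if len(parts) % 2 == 0:
        raise SyntaxError("unbalanced back-tick")
    return "".join(p if i % 2 == 0 else str(evaluate(p, answers)) for i, p in enumerate(parts))

q3 = 'What `if(HasChild="1","is","might you want")` `if(Gender="1","his","her")` name `if(HasChild="1","","to be")`?'
print(render(q3, {"HasChild": "0", "Gender": "0"}))  # What might you want her name to be?
```

With HasChild="1" and Gender="1" the same question renders as "What is his name ?"; the empty branch leaves a stray space before the question mark, so a real implementation would probably normalize whitespace after substitution.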

Keep up the great work. I hope I get the opportunity to work with you all on this.


Thomas White, MD, MS, MA
Director, Bureau of Mental Health Informatics
    New York State Office of Mental Health
Associate Professor, Mental Health Informatics
    Columbia University