Re: [limesurvey-developers] LS2.0 Anonymous Surveys
From: Macasek, M. A. <mma...@mi...> - 2008-01-23 14:26:16
Thibault (and everyone else!!!),

Thank you for the response and details! After reviewing what you wrote, I believe in some cases we are talking about the same thing and in other cases I may be introducing a new level of 'anonymous survey'. It seems, based on your responses, that the term 'anonymous' has far too many meanings, so from this point forward I will try to avoid the word (or at least limit its use) in an attempt to remove some of the ambiguity.

Let me start by presenting a few scenarios that fall outside the 'typical' case, where the user authenticates and user data is tied directly to responses. Then I will try to describe a set of rules for how the system should handle these scenarios. I am not implying that what I am writing is how it must or even should work; these are just my thoughts. Let me apologize up front, as I have a feeling this email will be quite long! :)

SCENARIO 1
----------

1a. A user stumbles across a website with a generic link to a survey on the page. This link is the same for every user; when the user clicks on the link they are taken directly to the survey.

1b. A user stumbles across a website with an embedded survey (or poll). The user directly answers and submits the survey (or poll) from the widget embedded in the site.

RULES FOR SCENARIO 1
--------------------

I believe that 1a and 1b, while slightly different in how the survey is presented, ultimately have the same behavior from the LS2.0 system's point of view, with one small difference.

In either case there is no need for a token, and the assumption is that the user does not have an account (and it would not matter if they did). In this case the LS2.0 system would have to generate a unique user account (essentially just a username with the record marked as anonymous) so the responses can be tied to that single user. No information will be gathered on the user outside of what the survey asks.

The difference I mentioned in the first sentence is that in scenario 1a the unique user account would be generated once the user clicks on the link, as that action implies the user's intent to take the survey. Conversely, in 1b the unique user account would be generated after the user submits the survey: since the survey widget is embedded in the page, we cannot assume the user will take the survey until it is submitted.

I think that, from a data model point of view, having the token mechanism outside the user table adds a level of complexity that is not needed. User accounts are tokens which can be marked anonymous (as in the above scenario). LS2.0 terminology can still refer to the generated username as a token, but creating the extra level of indirection gains little and adds unnecessary complexity.

SCENARIO 2
----------

2. Let's assume Mr. Smith has a class of students. All of Mr. Smith's students already have accounts in an LS2.0 system, as they have already used the tool for a previous survey. These students' accounts are all members of the "Mr. Smith's Class" group. It is now the end of the year and Mr. Smith wants to get feedback from the students about the class, but he feels that if he knows the identities of the users they will not be honest. There are other users in the system, but only members of "Mr. Smith's Class" may take the survey. While he does not want to know how any of the students responded, he does want to know who did not take the survey so he can continue to ask the student(s) who have not completed it to do so, perhaps via an automated email.

In this scenario the user MUST authenticate with the system so it can validate that they are in the "Mr. Smith's Class" group. The results must be stored in such a way that the average user (or admin) cannot identify which set of responses belongs to which user, but such that the system can identify which users have not taken the survey. The survey will be assigned to the "Mr. Smith's Class" group.
The user also needs to be able to start the survey, leave, and return to where they left off.

RULES FOR SCENARIO 2
--------------------

In this case predefining a token for a user is not necessary, as the user already has a token: his or her account. Once the user is authenticated, identified as a member of the "Mr. Smith's Class" group, and has entered the satisfaction survey, the system needs some unique identifier from which the average human (user or admin) cannot derive the user's true identity, but the system can. One quick approach to this would be to concatenate the username and the user record's db id and take the MD5 hash of that to produce the unique obfuscated identifier. This would allow the system to reproduce the unique identifier but not allow the average human to make the connection. This would also add a level of 'security' in that removing the response set (or user) from the current database breaks even the system's ability to decode the unique identifier.

This unique identifier would then be used to tie the response set together via a new user record that would be marked in such a way as to identify it as a 'pseudo-anonymous' record (or a record that is an obfuscation of a real user's record). Yes, in this case we are essentially creating a duplicate user record that the average human cannot link to its 'real' user record, but the system would only ever need to create this 'pseudo-anonymous' version of the account once, and it can be used for every 'pseudo-anonymous' survey.

Again, as with the previous scenario, we can avoid using a separate db entity (the token), which would reduce the complexity of the data model and of the code that uses the model.

SCENARIO 3
----------

3a. An LS2.0 admin has a list of email addresses. The admin wants to send out an email to all these email addresses with a link to a survey. The results of each survey taken should be tied to the email address of the user who took the survey.

3b. An LS2.0 admin has a list of email addresses.
The admin wants to send out an email to all these email addresses with a link to a survey. The results of each survey taken should NOT be tied to the email address of the user who took the survey. The admin does want to identify which email addresses have not taken the survey so a second request can be sent out.

3c. An LS2.0 admin has a list of email addresses. The admin wants to send out an email to all these email addresses with a link to a survey. The results of each survey taken should NOT be tied to the email address of the user who took the survey. The admin does NOT want to identify which email addresses have not taken the survey.

RULES FOR SCENARIO 3
--------------------

Scenarios 3a, 3b, and 3c are similar but have slight differences. 3a is a standard authenticated survey, 3b is the 'pseudo-anonymous' survey from SCENARIO 2, and 3c is the 'anonymous' survey from SCENARIO 1.

In all the variations of SCENARIO 3 the admin has a list of email addresses for users who are not already in the LS2.0 system. In this case the LS2.0 system would provide a mechanism to enter all the email addresses, 'assign' the survey and define the assignment type, and blast out the emails.

In 3a the system will create a user record with the user name set to the email address and generate a random password. The system would then send out an email with a link to the survey along with the information needed to authenticate (username and password).

In 3b the system will create a user record with the user name set to the email address and generate a random password. The system would check that the survey is marked as anonymous, generate a group for the set of users, make the assignment, and then send out an email with a link to the survey along with the information needed to authenticate (username and password). Once the user authenticates, the system will identify this as a 'pseudo-anonymous' survey from SCENARIO 2 and proceed as defined in RULES FOR SCENARIO 2.

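To make the obfuscated identifier from RULES FOR SCENARIO 2 concrete, here is a minimal Python sketch of the idea. It is only an illustration, not LS2.0 code: the function names and the ':' separator between username and db id are assumptions made for the example.

```python
import hashlib


def pseudo_anonymous_id(username: str, user_db_id: int) -> str:
    """Return a reproducible, obfuscated identifier for a real user record."""
    # Concatenate the username and the user record's db id, then take the
    # MD5 hex digest. The ':' separator is an assumption of this sketch; it
    # keeps ("ab", 12) and ("ab1", 2) from producing the same input string.
    raw = f"{username}:{user_db_id}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def users_without_response(group_members, responded_ids):
    """List the (username, db id) pairs whose obfuscated id has no response set.

    group_members: iterable of (username, db id) pairs for the assigned group.
    responded_ids: set of obfuscated identifiers stored with the response sets.
    """
    return [
        (username, db_id)
        for username, db_id in group_members
        if pseudo_anonymous_id(username, db_id) not in responded_ids
    ]
```

The system can recompute the identifier for every member of the group and so find who still has to be reminded, while the response sets themselves only ever carry the hash. Anything that can read the user table and knows the recipe can rebuild the same mapping, which is exactly what "can be derived by the system" means here.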
In 3c the system will not create any user accounts and will not store the email addresses of the participants (it will only use them to send the email). The system verifies that the survey is marked anonymous, and the email will contain a link directly to the survey. Once the user clicks the link the system will proceed as defined in RULES FOR SCENARIO 1.

**A quick reminder: a survey can only be anonymous or not.

END SCENARIOS!!!!!
------------------

So I believe the above scenarios cover the vast majority of the ways a survey can be taken outside of the typical authenticated user with the results tied directly to the user account. I will read over this email a few times before I send it out and hopefully I will have smoothed out any of the ambiguities.

As always, questions, comments, and suggestions are welcome and encouraged!!!!!!

Thanks,
Michael

-----Original Message-----
From: lim...@li... [mailto:lim...@li...] On Behalf Of Thibault Le Meur
Sent: Thursday, January 17, 2008 10:55 AM
To: lim...@li...
Subject: Re: [limesurvey-developers] LS2.0 Anonymous Surveys

Hi Michael,

I'm reading my post and I think I need to explain a little more ;-) I wrote it while doing some other things, so it is not really clear in places ;-)

First of all, please note that if I'm talking about LS1.x it is not to say that this was a better implementation (not at all ;-) ), but only to compare the two implementations and to let you understand the vocabulary I'm using.

> We had a lot of discussions on LS1.x about what anonymous means. We had
> to be very careful in terminology so that anyone can understand.

In fact, in the past several different interpretations were given to the word 'anonymous' across the LS1 code, which resulted in contradictory patches. The word anonymous was sometimes used to refer to surveys without invitation codes, and sometimes to surveys ensuring response privacy.

> I'm going to describe the terms and implementation in LS1.x
>
> As far as LS1.x is concerned we have agreed on the following terminology:
> * "anonymous answers": answers aren't linked to the participants' data
> * "public access survey" versus "access-controlled" surveys. The former
> can be accessed without any authentication, the latter can only be
> accessed after having provided an invitation code (called 'token' in
> LS1.x).

Hmm, I think I've written the opposite of what I meant here ;-) Of course I meant:
* public access surveys: no authentication
* controlled access surveys: a kind of authentication required

> * 'public access surveys' always use 'anonymous answers' because we
> don't know the user details
>
> * 'controlled access surveys' can use 'non anonymous answers' (answers
> are bound to the token invitation code, which in turn is linked to the
> user's data)
> - the token table contains the date of submission
> - the response table may contain the date of submission if required
> Or
> * 'controlled access surveys' can use 'anonymous answers' (answers are
> not bound to the token invitation code), the token is still linked to
> the user's data. Special care has to be taken here to ensure that it will
> be difficult to map the answers to the token:
> - the token code is not in the answers' table
> - the token status is recorded in the token table to know who has
> replied and who hasn't replied yet (Y/N) but no date since this date
> could be correlated with the record id in the response table.
> - the answers table may contain the submission date
>
>> This means outside of manually digging through the DB no one will be
>> able to identify which set of responses belongs to which user.
>
> Well, it depends how far you need to dig into the DB. If we are to show
> an 'anonymous notice' ensure data privacy to the user, then this
> shouldn't be possible at all even by digging the DB.

I can't even understand my last sentence here :-( Was I drunk?

The fact is that in LS1.x we display a notice to the participant saying that his answers will remain anonymous. If we want to keep this notice, we must ensure that data privacy is really achieved.

So if there is a direct or indirect link between the answers table and the user's data, we mustn't consider the survey anonymous anymore, even if the user interface doesn't make it possible to display this binding.

What do you think?

Regards,
Thibault
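
For reference, the LS1.x arrangement quoted above for controlled-access surveys with anonymous answers can be sketched in a few lines of Python. This is only an illustration of the described layout, not LimeSurvey code: the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TokenRow:
    """One invitation: linked to the participant's data, never to answers."""
    token: str
    email: str
    completed: str = "N"  # only Y/N is stored, deliberately no date


@dataclass
class AnswerRow:
    """One response set: it has no token column at all."""
    answers: dict
    submitted_at: Optional[str] = None  # the answers table may keep a date


def submit_anonymous(tokens: Dict[str, TokenRow], answers_table: List[AnswerRow],
                     token: str, answers: dict,
                     submitted_at: Optional[str] = None) -> None:
    """Record a response without binding it to the invitation code."""
    row = tokens[token]      # raises KeyError for an unknown invitation code
    row.completed = "Y"      # remember who replied, but not when
    answers_table.append(AnswerRow(answers=answers, submitted_at=submitted_at))


def pending_emails(tokens: Dict[str, TokenRow]) -> List[str]:
    """Addresses that have not replied yet, e.g. for a reminder email."""
    return [row.email for row in tokens.values() if row.completed == "N"]
```

The token table answers "who has replied?" and the answers table answers "what was said?", with no shared column between them.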