Experiences of NVivo

When I did my PhD I had the chance to have training in NVivo, the qualitative data analysis software package from QSR International. I had the training (in NVivo version 2, I think) quite early on in the PhD, prior to data collection. The theory was that if we got used to the software before collecting data, then adding data as we went along would be a piece of cake and we would be thoroughly versed in it, and so able to use it to its full potential. I could certainly see the logic of that, and in retrospect, if I had started off with my literature review and research diary I could perhaps have ordered them within NVivo and then used it a bit more consistently throughout the PhD.

Of course real research life is not like that, and hindsight is a wonderful thing! My diary keeping prior to fieldwork was sporadic at best and didn't amount to much, and the literature review I undertook before fieldwork turned out to be at least partly irrelevant once I had collected some data and realised that entirely new themes were emerging which I hadn't even considered; I had to redo it completely once I got back. Although this was a bit soul-destroying at the time, it was nonetheless a useful exercise, because I was reading the literature in light of my data, and reviewing the literature while analysing my data meant that I made far more connections between them than if they had been the entirely linear, separate processes that PhD timetables (and, by extension, the advice of the NVivo trainers) often seem to assume. I ended up not using NVivo for either the diary or the literature, but once back from the field I started importing my interview transcripts into NVivo (which had by now morphed into NVivo 7, I think) and coding them. I had also collected, over the course of my PhD, more media articles than I knew what to do with, so putting them into NVivo to create some sort of order was very useful.

Despite the training I fell into the classic 'coding trap' of micro-coding every last detail, and I found that (for my interview transcripts in particular) this was spectacularly unhelpful: I was left with fragments of text and no sense at all of any overarching narrative (though this was at least partly mitigated by the software's ability to retrieve a specified number of lines above and below the coded segment). I avoided this trap with the media articles by coding each entire article to just one code, rather than micro-coding within it. That was more successful at getting an overall view of what was 'out there', but it was not without its own problems: because there were such large chunks of text within a single code, my PC would often just crash when I tried to run a search. However, by using NVivo to identify the documents containing articles on particular topics and then going to the hard copy just before the big search, I managed to avoid the constant crashing, generally found what I was looking for, and overall found NVivo more helpful (from a coding and retrieval point of view) than having paper fragments of text cut up, colour-coded and stuck together. I can't say that I ever used NVivo to its full potential – a lot of the whizzy features were surplus to requirements – but as a tool for coding and retrieval it was fine for what I needed. I could specify the location of my respondents and the type of source (blog, newspaper article etc.) with a couple of clicks, and that was really helpful in searching and querying my data.
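As an aside, that 'context retrieval' feature is easier to show than to describe. Here is a rough sketch in Python of the idea: it is purely my own illustration of the mechanism, with invented names throughout, and has nothing to do with NVivo's actual internals.

```python
# Purely illustrative sketch of retrieving a coded segment plus surrounding
# context lines; invented names, nothing to do with NVivo's real internals.

def retrieve_with_context(lines, coded_ranges, context=3):
    """Return each coded segment widened by `context` lines either side.

    lines        -- the transcript, as a list of lines
    coded_ranges -- (start, end) line indices (inclusive) for one code
    """
    passages = []
    for start, end in coded_ranges:
        lo = max(0, start - context)
        hi = min(len(lines), end + 1 + context)
        passages.append("\n".join(lines[lo:hi]))
    return passages

# A stand-in transcript; imagine lines 41-43 were coded to 'diagnosis'.
transcript = [f"line {i} of the interview" for i in range(100)]
for passage in retrieve_with_context(transcript, [(41, 43)], context=5):
    print(passage)
    print("---")
```

Widening the window like this is what rescued my over-fragmented interview codes: each fragment came back with enough surrounding text to recover some of the narrative.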

Now I am undertaking a new research project in my postdoc position, and I am still using NVivo. My university is on NVivo 9 (the most recent version is NVivo 10, apparently), and a couple of versions down the line there are more bells and whistles than ever. I have learnt my lesson from the micro-coding debacle of my PhD interviews, and this time I am doing a lot of broad-brush coding, which preserves the narrative and gives me access to the rich data that is emerging. The ability to widen a retrieval to lines above and below the coded segment seems to have disappeared, but the broad-brush approach achieves much the same thing. For my baseline data I have been able to run reports on codes and access them fairly straightforwardly.

However, now that I have started the next phase of data collection, I want to add more information to my sources so that I can differentiate them more easily. Until now, keeping it simple has been fine, as all I've really wanted is "everything that every respondent says about X". Now, though, I am undertaking a second round of interviews with the same people, and also interviewing a second category of respondents (professionals as well as the patients I have already interviewed), so I want to be able to differentiate between baseline and follow-up interviews, between treatment types, and between patient and professional, and also to specify the gender and location of respondents (amongst other things).

Having previously found that reasonably simple (as mentioned above, it was easy to specify that my sources were from one country or the other, or that they were blog posts rather than newspaper articles, by in effect 'tagging' the source), I wasn't expecting this to be a big issue. So I have to say that I am not happy with the amount of time I have spent trying to work out how to do this in NVivo 9. I first tried a few weeks ago, but got so confused by the help topics that I gave up. I tried again today and spent pretty much the whole afternoon on it, writing down everything I did as I went along (as I won't remember it otherwise), and I still haven't got there. It seems that in adding all the bells and whistles, the manufacturers have created something so counter-intuitive that I am tearing my hair out. Specifying the gender and location of a respondent should not be difficult – previous versions managed it in a few clicks – but in this version it is spectacularly complicated. The help topics are no help at all: they tell me what to do up to a point, and describe what I should be seeing on the screen, but they don't tell me why things are as they are. I appreciate that I am no techie, but I am not stupid and have managed with other software packages, so I know I'm not completely thick!

If I want to record that someone is a woman based in Edinburgh, the most logical thing, it seems to me, would be to 'tag' the interview transcript with those attributes. In this version, however, I apparently need to create a new node (code) unique to that person (which then sits alongside all my thematic codes), classify that new node with the classification 'gender', and from there I should then be able to specify male or female. But following the attributes help page I can only get as far as classifying the node; adding the separate attributes (male, female) within it is not obvious or intuitive at all. Once I get that far I learn that attribute values are either 'unassigned' or 'not applicable', but nowhere does it tell me (a) the difference between these, (b) what they signify, or (c) how to add the specific values 'male' and 'female'. I'm amazed I have any hair left.
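To make the indirection concrete, here is the shape of the data model as far as I have been able to reconstruct it, sketched in Python. This is purely my own illustration of my current understanding, with invented names throughout; it has nothing to do with NVivo's actual internals, and I may well have details wrong, which is rather the point.

```python
# Purely illustrative sketch of how I *think* NVivo 9's classifications work;
# all names are invented and this is nothing to do with NVivo's internals.

# What I expected: tag the source (the transcript) directly with attributes.
expected_model = {
    "follow_up_interview_01.doc": {"gender": "female", "location": "Edinburgh"},
}

# What NVivo 9 seems to want instead: a classification defines the available
# attributes, a node per respondent carries the attribute values, and the
# source is coded to that node. Note the defaults 'unassigned' and
# 'not applicable', whose difference the help pages never explain.
classifications = {
    "person": {
        "gender": ["male", "female", "unassigned", "not applicable"],
        "location": ["Edinburgh", "unassigned", "not applicable"],
    }
}

case_nodes = {
    "Respondent 01": {
        "classification": "person",
        "attributes": {"gender": "female", "location": "Edinburgh"},
    }
}

sources = {
    "follow_up_interview_01.doc": {"coded_to": ["Respondent 01"]},
}

# Retrieval then walks source -> respondent node -> attributes, e.g. to get
# "everything female respondents in Edinburgh say about X".
def sources_matching(**wanted):
    hits = []
    for name, source in sources.items():
        for node in source["coded_to"]:
            attrs = case_nodes[node]["attributes"]
            if all(attrs.get(k) == v for k, v in wanted.items()):
                hits.append(name)
    return hits

print(sources_matching(gender="female", location="Edinburgh"))
```

The attributes hang off a per-person node rather than off the transcript itself, which is the extra level of indirection that has had me pulling my hair out all afternoon.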

I am persevering with this (a) because NVivo is the software the university uses and supports, and (b) because once I manage this simple thing I know my coding and retrieval will be relatively smooth and uncomplicated. But I really do want to register my frustration that, in making their software all-singing and all-dancing, QSR seem to me to have thrown out the simple basic functionality that will always be the foundation for any of the more complex searching and analysis. While I am not the world's biggest technophile, I am comfortable with, and relatively adept at, using technology to assist my work, and I shouldn't have to spend entire afternoons trying to figure out how to tag a respondent with basic demographic data. Although I mainly want to use NVivo for coding and retrieval, I like that once it's set up I can, if I want, be more creative with it and run quite complex queries. But in order to do that I need to get the basics right, and so it seems obvious to me that the basics need to be simple, intuitive and quick.

To give credit where it is due, when I tweeted the NVivo support people they did respond quickly (allowing for the different time zones), but ultimately they directed me back to the NVivo help pages, which as outlined above make an awful lot of assumptions about prior knowledge and give no illustrative examples. It is still not clear to me why I have to assign demographic classifications to nodes (codes) rather than to sources (transcripts); it is still not clear to me what the difference is between the 'unassigned' and 'not applicable' attribute values; and most of all it is still not clear to me why something so simple has to be so frustratingly complicated. I really hope QSR do something about this in the next version, because I have wasted so much time on it today that I am thoroughly grumpy and not at all inclined to recommend their software as a research tool. If anybody has any suggestions for alternatives (especially open-source ones), I'd be all ears.