Lifelogging has recently found its way into public consciousness. More and more devices, sensors and applications are becoming available to end users. In this blog post I want to discuss what implications, challenges and opportunities arise for research in the area of information retrieval (IR). Lifelogging is the process of automatically capturing and storing every possible piece of information about a person’s life. A great definition comes from Dodge & Kitchin in their 2007 paper titled “‘Outlines of a world coming into existence’: pervasive computing and the ethics of forgetting”. They define lifelogging as “…a form of pervasive computing consisting of a unified digital record of the totality of an individual’s experiences, captured multimodally through digital sensors and stored permanently as a personal multimedia archive.” Furthermore, they describe the goal of lifelogging as having “…a record of the past that includes every action, every event, every conversation, every material expression of an individual’s life; all events will be accessible at a future date because a life-log will be a searchable and recallable archive”.
The amount of collected data differs depending on the sensors used. An Autographer camera takes up to 1 million photos every year, which adds up to 480GB per year per person. This is only one example of data collection; data is collected from a whole range of different sensors. Activity tracking, for instance, is possible using the Fitbit One, Withings Pulse or various other tracking devices. There are numerous apps, like Moves or the Sony LifeLog app, that track what you did and where you’ve been. The wealth of data is overwhelming. Suddenly, we know more about a single user than ever before. Instead of only getting information such as ratings for items, we know the context in which a user made a decision. This data opens up new use cases and allows applications to be more useful to a user than previously possible. While a lot of current discussion centers on what is technically possible, we want to examine feasible use cases for lifelogging. In their book Total Recall, Bell and Gemmell identify four main areas for lifelogging use cases:
- Everyday life (social)
Correspondingly, Sellen and Whittaker describe five use cases where lifelogs can be beneficial, the so-called 5 Rs:
- Recollecting: Recalling a specific moment in life (episodic memory).
- Reminiscing: Recalling a specific moment for emotional or sentimental reasons. This can be seen as a special case of Recollecting.
- Retrieving: Retrieving a previously encountered digital item or piece of information, such as documents, email, or Web pages.
- Reflecting: Using a more abstract representation of personal data to facilitate reflection on, and reviewing of, past experience.
- Remembering intentions: Remembering things to do, e.g., remembering to show up for appointments.
These use cases come with some challenges for IR, which are described in the book LifeLogging: Personal Big Data by Gurrin et al.:
- Data gathering: Data collection is time-consuming and requires different sensors and manual effort. Also, the data is private, and thus only data from the users themselves can be used.
- Data analysis: Understanding data from heterogeneous sources, e.g., multimedia, text and sensors, and extracting meaning from it (semantic extraction / semantic organization).
- Search & retrieval: The heterogeneous data makes searching for information more complicated. The retrieval requirements and use cases that come with lifelogging are not yet well understood. Instead of using text queries to find documents, we can now, for instance, find events based on context information we remember.
- Evaluation: Datasets seem to be a problem. As the data is private by nature, public datasets will be hard to get.
- Summarization and data mining: A preliminary step towards a good and helpful presentation that allows the user to take advantage of the collected data. This supports quantified-self style analysis and narrative/story-telling presentation.
- User interaction and presentation: Lifelogging will produce a large amount of data. We need to define likely usage scenarios, potentially omnipresent ones, and even how to support query formulation for many of the use cases. This is currently poorly understood.
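To make the search & retrieval challenge above a bit more concrete, here is a minimal sketch of finding events from remembered context instead of a text query. Everything in it is an assumption for illustration: the `LifelogEvent` record, its field names, the `find_events` helper and the sample data are all made up and do not come from any real lifelogging product or from the book.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LifelogEvent:
    """One captured moment; every field name here is hypothetical."""
    timestamp: datetime
    location: str
    activity: str
    people: frozenset = field(default_factory=frozenset)


def find_events(events, location=None, activity=None, person=None,
                after=None, before=None):
    """Return the events that match every context cue that was supplied.

    Cues left as None are ignored, so a user can search with whatever
    fragments of context they happen to remember.
    """
    matches = []
    for event in events:
        if location is not None and event.location != location:
            continue
        if activity is not None and event.activity != activity:
            continue
        if person is not None and person not in event.people:
            continue
        if after is not None and event.timestamp < after:
            continue
        if before is not None and event.timestamp > before:
            continue
        matches.append(event)
    return matches


# Tiny made-up lifelog, for illustration only.
log = [
    LifelogEvent(datetime(2014, 5, 1, 9, 0), "office", "working"),
    LifelogEvent(datetime(2014, 5, 1, 15, 30), "cafe", "drinking coffee",
                 frozenset({"Anna"})),
    LifelogEvent(datetime(2014, 5, 2, 8, 15), "car", "driving"),
]

# "I was having coffee with Anna somewhere."
hits = find_events(log, activity="drinking coffee", person="Anna")
```

Exact matching on clean labels is of course the easy part. In a real lifelog, the labels themselves would first have to be extracted from raw sensor data, and remembered context is often fuzzy ("sometime last spring"), which is exactly why this area is still poorly understood.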
All of these use cases and challenges will keep us busy for some time to come. A lot of effort is already being put into some of these challenges for certain use cases, but we are just at the beginning and it is exciting to see what will come. Admittedly, one major real-world challenge is filtering out noisy or meaningless data. I just started using an Autographer camera, and most pictures so far (the ones that are not blurred) show me driving, drinking coffee and sitting in front of a computer…
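As a rough idea of what filtering out blurred pictures could look like, here is a small sketch using the variance of the Laplacian, a common sharpness heuristic. It is a toy version under stated assumptions: images are plain nested lists of gray values rather than real photos, the function names are my own, and the threshold of 100.0 is an arbitrary placeholder that would have to be tuned on actual data.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over a 2D grid of gray values.

    A low variance means there are few sharp edges, which often
    (but not always) indicates a blurred picture.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    # Skip the border so every pixel has all four neighbours.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_probably_blurred(gray, threshold=100.0):
    """Flag an image as blurred; the threshold is an assumed placeholder."""
    return laplacian_variance(gray) < threshold


# Synthetic 8x8 test images: a checkerboard (many hard edges) and a
# smooth horizontal gradient (no edges at all).
sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
smooth = [[10 * x for x in range(8)] for y in range(8)]

keep = [name for name, img in (("sharp", sharp), ("smooth", smooth))
        if not is_probably_blurred(img)]
```

A check like this only removes technically bad pictures. Deciding that the hundredth photo of me in front of a computer is meaningless is a much harder, semantic problem.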
What needs to be remembered is that the data belongs to the user. This implies that services in the context of lifelogging should leave the user in full control of the data. The user must be able to decide what the data is used for. The full data must be accessible, e.g., via an API. And of course, the user must be allowed to delete the data. Therefore, I don’t think free services will be the way to go here. As the data can’t be the business, the service itself must be the business, and thus users should pay for it.