Friday, May 4, 2007

Important differences from IRIS

Reading about the IRIS system (here's the paper) built by SRI, I feel humbled. These people have done a huge amount of work on the project and are miles ahead of my ideas, which seem very primitive in comparison. I am in awe of the work being done on IRIS, but here are a few points on which the SSD (Simple Semantic Desktop) will be different:
- Incorporating PIM applications vs. plug-ins - IRIS incorporates full applications within itself because the aim is to create the universal ontology-aided information client. I like the approach of Gnowsis better. I think it is better for the Semantic Desktop to use plug-ins with different applications, so that the user is not tied to specific information access applications. By plug-ins, I mean both plug-ins inside the client application (say, Microsoft Outlook VBA programs or Mozilla add-ons) and plug-ins that let the semantic desktop receive data from the applications. Surely this is hard, and we would like to control the information clients ourselves to provide better usability, but usability differs from user to user. If I'm used to using Firefox as my browser and Gaim as my IM client, I am bound to resist changing those applications because I'm human.
- Personal Semantic Desktop vs. Professional Semantic Desktop - I would like to create a Semantic Desktop that is personal and knows about everything, including the user's music, photos, and blogs. I think the wall between personal and professional is very thin. If a semantic desktop system is aimed only at work-related projects and work-related files, then we're giving the user one extra classification problem to solve: do I put this file in the semantic desktop or not? I think that choice shouldn't exist. Instead, enough trust should be built with the user that every file in every partition accessible to the user is indexed and known to the Semantic Desktop.
I want a Semantic Desktop that can discover ontologies in my music. If I've e-mailed my hypothetical girlfriend an mp3 of the latest Nine Inch Nails track, the Semantic Desktop should show a connection there. Again, this might be going too far in the blog-everything-in-your-life direction, but if the string "Nine Inch Nails" appears both in an e-mail and in my song database, why shouldn't the Semantic Desktop see some connection?
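To make the "Nine Inch Nails" example concrete, here is a minimal sketch of what I have in mind. It is only an illustration: the item ids, the `find_connections` function, and the idea of passing in a list of known names are all my assumptions, not how IRIS or Gnowsis actually work. It does naive case-insensitive substring matching and reports names that show up in at least two items.

```python
from collections import defaultdict


def find_connections(items, known_names):
    """Map each known name to the ids of the items whose text mentions it.

    `items` is a hypothetical {item_id: text} index of e-mails, songs, files, etc.
    Only names mentioned by at least two different items are returned.
    """
    mentions = defaultdict(set)
    for item_id, text in items.items():
        lowered = text.lower()
        for name in known_names:
            # Naive case-insensitive substring match; a real system would
            # need something smarter than this.
            if name.lower() in lowered:
                mentions[name].add(item_id)
    return {name: sorted(ids) for name, ids in mentions.items() if len(ids) >= 2}


# Made-up sample data, for illustration only.
items = {
    "email:42": "Here is the latest Nine Inch Nails track, hope you like it!",
    "song:7": "Artist: Nine Inch Nails",
    "doc:3": "Quarterly budget report",
}

connections = find_connections(items, ["Nine Inch Nails"])
print(connections)
```

Here the e-mail and the song end up linked through the shared name, while the budget report is left alone. Plain string matching will of course produce false connections, so I would treat this as a starting point rather than a design.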

- 'Learning in the wild' vs. 'Learning in the wild as a Community' - It certainly sounds like an interesting idea to build a system that learns as it sees more and more data from the user. Such a system should, in principle, make better guesses about the user's ontological data the more it learns.
I think there are lots of similarities between user data across different machines. If you look at the "My Documents" folders of two different college deans, say, you'll see similar documents put in similarly organized folders. Since the Simple Semantic Desktop will index application files too, the similarities between application files will be huge, and so it makes sense to me to have a central networked repository of learning data instead of one limited to a single machine.
Instead of having many installations of the software try to learn individually, we can have them learn together and thus learn much faster. Learning can also be guided this way by the people developing the Simple Semantic Desktop.
So, in essence, the Semantic Desktop grows "smarter" the more people use it and the more its developers research new methodologies, with the improvements delivered through updates much like those of antivirus software.
There is a huge lurking privacy issue here, and I'm a little reluctant to open that can of worms. I think that if we treat security as a primary concern, and if we make sure that the statistics collected and sent to the central repository are guaranteed not to reveal any useful information about the user, then we can actually have a collaboratively learning Semantic Desktop.
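As a rough sketch of the community-learning idea, assume each installation reports only counts of (folder name, file extension) pairs, and the central repository keeps a pair only if enough different installations reported it. The function names, the choice of statistic, and the `min_installations` threshold are all hypothetical, and this does not by itself guarantee privacy (folder names alone can be revealing); it only shows the shape of the idea.

```python
from collections import Counter


def local_statistics(files):
    """Count (folder name, file extension) pairs on one machine.

    `files` is a list of (folder, filename) tuples. File names and
    contents are never included in the result, only the counts.
    """
    stats = Counter()
    for folder, filename in files:
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        stats[(folder.lower(), ext)] += 1
    return stats


def merge_statistics(reports, min_installations=2):
    """Merge per-installation reports in the (hypothetical) central repository.

    A pair is kept only if at least `min_installations` different
    installations reported it, so one-off patterns are dropped.
    """
    seen_on = Counter()   # number of installations that reported each pair
    totals = Counter()    # summed counts across all installations
    for report in reports:
        for pair, count in report.items():
            seen_on[pair] += 1
            totals[pair] += count
    return {pair: totals[pair] for pair in totals if seen_on[pair] >= min_installations}


# Two made-up "college dean" machines, for illustration only.
dean_a = local_statistics([
    ("Budgets", "fy2007.xls"),
    ("Budgets", "fy2006.xls"),
    ("Letters", "recommendation.doc"),
])
dean_b = local_statistics([
    ("budgets", "plan.xls"),
    ("Photos", "holiday.jpg"),
])

shared = merge_statistics([dean_a, dean_b])
print(shared)
```

Only the pattern both machines share (spreadsheets in a "budgets" folder) survives the merge; the letters and photos that appear on a single machine are dropped before anything is learned from them.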
