Kamat's Potpourri: Making of Kamat Research Database

Making of Kamat Research Database

The story of how we scavenged unusual sources of contents to build a Reference Database

by Vikas Kamat

First published in "Turning Pages- Reflections in Info-times" Commemorative Volume, Informatics, 2005

Background

The idea for Kamat Research Database came to me when I was doing some consulting work for a vendor of scholarly research and reference databases. By this time, I was maintaining a popular website on India (not peer-reviewed or scholarly, but with contributions only from highly reliable sources: my parents), whose readers were viewing more than three million articles a month at that time (2003). What surprised me was that the works of my parents -- two ordinary citizens of India (although they were scholars, they had no academic affiliations) -- were read and referred to by far more people than the work of thousands of professors in hundreds of universities combined!

Of course, this was happening because their work was available on the web, free of cost, accessible to anyone, and indexed by Google.

If you ask any scholar or researcher (in any field) what the most satisfying part of their work is, they will tell you that, other than the quest for knowledge, what matters most to them is how their peers review, honor, and cite their work (in both quality and numbers). So I thought that if I could extend the technology and tools I had developed to publish my parents' research to other scholars, that would be a great service to India.

Scavenging for Content

After my father passed away, one of his admirers wrote in a tribute, "Kamat has distributed the scholarship to the common man; the scholarship that once was a prisoner in our campuses." I have kept this spirit while compiling the database and have included semi-scholarly, yet authoritative, contents (the common man is my target patron). For example, we have gone through each and every issue of National Geographic since 1915 and have abstracted every article dealing with India. Similarly, we have not hesitated to include articles from Illustrated Weekly of India, even though nobody would call it scholarly content.

We found un-indexed content in all kinds of unusual places -- at the Office of the Post Master General, at All India Radio, and at the offices of India's greatest institution -- The Indian National Congress. The material held by these institutions notably included biographies of common and uncommon men and women of India that could not be found elsewhere.

It is very common in some fields of scholarship in India to publish what are known as "Felicitation Volumes" to honor a colleague, and these sometimes include very well-researched contributions. We found that these Felicitation Volumes, like Conference Proceedings, are a valuable, yet ignored, source of premier content.

Information Retrieval and Dissemination

Information retrieval (IR) is an application of computer science in which a software program makes sense of a forest of content to determine what is of interest and what is noise. Google News is an excellent example of Information Retrieval technology. We re-commissioned our crawler (appropriately named Narad) to recognize scholarly content that was useful to us and unleashed him on some known sources. Many of the abstracts available electronically were compiled this way. By mimicking how a researcher would compile his or her sources of reference, they gave the database much-needed volume in terms of the number of abstracts.
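How Narad actually told scholarly content from noise is not described here, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a page can be judged by a handful of cue words; the cue list, the sample pages, and the function names `scholarly_score` and `looks_scholarly` are all hypothetical.

```python
import re

# Hypothetical cue words; a real crawler would need far richer signals.
SCHOLARLY_CUES = ("abstract", "references", "journal", "proceedings", "bibliography")

def scholarly_score(text):
    """Count how many distinct cue words appear in the text as whole words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(1 for cue in SCHOLARLY_CUES if cue in words)

def looks_scholarly(text, min_cues=2):
    """Treat a page as scholarly when at least min_cues cue words are present."""
    return scholarly_score(text) >= min_cues

# Hypothetical crawled pages: name -> extracted text.
pages = {
    "paper": "Abstract: A study of Hoysala temples. Published in the "
             "Journal of Indian History. References follow.",
    "advert": "Buy one, get one free! Offer ends Sunday.",
}

# Keep only the pages that pass the filter; the rest are treated as noise.
kept = [name for name, text in pages.items() if looks_scholarly(text)]
```

A filter this crude would both miss real scholarship and admit noise; it is meant only to show the shape of the decision between content of interest and everything else.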

Content Constellations

There are really two types of research databases: discovery databases (for when you do not know what kind of research is available in a domain) and access databases (for when you are looking for a specific article). Since ours was the former, we had to provide numerous ways for people browsing the database to come across its contents (a browsable database, in contrast with a searchable one). To do this we developed what we called content constellations, in which related contents are tied together by a software algorithm. This process was also important for our business model to work, because of the way search engines (especially Google) determine the strength (called authority) of a content item, and because we could potentially accept advertisements based on the contents of a page.
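The idea of tying related items together can be sketched in a few lines of Python. This is an illustration only, not the algorithm used for the database: it assumes each abstract carries a set of subject keywords and links any two items whose keyword overlap (Jaccard similarity) reaches a threshold. The sample records, the threshold, and the names `jaccard` and `build_constellations` are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two keyword sets: shared / total (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_constellations(items, threshold=0.25):
    """Map each item id to the sorted ids of items with similar enough keywords."""
    related = {item_id: [] for item_id in items}
    # Compare every pair of items once and link both ways when they are close.
    for (id_a, kw_a), (id_b, kw_b) in combinations(items.items(), 2):
        if jaccard(kw_a, kw_b) >= threshold:
            related[id_a].append(id_b)
            related[id_b].append(id_a)
    return {item_id: sorted(ids) for item_id, ids in related.items()}

# Hypothetical sample records: abstract id -> subject keywords.
abstracts = {
    "temple-architecture": {"karnataka", "temples", "architecture"},
    "hoysala-sculpture": {"karnataka", "temples", "sculpture"},
    "postal-history": {"post", "stamps", "history"},
}

constellations = build_constellations(abstracts)
```

Each page could then list its related items as links, which gives a reader several paths into the collection without having to search. Comparing every pair is fine for a small example but would need a smarter index for a large database.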

Current Status (July 2005)

Much, though not all, of the contents of Kamat Research Database is available online, free of cost, at our website at the time of this writing. Links to full-text articles and many primary sources of content are available only to paid subscribers.

We are very proud to have exposed a lot of content that otherwise would not be indexed by search engines like Google or Yahoo.