pavka

Web site for pavka (the company) and Pavel Kalinov (the legend).

Last update: Friday, September 3rd 2010

Let’s Trust Users – It Is Their Search

Topics: Research    Tags: DMOZ, exploration engine, Intelligent Web Exploration, Rocchio relevance feedback, text classification, Yahoo! Directory

The current search engine model treats users as untrustworthy: it gives them no tools to specify what they are looking for or in what context, which severely limits what they can achieve. Instead, search engines try to guess the user's intent and context, currently by means of “implicit feedback”.

In this paper we propose a “web exploration engine”: a model in which users can use the search engine as their own tool and explicitly specify the context of their search. Information about the web is pre-classified into a large number of categories; users can explore this hierarchy by providing relevance feedback, or search within a particular category. Search is truly “local” in the sense that keyword relevance is not global but specific to that category. In contrast to existing search engines, users can explore the web without any keywords, guiding the exploration engine with relevance feedback alone.
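To make keyword-free exploration more concrete, here is a minimal Python sketch of Rocchio relevance feedback (listed in the tags above) applied to ranking categories. It is an illustration only, not the implementation from the paper: the category names, term weights, the `rocchio` and `rank_categories` helpers, and the alpha/beta/gamma values are all assumptions made for this example.

```python
import math

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query vector (dict: term -> weight).

    Standard Rocchio update: keep alpha * query, add beta * mean of the
    relevant documents, subtract gamma * mean of the non-relevant ones.
    """
    updated = {term: alpha * weight for term, weight in query.items()}
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] = updated.get(term, 0.0) + coeff * weight / len(docs)
    # Common simplification: drop terms whose weight is not positive.
    return {term: weight for term, weight in updated.items() if weight > 0}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_categories(query, centroids):
    """Order category names by similarity of their centroid to the query."""
    return sorted(centroids, key=lambda c: cosine(query, centroids[c]), reverse=True)

# Hypothetical category centroids (made-up names and weights).
centroids = {
    "Sports/Cycling": {"bike": 0.8, "race": 0.5, "tour": 0.3},
    "Shopping/Bikes": {"bike": 0.7, "price": 0.6, "shop": 0.4},
    "Travel/Tours": {"tour": 0.8, "hotel": 0.5, "price": 0.3},
}

# The user types no keywords: the query starts empty and is built
# entirely from pages marked as relevant or not relevant.
liked = [{"bike": 1.0, "price": 0.8}]
disliked = [{"race": 1.0, "tour": 0.5}]

query = rocchio({}, liked, disliked)
ranking = rank_categories(query, centroids)
print(ranking)  # ['Shopping/Bikes', 'Sports/Cycling', 'Travel/Tours']
```

The query vector here starts empty, so the ranking is driven by feedback alone. Searching "locally" within one category would additionally require term weights computed per category, which this sketch does not attempt.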

This article was accepted as a short paper at the Web Intelligence 2010 conference in Toronto, Canada (31 August – 4 September 2010).

Download short PDF version: Let’s Trust Users – It Is Their Search (4 pages, as published).

Download full PDF version: Let’s Trust Users – It Is Their Search (8 pages, as originally submitted).

Building a Dynamic Classifier for Large Text Data Collections

Topics: Research    Tags: DMOZ, Intelligent Web Exploration, Multinomial Naive Bayes, SPDA, Stochastic Prior Distribution Adjustment, text classification

Because the web has no built-in tools for navigation, people have to rely on external solutions to find information. The most popular of these are search engines and web directories. Search engines allow users to locate specific information about a particular topic, whereas web directories facilitate exploration of a wider topic. In the recent past, statistical machine learning methods have been successfully exploited in search engines. Web directories, by contrast, have remained in a primitive state, which has led to their decline. Exploration, however, answers a different information need of the user and should not be neglected: web directories should provide a user experience of the same quality as search engines. Improving them with machine learning methods is hindered by the noisy nature of the web, which makes text classifiers unreliable when applied to web data. In this paper we propose Stochastic Prior Distribution Adjustment (SPDA), a variation of the Multinomial Naive Bayes (MNB) classifier that makes it more suitable for classifying real-world data. By stochastically adjusting class prior distributions we achieve a better overall success rate, but more importantly we also significantly improve the distribution of errors across classes, making the classifier equally reliable for all classes and therefore more usable.
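The role of class priors in MNB can be shown with a small Python sketch. This is not SPDA: the abstract does not describe the stochastic adjustment procedure, so the code only exposes the priors as a parameter and compares the empirical priors with hand-picked uniform ones. The toy data, the class names, and the `train_mnb` and `classify` helpers are assumptions made for this example.

```python
import math
from collections import Counter

def train_mnb(docs_by_class, alpha=1.0):
    """Train a Multinomial Naive Bayes model with add-alpha smoothing."""
    vocab = {tok for docs in docs_by_class.values() for doc in docs for tok in doc}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(tok for doc in docs for tok in doc)
        denom = sum(counts.values()) + alpha * len(vocab)
        model[label] = {
            # Empirical prior: the share of training documents in this class.
            "log_prior": math.log(len(docs) / total_docs),
            "log_like": {tok: math.log((counts[tok] + alpha) / denom) for tok in vocab},
        }
    return model

def classify(model, tokens, log_priors=None):
    """Return the most probable class; log_priors overrides the trained priors."""
    def score(label):
        prior = log_priors[label] if log_priors else model[label]["log_prior"]
        likelihood = model[label]["log_like"]
        # Tokens never seen in training are ignored.
        return prior + sum(likelihood[tok] for tok in tokens if tok in likelihood)
    return max(model, key=score)

# Hypothetical, deliberately imbalanced training data: 8 documents vs 2.
training = {
    "news": [["market", "report"]] * 6 + [["market", "fresh"]] * 2,
    "recipes": [["soup", "fresh"]] * 2,
}
model = train_mnb(training)

# Hand-picked uniform priors, used here only for comparison.
uniform = {label: math.log(1 / len(model)) for label in model}

doc = ["fresh"]
print(classify(model, doc))           # news: the large class wins on its prior
print(classify(model, doc, uniform))  # recipes: the likelihood now decides
```

With the empirical priors the small class loses a document that its word statistics favor, which is the kind of uneven error distribution across classes the abstract refers to. How SPDA chooses the adjusted priors is described in the paper itself, not here.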

This article was published at the Twenty-First Australasian Database Conference (ADC2010), Brisbane, Australia, January 2010, part of the Australasian Computer Science Week 2010.

Download full PDF version: Building a Dynamic Classifier for Large Text Data Collections.

Welcome to my site!

Topics: Other stuff    Tags: welcome

Welcome to pavka.com.au!

This is both the personal site of Pavel Kalinov and the site of his web development company, pavka, registered in Australia.

You will find personal information here, as well as some info on projects of the company.


© pavka 2002-2010 | ABN 35 202 429 323