Comments and appendices:

Topic 1. Characteristics of information search on the WWW

On the structure of the WWW:
- Kleinberg, JM. Hubs, authorities, and communities. ACM Computing Surveys, 1999. http://www.cs.brown.edu/memex/ACMCSHT/10/10.html
- A Borodin, GO Roberts, JS Rosenthal, P Tsaparas. Finding authorities and hubs from link structures on the World Wide Web. Proc. WWW 2001. http://www10.org/cdrom/papers/314/

On the typology of web searches:
- Rose, D. and Levinson, D. Understanding User Goals in Web Search. WWW 2004. http://wwwconf.ecs.soton.ac.uk/archive/00000537/01/p13-rose.pdf

On browsing versus querying:
- Marti A. Hearst. Next Generation Web Search: Setting Our Sites. In IEEE Data Engineering Bulletin, 2002. http://www.sims.berkeley.edu/hearst/papers/data-engineering
- A. Peñas, F. Verdejo, J. Gonzalo. Terminology Retrieval: towards a synergy between thesaurus and free text searching. Advances in Artificial Intelligence - IBERAMIA 2002, LNAI 2527, 2002. http://nlp.uned.es/pergamus/pubs/iberamia2002.pdf
Topic 2. Basic architecture of a search engine

On crawling:
- J Cho, H Garcia-Molina, L Page. Efficient Crawling Through URL Ordering. WWW 1998.
- Allan Heydon and Marc Najork. Mercator: A Scalable, Extensible Web Crawler. In Proceedings of World Wide Web Conference, 1999, pages 219-229.

On hardware support:
- L. A. Barroso, J. Dean, U. Hoelzle. Web search for a planet: the Google cluster architecture. IEEE Micro, 2003.
Topic 3. Pre-Google search engines: content-based retrieval
- D Hiemstra. Using Language Models for Information Retrieval. CTIT Ph.D. Thesis, 2001.
- G Salton, A Wong, CS Yang. A Vector Space Model for Automatic Indexing. Comm. ACM, 1975.
- N Fuhr. Probabilistic Models in Information Retrieval. The Computer Journal, 1992.
Topic 4. Current (general-purpose) search engines: authority-based retrieval

References:
- M Hollander. Google's PageRank Algorithm to Better Internet Searching. TR UMN.
- Brin, S. and Page, L. The Anatomy of a Large-Scale Hypertextual Web Search Engine. WWW 1998.
- CHQ Ding, X He, P Husbands, H Zha, HD Simon. PageRank, HITS and a unified framework for link analysis. SIGIR 2002.
- TH Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search. IEEE Transactions on Knowledge and Data Engineering, 2003.
Topic 5. Advanced topics
- Guha, R. and Garg, A. Disambiguating People in Search. Proc. WWW 2004.
- S Lawrence. Context in Web Search. IEEE Data Engineering Bulletin, 2000.
- J Sivic, A Zisserman. Video Google: A text retrieval approach to object matching in videos. ICCV 2003.
- SK Bhavnani, CK Bichakjian, TM Johnson, RJ Little. Strategy Hubs: Next-Generation Domain Portals with Search Procedures. Proc. ACM Conference on Human Factors in Computing Systems, 2003, ACM Press NY, USA.
- T Berners-Lee, J Hendler, O Lassila. The Semantic Web. Scientific American, 2001.
- J Heflin, J Hendler. A Portrait of the Semantic Web in Action. IEEE Intelligent Systems, 2001.
- S Eissen, B Stein. Analysis of Clustering Algorithms for Web-Based Search. Springer-Verlag, 2002.
- J. Cigarrán, A. Peñas, J. Gonzalo, F. Verdejo. Automatic selection of noun phrases as document descriptors in an FCA-based Information Retrieval system. ICFCA 2005, Springer LNCS 3403, 2005.
- Search Engines: Technology, Society, and Business. Online course materials: http://www.sims.berkeley.edu/courses/is141/f05/schedule.html