Information Retrieval: Data Structures and Algorithms

Language: English

Pages: 464

ISBN: 0134638379

Format: PDF / Kindle (mobi) / ePub


Information retrieval is a subfield of computer science that deals with the automated storage and retrieval of documents. This guide presents information retrieval data structures and algorithms, including implementations in C. Aimed at software engineers building systems with text processing components, it provides a descriptive and evaluative explanation of storage and retrieval systems, file structures, term and query operations, document operations, and hardware. It contains techniques for handling inverted files, signature files, and file organizations for optical disks. It discusses operations such as lexical analysis and stoplists, stemming algorithms, thesaurus construction, and relevance feedback and other query modification techniques. It also covers Boolean operations, hashing algorithms, ranking algorithms, and clustering algorithms. In addition to being of interest to software engineering professionals, this book will be useful to information science and library science professionals who are interested in text retrieval technology.

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

Database Systems Concepts

Operating Systems: A Spiral Approach

Clustering-Based Support for Software Architecture Restructuring (Software Engineering Research)

Essentials of Error-Control Coding

were flagged as unresolved in the final index and fixed in a later pass. With 48 characters per key it was possible to read 600,000 keys into a 32 MB memory. For a 600 MB text, this means that on average we are reading 1 key per kilobyte, so we can use sequential I/O. Each pass over the file takes between 30 and 45 minutes, and 120,000,000 index points require 200 passes, or approximately 150 hours. Thus, for building an index for the OED, this algorithm is not as effective as the large
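The figures quoted in this excerpt can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names `passes_needed` and `total_hours` are hypothetical and do not come from the book, and the inputs are the numbers stated in the excerpt.

```c
#include <stdio.h>

/* Hypothetical helper: number of passes over the text when each pass
   can hold at most keys_per_pass keys in memory (ceiling division). */
static long passes_needed(long index_points, long keys_per_pass)
{
    return (index_points + keys_per_pass - 1) / keys_per_pass;
}

/* Hypothetical helper: total running time in hours for a given number
   of passes and a given cost per pass in minutes. */
static double total_hours(long passes, double minutes_per_pass)
{
    return (double)passes * minutes_per_pass / 60.0;
}
```

With the excerpt's numbers, `passes_needed(120000000L, 600000L)` gives 200 passes, and `total_hours(200, 45.0)` gives 150 hours, which matches the quoted estimate. At the lower bound of 30 minutes per pass the same run would take 100 hours. The memory figure is also consistent: 600,000 keys of 48 characters occupy 28,800,000 bytes, which fits in 32 MB.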

{
   /* Part 1: Return right away if there is no tree */
   if ( NULL == tree ) return;
   /* Part 2: Deallocate the subtrees */
   if ( NULL != tree->left ) DestroyTree( tree->left );
   if ( NULL != tree->right ) DestroyTree( tree->right );
   /* Part 3: Deallocate the root */
   tree->left = tree->right = NULL;
   (void)free( (char *)tree );
} /* DestroyTree */
/*FN************************************************************************
   GetState( machine, label, signature )
   Returns: int -- state with the given

label
   Purpose: Search a machine and return the state with a given state label
   Plan: Part 1: Search the tree for the requested state
         Part 2: If not found, add the label to the tree
         Part 3: Return the state number
   Notes: This machine always returns a state with the given label because if the

/* The current state has nothing but a label, so */
/* the first order of business is to set up some */
/* of its other major fields */
machine->state_table[state].is_final = FALSE;
machine->state_table[state].arc_offset = machine->num_arcs;
machine->state_table[state].num_arcs = 0;
/* Add arcs to the arc table for the current state */
/* based on the state's derived set. Also set the */
/* state's final flag if the empty string is found */
/* in the suffix list */
current_label =

character during input scan */
register int state; /* current state during DFA execution */
/* Part 1: Return NULL immediately if there is no input */
if ( EOF == (ch = getc(stream)) ) return( NULL );
/* Part 2: Initialize the local variables */
outptr = output;
/* Part 3: Main Loop: Put an unfiltered word into the output buffer */
do
{
   /* scan past any leading delimiters */
   while ( (EOF != ch ) &&
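The excerpt above breaks off inside the scanning loop, so here is a minimal, self-contained sketch of the same general idea: skip leading delimiters, then copy one word into an output buffer. It is not the book's routine. The name `next_word`, the `capacity` parameter, and the assumption that a word is a maximal run of letters are all hypothetical, and the sketch omits the DFA-based stoplist filtering that the excerpt refers to.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not the book's routine): copy the next word from
   stream into output, lowercased, and return output. Returns NULL when
   the input is exhausted. A word is assumed to be a run of letters. */
static char *next_word(FILE *stream, char *output, size_t capacity)
{
    int ch;
    size_t len = 0;

    if (capacity == 0) return NULL;

    /* Scan past any leading delimiters */
    do {
        ch = getc(stream);
    } while (ch != EOF && !isalpha((unsigned char)ch));

    /* Nothing but delimiters remained in the input */
    if (ch == EOF) return NULL;

    /* Copy letters until the next delimiter, truncating long words */
    while (ch != EOF && isalpha((unsigned char)ch)) {
        if (len + 1 < capacity)
            output[len++] = (char)tolower((unsigned char)ch);
        ch = getc(stream);
    }
    output[len] = '\0';

    /* Push the delimiter back so the stream position stays exact */
    if (ch != EOF) ungetc(ch, stream);
    return output;
}
```

Called repeatedly on a stream containing `  Hello, World! 42 ok`, this sketch returns `hello`, `world`, and `ok`, then NULL. Treating digits as delimiters is a simplification made here for brevity; a real lexical analyzer would make that choice according to the collection being indexed.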
