From: Roland, R. M <rmr...@in...> - 2003-06-11 16:26:18
|
Adam, First off, as much as I respect Richard, I'm not him. :) But I = understand, those 'R' names just start to run together. Secondly, that = search function I sent you for pagemaster should only use 2 queries. = However, because I tried to avoid changing anything other than that = function, I had to do alot of array jumping and searching to determine = what page a section belonged to. It's not pretty, but adding that field = to the sections table would greatly simplify the process. =20 Actually, I wrote this post to the mailing list a good three weeks ago, = but it seemed it took awhile to get moderated to the list. Not sure how = to get around this though.=20 Thanks, Ryan Roland Application Developer Information Technology Department Division of Recreational Sports Indiana University rmroland@in... Office: 812.855.9617 Cell: 812.320.0032 -----Original Message----- From: adam@tu... Sent: Wednesday, June 11, 2003 08 31 To: php...@li... Subject: Re: [Phpwebsite-developers] Refactor of Pagemaster Search Richard, You've got me looking closer at the search function in pagemaster now. = :) I think I will add a page_id column to the section table so the section will know it's parent. Then it should be possible to pull all search results using only 2 queries as opposed to N+1 queries. I will make = sure there is an update in the update.php file for pagemaster and release it with a patch release we have planned for sometime this week. I hope = this will help you out as well as all our users who rely on heavy searching = of pages. Adam > I was looking at the search module, (specifically with regard to the > Pagemaster module) and I noticed that the search performed in = Pagemaster > loops through all pages and sections (with a query PER section). That > means if you have N sections in your entire site, that the search = performs > exactly N+1 individual queries to the database. > > I'm working on a possible replacement for the current pagemaster = search > function, and the only stumbling block is that I get sections, but = can't > easily determine their parent pages from them. I'm sure this = difficulty > is why the current scheme was used, but I'm trying to get to where = there > are AT MOST N+2 queries issued to the DB (and that's only if every = single > section has a match to the search criteria). > > Currently I'm looking at trying to use the 'section_order' field in = the > pages table to determine if a given section is in a page, so I ask if > anyone can tell me now if that is a dead end or not. > > Any recommendations would be appreciated. > > > Thanks, > > > Ryan Roland > > Application Developer > Information Technology Department > Division of Recreational Sports > Indiana University > > rmroland@ind... > > > > ------------------------------------------------------- > This SF.net email is sponsored by: Etnus, makers of TotalView, The = best > thread debugger on the planet. Designed with thread debugging features > you've never dreamed of, try TotalView 6 free at www.etnus.com. > _______________________________________________ > Phpwebsite-developers mailing list > Php...@li... > https://lists.sourceforge.net/lists/listinfo/phpwebsite-developers > --=20 Adam Morton Developer - Electronic Student Services http://phpwebsite.appstate.edu Founder - Appalachian Linux Users Group http://alug.appstate.edu ------------------------------------------------------- This SF.net email is sponsored by: Etnus, makers of TotalView, The best thread debugger on the planet. Designed with thread debugging features you've never dreamed of, try TotalView 6 free at www.etnus.com. _______________________________________________ Phpwebsite-developers mailing list Php...@li... https://lists.sourceforge.net/lists/listinfo/phpwebsite-developers |
From: Roland, R. M <rmr...@in...> - 2003-06-25 18:03:25
|
From Adam's database change, I went through and rewrote the Pagemaster = search function (which I have included below). Search is a large aspect = of what we're doing and I tried to 'googleize' the search results a = little (it's up to you whether this is a good thing or not) to make it = more useful to our users. My version is also streamlined to just two = queries. It also sorts by number of matches found in the content of the = page (ie it strips out all html except META). It also has variables for = maximum number of results to display per page and number of characters = to display on either side of a match. I also tried to be as = non-invasive as possible, so I didn't make any change requirements to = the database and didn't use any additional functions. The only external = change is that this function requires the array of search words to be = passed (ie in the 'mod_search_register' table, the 'search_cols' field = for the Pagemaster record must be blank). =20 If some people can test it out and see what they think, I welcome any = feedback. Thanks, Ryan Roland Application Developer Information Technology Department Division of Recreational Sports Indiana University rmr...@in... Office: 812.855.9617 Cell: 812.320.0032 /** * Function used by search module to search pages * * @author Ryan Roland <rmr...@in...> * @param array Array of search words (strings) * @return array Array of strings of results of search * @access public */ function search($search_array) { // initialization variables =20 // characters on either side of a match to display=20 $chars_surrounding_match =3D 35; =20 // number of results to display per page $max_results_per_page =3D 5; =20 // create where clause $temp_sql =3D ""; $cols_array =3D array("title","text"); for($i=3D0; $i<count($cols_array); $i++) { for($j=3D0; $j<count($search_array); $j++) { $temp_sql .=3D $cols_array[$i] ." LIKE '%" . $search_array[$j] . = "%' "; if($j<count($search_array)) { $temp_sql .=3D "OR "; } } } $where_clause =3D "WHERE (" . substr($temp_sql, 0, -3) . ")"; =20 =20 $resultArray =3D array(); // get all sections with matching string occurences $sql =3D "SELECT * FROM " . $GLOBALS["core"]->tbl_prefix . = "mod_pagemaster_sections $where_clause"; $result =3D $GLOBALS["core"]->getAll($sql); if(!DB::isError($result) && is_array($result) && sizeof($result) > = 0) { $pages =3D array(); $page_count =3D array(); =20 =20 foreach($result as $row) { $section_title =3D strip_tags($row["title"],"<META>"); $section_text =3D strip_tags($row["text"],"<META>"); =20 $pages[$row["page_id"]][$row["id"]]["title"] =3D = $section_title; $pages[$row["page_id"]][$row["id"]]["text"] =3D $section_text; // init number of matches to 0 $section_match_count =3D 0; // loop through the words of the search criteria // counting number of matches found in the section title and = text=20 // this will elminate sections returned by the DB due to matches = in HTML foreach ($search_array as $search_word) { $section_match_count +=3D = substr_count(strtolower($section_title),strtolower($search_word)); $section_match_count +=3D = substr_count(strtolower($section_text),strtolower($search_word)); } =20 if(isset($page_count[$row["page_id"]]))=20 { $page_count[$row["page_id"]] =3D = $page_count[$row["page_id"]] + $section_match_count; } else { $page_count[$row["page_id"]] =3D $section_match_count; } } =20 // sort array ordering by match count highest to lowest arsort($page_count,SORT_NUMERIC); $page_ids =3D implode(" OR ", $page_count); $sql =3D "SELECT id,title,section_order FROM " . = $GLOBALS["core"]->tbl_prefix . "mod_pagemaster_pages WHERE $page_ids"; $result =3D $GLOBALS["core"]->getAll($sql); $ordered_page_results =3D $page_count; if(!DB::isError($result) && is_array($result) && sizeof($result) > = 0) { =20 // loop through all pages foreach($result as $row) { $page_results =3D array(); =20 $page_id =3D $row["id"]; $page_title =3D $row["title"]; $section_order =3D = unserialize($row["section_order"]); $needed_results =3D $max_results_per_page; $pages[$page_id]["page_title"] =3D $page_title; =20 // loop through all sections finding occurrences and parsing = text for ($i=3D0;$i<=3Dsizeof($section_order);$i++) { if ($needed_results > 0) { =20 $section_id =3D $section_order[$i]; =20 $title =3D $pages[$page_id][$section_id]["title"]; $text =3D $pages[$page_id][$section_id]["text"]; =20 foreach ($search_array as $search_word) { // Find occurrences of the search_word(s) in the = sections' title $prev_pos =3D 0; // while there exists another result and we haven't = found all needed results while (($next_pos =3D = strpos(strtolower($title),strtolower($search_word),$prev_pos)) && = $needed_results > 0) { // # of chars from the occurence to the end of the = string $chars_to_end =3D strlen($title) - $next_pos; // add bold tags around the occurence in the text $bolded =3D = substr_replace(substr_replace($title,"<b>",$next_pos,0),"</b>",($next_pos= + 3) + strlen($search_word),0); =20 // set prev_pos to the next char after the current = position $prev_pos =3D $next_pos + 1; // get up to 'chars_surrounding_match' on either side = of the occurrence,=20 // or to the begin/end of the string, whichever comes = first $match_result =3D "..." . substr($bolded,$next_pos - = min($next_pos,$chars_surrounding_match), $chars_surrounding_match * 2) . = "..."; =20 array_push($page_results,$match_result); $needed_results--; }// end while // Find occurrences of the search_word(s) in the = sections' text =20 $prev_pos =3D 0; // while there exists another result and we haven't = found all needed results while (($next_pos =3D = strpos(strtolower($text),strtolower($search_word),$prev_pos)) && = $needed_results > 0) { // # of chars from the occurence to the end of the = string $chars_to_end =3D strlen($title) - $next_pos; // add bold tags around the occurence in the text $bolded =3D = substr_replace(substr_replace($text,"<b>",$next_pos,0),"</b>",($next_pos = + 3) + strlen($search_word),0); =20 // set prev_pos to the next char after the current = position $prev_pos =3D $next_pos + 1; // get up to 'chars_surrounding_match' on either side = of the occurrence,=20 // or to the begin/end of the string, whichever comes = first $match_result =3D "..." . substr($bolded,$next_pos - = min($next_pos,$chars_surrounding_match), $chars_surrounding_match * 2) . = "..."; =20 array_push($page_results,$match_result); $needed_results--; }// end while }// end foreach }// end if }// end for $ordered_page_results[$page_id] =3D $page_results; }// end foreach =20 // loop through results and consolidating them to be returned while($next_page =3D each($page_count)) { $page_id =3D $next_page["key"]; $count =3D $next_page["value"]; =20 $count =3D=3D 1 ? $count_string =3D "match" : $count_string = =3D "matches"; =20 $matches_string =3D ""; foreach($ordered_page_results[$page_id] as $next_match) { $matches_string .=3D $next_match . "<br />"; } =20 $resultArray[$page_id] =3D "<u><b>" . = $pages[$page_id]["page_title"] . "</b> - <b> ". $count ." = </b> ". $count_string ." found : first <b> " . = sizeof($ordered_page_results[$page_id]) . " </b> displayed.</u><br />" . = $matches_string; }// end while }// end if }// end if return $resultArray; }// END FUNC search |
From: <ad...@tu...> - 2003-06-12 16:05:26
|
Ryan, Sorry for the name confusion earlier. I'm really bad with names :) Anyway, I'm finishing up some updates to pagemaster right now that will implement a simplified search with only 2 queries. I added in a page_id column to the mod_pagemaster_sections table. I also implemented a method for adding a URL to uploaded images in sections. I've written an update script the can be used by boost to update current v1.32 to the new v1.4. I will release all this with a patch release we have planned for today or tomorrow. Adam > Adam, > > First off, as much as I respect Richard, I'm not him. :) But I > understand, those 'R' names just start to run together. Secondly, that > search function I sent you for pagemaster should only use 2 queries. > However, because I tried to avoid changing anything other than that > function, I had to do alot of array jumping and searching to determine > what page a section belonged to. It's not pretty, but adding that field > to the sections table would greatly simplify the process. > > Actually, I wrote this post to the mailing list a good three weeks ago, > but it seemed it took awhile to get moderated to the list. Not sure how > to get around this though. > > > > Thanks, > > > Ryan Roland > > Application Developer > Information Technology Department > Division of Recreational Sports > Indiana University > > rmroland@in... > Office: 812.855.9617 > Cell: 812.320.0032 > > > -----Original Message----- > From: adam@tu... > Sent: Wednesday, June 11, 2003 08 31 > To: php...@li... > Subject: Re: [Phpwebsite-developers] Refactor of Pagemaster Search > > > Richard, > > You've got me looking closer at the search function in pagemaster now. :) > I think I will add a page_id column to the section table so the section > will know it's parent. Then it should be possible to pull all search > results using only 2 queries as opposed to N+1 queries. I will make sure > there is an update in the update.php file for pagemaster and release it > with a patch release we have planned for sometime this week. I hope this > will help you out as well as all our users who rely on heavy searching of > pages. > > Adam > >> I was looking at the search module, (specifically with regard to the >> Pagemaster module) and I noticed that the search performed in Pagemaster >> loops through all pages and sections (with a query PER section). That >> means if you have N sections in your entire site, that the search >> performs >> exactly N+1 individual queries to the database. >> >> I'm working on a possible replacement for the current pagemaster search >> function, and the only stumbling block is that I get sections, but can't >> easily determine their parent pages from them. I'm sure this difficulty >> is why the current scheme was used, but I'm trying to get to where there >> are AT MOST N+2 queries issued to the DB (and that's only if every >> single >> section has a match to the search criteria). >> >> Currently I'm looking at trying to use the 'section_order' field in the >> pages table to determine if a given section is in a page, so I ask if >> anyone can tell me now if that is a dead end or not. >> >> Any recommendations would be appreciated. >> >> >> Thanks, >> >> >> Ryan Roland >> >> Application Developer >> Information Technology Department >> Division of Recreational Sports >> Indiana University >> >> rmroland@ind... >> >> >> >> ------------------------------------------------------- >> This SF.net email is sponsored by: Etnus, makers of TotalView, The best >> thread debugger on the planet. Designed with thread debugging features >> you've never dreamed of, try TotalView 6 free at www.etnus.com. >> _______________________________________________ >> Phpwebsite-developers mailing list >> Php...@li... >> https://lists.sourceforge.net/lists/listinfo/phpwebsite-developers >> > > > -- > Adam Morton > Developer - Electronic Student Services > http://phpwebsite.appstate.edu > Founder - Appalachian Linux Users Group > http://alug.appstate.edu > > > > ------------------------------------------------------- > This SF.net email is sponsored by: Etnus, makers of TotalView, The best > thread debugger on the planet. Designed with thread debugging features > you've never dreamed of, try TotalView 6 free at www.etnus.com. > _______________________________________________ > Phpwebsite-developers mailing list > Php...@li... > https://lists.sourceforge.net/lists/listinfo/phpwebsite-developers > > > ------------------------------------------------------- > This SF.NET email is sponsored by: eBay > Great deals on office technology -- on eBay now! Click here: > http://adfarm.mediaplex.com/ad/ck/711-11697-6916-5 > _______________________________________________ > Phpwebsite-developers mailing list > Php...@li... > https://lists.sourceforge.net/lists/listinfo/phpwebsite-developers > -- Adam Morton Developer - Electronic Student Services http://phpwebsite.appstate.edu Founder - Appalachian Linux Users Group http://alug.appstate.edu |