|
From: Kentaro H. <ha...@cl...> - 2016-05-30 03:25:01
|
Hi, Groonga 6.0.3 has been released! http://groonga.org/docs/news.html#release-6-0-3 Install: http://groonga.org/docs/install.html Charcteristics: http://groonga.org/docs/characteristic.html The topics in this release: * Supported command version 3 * Improved performance about between() without indexes * Added number and time plugins * Supported window function This release includes some bug fixes, so please upgrade Groonga! # Topics ## Supported command version 3 In this release, Groonga supports command version 3. Groonga command supports revision by command version. In the previous versions, output format is something like: > status [ [0,1464342851.826973,0.0002036094665527344], { "alloc_count":273, "starttime":1464342849, "start_time":1464342849, "uptime":2, "version":"6.0.3", "n_queries":0, "cache_hit_rate":0.0, "command_version":1, "default_command_version":1, "max_command_version":3 } ] If you specify --comand_version 3, the format of response is changed to object literal: > status --command_version 3 { "header":{ "return_code":0, "start_time":1464342859.177147, "elapsed_time":0.0001118183135986328 }, "body":{ "alloc_count":276, "starttime":1464342849, "start_time":1464342849, "uptime":10, "version":"6.0.3", "n_queries":0, "cache_hit_rate":0.0, "command_version":3, "default_command_version":1, "max_command_version":3 } } It makes JavaScript friendly. ## Improved performance about between() without indexes In this release, performance of between() without indexes is improved. Here is the schema for benchmark. table_create Entries TABLE_NO_KEY column_create Entries rank COLUMN_SCALAR Int32 In the above sample, we expects that sequential search is used. There are four test cases. The number of records are different. * 1,000 * 10,000 * 100,000 * 1,000,000 The sequential number is assigned as rank. During benchmark execution, we applied between() for the following cases. * 1,000 (select the number of rank is greater than 500, less than 600) * 10,000 (select the number of rank is greater than 5,000, less than 5,100) * 100,000 (select the number of rank is greater than 50,000, less than 50,100) * 1,000,000 (select the number of rank is greater than 500,000, less than 500,100) Here is the query which is executed. select Entries --cache no --filter 'between(rank, 500, "exclude", 600, "include")' Here is the benchmark result. Note that median is used. * 1,000 4.3ms -> 0.2590ms * 10,000 39.5ms -> 0.6440ms * 100,000 390ms -> 5.1ms * 1,000,000 3798.3ms -> 44.2ms Here is the logarithmic graph. There is a great performance improvement. http://groonga.org/images/blog/ja/2016-05-29-groonga-between-optimization/benchmark-between-optimization.png ## Added number and time plugins In this release, new plugins are added - time and number plugin. It helps you to classify specific range of value as same value. number plugin provides number_classify function. Let's classify price every 100 value. Here is the sample schema and data. plugin_register functions/number table_create Prices TABLE_PAT_KEY Int32 load --table Prices [ {"_key": 0}, {"_key": 1}, {"_key": 99}, {"_key": 100}, {"_key": 101}, {"_key": 199}, {"_key": 200}, {"_key": 201} ] Use the following query which uses number_classify: select Prices --sortby _id --limit -1 --output_columns '_key, number_classify(_key, 100)' number_classify(_key, 100) classifies the value by _key and classifies every 100. [ [ [8], [ ["_key","Int32"], ["number_classify",null] ], [0,0], [1,0], [99,0], [100,100], [101,100], [199,100], [200,200], [201,200] ] ] As every 100, data between 100 and 199 are treated as same range of price. time plugin provides similar functionality about timestamp. time plugin provides the following functions: * time_classify_second * time_classify_minute * time_classify_hour * time_classify_day * time_classify_week * time_classify_month * time_classify_year Next, classify timestamps every 10 minutes, here is the sample schema and data. plugin_register functions/time table_create Timestamps TABLE_PAT_KEY Time load --table Timestamps [ {"_key": "2016-05-05 22:29:59.999999"}, {"_key": "2016-05-05 22:30:00.000000"}, {"_key": "2016-05-05 22:30:00.000001"}, {"_key": "2016-05-05 22:39:59.999999"}, {"_key": "2016-05-05 22:40:00.000000"}, {"_key": "2016-05-05 22:40:00.000001"} ] To classify every minute, use time_classify_minute function. select Timestamps --sortby _id --limit -1 --output_columns '_key, time_classify_minute(_key, 10)' Here is the result: [ [ [6], [ ["_key","Time"], ["time_classify_minute",null] ], [1462454999.999999,1462454400.0], [1462455000.0,1462455000.0], [1462455000.000001,1462455000.0], [1462455599.999999,1462455000.0], [1462455600.0,1462455600.0], [1462455600.000001,1462455600.0] ] ] The time interval is 10 minutes, data between 22:30 and 22:39 are treated as same range of timestamp (1462455000.0). ## Supported window function Supported window function In this release, window function is supported. For example, you can use record_number which returns nth number of record. Here is the sample schema and data. table_create Items TABLE_HASH_KEY ShortText column_create Items price COLUMN_SCALAR UInt32 load --table Items [ {"_key": "item1", "price": 666}, {"_key": "item2", "price": 999}, {"_key": "item3", "price": 777}, {"_key": "item4", "price": 111}, {"_key": "item5", "price": 333}, {"_key": "item6", "price": 222} ] Let's search price data with nth number of record by ascending order. Here is the query for it. select Items \ --columns[nth_record].stage initial \ --columns[nth_record].value 'record_number()' \ The important point is that the value of record_number() is assigned to dynamic column nth_record. [ [ [6], [ ["_key","ShortText"], ["price","UInt32"], ["nth_record","UInt32"] ], ["item1",666,4], ["item2",999,6], ["item3",777,5], ["item4",111,1], ["item5",333,3], ["item6",222,2] ] ] It is obvious that each record is nth number of record. ## Improvements * [experimental] Added GRN_II_OVERLAP_TOKEN_SKIP_ENABLE and GRN_NGRAM_TOKENIZER_REMOVE_BLANK_DISABLE environment variables to improve performance of N-gram tokenizer. [GitHub#533][Patch by Naoya Murakami] * [table_create] Stopped to ignore nonexistent default tokenizer, normalizer or token filters. In the previous versions, Groonga ignored a typo in --default_tokenizer, --normalizer or --token_filters parameter silently. It caused a delay in finding problems. * [select] output_columns v1: Supported expression such as snippet_html(...) in output_columns. * [select] Removed a limitation about the number of labeled drilldowns. In the previous versions, the number of max labeled drilldowns is limited to 10. * [number_classify] Added a number plugin. Use number_classify function to classify by value. * Added a time plugin. Use time_classify_second, time_classify_minute, time_classify_hour, time_classify_day, time_classify_week, time_classify_month, time_classify_year function to classify by value. * [select] Supported dynamic column creation which is used in output_columns, drilldown or sortby [GitHub#539,#541,#542,#544,#545][Patch by Naoya Murakami]: select \ --columns[LABEL].stage filtered \ --columns[LABEL].type ShortText \ --columns[LABEL].flags COLUMN_SCALAR \ --columns[LABEL].value 'script syntax expression' \ ... * [experimental][select] Improved performance for range/equal search and enough filtered case. Set GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO environment variable to enable this feature. * [select] Supported index used search for filtered tables. * Supported to detect changed database isn't closed. This feature is useful to check database corruption when Groonga is crashed unexpectedly. * [grndb] Supported detecting database wasn't closed successfully case. * Added --drilldown_filter. * Supported filter in labeled drilldown. * Improved performance for using [between] without index. By between() optimization, there is a case that range search is 100x faster than the previous version of between(). * [record_number] Supported window function. * [experimental][select] Supported --slices. * [select] Deprecated --sortby and --drilldown_sortby. Use --sort_keys and -drilldown_sort_keys instead. * [select] Deprecated --drilldown[...]. Use --drilldowns[...] instead. * Added [Command version] 3. It uses object literal based envelope. * [groonga-httpd] Updated bundled nginx version to 1.11.0. ## Fixes * [select] output_columns v2: Fixed a bug that * isn't expand to columns correctly. * Fixed a bug that 1usec information is lost for time value. * Fixed a crash bug when a mruby plugin is initialized with multiple threads. * Fixed a bug that static indexing crashes if a posting list is very long. This bug may occur against enormous size of database. [GitHub#546] ## Thanks * Naoya Murakami -- Kentaro Hayashi <ha...@cl...> |