Re: [Sequoiadb-discuss] Sequoia gets stuck (probably) because of SELECT...FOR UPDATE

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi Emmanuel,
Thanks a lot. That was really helpful reply. I will try to fix the issue
with my client application  and see is it works.

Regards,
Rohit

On Wed, Nov 23, 2011 at 5:21 PM, Emmanuel Cecchet <ce...@gm...> wrote:

>  Hi,
>
> Thank you very much for your quick reply emmanuel.  I have already tried
> setting the timeout in load balancer but it doesn't work for me I guess :(
> I have been facing this problem with the SELECT.. FOR UPDATE statements.
> While going through the source code of sequoia I found this in
>
> sequoia-2.10.10-src/src/org/continuent/sequoia/controller/requests/SelectRequest.java
>
>  public class SelectRequest extends AbstractRequest implements
> Serializable
> {
>   private static final long      serialVersionUID = 6498520472410320514L;
>
>    /**
>    * Set to true if this SelectRequest must be broadcasted on the cluster
>    * (useful for queries like SELECT FOR UPDATE
>    */
>   private boolean                mustBroadcast    = false;
>             <Line 76>
>
>  Do I have to change mustBroadcast to true and recompile the sequoia to
> enable it to do SELECT...FOR UPDATE ?
>
> No this value is automatically set by the parser. When the select
> statement is parsed, it is considered as a write on the table and is
> broadcast to all databases to acquire a write lock.
>
>
>  Also, here's the request log from a similar test (it again got stuck at
> same point) .
> I apologize for copy pasting the entire log here.
>
>  06:42:10,221 DEBUG controller.core.PingResponder Response to ping sent
> to /127.0.0.1
> 06:42:11,207 DEBUG controller.core.PingResponder Received ping data from /
> 127.0.0.1
> 06:42:11,207 DEBUG controller.core.PingResponder Response to ping sent to /
> 127.0.0.1
> 06:42:12,351 DEBUG sequoia.controller.loadbalancer Building wait-for
> graph...
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer warehouse locked by 13
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer new_order locked by 12
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer district locked by 13
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer Walking wait-for
> graph...
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer 17 waits for new_order
> locked by 12
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer     12 waits for
> district locked by 13
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer 15 waits for new_order
> locked by 12
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer     12 waits for
> district locked by 13
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer 14 waits for warehouse
> locked by 13
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer Choosing victim node...
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer 0 cycles detected
> 06:42:12,352 DEBUG sequoia.controller.loadbalancer No victim
>
> Looks like transaction 13 is not completing. I have reduced your log to
> the incriminating transactions below:
>
>
>  *Request.log*
>
> 06:41:26,450 tpcw B 12
> 06:41:26,450 tpcw S 175 12 SELECT c_discount, c_last, c_credit, w_tax
>  FROM customer, warehouse WHERE w_id = ? AND w_id = c_w_id AND c_d_id = ?
> AND c_id = ?/<!%I|1|!%><!%I|4|!%><!%I|1258|!%>
> 06:41:26,664 tpcw S 178 12 SELECT d_next_o_id, d_tax FROM district WHERE
> d_id = ? AND d_w_id = ? FOR UPDATE/<!%I|4|!%><!%I|1|!%>
> 06:41:26,673 tpcw W 179 12 INSERT INTO NEW_ORDER (no_o_id, no_d_id,
> no_w_id) VALUES ( ?, ?, ?)/<!%I|3001|!%><!%I|4|!%><!%I|1|!%>
> 06:41:26,917 tpcw W 181 0 UPDATE stock SET s_quantity = ? , s_ytd = s_ytd
> + ?, s_remote_cnt = s_remote_cnt + ?  WHERE s_i_id = ? AND s_w_id =
> ?/<!%I|71|!%><!%I|2|!%><!%I|0|!%><!%I|92253|!%><!%I|2|!%>
> 06:41:27,633 tpcw B 13
> 06:41:27,633 tpcw W 188 13 UPDATE warehouse SET w_ytd = w_ytd + ?  WHERE
> w_id = ?/<!%F|56.68|!%><!%I|1|!%>
> 06:41:27,680 tpcw S 190 13 SELECT w_street_1, w_street_2, w_city, w_state,
> w_zip, w_name FROM warehouse WHERE w_id = ?/<!%I|1|!%>
> 06:41:27,681 tpcw W 191 13 UPDATE district SET d_ytd = d_ytd + ? WHERE
> d_w_id = ? AND d_id = ?/<!%F|56.68|!%><!%I|1|!%><!%I|4|!%>
> 06:41:28,664 tpcw B 14
> 06:41:28,664 tpcw W 204 14 UPDATE warehouse SET w_ytd = w_ytd + ?  WHERE
> w_id = ?/<!%F|4539.02|!%><!%I|2|!%>
> 06:41:30,433 tpcw B 15
> 06:41:30,433 tpcw S 226 15 SELECT c_discount, c_last, c_credit, w_tax
>  FROM customer, warehouse WHERE w_id = ? AND w_id = c_w_id AND c_d_id = ?
> AND c_id = ?/<!%I|3|!%><!%I|2|!%><!%I|2751|!%>
> 06:41:30,500 tpcw S 231 15 SELECT d_next_o_id, d_tax FROM district WHERE
> d_id = ? AND d_w_id = ? FOR UPDATE/<!%I|2|!%><!%I|3|!%>
> 06:41:30,512 tpcw W 233 15 INSERT INTO NEW_ORDER (no_o_id, no_d_id,
> no_w_id) VALUES ( ?, ?, ?)/<!%I|3001|!%><!%I|2|!%><!%I|3|!%>
> 06:41:32,424 tpcw B 17
> 06:41:32,424 tpcw S 249 17 SELECT c_discount, c_last, c_credit, w_tax
>  FROM customer, warehouse WHERE w_id = ? AND w_id = c_w_id AND c_d_id = ?
> AND c_id = ?/<!%I|2|!%><!%I|2|!%><!%I|1676|!%>
> 06:41:32,487 tpcw S 250 17 SELECT d_next_o_id, d_tax FROM district WHERE
> d_id = ? AND d_w_id = ? FOR UPDATE/<!%I|2|!%><!%I|2|!%>
> 06:41:32,489 tpcw W 251 17 INSERT INTO NEW_ORDER (no_o_id, no_d_id,
> no_w_id) VALUES ( ?, ?, ?)/<!%I|3001|!%><!%I|2|!%><!%I|2|!%>
>
>
> I found the problem. The lock that transaction 12 acquired on the district
> table (in its select for update) has not been properly added and the
> deadlock has not been detected.
> So on one side, Sequoia has a bug with not properly updating its lock
> table on this select for update (the query parsing code might have an
> issue).
> On the other side, your transactions are not serializable and they
> deadlock (the application should be fixed). By serialiazing writes per
> table, Sequoia exposes these bugs faster than if you were running directly
> on top of the database (but you should be able to obtain the same deadlock
> directly on the database). Actually is stuck on the database deadlock. I
> don't remember if the version of Postgres you are using has a global
> transaction timeout that you can set so that Postgres would kill the
> deadlocked transactions itself (quick fix).
>
> Hope this helps
> Emmanuel
>
>
>
> On Wed, Nov 23, 2011 at 3:09 PM, Emmanuel Cecchet <ce...@gm...>wrote:
>
>>  Hi,
>>
>> From your log it looks like everyone is waiting for that transaction 65
>> to either commit or rollback.
>> If your thread has crashed and there is no rollback or connection closing
>> in a catch block, all the locks are going to be retained forever (thought
>> there is a possibility to set timeout on transactions in the load balancer
>> settings if I recall).
>> Without the log of the exact requests, it's hard to tell you more than
>> that.
>>
>> Keep us posted with your findings,
>> Emmanuel
>>
>>
>> On 11/23/2011 06:30, rohit dhamane wrote:
>>
>>  Hi,
>>
>>  I have been facing some problem with sequoia. I am trying to run
>> BenchmarkSql-2.3 (http://sourceforge.net/projects/benchmarksql/) with
>> sequoia over two postgresql-7.3 servers but it seems to get stuck
>> indefinitely at the same point. I think it's happening because of the
>> SELECT...FOR UPDATE but not really sure how :( It seems that the
>> transactions are getting into a deadlock. Here's the out put when I turned
>> on the debug mode.
>>
>>  02:56:49,388 DEBUG controller.core.PingResponder Received ping data
>> from /127.0.0.1
>> 02:56:49,388 DEBUG controller.core.PingResponder Response to ping sent to
>> /127.0.0.1
>> 02:56:49,775 DEBUG sequoia.controller.loadbalancer Building wait-for
>> graph...
>> 02:56:49,776 DEBUG sequoia.controller.loadbalancer new_order locked by 64
>> 02:56:49,777 DEBUG sequoia.controller.loadbalancer district locked by 65
>> 02:56:49,777 DEBUG sequoia.controller.loadbalancer warehouse locked by 65
>> 02:56:49,777 DEBUG sequoia.controller.loadbalancer Walking wait-for
>> graph...
>> 02:56:49,778 DEBUG sequoia.controller.loadbalancer 64 waits for district
>> locked by 65
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer 68 waits for new_order
>> locked by 64
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer     64 waits for
>> district locked by 65
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer 67 waits for warehouse
>> locked by 65
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer 66 waits for new_order
>> locked by 64
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer     64 waits for
>> district locked by 65
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer Choosing victim node...
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer 0 cycles detected
>> 02:56:49,779 DEBUG sequoia.controller.loadbalancer No victim
>> 02:56:50,389 DEBUG controller.core.PingResponder Received ping data from /
>> 127.0.0.1
>> 02:56:50,389 DEBUG controller.core.PingResponder Response to ping sent to
>> /127.0.0.1
>>
>>
>>
>>  Here's my configuration file..
>>
>>  <?xml version="1.0" encoding="UTF-8"?>^M
>> <!DOCTYPE SEQUOIA PUBLIC "-//Continuent//DTD SEQUOIA @VERSION@//EN" "
>> http://sequoia.continuent.org/dtds/sequoia-@VERSION@.dtd">^M
>> ^M
>> <SEQUOIA>^M
>> ^M
>>   <VirtualDatabase name="tpcw">^M
>> ^M
>>     <AuthenticationManager>^M
>>       <Admin>^M
>>         <User username="postgres" password=""/>^M
>>       </Admin> ^M
>>       <VirtualUsers>^M
>>         <VirtualLogin vLogin="postgres" vPassword=""/>^M
>>       </VirtualUsers>^M
>>     </AuthenticationManager>^M
>> ^M
>>     <DatabaseBackend name="localhost" driver="org.postgresql.Driver"
>> driverPath="/home/rohit/Software/CJDBC/sequoia-2.10.10-bin/drivers/postgresql7.3.jar"
>> url="jdbc:postgresql://localhost:6551/tpcw" connectionTestStatement="select
>> now()">^M
>>
>>        <ConnectionManager vLogin="postgres" rLogin="postgres"
>> rPassword="">^M
>>         <VariablePoolConnectionManager initPoolSize="20" minPoolSize="5"
>> maxPoolSize="0" idleTimeout="180" waitTimeout="0"/>^M
>>       </ConnectionManager>^M
>>     </DatabaseBackend>^M
>> ^M
>>     <DatabaseBackend name="Remote" driver="org.postgresql.Driver"
>> driverPath="/home/rohit/Software/CJDBC/sequoia-2.10.10-bin/drivers/postgresql7.3.jar"
>> url="jdbc:postgresql://192.168.164.136:6552/tpcw"
>> connectionTestStatement="select now()">^M
>>       <ConnectionManager vLogin="postgres" rLogin="postgres"
>> rPassword="">^M
>>         <VariablePoolConnectionManager initPoolSize="20" minPoolSize="5"
>> maxPoolSize="0" idleTimeout="180" waitTimeout="0"/>^M
>>       </ConnectionManager>^M
>>     </DatabaseBackend>^M
>>      <RequestManager>^M
>>       <RequestScheduler>^M
>>          <RAIDb-1Scheduler level="passThrough"/>^M
>>       </RequestScheduler>^M
>> ^M
>>       <RequestCache>^M
>>          <MetadataCache/>^M
>>          <ParsingCache/>                 ^M
>>          <ResultCache granularity="table" />^M
>>       </RequestCache>^M
>> ^M
>>       <LoadBalancer transactionIsolation="serializable">^M
>>          <RAIDb-1>^M
>>             <WaitForCompletion policy="all"
>>             deadlockTimeoutInMs="30000"/>^M
>>             <RAIDb-1-RoundRobin/>^M
>>          </RAIDb-1>^M
>>       </LoadBalancer>^M
>>      </RequestManager>^M
>> ^M
>>   </VirtualDatabase>^M
>> ^M
>> </SEQUOIA>^M
>>
>>  If anybody has anu idea why this happens please let me know. I would
>> appreciate it a lot. Thanks in advance.
>>
>>  Regards,
>> Rohit
>>
>>
>>  ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure
>> contains a definitive record of customers, application performance,
>> security threats, fraudulent activity, and more. Splunk takes this
>> data and makes sense of it. IT sense. And common sense.http://p.sf.net/sfu/splunk-novd2d
>>
>>
>>
>> _______________________________________________
>> Sequoiadb-discuss mailing lis...@li...://lists.sourceforge.net/lists/listinfo/sequoiadb-discuss
>>
>>
>>
>
>