[sheepdog] [PATCH v2] object cache: fix a race problem

Mon Jun 4 04:04:18 CEST 2012

On 06/03/2012 08:13 PM, Liu Yuan wrote:

> Fix the following problem:
> ...
> Jun 03 18:39:53 do_local_io(52) 2, ac1a3e000019b7 , 1
> Jun 03 18:39:53 object_cache_pull(529) oid ac1a3e000019b7pulled successfully
> Jun 03 18:39:53 object_cache_pull(529) oid ac1a3e000019b7pulled successfully
> Jun 03 18:39:53 create_cache_object(451) 000019b7 already created
> Jun 03 18:39:53 object_cache_rw(415) 000019b7, len 4096, off 1048576
> Jun 03 18:39:53 read_cache_object(396) size 0, count:4096, offset 1048576 File exists
> Jun 03 18:39:53 do_gateway_request(308) failed: 2, ac1a3e000019b7 , 1, 3
> Jun 03 18:39:53 gateway_op_done(151) leaving sheepdog cluster
> ...
> 
> The problem is, suppose two cloned VMs read the same COW oid:
> 
>        A                            B
> 
> object_cache_pull() {        object_cache_pull() {
>   create_cache_object() {      create_cache_object() {
>     open(oid);
>                                   open(oid) {
>                                     oid_already_opened() {
>                                       goto out;
>                                     }
>                                   }
>                                }
>                              }
>                              read_cache_object() {
>                                read_size != requested_length;
>                                return EIO;
>                              }
>     write(oid);
>   }
> }
> 
> The fix looks more like a workaround; I would be happy to see a better fix.

Dropped. The real problem is fcntl: its record locks are owned by the
process, not by the file descriptor, so they give no exclusion between
different FDs on the same file within the same process.
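
To illustrate the fcntl behaviour, here is a minimal standalone sketch.
It is not sheepdog code: the helper names try_wrlock() and
same_process_locks_both_granted() and the scratch path are made up for
the demo. It opens the same file twice in one process and takes a
non-blocking exclusive lock through each descriptor. With POSIX record
locks the second request should be granted as well, because the process
already owns the lock.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to take an exclusive whole-file fcntl lock without blocking.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int try_wrlock(int fd)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "until end of file" */
	};

	return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical demo helper: open 'path' twice in this process and lock
 * it through both descriptors.
 * Returns 1 if both locks are granted, 0 if the second one is refused,
 * -1 on a setup error. The scratch file is removed afterwards. */
static int same_process_locks_both_granted(const char *path)
{
	int fd_a, fd_b, ret;

	fd_a = open(path, O_RDWR | O_CREAT, 0644);
	if (fd_a < 0)
		return -1;

	fd_b = open(path, O_RDWR);
	if (fd_b < 0) {
		close(fd_a);
		unlink(path);
		return -1;
	}

	if (try_wrlock(fd_a) < 0)
		ret = -1;
	else
		/* Same process, different FD: the lock does not conflict. */
		ret = (try_wrlock(fd_b) == 0) ? 1 : 0;

	close(fd_a);
	close(fd_b);
	unlink(path);
	return ret;
}
```

Calling same_process_locks_both_granted("/tmp/fcntl_lock_demo.tmp") from
a single process is expected to return 1, i.e. the second lock is not
refused. So two threads that each open() the cache object and lock it
with fcntl do not exclude each other.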

Thanks,
Yuan