[stgt] [BUG] Tgt-1.0.8 exited unexpectedly

Mon Sep 27 14:58:04 CEST 2010

Hi,

> > > > I want VastSky which is a cluster storage system to use TGT as its
> > > 
> > > Interesting. There are some similar experiments. IBM does something
> > > similar, and I also did something similar with OpenStack. Red Hat's
> > > Hail does the same with its own iSCSI target implementation.
> > > 
> > > Out of curiosity, can VastSky avoid reading the old data?
> > > 
> > > For example, a WRITE goes to the three replica nodes, then a WRITE to
> > > the same sector fails on the first replica node (e.g. a timeout because
> > > the node is too busy) but succeeds on the other two nodes (so those two
> > > nodes have the newer data).
> > > 
> > > Then if the two nodes having the new data are down, is it possible
> > > that an initiator gets the old data from the first node (when the
> > > initiator issues READ to the same sector)?
> > 
> > In that case, VastSky just returns an EIO error. I won't change this
> 
> I see. Sheepdog works in the same way.
> 
> 
> > policy since volumes VastSky serves are supposed to be used with
> > file-systems on them. A volume that returns wrong data, including
> > old data, will easily cause filesystem corruption.
> 
> Yeah, but even a real disk can return bogus data (that is, silent data
> corruption). So this issue (returning bogus data) is a question of how
> likely it is.

Yes. Modern disks and HBAs can detect bogus data in most cases, but
it can still happen.

> In addition, as you know, recent file systems can handle such
> failures.

Yes, I know some filesystems have such a feature. But there is no point
in returning bogus data instead of an EIO error.

> > VastSky updates all the mirrors synchronously. And only after all
> > the I/O requests are completed, it tells the owner that the request
> > is done.
> 
> Understood. Sheepdog works in the same way.
> 
> How does VastSky detect old data?
> 
> If one of the mirror nodes is down (e.g. the node is too busy),
> does VastSky assign a new node?

Right.
VastSky removes the node that seems to be down from the group
and assigns a new one. After that, no one can access the old one.
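
As a rough illustration only (a sketch with hypothetical names, not
VastSky's actual code), the idea can be modeled as a group membership
check: once a node has been removed from the group, a read directed at
it is refused with EIO instead of returning possibly old data.

```python
import errno


class MirrorGroup:
    """Hypothetical model of a mirror group; not VastSky's real data structure."""

    def __init__(self, members):
        self.members = set(members)

    def replace(self, down_node, new_node):
        # Drop the node that appears to be down and admit its replacement.
        self.members.discard(down_node)
        self.members.add(new_node)

    def read(self, node, alive):
        # A node outside the group may hold old data, so never read from it.
        if node not in self.members:
            raise OSError(errno.EIO, "node is not a member of the mirror group")
        # A member that cannot be reached also yields EIO rather than a guess.
        if node not in alive:
            raise OSError(errno.EIO, "mirror node is unreachable")
        return f"data from {node}"
```

In this model, even if the removed node comes back while all current
members are down, a read still fails with EIO.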

> Then if a client issues READ and all the
> mirrors are down but the old mirror node is alive, how does VastSky
> prevent returning the old data?

About this issue, we have a plan:
When a node is down and it's not because of a hardware error,
we will make VastSky try to re-synchronize the node.
This should take only a few minutes because VastSky tracks all write
I/O requests and so knows which sectors of the node aren't synchronized.
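
To make the idea concrete, here is a minimal sketch of that kind of
bookkeeping. All names are hypothetical and this is not VastSky's
implementation: it only shows why remembering the sectors written while
a node was away lets the resync copy those sectors instead of the whole
volume.

```python
class DirtyTracker:
    """Hypothetical per-node record of sectors written while the node was down."""

    def __init__(self):
        self.dirty = {}  # node name -> set of sector numbers it missed

    def mark_down(self, node):
        # Start tracking missed writes for a node that dropped out.
        self.dirty.setdefault(node, set())

    def record_write(self, sector):
        # Every node currently marked down misses this write.
        for sectors in self.dirty.values():
            sectors.add(sector)

    def resync(self, node, copy_sector):
        # Copy only the missed sectors, then stop tracking the node.
        for sector in sorted(self.dirty.pop(node, ())):
            copy_sector(node, sector)
```

Here copy_sector stands in for whatever actually copies one sector from
an up-to-date mirror to the returning node.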

Also note that VastSky won't easily give up on a node which seems
to be down. VastSky tries to reconnect the session and even tries to
use another path to access the node.

Thank you,
Hirokazu Takahashi.