[stgt] [BUG] Tgt-1.0.8 exited unexpectedly

Hirokazu Takahashi taka at valinux.co.jp
Mon Sep 27 14:58:04 CEST 2010


> > > > I want VastSky, which is a cluster storage system, to use TGT as its
> > > 
> > > Interesting. There are some similar experiments: IBM does something
> > > similar, I did something similar with OpenStack, and Red Hat's Hail
> > > does the same with its own iSCSI target implementation.
> > > 
> > > Out of curiosity, can VastSky avoid reading the old data?
> > > 
> > > For example, a WRITE goes to the three replica nodes, then the WRITE
> > > to the same sector fails on the first replica node (e.g. it times out
> > > because the node is too busy) but succeeds on the other two nodes
> > > (so those two nodes have the newer data).
> > > 
> > > Then if the two nodes having the new data are down, is it possible
> > > that an initiator gets the old data from the first node (when the
> > > initiator issues READ to the same sector)?
> > 
> > In that case, VastSky just returns an EIO error. I won't change this
> I see. Sheepdog works in the same way.
> > policy, since the volumes VastSky serves are supposed to be used with
> > file systems on them. If a volume returned wrong data, including
> > old data, it could easily cause filesystem corruption.
> Yeah, but even a real disk can return bogus data (that is, silent data
> corruption). So this issue (returning bogus data) is a matter of
> probability.

Yes. Modern disks and HBAs can detect bogus data in most cases, but
there is still a chance of it slipping through.

> In addition, as you know, recent file systems can handle such
> failures.

Yes, I know some filesystems have such a feature, but there is no
point in returning bogus data instead of an EIO error.

> > VastSky updates all the mirrors synchronously. And only after all
> > the I/O requests are completed, it tells the owner that the request
> > is done.
> Understood. Sheepdog works in the same way.
> How does VastSky detect old data?
> If one of the mirror nodes is down (e.g. the node is too busy),
> does VastSky assign a new node?

VastSky removes the node that seems to be down from the group and
assigns a new one, so no one can access the old node after that.
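The policy described above can be sketched roughly as follows. This is a
minimal illustration, not VastSky's actual code; the `MirrorGroup` and
`Node` names are hypothetical. A WRITE is acknowledged only after every
active mirror completes it, a mirror that fails is dropped from the group
so its stale copy can never be read again, and an EIO error is returned
rather than possibly-stale data:

```python
class EIOError(Exception):
    """Raised instead of ever returning possibly-stale data."""

class MirrorGroup:
    """Hypothetical sketch of a synchronously replicated volume."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # currently active mirror nodes

    def write(self, sector, data):
        failed = []
        for node in self.nodes:
            try:
                node.write(sector, data)
            except IOError:
                failed.append(node)
        # Evict failed mirrors so a later READ cannot hit their old data.
        for node in failed:
            self.nodes.remove(node)
        if not self.nodes:
            raise EIOError("no mirror completed the write")
        # Only now is the owner told that the request is done: every
        # remaining mirror holds the new data.

    def read(self, sector):
        for node in self.nodes:
            try:
                return node.read(sector)
            except IOError:
                continue
        # Never fall back to an evicted node's stale copy.
        raise EIOError("no active mirror is readable")
```

In a real system the eviction would also trigger assignment of a
replacement node, as described above.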

> Then if a client issues READ and all the
> mirrors are down but the old mirror node is alive, how does Vastsky
> prevent returning the old data?

For this issue, we have a plan: when a node goes down for a reason
other than a hardware error, VastSky will try to re-synchronize the
node. This can be done in a few minutes, because VastSky traces all
write I/O requests and therefore knows which sectors of the node are
out of sync.
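The idea resembles a write-intent bitmap: while a mirror is away, record
which sectors are written, then copy back only those sectors when the
node returns. A minimal sketch (the `DirtyTracker` name and interface are
illustrative assumptions, not VastSky's real design):

```python
class DirtyTracker:
    """Hypothetical trace of write I/O issued while a mirror is down."""

    def __init__(self):
        self.dirty = set()  # sectors written since tracking began

    def record_write(self, sector):
        # Called on every WRITE that the absent mirror missed.
        self.dirty.add(sector)

    def resync(self, source, target):
        """Copy only the out-of-date sectors to the returning node."""
        for sector in sorted(self.dirty):
            target.write(sector, source.read(sector))
        self.dirty.clear()
```

Because only the traced sectors are copied, a resync after a short
outage touches a small fraction of the volume, which is why it can
finish in minutes rather than requiring a full mirror rebuild.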

Note also that VastSky does not easily give up on a node that seems to
be down: it tries to reconnect the session and even tries another path
to reach the node.
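The failover behaviour just described amounts to trying each available
path before declaring the node unreachable. A tiny sketch under that
assumption (function and parameter names are made up for illustration):

```python
def access_with_failover(paths, op):
    """Try an I/O operation over each path to a node in turn.

    `paths` is the list of known paths to the node; `op` performs the
    actual I/O over one path. The node is declared unreachable only
    after every path has been tried.
    """
    last_error = None
    for path in paths:
        try:
            return op(path)
        except IOError as e:
            last_error = e  # this path failed; fall through to the next
    raise IOError("node unreachable on every path") from last_error
```

Only after all paths fail would the node be evicted from its mirror
group, as described earlier in this thread.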

Thank you,
Hirokazu Takahashi.