Re: [Scst-devel] SCST with recent DRBD interoperability

 > You are using zero-copy FILEIO, right?

No, I don't use zero_copy; it's set to 0 on all devices.
Or do you mean the zero-copy TCP provided by the put_page_callback patch?

 > Then you must have stable pages on your system,

How do I achieve that? I've read a lot of chatter about
stable pages, but found no real examples of how to control
the way the kernel handles them.
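
For reference, a minimal sketch of how one might at least check whether a
block device asks for stable pages. It assumes the read-only sysfs attribute
bdi/stable_pages_required, which as far as I know exists on kernels from
about 3.9 on (newer kernels may expose this elsewhere, so verify against
your kernel's documentation). The helper name and the sysfs_root parameter
are made up for illustration.

```python
from pathlib import Path


def stable_pages_required(dev, sysfs_root="/sys/block"):
    """Return True/False if the kernel reports the flag, None if it is absent.

    Assumption: the attribute lives at <sysfs_root>/<dev>/bdi/stable_pages_required
    and contains "0" or "1". Check this against your kernel version.
    """
    attr = Path(sysfs_root) / dev / "bdi" / "stable_pages_required"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        # Attribute (or device) not present on this system.
        return None


if __name__ == "__main__":
    # "drbd0" is the device name from the logs in this thread.
    print(stable_pages_required("drbd0"))
```

This only reports what the device requests; it does not change how pages
are handled.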

On 25.10.2014 7:36, Vladislav Bolkhovitin wrote:
> You are using zero-copy FILEIO, right? Then you must have stable pages on your system,
> otherwise you may see the kind of corruption you are seeing, when data in pages changes under DRBD.
>
> Vlad
>
> Igor Novgorodov wrote on 10/23/2014 11:53 PM:
>> On 24.10.2014 6:46, Vladislav Bolkhovitin wrote:
>>> Igor Novgorodov, on 10/22/2014 11:51 PM wrote:
>>>> Which digests? iSCSI?
>>> Yes, iSCSI
>>>
>>>> Or DRBD?
>>>> Anyway, both iSCSI's Header & Data CRC32 digests and DRBD replication
>>>> SHA1 digests are enabled for a long time.
>>> Did you see occasional errors in the logs?
>> SCST Logs?
>> Only occasional disconnects of one initiator, but I'm not sure that's
>> related:
>>
>> [411169.783716] iscsi-scst: ***ERROR***: Connection with initiator
>> iqn.2011-04.ru.domain:krvm2 unexpectedly closed!
>> [411170.042574] scst: Using security group
>> "iqn.2011-04.ru.domain:VM_STORAGE2_1" for initiator
>> "iqn.2011-04.ru.domain:krvm2" (target iqn.2011-04.ru.domain:VM_STORAGE2_1)
>> [411170.043345] iscsi-scst: Negotiated parameters: InitialR2T No,
>> ImmediateData Yes, MaxConnections 1, MaxRecvDataSegmentLength 1048576,
>> MaxXmitDataSegmentLength 1048576,
>> [411170.043418] iscsi-scst:     MaxBurstLength 1048576, FirstBurstLength
>> 524284, DefaultTime2Wait 0, DefaultTime2Retain 0,
>> [411170.043468] iscsi-scst:     MaxOutstandingR2T 1, DataPDUInOrder Yes,
>> DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
>> [411170.043518] iscsi-scst:     HeaderDigest CRC32C, DataDigest CRC32C,
>> OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
>> [411170.043569] iscsi-scst: Target parameters set for session
>> 4f3c00003d0200: QueuedCommands 32, Response timeout 90, Nop-In interval
>> 30, Nop-In timeout 30
>>
>>>> Concerning stable page writes - should switching to vdisk_blockio help
>>>> me?
>>> Yes, it might help.
>>>
>>>> That should avoid page cache.
>>>> And why does this issue cause problems? Does SCST modify its buffer after write()?
>>> Have you checked Google as I recommended? It would take too long to describe here.
>>>
>>> Vlad
>> Yes, I've read http://lwn.net/Articles/442355/
>> But that does not explain whether SCST has problems with it or not.
>> As far as I understand, the problem occurs when the process issuing
>> write() requests modifies the write buffer after write().
>>
>>>> On 23.10.2014 6:38, Vladislav Bolkhovitin wrote:
>>>>> Hmm, stable pages issue (google it)? I'd suggest you try with data digests enabled.
>>>>>
>>>>> Vlad
>>>>>
>>>>> Igor Novgorodov, on 10/21/2014 10:25 AM wrote:
>>>>>> Hello!
>>>>>>
>>>>>> I've recently upgraded one of my dual-node single-primary clusters
>>>>>> to the latest SCST (3.0 branch, rev. 5843; was 2.2, rev. 5319 I guess) and
>>>>>> DRBD (8.4.5 branch, latest git; was 8.4.3). The kernel is 3.14.22 (was 3.4.x).
>>>>>> 2 LUNs (8 and 10 TB) are exported via iSCSI using vdisk_fileio.
>>>>>>
>>>>>> It seems to work OK, but I started getting digest errors every now
>>>>>> and then when running online DRBD verification.
>>>>>>
>>>>>> Primary node:
>>>>>> [236515.107301] block drbd0: Starting Online Verify from sector 6344796192
>>>>>> [241421.058609] block drbd0: Digest mismatch, buffer modified by upper
>>>>>> layers during write: 17187211632s +4096
>>>>>> Secondary node:
>>>>>> [81647.559278] block drbd0: Online Verify start sector: 6344796192
>>>>>> [86553.382954] block drbd0: Digest integrity check FAILED: 17187211632s
>>>>>> +4096
>>>>>>
>>>>>> It then disconnects, connects, resyncs and goes on, but verify is aborted.
>>>>>>
>>>>>> I've read about this error (whether it is really an error is arguable);
>>>>>> the idea behind it is that some application or the kernel is modifying
>>>>>> the data buffer while it is being written to the block device, before
>>>>>> getting an ack that the buffer has been written.
>>>>>>
>>>>>> So, the questions:
>>>>>> 1. Does SCST really do this nasty kind of thing?
>>>>>> 2. If not, why did this start to happen?
>>>>>> 3. Why does this occur on only one of the DRBD devices?
>>>>>>         The other one has been verifying for 30+ hours now without a problem.
>>>>>>         Maybe that's related to it having a different I/O pattern, I don't know.
>>>>>>
>>>>>> Glad to hear any suggestions, thanks in advance.
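
To illustrate the "buffer modified by upper layers during write" scenario
described in the original message, here is a toy sketch. It is not DRBD or
SCST code: the function names are hypothetical, and it only models a sender
that computes a SHA1 digest over a page and then transmits the page slightly
later. If the page changes in between, the receiver's digest check fails.

```python
import hashlib


def send_with_digest(page, mutate=None):
    """Model a replicated write: digest first, payload copied afterwards."""
    digest = hashlib.sha1(bytes(page)).digest()  # digest over the page as first seen
    if mutate is not None:
        mutate(page)                             # upper layer touches the in-flight page
    payload = bytes(page)                        # what actually goes over the wire
    return digest, payload


def receiver_accepts(digest, payload):
    """Receiver recomputes the digest over what it actually got."""
    return hashlib.sha1(payload).digest() == digest


def scribble(page):
    # Hypothetical writer that changes one byte while the write is in flight.
    page[0] ^= 0xFF


if __name__ == "__main__":
    page = bytearray(b"A" * 4096)
    print(receiver_accepts(*send_with_digest(page)))                   # True
    print(receiver_accepts(*send_with_digest(page, mutate=scribble)))  # False
```

With stable pages the page would not be allowed to change until writeout
completes, so the second case would not arise.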