<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-6800934446457898793.post3257292245398828856..comments</id><updated>2011-06-08T06:23:05.835-04:00</updated><category term='trueskill'/><category term='aes'/><title type='text'>Comments on Moserware: How Do Locks Lock?</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.moserware.com/feeds/3257292245398828856/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html'/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>24</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-8467819633459219045</id><published>2011-06-08T06:23:05.835-04:00</published><updated>2011-06-08T06:23:05.835-04:00</updated><title type='text'>I know this is a fairly old discussion but I&amp;#39;d...</title><content type='html'>I know this is a fairly old discussion but I&amp;#39;d like to thank you Jeff and Anonymous for the amazing discussion. The concepts are pretty simple to grasp yet it&amp;#39;s surprising how many people overlook such important aspects of parallel programming.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8467819633459219045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8467819633459219045'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1307528585835#c8467819633459219045' title=''/><author><name>Passer-by</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1390926044'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-5420939082905243568</id><published>2009-03-14T22:50:00.000-04:00</published><updated>2009-03-14T22:50:00.000-04:00</updated><title type='text'>Rishi: Great to hear this was helpful. Congrats on...</title><content type='html'>Rishi: Great to hear this was helpful. Congrats on getting it working in Silverlight!</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5420939082905243568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5420939082905243568'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1237085400000#c5420939082905243568' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-3015942537709732103</id><published>2009-03-08T11:27:00.000-04:00</published><updated>2009-03-08T11:27:00.000-04:00</updated><title type='text'>Hi Jeff, thanks for your answer on stackflow about...</title><content type='html'>Hi Jeff, thanks for your answer on stackflow about Silverlight ReaderWriterLock. I tested your code it ran like 9x v/s  Monitor.Lock blocks. Also, as per your reference I got Vance Morrison's implementation working on silverlight check it out http://www.orkpad.com/Blog/post/2009/03/08/Silverlight-ReaderWriterLock-Implementation.aspx&lt;BR/&gt;&lt;BR/&gt;Cheers</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/3015942537709732103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/3015942537709732103'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1236526020000#c3015942537709732103' title=''/><author><name>Rishi</name><uri>http://www.orkpad.com</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1056333102'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-8627343633007532496</id><published>2008-10-04T07:20:00.000-04:00</published><updated>2008-10-04T07:20:00.000-04:00</updated><title type='text'>Great!&lt;br&gt;&lt;br&gt;  Thanks to you too for keeping an o...</title><content type='html'>Great!&lt;BR/&gt;&lt;BR/&gt;  Thanks to you too for keeping an open mind and conversation.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8627343633007532496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8627343633007532496'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1223119200000#c8627343633007532496' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-520151573'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-920872881124295134</id><published>2008-10-03T22:46:00.000-04:00</published><updated>2008-10-03T22:46:00.000-04:00</updated><title type='text'>anonymous: &lt;br&gt;&lt;br&gt;As a result of our disccussion,...</title><content type='html'>anonymous: &lt;BR/&gt;&lt;BR/&gt;As a result of our disccussion, I wrote Vance and he did a great job of responding quickly with good points:&lt;BR/&gt;&lt;BR/&gt;1. Vance is very upfront in saying that his spin-lock implementation of a reader-writer lock is not a "gold standard for spin locks." I still commend it though on its simplicity and ease of understanding.&lt;BR/&gt;&lt;BR/&gt;2. Microsoft is currently working on &lt;A HREF="http://www.microsoft.com/downloads/details.aspx?FamilyId=348F73FD-593D-4B3C-B055-694C50D2B0F3&amp;displaylang=en" REL="nofollow"&gt;Parallel Extensions&lt;/A&gt; for the .NET framework. This represents the culmination of more detailed work in parallel processing. It has SpinWait and SpinLock classes that can be used as primitives. They're still tuning the exact mix of spinning and sleeping/yielding.&lt;BR/&gt;&lt;BR/&gt;3. Agreement that spin waiting is wasted time and is only beneficial if you have a "strong reason to believe the lock will be freed in the short time you spin."&lt;BR/&gt;&lt;BR/&gt;4. By not getting swapped out, you save calls to the kernel, 2 switches (out and in), as well as some cycles ("100K to 1M") to get your working set back in cache. We had talked about this earlier, but I don't think I realized that this can be the biggest issue (e.g. having to "warm up" a "cool" cache). In a contended lock environment that is held for a very short period (e.g. to only update a few variables), the spin can be effective and adds up over time.&lt;BR/&gt;&lt;BR/&gt;I don't think my tests did a good job at contention and having other work that was ready to run that would swap out the cache lines and thus I was missing a large benefit of the spins. &lt;BR/&gt;&lt;BR/&gt;Furthermore, my limitation of only having a dual-processor machine limited my ability to effectively show the benefits of spins in a heavily loaded environment with contended locks that are all spinning at once.&lt;BR/&gt;&lt;BR/&gt;They're not perfect in user mode, but I think we've established that spins have their place when used properly.&lt;BR/&gt;&lt;BR/&gt;I've updated the offending paragraph as a result of all of this discussion. It's not perfect, but I do think it is better. &lt;BR/&gt;&lt;BR/&gt;Thank you for helping improve this post and enriching the conversation!&lt;BR/&gt;&lt;BR/&gt;P.S. Thanks to Vance Morrison for posting his lock implementation and for taking the time to answer my questions regarding it.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/920872881124295134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/920872881124295134'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1223088360000#c920872881124295134' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-8934487167172624025</id><published>2008-10-03T10:23:00.000-04:00</published><updated>2008-10-03T10:23:00.000-04:00</updated><title type='text'>quote " If I updated my [post]" /quote&lt;br&gt;&lt;br&gt; Tha...</title><content type='html'>quote " If I updated my [post]" /quote&lt;BR/&gt;&lt;BR/&gt; That statement you make sums it up very correctly.&lt;BR/&gt;&lt;BR/&gt; To answer your question about WaitFor usage: my understanding of your code is that the whole purpose of the thing is to avoid the creation of the expensive objects. Thus, the ideal scenario with any level of contention is that the writer lock is never taken because all of the expensive objects have already been created in the warm-up phase of the application.&lt;BR/&gt;&lt;BR/&gt; In this scenario, everyone acquires the read locks, and releases them without ever waiting on anything.&lt;BR/&gt;&lt;BR/&gt; The bottom line of the waitFor call is that it semantically means that the thread should block or stall. Whether this is done through spinning or a genuine WaitFor call is only a detail of it. Given this, the ideal and most often used case will be, in my opinion, that any number of threads will get to this area of code, take out the reader lock (using nothing but a cmpxchg), get their references to ExpensiveObjects, and leave the reader lock area without any blocking. After all, this is ideally what multithreaded applications should be doing. This will be all done without any context switching (and no WaitFor calls).&lt;BR/&gt;&lt;BR/&gt; I think you get the point of everything I said so far given that your questions and comments indicate that you grasp the concepts. &lt;BR/&gt;&lt;BR/&gt; There is one last tiny nit to pick though. Vance's code, namely the implementation of EnterMyLockSpin. I actually don't know where it comes from or who this is, and am feeling to lazy to go in depth into it. However, I think it's (poorly)  re-implementing what a CriticalSection is.&lt;BR/&gt;&lt;BR/&gt; Given what I've already explained (that sleep and WaitFor incur the same penalty of switching to kernel mode and giving up your quantum), this spin lock is inefficient. It is also not absolutely correct, or rather, not fair.&lt;BR/&gt;&lt;BR/&gt; 1) it is inefficient because calling Sleep every 3rd time means that you will incur at least as much a penalty as calling WaitFor once.&lt;BR/&gt; 2) it is not fair, because you might lose your turn if some other thread gets in there during the time you are sleeping. It is incorrect because this could possibly happen for long periods of times given the correct circumstances (e.g. priority inversion). And because you aren't relying on OS synchronization, this kind of priority inversion would not be remedied automagically.&lt;BR/&gt;&lt;BR/&gt; To sum it all up: no, spinlocks aren't to be banished. Yes, re-implementing OS primitives is poor form, risky at best. EnterMyLockSpin is essentially re-implementing what EnterCriticalSection already is. User code shouldn't spin, but we can forgive &lt;I&gt;&lt;B&gt;libraries&lt;/B&gt;&lt;/I&gt; that do so (like EnterCriticalSection or a C# library).</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8934487167172624025'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8934487167172624025'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1223043780000#c8934487167172624025' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1033393270'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-2757492266511871166</id><published>2008-10-03T00:43:00.000-04:00</published><updated>2008-10-03T00:43:00.000-04:00</updated><title type='text'>anonymous:&lt;br&gt;&lt;br&gt;Good continued discussion. Minor...</title><content type='html'>anonymous:&lt;BR/&gt;&lt;BR/&gt;Good continued discussion. Minor points:&lt;BR/&gt;&lt;BR/&gt;"a lock will only call WaitFor if the cmpxchg fails. So a lock that is successfully acquired will fly like the wind through the lock code without ever diving into kernel. If not, it will call waitFor, which semantically speaking is exactly what we want."&lt;BR/&gt;&lt;BR/&gt;I could be wrong here, but as I see it, the cmpxchg is only used to protect the internal data about the lock. Remember, I'm speaking only of the user mode code here.&lt;BR/&gt;&lt;BR/&gt;Picture this scenario on a multicore machine where the numbers represent increasing wall clock time:&lt;BR/&gt;&lt;BR/&gt;1. Thread A does an AcquireWriterLock(...). Thread A does this by entering the internal lock by succeeding on its very first call to cmpxchg and sees that there are no active readers or writers. It then makes a note in the internal data structure that there is an active writer and then immediately exits the internal lock and begins writing on some shared resource.&lt;BR/&gt;&lt;BR/&gt;2. Thread B does an AcquireReaderLock(...). Thread B also does this by entering the internal lock by succeeding on its very first call to cmpxchg and sees that there *is* an active writer (Thread A). It increments the total number of reader waiters *and* makes sure that there is an OS event handle to wait on. Once these two things are done, it exits the internal lock and does a WaitFor on the OS event handle that is used to notify waiting readers. At this point the scheduler swaps out Thread B and makes a note of the handle that it is waiting on and puts it in the "Waiting" queue.&lt;BR/&gt;&lt;BR/&gt;3. Thread A finishes writing to the shared resource and calls ReleaseWriterLock(). The lock implementation handles this by entering in the internal lock and setting the number of writers to 0. Next, it notices that there is a reader waiting (Thread B). It then exits its internal lock and sets the event for readers waiting. By setting this event, the OS can now move Thread B from the "Waiting" queue to the "Ready" queue since the event that it was WaitFor-ing on has been set.&lt;BR/&gt;&lt;BR/&gt;4. Thread B is put on the running queue and begins reading the shared resource that the lock implementation is being used to protect.&lt;BR/&gt;&lt;BR/&gt;Do you agree that this is a very normal scenario for a reader-writer lock? Do you see that even though the cmpxchg succeeded on the first time in every case, the WaitFor was still needed for notification of when it is ok to start reading by Thread B?&lt;BR/&gt;&lt;BR/&gt;I think of WaitFor and Sleep differently. Like you mentioned in previous comments, a WaitFor can be highly optimized by the scheduler and be used to wake up a thread at the precise moment when things are ready (see also my note 12 in the post). A Sleep call is just telling the OS that the thread wants to yield its quantum and be put back on the Ready queue.&lt;BR/&gt;&lt;BR/&gt;Throughout our discussion, we've been saying "WaitFor." I'm assuming that we're talking about APIs like &lt;A HREF="http://msdn.microsoft.com/en-us/library/ms687032.aspx" REL="nofollow"&gt;WaitForSingleObject&lt;/A&gt;, &lt;A HREF="http://msdn.microsoft.com/en-us/library/ms687036(VS.85).aspx" REL="nofollow"&gt;WaitForSingleObjectEx&lt;/A&gt;, &lt;A HREF="http://msdn.microsoft.com/en-us/library/ms687025(VS.85).aspx" REL="nofollow"&gt;WaitForMultipleObjects&lt;/A&gt;, and &lt;A HREF="http://msdn.microsoft.com/en-us/library/ms687028(VS.85).aspx" REL="nofollow"&gt;WaitForMultipleObjectsEx&lt;/A&gt;. In the particular scenario I outlined above, step 2 can use a simple WaitForSingleObject on the kernel handle representing the "waiting readers" event.&lt;BR/&gt;&lt;BR/&gt;&lt;BR/&gt;"this requires the OS to detect a priority inversion etc. etc. You can look it up."&lt;BR/&gt;&lt;BR/&gt;Indeed, this is a cool part about OS's like Windows. The only thing I know about it is from reading the "Priority Boosts for CPU Starvation" in Chapter 6 of Windows Internals 4th Edition: &lt;BR/&gt;&lt;BR/&gt;"Once per second, the balance set manager ... scans the ready queues for any threads that have been in the ready state (that is, haven't run) for approximately 4 seconds. If it finds such a thread, the balance set manager boosts the thread's priority to 15... Once the quantum is expired, the thread's priority decays immediately to its original base priority. If the thread wasn't finished and a higher priority thread is ready to run, the decayed thread will return to the ready queue, where it again becomes eligible for another boost if it remains there for another 4 seconds."&lt;BR/&gt;&lt;BR/&gt;As the book mentions, it's not perfect, but it is a neat way around the problem. Is this roughly what you were referring to in that comment?&lt;BR/&gt;&lt;BR/&gt;As to the elapsed time count with and without spinning, it's too close to call given my unscientific test. The results fluctuated +- 0.1% on who won, so no clear winner came out. However, my test could be a bad example. I encourage anyone reading this to offer up test code either to my personal email address (found on the right hand side of my blog) or on StackOverflow question mentioned above in my previous comments.&lt;BR/&gt;&lt;BR/&gt;I'm not trying to sidestep an error on my part. I mislead myself and readers of this post. If I updated my "hundred thousand times slower" comment to read something like "The lock implementation performs a spin wait with the hope that it might use enough time to happen to allow a thread running on another core to release the internal lock. By spinning, it might avoid a context switch and therefore save some context switches." Would this be satisfactory from your perspective?&lt;BR/&gt;&lt;BR/&gt;I'll agree that .1% to 3% isn't noticable or worth a lot of effort. At the very least, I'd classify it as a neat trick. While it really matters in kernel mode, the application in user mode is marginal. It's noticable more in highly contentended locks, so I'm glad that we both agree that it shouldn't be removed from system libraries since it could have some benefit.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/2757492266511871166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/2757492266511871166'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1223008980000#c2757492266511871166' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-6481685954916318689</id><published>2008-10-02T12:37:00.000-04:00</published><updated>2008-10-02T12:37:00.000-04:00</updated><title type='text'>Oh, I forgot to say: if you need lock prioritizati...</title><content type='html'>Oh, I forgot to say: if you need lock prioritization, you need to make that evident in your semantics. Thus, ticket spinlocks would be inadequate as they are inside a lock which semantically has no intrinsic prioritization.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/6481685954916318689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/6481685954916318689'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222965420000#c6481685954916318689' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-2023434261'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-1671344679630555930</id><published>2008-10-02T12:21:00.000-04:00</published><updated>2008-10-02T12:21:00.000-04:00</updated><title type='text'>Inbeed, everything before the "Is this roughly wha...</title><content type='html'>Inbeed, everything before the "Is this roughly what you meant?" is what I meant, minus one tiny thing: spinning isn't wasteful enough for me to consider it a problem. I'm just saying it's useless at best.&lt;BR/&gt;&lt;BR/&gt;  Now, regarding all of your questions about readers and writers: I don't even think about it that way. A lock is a "primitive" (the mutex primitive) and the reader-writer scenario makes use of that primitive. Other primitives include semaphores (locks with counts), Events, and I'm probably forgetting some more.&lt;BR/&gt;&lt;BR/&gt; The point about spinning or not, is that it's implemented as part of a lock. The specific library implementation of a mutex lock can choose to do spinning or not before it goes from the failed cmpxchg to the WaitFor.&lt;BR/&gt;&lt;BR/&gt; And in that sense, whether the scenario involves reader/writer queues, or flat queue synchronisation, the implementation of the lock is the same.&lt;BR/&gt;&lt;BR/&gt; So to answer your question maybe a bit side ways, a lock will only call WaitFor if the cmpxchg fails. So a lock that is successfully acquired will fly like the wind through the lock code without ever diving into kernel. If not, it will call waitFor, which semantically speaking is &lt;I&gt;exactly&lt;/I&gt; what we want.&lt;BR/&gt;&lt;BR/&gt;  Lock prioritization is indeed a very important topic (re: ticket spinlocks), and is absolutely necessary in kernel in certain areas. There's also other things that OS's do to avoid system wide dead locks, specifically if a process obtains a lock on an object, be it a file or whatever, and then a higher priority process starts waiting on that same object... this requires the OS to detect a priority inversion etc. etc. You can look it up.&lt;BR/&gt;&lt;BR/&gt; Regarding your test, did the Elapsed time reduce by the same amount as the Context switch count? Because the test pretty much guarantees that the context switch count with no spin lock can only be greater than or equal to to that with spinning.&lt;BR/&gt;&lt;BR/&gt;quote "I think one problem that you might be addressing is that in the post I claimed a context switch was disasterous for performance, and you're trying to make sure that I see that it happens much more often and with less impact. Is that a fair statement?"/quote&lt;BR/&gt;&lt;BR/&gt; Bingo. Spinlocks are the smallest form of contribution to multithreaded code while conserving the most wow factor for programmers. Baffles me to this day.&lt;BR/&gt;&lt;BR/&gt; To answer you, 3% is indeed an improvement, if we are indeed talking about elapsed time. But, no human will ever notice a 3% difference. Ever. The only kind of optimization I've ever been paid to do has been 80% and above (more like 10 fold, 200 fold etc).&lt;BR/&gt; When you start playing with gains of 3%, you have to understand it's for NASA/LHC like projects.&lt;BR/&gt;&lt;BR/&gt; This being said, I never said spin locks should be removed from system libraries.&lt;BR/&gt;&lt;BR/&gt;&lt;BR/&gt;quote would you conclude that in a normal app that had some contention, Vance's use of a spinwait is justified for acquiring an internal lock and not just an edge case?&lt;BR/&gt;/quote&lt;BR/&gt;&lt;BR/&gt; The above response should be fair enough. I think it's not essential. Nor is it detrimental. It certainly wouldn't be something I would highlight to a group of peer programmers during a review.&lt;BR/&gt;&lt;BR/&gt; And last answer, re: more processors. The answer is: it all depends on your data/thread structure. It might not at all.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/1671344679630555930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/1671344679630555930'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222964460000#c1671344679630555930' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1997850167'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-5449143720575905018</id><published>2008-10-02T00:45:00.000-04:00</published><updated>2008-10-02T00:45:00.000-04:00</updated><title type='text'>anonymous [although I feel we&amp;#39;re getting to kn...</title><content type='html'>anonymous [although I feel we&amp;#39;re getting to know each other :-) ]:&lt;BR/&gt;&lt;BR/&gt;Please bear with me, I&amp;#39;m still not quite reaching your conclusion, even though I agree with many details of what you say. I do appreciate the depth you&amp;#39;re going at to keep me honest and to inform anyone reading these comments.&lt;BR/&gt;&lt;BR/&gt;In order to be clear, I want to provide some definitions I&amp;#39;ve had in my had while writing the post and these comments:&lt;BR/&gt;&lt;BR/&gt;Contention - The only critical part from a *spinning* perspective that matters is the time that a thread acquires the internal lock until it releases it. That is (in Vance&amp;#39;s code), setting the &amp;quot;myLock&amp;quot; variable from 0 to 1 and then doing bookkeeping and then setting it back to 0. If a thread can set the &amp;quot;myLock&amp;quot; to 1 on a single &amp;quot;lock cmpxchg&amp;quot; instruction, everyone is happy and there is no contention. If the &amp;quot;lock cmpxchg&amp;quot; failed because the compare value was not 0, but 1 (someone else had the *internal* lock), then we have contention.&lt;BR/&gt;&lt;BR/&gt;Bookkeeping - Here is the complete happy path on an AcquireReaderLock: &amp;quot;Interlocked.CompareExchange(ref myLock, 1, 0); if (owners &amp;gt;= 0 &amp;amp;&amp;amp; numWriteWaiters == 0) { owners++; break; } myLock = 0;}&amp;quot; This assumes the CompareExchange works by setting &amp;quot;myLock&amp;quot; to 1 because it was 0 before. In addition, it assumes no writers are active. All that is inside of this bookkeeping is a few compares and sets. Very quick stuff.&lt;BR/&gt;&lt;BR/&gt;-&lt;BR/&gt;&lt;BR/&gt;When I read &amp;quot;if your spinlock does succeed in avoiding a WaitFor call, it&amp;#39;s because the thread went to sleep and woke up during a quantum many many quantums down the road,&amp;quot; I interpret that as &amp;quot;your thread may think that all that is doing is spinning, but what is really happening is that it is being context-switched out while it is doing a run of &amp;#39;pause&amp;#39; assembly instructions&amp;quot;. The scheduler is then deciding other threads should run, and then it will be swapped back in and pick up right where it left off. This will burn a lot more time than the small spin-wait window it thinks it is using&amp;quot;&lt;BR/&gt;&lt;BR/&gt;Is this roughly what you meant?&lt;BR/&gt;&lt;BR/&gt;I fully agree with your WaitFor analysis. The kernel is in a much better position to wake threads up exactly when needed. However, it must be stressed that the &lt;B&gt;only&lt;/B&gt; time the lock will &lt;B&gt;ever&lt;/B&gt; use a WaitFor call is when readers are waiting for a writer to finish, a writer is waiting for readers or another writer to finish, or a reader is waiting to upgrade to a writer. &lt;BR/&gt;&lt;BR/&gt;Do you agree that the previously mentioned events are the only times WaitFor is called? If you don't agree with this, what other time do you see a WaitFor being called? If you do agree with this, then do you agree with the conclusion that if you have 5 active readers and you go to add another reader then there is no chance a WaitFor call will be issued since all that will happen is the internal lock will be entered and the total read count will be incremented and then the internal lock will be exited?&lt;BR/&gt;&lt;BR/&gt;In the situations where WaitFor (e.g. writer waiting for readers to finish), the scheduler will know exactly when it's reasonable to wake up a thread (as discussed in the post).&lt;BR/&gt;&lt;BR/&gt;I'll concede that "ticket spinlocks" are better suited for a kernel level spinlock than for user mode primitives. However, some things should be considered at least with respect to the decision if readers or writers are favored. Typically writers are favored (checked first) since they're assumed to be infrequent. &lt;BR/&gt;&lt;BR/&gt;At the very least, I think "ticket spinlocks" are cool at the kernel level and enjoyed learning about them :)&lt;BR/&gt;&lt;BR/&gt;--&lt;BR/&gt;&lt;BR/&gt;In my highly unscientific test on my dual-core machine with 2 threads with some lock contention (e.g. fighting for "myLock"), the spin-wait enhanced version of the lock squeazed out a 3% reduction in context switches vs a lock that just did a Sleep(0) on contentions. It's not huge by any means, but it saves some time.&lt;BR/&gt;&lt;BR/&gt;I think one problem that you might be addressing is that in the post I claimed a context switch was disasterous for performance, and you're trying to make sure that I see that it happens much more often and with less impact. Is that a fair statement?&lt;BR/&gt;&lt;BR/&gt;Do you at least see the opportunity of a 3% improvement as reasonable for an increase in performance by doing a spin-wait vs. nothing at all?&lt;BR/&gt;&lt;BR/&gt;Thus, in response to your statement of:&lt;BR/&gt;&lt;BR/&gt;"when I see that libraries like C# have spin locks implemented in them, all I can think is that either some programmer thought he was being awesome, or some big vendor had a 0.01% percentile case that required a spinlock and so they asked Microsoft for that code. "&lt;BR/&gt;&lt;BR/&gt;would you conclude that in a normal app that had some contention, Vance's use of a spinwait is justified for acquiring an internal lock and not just an edge case?&lt;BR/&gt;&lt;BR/&gt;Furthermore, would you agree that as more and more processors get added, there is a greater chance for an increase in performance increase by using spin-waits? &lt;BR/&gt;&lt;BR/&gt;--&lt;BR/&gt;&lt;BR/&gt;Regardless, these comments have shown that I'm still doing quite a bit of "handwaving"</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5449143720575905018'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5449143720575905018'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222922700000#c5449143720575905018' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-6337485608212787797</id><published>2008-10-01T11:43:00.000-04:00</published><updated>2008-10-01T11:43:00.000-04:00</updated><title type='text'>Jeff,&lt;br&gt;&lt;br&gt; you are right about the quantums rar...</title><content type='html'>Jeff,&lt;BR/&gt;&lt;BR/&gt; you are right about the quantums rarely expiring if all you do is 70ns long. And this &lt;I&gt;is&lt;/I&gt; the correct direction in which you should be thinking about this...&lt;BR/&gt;&lt;BR/&gt; Indeed, in all likelihood, if you regularly use up your entire quantum (like doing short CPU intensive tasks), then your spinlock might be anywhere inside the 20ms.&lt;BR/&gt;&lt;BR/&gt; However, this does not increase in any way your chances of having your other thread on a parallel processor using up the same quantum time slot.&lt;BR/&gt;&lt;BR/&gt; The reason why I was saying that thing about being pre-empted is this: the odds are &lt;I&gt;much&lt;/I&gt; higher that if your spinlock does succeed in avoiding a WaitFor call, it&amp;#39;s because the thread went to sleep and woke up during a quantum many many quantums down the road. &lt;BR/&gt;&lt;BR/&gt; So, if you did something like a trace/debug printf to a log file, and saw there that the spinlock &amp;quot;worked&amp;quot;, i.e. the WaitFor section of your code was not reached, then it would be much more likely that it was because your thread lost its quantum and started a new one.&lt;BR/&gt;&lt;BR/&gt; The thing that I just barely grazed in my previous posts is that calling WaitFor is potentially much more optimal, as the scheduler will now know exactly what this thread is waiting on. This means that it can for example release this thread right away when the object being waited on is released. This is no small advantage. Btw, this will also address the &amp;quot;fairness&amp;quot; issue that you mention, as modern operating system schedulers tend to be quite fair. If however, you need a queuing action like FIFO, why not simply use a queue? I call &amp;quot;handwaving&amp;quot; on that one.&lt;BR/&gt;&lt;BR/&gt; If you want to see context swaps and the likes, open up your &amp;quot;performances&amp;quot; mmc pane in Administrative Tools. In the &amp;quot;System Monitor&amp;quot; node, you can add counters for Thread &amp;gt; Elapsed Time and Thread &amp;gt; Context Switches /sec, and then set the source as &amp;quot;Select instances from list&amp;quot;. Your app will need to be already running.&lt;BR/&gt;&lt;BR/&gt; What I would recommend is doing a test like so: launch two threads that do the same thing. In an endless loop put three CPU intensive operations before, after and inside the lock area. At the end of the loop, call Sleep() with an adjustable value simulating frequency of use of your app (be it page hits, user requests etc).&lt;BR/&gt;  The CPU parts should be simple code (not Sleep(x) and not some function call to a big library). Ideally do for loop with three configurable upper limits, and a series of&lt;BR/&gt; float var += make_long_float();&lt;BR/&gt;&lt;BR/&gt; Where make_long_float is a series of floating point operations.&lt;BR/&gt;&lt;BR/&gt; Make sure the loops don&amp;#39;t call API calls like ReadFile etc, as these will for sure incur context switches. Just make it a plain vanilla integer/float escalator to nowhere...&lt;BR/&gt;&lt;BR/&gt; Let them run. The smaller the value the Sleep function is called with, the more CPU usage you will get. Then do your tests by varying the length of the three loops, hence the CPU load before, after and during the lock.&lt;BR/&gt;&lt;BR/&gt; See if by enabling a spinlock you can reduce the Context switches per sec, while improving or at least keeping the Elapsed Time value constant. If you actually find a way to do it, please share. But honestly, I&amp;#39;m ready to bet money that you can&amp;#39;t unless you make your loop sleep value be 0, and your thread work be very light: meaning that you come back to the scenario of doing a small task thousands of times per second with contention, which is indeed the prime if not sole candidate for taking advantage of a spin lock.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/6337485608212787797'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/6337485608212787797'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222875780000#c6337485608212787797' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-649775284'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-7761008294135694974</id><published>2008-10-01T08:10:00.000-04:00</published><updated>2008-10-01T08:10:00.000-04:00</updated><title type='text'>There have been some other comments on this post. ...</title><content type='html'>There have been some other comments on this post. &lt;BR/&gt;&lt;BR/&gt;Some on reddit &lt;A HREF="http://www.reddit.com/r/programming/comments/745ud/how_do_locks_lock/" REL="nofollow"&gt;have commented&lt;/A&gt; about ways to make reader-writer locks fairer by implementing a &lt;A HREF="http://lwn.net/Articles/267968/" REL="nofollow"&gt;ticket spinlock&lt;/A&gt; that mimics the "take a number" and "now serving" numbers you see at a deli. I thought this was a really cool way to make locks more FIFO.&lt;BR/&gt;&lt;BR/&gt;Some &lt;A HREF="http://barrapunto.com/articles/08/09/29/1856200.shtml" REL="nofollow"&gt;on barropunto&lt;/A&gt; discuss the post with more references to task scheduling in general.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7761008294135694974'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7761008294135694974'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222863000000#c7761008294135694974' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-7085708137940994123</id><published>2008-10-01T07:59:00.000-04:00</published><updated>2008-10-01T07:59:00.000-04:00</updated><title type='text'>anonymous:&lt;br&gt;&lt;br&gt;I think I see where you're comin...</title><content type='html'>anonymous:&lt;BR/&gt;&lt;BR/&gt;I think I see where you're coming from, but maybe am not quite there yet with your conclusion :)&lt;BR/&gt;&lt;BR/&gt;First, I think we both agree that we're only talking about multiprocessor machines (e.g. 2 or more parallel train tracks). As has been established, I fully agree with your assessment for a uniproc machine.&lt;BR/&gt;&lt;BR/&gt;I'm thinking that since full quantums might never be used (as you indicate), it seems that there is a good chance that the scheduler would not pre-empt in the middle of a 70-1100 ns spin since that'd occur within the scheduled timeslice (~20ms). If it did pre-empt at this point, I agree with your conclusion.&lt;BR/&gt;&lt;BR/&gt;I also agree that kernels don't have the preemption issues that usermode code does since they are in control.&lt;BR/&gt;&lt;BR/&gt;I think a test run is in order to understand this better. I &lt;A HREF="http://stackoverflow.com/questions/155903/how-to-detect-the-number-of-context-switches-that-occurred-while-running-c-code" REL="nofollow"&gt;asked a question on StackOverflow&lt;/A&gt; for help on how to measure the context switches while running a piece of code, but I think this will be hard to do. I'll see if I can try several approaches though.&lt;BR/&gt;&lt;BR/&gt;Do you see where I'm at with my point? Am I way off base to assume it's reasonable that a pre-emption might not occur if full quantums are rarely used?</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7085708137940994123'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7085708137940994123'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222862340000#c7085708137940994123' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-4504010160256673456</id><published>2008-10-01T07:43:00.000-04:00</published><updated>2008-10-01T07:43:00.000-04:00</updated><title type='text'>Perry Hertler:&lt;br&gt;&lt;br&gt;I do the double check on the...</title><content type='html'>Perry Hertler:&lt;BR/&gt;&lt;BR/&gt;I do the double check on the upgraded reader example. I don't do it on the Monitor based example because it doesn't seem to be necessary so long as any look at a sensitive data structure is inside the lock.&lt;BR/&gt;&lt;BR/&gt;Often, it seems that when I need to double check, it's against a null on the data structure itself and not an item in a data structure.&lt;BR/&gt;&lt;BR/&gt;One interesting thing about this is that the old collections (e.g. ArrayList) that had a SyncRoot object (before it was decided in 2.0 to &lt;A HREF="http://blogs.msdn.com/brada/archive/2003/09/28/50391.aspx" REL="nofollow"&gt;remove it&lt;/A&gt;). It's interesting how they got around the double-checking issue which saved an object allocation if this was never used:&lt;BR/&gt;&lt;BR/&gt;if (this._syncRoot == null)&lt;BR/&gt;{&lt;BR/&gt;   Interlocked.CompareExchange(ref this._syncRoot, new object(), null);&lt;BR/&gt;}&lt;BR/&gt;&lt;BR/&gt;I only bring this up because it touches on some similar concepts as the post itself.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/4504010160256673456'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/4504010160256673456'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222861380000#c4504010160256673456' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-314962251570706302</id><published>2008-09-30T15:15:00.000-04:00</published><updated>2008-09-30T15:15:00.000-04:00</updated><title type='text'>Just to something which might make it clear:&lt;br&gt;&lt;b...</title><content type='html'>Just to something which might make it clear:&lt;BR/&gt;&lt;BR/&gt; Kernels don't have this issue with spin locks because they are not time sliced. A kernel runs right now, ie if it is running, the kernel knows that it is running since it itself does the scheduling of things. That is: if I ever check a spinlock and find that it's busy, I &lt;I&gt;know&lt;/I&gt; for fact that another CPU currently has it and is running it (from the mere fact that kernel locks are reserved for kernel use). This means that I &lt;I&gt;know&lt;/I&gt; without a shadow of a doubt that the other CPU is currently working on the operation itself. I know this because I know that both I and the other "thread" are kernel code executing and that we are not being pre-emptively scheduled.&lt;BR/&gt;&lt;BR/&gt; I put "thread" in quotes there because there are no such things as threads as we are used to them in Userland in the kernel. There's threads only of execution.&lt;BR/&gt;&lt;BR/&gt; This means that if I check the cmpxchg and find the lock's taken, I know that right across from my rail tie, on the other rail, the same tie is doing this thing. This is guaranteed - unless hardware interrupts occur, but no need to get into that.&lt;BR/&gt;&lt;BR/&gt; Notice how many times I insisted on the word "know" in that first paragraph. There are many many assumptions you make when you use a spinlock and they are almost never true in user mode.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/314962251570706302'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/314962251570706302'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222802100000#c314962251570706302' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1966456503'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-5562086400984186320</id><published>2008-09-30T14:59:00.000-04:00</published><updated>2008-09-30T14:59:00.000-04:00</updated><title type='text'>Jeff, &lt;br&gt;&lt;br&gt; the point about the spinlock is tha...</title><content type='html'>Jeff, &lt;BR/&gt;&lt;BR/&gt; the point about the spinlock is that even if the operation you are doing (in this case bookkeeping) is light, the thing that implements it is not: namely threads and their associated context switches.&lt;BR/&gt;&lt;BR/&gt; Think of it this way, you have two parallel train tracks from NYC to SF, with the rail ties representing each quantum or slice of time the processor allocates. The whole distance might be, say, 5 seconds of CPU time.&lt;BR/&gt;&lt;BR/&gt; What I'm saying is that for any sort of spinlock scenario to have any effect at all, both the thread currently doing the work and the thread waiting on to do the work by spinning would have to be on two ties at the same distance on both tracks.&lt;BR/&gt;&lt;BR/&gt; A crude figure:&lt;BR/&gt;thread 1: SF ------------------|-----------...--- NYC&lt;BR/&gt;thread 2: SF ------------------|-----------...--- NYC&lt;BR/&gt;&lt;BR/&gt; I'm also saying that the odds of that happening are extremely low, given that in 5 seconds of user time your app will likely use 100 ms of CPU time. Chances are that one rail tie will be allocated in NJ, one in Ohio, and another in Vegas.&lt;BR/&gt;&lt;BR/&gt; Like so:&lt;BR/&gt;&lt;BR/&gt;thread 1: SF ------------------|------------NYC&lt;BR/&gt;thread 2: SF------------|----------|--------NYC&lt;BR/&gt;&lt;BR/&gt; (keep in mind there's millions of rail ties)&lt;BR/&gt; If we agree on this, then you already agree with what I'm saying but you don't realize it yet. The reason is: let's say I'm the first thread and I'm doing my work in the span of the rail tie in NJ. I start my work, and then enter the lock using cmpxchg, and start updating some minor lock variables. No need for spinning yet. Now, as I'm doing this, I get pre-empted. So, even though that thread isn't aware that it has stopped, it in reality has stopped, and the next allocation comes in Vegas. (this is the most likely scenario for a contention to ever occur).&lt;BR/&gt;&lt;BR/&gt; Now, along comes a different thread in Ohio. It starts doing the same work, enters the lock area, does a cmpxchg and realizes that the lock is busy. It thinks, oh, it's busy. But I know what gets done next is only 5 instructions long and then the lock is immediately released, so if I spin for 200 instructions, I'm guaranteed to get it the next time. Right? Well, not so: that other thread/rail tie is scheduled to be in Vegas only. Long long ways ahead. So this rail tie in Ohio, no matter how much it knows that the operation is short, can not possibly hope to spin long enough without itself getting pre-empted and yet live to see the freeing of the lock.&lt;BR/&gt;&lt;BR/&gt; Not unless, as I've been saying, both threads &lt;I&gt;just happen&lt;/I&gt; to have been scheduled on the single same rail tie from NYC to SF.&lt;BR/&gt;&lt;BR/&gt; What will likely happen is one of two things: the Ohio operation will spin and spin, and eventually get pre-empted. Get scheduled for say, Irvine, and then when it "wakes up" (of course itself never realizing it went to sleep), all of a sudden the spin lock will be freed. However, to think the spin lock did the job here is naive. The job got done by the scheduler, and there was a context switch involved. The spin lock only wasted time.&lt;BR/&gt;The other scenario is that the spin will expire, think "ok, it's time to call a heavy function like WaitFor", and voluntarily go to sleep. Probably to awaken right after Vegas this time.&lt;BR/&gt;&lt;BR/&gt; The thing to keep in mind is that the scheduler is not dumb, so while the first scenario will be seen as a simple premption, and there will be no cause to schedule it at any particular time (for example, it might be rescheduled multiple times before the lock is freed, or conversely scheduled long after the lock is freed), the second scenario will be much more efficient: the scheduler will probably wake up the thread right after the vegas operation is done, because it will know that the Ohio thread is specifically waiting for the Vegas one.&lt;BR/&gt;&lt;BR/&gt; Images images, if only I had a wait to post an image, this would be way simpler and much less verbose to explain.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5562086400984186320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5562086400984186320'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222801140000#c5562086400984186320' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-411121112'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-6751356127430876386</id><published>2008-09-30T13:22:00.000-04:00</published><updated>2008-09-30T13:22:00.000-04:00</updated><title type='text'>Jeff,&lt;br&gt;&lt;br&gt;Thanks for the clarification. &lt;br&gt;&lt;br...</title><content type='html'>Jeff,&lt;BR/&gt;&lt;BR/&gt;Thanks for the clarification. &lt;BR/&gt;&lt;BR/&gt;Also, thanks for not adding double-checked locking to your example:)</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/6751356127430876386'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/6751356127430876386'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222795320000#c6751356127430876386' title=''/><author><name>Perry Hertler</name><uri>http://www.blogger.com/profile/16787915862093216541</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-326356306'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-3143212417695660542</id><published>2008-09-30T12:56:00.000-04:00</published><updated>2008-09-30T12:56:00.000-04:00</updated><title type='text'>Perry Hertler:&lt;br&gt;&lt;br&gt;You're absolutely right that...</title><content type='html'>Perry Hertler:&lt;BR/&gt;&lt;BR/&gt;You're absolutely right that in many applications, a normal Monitor based lock is a great choice that could be faster too (less bookkeeping). This is especially true if the probability of a write is close to the probability of a read or if the chance for lock contention is low. &lt;BR/&gt;&lt;BR/&gt;I also agree that my example might be too complex. I was trying to be complete, but you're right that it might obscure the normal case.&lt;BR/&gt;&lt;BR/&gt;One might argue that a reader/writer lock that is almost as fast as a Monitor and uses a clean syntax (like note 1 hints at), then you could make your code more declarative by expressing what the block of code actually needs (a read or a write).&lt;BR/&gt;&lt;BR/&gt;The reader/writer lock could add more complexity than it is worth and careful thought and perfmon-counter-watching should be given to see if lock contention is a real issue.&lt;BR/&gt;&lt;BR/&gt;I suppose I was trying to focus on more of the idea of locks and roughly how they work rather than discussing when one should use a particular type of locks. &lt;BR/&gt;&lt;BR/&gt;Did this help clear anything up?&lt;BR/&gt;&lt;BR/&gt;Thanks for your comment!</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/3143212417695660542'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/3143212417695660542'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222793760000#c3143212417695660542' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-8939171749223745216</id><published>2008-09-30T12:38:00.000-04:00</published><updated>2008-09-30T12:38:00.000-04:00</updated><title type='text'>anonymous: &lt;br&gt;&lt;br&gt;Thanks for the follow-up! I rea...</title><content type='html'>anonymous: &lt;BR/&gt;&lt;BR/&gt;Thanks for the follow-up! I really appreciate the depth that you give in your response. &lt;BR/&gt;&lt;BR/&gt;One thing that I'm having difficulty following your reasoning is that the spin-wait is only for a very small amount of time to update bookkeeping of lock data. It's not actually used to protect the code that a user of a lock would be running. When I wrote the post, I don't think I made this clear enough. Based off comments like these, I should probably update the post.&lt;BR/&gt;&lt;BR/&gt;I assume the lock bookkeeping can be done within a hundred instructions on the "happy path." (e.g. checking for writers, updating reader counts, etc). Since the bookkeeping is small, it seems like it makes sense to go into a spin-wait if somehow this small window of bookkeeping is holding the lock.&lt;BR/&gt;&lt;BR/&gt;If it's highly improbable that two cores will be executing the same piece of code (and it is on the lightly loaded machine), then that seems like a strong argument that lock contention will also be low and thus there will be no need to spin at all (e.g. assume you're just adding another reader to a pool of readers).&lt;BR/&gt;&lt;BR/&gt;To summarize, the spin-wait will only be used while waiting on the *internal* lock that just protects the reader count, writer count, along with the event handles for signaling readers and writers. That's roughly it (maybe a count and event for upgrades as well). &lt;BR/&gt;&lt;BR/&gt;If you focus on just this area of internal bookkeeping where spin-waiting is occurring, do you still think that it doesn't make sense to use a spin-wait here? &lt;BR/&gt;&lt;BR/&gt;Looking forward to future discussions. I'm definitely realizing how much I still don't know :)&lt;BR/&gt;&lt;BR/&gt;P.S. Thanks for the INT 2Eh tip!</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8939171749223745216'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8939171749223745216'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222792680000#c8939171749223745216' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-4226146609298640709</id><published>2008-09-30T12:01:00.000-04:00</published><updated>2008-09-30T12:01:00.000-04:00</updated><title type='text'>Excellent post Jeff!&lt;br&gt;&lt;br&gt;One comment: this is a...</title><content type='html'>Excellent post Jeff!&lt;BR/&gt;&lt;BR/&gt;One comment: this is a great optimization IF NEEDED. Most applications can get away with simply using a regular lock.&lt;BR/&gt;&lt;BR/&gt;private readonly Dictionary&amp;lt;string, NotExpensiveObject&amp;gt; expensiveObjectCache&lt;BR/&gt;            = new Dictionary&amp;lt;string, NotExpensiveObject&amp;gt;();&lt;BR/&gt;&lt;BR/&gt;        private readonly object notExpensiveObjectCacheLock = new object();&lt;BR/&gt;&lt;BR/&gt;        public NotExpensiveObject GetExpensiveObject(string key)&lt;BR/&gt;        {&lt;BR/&gt;            lock (notExpensiveObjectCacheLock)&lt;BR/&gt;            {&lt;BR/&gt;                NotExpensiveObject notExpensiveObject;&lt;BR/&gt;&lt;BR/&gt;                if (!expensiveObjectCache.TryGetValue(key, out notExpensiveObject))&lt;BR/&gt;                {&lt;BR/&gt;                    notExpensiveObject = new NotExpensiveObject();&lt;BR/&gt;                    expensiveObjectCache[key] = notExpensiveObject;&lt;BR/&gt;                }&lt;BR/&gt;                return notExpensiveObject;&lt;BR/&gt;            }&lt;BR/&gt;        }&lt;BR/&gt;&lt;BR/&gt;Thoughts?&lt;BR/&gt;&lt;BR/&gt;I point this out because I don&amp;#39;t like to see code prematurely optimized especially if a lot of complexity is involved.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/4226146609298640709'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/4226146609298640709'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222790460000#c4226146609298640709' title=''/><author><name>Perry Hertler</name><uri>http://www.blogger.com/profile/16787915862093216541</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-326356306'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-5869636128924173425</id><published>2008-09-30T08:40:00.000-04:00</published><updated>2008-09-30T08:40:00.000-04:00</updated><title type='text'>Just as a parenthesis: a case can be made that may...</title><content type='html'>Just as a parenthesis: a case can be made that maybe some call into your application very quickly needs to acquire a lock, and thus, that it would be inappropriate to have the thread lose its current time slice immediately to a sleep (as opposed to spinning at least a while in the hopes that it could avoid a context switch). That is: some operation causes your application to be summoned, thread (a) gets called, is scheduled, eventually runs, and the first thing it does is go to sleep, only to later resume its actual task.&lt;BR/&gt;&lt;BR/&gt; However, I posit that if your thread is taking a lock just as it starts, then the app isn't properly designed. If semantically, the lock is the first thing that the thread needs to acquire, then the thread &lt;I&gt;is&lt;/I&gt; the lock. ie. the thread needs to be queued (via kernel APCs, io completion ports, worker thread queues) or it needs to be waiting on the lock to begin with, or the thread needs to be used as a lock for other performers of the operation.&lt;BR/&gt;&lt;BR/&gt; I readily admit that above paragraph is highly vague and not very rigorous... it's just food for thought.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5869636128924173425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/5869636128924173425'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222778400000#c5869636128924173425' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-957864393'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-7468880779572634676</id><published>2008-09-30T08:25:00.000-04:00</published><updated>2008-09-30T08:25:00.000-04:00</updated><title type='text'>anonymous: Wow! I wish I could thank you by name f...</title><content type='html'>anonymous: Wow! I wish I could thank you by name for your very thoughtful and detailed comment.&lt;BR/&gt;&lt;BR/&gt; My pleasure. To answer you...&lt;BR/&gt;&lt;BR/&gt;quote: "In an ideal case, the code that the lock is being used to protect is really small too and no WaitFor... calls are made because the spin wait used enough CPU to let the thread exit the internal lock, do its real work, and then update the lock. However, I have no data to know how often this occurs." /quote&lt;BR/&gt;&lt;BR/&gt; &lt;BR/&gt; There's a subtle gap in that logic. And that gap is that you are talking about two threads in the same sentence. Namely: the thread (a) that does the spinning (waiting) and the thread (b) that "exit[s] the internal lock". These are by the very design of the thing two separate threads.&lt;BR/&gt;&lt;BR/&gt; So then comes the thought experiment of how can we actually make two threads simultaneously execute. The answer is actually deceptively boring: either we have multiple cores/cpus, or we can't. Now, to come back to the original post I made, even if we do have multiple cores, the odds of the two threads running through a section of code which usually takes nanoseconds to execute is excessively small (like 1 in 10^6 small). More on that later.&lt;BR/&gt;&lt;BR/&gt; However, it is much more likely that the threads &lt;I&gt;appear&lt;/I&gt; to be running in parallel but only so because of the magical aspect of the scheduler as seen in the usermode. As far as kernel mode is concerned, whether thread (a) goes to sleep willingly by calling WaitFor..., yields willingly by calling Sleep(0), or is simply pre-empted, the end result is the same: a context switch occurs, the thread state is swapped out of the registers and stored somewhere and the scheduler subsequently (perhaps up to tens of timeslices later)  resumes this thread. The performance hit is done. &lt;BR/&gt;&lt;BR/&gt; Now, for thread (b) to ever really be able to release the lock while (a) is spinning, it needs to be executing that very same instant. Think about this: the spin is maybe a few hundred cycles long, a CPU executes &lt;I&gt;billions&lt;/I&gt; of cycles per second.&lt;BR/&gt; What I'm saying is that the odds that both threads were scheduled to run even on seperate CPUs during the same 20ms time slice is quite small: ie, odds are much higher that thread (a) gets scheduled 23ms later, or 45ms before. Or even that thread (b)'s time runs out or even yields because of a function call it makes that internally uses some obscure other system call during its lock portion.&lt;BR/&gt;&lt;BR/&gt; The end result is always the same, while in user mode it might appear that thread (a) only spun a few times and never "dove" into kernel mode WaitFor..., the reality is probably that it actually did get context switched by the kernel - just as expensive as willingly calling WaitForXxxObject.&lt;BR/&gt;&lt;BR/&gt; To come back to the above "more on that" statement, if your code somehow does something maybe say 1000 times a second, and you have three simulatenous threads doing this 1000 time a second thing, you are now increasing your odds of a collision somewhat and yes, in this case, it is possibly useful to use spin locks. Possibly. Seriously, I can only think of industrial grade applications like SQL Server or some DTC coordinator or something that would need this.&lt;BR/&gt;&lt;BR/&gt; In all of this, I'd like to make it clear that you are certainly not alone in this "handwaving" as you call it, as spin locks are quite a few places in the wild, even in libraries like C#. When I see that libraries like C# have spin locks implemented in them, all I can think is that either some programmer thought he was being awesome, or some big vendor had a 0.01% percentile case that required a spinlock and so they asked Microsoft for that code. In any case though, it is IMO entirely useless for normal applications for the reasons I've stated above. That being said, spinlocks aren't really a performance issue in themselves to make this a problem for anyone...&lt;BR/&gt;&lt;BR/&gt;PS. I notice a typo in my first post: "Once again, they are only useful on multi processor systems, otherwise they are simply normal locks (read: the critical section is useless)". I meant to say "Once again, they are only useful on multi processor systems, otherwise they are simply normal locks (read: the &lt;B&gt;spinlock&lt;/B&gt; is useless)"&lt;BR/&gt;&lt;BR/&gt;PPS. The "GOTO" or similar statement you posit the existence of is in indeed the "int 2E" opcode, which triggers a software interrupt which the OS handles by switching to kernel mode. It is the entry point for the Native API.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7468880779572634676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7468880779572634676'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222777500000#c7468880779572634676' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-502230592'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-8879489510281981054</id><published>2008-09-29T22:42:00.000-04:00</published><updated>2008-09-29T22:42:00.000-04:00</updated><title type='text'>anonymous: Wow! I wish I could thank you by name f...</title><content type='html'>anonymous: Wow! I wish I could thank you by name for your very thoughtful and detailed comment.&lt;BR/&gt;&lt;BR/&gt;I agree that most of the time a thread doesn't use its full quantum. &lt;BR/&gt;&lt;BR/&gt;I also agree that on a single processor system, spin-waits are useless. I didn't emphasize it, but the code I posted only enters a spinwait if there is more than one processor. Therefore, it will (exactly like you said) immediately sleep and get swapped out and allow the releasing thread to run.&lt;BR/&gt;&lt;BR/&gt;I now realize that I was a bit misleading in the post. Spinlocks are only used to protect the internal lock variables and not much else. If the thread needs to wait for anything, it does a WaitFor... call that would have it most likely be be swapped out.&lt;BR/&gt;&lt;BR/&gt;When I was imagining the spin-wait, I was assuming that the spin-locked section would be very small (on the order of a few instructions) to update the bookkeeping of the lock variables (e.g. number of writers waiting). &lt;BR/&gt;&lt;BR/&gt;In an ideal case, the code that the lock is being used to protect is really small too and no WaitFor... calls are made because the spin wait used enough CPU to let the thread exit the internal lock, do its real work, and then update the lock. However, I have no data to know how often this occurs.&lt;BR/&gt;&lt;BR/&gt;I suppose I just made assumptions that since the .NET framework ReaderWriterLockSlim performed things a certain way that there is a good reason. I'd assume that on a heavily loaded multiprocessor system, user mode spin-waits are worth it.&lt;BR/&gt;&lt;BR/&gt;However, I'm now thinking that this is a "handwavy" response. What do you think? I'd love to continue the discussion.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8879489510281981054'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/8879489510281981054'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222742520000#c8879489510281981054' title=''/><author><name>Jeff Moser</name><uri>http://www.blogger.com/profile/16074905903060665396</uri><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://1.bp.blogspot.com/_Zfbv3mHcYrc/SLDM--5fn8I/AAAAAAAAA1w/EZtLwWvYhdI/S220/facebook+beard2.jpg'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-252333216'/></entry><entry><id>tag:blogger.com,1999:blog-6800934446457898793.post-7428319803117690249</id><published>2008-09-29T21:11:00.000-04:00</published><updated>2008-09-29T21:11:00.000-04:00</updated><title type='text'>I like your post. It talks about a nice area which...</title><content type='html'>I like your post. It talks about a nice area which I find mentally enjoyable...&lt;BR/&gt;&lt;BR/&gt; However, I would like to point out two things for the sake of being a stickler, and just maybe make you see locks in a different way at the same time.&lt;BR/&gt;&lt;BR/&gt; The first thing is that cmpxchg is the essence of any synchronization object on any system. lock cmpxchg is even more powerful than that in that it locks the entire memory  bus (and makes an operation atomic across multiple processors). Obviously, the lock prefix can be a huge performance hit (potentially ruining L2 caches etc), but the cmpxchg instruction really isn't. It's lightning fast.&lt;BR/&gt;&lt;BR/&gt; That being said, locks are not magical mythical beasts. True user mode locks are implemented in kernel mode using a combination of the aforementionned cmpxchg operand, and an active cooperation of the scheduler itself. They are simply features of the scheduler: do the cmpxchg, if the lock is taken, schedule the thread accordingly, otherwise move on. Nothing more.&lt;BR/&gt;&lt;BR/&gt; In a sense, it's all about higher authority: Kernel mode rules User mode, just as service rules client. As a user, you are at the whim of what the kernel decides. Spinlocks in that sense should &lt;I&gt;only ever&lt;/I&gt; exist in kernel mode - and yes, there's an exception which I will get to. Why is this? Because in kernel mode, you can legitimately require a lock that is kernel wide, ie: outside the realm of a scheduler. In this case, when there is no higher authority left, all you can do is pit the multiple processors against each other and waste resources. (Note that spinlocks are &lt;B&gt;entirely&lt;/B&gt; useless on single processor systems).&lt;BR/&gt; The only legitimate use of a spin lock is for Critical Section object which are part of the Win32, or similar implementations of them. Once again, they are only useful on multi processor systems, otherwise they are simply normal locks (read: the critical section is useless). The other good advantage of critical sections is that they can be non blocking: ie you can try and acquire them and return immediately if they're already taken (this without going to kernel).&lt;BR/&gt;&lt;BR/&gt; The reason I'm dwelling so much on a spinlock is this: for a spin lock to become freed (during the spinning motion), another thread &lt;I&gt;has&lt;/I&gt; to release the lock. Now, on the off chance that another processor is running your release code at the very same moment that the spinlock is spinning, you might actually get a chance to get a break. However, &lt;I&gt;in any other case&lt;/I&gt; (and I do mean most other cases since a 70 nanosecond spin is quasi singular if your time slice is generally no shorter than 20ms on NT/2k/xp), even a successful spinlock - which from usermode looks like it never called the WaitFor... API, and hence never blocked - there will necessarily have been a context switch by the kernel from your code to a different thread which releases the lock, and then a subsequent rescheduling of your spin lock thread.&lt;BR/&gt;&lt;BR/&gt; Now, if you go back to what I was pointing out in the beginning, a real lock is only a feature of the scheduler anyways. So, if I go back to this statement:&lt;BR/&gt;&lt;I&gt;"By not being able to satisfy the request with a spin wait, the thread gives up and goes to sleep. This requires intervention by the OS and could be more than a hundred thousand times slower."&lt;/I&gt;&lt;BR/&gt;&lt;BR/&gt; My problem with that statement is simply that if your spinlock succeeded, odds are very high that it succeeded &lt;I&gt;because&lt;/I&gt;: a) your thread got pre-empted while spinning (ie. went to sleep), b) the releasing thread got task switched in, c) the scheduler re-context switched your thread back in.&lt;BR/&gt;&lt;BR/&gt; If you don't believe the odds of what I'm saying, try this math out: open any application (such as firefox or whatever) which you've been using for hours in taskmanager. Look at the total CPU time it's used. It will be in the mere seconds unless you are processing video. Seconds that are scheduled at best in 20ms chunks. 20ms chunks that are rarely if ever fully used up. So you've got over several hours of usage, a few seconds of actual CPU time, spread over small spots of 20ms chunks... how high do you think your odds of hitting two concurrent pieces of code on a 70 &lt;I&gt;nano&lt;/I&gt; second long cross section are?&lt;BR/&gt;&lt;BR/&gt; CmpXchg is great for quickly checking if a lock is taken. Spin locks are wholly useless unless you are positive (after many a profile and stress test) that you somehow manage to get many CPUs running your threads during the same 20ms timeslice, and lighting up at the same lock during the same 70ns duration.&lt;BR/&gt;&lt;BR/&gt;&lt;BR/&gt; Anyways, my two cents. It always helps me to remember that even the most mythical beasts like APCs and IO ports are simple simple things in kernel mode masquerading as complex elves and wizards in userland.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7428319803117690249'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6800934446457898793/3257292245398828856/comments/default/7428319803117690249'/><link rel='alternate' type='text/html' href='http://www.moserware.com/2008/09/how-do-locks-lock.html?showComment=1222737060000#c7428319803117690249' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image xmlns:gd='http://schemas.google.com/g/2005' rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img1.blogblog.com/img/blank.gif'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://www.moserware.com/2008/09/how-do-locks-lock.html' ref='tag:blogger.com,1999:blog-6800934446457898793.post-3257292245398828856' source='http://www.blogger.com/feeds/6800934446457898793/posts/default/3257292245398828856' type='text/html'/><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='blogger.itemClass' value='pid-1871081153'/></entry></feed>
