Comments on Will Portnoy: Lessons Learned from Implementing Paxos
9 comments. Feed last updated 2022-12-05T00:02:41.481-08:00.

Anonymous, 2015-03-20T23:39:02.137-07:00
Also: "The distributed state machine approach is one option for fault tolerance across replicas (vector clocks are another, for example)."

What? What has one got to do with the other? Paxos provides consensus and the other provides partial ordering of events.

Anonymous, 2015-03-20T23:37:50.399-07:00
Why didn't you use Windows Fabric (I believe Lync and DocumentDB use it)? It also provides high availability and fault tolerance with a Paxos implementation. Or is this a case of Microsoft internal politics? ;)

Anonymous, 2015-03-08T17:25:18.975-07:00
Thanks for the writeup - now studying all the referenced things.

P.S. Code bit rendering looks weird in Chrome/Safari
https://www.dropbox.com/s/4gidr3lc94jjz0f/Screenshot%202015-03-09%2002.20.24.png?dl=0

Anonymous, 2014-06-16T16:32:06.670-07:00
Thanks for your reply

Will Portnoy, 2014-06-16T15:57:05.837-07:00
If you're using a managed language, you can observe failure to execute by an uncaught exception. If your language will take down the process due to executing a command (e.g. an access violation), you might consider a watchdog process and a poison message queue, keeping track of the failure to execute. Of course, that comes with costs.

Generally, you pass enough of the transaction through the state machine so that all replicas will pass or fail together - there shouldn't be per-replica execution failures beyond those considered to be replica failures.

Anonymous, 2014-06-16T11:14:55.054-07:00
Hi, thanks for your nice write-up. I was wondering if you could please share your thoughts on the following.

Assume a scenario where the state change was successfully added to the log. However, when it was time to apply the change to the state machine, there was a persistent failure. Did you run into any situations like this, and if so, I would be interested in your thoughts about how you handled them. If there is a persistent failure, to me one option appears to be to exit the cluster and then resync the entire data from a surviving node.

I was trying to figure out how to use paxos if I want to implement replication of transactions. Let's say we treat each transaction as a single operation from the paxos standpoint (a logical operation such as an insert or delete). Each paxos operation, when applied to the state machine, could result in multiple operations on the state machine (insert into a table, update an index, etc.). This transaction could fail for any number of reasons. Hence, I was trying to figure out how one would ensure that the state machine is completely identical even in the midst of persistent failure. The only option I see is that the "faulty" nodes exit from the cluster and resync the entire state. Thanks for your time.

Will Portnoy, 2013-11-01T22:10:22.326-07:00
You can truncate the paxos log (containing full transactions) when you periodically snapshot your state to some external durable store (which you want to do anyway, instead of transmitting and replaying the entire paxos log from the initial position when starting new replicas).

Anonymous, 2013-10-02T09:04:36.148-07:00
If each entry in the paxos log is a full transaction, wouldn't said log file end up unwieldy?

Anonymous, 2012-07-05T14:40:01.512-07:00
Thanks for this post. Very interesting read.
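
Editor's note: Will Portnoy's 2013-11-01 reply describes truncating the paxos log whenever the state is snapshotted, and starting new replicas from the snapshot instead of replaying the log from the beginning. Below is a minimal sketch of that idea, not the author's implementation. It assumes a toy key-value state machine, commands already decided by consensus and identified by consecutive slot numbers, and an in-memory tuple standing in for the external durable store. The names `Replica`, `take_snapshot`, and `restore_from` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """Toy replicated key-value state machine with a truncatable log (illustrative only)."""
    state: dict = field(default_factory=dict)
    log: dict = field(default_factory=dict)   # slot number -> (key, value) command
    applied: int = 0                          # highest slot already applied to `state`
    snapshot: Optional[tuple] = None          # (last included slot, copy of state)

    def append(self, slot, command):
        # Record a command that consensus has already decided for `slot`.
        self.log[slot] = command

    def apply_ready(self):
        # Apply decided commands strictly in slot order, stopping at the first gap.
        while self.applied + 1 in self.log:
            key, value = self.log[self.applied + 1]
            self.state[key] = value
            self.applied += 1

    def take_snapshot(self):
        # Assumption: this tuple stands in for a write to an external durable store.
        self.snapshot = (self.applied, dict(self.state))
        # Every entry covered by the snapshot can now be dropped from the log.
        for slot in [s for s in self.log if s <= self.applied]:
            del self.log[slot]

    def restore_from(self, snapshot, tail):
        # A new replica loads the snapshot, then replays only the log tail.
        last_slot, state = snapshot
        self.state = dict(state)
        self.applied = last_slot
        self.log = dict(tail)
        self.apply_ready()


leader = Replica()
for slot, cmd in enumerate([("a", 1), ("b", 2), ("a", 3)], start=1):
    leader.append(slot, cmd)
leader.apply_ready()
leader.take_snapshot()        # slots 1-3 are now covered by the snapshot
leader.append(4, ("c", 4))
leader.apply_ready()

fresh = Replica()
fresh.restore_from(leader.snapshot, leader.log)
```

In this sketch the log never grows beyond the entries appended since the last snapshot, which is the point of the reply to the "unwieldy log" question: how large the log gets depends on how often snapshots are taken, not on the total history.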