Sunday, November 26, at approximately 10:30 CST, EIGRP (routing protocol) neighbor connections between the Portland router and the Houston router began to time out. The frequency of this occurrence was mentioned in the previous email from Phillip Platter. The problem cleared up at 16:56 CST.

When debugging the problem, all test results pointed to a problem on the EIN. However, when the EBS NOC was contacted, they informed us that there had been no known network issues or scheduled changes. Because of what I experienced while debugging, I had an EBS network engineer come to my office this morning to walk through the historical log data of the physical devices within the Houston-to-Portland path. The problem was finally pinpointed to an EIN DS3 link. A GSR router to which this link was attached was losing OSPF neighbor connections with its peer on the other side of the link due to a bouncing interface, thus causing the EIGRP neighbor connectivity errors.

Because this is a critical link on which 24x7 trading is conducted, the following action items will be addressed:

- We will be implementing our own monitoring device (currently only able to monitor up/down/response time) on the edge of the EIN in an attempt to monitor the path being utilized more proactively.
- I will verify with EBS that all appropriate devices are being monitored by the NOC and request that a notification process be implemented (it should include both problem and change notification).
- I will push the issue of gaining access to the EIN edge router. This will allow us to troubleshoot, manage, and monitor EIN link status and errors more efficiently. Knowing the error type and frequency will allow for quicker determination of severity and, therefore, quicker implementation of secondary solutions.

Arshak,

Because of the following concerns, it appears to me that the NOC is not appropriately monitoring EIN devices and is, in turn, negatively impacting Enron business.
I am copying you on this note because I need your assistance in identifying a contact at the NOC with whom we can communicate these concerns and work on implementing processes to help alleviate future issues.

Concerns:
- Time required to fix this problem (6.5 hours; I think it finally cleared up on its own)
- The lack of notification or knowledge of the problem
- The erroneous information being communicated to the customer (Enron Net Works)

Keith

-----Original Message-----
From: Dietrich, Dan
Sent: Monday, November 27, 2000 9:35 AM
To: Belden, Tim; O'Neil, Murray
Cc: Cox, Chip; Willigerod, Diana; Abu-Khalaf, Sammy; Bruce Smith/HOU/ECT@ENRON; Poston, David; Nat, Steve; Dziadek, Keith; Richter, Jeff; Forney, John; Guzman, Mark; Causholli, Monika
Subject: IT problems Sunday

Tim and Murray,

Yesterday the real-time desk experienced dropped Terminal Server (TS) sessions, which impacted their productivity. Diana and Chip contacted the network team in Houston to assist with troubleshooting the problem. Keith Dziadek monitored the DS3 VPN connection between Portland and Houston and determined that the dynamic fail-over within the Enron Intelligent Network (EIN) is causing the dropped TS sessions. This dynamic fail-over is a good thing. Our issue is that the TS sessions are so sensitive that the time taken by the dynamic fail-over on the EIN causes them to drop, while all other applications (e.g., Enpower running locally in Portland) are not affected. Keith is meeting with his team in Houston as well as staff from EBS today to form an action plan. Once the action plan has been established, Keith or I will email you what will be done and when. You can reach me on my cell if you have any questions.
Regards,
Dan Dietrich
Cell: 403-818-0815

---------------------- Forwarded by Diana Willigerod/PDX/ECT on 11/26/2000 03:33 PM ---------------------------

From: Phillip Platter 11/26/2000 03:19 PM
To: Jeff Richter/HOU/ECT@ECT, Tim Belden/HOU/ECT@ECT, John M Forney/HOU/ECT@ECT, Murray P O'Neil/HOU/ECT@ECT
cc: Mark Guzman/PDX/ECT@ECT, Monika Causholli/PDX/ECT@ECT, Diana Willigerod/PDX/ECT@ECT, Chip Cox/PDX/ECT@ECT, Paul Kane/PDX/ECT@ECT
Subject: IT problems Sunday

Please be advised of a serious problem with Terminal Server! We were all kicked out of Terminal Server repeatedly on Sunday. At first it happened about every half hour (9 a.m. to 10:30 a.m.). Then it happened every 5 minutes. Communication seemed slow. We paged and called the Portland IT team at 11:30 (Diana and Chip). At 12 noon, Diana said Chip and Paul Kane were working on it. At around 12:30, Paul Kane called back and said "Vince" in Houston was working on it. We asked for Vince's phone number, and Paul stated that Vince would call us instead. Vince called at about 1:00 and said everything was now working. Five minutes later we were still getting kicked out. We then paged and called Chip, and he said he was trying to get hold of Houston. At 1:30 we called the Resolution Center. They indicated they had no clue we were having problems. At 1:45 we called Greg Marsalis in Houston, and he said he was unaware of who was working on it but was going to look into it. Greg did look into it and said they were still not able to pinpoint the problem.

At this writing (3:10 p.m.), we are still not able to use Terminal Server and consequently are unable to transact with the ISO or to enter deals in Enpower. This has also affected our connection to PX Trade App. We are effectively prevented from taking advantage of market conditions in California. Earlier I had started to initiate some congestion relief on the Palo Verde tie. Congestion was $225 to $250 at Palo Verde, and I was confident I could take advantage by moving power out of SP.
I was unable to do anything in Caps, as I was kicked out every few minutes. I feel the communication process regarding this problem has been flawed. I'm sure there is plenty of blame to go around. I suggest we meet with IT to define a process through which we will get immediate attention and results. As of 3:20 we still have the same problems. I am hoping this problem is isolated to Sunday.