Thursday, September 7, 2017
Monday, May 18, 2015
OMG. What a nightmare the past three weeks have been.
It started out with Facebook changing security privileges for "apps", which affected our server's ability to send weather bulletins to Facebook. Until we were able to figure out a workaround, everything on Facebook was down. It also affected our postings to the pages of several northern Florida Emergency Management agencies, and made us look very bad...even though it wasn't our fault. And I don't like looking bad.
After spending a god-awful lot of time on it, daytime and into many late evenings and sometimes early mornings, I and a bunch of other beta testers pooled our own hands-on, experimental knowledge until we figured out how to work with the Facebook changes.
Right on the heels of getting things going again on Facebook, the Fireline server crashed. I'm not sure of the deeper, more technical specifics of how it happened. But...
...Something to do with the server software being old...originally 2003, upgraded to 2007. Then came an automatically downloaded "upgrade" and all hell broke loose. Nothing worked anymore. All dirs on the server had their permissions reset. Dirs became "unmapped". We could work with a top dir, and a third level dir, but the middle dir...it was as if the server "forgot" it was there. ...Things like this. We had to literally start from scratch with Fireline. All web pages ceased to function. People couldn't see them because permissions had been reset and no one was "allowed" access and they got errors when they tried. The maps on the ACS page stopped updating. The marquee scroller and the bulletin scroller both stopped working. Access ceased to the online EMWIN bulletins archive. Bitly links in Facebook postings no longer worked because access to the online EMWIN bulletins archive couldn't be had anymore. You can see all the cascading problems that resulted.
Both I and the Fireline sysop worked really hard, staying up late at night, to stay ahead of all of the problems and to figure out how to get the Fireline and the AC-EMWIN servers to work together again. We felt like Data trying to stay ahead of all the "cascading failures" with Lal. (...If you remember the ST:TNG episode where Data had built a daughter.) In Data's case, he was unsuccessful and Lal ceased functioning. In our case, I think we actually licked it, and things are again functional. But I'm going to say that with wariness because on Facebook I've said that before and then something drastic happened and we had to start all over again - and I don't want to jinx myself. :)
The Fireline server software is old and no longer supported by Microsoft. The sysop wants to upgrade it but the cost is just ridiculously prohibitive. We're talking on the order of $4500. Double that for the backup server in case the first one goes down.
At this point, I believe we've finally been able to get everything back up and operating EXCEPT for a couple of Emergency Management agencies. But Facebook pages for Alachua Co SKYWARN, the Alachua County EMWIN Project, GVLStorms, GVLWeather, and GNVWeather, should again be operating as normal from this point onwards.
Let's hope something like that doesn't happen again for a while. I think I've had about enough these past few weeks and I need a serious vacation. (sigh) It's been nerve-racking.
All that being said, we believe in a couple of mottos wholeheartedly, though...
First of course is "NEVER give up! NEVER surrender!" -Peter Quincy Taggart, Galaxy Quest
Secondly, you've got "Failure is not an option!" -Eugene Kranz (during the Apollo 13 disaster)
And well, of course, you have Col. Jack O'Neill too, who once said, "So, when your back's up against the wall, and there's no tomorrow, just take one day at a time, and remember...the bigger they are, etcetera."
But the best, most important words of advice came from my cat, Stormy, who with a stare quite serious, sternly advised..."Meow." ...And I believe he was right on.
UPDATE - MAY 24, 2015: Everything again functional. Texts to email, pagers, cellphones, listgroups, Facebook, Twitter, web pages...everything...including all our Emergency Management Agency customers. Even gained an additional EM customer: Liberty County. So at the moment, we're serving the following EMA Facebook pages: Bradford Co, Gulf County, Holmes Co, Liberty Co, and Washington Co...with the possibility of Bay Co soon joining the bandwagon, too. :)
Tuesday, April 28, 2015
If things go just right, by sometime tomorrow some of our AC-EMWIN weather bulletins may be coming out up to a minute or so faster as we add a speedier additional bulletin ingest method!
Currently, our bulletins have been coming out about the same time as or up to 15 seconds before the NOAA Weather Radio. This new ingest method could significantly improve our notification time averages.
Can't wait to get it hooked up and see!
Thursday, August 21, 2014
The AC-EMWIN Server program somehow got locked up yesterday afternoon about 2pm and no bulletins went out for about a 24-hour period. This went unnoticed because I was not able to make my normal daily remote checks on the server: I had suffered an on-the-job injury and had been busy with doctors' appointments, related medical paperwork, and dealing with insurance people.
While the server program was down, the ingest program continued to receive all bulletins downloaded through the EMWIN pipeline, and on restart they were all eventually distributed, if late for some.
At this time, though, the server is again operating normally. We apologize for any inconvenience.
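A lockup like this one could, in principle, be caught by an automated freshness check instead of a manual daily look. What follows is only a minimal sketch, not how the AC-EMWIN server actually works: it assumes the server writes its outgoing bulletins as files into some directory, and the function names `newest_mtime` and `is_stale` are made up for illustration.

```python
import os
import time

def newest_mtime(directory):
    """Return the latest modification time of any file in directory, or None if it holds no files."""
    times = [
        entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file()
    ]
    return max(times) if times else None

def is_stale(directory, max_age_seconds, now=None):
    """True if no file in directory was modified within the last max_age_seconds."""
    now = time.time() if now is None else now
    latest = newest_mtime(directory)
    # An empty directory counts as stale: nothing has been written at all.
    return latest is None or (now - latest) > max_age_seconds
```

Run from a scheduled task every few minutes, a check like `is_stale(output_dir, 3600)` could trigger an email or page when nothing new has gone out for an hour. The one-hour threshold is a guess; quiet weather days may need a longer one.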
Thursday, March 6, 2014
Looks like the AC-EMWIN server had a hiccup, last night...
The AC-EMWIN server program locked up at about 8:45pm. All the other SUB-programs kept operating and passing on files which were apparently stuck in the ingest directory (the place where the program temporarily places new bulletins just downlinked, which are USUALLY then immediately *deleted* after they've been processed). So some ZFPs and HWOs didn't get deleted from the ingest dir (because the program locked up at just the right time), and the other subprograms just kept pulling them, so they got duped to some Facebook pages a couple of times overnight. As well, the radar images from 8pm last night kept getting resent all night until I noticed it early this morning after getting up.
As to how I even noticed that things were locked up... On getting up I noticed that the EMWIN test clients - here at the house, and which are connected to the main server remotely - weren't offering up the normal early morning beeps and sounds that occur when they receive the usual morning ZFPs and HWOs and paint things on the map. After years of being used to this, you notice something's not right almost immediately. This caused me to take a look around and check for problems. Indeed, the AC-SKYWARN web page hadn't refreshed the watch/warning map since 8:30pm last night. All the text bulletins on the AC-EMWIN page were "old" from late yesterday afternoon/evening. As well, no (current) bulletins were being sent to the surrounding area Emergency Management Facebook pages which were subscribed to us. (EEK!)
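One general way to guard against this kind of re-posting is for the posting side to remember what it has already sent, so a file left behind in the ingest dir can't go out twice. This is just a sketch of that idea, not the actual AC-EMWIN subprogram logic; the `SentLog` class is hypothetical, and it keeps its record in memory only.

```python
import hashlib

class SentLog:
    """Remembers a fingerprint of every bulletin already sent (in memory only)."""

    def __init__(self):
        self._seen = set()

    def should_send(self, bulletin_bytes):
        """Return True the first time a given bulletin body is seen, False on repeats."""
        digest = hashlib.sha256(bulletin_bytes).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

A real version would need to save the fingerprints to disk so they survive a reboot, and expire old ones after a day or so.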
We're in the middle of some current ongoing weather so I didn't even bother to deal with checking logs and figuring things out. I just rebooted the machine entirely right away and that seemed to take care of it.
...So things are again working properly. I'll keep an eye on the server to make sure it doesn't happen again. The program usually works very efficiently, I almost never have any problems with it, and it's pretty stable, so I don't expect another problem like that.
Coincidentally, the server hiccup started the process of me checking ALL resources for the possible problem and this included checking Alachua FreeNet. While it's not part of our server problem, I discovered that Alachua FreeNet is apparently down, too. Something hung on THEIR server last night as well, and some of you may notice that your web page dirs are empty. (EEK!) I've notified AFN admin about it and they're taking a look. So in a way, you can thank our own server hiccup for causing a causal exploration which ended up discovering the AFN outage. :)
Friday, December 13, 2013
Got an email from Cape Coral FD Facebook page's admin advising that they hadn't seen ANY bulletins being sent to their FB page for quite a while. On checking our server logs, we saw no errors. Then again, we could confirm nothing going out to CCFD - including the daily ZFPs. On checking them again, we noted that the logs confirmed that other NWS FOs were sending out ZFPs, but there were NONE WHATSOEVER noted coming out of NWS Ruskin/Tampa. It was strange. Sent out a few queries to see if anyone else in EMWIN-land knew what was going on.
It was the first time we'd ever encountered such an odd problem. It was actually location specific.
On doing further research, it appears that NWS-Ruskin (Tampa) has changed quite a number of zone codes for numerous bulletins handled within its CWA. Many zones which were previously handled as single counties have now been divided up into "inland" and "coastal". When they made the changeover on the 3rd, the old zone codes for the affected counties suddenly ceased working.
The new zone codes were updated into the server setup and all affected bulletins were again working with our system.
NWS had apparently sent out a Public Information Statement (PNS) about it, but it slipped by us.
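What made this one hard to spot is that nothing errored: the old zone codes simply stopped matching anything. A check for that kind of silence might look something like the sketch below. It assumes the server could produce a list of subscribed zone codes per page and a list of the zone codes seen in recently received bulletins. The function name and the placeholder codes are made up for illustration and are not real NWS zone codes.

```python
def silent_subscriptions(subscriptions, seen_zone_codes):
    """Return the pages whose subscribed zone codes match none of the recently seen codes."""
    seen = set(seen_zone_codes)
    return sorted(
        page
        for page, zones in subscriptions.items()
        if not (set(zones) & seen)
    )

# Placeholder data only: "OLD_ZONE_1" stands in for a retired code.
subscriptions = {
    "Page A": ["OLD_ZONE_1"],
    "Page B": ["ZONE_2", "ZONE_3"],
}
recently_seen = ["ZONE_2", "NEW_ZONE_1_INLAND", "NEW_ZONE_1_COASTAL"]
quiet_pages = silent_subscriptions(subscriptions, recently_seen)
```

Here `quiet_pages` comes out as `["Page A"]`, since its only subscribed code no longer shows up in anything received. Run over a few days' worth of bulletins, a non-empty result would be a hint to go look for a zone change notice.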