Subject: Re: [NYCGA Internet] Re: Call for volunteers: Linux Administration and MY SQL DBA resources
From: Chaz Cheadle
Date: Wed, 19 Oct 2011 14:57:38 -0400
To: Tom Gillis
CC: internet_working_group@googlegroups.com, Kevin <king.feruke@gmail.com>, Ron Suarez <ron.suarez@loudfeed.com>, Drew H <drew@nycga.net>, Todd Grayson <tgraysonco@gmail.com>, Jake <jakedeg@gmail.com>, ows_solutions <ows_solutions@freenetworkfoundation.org>

We may be faced with that kind of action again on any server host. Rackspace may be better prepared to deal with a similar situation, but then again, perhaps not. We'll just have to keep our fingers crossed.

chaz

On Wed, Oct 19, 2011 at 2:52 PM, Tom Gillis <thomaswgillis@gmail.com> wrote:
We have personal connections at Panix and it sounds like from their
history that they're really good at fending off legal harassment.

On Wed, Oct 19, 2011 at 2:49 PM, Sam Boyer <act@samboyer.org> wrote:
> the concern with rackspace would be this kinda thing:
>
> https://www.eff.org/cases/indymedia-server-takedown
>
> On 10/19/11 11:41 AM, Chaz Cheadle wrote:
>> I can be available tomorrow in the city, not today though.
>> Why not stay with Rackspace? From reviewing the Panix website, I was not
>> bowled over with confidence at its reliability. I'm extremely happy with
>> Rackspace technical support. If we can afford to stay with them, I'd do
>> it. Depending on the traffic we hit, their transfer rates are pretty
>> competitive.
>>
>> On Wed, Oct 19, 2011 at 2:33 PM, Tom Gillis <thomaswgillis@gmail.com> wrote:
>>
>>     *Short summary - site is on Rackspace Cloud, not Panix, for now.
>>     *Let's try to get interested sysadmin folks in a room or skype tonight.
>>
>>     Chaz / Kevin  - what's your real-time availability in NYC today ? We'd
>>     like to have another work session - Dan R and Jake also have root on
>>     the server and I'd like to get us all in a room or at least a skype to
>>     coordinate (I won't be available onsite until tonight).
>>
>>     BTW - one other thing that we need to come to a consensus on is our
>>     longterm hosting provider - the ppl at the work session decided last
>>     night (around 2 AM) that after the Panix hosting server that i had
>>     access on became unresponsive (looking into the cause of this) and
>>     none of us had the account creds to restart the server or provision
>>     another one we moved the site to a rackspace cloud account (cloned
>>     from a legacy machine that I have set up as a LAMP box for freelance
>>     projects).  Not the best setup but the best we could do at the last
>>     minute, with calling off the site launch (for the 2nd time in a week)
>>     not an option (since the staging site url was already starting to leak
>>     to the general public and it just meant that the data / user migration
>>     was going to get more complicated as time went by).
>>
>>     ANYWAY - folks on the ground can sync up with Jake or Dan to get
>>     access to the box, or we can do it over skype later (I'm trying to
>>     stick to not sending credentials over the internet).
>>
>>     There may be some duplication of effort here - getting a deployment
>>     going that will get us thru the next few weeks, and then coming up
>>     with a better long term solution.
>>
>>
>>
>>     On Wed, Oct 19, 2011 at 1:13 PM, Kevin <king.feruke@gmail.com> wrote:
>>     > lighttpd/Nginx or any async server is exactly where I was going
>>     with fcgi
>>     > ... Having them use fam/gamin will help file stats if we find I/o
>>     being the
>>     > problem
>>     >
>>     > Sorry for short mails but I'm mobile only for a while
>>     >
>>     > DON'T PANIC
>>     >
>>     > On Oct 19, 2011 12:05 PM, "Sam Boyer" <act@samboyer.org> wrote:
>>     >>
>>     >> i wish i could volunteer to hit this round the clock, but i just
>>     can't
>>     >> at the moment :( :(
>>     >>
>>     >> some thoughts on scaling this thing. first - we don't know squat
>>     until
>>     >> we get into the box. second, once we do, installing some monitoring
>>     >> tools - e.g. cacti - should be high priority, otherwise, we're just
>>     >> gonna be flailing around in the dark. nagios is fine, but that'll
>>     get us
>>     >> monitoring, not usage logs. alternatively/additionally we could
>>     look at
>>     >> paying for monitoring from a service like new relic (which might be
>>     >> better if only because it means less that we have to maintain
>>     ourselves,
>>     >> at least at first). beyond that:
>>     >>
>>     >> - get xhprof onto a prod clone somewhere so we can actually look at
>>     >> what's taking up the processing time. beyond low-hanging fruit,
>>     though,
>>     >> that's gonna take some expertise to actually make a dent with.
>>     >> - big duh, but we've got an opcode cache running...right? the
>>     site seems
>>     >> too responsive right now for this NOT to be the case.
>>     >> - getting mysql onto baremetal, or rackspace cloud (though that would
>>     >> mean moving everything to rackspace, and i've already heard security
>>     >> concerns about that), should probably be a priority. heavy db io
>>     through
>>     >> virt layer...meh.
>>     >> - due to the 1s-minimum granularity issue the mysql slow query log is
>>     >> almost a too-late-to-be-useful thing (unless the percona people
>>     FINALLY
>>     >> got that patch in to add ms granularity in mainline...but i doubt
>>     it),
>>     >> but we do need to run it, as it'll give us a hit list of queries for
>>     >> optimization and/or caching.
>>     >> - as kevin mentioned on the other thread, fcgi; and if we do that,
>>     >> really, no reason not to switch to nginx. i don't know what our
>>     request
>>     >> volume looks like so i don't know how much we'd be getting back
>>     there,
>>     >> but really, there's no reason to be serving static assets with bloaty
>>     >> apache workers.
>>     >> - ordinarily, for a drupal site of this type, i'd advocate ESI. i
>>     have
>>     >> no idea how well WP supports content chunking like that (and
>>     truth is,
>>     >> good ESI strategies take a *while* to craft), but at the very
>>     least some
>>     >> internal data caching could help with query volume (e.g., cache the
>>     >> output of the query that generates the global activity feed for
>>     30s or
>>     >> so). again, though, i don't know how easy that is to layer in
>>     with WP,
>>     >> and the more custom we get, the more difficult it's gonna be to
>>     maintain.
>>     >>
>>     >> like i said, though, until we actually *know* where the
>>     problem(s) are,
>>     >> we can't address them. also, somewhere in this thread i remember
>>     seeing
>>     >> someone set up for the expectation that we might just need
>>     ~500MB/proc.
>>     >> dear god, i hope not. if that's the case, we could blow the
>>     entire war
>>     >> chest that's been accumulated thus far for liberty plaza (~$230,000 i
>>     >> read somewhere) and still only be able to support several hundred
>>     >> concurrent users. that needs to be brought *down*.
>>     >>
>>     >> cheers
>>     >> s
>>     >>
>>     >> On 10/19/11 8:47 AM, Kevin wrote:
>>     >> > Agree nagios for the win, we should get logwatch going as well
>>     >> >
>>     >> > Rackspace cloud machines guarantee proc and allow bursting if
>>     available.
>>     >> >
>>     >> > Once we have more than one machine we need to think about config
>>     >> > mgmt...i would suggest puppet. We could use blueprint to
>>     analyze the
>>     >> > current machine and generate puppet files.
>>     >> >
>>     >> > https://github.com/devstructure/blueprint
>>     >> >
>>     >> > DON'T PANIC
>>     >> >
>>     >> > On Oct 19, 2011 11:32 AM, "Chaz Cheadle" <ccheadle@gmail.com> wrote:
>>     >> >
>>     >> >        I'd like to suggest zenoss/nagios for monitoring.
>>     >> >     As for hardware configurations, I'd say we definitely
>>     should have
>>     >> >     physical/dedicated DB servers with cloud webhosting. Unless
>>     we're on
>>     >> >     Rackspace or Linode, it may be hard to ensure we'll get the
>>     needed
>>     >> >     processor or I/O from a vps.
>>     >> >       If we have one server now, we can start serving the whole
>>     thing
>>     >> >     from there, then purchase cloud webservers to lighten the
>>     webload
>>     >> >     then add mysql replication in later if the DB reads start
>>     getting
>>     >> >     high. Unless we're doing heavy editing on the site one DB
>>     server for
>>     >> >     now should handle all of the read requests.
>>     >> >        With Zen/nagios we will be able to monitor the server
>>     and make
>>     >> >     decisions on expansion. Let's figure out the resource issue
>>     we have
>>     >> >     now with WP before jumping to cloud web hosts and MySql
>>     replication.
>>     >> >
>>     >> >     What is the current panix host package we're on?
>>     >> >
>>     >> >     chaz
>>     >> >
>>     >> >     On Wed, Oct 19, 2011 at 11:14 AM, Todd Grayson <tgraysonco@gmail.com> wrote:
>>     >> >
>>     >> >         Adding Chaz and Kevin,
>>     >> >
>>     >> >         guys once consensus can be reached with dev leads,
>>     folks can be
>>     >> >         ID'd and get started on the "how to move forward" as a
>>     working
>>     >> >         team? Tom needs additional eyeballs and hands covering
>>     production
>>     >> >         deploy as well as ongoing release engineering. The
>>     subject is
>>     >> >         going to become a bigger deal as work continues.
>>      Please review,
>>     >> >         and let's get a working plan together that approaches
>>     the list in
>>     >> >         a way that resource on-boarding is clean and effective?
>>     >> >
>>     >> >         Todd
>>     >> >
>>     >> >         On 10/19/2011 8:06 AM, Todd Grayson wrote:
>>     >> >>         OK:
>>     >> >>
>>     >> >>         As a conversation before going to the list, I'm
>>     reaching out
>>     >> >>         to you folks to establish consensus on what is going
>>     to happen
>>     >> >>         next.  Please identify WHO should be included in this
>>     >> >>         conversation not currently a part of it.  Once
>>     consensus is in
>>     >> >>         place we can go to the lists for specific volunteers.
>>      To make
>>     >> >>         this efficient and quick the team in NYC should have
>>     on hand
>>     >> >>         the following items for the folks coming forward:
>>     >> >>
>>     >> >>           * Development leads who are overseeing configuration for
>>     >> >>             current wordpress deploy and able to answer questions
>>     >> >>               o available for q&a and facilitating access to
>>     repo's
>>     >> >>                 etc. when needed
>>     >> >>           * ID who the contacts are for the panix hosting
>>     >> >>             services; a conference call with them to talk
>>     through
>>     >> >>             what is being seen now and what we feel will be
>>     needed to
>>     >> >>             reach capacity should be scheduled ASAP
>>     >> >>           * Is there any way to get current perf statistics
>>     from where
>>     >> >>             it's running now?
>>     >> >>
>>     >> >>
>>     >> >>         The call for specific volunteers will be based on the
>>     fact we
>>     >> >>         need a team of folks to help out with systems and DB
>>     >> >>         administration tasks as well as performance tuning and
>>     >> >>         capacity planning.  This will give the working
>>     technical team
>>     >> >>         depth and allow for a more continuous support model as one
>>     >> >>         worker will only have limited hours in a day to
>>     contribute,
>>     >> >>         whereas a team model can support sustained activity
>>     over a
>>     >> >>         period of time.
>>     >> >>
>>     >> >>         Here is what is needed from the current and previous
>>     volunteer
>>     >> >>         lists as well as contacts on the ground once they are
>>     >> >> identified;
>>     >> >>
>>     >> >>           * Technical Project Manager
>>     >> >>           * Linux systems administrators with web hosting
>>     backgrounds
>>     >> >>             (and virtual hosting infrastructure)
>>     >> >>           * MySql DBA's supporting web hosted env's, wordpress
>>     >> >>             environments
>>     >> >>
>>     >> >>         Resources like this have already come forward to the
>>     IWG list,
>>     >> >>         we can start with these people.  IMHO a target team of 6
>>     >> >>         people should be the goal (3 dba, 3 sysadmin)
>>     >> >>         MySql
>>     >> >>
>>     >> >>
>>     http://groups.google.com/group/internet_working_group/browse_thread/thread/4bde061a2adacee6/ede85b8a1812c3cc?lnk=gst&q=MySQL#ede85b8a1812c3cc
>>     >> >>         Linux Administration
>>     >> >>
>>     >> >>
>>     http://groups.google.com/group/internet_working_group/browse_thread/thread/694bde580564c681/f4afcbc78aa06aa3?lnk=gst&q=Linux+Administration#f4afcbc78aa06aa3
>>     >> >>
>>     >> >>         This call for volunteers will be the creation of a team that
>>     >> >>         will be dedicated to the infrastructure of the wordpress
>>     >> >>         sites, the DB infrastructure supporting them, and the
>>     apache /
>>     >> >>         php / wordpress install and configuration over your
>>     >> >>         dev/test/release environments moving forward.  The
>>     folks that
>>     >> >>         will be coming forward from online will need to be
>>     included in
>>     >> >>         communication, brought into the planning, and then
>>     included in
>>     >> >>         communications as a team moving forward.
>>     >> >>
>>     >> >>         If you want to start the ball rolling on this let me
>>     know who
>>     >> >>         the contacts are from the "on the ground" requirements
>>     and we
>>     >> >>         can get going asap.
>>     >> >>
>>     >> >>         IMHO the actual MySql DB's might have to be on physical
>>     >> >>         hardware if the IO we are seeing on the VM's shared
>>     backplane
>>     >> >>         is the bottleneck.... or just reside in a MySQL DB
>>     farm.. That
>>     >> >>         will have to be evaluated with iostat output as system
>>     access
>>     >> >>         is regained and the cause is isolated.  It might just
>>     simply
>>     >> >>         be memory related; disk IO pressure as paging/swap
>>     attempted
>>     >> >>         to scale for the  demand of resources.   If we know
>>     the right
>>     >> >>         process names ssh pkill statements can be sent to try
>>     and free
>>     >> >>         up the system as well?
>>     >> >>
>>     >> >>         ssh username@hostfqdn 'pkill httpd'
>>     >> >>
>>     >> >>         Todd
>>     >> >>
>>     >> >>         On 10/19/2011 6:33 AM, Tom Gillis wrote:
>>     >> >>>         And I feel like "scalable wordpress deployment" is a
>>     little
>>     >> >>> bit of an
>>     >> >>>         oxymoron - but:
>>     >> >>>         good news - we have the nycga 2.0 site up, and the
>>     >> >>> functionality is
>>     >> >>>         all working as expected.
>>     >> >>>
>>     >> >>>         bad news - we needed to rush deployment so that
>>     working groups
>>     >> >>> could
>>     >> >>>         start using new features, but wordpress is killing
>>     the cpu /
>>     >> >>> memory on
>>     >> >>>         the server (a 16gb virtual box) and we know that a single
>>     >> >>> server
>>     >> >>>         hosting setup is not going to be viable.
>>     >> >>>
>>     >> >>>         caching doesn't help us much since most of the content is
>>     >> >>> dynamic, and
>>     >> >>>         near-real time - it's wordpress with buddypress on top so
>>     >> >>> there's tons
>>     >> >>>         of forums, and social-networky activity feeds.
>>     >> >>>
>>     >> >>>         what we need:
>>     >> >>>         1 - move mysql to its own server, and set up master /
>>     slave
>>     >> >>>         replication (2 virtual servers)
>>     >> >>>         2 - set up a shared file hosting server for user-uploaded
>>     >> >>> images - nfs
>>     >> >>>         mounts to a single box (1 box)
>>     >> >>>         3 - setting up load-balanced web frontends with sticky
>>     >> >>> sessions (4
>>     >> >>>         virtual boxes probably)
>>     >> >>>
>>     >> >>>         I'm hoping to find a few people who will volunteer to
>>     work
>>     >> >>> with
>>     >> >>>         internet group, either in nyc or remotely, over the
>>     next 72
>>     >> >>> hrs to
>>     >> >>>         make a push to get this infrastructure in place.  in
>>     parallel,
>>     >> >>> we'll
>>     >> >>>         be making code optimizations to the site.  (lots of
>>     >> >>> low-hanging fruit
>>     >> >>>         here, like minifying js and css).   i'm hoping to find
>>     >> >>> somebody who
>>     >> >>>         can set up one aspect of the infrastructure and I'll
>>     hook them
>>     >> >>> up with
>>     >> >>>         a cloned version of the production server, which they can
>>     >> >>> modify to
>>     >> >>>         fulfill one of these other roles - then we can deploy
>>     that
>>     >> >>> back into
>>     >> >>>         the main infrastructure.   I'm probably going to be
>>     asleep
>>     >> >>> until
>>     >> >>>         around 1pm nyc time, but I'm hoping to have some
>>     volunteers by
>>     >> >>> the
>>     >> >>>         time I come back. And right now we really need people
>>     who can
>>     >> >>> free up
>>     >> >>>         most of their time for the rest of the week on this
>>     (we're
>>     >> >>> literally
>>     >> >>>         working around the clock in nyc so you'll have people to
>>     >> >>> coordinate
>>     >> >>>         with no matter where / when you're available)
>>     >> >>>
>>     >> >>>         Any takers?
>>     >> >>
>>     >> >
>>     >> >
>>     >>
>>     >
>>
>>
>
>
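
For the internal data caching idea in Sam's message above (cache the output of the query that generates the global activity feed for 30s or so), a minimal sketch might look like the following. It is written in Python purely for illustration, since the site itself is WordPress/PHP. TTLCache, get_or_compute, and fetch_activity_feed are hypothetical names, not anything from the actual codebase, and the feed contents are placeholder data.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: an entry is reused until it is ttl seconds old."""

    def __init__(self, ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        # Serve the cached value while it is younger than ttl seconds.
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        # Otherwise run the expensive call once and remember when we did.
        value = compute()
        self._store[key] = (now, value)
        return value


def fetch_activity_feed() -> list:
    # Hypothetical stand-in for the expensive activity-feed query.
    return ["post 1", "post 2"]


cache = TTLCache(ttl=30.0)
feed = cache.get_or_compute("global_activity_feed", fetch_activity_feed)
```

Note that a cache like this lives inside one worker process; with several Apache or fcgi workers, each would keep its own copy, so a shared store would be needed to get the full reduction in query volume.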
