Subject: [NYCGA Internet] Re: Call for volunteers: Linux Administration and MY SQL DBA resources
From: Tom Gillis
Date: Wed, 19 Oct 2011 14:33:42 -0400
To: Kevin
CC: Sam Boyer <act@samboyer.org>, Ron Suarez <ron.suarez@loudfeed.com>, Drew H <drew@nycga.net>, Todd Grayson <tgraysonco@gmail.com>, Jake <jakedeg@gmail.com>, Chaz Cheadle <ccheadle@gmail.com>, Internet Working Group <internet_working_group@googlegroups.com>, ows_solutions <ows_solutions@freenetworkfoundation.org>

*Short summary - site is on Rackspace Cloud, not Panix, for now.
*Let's try to get interested sysadmin folks in a room or on skype tonight.

Chaz / Kevin - what's your real-time availability in NYC today? We'd
like to have another work session - Dan R and Jake also have root on
the server and I'd like to get us all in a room or at least a skype to
coordinate (I won't be available onsite until tonight).

BTW - one other thing we need to reach consensus on is our long-term
hosting provider. The people at last night's work session (around 2 AM)
decided to move the site to a Rackspace Cloud account (cloned from a
legacy machine I have set up as a LAMP box for freelance projects)
after the Panix server I had access to became unresponsive (looking
into the cause of this) and none of us had the account creds to restart
it or provision another one.  Not the best setup, but the best we could
do at the last minute - calling off the site launch (for the 2nd time
in a week) wasn't an option, since the staging site url was already
starting to leak to the general public and the data / user migration
was only going to get more complicated as time went by.

ANYWAY - folks on the ground can sync up with Jake or Dan to get
access to the box, or we can do it over skype later (I'm trying to
stick to not sending credentials over the internet).

There may be some duplication of effort here - getting a deployment
going that will get us through the next few weeks, and then coming up
with a better long-term solution.



On Wed, Oct 19, 2011 at 1:13 PM, Kevin <king.feruke@gmail.com> wrote:
lighttpd/Nginx or any async server is exactly where I was going with
fcgi... Having them use fam/gamin will help with file stats if we find
I/O being the problem.
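
Roughly, from memory (package names are debian-ish and would need checking on the actual box):

apt-get install nginx php5-fpm gamin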

Sorry for short mails but I'm mobile only for a while

DON'T PANIC

On Oct 19, 2011 12:05 PM, "Sam Boyer" <act@samboyer.org> wrote:

i wish i could volunteer to hit this round the clock, but i just can't
at the moment :( :(

some thoughts on scaling this thing. first - we don't know squat until
we get into the box. second, once we do, installing some monitoring
tools - e.g. cacti - should be high priority; otherwise we're just
gonna be flailing around in the dark. nagios is fine, but that'll get us
monitoring, not usage logs. alternatively/additionally we could look at
paying for monitoring from a service like new relic (which might be
better if only because there's less that we have to maintain ourselves,
at least at first). beyond that:

- get xhprof onto a prod clone somewhere so we can actually look at
what's taking up the processing time. beyond low-hanging fruit, though,
that's gonna take some expertise to actually make a dent with.
- big duh, but we've got an opcode cache running...right? the site seems
too responsive right now for this NOT to be the case (quick check
sketched after this list).
- getting mysql onto baremetal, or rackspace cloud (though that would
mean moving everything to rackspace, and i've already heard security
concerns about that), should probably be a priority. heavy db io through
virt layer...meh.
- due to the 1s-minimum granularity issue the mysql slow query log is
almost a too-late-to-be-useful thing (unless the percona people FINALLY
got that patch to add ms granularity into mainline...but i doubt it),
but we do need to run it, as it'll give us a hit list of queries for
optimization and/or caching (commands to turn it on are sketched after
this list).
- as kevin mentioned on the other thread, fcgi; and if we do that,
really, no reason not to switch to nginx. i don't know what our request
volume looks like so i don't know how much we'd be getting back there,
but really, there's no reason to be serving static assets with bloaty
apache workers.
- ordinarily, for a drupal site of this type, i'd advocate ESI. i have
no idea how well WP supports content chunking like that (and truth is,
good ESI strategies take a *while* to craft), but at the very least some
internal data caching could help with query volume (e.g., cache the
output of the query that generates the global activity feed for 30s or
so). again, though, i don't know how easy that is to layer in with WP,
and the more custom we get, the more difficult it's gonna be to maintain.
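
re the opcode cache and slow query log points above, a rough sketch of the checks i mean - paths/thresholds are guesses until we're actually on the box:

# is an opcode cache (apc / xcache / eaccelerator) actually loaded?
php -m | grep -iE 'apc|xcache|eaccelerator'
# flip on the slow query log at runtime (mysql 5.1+), 1s threshold to start, lower it later
mysql -u root -p -e "SET GLOBAL slow_query_log = ON; SET GLOBAL long_query_time = 1; SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';"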

like i said, though, until we actually *know* where the problem(s) are,
we can't address them. also, somewhere in this thread i remember seeing
someone setting up the expectation that we might need ~500MB/proc.
dear god, i hope not. if that's the case, we could blow the entire war
chest that's been accumulated thus far for liberty plaza (~$230,000 i
read somewhere) and still only be able to support several hundred
concurrent users. that needs to be brought *down*.

cheers
s

On 10/19/11 8:47 AM, Kevin wrote:
Agree, nagios for the win - we should get logwatch going as well.

Rackspace cloud machines guarantee proc and allow bursting if available.

Once we have more than one machine, we need to think about config
mgmt... I would suggest puppet. We could use blueprint to analyze the
current machine and generate puppet files.

https://github.com/devstructure/blueprint
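
Something like this, if I remember the blueprint CLI right (the module name is just a placeholder):

pip install blueprint
blueprint create nycga-web       # snapshot the packages/files on the current box
blueprint show -P nycga-web      # render that snapshot as a puppet module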

DON'T PANIC

On Oct 19, 2011 11:32 AM, "Chaz Cheadle" <ccheadle@gmail.com> wrote:

    I'd like to suggest zenoss/nagios for monitoring.
    As for hardware configurations, I'd say we definitely should have
    physical/dedicated DB servers with cloud webhosting. Unless we're on
    Rackspace or Linode, it may be hard to ensure we'll get the needed
    processor or I/O from a VPS.
    If we have one server now, we can start serving the whole thing
    from there, then purchase cloud webservers to lighten the web load,
    then add MySQL replication later if the DB reads start getting
    high. Unless we're doing heavy editing on the site, one DB server
    should handle all of the read requests for now.
    With Zen/nagios we will be able to monitor the server and make
    decisions on expansion. Let's figure out the resource issue we have
    now with WP before jumping to cloud web hosts and MySQL replication.

    What is the current panix host package we're on?

    chaz

    On Wed, Oct 19, 2011 at 11:14 AM, Todd Grayson <tgraysonco@gmail.com> wrote:

        Adding Chaz and Kevin,

        Guys, once consensus is reached with the dev leads, can folks be
        ID'd and get started on the "how to move forward" as a working
        team? Tom needs additional eyeballs and hands covering production
        deploys as well as ongoing release engineering. This is only
        going to become a bigger deal as work continues.  Please review,
        and let's get a working plan together that approaches the list in
        a way that makes resource on-boarding clean and effective.

        Todd

        On 10/19/2011 8:06 AM, Todd Grayson wrote:
        OK:

        As a conversation before going to the list, I'm reaching out
        to you folks to establish consensus on what is going to happen
        next.  Please identify WHO should be included in this
        conversation that is not currently a part of it.  Once consensus
        is in place we can go to the lists for specific volunteers.  To
        make this efficient and quick, the team in NYC should have on
        hand the following items for the folks coming forward:

          * Development leads who are overseeing configuration for the
            current wordpress deploy and are able to answer questions
              o available for q&a and facilitating access to repos
                etc. when needed
          * ID who the contacts are for the Panix hosting services; a
            conference call with them to talk through what is being seen
            now and what we feel will be needed to reach capacity should
            be scheduled ASAP
          * Is there any way to get current perf statistics from where
            it's running now?


        The call for specific volunteers will be based on the fact that
        we need a team of folks to help out with systems and DB
        administration tasks as well as performance tuning and capacity
        planning.  This will give the working technical team depth and
        allow for a more continuous support model, as one worker only
        has limited hours in a day to contribute, whereas a team model
        can support sustained activity over a period of time.

        Here is what is needed from the current and previous volunteer
        lists, as well as contacts on the ground, once they are
        identified:

          * Technical Project Manager
          * Linux systems administrators with web hosting backgrounds
            (and virtual hosting infrastructure)
          * MySQL DBAs supporting web-hosted / wordpress environments

        Resources like this have already come forward on the IWG list;
        we can start with these people.  IMHO a target team of 6 people
        should be the goal (3 DBA, 3 sysadmin).
        MySQL:

http://groups.google.com/group/internet_working_group/browse_thread/thread/4bde061a2adacee6/ede85b8a1812c3cc?lnk=gst&q=MySQL#ede85b8a1812c3cc

        Linux Administration:

http://groups.google.com/group/internet_working_group/browse_thread/thread/694bde580564c681/f4afcbc78aa06aa3?lnk=gst&q=Linux+Administration#f4afcbc78aa06aa3


        This call for volunteers will be the creation of a team
        dedicated to the infrastructure of the wordpress sites, the DB
        infrastructure supporting them, and the apache / php / wordpress
        install and configuration across the dev/test/release
        environments moving forward.  The folks coming forward from
        online will need to be brought into the planning and then
        included in communications as a team moving forward.

        If you want to start the ball rolling on this, let me know who
        the contacts are for the "on the ground" requirements and we can
        get going ASAP.

        IMHO the actual MySQL DBs might have to be on physical hardware
        if the IO we are seeing on the VMs' shared backplane is the
        bottleneck... or just reside in a MySQL DB farm.  That will have
        to be evaluated with iostat output once system access is
        regained and the cause is isolated.  It might simply be memory
        related: disk IO pressure as paging/swap tried to keep up with
        the demand for resources.  If we know the right process names,
        ssh pkill statements can be sent to try and free up the system
        as well:

        ssh username@hostfqdn 'pkill httpd'
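
        and for the iostat / memory check, something along these lines
        once we're back in (same placeholder host as above; assumes the
        sysstat package is on the box):

        ssh username@hostfqdn 'free -m; vmstat 5 3; iostat -x 5 3'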

        Todd

        On 10/19/2011 6:33 AM, Tom Gillis wrote:
        And I feel like "scalable wordpress deployment" is a little bit
        of an oxymoron - but:
        good news - we have the nycga 2.0 site up, and the functionality
        is all working as expected.

        bad news - we needed to rush deployment so that working groups
        could start using new features, but wordpress is killing the cpu
        / memory on the server (a 16GB virtual box) and we know that a
        single-server hosting setup is not going to be viable.

        caching doesn't help us much since most of the content is
        dynamic and near-real-time - it's wordpress with buddypress on
        top, so there's tons of forums and social-networky activity
        feeds.

        what we need:
        1 - move mysql to its own server, and set up master / slave
        replication (2 virtual servers) - rough sketch below
        2 - set up a shared file hosting server for user-uploaded images
        - nfs mounts to a single box (1 box)
        3 - set up load-balanced web frontends with sticky sessions (4
        virtual boxes probably)
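
        (for item 1, roughly this shape - host names, passwords, and the
        binlog file/position below are placeholders and would need to be
        checked against the real boxes:)

        # on the master: set server-id=1 and log-bin in my.cnf, restart mysqld, then:
        mysql -e "GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.%' IDENTIFIED BY 'changeme';"
        mysql -e "SHOW MASTER STATUS;"    # note the binlog file + position
        # on the slave: set server-id=2 in my.cnf, restart, then point it at the master:
        mysql -e "CHANGE MASTER TO MASTER_HOST='db-master', MASTER_USER='repl', MASTER_PASSWORD='changeme', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107; START SLAVE;"
        mysql -e "SHOW SLAVE STATUS\G"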

        I'm hoping to find a few people who will volunteer to work with
        the internet group, either in nyc or remotely, over the next 72
        hrs to make a push to get this infrastructure in place.  in
        parallel, we'll be making code optimizations to the site (lots
        of low-hanging fruit here, like minifying js and css).  i'm
        hoping to find somebody who can set up one aspect of the
        infrastructure - I'll hook them up with a cloned version of the
        production server, which they can modify to fulfill one of these
        roles, and then we can deploy that back into the main
        infrastructure.  I'm probably going to be asleep until around
        1pm nyc time, but I'm hoping to have some volunteers by the time
        I come back. And right now we really need people who can free up
        most of their time for the rest of the week on this (we're
        literally working around the clock in nyc, so you'll have people
        to coordinate with no matter where / when you're available).

        Any takers?





