Subject: Re: [NYCGA Internet] Re: Call for volunteers: Linux Administration and MySQL DBA resources
From: Tom Gillis
Date: Wed, 19 Oct 2011 14:52:10 -0400
To: internet_working_group@googlegroups.com
CC: Chaz Cheadle <ccheadle@gmail.com>, Kevin <king.feruke@gmail.com>, Ron Suarez <ron.suarez@loudfeed.com>, Drew H <drew@nycga.net>, Todd Grayson <tgraysonco@gmail.com>, Jake <jakedeg@gmail.com>, ows_solutions <ows_solutions@freenetworkfoundation.org>

We have personal connections at Panix, and it sounds from their
history like they're really good at fending off legal harassment.

On Wed, Oct 19, 2011 at 2:49 PM, Sam Boyer <act@samboyer.org> wrote:
the concern with rackspace would be this kinda thing:

https://www.eff.org/cases/indymedia-server-takedown

On 10/19/11 11:41 AM, Chaz Cheadle wrote:
I can be available tomorrow in the city, not today though.
Why not stay with Rackspace? From reviewing the Panix website, I was not
bowled over with confidence in its reliability. I'm extremely happy with
Rackspace technical support. If we can afford to stay with them, I'd do
it. Depending on the traffic we hit, their transfer rates are pretty
competitive.

On Wed, Oct 19, 2011 at 2:33 PM, Tom Gillis <thomaswgillis@gmail.com> wrote:

    *Short summary: site is on Rackspace Cloud, not Panix, for now.
    Let's try to get interested sysadmin folks in a room or on Skype tonight.*

    Chaz / Kevin - what's your real-time availability in NYC today? We'd
    like to have another work session - Dan R and Jake also have root on
    the server and I'd like to get us all in a room or at least on a Skype
    call to coordinate (I won't be available onsite until tonight).

    BTW - one other thing that we need to come to a consensus on is our
    long-term hosting provider. The people at the work session decided last
    night (around 2 AM), after the Panix hosting server that I had access
    to became unresponsive (we're looking into the cause of this) and none
    of us had the account creds to restart the server or provision another
    one, to move the site to a Rackspace Cloud account (cloned from a
    legacy machine that I have set up as a LAMP box for freelance
    projects).  Not the best setup, but the best we could do at the last
    minute, with calling off the site launch (for the 2nd time in a week)
    not an option (since the staging site URL was already starting to leak
    to the general public, and it just meant that the data / user migration
    was going to get more complicated as time went by).

    ANYWAY - folks on the ground can sync up with Jake or Dan to get
    access to the box, or we can do it over skype later (I'm trying to
    stick to not sending credentials over the internet).

    There may be some duplication of effort here - getting a deployment
    going that will get us thru the next few weeks, and then coming up
    with a better long term solution.



    On Wed, Oct 19, 2011 at 1:13 PM, Kevin <king.feruke@gmail.com> wrote:
    > lighttpd/Nginx or any async server is exactly where I was going with
    > fcgi ... Having them use fam/gamin will help with file stats if we
    > find I/O being the problem
    >
    > Sorry for short mails but I'm mobile only for a while
    >
    > DON'T PANIC
    >
    > On Oct 19, 2011 12:05 PM, "Sam Boyer" <act@samboyer.org> wrote:
    >>
    >> i wish i could volunteer to hit this round the clock, but i just
    >> can't at the moment :( :(
    >>
    >> some thoughts on scaling this thing. first - we don't know squat
    >> until we get into the box. second, once we do, installing some
    >> monitoring tools - e.g. cacti - should be high priority; otherwise
    >> we're just gonna be flailing around in the dark. nagios is fine, but
    >> that'll get us monitoring, not usage logs. alternatively/additionally
    >> we could look at paying for monitoring from a service like new relic
    >> (which might be better if only because it means there's less that we
    >> have to maintain ourselves, at least at first). beyond that:
    >>
    >> - get xhprof onto a prod clone somewhere so we can actually look at
    >> what's taking up the processing time. beyond low-hanging fruit,
    >> though, that's gonna take some expertise to actually make a dent with.
    >> - big duh, but we've got an opcode cache running...right? the site
    >> seems too responsive right now for this NOT to be the case. (quick
    >> check sketched after this list.)
    >> - getting mysql onto baremetal, or rackspace cloud (though that would
    >> mean moving everything to rackspace, and i've already heard security
    >> concerns about that), should probably be a priority. heavy db io
    >> through a virt layer...meh.
    >> - due to the 1s-minimum granularity issue the mysql slow query log is
    >> almost a too-late-to-be-useful thing (unless the percona people
    >> FINALLY got that patch to add ms granularity into mainline...but i
    >> doubt it), but we do need to run it, as it'll give us a hit list of
    >> queries for optimization and/or caching. (turning it on is also in
    >> the sketch below.)
    >> - as kevin mentioned on the other thread, fcgi; and if we do that,
    >> really, no reason not to switch to nginx. i don't know what our
    >> request volume looks like so i don't know how much we'd be getting
    >> back there, but really, there's no reason to be serving static assets
    >> with bloaty apache workers. (rough nginx sketch below as well.)
    >> - ordinarily, for a drupal site of this type, i'd advocate ESI. i
    >> have no idea how well WP supports content chunking like that (and
    >> truth is, good ESI strategies take a *while* to craft), but at the
    >> very least some internal data caching could help with query volume
    >> (e.g., cache the output of the query that generates the global
    >> activity feed for 30s or so). again, though, i don't know how easy
    >> that is to layer in with WP, and the more custom we get, the more
    >> difficult it's gonna be to maintain.
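    >>
    >> quick sketch of the opcode cache check and slow query log - package
    >> and path names are guesses until we see the box, and the SET GLOBALs
    >> assume mysql 5.1+ (older versions need my.cnf edits and a restart):
    >>
    >> # is an opcode cache (apc/xcache/eaccelerator) even loaded?
    >> php -m | grep -iE 'apc|xcache|eaccelerator'
    >>
    >> # turn on the slow query log at runtime
    >> mysql -u root -p -e "SET GLOBAL slow_query_log = ON;
    >>                      SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    >>                      SET GLOBAL long_query_time = 1;"
    >>
    >> # then pull a hit list of the worst offenders
    >> mysqldumpslow -s t /var/log/mysql/slow.log | head -40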
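    >>
    >> and if we do go the nginx route for static assets, something along
    >> these lines - the paths and the php-fpm port are assumptions, not
    >> what's on the box today:
    >>
    >> cat > /etc/nginx/conf.d/nycga.conf <<'EOF'
    >> server {
    >>     listen 80;
    >>     server_name nycga.net;
    >>     root /var/www/nycga;
    >>
    >>     # let nginx serve static assets directly, with a long expiry
    >>     location ~* \.(css|js|png|jpe?g|gif|ico)$ {
    >>         expires 7d;
    >>         access_log off;
    >>     }
    >>
    >>     # hand php over to fastcgi (php-fpm assumed on 127.0.0.1:9000)
    >>     location ~ \.php$ {
    >>         include fastcgi_params;
    >>         fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    >>         fastcgi_pass 127.0.0.1:9000;
    >>     }
    >> }
    >> EOF
    >> nginx -t && service nginx reload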
    >>
    >> like i said, though, until we actually *know* where the problem(s)
    >> are, we can't address them. also, somewhere in this thread i remember
    >> seeing someone set up the expectation that we might just need
    >> ~500MB/proc. dear god, i hope not. if that's the case, we could blow
    >> the entire war chest that's been accumulated thus far for liberty
    >> plaza (~$230,000 i read somewhere) and still only be able to support
    >> several hundred concurrent users. that needs to be brought *down*.
    >>
    >> cheers
    >> s
    >>
    >> On 10/19/11 8:47 AM, Kevin wrote:
    >> > Agree, nagios for the win; we should get logwatch going as well.
    >> >
    >> > Rackspace cloud machines guarantee proc and allow bursting if
    >> > available.
    >> >
    >> > Once we have more than one machine we need to think about config
    >> > mgmt... i would suggest puppet. We could use blueprint to analyze
    >> > the current machine and generate puppet files (rough usage sketch
    >> > below the link).
    >> >
    >> > https://github.com/devstructure/blueprint
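    >> >
    >> > Something like this - I haven't run blueprint against this box, so
    >> > treat the install method and exact flags as assumptions and check
    >> > the README / --help output:
    >> >
    >> > apt-get install -y nagios3 logwatch   # monitoring + daily log summaries (Ubuntu/Debian package names assumed)
    >> > pip install blueprint                 # or install from the github repo above
    >> > blueprint create nycga-web            # snapshot packages/files on the current box
    >> > blueprint show -P nycga-web           # render that snapshot as puppet code (-P per the README; double-check)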
    >> >
    >> > DON'T PANIC
    >> >
    >> > On Oct 19, 2011 11:32 AM, "Chaz Cheadle" <ccheadle@gmail.com> wrote:
    >> >
    >> >     I'd like to suggest zenoss/nagios for monitoring.
    >> >     As for hardware configurations, I'd say we definitely should
    >> >     have physical/dedicated DB servers with cloud webhosting. Unless
    >> >     we're on Rackspace or Linode, it may be hard to ensure we'll get
    >> >     the needed processor or I/O from a VPS.
    >> >     If we have one server now, we can start serving the whole thing
    >> >     from there, then purchase cloud webservers to lighten the web
    >> >     load, then add MySQL replication later if the DB reads start
    >> >     getting high. Unless we're doing heavy editing on the site, one
    >> >     DB server should handle all of the read requests for now.
    >> >     With Zen/nagios we will be able to monitor the server and make
    >> >     decisions on expansion. Let's figure out the resource issue we
    >> >     have now with WP before jumping to cloud web hosts and MySQL
    >> >     replication.
    >> >
    >> >     What is the current panix host package we're on?
    >> >
    >> >     chaz
    >> >
    >> >     On Wed, Oct 19, 2011 at 11:14 AM, Todd Grayson
    >> >     <tgraysonco@gmail.com> wrote:
    >> >
    >> >         Adding Chaz and Kevin,
    >> >
    >> >         Guys, once consensus is reached with the dev leads, folks
    >> >         can be ID'd and can get started on the "how to move forward"
    >> >         as a working team.  Tom needs additional eyeballs and hands
    >> >         covering production deploys as well as ongoing release
    >> >         engineering.  This is going to become a bigger deal as work
    >> >         continues.  Please review, and let's get a working plan
    >> >         together that approaches the list in a way that makes
    >> >         resource on-boarding clean and effective.
    >> >
    >> >         Todd
    >> >
    >> >         On 10/19/2011 8:06 AM, Todd Grayson wrote:
    >> >>         OK:
    >> >>
    >> >>         As a conversation before going to the list, I'm reaching
    >> >>         out to you folks to establish consensus on what is going to
    >> >>         happen next.  Please identify WHO should be included in
    >> >>         this conversation who is not currently a part of it.  Once
    >> >>         consensus is in place we can go to the lists for specific
    >> >>         volunteers.  To make this efficient and quick, the team in
    >> >>         NYC should have on hand the following items for the folks
    >> >>         coming forward:
    >> >>
    >> >>           * Development leads who are overseeing configuration for
    >> >>             the current wordpress deploy and are able to answer
    >> >>             questions
    >> >>               o available for q&a and facilitating access to repos
    >> >>                 etc. when needed
    >> >>           * ID who the contacts are for the Panix hosting services;
    >> >>             a conference call with them to talk through what is
    >> >>             being seen now and what we feel will be needed to reach
    >> >>             capacity should be scheduled ASAP
    >> >>           * Is there any way to get current perf statistics from
    >> >>             where it's running now?
    >> >>
    >> >>
    >> >>         The call for specific volunteers will be based on the fact
    >> >>         that we need a team of folks to help out with systems and
    >> >>         DB administration tasks as well as performance tuning and
    >> >>         capacity planning.  This will give the working technical
    >> >>         team depth and allow for a more continuous support model,
    >> >>         as one worker will only have limited hours in a day to
    >> >>         contribute, whereas a team model can support sustained
    >> >>         activity over a period of time.
    >> >>
    >> >>         Here is what is needed from the current and previous
    >> >>         volunteer lists as well as contacts on the ground once they
    >> >>         are identified:
    >> >>
    >> >>           * Technical Project Manager
    >> >>           * Linux systems administrators with web hosting
    >> >>             backgrounds (and virtual hosting infrastructure)
    >> >>           * MySql DBA's supporting web hosted env's, wordpress
    >> >>             environments
    >> >>
    >> >>         Resources like this have already come forward to the IWG
    >> >>         list, we can start with these people.  IMHO a target team
    >> >>         of 6 people should be the goal (3 dba, 3 sysadmin)
    >> >>
    >> >>         MySql
    >> >>         http://groups.google.com/group/internet_working_group/browse_thread/thread/4bde061a2adacee6/ede85b8a1812c3cc?lnk=gst&q=MySQL#ede85b8a1812c3cc
    >> >>
    >> >>         Linux Administration
    >> >>         http://groups.google.com/group/internet_working_group/browse_thread/thread/694bde580564c681/f4afcbc78aa06aa3?lnk=gst&q=Linux+Administration#f4afcbc78aa06aa3
    >> >>
    >> >>         This call for volunteers will be the creation of a team
    >> >>         that will be dedicated to the infrastructure of the
    >> >>         wordpress sites, the DB infrastructure supporting them, and
    >> >>         the apache / php / wordpress install and configuration over
    >> >>         your dev/test/release environments moving forward.  The
    >> >>         folks that will be coming forward from online will need to
    >> >>         be included in communication, brought into the planning,
    >> >>         and then included in communications as a team moving
    >> >>         forward.
    >> >>
    >> >>         If you want to start the ball rolling on this, let me know
    >> >>         who the contacts are for the "on the ground" requirements
    >> >>         and we can get going asap.
    >> >>
    >> >>         IMHO the actual MySQL DBs might have to be on physical
    >> >>         hardware if the IO we are seeing on the VM's shared
    >> >>         backplane is the bottleneck... or just reside in a MySQL DB
    >> >>         farm.  That will have to be evaluated with iostat output as
    >> >>         system access is regained and the cause is isolated (a few
    >> >>         commands for that are sketched after the pkill example
    >> >>         below).  It might simply be memory related: disk IO
    >> >>         pressure as paging/swap attempted to scale for the demand
    >> >>         on resources.  If we know the right process names, ssh
    >> >>         pkill statements can be sent to try and free up the system
    >> >>         as well:
    >> >>
    >> >>         ssh username@hostfqdn 'pkill httpd'
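    >> >>
    >> >>         And a first pass at the iostat / memory check over ssh
    >> >>         (iostat needs the sysstat package installed; the hostname
    >> >>         is a placeholder):
    >> >>
    >> >>         ssh username@hostfqdn 'iostat -x 5 3'   # per-device util / await, 3 samples
    >> >>         ssh username@hostfqdn 'vmstat 5 3'      # si/so columns show swap pressure
    >> >>         ssh username@hostfqdn 'free -m'         # real usage vs page cache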
    >> >>
    >> >>         Todd
    >> >>
    >> >>         On 10/19/2011 6:33 AM, Tom Gillis wrote:
    >> >>>         And I feel like "scalable wordpress deployment" is a
    >> >>>         little bit of an oxymoron - but:
    >> >>>         good news - we have the nycga 2.0 site up, and the
    >> >>>         functionality is all working as expected.
    >> >>>
    >> >>>         bad news - we needed to rush deployment so that working
    >> >>>         groups could start using new features, but wordpress is
    >> >>>         killing the cpu / memory on the server (a 16gb virtual
    >> >>>         box) and we know that a single server hosting setup is
    >> >>>         not going to be viable.
    >> >>>
    >> >>>         caching doesn't help us much since most of the content is
    >> >>>         dynamic and near-real-time - it's wordpress with
    >> >>>         buddypress on top so there's tons of forums and
    >> >>>         social-networky activity feeds.
    >> >>>
    >> >>>         what we need:
    >> >>>         1 - move mysql to its own server, and set up master /
    >> >>>             slave replication (2 virtual servers) - sketch below
    >> >>>         2 - set up a shared file hosting server for user-uploaded
    >> >>>             images - nfs mounts to a single box (1 box) - sketch
    >> >>>             below
    >> >>>         3 - set up load-balanced web frontends with sticky
    >> >>>             sessions (4 virtual boxes probably) - sketch below
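    >> >>>
    >> >>>         For 1, a minimal master/slave sketch - the host, repl
    >> >>>         user/password, and log file/position are placeholders (the
    >> >>>         real file/position come from SHOW MASTER STATUS), and the
    >> >>>         slave should be seeded from a mysqldump of the master
    >> >>>         first:
    >> >>>
    >> >>>         # on the master, in my.cnf under [mysqld], then restart mysqld:
    >> >>>         #   server-id = 1
    >> >>>         #   log_bin   = mysql-bin
    >> >>>         mysql -u root -p -e "CREATE USER 'repl'@'%' IDENTIFIED BY 'changeme';
    >> >>>                              GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';"
    >> >>>         mysql -u root -p -e "SHOW MASTER STATUS;"   # note File and Position
    >> >>>
    >> >>>         # on the slave (server-id = 2 in my.cnf), point it at the master:
    >> >>>         mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='db-master',
    >> >>>                              MASTER_USER='repl', MASTER_PASSWORD='changeme',
    >> >>>                              MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;
    >> >>>                              START SLAVE;"
    >> >>>         mysql -u root -p -e "SHOW SLAVE STATUS\G"   # Slave_IO/SQL_Running should say Yes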
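    >> >>>
    >> >>>         For 2, the nfs share for uploads would look something like
    >> >>>         this (hostnames, subnet, and paths are assumptions;
    >> >>>         wp-content/uploads is the usual wordpress upload dir):
    >> >>>
    >> >>>         # on the file box:
    >> >>>         apt-get install -y nfs-kernel-server
    >> >>>         mkdir -p /srv/uploads
    >> >>>         echo '/srv/uploads 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
    >> >>>         exportfs -ra
    >> >>>
    >> >>>         # on each web frontend:
    >> >>>         apt-get install -y nfs-common
    >> >>>         mount -t nfs filebox:/srv/uploads /var/www/nycga/wp-content/uploads
    >> >>>         # (plus an /etc/fstab entry so it survives a reboot)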
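    >> >>>
    >> >>>         For 3, sticky sessions can come from nginx's ip_hash
    >> >>>         balancing on a front box (simplest option; the backend IPs
    >> >>>         below are placeholders):
    >> >>>
    >> >>>         cat > /etc/nginx/conf.d/lb.conf <<'EOF'
    >> >>>         upstream wp_frontends {
    >> >>>             ip_hash;                 # same client ip -> same backend
    >> >>>             server 10.0.0.11:80;
    >> >>>             server 10.0.0.12:80;
    >> >>>         }
    >> >>>         server {
    >> >>>             listen 80;
    >> >>>             location / {
    >> >>>                 proxy_pass http://wp_frontends;
    >> >>>                 proxy_set_header Host $host;
    >> >>>                 proxy_set_header X-Forwarded-For $remote_addr;
    >> >>>             }
    >> >>>         }
    >> >>>         EOF
    >> >>>         nginx -t && service nginx reload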
    >> >>>
    >> >>>         I'm hoping to find a few people who will volunteer to work
    >> >>>         with the internet group, either in nyc or remotely, over
    >> >>>         the next 72 hrs to make a push to get this infrastructure
    >> >>>         in place.  in parallel, we'll be making code optimizations
    >> >>>         to the site (lots of low-hanging fruit here, like
    >> >>>         minifying js and css).  i'm hoping to find somebody who
    >> >>>         can set up one aspect of the infrastructure; I'll hook
    >> >>>         them up with a cloned version of the production server,
    >> >>>         which they can modify to fulfill one of these other roles
    >> >>>         - then we can deploy that back into the main
    >> >>>         infrastructure.  I'm probably going to be asleep until
    >> >>>         around 1pm nyc time, but I'm hoping to have some
    >> >>>         volunteers by the time I come back. And right now we
    >> >>>         really need people who can free up most of their time for
    >> >>>         the rest of the week on this (we're literally working
    >> >>>         around the clock in nyc so you'll have people to
    >> >>>         coordinate with no matter where / when you're available).
    >> >>>
    >> >>>         Any takers?
    >> >>
    >> >
    >> >
    >>
    >




