I can be available tomorrow in the city, not today though.
Why not stay with Rackspace? From reviewing the Panix website, I was not
bowled over with confidence at its reliability. I'm extremely happy with
Rackspace technical support. If we can afford to stay with them, I'd do
it. Depending on the traffic we hit, their transfer rates are pretty reasonable.
On Wed, Oct 19, 2011 at 2:33 PM, Tom Gillis <email@example.com> wrote:
*Short summary - site is on Rackspace Cloud, not Panix, for now.
*Let's try to get interested sysadmin folks in a room or skype tonight.
Chaz / Kevin - what's your real-time availability in NYC today? We'd
like to have another work session - Dan R and Jake also have root on
the server and I'd like to get us all in a room or at least a skype to
coordinate (I won't be available onsite until tonight).
BTW - one other thing we need to come to a consensus on is our
long-term hosting provider. The ppl at the work session decided last
night (around 2 AM), after the Panix hosting server that I had
access on became unresponsive (looking into the cause of this) and
none of us had the account creds to restart the server or provision
another one, to move the site to a rackspace cloud account (cloned
from a legacy machine that I have set up as a LAMP box for freelance
projects). Not the best setup, but the best we could do at the last
minute, with calling off the site launch (for the 2nd time in a week)
not an option (the staging site url was already starting to leak
to the general public, and the data / user migration was only going
to get more complicated as time went by).
ANYWAY - folks on the ground can sync up with Jake or Dan to get
access to the box, or we can do it over skype later (I'm trying to
stick to not sending credentials over the internet).
There may be some duplication of effort here - getting a deployment
going that will get us thru the next few weeks, and then coming up
with a better long term solution.
On Wed, Oct 19, 2011 at 1:13 PM, Kevin <firstname.lastname@example.org> wrote:
> lighttpd/Nginx or any async server is exactly where I was going
> ... Having them use fam/gamin will help with file stats if we find I/O
> is a problem
> Sorry for short mails but I'm mobile only for a while
> DON'T PANIC
> On Oct 19, 2011 12:05 PM, "Sam Boyer" <email@example.com> wrote:
>> i wish i could volunteer to hit this round the clock, but i just
>> can't at the moment :( :(
>> some thoughts on scaling this thing. first - we don't know squat until
>> we get into the box. second, once we do, installing some monitoring
>> tools - e.g. cacti - should be high priority; otherwise, we're just
>> gonna be flailing around in the dark. nagios is fine, but that'll give us
>> availability monitoring, not usage logs. alternatively/additionally we could
>> look at paying for monitoring from a service like new relic (which might be
>> better if only because it means less that we have to maintain ourselves,
>> at least at first). beyond that:
>> - get xhprof onto a prod clone somewhere so we can actually look at
>> what's taking up the processing time. beyond low-hanging fruit,
>> that's gonna take some expertise to actually make a dent with.
>> - big duh, but we've got an opcode cache running...right? the site's
>> too responsive right now for this NOT to be the case.
>> - getting mysql onto baremetal, or rackspace cloud (though that would
>> mean moving everything to rackspace, and i've already heard security
>> concerns about that), should probably be a priority. heavy db io
>> through a virt layer...meh.
>> - due to the 1s-minimum granularity issue the mysql slow query log is
>> almost a too-late-to-be-useful thing (unless the percona people
>> got that patch in to add ms granularity in mainline...but i doubt it),
>> but we do need to run it, as it'll give us a hit list of queries for
>> optimization and/or caching.
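A minimal my.cnf fragment for turning the slow query log on - a sketch only; the file path is an assumption, and the variable names below are the MySQL 5.1+ spelling (5.0 used `log-slow-queries` instead):

```ini
[mysqld]
# log anything slower than the threshold; 1s is the floor on stock builds
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1
# also catch queries doing full scans
log_queries_not_using_indexes = 1
```

mysqldumpslow (ships with MySQL) can then aggregate the log into a ranked hit list of queries.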
>> - as kevin mentioned on the other thread, fcgi; and if we do that,
>> really, no reason not to switch to nginx. i don't know what our
>> static asset volume looks like so i don't know how much we'd be getting back,
>> but really, there's no reason to be serving static assets with bloaty
>> apache workers.
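A sketch of what that nginx split could look like - docroot, port, and fastcgi address are all assumptions, not the site's actual config:

```nginx
server {
    listen 80;
    root   /var/www/site;   # assumed docroot

    # static assets straight off disk - no php worker involved
    location ~* \.(js|css|png|jpe?g|gif|ico)$ {
        expires    7d;
        access_log off;
    }

    # everything else through wordpress's front controller
    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    # hand php off to a fastcgi pool (php-fpm or spawn-fcgi)
    location ~ \.php$ {
        include        fastcgi_params;
        fastcgi_param  SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass   127.0.0.1:9000;
    }
}
```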
>> - ordinarily, for a drupal site of this type, i'd advocate ESI. i have
>> no idea how well WP supports content chunking like that (and
>> good ESI strategies take a *while* to craft), but at the very least,
>> internal data caching could help with query volume (e.g., cache the
>> output of the query that generates the global activity feed for a minute or
>> so). again, though, i don't know how easy that is to layer into WP,
>> and the more custom we get, the more difficult it's gonna be to maintain.
>> like i said, though, until we actually *know* where the bottlenecks are,
>> we can't address them. also, somewhere in this thread i remember
>> someone setting up the expectation that we might just need more hardware.
>> dear god, i hope not. if that's the case, we could blow the war
>> chest that's been accumulated thus far for liberty plaza (~$230,000 i
>> read somewhere) and still only be able to support several hundred
>> concurrent users. that needs to be brought *down*.
>> On 10/19/11 8:47 AM, Kevin wrote:
>> > Agree nagios for the win, we should get logwatch going as well
>> > Rackspace cloud machines guarantee proc and allow bursting if the
>> > host has spare capacity.
>> > Once we have more than one machine we need to think about config
>> > mgmt...i would suggest puppet. We could use blueprint to capture the
>> > current machine and generate puppet files.
>> > https://github.com/devstructure/blueprint
>> > DON'T PANIC
>> > On Oct 19, 2011 11:32 AM, "Chaz Cheadle" <firstname.lastname@example.org> wrote:
>> > I'd like to suggest zenoss/nagios for monitoring.
>> > As for hardware configurations, I'd say we definitely want
>> > physical/dedicated DB servers with cloud webhosting. Unless we're on
>> > Rackspace or Linode, it may be hard to ensure we'll get the
>> > processor or I/O we need from a vps.
>> > If we have one server now, we can start serving the whole site
>> > from there, then purchase cloud webservers to lighten the load,
>> > then add mysql replication in later if the DB reads start running
>> > high. Unless we're doing heavy editing on the site, one DB server
>> > now should handle all of the read requests.
>> > With Zen/nagios we will be able to monitor the server and make
>> > decisions on expansion. Let's figure out the resource issues
>> > now with WP before jumping to cloud web hosts and MySql replication.
>> > What is the current panix host package we're on?
>> > chaz
>> > On Wed, Oct 19, 2011 at 11:14 AM, Todd Grayson <email@example.com> wrote:
>> > Adding Chaz and Kevin,
>> > Guys, once consensus can be reached with dev leads, folks can be
>> > ID'd and get started on the "how to move forward" as a
>> > team. Tom needs additional eyeballs and hands covering the
>> > deploy as well as ongoing release engineering. That's
>> > going to become a bigger deal as work continues. Let's
>> > get a working plan together that approaches the list in
>> > a way that resource on-boarding is clean and effective.
>> > Todd
>> > On 10/19/2011 8:06 AM, Todd Grayson wrote:
>> >> OK:
>> >> As a conversation before going to the list, I'm looking
>> >> to you folks to establish consensus on what is going to happen
>> >> next. Please identify WHO should be included in this
>> >> conversation who is not currently a part of it. Once consensus is in
>> >> place we can go to the lists for specific volunteers. To make
>> >> this efficient and quick, the team in NYC should have prepared
>> >> the following items for the folks coming forward:
>> >> * Development leads who are overseeing configuration for the
>> >> current wordpress deploy and able to answer questions
>> >> o available for q&a and facilitating access to systems,
>> >> etc. when needed
>> >> * ID who the contacts are for the panix hosting
>> >> services; a conference call with them to talk about
>> >> what is being seen now and what we feel will be needed as we
>> >> reach capacity should be scheduled ASAP
>> >> * Is there any way to get current perf statistics on how
>> >> it's running now, where it's at?
>> >> The call for specific volunteers will be based on the fact that we
>> >> need a team of folks to help out with systems and DB
>> >> administration tasks as well as performance tuning and
>> >> capacity planning. This will give the working group
>> >> depth and allow for a more continuous support model, as one
>> >> worker will only have limited hours in a day to contribute,
>> >> whereas a team model can support sustained activity over a longer
>> >> period of time.
>> >> Here is what is needed from the current and previous
>> >> lists as well as contacts on the ground once they are
>> >> identified:
>> >> * Technical Project Manager
>> >> * Linux systems administrators with web hosting experience
>> >> (and virtual hosting infrastructure)
>> >> * MySql DBAs supporting web hosted envs and wordpress
>> >> environments
>> >> Resources like this have already come forward to the lists, so
>> >> we can start with these people. IMHO a target team of 6
>> >> people should be the goal (3 dba, 3 sysadmin):
>> >> MySql
>> >> Linux Administration
>> >> This call for volunteers will be the creation of a team that
>> >> will be dedicated to the infrastructure of the wordpress
>> >> sites, the DB infrastructure supporting them, and the
>> >> php / wordpress install and configuration over your
>> >> dev/test/release environments moving forward. The people who
>> >> will be coming forward from online will need to be brought into
>> >> communication, brought into the planning, and then kept in
>> >> communications as a team moving forward.
>> >> If you want to start the ball rolling on this, let me know who
>> >> the contacts are from the "on the ground" requirements so we
>> >> can get going asap.
>> >> IMHO the actual MySql DBs might have to be on physical
>> >> hardware if the IO we are seeing on the VM's shared storage
>> >> is the bottleneck.... or just reside on a separate MySQL DB
>> >> server. This will have to be evaluated with iostat output as system
>> >> access is regained and the cause is isolated. It might just
>> >> be memory related; disk IO pressure as paging/swap tries
>> >> to scale for the demand of resources. If we know the
>> >> process names, ssh pkill statements can be sent to try to free
>> >> up the system as well:
>> >> ssh username@hostfqdn 'pkill httpd'
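A slightly broader version of the same idea - a sketch with the hostname as a placeholder - grabs a snapshot of what is eating the box before killing anything:

```shell
# one ssh round trip: load, memory, and the top memory consumers
ssh username@hostfqdn '
  uptime
  free -m
  ps aux --sort=-%mem | head -n 10
'
```

If iostat is needed for the disk IO question, it comes from the sysstat package, which may have to be installed first.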
>> >> Todd
>> >> On 10/19/2011 6:33 AM, Tom Gillis wrote:
>> >>> And I feel like "scalable wordpress deployment" is a bit of an
>> >>> oxymoron - but:
>> >>> good news - we have the nycga 2.0 site up, and the
>> >>> functionality is all working as expected.
>> >>> bad news - we needed to rush deployment so that folks could
>> >>> start using new features, but wordpress is killing the cpu /
>> >>> memory on the server (a 16gb virtual box) and we know that a
>> >>> single server hosting setup is not going to be viable.
>> >>> caching doesn't help us much since most of the content is
>> >>> dynamic and near-real-time - it's wordpress with buddypress on
>> >>> top, so there's tons of forums and social-networky activity feeds.
>> >>> what we need:
>> >>> 1 - move mysql to its own server, and set up master / slave
>> >>> replication (2 virtual servers)
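A minimal sketch of the my.cnf pieces for step 1 - server IDs and log names are assumptions, and the slave additionally needs a replication user plus a CHANGE MASTER TO statement:

```ini
# master (takes all writes)
[mysqld]
server-id = 1
log-bin   = mysql-bin

# slave (read replica) - in its own my.cnf
[mysqld]
server-id = 2
relay-log = mysql-relay-bin
read_only = 1
```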
>> >>> 2 - set up a shared file hosting server for user-uploaded
>> >>> images - nfs
>> >>> mounts to a single box (1 box)
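For step 2, the export/mount pair might look like this - the paths, hostname, and subnet are all placeholders, with the mount point set to where wordpress keeps user uploads:

```ini
# file server, /etc/exports  (web frontends' subnet is an assumption)
/srv/uploads  10.0.0.0/24(rw,sync,no_subtree_check)

# each web frontend, /etc/fstab
files01:/srv/uploads  /var/www/site/wp-content/uploads  nfs  rw,hard,intr  0 0
```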
>> >>> 3 - setting up load-balanced web frontends with sticky
>> >>> sessions (4
>> >>> virtual boxes probably)
>> >>> I'm hoping to find a few people who will volunteer to work
>> >>> with the internet group, either in nyc or remotely, over the
>> >>> coming hrs to make a push to get this infrastructure in
>> >>> place. in parallel we'll be making code optimizations to the
>> >>> site (lots of low-hanging fruit here, like minifying js and
>> >>> css). i'm hoping to find somebody who can set up one aspect
>> >>> of the infrastructure, and I'll set them up with a cloned
>> >>> version of the production server, which they can modify to
>> >>> fulfill one of these other roles - then we can deploy it back
>> >>> into the main infrastructure. I'm probably going to be
>> >>> offline until around 1pm nyc time, but I'm hoping to have
>> >>> some responses by the time I come back. And right now we
>> >>> really need people who can free up most of their time for the
>> >>> rest of the week on this (we're literally working around the
>> >>> clock in nyc, so you'll have people to coordinate with no
>> >>> matter where / when you're available).
>> >>> Any takers?