i wish i could volunteer to hit this round the clock, but i just can't
at the moment :( :(
some thoughts on scaling this thing. first - we don't know squat until
we get into the box. second, once we do, installing some monitoring
tools - e.g. cacti - should be high priority, otherwise, we're just
gonna be flailing around in the dark. nagios is fine, but that'll get us
monitoring, not usage logs. alternatively/additionally we could look at
paying for monitoring from a service like new relic (which might be
better if only because it means less that we have to maintain ourselves,
at least at first). beyond that:
- get xhprof onto a prod clone somewhere so we can actually look at
what's taking up the processing time. beyond low-hanging fruit, though,
that's gonna take some expertise to actually make a dent with.
- big duh, but we've got an opcode cache running...right? the site seems
too responsive right now for this NOT to be the case.
- getting mysql onto baremetal, or rackspace cloud (though that would
mean moving everything to rackspace, and i've already heard security
concerns about that), should probably be a priority. heavy db io through
- due to the 1s-minimum granularity issue the mysql slow query log is
almost a too-late-to-be-useful thing (unless the percona people FINALLY
got that patch in to add ms granularity in mainline...but i doubt it),
but we do need to run it, as it'll give us a hit list of queries for
optimization and/or caching.
- as kevin mentioned on the other thread, fcgi; and if we do that,
really, no reason not to switch to nginx. i don't know what our request
volume looks like so i don't know how much we'd be getting back there,
but really, there's no reason to be serving static assets with bloaty
- ordinarily, for a drupal site of this type, i'd advocate ESI. i have
no idea how well WP supports content chunking like that (and truth is,
good ESI strategies take a *while* to craft), but at the very least some
internal data caching could help with query volume (e.g., cache the
output of the query that generates the global activity feed for 30s or
so). again, though, i don't know how easy that is to layer in with WP,
and the more custom we get, the more difficult it's gonna be to maintain.
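to make the "cache it for 30s" idea concrete, here's a minimal sketch in python rather than WP/php, since i don't know what hooks WP gives us. everything named here (`TTLCache`, `fetch_activity_feed`, the "feed" key) is made up for illustration, and an in-process dict like this would not be shared between php processes; in practice the store would have to be something shared like memcached or APC. the only point is the shape: run the expensive query at most once per TTL window and serve everyone else the stored copy.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Tiny in-process cache: recompute a value only once it is older than ttl seconds."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # still fresh: skip the expensive query
        value = compute()              # missing or stale: run the query once
        self._store[key] = (now, value)
        return value


# hypothetical stand-in for the query behind the global activity feed;
# query_log just records how many times the "database" was actually hit
query_log: List[int] = []


def fetch_activity_feed() -> List[str]:
    query_log.append(1)
    return ["activity item (query run #%d)" % len(query_log)]


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30.0)
    for _ in range(1000):              # 1000 page views inside one 30s window
        cache.get_or_compute("feed", fetch_activity_feed)
    print("queries actually run:", len(query_log))
```

the tradeoff is that the feed can be up to 30s stale, which is probably fine for a global activity stream but would need agreement from whoever owns the near-real-time features.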
like i said, though, until we actually *know* where the problem(s) are,
we can't address them. also, somewhere in this thread i remember seeing
someone set up for the expectation that we might just need ~500MB/proc.
dear god, i hope not. if that's the case, we could blow the entire war
chest that's been accumulated thus far for liberty plaza (~$230,000 i
read somewhere) and still only be able to support several hundred
concurrent users. that needs to be brought *down*.
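rough arithmetic behind that worry, as a sketch. assumptions (mine, not measured): one php process per concurrent request, all of the box's ram available to php (it isn't; the OS and mysql want some too), and a 16gb box like the one mentioned in tom's original mail. the 500-user target and the smaller per-process figures are just illustrative inputs. no dollar figures here since i don't know what a box costs us.

```python
import math


def procs_per_box(box_ram_mb: int, mb_per_proc: int) -> int:
    """How many processes of a given size fit in one box (ignores OS/db overhead)."""
    return box_ram_mb // mb_per_proc


def boxes_needed(concurrent_users: int, box_ram_mb: int, mb_per_proc: int) -> int:
    """Boxes required if each concurrent user ties up one process."""
    return math.ceil(concurrent_users / procs_per_box(box_ram_mb, mb_per_proc))


BOX_RAM_MB = 16 * 1024   # 16gb box
TARGET_USERS = 500       # illustrative target, not a measured number

if __name__ == "__main__":
    for mb in (500, 100, 50):
        print("%4d MB/proc -> %3d procs per box, %2d boxes for %d users" % (
            mb,
            procs_per_box(BOX_RAM_MB, mb),
            boxes_needed(TARGET_USERS, BOX_RAM_MB, mb),
            TARGET_USERS,
        ))
```

at 500MB/proc a 16gb box tops out around 32 processes, so the per-process footprint is the number that decides how much hardware we'd need.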
On 10/19/11 8:47 AM, Kevin wrote:
Agree nagios for the win, we should get logwatch going as well
Rackspace cloud machines guarantee proc and allow bursting if available.
Once we have more than one machine we need to think about config
mgmt...i would suggest puppet. We could use blueprint to analyze the
current machine and generate puppet files.
On Oct 19, 2011 11:32 AM, "Chaz Cheadle" <email@example.com> wrote:
I'd like to suggest zenoss/nagios for monitoring.
As for hardware configurations, I'd say we definitely should have
physical/dedicated DB servers with cloud webhosting. Unless we're on
Rackspace or Linode, it may be hard to ensure we'll get the needed
processor or I/O from a vps.
If we have one server now, we can start serving the whole thing
from there, then purchase cloud webservers to lighten the webload
then add mysql replication in later if the DB reads start getting
high. Unless we're doing heavy editing on the site one DB server for
now should handle all of the read requests.
With Zen/nagios we will be able to monitor the server and make
decisions on expansion. Let's figure out the resource issue we have
now with WP before jumping to cloud web hosts and MySql replication.
What is the current panix host package we're on?
On Wed, Oct 19, 2011 at 11:14 AM, Todd Grayson <firstname.lastname@example.org> wrote:
Adding Chaz and Kevin,
guys once consensus can be reached with dev leads, folks can be
ID'd and get started on the "how to move forward" as a working
team? Tom needs additional eyeballs and hands covering production
deploy as well as ongoing release engineering. The subject is
going to become a bigger deal as work continues. Please review,
and let's get a working plan together that approaches the list in
a way that resource on-boarding is clean and effective?
On 10/19/2011 8:06 AM, Todd Grayson wrote:
As a conversation before going to the list, I'm reaching out
to you folks to establish consensus on what is going to happen
next. Please identify WHO should be included in this
conversation not currently a part of it. Once consensus is in
place we can go to the lists for specific volunteers. To make
this efficient and quick the team in NYC should have on hand
the following items for the folks coming forward:
* Development leads who are overseeing configuration for
current wordpress deploy and able to answer questions
o available for q&a and facilitating access to repo's
etc. when needed
* ID who the contacts are for the panix hosting
services; a conference call with them to talk through
what is being seen now and what we feel will be needed to
reach capacity should be scheduled ASAP
* Is there any way to get current perf statistics from where
it's running now?
The call for specific volunteers will be based on the fact we
need a team of folks to help out with systems and DB
administration tasks as well as performance tuning and
capacity planning. This will give the working technical team
depth and allow for a more continuous support model, as one
worker will only have limited hours in a day to contribute,
whereas a team model can support sustained activity over a
period of time.
Here is what is needed from the current and previous volunteer
lists as well as contacts on the ground once they are
* Technical Project Manager
* Linux systems administrators with web hosting backgrounds
(and virtual hosting infrastructure)
* MySql DBA's supporting web hosted env's, wordpress
Resources like this have already come forward to the IWG list,
we can start with these people. IMHO a target team of 6
people should be the goal (3 dba, 3 sysadmin)
This call for volunteers will be the creation of a team that
will be dedicated to the infrastructure of the wordpress
sites, the DB infrastructure supporting them, and the apache /
php / wordpress install and configuration over your
dev/test/release environments moving forward. The folks that
will be coming forward from online will need to be included in
communication, brought into the planning, and then included in
communications as a team moving forward.
If you want to start the ball rolling on this let me know who
the contacts are from the "on the ground" requirements and we
can get going asap.
IMHO the actual MySql DB's might have to be on physical
hardware if the IO we are seeing on the VM's shared backplane
is the bottleneck.... or just reside in a MySQL DB farm.. That
will have to be evaluated with iostat output as system access
is regained and the cause is isolated. It might just simply
be memory related; disk IO pressure as paging/swap attempted
to scale for the demand of resources. If we know the right
process names ssh pkill statements can be sent to try and free
up the system as well?
ssh username@hostfqdn 'pkill httpd'
On 10/19/2011 6:33 AM, Tom Gillis wrote:
And I feel like "scalable wordpress deployment" is a little
bit of an
oxymoron - but:
good news - we have the nycga 2.0 site up, and the
all working as expected.
bad news - we needed to rush deployment so that working groups
start using new features, but wordpress is killing the cpu /
the server (a 16gb virtual box) and we know that a single
hosting setup is not going to be viable.
caching doesn't help us much since most of the content is
near-real time - it's wordpress with buddypress on top so
of forums, and social-networky activity feeds.
what we need:
1 - move mysql to its own server, and set up master / slave
replication (2 virtual servers)
2 - set up a shared file hosting server for user-uploaded
images - nfs
mounts to a single box (1 box)
3 - setting up load-balanced web frontends with sticky
virtual boxes probably)
I'm hoping to find a few people who will volunteer to work
internet group, either in nyc or remotely, over the next 72
make a push to get this infrastructure in place. in parallel,
be making code optimizations to the site. (lots of
here, like minifying js and css). i'm hoping to find
can set up one aspect of the infrastructure and I'll hook them
a cloned version of the production server, which they can
fulfill one of these other roles - then we can deploy that
the main infrastructure. I'm probably going to be asleep
around 1pm nyc time, but I'm hoping to have some volunteers by
time I come back. And right now we really need people who can
most of their time for the rest of the week on this (we're
working around the clock in nyc so you'll have people to
with no matter where / when you're available)