Wednesday, November 24, 2010

Building the Perfect SharePoint Farm

SharePoint Server Roles: Web Front End, Service Application, SQL Server (database)

Single Server Farm (simplest, everything on one server, only for dev or similar use, can run on Win7) – he uses it on his laptop due to resource constraints. Can mount the ISO on your machine, choose the defaults & it's fine (just for dev).

Small Farm – a little more robust – break out SQL Server; WFE & Svc App layer run on the same box. Recommended for small workgroups.

Medium Farm – break out SQL Server; WFE & Svc App layer run on individual servers; can handle large amounts of users & content, but no redundancy. Option 2: WFE & Svc App layer running on the same servers (but two of them) – can handle larger amts of users, but the App layer impacts perf (would need to crawl overnight) – gives some redundancy – 2 WFEs can handle up to 100K users.

["SQL server is the water for our fish"]

Large Redundant Farm: Multiple WFEs, multiple svc app layers, clustered (highly available) SQL Server (should find someone who really knows how to do it right), no single point of failure.

The “Perfect” SharePoint Farm? How can one, or even four, farm architectures satisfy every need? Need to define perfection – different for every implementation. Issues to determine biz reqts for: Capability, performance, reliability, scalability, user experience (ie like Tron, fight for the users – mobile? [must turn on basic auth to access via Android]), security, cost (will help determine what they can get).

Capability – what are we trying to accomplish? Doc Mgt (need to consider search & versioning – estimate # of docs x avg size x # of versions + 25%), Records Mgt, Web content Mgt (.com site – most are relatively small – usually 10-20G – but then reliability is ultra important), Social collab, BI platform, enterprise messaging (intranet portal).
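The doc-management sizing rule above (docs × avg size × versions, plus 25%) is easy to sanity-check in code. A minimal sketch; the input numbers below are made up for illustration:

```python
# Rough document-storage estimate from the session's rule of thumb:
# num_docs x avg size x versions, plus 25% overhead.
def doc_storage_gb(num_docs, avg_size_mb, avg_versions, overhead=0.25):
    """Estimated content storage in GB."""
    total_mb = num_docs * avg_size_mb * avg_versions
    return total_mb * (1 + overhead) / 1024

# e.g. 100,000 docs averaging 2 MB with 5 versions each:
print(round(doc_storage_gb(100_000, 2, 5)))  # -> 1221 GB, call it ~1.2 TB
```

Versions dominate the estimate quickly, which is why the talk pairs this with versioning policy.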

Performance – how fast does it need to be? Is there an established SLA, and should there be? Page load vs transaction time (what can we directly control). Avg user load (estimate total users x % concurrent (actively clicking) users / time per request = requests per second – can get 50 from one WFE).
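The user-load formula above translates directly into a WFE count. A hedged sketch using the ~50 requests/sec per WFE figure quoted in the session; the example workload numbers are assumptions:

```python
import math

# Talk's formula: total users x % concurrent / seconds per request = RPS.
# One WFE is quoted at ~50 RPS; server count follows from that.
def wfe_count(total_users, pct_concurrent, secs_per_request, rps_per_wfe=50):
    rps = total_users * pct_concurrent / secs_per_request
    return rps, math.ceil(rps / rps_per_wfe)

rps, servers = wfe_count(10_000, 0.10, 10)  # 10k users, 10% active, 1 req/10s
print(rps, servers)  # 100.0 RPS -> 2 WFEs
```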

Reliability: (are you sure you can afford it?) Expensive. No really… it’s expensive (to achieve 99.999%) – hardware costs more than double, plus support staff, and staff competency must be higher. Challenge early and often (vs cost). Most co’s need a weekly maintenance window (five 9′s allows only ~6 sec of downtime per week; 99% is still under 2 hrs per wk for maint – could be Sat 2am – might mean not requiring a SQL cluster – but probably not OK in the middle of EOM close).
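The downtime figures in the reliability point check out arithmetically; a quick sketch of the math behind "five 9's = ~6 sec/week":

```python
# Downtime allowed per week at a given availability target, to sanity-check
# an SLA against a weekly maintenance window.
SECS_PER_WEEK = 7 * 24 * 3600  # 604,800

def weekly_downtime_secs(availability):
    return SECS_PER_WEEK * (1 - availability)

print(round(weekly_downtime_secs(0.99999), 1))       # ~6.0 s  (five nines)
print(round(weekly_downtime_secs(0.99) / 3600, 2))   # ~1.68 hrs (99%)
```

This is why the talk suggests challenging a five-nines requirement early: the gap between 99% and 99.999% is the gap between a Saturday-2am window and none at all.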

Scalability – What is my growth curve? Establish metrics on your farm & stay on top of them (eg how are versions really working – use reports to monitor svc constantly). Know scaling options: server size/numbers/role – WFE (users), Svc app (usage), SQL (size/use of content). Should be part of initial plan – if x grows to x, here’s what we’ll need.

User Experience (the mythical user): Needs & wants; Goals, motivations & triggers; obstacles/limitations; tasks, activities & behaviors; geography/language; envir/gear, work life/experience.

Security (info wants to be free!): Easiest would be to give everyone admin access, but obv can’t. Encrypt data (even internal communication – most info theft happens inside the org – see diagram on his slide – sometimes data must be encrypted both in transit & at rest; maybe use custom code to mask b4 & after storage), use SSL (but offload certs to TMG [Threat Mgt Gateway]/F5 etc), segment SP on a dedicated VLAN, use reverse proxy servers (TMG), use dedicated admin accounts (if I am admin of the boxes, do not use my normal user acct – each admin should have their own admin acct – more of a policy). Co might decide to accept a certain amt of risk (vs cost). The point of vulnerability is always the SP admins.

Cost – plan for now & future. Know licensing options (Foundation free, Std vs Enterprise, Internet sites). Don’t forget to include all other server costs (Windows/SQL ($12,500 per processor)/TMG/etc). Hardware costs (servers, routers). When you are building SP, you’re building a home; pour foundation first & never again – need someone who really knows what they’re doing. Changing org structure etc is like knocking down & rearranging walls, can be done tho difficult. Rearranging furniture is easy (things users can do – build lists).

Secret sauce (how do we put it all together): Planning. Roles (WFE server – very sim to MOSS, primary scale component is # of users connecting, 8GB RAM, 4 “cores”, 80GB HD (his rec); SQL Server – clustered or other HA tech, can have multiple instances to share load, perf critical to farm, 16-64G RAM, 8+ cores, 80G HD + data [on C drive]).

SQL considerations: alias your server (SQL alias or DNS [usually OK]). Size of content DB determines min RAM (medium (<4TB content) -> 32GB RAM, large -> 64GB RAM). Disk perf critical (tune SAN for optimal perf by use – RAID 10 best (faster but eats more disk space), RAID 5 OK – data/log/tempdb all on separate LUNs (logical unit numbers) & logical disks – ensure the SAN NIC isn’t saturated).
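The RAM guidance above is just a threshold lookup; a minimal sketch of the rule of thumb as quoted in the notes (these are the session's figures, not official Microsoft limits):

```python
# Minimum SQL Server RAM by content size, per the session's guidance:
# medium = under 4 TB of content -> 32 GB; larger -> 64 GB.
def min_sql_ram_gb(content_tb):
    if content_tb < 4:
        return 32
    return 64

print(min_sql_ram_gb(2))   # 32
print(min_sql_ram_gb(10))  # 64
```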

Service App Servers: automatic scale & load balance; can partition services to specific servers. Search is very scalable, but very complex. Crawl servers: can have multiple servers crawling data; query traffic goes directly to query servers, not through crawl servers. Query servers: can create multiple index partitions, ~10 million items upper limit per partition; figure about 5% for query, about 25% for search; can have active/passive failover query partitions (duplicate a partition on another box to split the load & cut query time in half – see slide diagram – significantly more complex; FAST is a different product). Query (index) size estimated at 10% of crawled content (usually smaller).
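The partition-count and index-size figures above combine into a simple estimate. A sketch based on the notes' rules of thumb (~10 million items per query partition, index ≈ 10% of crawled content); the example corpus is hypothetical:

```python
import math

# Search index sizing sketch: ~10M items per query partition,
# index size roughly 10% of crawled content (usually smaller).
def search_sizing(total_items, crawled_content_gb,
                  items_per_partition=10_000_000):
    partitions = math.ceil(total_items / items_per_partition)
    index_gb = crawled_content_gb * 0.10  # rule-of-thumb upper estimate
    return partitions, index_gb

parts, idx = search_sizing(25_000_000, 2000)
print(parts, idx)  # 3 partitions, ~200 GB of index
```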

Virtualization – Problem or Solution? He says we can virtualize everything (WFE/Svc App; SQL maybe – you can, but most ppl don’t). Allows us to dynamically manage the farm as reqts dictate (so resources aren’t wasted & can be used elsewhere). Snapshots are your friend (always snapshot b4 a hotfix).

High Availability, DR, Virtualization: Plan for emergencies (plan for the worst, hope for the best) – Recovery Time Objective & Recovery Point Objective (about 2x size of log shipment [??]). Prevent single points of failure. Consider virtual DR. Also plan for the effort to recover from a failover (how much work is involved).

Can the Cloud save us? Transfers risk to cloud provider (servers/network/patches/backup) – you make sure my servers are running, etc. Reduces “dark” hardware costs. Can be cheaper (shared), or more expensive (private).

Define, then Design (server roles, server size/#, virtualized components, availability, DR, test farm – “everyone has a test farm, some have production farm”)

Develop: Stick to plan, ensure all pre-reqs and patches are installed, script install, snapshot everything b4 & after, take lots of screen shots.