CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Jeremy D. Miller -- The Shade Tree Developer

Under the hood and working with .Net, TDD, Software Design, and Agile Stuff

Database per Developer and Environment

Our Continuous Integration talk Monday got a bit rushed at the end and I missed an important point when we talked about CI with databases.  For the love of all that is good and holy in the world create a separate database/schema for:

  • Each development workstation
  • Build environment
  • Test environment 
  • A "Sheep" database for the Business Analysts (baaaa) or Product Manager to use for ongoing demos to the customer

Running automated tests against a database is a complete joke unless the database is in a completely known state.  There are some "hacky" alternatives, but the easiest alternative by far is a completely isolated database with only one user.  Think about going to a hospital and seeing them reuse needles.  A shared development database should make you recoil in horror in much the same way.

"Jeremy, everybody already knows that and it's been a best practice since before you were born."  I didn't know that a couple of years ago on my first CI/TDD project and I know there are people out there who haven't thought about this issue at all.  I've been a pair of projects now where we attempted to run automated testing against shared databases (in both cases it was organizational stupidity and bureaucracy in Fortune 100 internal IT departments).  It was excruciating.  An untrustworthy automated build is completely useless.  If your initial reaction to the build busting is "Oh, it's probably a database conflict.  I'll just force the build again" -- your investment in CI has been squandered.   

It's also more fuel on the fire for being able to test the majority of your code without the database being involved.  You'll also need the ability to automatically create a database structure from scratch inside the build to pull off the separate database scheme.

 



Comments

Sahil Malik said:

There is one issue though. What if the development database turns out to be 40 gigs? Then it just isn't practical enough to have it on each development workstation.

Then what?
# September 21, 2005 5:10 PM

Carlton said:

Just a question - why is the development database, i.e. the coder's sandbox, so large? I would think that a sandbox environment would be pretty small since you cannot generate 20 GB worth of data, especially if you are dropping it all the time.

Is the development database pulling data out of the production environment? If so, maybe that data should go to staging DB and not development DB (what I consider the integration\automated build environment)? IME, staging should mirror real world more than the developer's sandbox. The sandbox is for bashing the code the way you want it to and should be light.
# September 21, 2005 5:31 PM

Sahil Malik said:

Carlton,

There are indeed situations where a dev database could get so large.

There are two kinds of data, seed data, and non seed data. Seed data typically refers to as admin data, i.e. application setup data - a good example is US Zipcodes. Now in many cases you may not be able to restrict what day you may be able to exclude out of the dev database.

Of course, the right answer is "Just put the seed data only in a different shared db".

Well the issue is

a) Foreign keys
b) Transactions between dbs.
and
c) Well the last developer just didn't do it, nor was he good at maintaining data scrubbing scripts - and I just inherited this damn system, and I need to get this feature in .. by TOMMOROW.

Then what? :). Real world scenario - you are stuck with development databases that are 40 gigs. Now what?
# September 22, 2005 2:19 AM

DevPrime said:

There's another valid statement to go with the 40 GB database - writing good tests is something devs don't often do, even in a TDD environment. If you expect a large database or complex queries, then you are also testing performance to a large scale (or SHOULD be). Unacceptable performance is a FAILED test. You can't test for these things on a tiny subset of a database. Unfortunately, while TDD may lead developers to water, it can't make them drink - in other words, devs typically don't write good tests just because they're using TDD.
# September 22, 2005 11:09 AM

Carlton said:

So what I hear you are saying is you MUST test against a 40 GB database and you need to get it to work yesterday. Then my only advice is make it work in the environment it needs to and worry about the sandbox later.

I once worked in a situation like you described, massive foreign keys, transactions between DB, etc., etc., and I can tell you that it will be many months until you can tease apart all the databases and automate them. In the situation I observed, you will have to work hand-in-hand with the database group (IMO, things only get this bad when you add dedicated DB's in the mix) and the testers to create an independent test environment. No one will want to do it and management will not want to spend the resources to fix the broken infrastructure.

As for performance, I don't think performance should be a goal for TDD. Performance is an important goal, but it is not what a developer should be focused on in the initial TDD - we are writing unit tests in TDD, so by definition we are not talking to outside systems. So where does performance fit in in the mix? I would suggest at the integration level or even perhaps at the customer acceptance test level. Depends on who wants it.
# September 22, 2005 12:19 PM

Jeremy D. Miller said:

Sahil,

A.) You very likely *don't* need the entire 40GB database for testing

B.) In that exact situation, we just used an Oracle Synonym for the large static tables. That's always an option if the data is read-only.

"Database per developer" doesn't necessarily mandate a separate physical server, only database independence. You always have to compromise. The real goal is to have reliable database testing that doesn't give lots of false negatives because of corrupted database states from simultaneous testing runs.
# September 22, 2005 2:49 PM

Jeremy D. Miller said:

DevPrime-

Dude, I think you're more than a little off the mark on all accounts. When I'm testing that the intended functionality works I want fast tests. It's brain dead stupid to run *those* tests against a massive database. You cannot do TDD effectively without rapid feedback.

Performance tests are a completely different animal. Of course you run those on a realistic (or 2-3x larger) sized database. Keep this in mind, when you're making coding changes for performance optimization, don't you want to make sure you don't cause new functional defects? Fast unit, integration, and acceptance tests gives you an ability to safely make performance optimizations. All those reasons that we do TDD and CI to make a code base safer to change and evolve also make it safer to optimize.
# September 22, 2005 2:55 PM

Sahil Malik said:

Jeremy,

There are times when you *need* the 40 gigs dev. database for your app to even run :), much less test.

I know this may sound as a result of "Just scrub the data", but consider this (and this is a real world example out of my own development experience).

An insurance rating engine, which rates auto policies - it needs zipcode data, which is huge, it needs vin#, which is huge, and it needs location data, which is huge, plus other such stuff. You need those tables, in entirety to ensure your tests run.

What we did over there was, yes, the dev database was a shared db, admin (seed) data was in a seperate schema, and developers had their own schemas.

The problem is, that doesn't work with SQL Server. :), and that is a big pain in the arse. (You can't split data into databases, and expect multi db FKs, and Transactions to work properly).

Frankly I don't have a good solution to that much too common problem, so I was wondering if you did. "Have a small DB ain't the solution".

- SM
# September 24, 2005 2:16 AM

Jeremy D. Miller -- The Shade Tree Developer said:

Between being extremely short handed at work, tech' reviewing a new book, a
possible book proposal...
# August 7, 2006 4:50 PM

Jeremy D. Miller -- The Shade Tree Developer said:

Between being extremely short handed at work, tech' reviewing a new book, a possible book proposal

# September 1, 2006 2:32 PM

Carmelo Lisciotto said:

Nicely prepared article!

Carmelo Lisciotto

www.carmelolisciotto.com

# July 5, 2007 9:42 AM

Jeremy D. Miller -- The Shade Tree Developer said:

To everybody that attended one of my talks at DevTeach this week. All of the materials are now online

# November 29, 2007 12:03 PM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add

About Jeremy D. Miller

Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy previously worked as a systems architect building mission critical supply chain software for a Fortune 100 company and learned agile development practices as a .Net consultant at ThoughtWorks, one of the pioneers of agile development. Jeremy is the author of the open source StructureMap (http://structuremap.sourceforge.net) tool for Dependency Injection with .Net and the forthcoming StoryTeller (http://storyteller.tigris.org) tool for supercharged FIT testing in .Net. Jeremy's thoughts on just about everything software related can be found on his weblog "The Shade Tree Developer" at http://codebetter.com/blogs/jeremy.miller, part of the popular CodeBetter site. Jeremy is a Microsoft MVP for C#. Check out Devlicio.us!

Our Sponsors

Free Tech Publications

This Blog

Syndication

News

All opinions expressed here constitute my (Jeremy D. Miller's) personal opinion, and do not necessarily represent the opinion of any other organization or person, including (but not limited to) my fellow employees, my employer, its clients or their agents.

About Me

"Best Of" Compendium

StructureMap (Dependency Injection for .Net)

StoryTeller (Supercharged Fit)

Build your own Cab

TestDriven

MVP