Suppose we have a field $\Phi(x)$ in a curved spacetime, and we want to know how fast it is changing as we move in some direction in space or time. Because there is more than one possible direction to move in, we have to select a vector $v^a$ which tells us which direction in the coordinate space $x^a$ to move in (remember, $x^a$ stands for a list of all 4 spacetime coordinates). Then we can calculate the rate of change by taking a partial derivative. If your calculus is rusty, the partial derivative $v^a \partial_a \Phi$ is defined by:
$$v^a \partial_a \Phi = \lim_{\epsilon \to 0} \frac{\Phi(x + \epsilon v) - \Phi(x)}{\epsilon}$$
In other words, we compare the value of $\Phi$ at two different points ($x$ and $x + \epsilon v$). As $\epsilon$ gets smaller, these two points get closer and closer together, so the values of $\Phi$ typically get more and more similar, but because we divide by $\epsilon$ we end up with a nonzero answer in the limit. I've written $x$ instead of $x^a$ because I'm lazy.
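If it helps to see the limit in action, here is a minimal numerical sketch of the directional derivative of a scalar field. The field `phi`, the point, and the direction are all made up for illustration, and the limit is replaced by a small but finite step `eps`.

```python
import math

def directional_derivative(phi, x, v, eps=1e-6):
    """Approximate the derivative of phi along v by (phi(x + eps*v) - phi(x)) / eps."""
    shifted = [xi + eps * vi for xi, vi in zip(x, v)]
    return (phi(shifted) - phi(x)) / eps

# Hypothetical scalar field on coordinates (t, x, y, z), chosen only for illustration.
def phi(p):
    t, x, y, z = p
    return t * t + math.sin(x) + y * z

point = [1.0, 0.0, 2.0, 3.0]      # where we evaluate the derivative
direction = [1.0, 1.0, 0.0, 0.0]  # the vector telling us which way to move

# Shrinking eps moves the two sample points together; the ratio settles down
# near the exact value 2*t + cos(x) = 3 at this point.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, directional_derivative(phi, point, direction, eps))
```

The difference in the numerator shrinks along with `eps`, but the ratio does not, which is the whole point of dividing by the step size.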
That was the formula for the partial derivative in a particular direction $v^a$ (which is itself a list of 4 numbers). If we want to have a list of all 4 possible partial derivatives at each point, we can just write $\partial_a \Phi$ without the $v^a$. This is the partial derivative covector, where a covector is a thing which eats a vector (like $v^a$) and spits out a number. That's almost the same thing as a vector, but not quite, which is why its index is downstairs instead of upstairs. (You can convert between covectors and vectors by using the metric, e.g. $v_a = g_{ab} v^b$, where as usual we sum over all 4 possible values of the repeated index.)
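As a small sketch of what "using the metric" means in practice, the snippet below lowers an index by summing over the repeated one. It assumes flat Minkowski spacetime with the (-, +, +, +) sign convention, which is my choice for the example and not something fixed by the discussion above; the component values are arbitrary.

```python
# Minkowski metric in (t, x, y, z) coordinates.
# Assumption: the (-, +, +, +) sign convention, picked only for this example.
g = [[-1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

def lower_index(g, v):
    """Turn a vector into a covector by contracting with the metric (sum over b)."""
    n = len(v)
    return [sum(g[a][b] * v[b] for b in range(n)) for a in range(n)]

def eat(covector, vector):
    """A covector eats a vector and spits out a number (sum over the shared index)."""
    return sum(c * x for c, x in zip(covector, vector))

v_up = [2, 1, 0, 3]            # arbitrary vector components
v_down = lower_index(g, v_up)  # the matching covector components
print(v_down)                  # [-2, 1, 0, 3]
print(eat(v_down, v_up))       # 6
```

In this flat example only the time component changes sign; with a curved metric every component could get mixed.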
Now, $\Phi$ was a scalar field, meaning that it didn't have any indices attached to it. What if we tried to do the same trick with some vector field $V^a$ (or a covector $W_a$)? Well, nothing stops us from taking the partial derivative of a vector in the exact same way:
$$v^b \partial_b V^a = \lim_{\epsilon \to 0} \frac{V^a(x + \epsilon v) - V^a(x)}{\epsilon}$$
Unfortunately, this turns out to be a stupid thing to do. The problem is that (before we take the limit) it involves comparing two vectors at different points. But in a curved spacetime, it doesn't make sense to talk about the same direction at different points, because coordinates are arbitrary. There's no particular sense in comparing the "t" component of a vector at a point $x$ with the "t" component of a vector at another point $x + \epsilon v$, because the definition of "t" is arbitrary. If you change the coordinate system at $x + \epsilon v$ but not at $x$, you'll get confused.
In a curved spacetime, you can only compare vectors at different points if you select a specific path to go between the two points. You can then drag (or if you prefer, parallel transport) the vector along this path, but if you choose a different path you might get a different answer.
Well here, because the points are really close, there's an obvious path to pick. Since spacetime looks flat when you zoom in really close, you can just parallel transport along the very short straight line connecting the two points. This allows you to relate the coordinate system at the starting point $x$ to the one at the destination point $x + \epsilon v$. Thus, when we take the derivative, we want to compare $V^a(x)$ not to the same coordinate component of $V^a(x + \epsilon v)$, but to the parallel transported component of the vector. When we do this, we get the covariant derivative, defined as follows:
$$\nabla_c V^a = \partial_c V^a + \Gamma^a_{bc} V^b$$
Well, that's not very useful until I tell you what capital gamma means. It's called the Christoffel symbol or the connection, and it tells us how to parallel transport vectors by an infinitesimal amount. Basically if you take a vector pointing in the $b$ direction and drag it a little bit in the $c$ direction, then $\Gamma^a_{bc}$ says how much your vector ends up shifting in the $a$ direction, relative to your system of coordinates. It turns out that the bottom two indices are symmetric: $\Gamma^a_{bc} = \Gamma^a_{cb}$.
Similarly, if you want to define the covariant derivative of a covector, you just have to attach the indices a little bit differently:
$$\nabla_c W_a = \partial_c W_a - \Gamma^b_{ac} W_b$$
The minus sign comes in because covectors are the opposite of vectors, so they need to behave oppositely under a coordinate change. Or, if you have a complicated tensor with multiple upstairs or downstairs indices, you have to have a separate correction term involving $\Gamma$ for each of the indices. How tedious! But, in the case of a scalar field $\Phi$, we get off scot-free: the covariant and partial derivatives are just the same.
If your spacetime is flat and you use Minkowski coordinates, then $\Gamma^a_{bc} = 0$. But even in flat spacetime you can have $\Gamma^a_{bc} \neq 0$ if you use a weird coordinate system, like polar coordinates.
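To make the polar coordinate remark concrete, here is a rough sketch in the flat 2D plane. It takes a vector field that is constant in Cartesian coordinates (it always points along x), writes its components in polar coordinates $(r, \theta)$, and compares the naive partial derivative with the corrected one, using the rule "partial derivative plus $\Gamma$ times the vector" for an upstairs index. The nonzero connection components $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ are the standard polar values, quoted here as an input and not derived. The function names are mine.

```python
import math

def V(r, theta):
    """Polar components (V^r, V^theta) of the constant Cartesian field pointing along x."""
    return [math.cos(theta), -math.sin(theta) / r]

def gamma(r, theta):
    """Connection G[a][b][c] for plane polar coordinates (standard values, quoted not derived)."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                    # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r  # Gamma^theta_{r theta}, symmetric in the lower pair
    return G

def partial(field, r, theta, c, h=1e-6):
    """Central-difference partial derivative of each component along coordinate c."""
    dr, dth = (h, 0.0) if c == 0 else (0.0, h)
    plus = field(r + dr, theta + dth)
    minus = field(r - dr, theta - dth)
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

def covariant(field, r, theta, c):
    """Partial derivative plus the connection correction, for one upstairs index."""
    dV = partial(field, r, theta, c)
    G = gamma(r, theta)
    Vx = field(r, theta)
    return [dV[a] + sum(G[a][b][c] * Vx[b] for b in range(2)) for a in range(2)]

r0, theta0 = 2.0, 0.7
for c, name in enumerate(("r", "theta")):
    print("partial along", name, partial(V, r0, theta0, c))
    print("covariant along", name, covariant(V, r0, theta0, c))
```

The partial derivatives come out nonzero even though the field is "really" constant, while the covariant derivatives vanish up to numerical noise. The correction term is exactly what cancels the fake change coming from the coordinates.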
All of this is a little bit circular so far, since I haven't actually told you how to calculate $\Gamma^a_{bc}$ yet. It's just some thing with the right number of indices to do what it does. In fact, you could choose to think of the connection $\Gamma^a_{bc}$ as a fundamental field in its own right, in which case there would be no need to define it in terms of anything else. But that is NOT what people normally do in general relativity. Instead they define the connection in terms of the metric $g_{ab}$, because it turns out there is a slick way to do it.
We want to find a way to use the metric to compare things at two different points. In other words, the metric is a sort of standard measuring stick we want to use to see how other things change. But obviously the metric cannot change relative to itself. (If you define a yard as the length of a yardstick, then other things can change in size, but the stick will always be 1 yard by definition.) Therefore, the covariant derivative of the metric itself is zero: $\nabla_a g_{bc} = 0$. But if we write out the correction terms we get:
$$\partial_a g_{bc} = \Gamma^d_{ba}\, g_{dc} + \Gamma^d_{ca}\, g_{bd}$$
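We can use this equation to solve for $\Gamma^a_{bc}$ in terms of the metric. To do this, we just switch around the roles of the $a$, $b$, and $c$ indices to get
$$\partial_b g_{ca} = \Gamma^d_{cb}\, g_{da} + \Gamma^d_{ab}\, g_{cd}$$
and
$$\partial_c g_{ab} = \Gamma^d_{ac}\, g_{db} + \Gamma^d_{bc}\, g_{ad}$$
By adding up these last two equations and subtracting the original one, the terms with an $a$ in the lower indices of $\Gamma$ cancel in pairs (because the bottom two indices of $\Gamma$ are symmetric, and so is the metric), leaving $2\, g_{ad} \Gamma^d_{bc}$. Dividing by two, one can prove that
$$g_{ad}\, \Gamma^d_{bc} = \frac{1}{2} \left( \partial_b g_{ca} + \partial_c g_{ab} - \partial_a g_{bc} \right)$$
We can then define $\Gamma^a_{bc}$ directly as
$$\Gamma^a_{bc} = \frac{1}{2}\, g^{ad} \left( \partial_b g_{cd} + \partial_c g_{db} - \partial_d g_{bc} \right)$$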
To do that, we had to introduce something called the inverse metric $g^{ab}$. You get this by writing the metric $g_{ab}$ out as a matrix and inverting it. (Technically we write $g^{ab} g_{bc} = \delta^a_c$ where $\delta^a_c$ is a very boring tensor which is always 1 if $a$ and $c$ are the same index, and 0 if they are different.)
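Here is a rough numerical sketch of that recipe: take a metric, invert it as a matrix, take first derivatives, and assemble the connection as one half of the inverse metric times (two derivative terms minus a third). As a test case I use the flat plane in polar coordinates, where the metric is $\mathrm{diag}(1, r^2)$. That example, the finite-difference step `h`, and all function names are my own choices for illustration.

```python
def polar_metric(x):
    """Metric of the flat plane in polar coordinates (r, theta); illustrative example."""
    r, theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_2x2(m):
    """Invert a 2x2 matrix; this is the inverse metric with both indices upstairs."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def metric_derivative(metric, x, d, h=1e-5):
    """Central-difference derivative of every metric component along coordinate d."""
    xp, xm = list(x), list(x)
    xp[d] += h
    xm[d] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[b][c] - gm[b][c]) / (2 * h) for c in range(n)] for b in range(n)]

def christoffel(metric, x):
    """Return G[a][b][c]: half the inverse metric times (d_b g_cd + d_c g_bd - d_d g_bc)."""
    n = len(x)
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, d) for d in range(n)]  # dg[d][b][c] = d_d g_bc
    return [[[0.5 * sum(g_inv[a][d] * (dg[b][c][d] + dg[c][b][d] - dg[d][b][c])
                        for d in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]

G = christoffel(polar_metric, [2.0, 0.3])
print("Gamma^r_{theta theta} =", G[0][1][1])  # close to -r = -2
print("Gamma^theta_{r theta} =", G[1][0][1])  # close to 1/r = 0.5
```

Only first derivatives of the metric are needed, and the result comes out symmetric in its bottom two indices automatically. The 2x2 inverse is hard-coded to keep the sketch short; a 4-dimensional spacetime would need a general matrix inverse.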
So then, the connection (which allows us to transport vectors from place to place) can be written in terms of the first derivative of the metric. We'll need to take a second derivative of the metric to get the curvature $R^a{}_{bcd}$, but that will be the subject of another post.