Reference: Here, a well-written introduction to both concepts.

Entropy

“The entropy of a random variable is a function which attempts to characterize the ‘unpredictability’ of a random variable.” This unpredictability relates both to how the outcomes are distributed and to how many outcomes there are. A fair 666-sided die is more unpredictable than a fair 6-sided one. But if we cheat with the 666-sided die, say by weighting it so that the side with number 1 comes up almost every time, we may then find the 666-sided die the more predictable of the two.

For a random variable $X$ with possible values $\{x_1, x_2, \dots\}$ distributed according to $P(X)$, its entropy is:

$$H(X)=-\sum_{x\in X}P(x)\log P(x)$$

In another form, this is written as:

$$H(P(X))=H(P)=H(X)$$

Notice that entropy carries a unit: if the log has base 2, it is expressed in bits.
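
To make the die comparison concrete, here is a minimal Python sketch that evaluates $H(X)$ for a fair 6-sided die, a fair 666-sided die, and a loaded 666-sided die; the 99% weighting on one face is an assumption chosen for illustration:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_6 = [1 / 6] * 6                      # fair 6-sided die
fair_666 = [1 / 666] * 666                # fair 666-sided die: more outcomes, more unpredictable
loaded_666 = [0.99] + [0.01 / 665] * 665  # loaded die: one face comes up 99% of the time (assumed)

print(entropy(fair_6))      # ~2.585 bits
print(entropy(fair_666))    # ~9.379 bits
print(entropy(loaded_666))  # ~0.175 bits, far more predictable than the fair 6-sided die
```

The fair 666-sided die has the highest entropy, while the loaded one drops well below the fair 6-sided die, matching the intuition above.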

Joint Entropy

Joint entropy is the entropy of a joint distribution, i.e. of a multi-valued random variable. For random variables $E$ and $C$ with joint distribution $P(E,C)$, the joint entropy is:

$$H(E,C)=H(P(E,C))=-\sum_{e\in E} \sum_{c\in C}P(e,c)\log P(e,c)$$
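
As a sketch of this double sum in code, assuming the joint distribution is stored as a dictionary keyed by outcome pairs (the probabilities below are made up for illustration):

```python
import math

def joint_entropy(joint, base=2):
    """Joint entropy of a distribution given as a dict {(e, c): probability}."""
    return -sum(p * math.log(p, base) for p in joint.values() if p > 0)

# Hypothetical joint distribution P(E, C) over two binary variables.
P_EC = {
    ("e0", "c0"): 0.4,
    ("e0", "c1"): 0.1,
    ("e1", "c0"): 0.2,
    ("e1", "c1"): 0.3,
}

print(joint_entropy(P_EC))  # ~1.846 bits
```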

Mutual Information

Mutual information is a quantity that measures a relationship between two random variables that are sampled simultaneously. In particular, it measures how much information is communicated, on average, in one random variable about another. Intuitively, one might ask, how much does one random variable tell me about another?

Suppose $X$ is the result of a fair die, $Y$ tells whether that result is even, and $Z$ is the result of another fair die. Then $X$ and $Z$ have no mutual information (they are independent), while $Y$ does tell us something about $X$.

The mutual information, denoted by $I$, is:

$$I(X;Y)=\sum_{x \in X} \sum_{y \in Y} P(x,y)\log\frac{P(x,y)}{P(x)P(y)}$$

where $P(X)$ and $P(Y)$ are the marginal distributions obtained by marginalizing the joint distribution $P(X,Y)$.
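
To tie this back to the die-and-parity example, here is a minimal Python sketch that first marginalizes the joint distribution to get $P(X)$ and $P(Y)$ and then evaluates the sum; the dictionary encoding of $P(X,Y)$ is an assumption made for illustration:

```python
import math
from collections import defaultdict

def mutual_information(joint, base=2):
    """Mutual information I(X;Y) from a joint distribution {(x, y): probability}."""
    # Marginalize the joint distribution to obtain P(X) and P(Y).
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log(p / (px[x] * py[y]), base)
        for (x, y), p in joint.items()
        if p > 0
    )

# X is a fair 6-sided die, Y is the parity of X; pairs with probability 0 are simply omitted.
P_XY = {(x, "even" if x % 2 == 0 else "odd"): 1 / 6 for x in range(1, 7)}
print(mutual_information(P_XY))  # ~1.0 bit: knowing the parity gives one bit about the roll
```

An independent second die $Z$ would give $P(x,z)=P(x)P(z)$ everywhere, so every log term, and hence the mutual information, would be zero.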