Information is simply data about an event that resolves uncertainty.
Information theory as a field was created by Claude Shannon's work, A Mathematical Theory of Communication. I think his treatise was the first to look at information mathematically. Shannon was interested in reducing communication losses in transmission over phone lines, which is why a lot of textbooks use examples of transmitting text messages across phone lines.
Information theory is also closely related to probability.
Essentially, the more surprising the outcome of an event, the more information it carries.
Information is quantified in the language of probability:
Given a discrete random variable X with N possible values x1, x2, …, xN and their associated probabilities p1, p2, …, pN, the information received when learning that the outcome was xi is:

I(xi) = log2(1/pi)
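
To make this concrete, here is a minimal Python sketch (the function name information_bits is my own choice, not anything standard) that computes log2(1/pi) for a given outcome probability:

import math

def information_bits(p):
    # Information, in bits, gained on learning an outcome that had probability p
    return math.log2(1 / p)

print(information_bits(0.5))    # a fair coin flip: 1.0 bit
print(information_bits(1 / 8))  # a 1-in-8 outcome: 3.0 bits

The rarer the outcome, the larger the value, which matches the intuition that surprise carries information.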


The question I had was why base 2 is used for the log. Surprisingly, the Wikipedia page on this is quite useful and accessible to a noob:

the information that can be stored in a system is proportional to the logarithm of the N possible states of that system, denoted logb(N)
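
To see what that means numerically, here is a quick sanity check in Python, under the assumption that all N states are equally likely:

import math

# Bits needed to distinguish N equally likely states: log2(N)
for n_states in [2, 8, 256]:
    print(n_states, "states ->", math.log2(n_states), "bits")

# Output:
# 2 states -> 1.0 bits
# 8 states -> 3.0 bits
# 256 states -> 8.0 bits

With base b = 2 the unit of information is the bit, and a 2-state system needs exactly 1 bit.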


Effectively, 1 bit is the answer to a yes/no question where yes and no are equally likely: with p = 0.5, log2(1/0.5) = 1 bit.