Connected-uk.com

All posts tagged data

This post takes me back to my favourite subject of statistics.

We’ve all heard the “lies, damn lies and statistics” quote and still, all too frequently, statistics are made “to fit” a required outcome.

Most people know that this is wrong but still blindly accept the results, probably because they lack the self-confidence to question them.

The point remains the same and the goal is unchanged; we want to better understand and predict the world around us, and mathematics offers us the clearest path to that understanding. What’s lacking, quite often, is the right information required to correctly analyse a situation and come to a correct answer.

Probability 101

Q. I have two children and one is a boy. What is the probability that I have two boys?

Common sense tells me that the other child has (for the purposes of this experiment) a 50/50 chance of being either gender, so the common sense answer would be 1/2. Except that this is not true, because the question comes with a condition (I have already told you that at least one is a boy). The equally likely combinations of children are BG, GB, BB or GG, and since I already have one boy this removes GG from the equation. That leaves three combinations, only one of which is BB, so the probability of having two boys is 1/3.
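
The count above can be checked by listing the families directly. This is a minimal sketch, assuming each child is independently a boy or a girl with equal probability and that "one is a boy" means "at least one is a boy"; the variable names are purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Every (first child, second child) combination: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Keep only the families consistent with "at least one is a boy".
at_least_one_boy = [f for f in families if "B" in f]

# Of those, count the families where both children are boys.
two_boys = [f for f in at_least_one_boy if f == ("B", "B")]

probability = Fraction(len(two_boys), len(at_least_one_boy))
print(probability)  # 1/3
```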

Probability 102

Q. I have two children and one is a boy born on a Tuesday. What is the probability that I have two boys?

Again, common sense suggests that it will be the same as above. Why would the day of birth make any difference to the statistical outcome? But it does.

Let’s, using the above naming convention, call a boy born on a Tuesday a BTu. This gives the following scenarios.

* When the first child is a BTu and the second is a girl born on any day of the week there are SEVEN possibilities.

* When the first child is a girl born on any day of the week and the second is a BTu there are an additional SEVEN possibilities.

* When the first child is a BTu and the second is a boy born on any day of the week then, again, there are SEVEN possibilities.

* Finally, there is a situation where the first child is a boy born on any day of the week and the second child is a BTu. Again there are seven possibilities but, and here it gets interesting, one of them (both children being BTu) has been counted before, so there are only SIX new possibilities.

Counting equally likely outcomes we then have a total of 7+7+7+6=27 different combinations, and 13 of them (7+6) include two boys, so the answer is 13/27, wildly different to the 9/27 (1/3) answer to the first question. This is surprisingly odd and (entertainingly) illustrates that seemingly unconnected pieces of information can make a huge and statistically very important difference to outcomes.

Whilst this post is folly of sorts it does have a serious side. When you are trying to measure information to produce meaningful outcomes you really must be very careful to decide what to include and what to exclude. And you must have a grasp of how to use the information correctly, even if the mathematics required was learned when you were 13.
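
The Tuesday puzzle can be checked the same way, by brute force. This is a sketch under stated assumptions: each child is equally likely to be a boy or a girl, equally likely to be born on any day of the week, and the two children are independent. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# 14 equally likely (sex, day) descriptions of a single child.
children = list(product("BG", DAYS))

# 14 x 14 = 196 equally likely (first child, second child) families.
families = list(product(children, repeat=2))


def has_tuesday_boy(family):
    """True if at least one child is a boy born on a Tuesday (a BTu)."""
    return ("B", "Tue") in family


def both_boys(family):
    """True if both children are boys, whatever day they were born."""
    return all(sex == "B" for sex, _day in family)


matching = [f for f in families if has_tuesday_boy(f)]
two_boys = [f for f in matching if both_boys(f)]

tuesday_answer = Fraction(len(two_boys), len(matching))
print(len(matching), len(two_boys), tuesday_answer)  # 27 13 13/27
```

Swapping the condition for a plain "at least one boy" brings the answer back to 1/3, which shows that it is the extra detail in the condition, not the children, that moves the number.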


The oft-misused phrase that “information is power” generates some pretty big headaches for organisations. Simply gathering information on, for example, web-site activity suddenly turns ageing IT and obsolete marketing departments into “great houses of power”. Funny, eh?

One of the great dilemmas facing organisations embracing the web today is not information poverty, as it was in the late 90s, but information overload as we trudge, neck-deep, in data. It is quite surprising that more organisations have not recognised this and looked to block this ever-increasing problem.

The simplest answer is to hire a Chief Information Officer (CIO) who has a strong background in statistics and data mining and a focus on the commercial aspects of the organisation. There is a skill in converting information into knowledge and, sadly, even the early generation of CIOs are not quite there yet. They need more power, more budget and more say in the strategy of their organisations.

So, as most organisations are forced to live in the information age, the really bright ones are embracing the knowledge age.


Following on from Sam’s excellent post recently, I thought I’d take a look at where location-based services might go in the next few years.

Before I start I should add that there is a plethora of really good location-based services out there, including FourSquare, Gowalla and Twitter, but these are really just the tip of the iceberg. In fact everything from Google Maps to digital cameras with GPS is using location to enhance the experience of the service or product offered. With GPS chips now costing a few pence, you can easily see GPS being available in every device that could possibly use physical location as a useful tool.

What does location recording give us?

At first sight people are put off by the idea that they are being “tracked” but once over that irrational hurdle you can see there is a great deal of value in knowing where you are, where something happened, who and what is around you and how far it is to something you might desire.

Put simply, GPS allows an event or object to be located to within a few feet anywhere in the world. This is the basis of location services but provides little by itself. It needs applications layered on top to provide a richer experience, such as augmented reality browsing. The reality is that very few services or applications would NOT benefit from location services. Time and distance are related, so tying location to, for example, a photograph together with a normal time/date-stamp gives you, as the photographer, a great way to sort, find, tag and share your photographs. Add a sharing element to this hypothetical photograph and now you can see photographs taken at the same location, same time, one year ago, pointing the same direction etc.
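
As a rough illustration of what "same location, same day, an earlier year" could look like in practice, here is a minimal sketch. Everything in it is an assumption made for the example: the `Photo` record, the 50 metre radius, the sample coordinates and the function names are invented, and distance is estimated with the standard haversine formula on a spherical Earth.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


@dataclass
class Photo:
    """Hypothetical photo record: a name plus location and time metadata."""
    name: str
    lat: float
    lon: float
    taken: datetime


def distance_m(a, b):
    """Great-circle distance in metres between two photos (haversine)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def same_spot_earlier_years(target, photos, radius_m=50):
    """Photos taken within radius_m of target, on the same calendar day
    of an earlier year."""
    same_day = (target.taken.month, target.taken.day)
    return [
        p for p in photos
        if p.taken.year < target.taken.year
        and (p.taken.month, p.taken.day) == same_day
        and distance_m(p, target) <= radius_m
    ]


# Invented sample data: two shots a few metres apart a year apart,
# and one taken in a different city on the same date.
today = Photo("today", 53.8008, -1.5491, datetime(2010, 6, 1, 12, 0))
last_year = Photo("last_year", 53.8009, -1.5491, datetime(2009, 6, 1, 15, 0))
elsewhere = Photo("elsewhere", 51.5074, -0.1278, datetime(2009, 6, 1, 9, 0))
photos = [today, last_year, elsewhere]

print([p.name for p in same_spot_earlier_years(today, photos)])  # ['last_year']
```

Compass direction could be added as one more field and one more filter; it is left out here to keep the sketch short.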

This, naturally, doesn’t just have to apply to photographs. You could tag other documents the same way and enhance the data so it includes not just where and when the document was created (not always the most relevant use of location data) but also where and when it was consumed. I’m sure it’s more relevant to know at which venue Bill Gates first presented the future of gesture computing than to know in which of his houses he wrote the presentation.

So, location computing is not just about creation but also about consumption and, obviously, editing. In fact, the best way to explain this is to use the word CONTEXT. Location and time/date information should be used in the context that best serves the user at a given moment in time. It’s not, then, location computing but context computing that matters, and location is just the latest tool in the context toolbox now that GPS has become ubiquitous.

At its simplest it is another form of meta-data and a bloody useful one.

What next in context meta-data?


It appears that the old chestnut of “privacy” and “data protection” might put a bloody great hole in the “Client-side analytics” ship. Specifically our old friend Google Analytics has been declared illegal in Germany.

“So what”, I hear you cry, “we’re not in Germany”. But we are, sort of, in the Federated European States of Brussels, so how long before the privacy boys start raising the issue elsewhere in Europe? The problem stems from Google acquiring unregulated information from the German people and shipping that data to the US for analysis and use in some ill-defined and opaque manner.

I hope to see the end of Google’s information monopoly in the next few years and, besides, most forward-thinking organisations already realise the importance of owning their own information and shy away from service-led offerings such as GA.

A good place to find out a bit more about Google’s less-than-transparent approach to the web is GoogleWatch.

Remember: “We are moving to a Google that knows more about you.” Google CEO Eric Schmidt, speaking to financial analysts, February 9, 2005, as quoted in the New York Times the next day.

Thanks to GoogleWatch for the use of the logo above, which is a comedy item and not to be taken seriously.



Passport is the central repository that VITES uses to store and recall information on the visitor bases. Storing every interaction with every visitor requires a flexible, scalable and powerful database solution. Connected are a committed LAMP development environment so all Passport data is held in a series of SQL databases providing an open and simple-to-implement recording and reporting environment...