Documenting the Columns in the downloadable Cells database CSV


#1

Hi,

I checked around, and I can’t seem to find documentation on what the different columns in the data mean. I’m referring to the complete dataset that is available for download (~3 GB).

E.g., what is “range”, “mcc”, “samples”?

Please let me know where these things are documented.

Thanks,
Navid


#2

Hey Navid,

You’re right. We’re going to publish an updated documentation sheet for the columns soon. In the meantime, here are the column descriptions:

Radio: The generation of cellular network technology (e.g. GSM, CDMA, LTE)
MCC: Mobile Country Code
Net: Mobile Network Code (MNC)
Area: Location Area Code (LAC)
Cell: Cell ID (CID)
lat, long: Approximate coordinates of the cell tower
range: Approximate radius, in meters, of the area within which the cell could be
samples: Number of measurements processed to get this data
changeable:

  • 1 = The location is determined by processing samples
  • 0 = We got the location directly from the telecom firm

created: When the cell was first added to the database (UNIX timestamp)
updated: When the cell was last seen (UNIX timestamp)
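
As a rough illustration of the column list above, here is a minimal Python sketch that reads one row and converts each field to a typed value. The header names and the sample row are assumptions made for the example, not copied from a real export, so check the first line of your own download before relying on them.

```python
import csv
import io
from datetime import datetime, timezone

# Illustrative sample only: the header names and the row are assumptions,
# not copied from a real export. Check the first line of your download.
SAMPLE = """radio,mcc,net,area,cell,lon,lat,range,samples,changeable,created,updated,averageSignal
GSM,262,2,801,86355,13.285512,52.522202,1000,7,1,1282569574,1300155341,0
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    return {
        "radio": row["radio"],
        "mcc": int(row["mcc"]),
        "net": int(row["net"]),
        "area": int(row["area"]),
        "cell": int(row["cell"]),
        "lat": float(row["lat"]),
        "lon": float(row["lon"]),
        "range_m": int(row["range"]),          # radius in meters
        "samples": int(row["samples"]),
        # 1 = location derived from samples, 0 = supplied directly
        "from_samples": row["changeable"] == "1",
        # created/updated are UNIX timestamps (seconds)
        "created": datetime.fromtimestamp(int(row["created"]), tz=timezone.utc),
        "updated": datetime.fromtimestamp(int(row["updated"]), tz=timezone.utc),
    }


cells = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

For the real file you would pass an open file handle to `csv.DictReader` instead of the in-memory sample.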

I hope this helps!


#3

What does the averageSignal column mean?


#4

A quick overview of how we get this column:

To get the positions of cells, we first process measurements from our data contributors. Each measurement includes the GPS location of the device, the scanned cell identifier (MCC-MNC-LAC-CID), and other device properties (e.g. signal strength).

  • GPS location is a must
  • Scanned cell id is a must
  • Other properties aren’t absolutely necessary to find the cell location

We process thousands of measurements for a particular MCC-MNC-LAC-CID, and from them we estimate the location of the cell. In this process, the signal strength reported by the devices is averaged. Most ‘averageSignal’ values are 0 because we simply didn’t receive signal strength values.
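
To make the idea concrete, here is a minimal Python sketch of that kind of aggregation. It assumes a plain arithmetic mean for both position and signal, which this thread does not confirm as the actual method; the function name `estimate_cells` and the measurement field names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def estimate_cells(measurements):
    """Group measurements by cell id and average position and signal.

    Assumption: a simple mean is used here purely for illustration.
    """
    grouped = defaultdict(list)
    for m in measurements:
        key = (m["mcc"], m["mnc"], m["lac"], m["cid"])
        grouped[key].append(m)

    estimates = {}
    for key, group in grouped.items():
        # Signal strength is optional, so keep only the values we received.
        signals = [m["signal"] for m in group if m.get("signal") is not None]
        estimates[key] = {
            "lat": fmean(m["lat"] for m in group),
            "lon": fmean(m["lon"] for m in group),
            "samples": len(group),
            # Falls back to 0 when no measurement carried a signal value.
            "averageSignal": fmean(signals) if signals else 0,
        }
    return estimates
```

With this sketch, a cell whose contributors never reported signal strength ends up with `averageSignal` equal to 0, matching the behaviour described above.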


#5

Could you clarify the range column definition? The given documentation states

range: Approximate area within which the cell could be. (In meters)

Is it a radius in meters, or an area in meters^2?


#6

Hey Dominic,

You’re right, I didn’t mention that. It is a radius in meters.

I’ll update my post
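
Since range is a radius in meters, one way to use it is to test whether a point lies inside the circle around the listed cell position. The sketch below is only an illustration: it uses the haversine formula with a mean Earth radius, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_range(cell_lat, cell_lon, range_m, lat, lon):
    """True if (lat, lon) is no farther than range_m from the cell position."""
    return haversine_m(cell_lat, cell_lon, lat, lon) <= range_m
```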


#7

Hi Sagar,

samples: No of measures processed to get this data

Is this the total number of measures (calls) for the 24-hour period of the data? Can we get the data over time within a day?


#8

Nope, samples relates to the cumulative number of measures (not calls, but datapoints) submitted for that particular cell.

You can get the data in real time via the API, but we process data exports on a daily basis.


#9

Hi Sagar,

Would the number of datapoints translate to activities of the cell? I’m trying to produce a map of “digital activity” in an area through the cell towers. Would the datapoints be a measure of digital activity of the cell?

If I use the API, which field denotes the calls?


#10

No, I don’t see how the number of samples could translate to activity in that region. If you mean usage of that cell: we’re only collecting cell positions via contributions, so we have no way of knowing the number of devices connected to the cell.

You can probably make a case for associating a higher volume of samples for a particular cell with higher activity in that region, but it would not be a strong relation. Our data contributors are a very small percentage of the entire population.

OpenCelliD has better data in cities than in towns and villages; this could simply be because cities have larger populations.

You can produce a map of cell coverage in a particular area and then (roughly) infer the ability to have digital activity. For example, if you see a bunch of CDMA/GSM cells in region ‘A’ and a bunch of LTE cells in region ‘B’, region ‘B’ is more likely to have a larger digital footprint.
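
A minimal Python sketch of that comparison, assuming each cell is a dict with `radio`, `lat` and `lon` keys and each region is a simple bounding box. The function names and the use of LTE share as a proxy are assumptions made for illustration, not a recommended metric.

```python
from collections import Counter


def radio_mix(cells, bbox):
    """Count cells per radio type inside bbox = (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return Counter(
        c["radio"]
        for c in cells
        if min_lat <= c["lat"] <= max_lat and min_lon <= c["lon"] <= max_lon
    )


def lte_share(counts):
    """Fraction of counted cells that are LTE; 0.0 if the region has no cells."""
    total = sum(counts.values())
    return counts["LTE"] / total if total else 0.0
```

Comparing `lte_share` for two bounding boxes gives a rough sense of which region has more modern coverage, with the caveat above that the data depends on where contributors happen to be.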


#11

I see, thanks for the explanation. Much clearer now.