Documenting the Columns in the downloadable Cells database CSV

Hi,

I checked around, and I can’t seem to find documentation on what the different columns in the data mean. I’m referring to the complete dataset that is available for download (~3 GB).

E.g., what is “range”, “mcc”, “samples”?

Please let me know where these things are documented.

Thanks,
Navid

Hey Navid,

You’re right. We’re going to publish an updated documentation sheet on the columns soon. In the meantime, here are the column descriptions:

Radio: The generation of broadband cellular network technology
MCC: Mobile Country Code
Net: Mobile Network Code
Area: Location Area Code (LAC)
Cell: Cell tower code (CID)
lat, long: Approximate coordinates of the cell tower
range: Radius, in meters, of the area within which the cell could be
samples: Number of measurements processed to get this data
changeable:

  • 1 = The location is determined by processing samples
  • 0 = We got the location directly from the telecom firm

created: When the cell was first added to the database (UNIX timestamp)
updated: When the cell was last seen (UNIX timestamp)
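
For anyone loading the export, here is a minimal Python sketch of reading rows with the columns described above. The header names used here (`radio`, `mcc`, `net`, `area`, `cell`, `lon`, `lat`, `range`, `samples`, `changeable`, `created`, `updated`), their order, and the sample row are assumptions for illustration only; check them against the header of the file you actually downloaded.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample: column names and values are assumptions, not real data.
SAMPLE = """radio,mcc,net,area,cell,lon,lat,range,samples,changeable,created,updated
GSM,262,2,801,86355,13.285512,52.522202,1000,7,1,1282569574,1300155341
"""


def parse_cell(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "radio": row["radio"],
        "mcc": int(row["mcc"]),
        "net": int(row["net"]),
        "area": int(row["area"]),
        "cell": int(row["cell"]),
        "lon": float(row["lon"]),
        "lat": float(row["lat"]),
        "range_m": float(row["range"]),  # radius in meters
        "samples": int(row["samples"]),
        # 1 = location derived from samples, 0 = provided directly
        "from_samples": row["changeable"] == "1",
        # created/updated are UNIX timestamps; convert to UTC datetimes
        "created": datetime.fromtimestamp(int(row["created"]), tz=timezone.utc),
        "updated": datetime.fromtimestamp(int(row["updated"]), tz=timezone.utc),
    }


cells = [parse_cell(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(cells[0]["radio"], cells[0]["range_m"], cells[0]["created"].isoformat())
```

For the full ~3 GB file you would pass an open file handle to `csv.DictReader` instead of the in-memory string, and process rows one at a time rather than building a list.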

I hope this helps!

What does the averageSignal column mean?

A quick overview of how we get this column:

To get the positions of cells, we first process measurements from our data contributors. Each measurement includes the GPS location of the device, the scanned cell identifier (MCC-MNC-LAC-CID), and other device properties (such as signal strength).

  • GPS location is a must
  • Scanned cell id is a must
  • Other properties aren’t absolutely necessary to find the cell location

We process thousands of measurements for a particular MCC-MNC-LAC-CID and estimate the location of that cell. In this process, the signal strength reported by the devices is averaged. Most ‘averageSignal’ values are 0 because we simply didn’t receive signal strength values.
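
As a rough illustration of the aggregation described above (not OpenCelliD's actual pipeline), the sketch below groups hypothetical measurements by cell identifier, counts them, and averages signal strength only over the measurements that reported one. The `Measurement` type, its field names, the sample values, and the choice to report 0 when no signal strength was received are all assumptions.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    cell_id: tuple            # (mcc, mnc, lac, cid)
    lat: float                # device GPS latitude (required)
    lon: float                # device GPS longitude (required)
    signal: Optional[float]   # None when the device sent no signal strength


def summarize(measurements):
    """Return {cell_id: {"samples": n, "averageSignal": mean or 0}}."""
    groups = defaultdict(list)
    for m in measurements:
        groups[m.cell_id].append(m)

    summary = {}
    for cell_id, ms in groups.items():
        signals = [m.signal for m in ms if m.signal is not None]
        summary[cell_id] = {
            "samples": len(ms),
            # Assumption: 0 stands for "no signal strength values received".
            "averageSignal": sum(signals) / len(signals) if signals else 0,
        }
    return summary


# Made-up measurements for two cells.
data = [
    Measurement((404, 10, 1, 100), 12.90, 77.60, -80.0),
    Measurement((404, 10, 1, 100), 12.91, 77.61, -90.0),
    Measurement((404, 10, 1, 100), 12.92, 77.62, None),
    Measurement((404, 10, 1, 200), 12.95, 77.65, None),
]
result = summarize(data)
print(result)
```

When analysing the export, this suggests treating an `averageSignal` of 0 as "missing" rather than as a real reading.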

Could you clarify the range column definition? The given documentation states

range: Approximate area within which the cell could be. (In meters)

Is it a radius in meters, or an area in meters^2?

Hey Dominic,

You’re right, I didn’t mention that. It is a radius in meters.

I’ll update my post

Hi Sagar,

samples: No of measures processed to get this data

Is this the total number of measures (calls) for the 24-hour period of the data? Can we get the data over time within a day?

Nope, samples relates to the cumulative number of measures (not calls, but datapoints) submitted for that particular cell.

You can get the data in real time via the API, but we process data exports on a daily basis.

Hi Sagar,

Would the number of datapoints translate to activities of the cell? I’m trying to produce a map of “digital activity” in an area through the cell towers. Would the datapoints be a measure of digital activity of the cell?

If I use the API, which field denotes the calls?

No, I don’t see how the number of samples could translate to activity in that region. If you mean usage of that cell: we’re only collecting cell positions via contributions, so we have no way of knowing the number of devices connected to the cell.

You can probably make a case for associating higher volume of samples for a particular cell to higher activity in that region, but it would not be a strong relation. Our data contributors are a very small percentage of the entire population.

OpenCelliD has better data in cities than in towns/villages, which could simply be because cities have larger populations.

You can produce a map of cell coverage in a particular area and then assume (roughly) the ability to have digital activity. For example, if you see a bunch of CDMA/GSM cells in region ‘A’ and see a bunch of LTE cells in region ‘B’ - region ‘B’ is more likely to have a larger digital footprint.
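
A minimal sketch of that kind of coverage summary, assuming each cell has already been assigned to a region by some method of your own (the region labels, the rows, and the `radio_mix` / `lte_share` helpers below are made up for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical (region, radio) pairs derived from the cells export.
cells = [
    ("A", "GSM"), ("A", "CDMA"), ("A", "GSM"),
    ("B", "LTE"), ("B", "LTE"), ("B", "GSM"),
]


def radio_mix(cells):
    """Count cells per radio technology within each region."""
    mix = defaultdict(Counter)
    for region, radio in cells:
        mix[region][radio] += 1
    return mix


def lte_share(counter):
    """Fraction of a region's cells that are LTE (0.0 if the region is empty)."""
    total = sum(counter.values())
    return counter["LTE"] / total if total else 0.0


mix = radio_mix(cells)
for region in sorted(mix):
    print(region, dict(mix[region]), round(lte_share(mix[region]), 2))
```

This only describes which technologies have been observed in a region; as noted above, it says nothing about how many devices actually use those cells.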

I see, thanks for the explanation. Much clearer now.

I am still not understanding the “range” parameter. Does this mean this is the area where the cell tower is located (i.e. an error in the lat-long), or is it the range of the tower signal - i.e. the effective distance from the lat-long of the tower where one can get a signal?

Hey Abhishek,

The range denotes the uncertainty in the estimated location of the cell, not its coverage.

If range value is 1000 - it says that the cell can be at x,y or anywhere in the range of 1000 sq meters around it

The units in the metadata say meters, not square meters, just FYI.
Can I say “The range of a tower is the range at which a mobile device can connect reliably to the tower”?
Also, is this an average value, or a maximum value for towers with multiple samples?

I wouldn’t say that. We’re not looking at this from the point of view of a mobile device or cell coverage. The position of the cell itself can be at this coordinate or within x meters (range).

This is an average value from multiple measurements.
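
To make the "within x meters" reading concrete, here is a small Python sketch that treats `range` as a radius around the listed coordinates and checks whether a candidate position falls inside it. It uses the standard haversine great-circle distance; the function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_range(cell_lat, cell_lon, range_m, lat, lon):
    """True if (lat, lon) lies within range_m meters of the listed cell position."""
    return haversine_m(cell_lat, cell_lon, lat, lon) <= range_m


# Made-up cell entry with range = 1000 (meters).
cell_lat, cell_lon, range_m = 12.3700, -1.5200, 1000

print(within_range(cell_lat, cell_lon, range_m, 12.3750, -1.5200))  # ~0.56 km north
print(within_range(cell_lat, cell_lon, range_m, 12.3900, -1.5200))  # ~2.2 km north
```

Under this reading, a larger `range` means a less certain cell position, not a stronger or farther-reaching signal.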

So, I am looking at the Burkina Faso data, and the average for “range” is 1.9 km - does that mean that the average error range of the cell tower from the given lat-long in the data is 1.9 km? I see some values as high as 150-200 km, is that right?

I think to avoid confusion this variable should be labeled as error, instead of range. Clearly the British government is interpreting it incorrectly in their OpenCellID code book (should probably correct them, eh?)

Hi,

Can I get a more precise definition of the “approximate location” field? Is that the location of the UE that produced the report? Or if not, how is it computed?

Thanks!
Eric

Hey Eric,

We do not return the position of the UE (User Equipment). We approximate the position of a cell based on information scanned by the UE. Each submitted scan is known as a measurement. We process billions of measurements to determine the positions of millions of cells.

For example, your current device is able to scan cell A (signal strength 85%) at the GPS position of your device (x, y).

  • Device position x1,y1 - Cell A signal strength 95%
  • Device position x2,y2 - Cell A signal strength 73%
  • Device position x3,y3 - Cell A signal strength 68%
  • Device position x4,y4 - Cell A signal strength 81%

You can see how hundreds of such submissions can give us enough data to approximate position of cell A.
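
To give a feel for how submissions like these can be combined, here is a deliberately simple Python sketch: a signal-weighted centroid of the device positions. This is a stand-in chosen for illustration, not a description of the algorithm OpenCelliD actually runs; the function name, the use of the signal percentage as a weight, and the coordinates are all assumptions.

```python
def weighted_centroid(observations):
    """Estimate a cell position as the signal-weighted mean of device positions.

    observations: list of (lat, lon, signal_percent) tuples.
    Averaging lat/lon directly is only reasonable over small areas
    and away from the poles and the antimeridian.
    """
    total = sum(w for _, _, w in observations)
    if total <= 0:
        raise ValueError("need at least one observation with positive weight")
    lat = sum(la * w for la, _, w in observations) / total
    lon = sum(lo * w for _, lo, w in observations) / total
    return lat, lon


# Made-up device positions paired with the signal strengths from the example.
observations = [
    (52.5200, 13.4050, 95),
    (52.5210, 13.4070, 73),
    (52.5190, 13.4030, 68),
    (52.5205, 13.4040, 81),
]
est_lat, est_lon = weighted_centroid(observations)
print(round(est_lat, 5), round(est_lon, 5))
```

Stronger readings pull the estimate toward the positions where they were taken, which is the intuition behind using many submissions to narrow down where a cell is.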

Hi Sagar,

Thank you. Is that algorithm documented somewhere? Some fellow researchers (both at NIST and elsewhere) and I are planning to do some further post-analysis using the OpenCellID data, and it would be very helpful to know exactly what went into it.

For example, is it simple multilateration based on signal strength? Is antenna directionality (at the cell site primarily, but maybe also the receiver) considered? What about transmit power variation over time? And differing receiver sensitivity / calibration? What about colinear / non-orthogonal measurement points? Or non-uniform propagation losses with distance (e.g. some observations are weak because the UE is in a valley, not because the cell is far away)?

Thanks!
Eric

Hey Eric,

We use a simple triangulation algorithm. I’m sure it’s available in a number of places online. The challenge is to curate input data - this is where we excel with a bunch of intelligent algorithms that verify the quality of cells we receive in measurements.

Signal strength is only one of the factors and is a part of a standard triangulation algorithm.

We do not receive information on antenna direction, transmit power, or receiver calibration. As for non-uniform propagation losses, we use our existing data on cells to qualify each measurement and flag cells that were affected by simple interference.

I’ll be happy to hop on a quick call and elaborate. Reach out to us at [email protected]