I checked around, and I can’t seem to find documentation on what the different columns in the data mean. I’m referring to the complete dataset that is available for download (~3 GB).
E.g., what is “range”, “mcc”, “samples”?
Please let me know where these things are documented.
You’re right, we’re going to publish an updated documentation sheet on the columns soon. I’ve mentioned the column descriptions below:
Radio: The generation of broadband cellular network technology
MCC: Mobile Country Code
Net: Mobile Network Code
Area: Location Area Code (LAC)
Cell: Cell tower code (CID)
lat, long: Approximate coordinates of the cell tower
range: Approximate area within which the cell could be (radius in meters)
samples: Number of measurements processed to get this data
changeable:
1 = The location is determined by processing samples
0 = We got the location directly from the telecom firm
created: When the cell was first added to the database (UNIX timestamp)
updated: When the cell was last seen (UNIX timestamp)
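For anyone loading the file, here is a minimal sketch of turning one row into typed values using the column descriptions above. The header names (for example `lon` and `averageSignal`) and the two sample rows are assumptions made up for illustration, so check them against the header of the file you actually downloaded.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up sample rows. The header names are an assumption based on the
# column list above and may differ from the real download.
SAMPLE = """radio,mcc,net,area,cell,lon,lat,range,samples,changeable,created,updated,averageSignal
GSM,262,2,801,86355,13.285512,52.522202,1000,7,1,1282569574,1300155341,0
LTE,262,1,1234,5678901,13.404954,52.520008,350,42,0,1459692342,1488347104,0
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "radio": row["radio"],
        # MCC-MNC-LAC-CID together identify a cell.
        "cell_id": (int(row["mcc"]), int(row["net"]),
                    int(row["area"]), int(row["cell"])),
        "lat": float(row["lat"]),
        "lon": float(row["lon"]),
        # Radius in meters within which the cell could be.
        "range_m": int(row["range"]),
        "samples": int(row["samples"]),
        # changeable = 1: position derived from samples; 0: from the operator.
        "from_samples": row["changeable"] == "1",
        # created/updated are UNIX timestamps (seconds).
        "created": datetime.fromtimestamp(int(row["created"]), tz=timezone.utc),
        "updated": datetime.fromtimestamp(int(row["updated"]), tz=timezone.utc),
    }


cells = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for c in cells:
    print(c["radio"], c["cell_id"], c["range_m"], c["created"].date())
```

For the real ~3 GB file you would pass an open file handle to `csv.DictReader` instead of the in-memory string and process rows one at a time.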
To get the positions of cells, we first process measurements from our data contributors. Each measurement includes the GPS location of the device, the scanned cell identifier (MCC-MNC-LAC-CID), and other device properties (such as signal strength).
GPS location is a must
Scanned cell id is a must
Other properties aren’t absolutely necessary to find the cell location
We process thousands of measurements for a particular MCC-MNC-LAC-CID and figure out the estimated location of a cell. In this process, the signal strength reported by the device is averaged. Most ‘averageSignal’ values are 0 because we simply didn’t receive signal strength values.
Would the number of datapoints translate to activities of the cell? I’m trying to produce a map of “digital activity” in an area through the cell towers. Would the datapoints be a measure of digital activity of the cell?
No, I don’t see how the number of samples could translate to activity in that region. If you mean usage of that cell: we’re only collecting cell positions via contributions, so we have no way of knowing the number of devices connected to the cell.
You can probably make a case for associating higher volume of samples for a particular cell to higher activity in that region, but it would not be a strong relation. Our data contributors are a very small percentage of the entire population.
OpenCelliD has better data in cities than in towns and villages; this could simply be because cities have larger populations.
You can produce a map of cell coverage in a particular area and then assume (roughly) the ability to have digital activity. For example, if you see a bunch of CDMA/GSM cells in region ‘A’ and see a bunch of LTE cells in region ‘B’ - region ‘B’ is more likely to have a larger digital footprint.
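As a rough sketch of that idea, the snippet below counts cells per radio type in coarse grid squares. The coordinates, the 0.5 degree grid size, and the `grid_key` helper are all made up for illustration; any real analysis would pick a grid or administrative boundary that suits the region.

```python
import math
from collections import Counter, defaultdict

# Made-up (radio, lat, lon) tuples standing in for rows of the dataset.
cells = [
    ("GSM", 12.37, -1.52),
    ("GSM", 12.36, -1.53),
    ("LTE", 11.18, -4.29),
    ("LTE", 11.17, -4.30),
    ("UMTS", 11.18, -4.28),
]


def grid_key(lat, lon, size_deg=0.5):
    """Map a coordinate to the index of the grid square containing it."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


# For each grid square, count how many cells of each radio type it holds.
counts = defaultdict(Counter)
for radio, lat, lon in cells:
    counts[grid_key(lat, lon)][radio] += 1

for key, by_radio in sorted(counts.items()):
    print(key, dict(by_radio))
```

Keep in mind this only shows where contributors have seen cells, not how heavily those cells are used.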
I am still not understanding the “range” parameter. Does this mean this is the area where the cell tower is located (i.e. an error in the lat-long), or is it the range of the tower signal - i.e. the effective distance from the lat-long of the tower where one can get a signal?
The units in the meta data say meters, not meter square, just FYI.
Can I say “The range of a tower is the range at which a mobile device can connect reliably to the tower”?
Also, is this an average value, or a maximum value for towers with multiple samples?
I wouldn’t say that. We’re not looking at this from the point of view of a mobile device or cell coverage. The position of the cell itself can be at this coordinate or within x meters (range).
This is an average value from multiple measurements.
So, I am looking at the Burkina Faso data, and the average for “range” is 1.9 km - does that mean that the average error range of the cell tower from the given lat-long in the data is 1.9 km? I see some values as high as 150-200 km, is that right?
I think to avoid confusion this variable should be labeled as error, instead of range. Clearly the British government is interpreting it incorrectly in their OpenCellID code book (should probably correct them, eh?)
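To see how much a few very large values can pull the average, a quick check like the one below compares the mean with the median. The numbers are made up and the 100 km threshold is an arbitrary choice for illustration, not a value from the dataset.

```python
import statistics

# Made-up "range" values in meters, with one very large outlier.
ranges_m = [450, 800, 1200, 950, 600, 175000]

mean_m = statistics.mean(ranges_m)
median_m = statistics.median(ranges_m)

# Arbitrary cutoff used only to list suspicious rows.
OUTLIER_THRESHOLD_M = 100_000
outliers = [r for r in ranges_m if r > OUTLIER_THRESHOLD_M]

print(f"mean:   {mean_m:.0f} m")
print(f"median: {median_m:.0f} m")
print(f"values above {OUTLIER_THRESHOLD_M} m: {outliers}")
```

If the mean and median differ this much in the real data, the median is probably the more useful summary of the typical position uncertainty.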
Can I get a more precise definition of the “approximate location” field? Is that the location of the UE that produced the report? Or if not, how is it computed?
We do not return the position of the UE (User Equipment). We approximate the position of a cell based on information scanned by the UE. Each submitted scan is known as a measurement. We process billions of measurements to determine the positions of millions of cells.
For example, your current device is able to scan cell A (signal strength 85%) at the GPS position of your device (x,y). Other devices submit similar scans of the same cell:
Device position x1,y1 - Cell A signal strength 95%
Device position x2,y2 - Cell A signal strength 73%
Device position x3,y3 - Cell A signal strength 68%
Device position x4,y4 - Cell A signal strength 81%
You can see how hundreds of such submissions can give us enough data to approximate position of cell A.
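To make the idea concrete, here is a minimal sketch that estimates a cell position as a signal-strength-weighted centroid of the device positions. This is a simple stand-in chosen for illustration and not OpenCelliD's actual algorithm; the `weighted_centroid` function and the coordinates are hypothetical.

```python
def weighted_centroid(measurements):
    """Estimate a position from (lat, lon, strength) tuples.

    Stronger signals pull the estimate toward the device position where
    they were observed. Averaging latitude and longitude directly is only
    reasonable over small areas away from the poles and the antimeridian.
    """
    total = sum(strength for _, _, strength in measurements)
    if total == 0:
        raise ValueError("at least one measurement needs a non-zero strength")
    lat = sum(lat * strength for lat, _, strength in measurements) / total
    lon = sum(lon * strength for _, lon, strength in measurements) / total
    return lat, lon


# Made-up device positions with the signal strengths from the example above.
cell_a = [
    (52.5200, 13.4050, 95),
    (52.5210, 13.4070, 73),
    (52.5190, 13.4080, 68),
    (52.5205, 13.4040, 81),
]

est_lat, est_lon = weighted_centroid(cell_a)
print(f"estimated position of cell A: {est_lat:.5f}, {est_lon:.5f}")
```

With only four points the estimate is crude; the more measurements from different directions, the less any single reading matters.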
Thank you. Is that algorithm documented somewhere? Some fellow researchers (both at NIST and elsewhere) and I are to do some further post-analysis using the OpenCellID data, and it would be very helpful to know exactly what went into it.
For example, is it simple multilateration based on signal strength? Is antenna directionality (at the cell site primarily, but maybe also the receiver) considered? What about transmit power variation over time? And differing receiver sensitivity / calibration? What about colinear / non-orthogonal measurement points? Or non-uniform propagation losses with distance (e.g. some observations are weak because the UE is in a valley, not because the cell is far away)?
We use a simple triangulation algorithm. I’m sure it’s available in a number of places online. The challenge is to curate input data - this is where we excel with a bunch of intelligent algorithms that verify the quality of cells we receive in measurements.
Signal strength is only one of the factors and is a part of a standard triangulation algorithm.
We do not receive info on antenna direction, transmit power, or receiver calibration. As for non-uniform propagation losses, we use our existing data on cells to qualify each measurement and flag cells that were affected by simple interference.
I’ll be happy to hop on a quick call and elaborate. Reach out to us at [email protected]