First, some background: I am building a solar radiation sensor (measuring W/m²) from a TSL2561 luminosity sensor. The sensor will sit under a pane of glass coated with white translucent paint for light dispersion. Its output will be a number representative of the lux received at my house. There are 5 personal weather stations within a 15-mile radius of me that have expensive solar radiation sensors, and I can obtain their readings via Weather Underground. Since cloud cover will alter the readings at each site, I am assuming that a statistical analysis is warranted. The data are also spatially distributed, since the stations sit at different locations. I need to get the value from my DIY sensor into the ballpark of the other sensors.
My approach so far:
- Gather the data from the 5 sites each day for an extended period of time.
- Use a chi-square test to check whether the data samples are likely to come from the same distribution (note that solar radiation readings tend to follow an almost bell-shaped curve over the course of a day).
- Use a sample t-test to establish a mean and standard deviation, representative of the set, that vary throughout the day.
- Fit an equation relating that mean to the value reported by my DIY sensor (assuming a linear fit for now).
- Check the calibrated output of my DIY sensor against the values from the other 5 sites with a chi-square test to determine whether it is likely to come from the same distribution.
- Run some sort of correlation against the data set to further validate the equation that I have found.
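To make the fitting and correlation steps concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the function names (`reference_series`, `fit_linear`, `pearson_r`) are hypothetical, the readings are assumed to be already aligned to common timestamps, the numbers are made-up placeholders rather than real measurements, and ordinary least squares stands in for the "linear fit" mentioned above. It does not implement the chi-square or t-test steps.

```python
from statistics import mean, stdev


def reference_series(station_rows):
    """Per-timestamp mean and sample standard deviation across stations.

    station_rows: one row per timestamp, each row holding the W/m^2
    readings of the reference stations at that time (assumed aligned).
    """
    means = [mean(row) for row in station_rows]
    spreads = [stdev(row) for row in station_rows]
    return means, spreads


def fit_linear(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Placeholder values only: DIY sensor output (lux) at four timestamps,
    # and the five reference stations (W/m^2) at the same timestamps.
    diy_lux = [5000.0, 30000.0, 70000.0, 40000.0]
    stations_wm2 = [
        [40.0, 45.0, 38.0, 42.0, 41.0],
        [250.0, 240.0, 260.0, 255.0, 245.0],
        [580.0, 600.0, 570.0, 590.0, 585.0],
        [330.0, 320.0, 340.0, 335.0, 325.0],
    ]

    ref_mean, ref_sd = reference_series(stations_wm2)
    slope, intercept = fit_linear(diy_lux, ref_mean)

    # Apply the fitted equation to the DIY readings and compare.
    calibrated = [slope * lux + intercept for lux in diy_lux]
    print("slope:", slope, "intercept:", intercept)
    print("calibrated W/m^2:", calibrated)
    print("correlation with reference mean:", pearson_r(calibrated, ref_mean))
```

The per-timestamp standard deviation from `reference_series` gives a rough sense of how much the reference stations disagree with each other at a given moment (for example under patchy cloud), which is a useful yardstick for judging how close "in the ballpark" the calibrated DIY value needs to be.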
To Be Continued ...