File Listing: S Transform, how Dwarf Therapist v22 derives %'s
Last Updated: Oct 23, 2014, 11:23:24 pm
First Created: Jul 13, 2014, 11:42:25 am
File version: v5
For DF version: Multiple
Downloads: 21 (39)
Size: 63.3 KB
Views: 1,531 (1,988)
Type: 7Z
Rating (0 votes): Unrated
Description
Opens with LibreOffice

The attached sheet shows how a method called the "S Transform" is applied to skewed data to 'normalize' it.

Its basic principle: it uses the area of a distribution that has the least amount of disparity (between mean and median) and does a min-max transform around each side (mean and median), i.e. min to mean to max, then min to median to max.

Example:

minmax drawing = ordered drawing but on a 0 to 1 scale.

http://imgur.com/5rtUH5a
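
For readers unfamiliar with minmax, here is a minimal sketch of a plain min-max rescale to the 0 to 1 scale mentioned above. The function name `minmax` and the handling of a constant list are assumptions made for illustration; they are not taken from the spreadsheet or from Dwarf Therapist.

```python
def minmax(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant list has no spread, so map everything to the midpoint.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = minmax([10, 20, 40, 90])
print(scaled)  # the order of the values is kept, only the scale changes
```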

Any distribution with uniformly distinct values can be normalized using this method. What do I mean by 'uniformly distinct'? A distribution is not uniformly distinct when a single value (the mode) represents more than 50% of the distribution. We test whether a distribution's 1st quartile equals its median; if so, we call an alternative formula instead of the S Transform.*

*In that case we separate the data into two parts: values <= median and values > median. We basically run a minmax on the > median values, and a rank-ecdf with a factor-derived value applied on the <= median values, to ensure we achieve a .5 mean.
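
The quartile test described above could be sketched roughly as follows. The name `is_uniformly_distinct` is hypothetical, and the choice of the "inclusive" quartile convention is an assumption; the description does not say which quartile definition the sheet or Dwarf Therapist uses.

```python
import statistics


def is_uniformly_distinct(values):
    """Return False when the 1st quartile equals the median.

    Assumption: quartiles use the 'inclusive' convention; the source
    does not state which quartile definition is applied.
    """
    q1, median, _q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1 != median


spread_out = [1, 2, 3, 4, 5, 6, 7, 8]
mostly_zero = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]  # one value dominates the lower half

print(is_uniformly_distinct(spread_out))   # S Transform applies
print(is_uniformly_distinct(mostly_zero))  # alternative formula applies
```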

- Note: That alternative method is described in another document. This spreadsheet talks about distributions that have uniformly distinct values, such as attributes and traits.

"S Transform" has the behaviour of pushing the distribution to the center.

The old [beta v15] method used a flat distribution curve [imagine a die, where each face has an equal 1/6 chance, on a scale from 0 to 100%]. That is how rank-ecdf worked; in other words, each value's score was ~= (rank / count).

What you saw above is not a flat distribution; it is a curved one on a definite scale from 0 to 100%.
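
To make the (rank / count) idea concrete, here is a small sketch of a rank-based ecdf. The function name `rank_ecdf` and the tie rule (a value's rank is the number of values less than or equal to it) are assumptions for illustration, not the beta v15 code.

```python
def rank_ecdf(values):
    """Score each value as rank / count.

    Assumption: rank is the number of values <= v, so tied values share a score.
    """
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


scores = rank_ecdf([1, 2, 3, 100])
print(scores)
```

Only the ordering matters here: the outlier 100 gets the same score it would get if it were 4, which is the loss of meaning that the newer method tries to avoid.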

What it achieves is a ~50% mean, with min representing 0% and max representing ~100% (see the cdf and pdf in the pic at top).

How does it work?

This method always ensures we scale to min and max appropriately; if any values get transformed more than others, it's the values in between the median and the mean.
First we minmax the data around the mean: min to mean becomes 0 to 50%, and mean to max becomes 50% to 100%.

Then we do it again on the outputted data, this time around the median: min to median, then median to max, in a similar fashion.

Then we find how far the average of that output is from .5 and add the difference to each value to achieve a .5 mean, but now we no longer have 0 to 100%.

So then we do a minmax around the .5 value: min to .5 becomes 0 to 50%, and .5 to max becomes 50% to 100%.


This ensures any data will have a ~.5 mean and be scaled from 0 to 100%, while preserving more of the original meaning of the data than rank-ecdf allowed.
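
The four steps under "How does it work?" can be sketched as below. This is one reading of the prose, not the spreadsheet's formulas or Dwarf Therapist's code: the names `minmax_around` and `s_transform` are hypothetical, and the handling of a zero-width lower side is an assumption.

```python
import statistics


def minmax_around(values, pivot):
    """Map min..pivot onto 0..0.5 and pivot..max onto 0.5..1."""
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v <= pivot:
            span = pivot - lo
            # Assumption: if the pivot sits on the minimum, use the midpoint.
            out.append(0.5 if span == 0 else 0.5 * (v - lo) / span)
        else:
            # hi >= v > pivot here, so the divisor is always positive.
            out.append(0.5 + 0.5 * (v - pivot) / (hi - pivot))
    return out


def s_transform(values):
    # Step 1: minmax around the mean of the raw data.
    step1 = minmax_around(values, statistics.mean(values))
    # Step 2: minmax around the median of the step 1 output.
    step2 = minmax_around(step1, statistics.median(step1))
    # Step 3: shift every value so the average becomes .5.
    shift = 0.5 - statistics.mean(step2)
    step3 = [v + shift for v in step2]
    # Step 4: minmax around .5 to get back onto the 0 to 100% scale.
    return minmax_around(step3, 0.5)


result = s_transform([1, 2, 3, 10])
print(result)
print(statistics.mean(result))  # close to .5 for this input, not exactly .5
```

In this sketch the last rescale moves the average slightly off .5 again, so the output mean is approximate, while min and max land on 0 and 1.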
Checksum / Hash
SHA-256: 1c7acb465866dc5d7a48ca0d0c49b34548557ad581e3cd990c7ac0e3fcdbde68
