The name of this test run configuration, given as a string.
Path to the directory used as the root folder for all of the runner output (logs, reports, etc.).
Configuration of analytics backend to be used for storing and retrieving test metrics. This plays a major part in optimising performance and mitigating flakiness.
By default no analytics backend is expected, which means that each test is treated as a completely new test with no history.
Assuming you’ve done the setup for InfluxDB you need to provide:
Database name is quite useful in case you have multiple configurations of tests/devices and you don’t want metrics from one configuration to affect the other one, e.g. regular and end-to-end tests.
Pooling strategy affects how devices are grouped.
All connected devices are merged into one group. This is the default mode.
Devices are grouped by their ABI, e.g. x86 and mips.
Devices are grouped by manufacturer, e.g. Samsung and Yota.
Devices are grouped by model name, e.g. LG-D855 and SM-N950F.
Devices are grouped by OS version, e.g. 24 and 25.
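The grouping modes above can be sketched in a few lines. This is only an illustration of the idea, not marathon's implementation: the device records, their field names and the pool name "all" are hypothetical, and the sample values are arbitrary.

```python
from collections import defaultdict

# Hypothetical device records; field names and values are illustrative only.
devices = [
    {"serial": "A1", "abi": "x86", "manufacturer": "Samsung", "model": "SM-N950F", "os_version": 25},
    {"serial": "B2", "abi": "x86", "manufacturer": "LG", "model": "LG-D855", "os_version": 24},
    {"serial": "C3", "abi": "mips", "manufacturer": "Samsung", "model": "SM-N950F", "os_version": 24},
]


def pool_devices(devices, key=None):
    """Group device serials into pools.

    key=None mirrors the default mode: every device lands in one pool.
    Otherwise devices are grouped by the value of the given field.
    """
    pools = defaultdict(list)
    for device in devices:
        pool_name = "all" if key is None else str(device[key])
        pools[pool_name].append(device["serial"])
    return dict(pools)


by_abi = pool_devices(devices, "abi")
by_os = pool_devices(devices, "os_version")
```

Here `by_abi` holds one pool per ABI and `by_os` one pool per OS version, while `pool_devices(devices)` keeps every device in a single pool.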
Sharding is a mechanism that allows marathon to affect the tests scheduled for execution inside each pool.
Executes each test in parallel on all of the available devices in the pool. This is the default behaviour.
Executes each test count times inside each pool. For example, to check the flakiness of a specific test you need to execute it many times. Instead of running the build X times, use this sharding strategy with count set to X and the test will be executed X times.
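As a rough sketch of the count-based strategy, the scheduled test list can be thought of as being expanded before execution. The function name below is hypothetical and the code only illustrates the effect, not how marathon schedules tests internally.

```python
def shard_by_count(tests, count):
    """Return the list of executions: every test repeated `count` times."""
    return [test for test in tests for _ in range(count)]


scheduled = shard_by_count(["LoginTest#login", "CartTest#checkout"], 3)
```

With a count of 3, each of the two tests appears three times in `scheduled`.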
To optimise the performance of test execution, tests need to be sorted. This requires an enabled analytics backend, since historical data is needed to anticipate test behaviour such as duration and success/failure rate.
No sorting of tests is done at all. This is the default behaviour.
For each test, the analytics storage provides the success rate for a time window specified by the timeLimit parameter. All the tests are then sorted by success rate in increasing order, that is, failing tests go first and successful tests go last.
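A minimal sketch of this ordering, assuming the success rates for the time window have already been fetched from the analytics backend into a plain dictionary (the dictionary and the function name are hypothetical):

```python
def sort_by_success_rate(success_rates):
    """Order test names by success rate, lowest first."""
    return sorted(success_rates, key=lambda test: success_rates[test])


# Hypothetical success rates within the configured time window.
success_rates = {"stable": 1.0, "flaky": 0.6, "broken": 0.0}
order = sort_by_success_rate(success_rates)
```

The test that fails most often is placed at the front of the queue, so failures surface early in the run.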
For each test, the analytics storage provides the X percentile duration for a time window specified by the timeLimit parameter. The percentile is configurable via the percentile parameter. All the tests are sorted so that long tests go first and short tests are executed last. This lets marathon minimise the balancing error at the end of execution, because only short tests are left to distribute across devices.
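The duration-based ordering can be sketched as follows. The nearest-rank percentile used here is an assumption made for the example (the source does not say how the backend computes the percentile), and the history dictionary and function names are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (assumed method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def sort_by_duration(history, p):
    """Order test names by their p-th percentile duration, longest first."""
    return sorted(history, key=lambda test: percentile(history[test], p), reverse=True)


# Hypothetical durations in milliseconds within the time window.
history = {
    "fast": [100, 120, 110],
    "slow": [5000, 5200, 4800],
    "medium": [900, 1000, 950],
}
order = sort_by_duration(history, 90)
```

Long tests end up first, so the tail of the run consists of short tests that are easy to balance.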
The batching mechanism allows you to trade off stability for performance. A group of tests executed in one single run is called a batch. Most of the time this means that tests in the same batch share the device state, so there is no clean-up between them. On the other hand, you gain some performance because the execution command is usually quite slow (up to 10 seconds on some platforms).
No batching is done at all, each test is executed using separate command execution, that is performance is sacrificed in favor of stability. This is the default mode.
Each batch is created based on the size parameter, which is required. When a new batch of tests is needed, at most size tests are taken from the queue.
Optionally, if you want to limit the batch duration, you have to specify the timeLimit for the test metrics time window and the durationMillis. For each test the analytics backend is queried for the percentile of its duration. If the sum of durations is more than durationMillis then no more tests are added to the batch.
This is useful if you use batching and have very long tests, e.g. you batch by size 10 and your test run duration is roughly 10 minutes, but you have tests that are expected to run 2 minutes each. If you batch all of them together then at least one device will finish its execution in 20 minutes, while all other devices might already have finished. To mitigate this, specify the time limit for the batch using durationMillis.
Another optional parameter for this strategy is lastMileLength. At the end of execution batching actually hurts performance, so for the last tests it’s much better to execute them in parallel in separate batches. This works only if you execute on multiple devices. You can specify when this optimisation kicks in using the lastMileLength parameter: the last lastMileLength tests will use this optimisation.
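The size and duration limits can be sketched as below. This is a simplified illustration rather than marathon's code: the function and argument names are hypothetical, and it assumes that the test which pushes the sum over durationMillis still stays in the batch, which is one reading of the rule above.

```python
from collections import deque


def next_batch(queue, size, duration_millis=None, expected_duration=None):
    """Take up to `size` tests from the queue, stopping early once the
    summed expected duration exceeds `duration_millis` (if given)."""
    batch = []
    total = 0
    while queue and len(batch) < size:
        test = queue.popleft()
        batch.append(test)
        if duration_millis is not None:
            total += expected_duration[test]
            if total > duration_millis:
                break  # limit exceeded: no more tests are added
    return batch


# Hypothetical expected durations: four tests of 2 minutes each.
expected = {name: 120_000 for name in ["a", "b", "c", "d"]}
queue = deque(["a", "b", "c", "d"])
limited = next_batch(queue, size=10, duration_millis=200_000, expected_duration=expected)
```

With a 200-second limit the batch closes after two tests instead of taking all four, leaving the rest for other devices.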
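One way to picture the last-mile behaviour is to plan the batches for a whole queue up front. The sketch below assumes that each of the last lastMileLength tests gets a batch of its own and that regular batches never reach into that tail; both points are assumptions for illustration, and the function name is hypothetical.

```python
def plan_batches(tests, size, last_mile_length=0):
    """Split tests into batches of `size`, except that the final
    `last_mile_length` tests are each placed in a single-test batch."""
    batches = []
    index = 0
    while index < len(tests):
        remaining = len(tests) - index
        if remaining <= last_mile_length:
            step = 1  # inside the last mile: one test per batch
        else:
            # Regular batch, trimmed so it does not eat into the last mile.
            step = min(size, remaining - last_mile_length)
        batches.append(tests[index:index + step])
        index += step
    return batches


tests = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
plan = plan_batches(tests, size=3, last_mile_length=2)
```

In this plan the final two tests can run on two different devices at once instead of being serialised inside one batch.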
This is the main anticipation logic for marathon. Using the analytics backend we can understand the success rate and hence queue preventive retries to mitigate the flakiness of the tests and environment.
Nothing is done with this mode. This is the default behaviour.
The main idea is that the flakiness strategy anticipates the flakiness of a test based on its probability of passing and tries to maximise the probability of at least one pass when the test is executed multiple times. For example, if the probability of test A passing is 0.5 and the configuration requests a probability of 0.8, then the flakiness strategy schedules test A to be executed 3 times (0.5 x 0.5 x 0.5 = 0.125 is the probability of all three executions failing, so with probability 0.875 > 0.8 at least one of them will pass).
The minimal probability that you want is specified using minSuccessRate, and the success rate is measured over the time window controlled by timeLimit. If you specify too high a minSuccessRate you’ll have too many retries, so the upper bound is controlled by the maxCount parameter: the strategy calculates the required number of retries according to minSuccessRate, but if the result is higher than maxCount it uses maxCount.
This is the logic that kicks in if the preventive logic failed to anticipate such a high number of retries. It works after the tests have actually been executed.
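The arithmetic above can be written as a small function. This is a sketch of the calculation only: the function and argument names are hypothetical, and it assumes maxCount caps the total number of executions.

```python
def executions_needed(pass_probability, min_success_rate, max_count):
    """Smallest number of executions n such that 1 - (1 - p)^n reaches
    min_success_rate, capped at max_count."""
    count = 1
    all_fail = 1.0 - pass_probability  # probability that every execution fails
    while 1.0 - all_fail < min_success_rate and count < max_count:
        count += 1
        all_fail *= 1.0 - pass_probability
    return count


runs = executions_needed(pass_probability=0.5, min_success_rate=0.8, max_count=5)
```

For the example in the text (p = 0.5, requested 0.8) two executions give 0.75, which is not enough, and three give 0.875, so `runs` is 3.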
As the name implies, no retries are done. This is the default mode.
The totalAllowedRetryQuota parameter specifies how many retries are allowed in total, across all tests. retryPerTestQuota controls how many retries can be done for each test individually.
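The interaction of the two quotas can be sketched with a small bookkeeping class. The class and method names are hypothetical; only the two limits come from the description above.

```python
from collections import Counter


class RetryQuota:
    """Tracks a global retry budget and a per-test retry budget."""

    def __init__(self, total_allowed_retry_quota, retry_per_test_quota):
        self.total_allowed = total_allowed_retry_quota
        self.per_test_allowed = retry_per_test_quota
        self.total_used = 0
        self.used_per_test = Counter()

    def try_retry(self, test):
        """Return True and consume quota if the test may be retried."""
        if self.total_used >= self.total_allowed:
            return False  # global budget exhausted
        if self.used_per_test[test] >= self.per_test_allowed:
            return False  # this test has used up its own budget
        self.total_used += 1
        self.used_per_test[test] += 1
        return True


quota = RetryQuota(total_allowed_retry_quota=3, retry_per_test_quota=2)
decisions = [quota.try_retry(t) for t in ["A", "A", "A", "B", "B"]]
```

Test A is refused on its third retry by the per-test limit, and test B is refused on its second retry because the total budget of three is already spent.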
Filtering of tests is important because developers usually keep all the different types of tests in the same codebase. To indicate to marathon which tests you want to execute, use the whitelist and blacklist parameters. The whitelist is applied first, then the blacklist. Each accepts a TestFilter based on the class name, fully qualified class name, package, annotation or method. Each filter expects a regular expression as its value.
To filter using multiple filters at the same time, a composition filter is also available. It accepts a list of base filters and an operation such as UNION, INTERSECTION or SUBTRACT. This allows you to create complex filters, such as selecting all the tests whose class name starts with E2E but only the methods ending with Test.
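The whitelist, blacklist and composition behaviour can be sketched with predicates over hypothetical test records. Everything here is an illustration: the record fields and function names are invented, a test is assumed to pass the whitelist if any whitelist filter matches, and SUBTRACT is assumed to mean "matches the first filter and none of the others".

```python
import re


def field_filter(field, pattern):
    """Base filter: True when the regex is found in the given test field."""
    regex = re.compile(pattern)
    return lambda test: regex.search(test[field]) is not None


def annotation_filter(pattern):
    """Base filter: True when any annotation of the test matches the regex."""
    regex = re.compile(pattern)
    return lambda test: any(regex.search(a) for a in test["annotations"])


def compose(filters, op):
    """Combine base filters with UNION, INTERSECTION or SUBTRACT."""
    def predicate(test):
        results = [f(test) for f in filters]
        if op == "UNION":
            return any(results)
        if op == "INTERSECTION":
            return all(results)
        if op == "SUBTRACT":  # assumed semantics, see lead-in
            return results[0] and not any(results[1:])
        raise ValueError(f"unknown operation: {op}")
    return predicate


def apply_filters(tests, whitelist=(), blacklist=()):
    """Apply the whitelist first, then the blacklist."""
    kept = [t for t in tests if not whitelist or any(f(t) for f in whitelist)]
    return [t for t in kept if not any(f(t) for f in blacklist)]


tests = [
    {"class": "E2ELoginTest", "method": "loginTest", "annotations": []},
    {"class": "E2ELoginTest", "method": "helperCheck", "annotations": []},
    {"class": "UnitMathTest", "method": "addTest", "annotations": []},
    {"class": "E2ECartTest", "method": "checkoutTest", "annotations": ["org.junit.Ignore"]},
]

e2e_test_methods = compose(
    [field_filter("class", r"^E2E"), field_filter("method", r"Test$")],
    "INTERSECTION",
)
ignored = annotation_filter(r"^org\.junit\.Ignore$")
selected = apply_filters(tests, whitelist=[e2e_test_methods], blacklist=[ignored])
names = [f"{t['class']}.{t['method']}" for t in selected]
```

Only `E2ELoginTest.loginTest` survives: the helper method fails the method pattern, the unit test fails the class pattern, and the ignored test is removed by the blacklist.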
An important thing to mention is that by default platform specific ignore options are not taken into account. This is because a cross-platform test runner cannot account for all the possible test frameworks out there. However each framework’s ignore option can still be “explained” to marathon, e.g. JUnit’s org.junit.Ignore annotation can be specified in the filtering configuration.
By default the build fails if some tests failed. If you want the build to succeed even if some tests failed, use true.
By default test classes are searched with the "^((?!Abstract).)*Test$" regex. You can override this if you need to.
This parameter specifies the behaviour for the underlying test executor to timeout if there is no output. By default this is set to 60 seconds.
Enables very verbose logging to stdout for all the marathon components. Very useful for debugging.
To better understand the use-cases that marathon is used for we’re asking you to provide us with anonymised information about your usage. By default this is disabled. Use true to enable.
By default, tests that don’t have any status reported after execution (for example, because a device disconnected during the execution) are retried indefinitely. You can limit the total number of executions for such cases using this option.
By default if one of the test retries succeeds then the test is considered successfully executed. If you require success status only when all retries were executed successfully you can enable the strict mode. This may be useful to verify that flakiness of tests was fixed for example.
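The difference between the default and the strict mode comes down to "any attempt passed" versus "all attempts passed". A minimal sketch, with a hypothetical function name and attempts represented as booleans:

```python
def final_status(attempts, strict=False):
    """Decide the overall result from a non-empty list of attempt outcomes.

    Default: the test passes if any attempt passed.
    Strict: the test passes only if every attempt passed.
    """
    return all(attempts) if strict else any(attempts)


attempts = [False, True, True]  # first run failed, two retries passed
default_result = final_status(attempts)
strict_result = final_status(attempts, strict=True)
```

The same history is reported as a success by default and as a failure in strict mode, which is what makes strict mode useful for confirming that a flaky test has really been fixed.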